From jbhateja at openjdk.org Mon Jan 1 14:36:06 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Mon, 1 Jan 2024 14:36:06 GMT Subject: RFR: 8318650: Optimized subword gather for x86 targets. [v10] In-Reply-To: References: Message-ID:
> Hi All,
>
> This patch optimizes the sub-word gather operation for x86 targets with AVX2 and AVX512 features.
>
> Following is the summary of changes:-
>
> 1) Intrinsify sub-word gather using a hybrid algorithm which initially partially unrolls the scalar loop to accumulate values from gather indices into a quadword (64-bit) slice, followed by a vector permutation to place the slice into the appropriate vector lanes. This prevents code bloat and generates a compact JIT sequence. This, coupled with savings from avoiding the expensive array allocation in the existing Java implementation, translates into significant performance gains of 1.5-10x with the included micro-benchmark on Intel Atom family CPUs and with JVM option UseAVX=2.
>
> ![image](https://github.com/openjdk/jdk/assets/59989778/e25ba4ad-6a61-42fa-9566-452f741a9c6d)
>
>
> 2) For AVX512 targets, the algorithm uses integral gather instructions to load values from normalized indices which are multiples of integer size, followed by shuffling and packing the exact sub-word values from integral lanes.
>
> 3) The patch was also compared against a modified Java fallback implementation that replaces the temporary array allocation with a zero-initialized vector and a scalar loop which inserts gathered values into the vector. However, a vector insert operation in higher vector lanes is a three-step process which first extracts the upper 128-bit vector lane, updates it with the gathered sub-word value and then inserts the lane back to its original position. This makes inserts into higher-order lanes costly compared to the proposed solution. In addition, the generated JIT code for the modified fallback implementation was very bulky. This may impact inlining decisions into caller contexts.
>
> Kindly review and share your feedback.
> > Best Regards, > Jatin Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 12 commits: - Accelerating masked sub-word gathers for AVX2 targets, this gives additional 1.5-4x speedups over existing implementation. - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8318650 - Removing JDK-8321648 related changes. - Refined AVX3 implementation with integral gather. - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8318650 - Fix incorrect comment - Review comments resolutions. - Review comments resolutions. - Review comments resolutions. - Restricting masked sub-word gather to AVX512 target to align with integral gather support. - ... and 2 more: https://git.openjdk.org/jdk/compare/518ec971...de47076e ------------- Changes: https://git.openjdk.org/jdk/pull/16354/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=16354&range=09 Stats: 1421 lines in 32 files changed: 1373 ins; 20 del; 28 mod Patch: https://git.openjdk.org/jdk/pull/16354.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16354/head:pull/16354 PR: https://git.openjdk.org/jdk/pull/16354 From rehn at openjdk.org Tue Jan 2 06:56:46 2024 From: rehn at openjdk.org (Robbin Ehn) Date: Tue, 2 Jan 2024 06:56:46 GMT Subject: RFR: 8322790: RISC-V: Tune costs for shuffles with no conversion In-Reply-To: References: Message-ID: On Sat, 30 Dec 2023 20:07:13 GMT, Ilya Gavrilin wrote: > Hi all, please review this small change to RISC-V nodes insertion costs. > Now we have several nodes which provide shuffles without conversion: https://github.com/openjdk/jdk/blob/32d80e2caf6063b58128bd5f3dc87b276f3bd0cb/src/hotspot/cpu/riscv/riscv.ad#L8525-L8741 > On most RISC-V cpu`s we prefer reg<->reg operations, because they are faster, but now stack<->reg operations used (for details about reasons, please, visit connected jbs issue). 
> After changing insertion costs reg<->reg operations selected, and we can see performance improvements for benchmarks, which use such shuffles (tested on thead C910 board):
> | Benchmark                           | Upstream build (ops/ms) | Patched build (ops/ms) | difference (%) |
> |:-----------------------------------:|:-----------------------:|:----------------------:|:--------------:|
> | MathBench.doubleToRawLongBitsDouble | 30935.139               | 32171.761              | +4.00          |
> | StrictMathBench.ceilDouble          | 24682.810               | 29782.050              | +20.66         |
> | StrictMathBench.cosDouble           | 6948.309                | 6938.276               | -0.14          |
> | StrictMathBench.expDouble           | 6816.143                | 7211.021               | +5.79          |
> | StrictMathBench.floorDouble         | 30699.630               | 34189.509              | +11.37         |
> | StrictMathBench.maxDouble           | 35157.355               | 34675.191              | -1.37          |
> | StrictMathBench.minDouble           | 35192.135               | 35183.015              | -0.03          |
> | StrictMathBench.sinDouble           | 6698.405                | 6721.809               | +0.35          |
>
> New benchmark for changed nodes:
>
> --- a/test/micro/org/openjdk/bench/java/lang/MathBench.java
> +++ b/test/micro/org/openjdk/bench/java/lang/MathBench.java
> @@ -540,4 +540,11 @@ public class MathBench {
>          return Math.ulp(float7);
>      }
>
> +    @Benchmark
> +    public long doubleToRawLongBitsDouble() {
> +        double dbl162Dot5 = double81 * 2.0d + double0Dot5;
> +        double dbl3 = double2 + double1;
> +        return Double.doubleToRawLongBits(dbl162Dot5) + Double.doubleToRawLongBits(dbl3);
> +    }
> +

Thanks, seems reasonable to me.

------------- Marked as reviewed by rehn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/17206#pullrequestreview-1800006019 From kbarrett at openjdk.org Tue Jan 2 07:27:58 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Tue, 2 Jan 2024 07:27:58 GMT Subject: RFR: 8322816: RISC-V: Incorrect guarantee in patch_vtype Message-ID: Please review this change that fixes a test for a guarantee. This also removes a -Wparentheses warning when those are enabled (which is how the problem was discovered).
The problem is that operator precedence groups the sub-expressions differently than intended. The fix is to override the operator precedence by adding parentheses to achieve the intended grouping. Testing: Local (linux-x64) cross-build for linux-riscv with this change plus -Wparentheses enabled and other changes to allow that to work. Requesting someone from the riscv porters to properly test this. ------------- Commit messages: - fix subexpression grouping in patch_vtype guarantee Changes: https://git.openjdk.org/jdk/pull/17215/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=17215&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8322816 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/17215.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17215/head:pull/17215 PR: https://git.openjdk.org/jdk/pull/17215 From fyang at openjdk.org Tue Jan 2 09:01:47 2024 From: fyang at openjdk.org (Fei Yang) Date: Tue, 2 Jan 2024 09:01:47 GMT Subject: RFR: 8322816: RISC-V: Incorrect guarantee in patch_vtype In-Reply-To: References: Message-ID: <2Dii1vRWzgzjyBGHrx_tH28SlYMBWUZ0h2mO7D5so_4=.817b5636-aa8a-4c6d-807a-29b1855a59a7@github.com> On Tue, 2 Jan 2024 07:23:56 GMT, Kim Barrett wrote: > Please review this change that fixes a test for a guarantee. This also > removes a -Wparentheses warning when those are enabled (which is how the > problem was discovered). > > The problem is that operator precedence groups the sub-expressions differently > than intended. The fix is to override the operator precedence by adding > parentheses to achieve the intended grouping. > > Testing: Local (linux-x64) cross-build for linux-riscv with this change plus > -Wparentheses enabled and other changes to allow that to work. > > Requesting someone from the riscv porters to properly test this. 
src/hotspot/cpu/riscv/assembler_riscv.hpp line 1160:

> 1158: #define patch_vtype(hsb, lsb, vlmul, vsew, vta, vma, vill) \
> 1159:   if (vill == 1) { \
> 1160:     guarantee((vlmul | vsew | vta | vma) == 0, \

I see the `vill` parameter is always false in current code, which means this guarantee never gets executed. And I don't think we would make use of the `vill` field of vtype in future. So I personally prefer to remove this guarantee and its enclosing if block for now.

------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17215#discussion_r1439251060 From stefank at openjdk.org Tue Jan 2 09:28:14 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Tue, 2 Jan 2024 09:28:14 GMT Subject: RFR: 8282365: Consolidate and improve division by constant idealizations [v42] In-Reply-To: References: Message-ID: On Mon, 25 Dec 2023 17:57:28 GMT, Quan Anh Mai wrote:

>> This patch implements idealisation for unsigned divisions to change a division by a constant into a series of multiplication and shift. I also change the idealisation of `DivI` to get a more efficient series when the magic constant overflows an int32.
>>
>> In general, the idea behind a signed division transformation is that for a positive constant `d`, we would need to find constants `c` and `m` so that:
>>
>>     floor(x / d) = floor(x * c / 2**m)      for 0 < x < 2**(N - 1)      (1)
>>     ceil(x / d)  = floor(x * c / 2**m) + 1  for -2**(N - 1) <= x < 0    (2)
>>
>> The implementation in the original book takes into consideration that the machine may not be able to perform the full multiplication `x * c`, so the constant overflows and we need to add back the dividend as in `DivLNode::Ideal` cases. However, for int32 division, `x * c` cannot overflow an int64. As a result, it is always feasible to just calculate the product and shift the result.
>>
>> For unsigned multiplication, the situation is somewhat trickier because the condition needs to be twice as strong (the condition (1) and (2) above are mostly the same).
This results in the magic constant `c` calculated based on the method presented in Hacker's Delight by Henry S. Warren, Jr. may overflow an uintN. For int division, we can depend on the theorem devised by Arch D. Robison in N-Bit Unsigned Division Via N-Bit Multiply-Add, which states that there exists either:
>>
>>     c1 in uint32 and m1, such that floor(x / d) = floor(x * c1 / 2**m1)        for 0 < x < 2**32    (3)
>>     c2 in uint32 and m2, such that floor(x / d) = floor((x + 1) * c2 / 2**m2)  for 0 < x < 2**32    (4)
>>
>> which means that either `x * c1` never overflows an uint64 or `(x + 1) * c2` never overflows an uint64. And we can perform a full multiplication.
>>
>> For longs, there is no way to do a full multiplication so we do some basic transformations to achieve a computable formula. The details I have written as comments in the overflow case.
>>
>> More tests are added to cover the possible patterns.
>>
>> Please take a look and have some reviews. Thank you very much.
>
> Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision:
>
>   power of 2

I'm not reviewing the patch itself, but I'd like to request some tweaks to the include blocks in the HotSpot code.

src/hotspot/share/opto/divconstants.cpp line 27:

> 25: #include "precompiled.hpp"
> 26: #include "utilities/powerOfTwo.hpp"
> 27: #include

Please add a blank line between the HotSpot includes and the system includes.

src/hotspot/share/opto/divnode.cpp line 27:

> 25: #include "precompiled.hpp"
> 26: #include
> 27: #include

These includes should be moved.

src/hotspot/share/opto/divnode.cpp line 42:

> 40: #include "utilities/powerOfTwo.hpp"
> 41:
> 42:

Revert this stray addition of a blank line.

test/hotspot/gtest/opto/test_constant_division.cpp line 29:

> 27: #include "runtime/os.hpp"
> 28: #include "utilities/growableArray.hpp"
> 29: #include

Move include.
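
For readers following the quoted description, conditions (1) and (3) can be tried out on a concrete divisor. The sketch below is only an illustration and is not taken from the patch: it assumes `d = 3` with the widely published constants `c = 0x55555556, m = 32` for the signed case and `c1 = 0xAAAAAAAB, m1 = 33` for the unsigned case, and the class and method names are made up for this example.

```java
// Illustrative sketch only; not code from the pull request under review.
// Constants are the commonly published magic numbers for d = 3 (an assumption here).
final class MagicDivisionSketch {
    // Signed int32 case: the 64-bit product x * c cannot overflow, so we just
    // multiply, shift, and add 1 for negative x as in condition (2).
    static int divideBy3(int x) {
        long product = (long) x * 0x55555556L; // full signed 64-bit product
        int q = (int) (product >> 32);         // floor(x * c / 2**32)
        return q + (x >>> 31);                 // +1 only when x is negative
    }

    // Unsigned uint32 case in the shape of condition (3): x * c1 fits in an
    // unsigned 64-bit value, so a logical shift recovers the quotient.
    static int divideUnsignedBy3(int x) {
        long product = (x & 0xFFFFFFFFL) * 0xAAAAAAABL; // treat x as uint32
        return (int) (product >>> 33);                  // floor(x * c1 / 2**33)
    }

    public static void main(String[] args) {
        int[] samples = {0, 1, 2, 3, 7, -1, -3, -7, Integer.MAX_VALUE, Integer.MIN_VALUE};
        for (int x : samples) {
            if (divideBy3(x) != x / 3) {
                throw new AssertionError("signed mismatch at " + x);
            }
            if (divideUnsignedBy3(x) != Integer.divideUnsigned(x, 3)) {
                throw new AssertionError("unsigned mismatch at " + x);
            }
        }
        System.out.println("all samples match");
    }
}
```

Divisors whose magic constant does not fit in 32 bits need the add-back or `(x + 1)` variants described in the quoted text; this sketch does not cover those.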
------------- PR Review: https://git.openjdk.org/jdk/pull/9947#pullrequestreview-1800139023 PR Review Comment: https://git.openjdk.org/jdk/pull/9947#discussion_r1439270103 PR Review Comment: https://git.openjdk.org/jdk/pull/9947#discussion_r1439270557 PR Review Comment: https://git.openjdk.org/jdk/pull/9947#discussion_r1439270384 PR Review Comment: https://git.openjdk.org/jdk/pull/9947#discussion_r1439271034 From fyang at openjdk.org Tue Jan 2 10:59:46 2024 From: fyang at openjdk.org (Fei Yang) Date: Tue, 2 Jan 2024 10:59:46 GMT Subject: RFR: 8322790: RISC-V: Tune costs for shuffles with no conversion In-Reply-To: References: Message-ID: On Sat, 30 Dec 2023 20:07:13 GMT, Ilya Gavrilin wrote: > Hi all, please review this small change to RISC-V nodes insertion costs. > Now we have several nodes which provide shuffles without conversion: https://github.com/openjdk/jdk/blob/32d80e2caf6063b58128bd5f3dc87b276f3bd0cb/src/hotspot/cpu/riscv/riscv.ad#L8525-L8741 > On most RISC-V cpu`s we prefer reg<->reg operations, because they are faster, but now stack<->reg operations used (for details about reasons, please, visit connected jbs issue). 
> After changing insertion costs reg<->reg operations selected, and we can see performance improvements for benchmarks, which use such shuffles (tested on thead C910 board):
> | Benchmark                           | Upstream build (ops/ms) | Patched build (ops/ms) | difference (%) |
> |:-----------------------------------:|:-----------------------:|:----------------------:|:--------------:|
> | MathBench.doubleToRawLongBitsDouble | 30935.139               | 32171.761              | +4.00          |
> | StrictMathBench.ceilDouble          | 24682.810               | 29782.050              | +20.66         |
> | StrictMathBench.cosDouble           | 6948.309                | 6938.276               | -0.14          |
> | StrictMathBench.expDouble           | 6816.143                | 7211.021               | +5.79          |
> | StrictMathBench.floorDouble         | 30699.630               | 34189.509              | +11.37         |
> | StrictMathBench.maxDouble           | 35157.355               | 34675.191              | -1.37          |
> | StrictMathBench.minDouble           | 35192.135               | 35183.015              | -0.03          |
> | StrictMathBench.sinDouble           | 6698.405                | 6721.809               | +0.35          |
>
> New benchmark for changed nodes:
>
> --- a/test/micro/org/openjdk/bench/java/lang/MathBench.java
> +++ b/test/micro/org/openjdk/bench/java/lang/MathBench.java
> @@ -540,4 +540,11 @@ public class MathBench {
>          return Math.ulp(float7);
>      }
>
> +    @Benchmark
> +    public long doubleToRawLongBitsDouble() {
> +        double dbl162Dot5 = double81 * 2.0d + double0Dot5;
> +        double dbl3 = double2 + double1;
> +        return Double.doubleToRawLongBits(dbl162Dot5) + Double.doubleToRawLongBits(dbl3);
> +    }
> +

src/hotspot/cpu/riscv/riscv.ad line 8534:

> 8532:   effect(DEF dst, USE src);
> 8533:
> 8534:   ins_cost(ALU_COST + LOAD_COST);

Adding an extra cost of `ALU_COST` for these load/store nodes looks a bit strange to me. I suppose only lowering the cost for those fmv.x.w/fmv.w.x/fmv.x.d/fmv.d.x nodes will do?
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17206#discussion_r1439341249 From vkempik at openjdk.org Tue Jan 2 10:59:47 2024 From: vkempik at openjdk.org (Vladimir Kempik) Date: Tue, 2 Jan 2024 10:59:47 GMT Subject: RFR: 8322790: RISC-V: Tune costs for shuffles with no conversion In-Reply-To: References: Message-ID: On Tue, 2 Jan 2024 10:55:23 GMT, Fei Yang wrote:

>> Hi all, please review this small change to RISC-V nodes insertion costs.
>> Now we have several nodes which provide shuffles without conversion: https://github.com/openjdk/jdk/blob/32d80e2caf6063b58128bd5f3dc87b276f3bd0cb/src/hotspot/cpu/riscv/riscv.ad#L8525-L8741
>> On most RISC-V cpu`s we prefer reg<->reg operations, because they are faster, but now stack<->reg operations used (for details about reasons, please, visit connected jbs issue).
>> After changing insertion costs reg<->reg operations selected, and we can see performance improvements for benchmarks, which use such shuffles (tested on thead C910 board):
>> | Benchmark                           | Upstream build (ops/ms) | Patched build (ops/ms) | difference (%) |
>> |:-----------------------------------:|:-----------------------:|:----------------------:|:--------------:|
>> | MathBench.doubleToRawLongBitsDouble | 30935.139               | 32171.761              | +4.00          |
>> | StrictMathBench.ceilDouble          | 24682.810               | 29782.050              | +20.66         |
>> | StrictMathBench.cosDouble           | 6948.309                | 6938.276               | -0.14          |
>> | StrictMathBench.expDouble           | 6816.143                | 7211.021               | +5.79          |
>> | StrictMathBench.floorDouble         | 30699.630               | 34189.509              | +11.37         |
>> | StrictMathBench.maxDouble           | 35157.355               | 34675.191              | -1.37          |
>> | StrictMathBench.minDouble           | 35192.135               | 35183.015              | -0.03          |
>> | StrictMathBench.sinDouble           | 6698.405                | 6721.809               | +0.35          |
>>
>> New benchmark for changed nodes:
>>
>> --- a/test/micro/org/openjdk/bench/java/lang/MathBench.java
>> +++ b/test/micro/org/openjdk/bench/java/lang/MathBench.java
>> @@ -540,4 +540,11 @@ public class MathBench {
>>          return Math.ulp(float7);
>>      }
>>
>> +    @Benchmark
>> +    public long doubleToRawLongBitsDouble() {
>> +        double dbl162Dot5 = double81 * 2.0d + double0Dot5;
>> +        double dbl3 = double2 + double1;
>> +        return Double.doubleToRawLongBits(dbl162Dot5) + Double.doubleToRawLongBits(dbl3);
>> +    }
>> +
>
> src/hotspot/cpu/riscv/riscv.ad line 8534:
>
>> 8532:   effect(DEF dst, USE src);
>> 8533:
>> 8534:   ins_cost(ALU_COST + LOAD_COST);
>
> Adding an extra cost of `ALU_COST` for these load/store nodes looks a bit strange to me. I suppose only lowering the cost for those fmv.x.w/fmv.w.x/fmv.x.d/fmv.d.x nodes will do?

those nodes need to go below 100 which then starts looking ugly

------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17206#discussion_r1439342747 From davleopo at openjdk.org Tue Jan 2 13:37:19 2024 From: davleopo at openjdk.org (David Leopoldseder) Date: Tue, 2 Jan 2024 13:37:19 GMT Subject: RFR: 8322636: [JVMCI] HotSpotSpeculationLog can be inconsistent across a single compile [v2] In-Reply-To: References: Message-ID: > This PR fixes a subtle inconsistency in `HotSpotSpeculationLog` . > > Normal uses of `HotSpotSpeculationLog` work by using a `SpeculationReason` and asking the speculation log via `maySpeculate` if the speculation can be performed, i.e., if it failed before for the given method. An example for this can be seen in Graal https://github.com/oracle/graal/blob/master/compiler/src/jdk.graal.compiler/src/jdk/graal/compiler/nodes/loop/CountedLoopInfo.java#L591C15-L591C15 > The implicit assumption is that the speculation log, `HotSpotSpeculationLog` in particular collects failed speculations at the beginning of a compile and then stays consistent during the compile. Why is that? - Because if there are new failed speculations added to the failed speculations during the compile - the compiler would speculate again on those in an inconsistent way. E.g.
at the beginning of a compile a certain speculation has not failed yet and the compiler thinks it can do optimization xyz using a speculation - later during the compilation process it consults the speculation log but gets a different answer. All those inconsistent speculations that already failed will anyway later fail code installation in jvmci (they will throw a bailout during `HotSpotCodeCacheProvider#installCode` https://github.com/openjdk/jdk/blob/master/src/jdk.internal.vm.ci/share/classes/jdk/vm/ci/hotspot/HotSpotSpeculationLog.java#L192 ). Thus, we should at least return a consistent result during a compile. > The problem for consistency here, that also makes troubles on the graal side, is that `maySpeculate` itself can collect failed speculations if there have not been any previously, i.e., `failedSpeculations == null`. > In order to make the speculation log consistent across an entire JVMCI compile this PR removes the collection of failed speculations in `maySpeculate`. David Leopoldseder has updated the pull request incrementally with one additional commit since the last revision: 8322636: [JVMCI] HotSpotSpeculationLog add javadoc to maySpeculate ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17183/files - new: https://git.openjdk.org/jdk/pull/17183/files/810e42ad..ef026267 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17183&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17183&range=00-01 Stats: 6 lines in 1 file changed: 6 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/17183.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17183/head:pull/17183 PR: https://git.openjdk.org/jdk/pull/17183 From davleopo at openjdk.org Tue Jan 2 13:37:19 2024 From: davleopo at openjdk.org (David Leopoldseder) Date: Tue, 2 Jan 2024 13:37:19 GMT Subject: RFR: 8322636: [JVMCI] HotSpotSpeculationLog can be inconsistent across a single compile [v2] In-Reply-To: References: Message-ID: 
<6t_FVenXE1jnRPPqWfNGakr8O-SvV7urhzgUdodieU4=.221b8912-0de0-4f17-875b-a778429310ba@github.com> On Sat, 23 Dec 2023 04:17:24 GMT, Doug Simon wrote: > I think it's worth updating the javadoc for maySpeculate to clarify that it returns consistent results for any given speculation for the lifetime of a SpeculationLog object. @dougxc done - please check ------------- PR Comment: https://git.openjdk.org/jdk/pull/17183#issuecomment-1874031598 From never at openjdk.org Tue Jan 2 18:48:49 2024 From: never at openjdk.org (Tom Rodriguez) Date: Tue, 2 Jan 2024 18:48:49 GMT Subject: RFR: 8322636: [JVMCI] HotSpotSpeculationLog can be inconsistent across a single compile [v2] In-Reply-To: References: Message-ID: On Tue, 2 Jan 2024 13:37:19 GMT, David Leopoldseder wrote: >> This PR fixes a subtle inconsistency in `HotSpotSpeculationLog` . >> >> Normal uses of `HotSpotSpeculationLog` work by using a `SpeculationReason` and asking the speculation log via `maySpeculate` if the speculation can be performed, i.e., if it failed before for the given method. An example for this can be seen in Graal https://github.com/oracle/graal/blob/master/compiler/src/jdk.graal.compiler/src/jdk/graal/compiler/nodes/loop/CountedLoopInfo.java#L591C15-L591C15 >> The implicit assumption is that the speculation log, `HotSpotSpeculationLog` in particular collects failed speculations at the beginning of a compile and then stays consistent during the compile. Why is that? - Because if there are new failed speculations added to the failed speculations during the compile - the compiler would speculate again on those in an inconsistent way. E.g. at the beginning of a compile a certain speculation has not failed yet and the compiler thinks it can do optimization xyz using a speculation - later during the compilation process it consults the speculation log but gets a different answer. 
All those inconsistent speculations that already failed will anyway later fail code installation in jvmci (they will throw a bailout during `HotSpotCodeCacheProvider#installCode` https://github.com/openjdk/jdk/blob/master/src/jdk.internal.vm.ci/share/classes/jdk/vm/ci/hotspot/HotSpotSpeculationLog.java#L192 ). Thus, we should at least return a consistent result during a compile. >> The problem for consistency here, that also makes troubles on the graal side, is that `maySpeculate` itself can collect failed speculations if there have not been any previously, i.e., `failedSpeculations == null`. >> In order to make the speculation log consistent across an entire JVMCI compile this PR removes the collection of failed speculations in `maySpeculate`. > > David Leopoldseder has updated the pull request incrementally with one additional commit since the last revision: > > 8322636: [JVMCI] HotSpotSpeculationLog add javadoc to maySpeculate More specifically it only validates against the speculations that failed before the last call to collectFailedSpeculations which must always be called explicitly. And we should point out somewhere that installCode will call collectFailedSpeculations before installation and revalidate the current set of speculations, bailing out if any were violated during compilation. This doesn't seem to be documented anywhere. ------------- PR Comment: https://git.openjdk.org/jdk/pull/17183#issuecomment-1874408643 From kvn at openjdk.org Tue Jan 2 20:08:37 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 2 Jan 2024 20:08:37 GMT Subject: RFR: 8322758: Eliminate -Wparentheses warnings in C2 code In-Reply-To: References: Message-ID: On Fri, 29 Dec 2023 02:01:08 GMT, Kim Barrett wrote: > Please review this change to eliminate some -Wparentheses warnings. In most > cases, this involved simply adding a few parentheses to make some implicit > operator precedence explicit.
> > In PhaseIdealLoop::rc_predicate, I also added a comment describing the test > being performed, since it didn't seem obvious even with the additional > parentheses. > > Testing: mach5 tier1 > > Also ran mach5 tier1 with these changes in conjunction enabling -Wparentheses > and other changes needed to make that work. Looks good to me. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/17199#pullrequestreview-1800947505 From kvn at openjdk.org Tue Jan 2 20:16:47 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 2 Jan 2024 20:16:47 GMT Subject: RFR: 8322759: Eliminate -Wparentheses warnings in compiler code In-Reply-To: <496tGkQ1KUCrW1IHOETyvhqopkNYjsEoupxjo0Ze3Wg=.0223f494-fd34-4e4a-a31a-5030603f2113@github.com> References: <496tGkQ1KUCrW1IHOETyvhqopkNYjsEoupxjo0Ze3Wg=.0223f494-fd34-4e4a-a31a-5030603f2113@github.com> Message-ID: <2WJkEZqCHKmE27ORwdudo3QC0JLzBxShw6HBBJ8k2qE=.4f172823-b930-418a-924d-578342d2c991@github.com> On Fri, 29 Dec 2023 03:33:11 GMT, Kim Barrett wrote: > Please review this change to eliminate some -Wparentheses warnings. This > involved simply adding a few parentheses to make some implicit operator > precedence explicit. > > This change addresses non-C2 parts of the compiler component. > > Testing: mach5 tier1 > > Also ran mach5 tier1 with these changes in conjunction enabling -Wparentheses > and other changes needed to make that work. Looks good. src/hotspot/share/compiler/compilerDefinitions.inline.hpp line 2: > 1: /* > 2: * Copyright (c) 2016, 2023, Oracle and/or its affiliates. All rights reserved. 2024 ------------- Marked as reviewed by kvn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/17200#pullrequestreview-1800955588 PR Review Comment: https://git.openjdk.org/jdk/pull/17200#discussion_r1439783672 From kvn at openjdk.org Tue Jan 2 20:19:38 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 2 Jan 2024 20:19:38 GMT Subject: RFR: 8322781: C1: Debug build crash in GraphBuilder::vmap() when print stats In-Reply-To: References: Message-ID: On Fri, 29 Dec 2023 15:02:21 GMT, Denghui Dong wrote: > Hi, > > Could I have a review of this fix patch that fixes a crash problem in the debug build when -XX:+PrintValueNumbering -XX:+Verbose -XX:-UseLocalValueNumbering > > Thanks Looks good. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/17205#pullrequestreview-1800958430 From kvn at openjdk.org Tue Jan 2 20:19:39 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 2 Jan 2024 20:19:39 GMT Subject: RFR: 8322779: C1: Remove the unused counter 'totalInstructionNodes' In-Reply-To: References: Message-ID: <0CPSYAgq79WDpVp9zYhNzExp-5jafLmEdLaD-tAXBNA=.a0e2eaac-ea7b-41b1-adf9-4caf3c7d2298@github.com> On Fri, 29 Dec 2023 14:30:59 GMT, Denghui Dong wrote: > Hi, > > Could I have a review of this small cleanup patch that removes the unused counter 'totalInstructionNodes'. JDK-8058968 refactored the Compiler time traces and deleted the only place that read the counter. > > Thanks Good. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/17204#pullrequestreview-1800958993 From kbarrett at openjdk.org Tue Jan 2 22:27:01 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Tue, 2 Jan 2024 22:27:01 GMT Subject: RFR: 8322758: Eliminate -Wparentheses warnings in C2 code [v2] In-Reply-To: References: Message-ID: > Please review this change to eliminate some -Wparentheses warnings. In most > cases, this involved simply adding a few parentheses to make some implicit > operator precedence explicit. 
> > In PhaseIdealLoop::rc_predicate, I also added a comment describing the test > being performed, since it didn't seem obvious even with the additional > parentheses. > > Testing: mach5 tier1 > > Also ran mach5 tier1 with these changes in conjunction enabling -Wparentheses > and other changes needed to make that work. Kim Barrett has updated the pull request incrementally with one additional commit since the last revision: update copyrights for 2024 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17199/files - new: https://git.openjdk.org/jdk/pull/17199/files/8acc005e..abacbe0e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17199&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17199&range=00-01 Stats: 9 lines in 9 files changed: 0 ins; 0 del; 9 mod Patch: https://git.openjdk.org/jdk/pull/17199.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17199/head:pull/17199 PR: https://git.openjdk.org/jdk/pull/17199 From kbarrett at openjdk.org Tue Jan 2 22:27:02 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Tue, 2 Jan 2024 22:27:02 GMT Subject: RFR: 8322758: Eliminate -Wparentheses warnings in C2 code [v2] In-Reply-To: References: Message-ID: <6dFRm7UiXi5ef2W0MRLvZ3wT20zYPMBGCWd2c_OXDdM=.7603d68e-c2bc-4372-b78a-a1e4c43cb37b@github.com> On Fri, 29 Dec 2023 18:21:26 GMT, Andrew Haley wrote: >> Kim Barrett has updated the pull request incrementally with one additional commit since the last revision: >> >> update copyrights for 2024 > > Marked as reviewed by aph (Reviewer). Thanks for reviews @theRealAph and @vnkozlov . > src/hotspot/share/opto/loopPredicate.cpp line 801: > >> 799: const TypeInt* idx_type = TypeInt::INT; >> 800: // same signs and upper, or different signs and not upper. >> 801: if (((stride > 0) == (scale > 0)) == upper) { > > This is rather l33t code, but I guess it's OK with the comment. 
This
> Suggestion:
>
>     _Bool same_signs = (stride > 0) == (scale > 0);
>     if ((same_signs & upper)
>         || (!same_signs && !upper)) {
>
> generates slightly more code with GCC -O2. I'd be happy with either.

I agree it's a little odd, but I don't feel strongly about it, so leaving it as is.

------------- PR Comment: https://git.openjdk.org/jdk/pull/17199#issuecomment-1874637742 PR Review Comment: https://git.openjdk.org/jdk/pull/17199#discussion_r1439914558 From kbarrett at openjdk.org Tue Jan 2 22:36:23 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Tue, 2 Jan 2024 22:36:23 GMT Subject: RFR: 8322758: Eliminate -Wparentheses warnings in C2 code [v3] In-Reply-To: References: Message-ID:

> Please review this change to eliminate some -Wparentheses warnings. In most
> cases, this involved simply adding a few parentheses to make some implicit
> operator precedence explicit.
>
> In PhaseIdealLoop::rc_predicate, I also added a comment describing the test
> being performed, since it didn't seem obvious even with the additional
> parentheses.
>
> Testing: mach5 tier1
>
> Also ran mach5 tier1 with these changes in conjunction enabling -Wparentheses
> and other changes needed to make that work.

Kim Barrett has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains three additional commits since the last revision: - Merge branch 'master' into c2-wparentheses - update copyrights for 2024 - fix -Wparentheses warnings in C2 code ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17199/files - new: https://git.openjdk.org/jdk/pull/17199/files/abacbe0e..2ad3798d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17199&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17199&range=01-02 Stats: 863 lines in 58 files changed: 610 ins; 44 del; 209 mod Patch: https://git.openjdk.org/jdk/pull/17199.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17199/head:pull/17199 PR: https://git.openjdk.org/jdk/pull/17199 From kbarrett at openjdk.org Tue Jan 2 22:36:24 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Tue, 2 Jan 2024 22:36:24 GMT Subject: Integrated: 8322758: Eliminate -Wparentheses warnings in C2 code In-Reply-To: References: Message-ID: On Fri, 29 Dec 2023 02:01:08 GMT, Kim Barrett wrote: > Please review this change to eliminate some -Wparentheses warnings. In most > cases, this involved simply adding a few parentheses to make some implicit > operator precedence explicit. > > In PhaseIdealLoop::rc_predicate, I also added a comment describing the test > being performed, since it didn't seem obvious even with the additional > parentheses. > > Testing: mach5 tier1 > > Also ran mach5 tier1 with these changes in conjunction enabling -Wparentheses > and other changes needed to make that work. This pull request has now been integrated. 
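
As a side note on the `rc_predicate` test mentioned in the description above: the compact condition `((stride > 0) == (scale > 0)) == upper` and the expanded form suggested during review should agree for every input. The following is a small sketch that checks this, written in Java rather than the C++ of the actual change; the class and method names are hypothetical and nothing here is HotSpot code.

```java
// Illustrative sketch only; names are hypothetical and this is not HotSpot code.
final class SameSignsPredicateSketch {
    // Compact form, mirroring the fully parenthesized condition quoted in the thread:
    // true for "same signs and upper" or "different signs and not upper".
    static boolean compact(int stride, int scale, boolean upper) {
        return ((stride > 0) == (scale > 0)) == upper;
    }

    // Expanded form, following the shape of the reviewer's suggestion.
    static boolean expanded(int stride, int scale, boolean upper) {
        boolean sameSigns = (stride > 0) == (scale > 0);
        return (sameSigns && upper) || (!sameSigns && !upper);
    }

    public static void main(String[] args) {
        int[] values = {-4, -1, 1, 5};
        for (int stride : values) {
            for (int scale : values) {
                for (boolean upper : new boolean[] {false, true}) {
                    if (compact(stride, scale, upper) != expanded(stride, scale, upper)) {
                        throw new AssertionError("forms disagree");
                    }
                }
            }
        }
        System.out.println("both forms agree on all sampled inputs");
    }
}
```

Both forms are a boolean equality (XNOR) of "same signs" and `upper`, which is why the one-line version is correct even if it reads tersely.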
Changeset: 122bc777 Author: Kim Barrett URL: https://git.openjdk.org/jdk/commit/122bc7770e1487cc754e17b9356217009bd6b13e Stats: 27 lines in 9 files changed: 2 ins; 0 del; 25 mod 8322758: Eliminate -Wparentheses warnings in C2 code Reviewed-by: aph, kvn ------------- PR: https://git.openjdk.org/jdk/pull/17199 From kbarrett at openjdk.org Wed Jan 3 00:12:47 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Wed, 3 Jan 2024 00:12:47 GMT Subject: RFR: 8322816: RISC-V: Incorrect guarantee in patch_vtype In-Reply-To: <2Dii1vRWzgzjyBGHrx_tH28SlYMBWUZ0h2mO7D5so_4=.817b5636-aa8a-4c6d-807a-29b1855a59a7@github.com> References: <2Dii1vRWzgzjyBGHrx_tH28SlYMBWUZ0h2mO7D5so_4=.817b5636-aa8a-4c6d-807a-29b1855a59a7@github.com> Message-ID: On Tue, 2 Jan 2024 08:56:08 GMT, Fei Yang wrote: >> Please review this change that fixes a test for a guarantee. This also >> removes a -Wparentheses warning when those are enabled (which is how the >> problem was discovered). >> >> The problem is that operator precedence groups the sub-expressions differently >> than intended. The fix is to override the operator precedence by adding >> parentheses to achieve the intended grouping. >> >> Testing: Local (linux-x64) cross-build for linux-riscv with this change plus >> -Wparentheses enabled and other changes to allow that to work. >> >> Requesting someone from the riscv porters to properly test this. > > src/hotspot/cpu/riscv/assembler_riscv.hpp line 1160: > >> 1158: #define patch_vtype(hsb, lsb, vlmul, vsew, vta, vma, vill) \ >> 1159: if (vill == 1) { \ >> 1160: guarantee((vlmul | vsew | vta | vma) == 0, \ > > I see the `vill` parameter is always false in current code, which means this guarantee never gets executed. And I don't think we would make use of the `vill` field of vtype in future. So I personally prefer to remove this guarantee and its enclosing if block for now. Rather than removing the guarantee, wouldn't it be better to guarantee/assert `vill == 0`?
Although looking at uses, that argument is a bool, so it should be `guarantee(!vill, ...)`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17215#discussion_r1439970844 From fyang at openjdk.org Wed Jan 3 02:01:49 2024 From: fyang at openjdk.org (Fei Yang) Date: Wed, 3 Jan 2024 02:01:49 GMT Subject: RFR: 8322816: RISC-V: Incorrect guarantee in patch_vtype In-Reply-To: References: <2Dii1vRWzgzjyBGHrx_tH28SlYMBWUZ0h2mO7D5so_4=.817b5636-aa8a-4c6d-807a-29b1855a59a7@github.com> Message-ID: On Wed, 3 Jan 2024 00:10:25 GMT, Kim Barrett wrote: >> src/hotspot/cpu/riscv/assembler_riscv.hpp line 1160: >> >>> 1158: #define patch_vtype(hsb, lsb, vlmul, vsew, vta, vma, vill) \ >>> 1159: if (vill == 1) { \ >>> 1160: guarantee((vlmul | vsew | vta | vma) == 0, \ >> >> I see the `vill` parameter is always false in current code, which means this guarantee never gets executed. And I don't think we would make use of the `vill` field of vtype in future. So I personally prefer to remove this guarantee and its enclosing if block for now. > > Rather than removing the guarantee, wouldn't it be better to guarantee/assert `vill == 0`? > Although looking at uses, that argument is a bool, so it should be `guarantee(!vill, ...)`. Hi, Yes, that's better. Maybe: `guarantee(!vill, "should be");` ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17215#discussion_r1440005337 From kbarrett at openjdk.org Wed Jan 3 05:15:55 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Wed, 3 Jan 2024 05:15:55 GMT Subject: RFR: 8322879: Eliminate -Wparentheses warnings in x86-32 code Message-ID: Please review this trivial change to eliminate a -Wparentheses warning. This involved simply adding parentheses to make the implicit operator precedence explicit. Testing: Locally (linux-x64) cross-compiled for linux-x86. Also ran GHA with -Wparentheses enabled along with this and other changes needed to make that work.
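For readers following the -Wparentheses threads, the grouping problem these changes address can be sketched in a few lines of C++. This is only an illustration under stated assumptions, not HotSpot code: the function names `as_parsed_without_parens` and `intended` are hypothetical, and the unparenthesized shape `vlmul | vsew | vta | vma == 0` is assumed for the sake of the example, since the pre-fix expression is not quoted in this thread.

```cpp
#include <cassert>

// Hypothetical helper: how a C++ compiler groups the unparenthesized
// expression `vlmul | vsew | vta | vma == 0`. Because '==' binds tighter
// than '|', only vma is compared against zero.
bool as_parsed_without_parens(int vlmul, int vsew, int vta, int vma) {
  return (vlmul | vsew | vta | (vma == 0)) != 0;
}

// Hypothetical helper: the grouping with explicit parentheses, which is
// true only when all four fields are zero.
bool intended(int vlmul, int vsew, int vta, int vma) {
  return (vlmul | vsew | vta | vma) == 0;
}
```

With `vlmul = 1` and the other fields zero, `intended` returns false while `as_parsed_without_parens` returns true, so a check written without the parentheses would silently accept a value it was meant to reject.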
------------- Commit messages: - fix -Wparentheses warnings in x86-32 code Changes: https://git.openjdk.org/jdk/pull/17237/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=17237&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8322879 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/17237.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17237/head:pull/17237 PR: https://git.openjdk.org/jdk/pull/17237 From fyang at openjdk.org Wed Jan 3 05:24:46 2024 From: fyang at openjdk.org (Fei Yang) Date: Wed, 3 Jan 2024 05:24:46 GMT Subject: RFR: 8322790: RISC-V: Tune costs for shuffles with no conversion In-Reply-To: References: Message-ID: On Tue, 2 Jan 2024 10:57:22 GMT, Vladimir Kempik wrote: >> src/hotspot/cpu/riscv/riscv.ad line 8534: >> >>> 8532: effect(DEF dst, USE src); >>> 8533: >>> 8534: ins_cost(ALU_COST + LOAD_COST); >> >> Adding an extra cost of `ALU_COST` for these load/store nodes looks a bit strange to me. I suppose only lowering the cost for those fmv.x.w/fmv.w.x/fmv.x.d/fmv.d.x nodes will do? > > those nodes need to go below 100 which then starts looking ugly Seems that the performance gain is still there (tested on lichee-pi-4a board) when reverting part of the changes. I haven't checked the JIT code though. 
Try this addon change: [addon-change.diff.txt](https://github.com/openjdk/jdk/files/13815870/addon-change.diff.txt) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17206#discussion_r1440083334 From thartmann at openjdk.org Wed Jan 3 06:41:46 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 3 Jan 2024 06:41:46 GMT Subject: RFR: 8322781: C1: Debug build crash in GraphBuilder::vmap() when print stats In-Reply-To: References: Message-ID: On Fri, 29 Dec 2023 15:02:21 GMT, Denghui Dong wrote: > Hi, > > Could I have a review of this fix patch that fixes a crash problem in the debug build when -XX:+PrintValueNumbering -XX:+Verbose -XX:-UseLocalValueNumbering > > Thanks Please add a test case to `test/hotspot/jtreg/compiler/arguments/TestC1Globals.java`. Thanks! ------------- Changes requested by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/17205#pullrequestreview-1801440815 From kvn at openjdk.org Wed Jan 3 07:21:37 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 3 Jan 2024 07:21:37 GMT Subject: RFR: 8322879: Eliminate -Wparentheses warnings in x86-32 code In-Reply-To: References: Message-ID: On Wed, 3 Jan 2024 05:10:50 GMT, Kim Barrett wrote: > Please review this trivial change to eliminate a -Wparentheses warning. > This involved simply adding parentheses to make the implicit operator > precedence explicit. > > Testing: Locally (linux-x64) cross-compiled for linux-x86. Also ran GHA with > -Wparentheses enabled along with this and other changes needed to make that > work. Trivial. ------------- Marked as reviewed by kvn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/17237#pullrequestreview-1801480220 From ddong at openjdk.org Wed Jan 3 07:34:00 2024 From: ddong at openjdk.org (Denghui Dong) Date: Wed, 3 Jan 2024 07:34:00 GMT Subject: RFR: 8322781: C1: Debug build crash in GraphBuilder::vmap() when print stats [v2] In-Reply-To: References: Message-ID: <6uLy4L6t2o_KFfe5CNlXg8boNYERM9hryaHCTGou16I=.4988281c-10b7-4e08-8fc2-abe14ce4938d@github.com> > Hi, > > Could I have a review of this fix patch that fixes a crash problem in the debug build when -XX:+PrintValueNumbering -XX:+Verbose -XX:-UseLocalValueNumbering > > Thanks Denghui Dong has updated the pull request incrementally with one additional commit since the last revision: add test ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17205/files - new: https://git.openjdk.org/jdk/pull/17205/files/49f90f41..3d5280ce Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17205&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17205&range=00-01 Stats: 11 lines in 1 file changed: 11 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/17205.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17205/head:pull/17205 PR: https://git.openjdk.org/jdk/pull/17205 From thartmann at openjdk.org Wed Jan 3 07:44:46 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 3 Jan 2024 07:44:46 GMT Subject: RFR: 8322779: C1: Remove the unused counter 'totalInstructionNodes' In-Reply-To: References: Message-ID: On Fri, 29 Dec 2023 14:30:59 GMT, Denghui Dong wrote: > Hi, > > Could I have a review of this small cleanup patch that removes the unused counter 'totalInstructionNodes'. JDK-8058968 refactored the Compiler time traces and deleted the only place that read the counter. > > Thanks Looks good. ------------- Marked as reviewed by thartmann (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/17204#pullrequestreview-1801501711 From ddong at openjdk.org Wed Jan 3 07:50:01 2024 From: ddong at openjdk.org (Denghui Dong) Date: Wed, 3 Jan 2024 07:50:01 GMT Subject: RFR: 8322781: C1: Debug build crash in GraphBuilder::vmap() when print stats [v3] In-Reply-To: References: Message-ID: > Hi, > > Could I have a review of this fix patch that fixes a crash problem in the debug build when -XX:+PrintValueNumbering -XX:+Verbose -XX:-UseLocalValueNumbering > > Thanks Denghui Dong has updated the pull request incrementally with two additional commits since the last revision: - update - update ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17205/files - new: https://git.openjdk.org/jdk/pull/17205/files/3d5280ce..3408bc02 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17205&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17205&range=01-02 Stats: 3 lines in 1 file changed: 1 ins; 1 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/17205.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17205/head:pull/17205 PR: https://git.openjdk.org/jdk/pull/17205 From ddong at openjdk.org Wed Jan 3 07:53:47 2024 From: ddong at openjdk.org (Denghui Dong) Date: Wed, 3 Jan 2024 07:53:47 GMT Subject: RFR: 8322781: C1: Debug build crash in GraphBuilder::vmap() when print stats [v3] In-Reply-To: References: Message-ID: On Wed, 3 Jan 2024 06:38:46 GMT, Tobias Hartmann wrote: > Please add a test case to `test/hotspot/jtreg/compiler/arguments/TestC1Globals.java`. Thanks! Added and verified in my Linux env. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/17205#issuecomment-1874966579 From davleopo at openjdk.org Wed Jan 3 08:52:48 2024 From: davleopo at openjdk.org (David Leopoldseder) Date: Wed, 3 Jan 2024 08:52:48 GMT Subject: RFR: 8322636: [JVMCI] HotSpotSpeculationLog can be inconsistent across a single compile [v2] In-Reply-To: References: Message-ID: On Tue, 2 Jan 2024 18:45:41 GMT, Tom Rodriguez wrote: >> David Leopoldseder has updated the pull request incrementally with one additional commit since the last revision: >> >> 8322636: [JVMCI] HotSpotSpeculationLog add javadoc to maySpeculate > > More specifically it only validates against the speculations that failed before the last call to collectFailedSpeculations which must always be called explicitly. And we should point out somewhere that installCode will call collectFailedSpeculations before installation and revalidate the current set of speculations, bailing out if any were violated during compilation. This doesn't seem to be documented anywhere. @tkrodriguez where do you want to put it? I'd suggest adding some additional javadoc to maySpeculate so we end up with something like /** * @return {@code true} if the given speculation can be performed, i.e., it never failed so far, otherwise * return {@code false}. Note, that this method returns consistent results for any given speculation for the * entire lifetime of the enclosing SpeculationLog object. This means that speculations failed during a * compilation will not be updated. Validation of speculations only considers those failed since the last * call to {@link #collectFailedSpeculations()}. * * Users of {@link SpeculationLog} must explicitly call {@link #collectFailedSpeculations()} to collect * failed speculations. This should be done before starting a compile. * * Code installation performs a revalidation of the current set of speculations. If this fails, i.e.
since the * start of the compile new speculations failed, the compilation is aborted with a bailout. This is done in * {@link #getFlattenedSpeculations(boolean)}. */ ------------- PR Comment: https://git.openjdk.org/jdk/pull/17183#issuecomment-1875022576 From epeter at openjdk.org Wed Jan 3 09:01:31 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 3 Jan 2024 09:01:31 GMT Subject: RFR: 8310190: C2 SuperWord: AlignVector is broken, generates misaligned packs [v58] In-Reply-To: References: Message-ID: > I want to push this in JDK23. > After this fix here, I'm doing [this refactoring](https://github.com/openjdk/jdk/pull/16620). > > To calm your nerves: most of the changes are in auto-generated tests, and tests in general. > > **Context** > > `-XX:+AlignVector` ensures that SuperWord only creates LoadVector and StoreVector that can be memory aligned. This is achieved by iterating in the pre-loop until we reach the alignment boundary, then we can start the main loop properly aligned. However, this is not possible in all cases, sometimes some memory accesses cannot be guaranteed to be aligned, and we need to reject vectorization (at least partially, for some of the packs). > > Alignment is split into two tasks: > - Alignment Correctness Checks: only relevant if `-XX:+AlignVector`. Need to reject vectorization if alignment is not possible. We must check if the address of the vector load/store is aligned with (divisible by) `ObjectAlignmentInBytes`. > - Alignment by adjusting pre-loop limit: alignment is desirable even if `-XX:-AlignVector`. We would like to align the vectors with their vector width. > > **Problem** > > I have recently found a bug with our AlignVector [JDK-8310190](https://bugs.openjdk.org/browse/JDK-8310190). > In that bug, we perform a misaligned memory vector access, which results in a `SIGBUS` on an ARM32 machine. > Thanks @fg1417 for confirming this! > Hence, we need to fix the alignment correctness checks. 
> > While working on this task, I also found some bugs in the "alignment by adjusting pre-loop limit": there were cases where it did not align the vectors correctly. > > **Problem Details** > > Reproducer: > > > static void test(short[] a, short[] b, short mask) { > for (int i = 0; i < RANGE; i+=8) { > // Problematic for AlignVector > b[i+0] = (short)(a[i+0] & mask); // best_memref, align 0 > > b[i+3] = (short)(a[i+3] & mask); // pack at offset 6 bytes > b[i+4] = (short)(a[i+4] & mask); > b[i+5] = (short)(a[i+5] & mask); > b[i+6] = (short)(a[i+6] & mask); > } > } > > > During `SuperWord::find_adjacent_refs` we used to check if the references are expected to be aligned. For that, we look at each "group" of references (eg all `LoadS`) and take the reference with the lowest offset. For that chosen reference, we check if it is alignable. If yes, we accept all references of that group, if no we reject all. > > This is problematic as shown in this example. We have references at index offset `0, 3, 4, 5, 6`, and by... Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 124 commits: - Merge branch 'JDK-8311586' of https://github.com/eme64/jdk into JDK-8311586 - Apply suggestions from code review by Christian Co-authored-by: Christian Hagedorn - fix copyright year 2024 - Merge branch 'master' into JDK-8311586 - more comments in SuperWord::adjust_pre_loop_limit_to_align_main_loop_vectors - comments about modulo positive / negative values - Apply suggestions from code review from Christian Co-authored-by: Christian Hagedorn - more small fixes by Christian - fix for yesterday's reviews by Christian - improve case analysis empty / constrained / trivial - ... 
and 114 more: https://git.openjdk.org/jdk/compare/06dd7353...d01a0cd9 ------------- Changes: https://git.openjdk.org/jdk/pull/14785/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=14785&range=57 Stats: 8883 lines in 23 files changed: 7561 ins; 363 del; 959 mod Patch: https://git.openjdk.org/jdk/pull/14785.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14785/head:pull/14785 PR: https://git.openjdk.org/jdk/pull/14785 From shade at openjdk.org Wed Jan 3 11:38:47 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 3 Jan 2024 11:38:47 GMT Subject: RFR: 8322879: Eliminate -Wparentheses warnings in x86-32 code In-Reply-To: References: Message-ID: <1UfzIj3lfDKsWO6bURA4Fz-txwAaefwzEniMHpfcnTs=.90caaa75-f975-440d-a8b6-a9f602400e99@github.com> On Wed, 3 Jan 2024 05:10:50 GMT, Kim Barrett wrote: > Please review this trivial change to eliminate a -Wparentheses warning. > This involved simply adding parentheses to make the implicit operator > precedence explicit. > > Testing: Locally (linux-x64) cross-compiled for linux-x86. Also ran GHA with > -Wparentheses enabled along with this and other changes needed to make that > work. Marked as reviewed by shade (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/17237#pullrequestreview-1801831066 From shade at openjdk.org Wed Jan 3 12:02:37 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 3 Jan 2024 12:02:37 GMT Subject: RFR: 8322781: C1: Debug build crash in GraphBuilder::vmap() when print stats [v3] In-Reply-To: References: Message-ID: On Wed, 3 Jan 2024 07:50:01 GMT, Denghui Dong wrote: >> Hi, >> >> Could I have a review of this fix patch that fixes a crash problem in the debug build when -XX:+PrintValueNumbering -XX:+Verbose -XX:-UseLocalValueNumbering >> >> Thanks > > Denghui Dong has updated the pull request incrementally with two additional commits since the last revision: > > - update > - update Looks fine. ------------- Marked as reviewed by shade (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/17205#pullrequestreview-1801869703 From shade at openjdk.org Wed Jan 3 12:12:48 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 3 Jan 2024 12:12:48 GMT Subject: RFR: 8322759: Eliminate -Wparentheses warnings in compiler code In-Reply-To: <496tGkQ1KUCrW1IHOETyvhqopkNYjsEoupxjo0Ze3Wg=.0223f494-fd34-4e4a-a31a-5030603f2113@github.com> References: <496tGkQ1KUCrW1IHOETyvhqopkNYjsEoupxjo0Ze3Wg=.0223f494-fd34-4e4a-a31a-5030603f2113@github.com> Message-ID: On Fri, 29 Dec 2023 03:33:11 GMT, Kim Barrett wrote: > Please review this change to eliminate some -Wparentheses warnings. This > involved simply adding a few parentheses to make some implicit operator > precedence explicit. > > This change addresses non-C2 parts of the compiler component. > > Testing: mach5 tier1 > > Also ran mach5 tier1 with these changes in conjunction enabling -Wparentheses > and other changes needed to make that work. src/hotspot/share/compiler/compilerDefinitions.inline.hpp line 60: > 58: > 59: inline bool CompilerConfig::is_c1_or_interpreter_only_no_jvmci() { > 60: assert((is_jvmci_compiler() && is_jvmci()) || !is_jvmci_compiler(), "JVMCI compiler implies enabled JVMCI"); This looks like simply: assert(!is_jvmci_compiler() || is_jvmci(), "JVMCI compiler implies enabled JVMCI"); src/hotspot/share/compiler/compilerDefinitions.inline.hpp line 117: > 115: // Tiered is basically C1 & (C2 | JVMCI) minus all the odd cases with restrictions. 
> 116: inline bool CompilerConfig::is_tiered() { > 117: assert((is_c1_simple_only() && is_c1_only()) || !is_c1_simple_only(), "c1 simple mode must imply c1-only mode"); Ditto, assert(!is_c1_simple_only() || is_c1_only(), "c1 simple mode must imply c1-only mode"); ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17200#discussion_r1440379521 PR Review Comment: https://git.openjdk.org/jdk/pull/17200#discussion_r1440381032 From shade at openjdk.org Wed Jan 3 12:25:49 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 3 Jan 2024 12:25:49 GMT Subject: RFR: 8322694: C1: Handle Constant and IfOp in NullCheckEliminator In-Reply-To: <1vvyuwLRjlWItKwCyighjCSM5SNbO4CSEE59hQtCU24=.b4783e52-328a-4ce3-8c92-7b736cea7546@github.com> References: <1vvyuwLRjlWItKwCyighjCSM5SNbO4CSEE59hQtCU24=.b4783e52-328a-4ce3-8c92-7b736cea7546@github.com> Message-ID: On Mon, 25 Dec 2023 15:43:52 GMT, Denghui Dong wrote: > This patch adds support for Constant and IfOp in NullCheckEliminator to eliminate more null checks. > > testing: tier1-4 in progress Nice corner case! src/hotspot/share/c1/c1_Optimizer.cpp line 888: > 886: mark_visitable(instr); > 887: if (instr->is_pinned() || instr->can_trap() || (instr->as_NullCheck() != nullptr) > 888: || (instr->as_Constant() != nullptr && instr->as_Constant()->type()->is_object())) { Is this just `instr->as_ObjectConstant() != nullptr`? src/hotspot/share/c1/c1_Optimizer.cpp line 1206: > 1204: void NullCheckEliminator::handle_Constant(Constant *x) { > 1205: ObjectType* ot = x->type()->as_ObjectType(); > 1206: if (ot && ot->is_loaded()) { Hotspot style guide insists we avoid implicit bool conversions. Check `ot != nullptr` explicitly. src/hotspot/share/c1/c1_Optimizer.cpp line 1208: > 1206: if (ot && ot->is_loaded()) { > 1207: ObjectConstant* oc = ot->as_ObjectConstant(); > 1208: if (!oc || !oc->value()->is_null_object()) { Ditto, check `oc == nullptr`.
Now, the fact that `as_ObjectConstant` returns `nullptr` means this is not an _object constant_, but some other constant, right? I think this is similar to what other places in C1 do, so while awkward, this looks okay. ------------- PR Review: https://git.openjdk.org/jdk/pull/17191#pullrequestreview-1801885674 PR Review Comment: https://git.openjdk.org/jdk/pull/17191#discussion_r1440392602 PR Review Comment: https://git.openjdk.org/jdk/pull/17191#discussion_r1440383548 PR Review Comment: https://git.openjdk.org/jdk/pull/17191#discussion_r1440391525 From ddong at openjdk.org Wed Jan 3 13:11:44 2024 From: ddong at openjdk.org (Denghui Dong) Date: Wed, 3 Jan 2024 13:11:44 GMT Subject: Integrated: 8322779: C1: Remove the unused counter 'totalInstructionNodes' In-Reply-To: References: Message-ID: On Fri, 29 Dec 2023 14:30:59 GMT, Denghui Dong wrote: > Hi, > > Could I have a review of this small cleanup patch that removes the unused counter 'totalInstructionNodes'. JDK-8058968 refactored the Compiler time traces and deleted the only place that read the counter. > > Thanks This pull request has now been integrated. 
Changeset: 539da248 Author: Denghui Dong URL: https://git.openjdk.org/jdk/commit/539da24863bc47b977ee86c584af2332426993a7 Stats: 2 lines in 1 file changed: 0 ins; 2 del; 0 mod 8322779: C1: Remove the unused counter 'totalInstructionNodes' Reviewed-by: kvn, thartmann ------------- PR: https://git.openjdk.org/jdk/pull/17204 From ddong at openjdk.org Wed Jan 3 13:37:21 2024 From: ddong at openjdk.org (Denghui Dong) Date: Wed, 3 Jan 2024 13:37:21 GMT Subject: RFR: 8322694: C1: Handle Constant and IfOp in NullCheckEliminator [v2] In-Reply-To: References: <1vvyuwLRjlWItKwCyighjCSM5SNbO4CSEE59hQtCU24=.b4783e52-328a-4ce3-8c92-7b736cea7546@github.com> Message-ID: On Wed, 3 Jan 2024 12:22:43 GMT, Aleksey Shipilev wrote: >> Denghui Dong has updated the pull request incrementally with one additional commit since the last revision: >> >> update > > src/hotspot/share/c1/c1_Optimizer.cpp line 888: > >> 886: mark_visitable(instr); >> 887: if (instr->is_pinned() || instr->can_trap() || (instr->as_NullCheck() != nullptr) >> 888: || (instr->as_Constant() != nullptr && instr->as_Constant()->type()->is_object())) { > > Is this just `instr->as_ObjectConstant() != nullptr`? Do you mean `instr->type()->as_ObjectConstant() != nullptr`? But we should include other Constants (e.g.
`ArrayConstant`, `InstanceConstant`), and those classes don't implement `as_ObjectConstant` ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17191#discussion_r1440457024 From ddong at openjdk.org Wed Jan 3 13:37:21 2024 From: ddong at openjdk.org (Denghui Dong) Date: Wed, 3 Jan 2024 13:37:21 GMT Subject: RFR: 8322694: C1: Handle Constant and IfOp in NullCheckEliminator [v2] In-Reply-To: <1vvyuwLRjlWItKwCyighjCSM5SNbO4CSEE59hQtCU24=.b4783e52-328a-4ce3-8c92-7b736cea7546@github.com> References: <1vvyuwLRjlWItKwCyighjCSM5SNbO4CSEE59hQtCU24=.b4783e52-328a-4ce3-8c92-7b736cea7546@github.com> Message-ID: > This patch adds support for Constant and IfOp in NullCheckEliminator to eliminate more null checks. > > testing: tier1-4 in progress Denghui Dong has updated the pull request incrementally with one additional commit since the last revision: update ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17191/files - new: https://git.openjdk.org/jdk/pull/17191/files/fe1f54a9..68952fcc Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17191&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17191&range=00-01 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/17191.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17191/head:pull/17191 PR: https://git.openjdk.org/jdk/pull/17191 From ddong at openjdk.org Wed Jan 3 13:41:40 2024 From: ddong at openjdk.org (Denghui Dong) Date: Wed, 3 Jan 2024 13:41:40 GMT Subject: RFR: 8322694: C1: Handle Constant and IfOp in NullCheckEliminator [v2] In-Reply-To: References: <1vvyuwLRjlWItKwCyighjCSM5SNbO4CSEE59hQtCU24=.b4783e52-328a-4ce3-8c92-7b736cea7546@github.com> Message-ID: On Wed, 3 Jan 2024 12:12:24 GMT, Aleksey Shipilev wrote: >> Denghui Dong has updated the pull request incrementally with one additional commit since the last revision: >> >> update > > src/hotspot/share/c1/c1_Optimizer.cpp line 1206: > >> 1204: void
NullCheckEliminator::handle_Constant(Constant *x) { >> 1205: ObjectType* ot = x->type()->as_ObjectType(); >> 1206: if (ot && ot->is_loaded()) { > > Hotspot style guide insists we avoid implicit bool conversions. Check `ot != nullptr` explicitly. fixed. > src/hotspot/share/c1/c1_Optimizer.cpp line 1208: > >> 1206: if (ot && ot->is_loaded()) { >> 1207: ObjectConstant* oc = ot->as_ObjectConstant(); >> 1208: if (!oc || !oc->value()->is_null_object()) { > > Ditto, check `oc == nullptr`. > > Now, the fact that `as_ObjectConstant` returns `nullptr` means this is not an _object constant_, but some other constant, right? I think this is similar to what other places in C1 do, so while awkward, this looks okay. fixed. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17191#discussion_r1440462233 PR Review Comment: https://git.openjdk.org/jdk/pull/17191#discussion_r1440462604 From roland at openjdk.org Wed Jan 3 14:11:50 2024 From: roland at openjdk.org (Roland Westrelin) Date: Wed, 3 Jan 2024 14:11:50 GMT Subject: RFR: 8319793: C2 compilation fails with "Bad graph detected in build_loop_late" after JDK-8279888 [v8] In-Reply-To: References: Message-ID: On Fri, 15 Dec 2023 14:32:57 GMT, Roland Westrelin wrote: >> Range check smearing and range check predication make an array access >> dependent on 2 (or more in the case of RC smearing) conditions. As a >> consequence, if a range check can be eliminated because there's an >> identical dominating range check, the control dependent nodes that >> could float and become dependent on the dominating range check cannot >> be allowed to float because there's a risk that they would then bypass >> one of the checks that make the access legal. >> >> `IfNode::dominated_by()` and `PhaseIdealLoop::dominated_by()` have >> logic to prevent this: nodes that are control dependent on a range >> check or predicate are not allowed to float. This is however not >> sufficient as demonstrated by the test cases. 
>> >> In `TestArrayAccessAboveRCAfterSmearingOrPredication.testRangeCheckSmearing()`: >> >> >> v += array[i]; >> if (flag2) { >> if (flag3) { >> field = 0x42; >> } >> } >> if (flagField == 1) { >> v += array[i]; >> } >> >> >> The range check for the second `array[i]` load is replaced by the >> dominating range check for the first `array[i]` but because the second >> `array[i]` load could really be dependent on multiple range checks (in >> case smearing happened which is not the case here), c2 doesn't allow >> the second `array[i]` to float when the second range check is >> removed. The second `array[i]` is then control dependent on: >> >> >> if (flagField == 1) { >> >> >> which is next found to be dominated by the same test: >> >> >> if (flag == 1) { >> >> >> and is removed. However nothing in `dominated_by()` treats node >> dependent on tests that are not range check or predicates >> specially. So the second `array[i]` is allowed to float and become >> dependent on: >> >> >> if (flag == 1) { >> >> >> which is above the range check for that access. The test method in its >> last invocation is passed an index for the array access that's widely >> out of range. The array load happens before the range check and >> crashes the VM. `testLoopPredication()` is a similar test where array >> loads become dependent on predicates and end up above range checks. >> >> `TestArrayAccessCastIIAboveRC.java` is the test case from the bug >> where for similar reasons a range check `CastII` ends up above its >> range check, becomes top because its input becomes some integer that >> conflicts with its... > > Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: > > review Anyone else for the review of this? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/16886#issuecomment-1875430778 From shade at openjdk.org Wed Jan 3 14:48:48 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 3 Jan 2024 14:48:48 GMT Subject: RFR: 8319793: C2 compilation fails with "Bad graph detected in build_loop_late" after JDK-8279888 [v8] In-Reply-To: References: Message-ID: On Fri, 15 Dec 2023 14:32:57 GMT, Roland Westrelin wrote: >> Range check smearing and range check predication make an array access >> dependent on 2 (or more in the case of RC smearing) conditions. As a >> consequence, if a range check can be eliminated because there's an >> identical dominating range check, the control dependent nodes that >> could float and become dependent on the dominating range check cannot >> be allowed to float because there's a risk that they would then bypass >> one of the checks that make the access legal. >> >> `IfNode::dominated_by()` and `PhaseIdealLoop::dominated_by()` have >> logic to prevent this: nodes that are control dependent on a range >> check or predicate are not allowed to float. This is however not >> sufficient as demonstrated by the test cases. >> >> In `TestArrayAccessAboveRCAfterSmearingOrPredication.testRangeCheckSmearing()`: >> >> >> v += array[i]; >> if (flag2) { >> if (flag3) { >> field = 0x42; >> } >> } >> if (flagField == 1) { >> v += array[i]; >> } >> >> >> The range check for the second `array[i]` load is replaced by the >> dominating range check for the first `array[i]` but because the second >> `array[i]` load could really be dependent on multiple range checks (in >> case smearing happened which is not the case here), c2 doesn't allow >> the second `array[i]` to float when the second range check is >> removed. The second `array[i]` is then control dependent on: >> >> >> if (flagField == 1) { >> >> >> which is next found to be dominated by the same test: >> >> >> if (flag == 1) { >> >> >> and is removed. 
However nothing in `dominated_by()` treats node >> dependent on tests that are not range check or predicates >> specially. So the second `array[i]` is allowed to float and become >> dependent on: >> >> >> if (flag == 1) { >> >> >> which is above the range check for that access. The test method in its >> last invocation is passed an index for the array access that's widely >> out of range. The array load happens before the range check and >> crashes the VM. `testLoopPredication()` is a similar test where array >> loads become dependent on predicates and end up above range checks. >> >> `TestArrayAccessCastIIAboveRC.java` is the test case from the bug >> where for similar reasons a range check `CastII` ends up above its >> range check, becomes top because its input becomes some integer that >> conflicts with its... > > Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: > > review There are a couple of GHA failures, and those are probably resolved in current master. It would be helpful if you can pull from current master and get a clean run. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16886#issuecomment-1875487678 From roland at openjdk.org Wed Jan 3 15:53:04 2024 From: roland at openjdk.org (Roland Westrelin) Date: Wed, 3 Jan 2024 15:53:04 GMT Subject: RFR: 8319793: C2 compilation fails with "Bad graph detected in build_loop_late" after JDK-8279888 [v9] In-Reply-To: References: Message-ID: > Range check smearing and range check predication make an array access > dependent on 2 (or more in the case of RC smearing) conditions. As a > consequence, if a range check can be eliminated because there's an > identical dominating range check, the control dependent nodes that > could float and become dependent on the dominating range check cannot > be allowed to float because there's a risk that they would then bypass > one of the checks that make the access legal. 
> > `IfNode::dominated_by()` and `PhaseIdealLoop::dominated_by()` have > logic to prevent this: nodes that are control dependent on a range > check or predicate are not allowed to float. This is however not > sufficient as demonstrated by the test cases. > > In `TestArrayAccessAboveRCAfterSmearingOrPredication.testRangeCheckSmearing()`: > > > v += array[i]; > if (flag2) { > if (flag3) { > field = 0x42; > } > } > if (flagField == 1) { > v += array[i]; > } > > > The range check for the second `array[i]` load is replaced by the > dominating range check for the first `array[i]` but because the second > `array[i]` load could really be dependent on multiple range checks (in > case smearing happened which is not the case here), c2 doesn't allow > the second `array[i]` to float when the second range check is > removed. The second `array[i]` is then control dependent on: > > > if (flagField == 1) { > > > which is next found to be dominated by the same test: > > > if (flag == 1) { > > > and is removed. However nothing in `dominated_by()` treats node > dependent on tests that are not range check or predicates > specially. So the second `array[i]` is allowed to float and become > dependent on: > > > if (flag == 1) { > > > which is above the range check for that access. The test method in its > last invocation is passed an index for the array access that's widely > out of range. The array load happens before the range check and > crashes the VM. `testLoopPredication()` is a similar test where array > loads become dependent on predicates and end up above range checks. > > `TestArrayAccessCastIIAboveRC.java` is the test case from the bug > where for similar reasons a range check `CastII` ends up above its > range check, becomes top because its input becomes some integer that > conflicts with its type (but there's no condition to catch it). The > graph becomes broken and c2 crashes. > > Logic in the `dominated_by()` methods ... 
Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 11 commits: - Merge branch 'master' into JDK-8319793 - review - Revert "Update src/hotspot/share/opto/castnode.hpp" This reverts commit 356c91cca911ed486f9f87f3eff53ce21e1e3ec9. - Revert "Update src/hotspot/share/opto/memnode.hpp" This reverts commit bdb731ea562f314f44d327f7243ef5cf9ad40b2e. - review - Update src/hotspot/share/opto/memnode.hpp Co-authored-by: Christian Hagedorn - Update src/hotspot/share/opto/castnode.hpp Co-authored-by: Christian Hagedorn - Update src/hotspot/share/opto/castnode.cpp Co-authored-by: Christian Hagedorn - Update src/hotspot/share/opto/ifnode.cpp Co-authored-by: Christian Hagedorn - Merge branch 'master' into JDK-8319793 - ... and 1 more: https://git.openjdk.org/jdk/compare/15519285...dbe3c4c1 ------------- Changes: https://git.openjdk.org/jdk/pull/16886/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=16886&range=08 Stats: 367 lines in 14 files changed: 309 ins; 27 del; 31 mod Patch: https://git.openjdk.org/jdk/pull/16886.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16886/head:pull/16886 PR: https://git.openjdk.org/jdk/pull/16886 From mli at openjdk.org Wed Jan 3 16:12:06 2024 From: mli at openjdk.org (Hamlin Li) Date: Wed, 3 Jan 2024 16:12:06 GMT Subject: RFR: 8322959: vectorapi: get wrong argument for `limit` in indexPartiallyInUpperRange intrinsic Message-ID: Hi, Can you review this simple fix for indexPartiallyInUpperRange intrinsic? Thanks. 
------------- Commit messages: - Initial commit Changes: https://git.openjdk.org/jdk/pull/17247/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=17247&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8322959 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/17247.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17247/head:pull/17247 PR: https://git.openjdk.org/jdk/pull/17247 From sviswanathan at openjdk.org Wed Jan 3 17:12:49 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Wed, 3 Jan 2024 17:12:49 GMT Subject: RFR: 8322959: vectorapi: get wrong argument for `limit` in indexPartiallyInUpperRange intrinsic In-Reply-To: References: Message-ID: On Wed, 3 Jan 2024 16:05:48 GMT, Hamlin Li wrote: > Hi, > Can you review this simple fix for indexPartiallyInUpperRange intrinsic? > Thanks. src/hotspot/share/opto/vectorIntrinsics.cpp line 3151: > 3149: > 3150: Node* offset = argument(3); > 3151: Node* limit = argument(4); The offset is of long type so will take 2 spots (3 and 4) of argument. So limit will be argument(5). The original code (limit = argument(5)) looks correct to me. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17247#discussion_r1440691288 From roland at openjdk.org Wed Jan 3 17:17:54 2024 From: roland at openjdk.org (Roland Westrelin) Date: Wed, 3 Jan 2024 17:17:54 GMT Subject: RFR: 8320649: C2: Optimize scoped values [v3] In-Reply-To: References: Message-ID: > This change implements C2 optimizations for calls to > ScopedValue.get(). Indeed, in: > > > v1 = scopedValue.get(); > ... > v2 = scopedValue.get(); > > > `v2` can be replaced by `v1` and the second call to `get()` can be > optimized out. That's true whatever is between the 2 calls unless a > new mapping for `scopedValue` is created in between (when that happens > no optimizations is performed for the method being compiled). 
Hoisting > a `get()` call out of a loop for a loop invariant `scopedValue` should > also be legal in most cases. > > `ScopedValue.get()` is implemented in Java code as a 2 step process. A > cache is attached to the current thread object. If the `ScopedValue` > object is in the cache then the result from `get()` is read from > there. Otherwise a slow call is performed that also inserts the > mapping in the cache. The cache itself is lazily allocated. One > `ScopedValue` can be hashed to 2 different indexes in the cache. On a > cache probe, both indexes are checked. As a consequence, the process > of probing the cache is a multi step process (check if the cache is > present, check first index, check second index if first index > failed). If the cache is populated early on, then when the method that > calls `ScopedValue.get()` is compiled, profile reports the slow path > as never taken and only the read from the cache is compiled. > > To perform the optimizations, I added 3 new node types to C2: > > - the pair > ScopedValueGetHitsInCacheNode/ScopedValueGetLoadFromCacheNode for > the cache probe > > - a cfg node ScopedValueGetResultNode to help locate the result of the > `get()` call in the IR graph. > > In pseudo code, once the nodes are inserted, the code of a `get()` is: > > > hits_in_the_cache = ScopedValueGetHitsInCache(scopedValue) > if (hits_in_the_cache) { > res = ScopedValueGetLoadFromCache(hits_in_the_cache); > } else { > res = ..; //slow call possibly inlined. Subgraph can be arbitrarily complex > } > res = ScopedValueGetResult(res) > > > In the snippet: > > > v1 = scopedValue.get(); > ... > v2 = scopedValue.get(); > > > Replacing `v2` by `v1` is then done by starting from the > `ScopedValueGetResult` node for the second `get()` and looking for a > dominating `ScopedValueGetResult` for the same `ScopedValue` > object. When one is found, it is used as a replacement.
Eliminating > the second `get()` call is achieved by making > `ScopedValueGetHitsInCache` always successful if there's a dominating > `ScopedValueGetResult` and replacing its companion > `ScopedValueGetLoadFromCache` by the dominating > `ScopedValueGetResult`. > > Hoisting a `g... Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains four commits: - Merge branch 'master' into JDK-8320649 - test failures - white spaces + bug id in test - test & fix ------------- Changes: https://git.openjdk.org/jdk/pull/16966/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=16966&range=02 Stats: 2037 lines in 33 files changed: 2007 ins; 1 del; 29 mod Patch: https://git.openjdk.org/jdk/pull/16966.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16966/head:pull/16966 PR: https://git.openjdk.org/jdk/pull/16966 From shade at openjdk.org Wed Jan 3 17:26:43 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 3 Jan 2024 17:26:43 GMT Subject: RFR: 8320128: Clean up Parse constructor for OSR [v2] In-Reply-To: References: <0mxhhLR9DDdpiBhbXWPxCmMzo_OI9DYiIX36xGhxC_c=.1eed99b9-2dc3-4403-b3c1-22b6ebed6079@github.com> Message-ID: On Tue, 21 Nov 2023 06:00:29 GMT, Xin Liu wrote: >> There's a special case for the constructor of Parse. If current compilation is OSR and it is handling the top-level method(depth() == 1), then >> >> 1. _tf = C->tf(); >> 2. _entry_bci = C->entry_bci(); >> 3. _flow = method()->get_osr_flow_analysis(_entry_bci); >> >> We don't need to assign those member data twice. We can also factor out _flow->failing() for the special case and normal cases. >> >> It's worth mentioning that we can't save ciTypeFlow computation because >> get_osr_flow_analysis(_entry_bci) actually needs get_flow_analysis(method()). > > Xin Liu has updated the pull request incrementally with one additional commit since the last revision: > > Update according to reviewer's feedback. This looks reasonable. 
I have a few cosmetic comments/suggestions. src/hotspot/share/opto/parse1.cpp line 513: > 511: tty->print("OSR @%d ", _entry_bci); > 512: } > 513: tty->print_cr("type flow bailout: %s", _flow->failure_reason()); Not sure if we want to keep the single `print_cr` for log atomicity reasons. I think this would be good too: if (is_osr_parse()) { tty->print_cr("OSR @%d type flow bailout: %s", _entry_bci, _flow->failure_reason()); } else { tty->print_cr("type flow bailout: %s", _flow->failure_reason()); } src/hotspot/share/opto/parse1.cpp line 529: > 527: } > 528: > 529: #ifdef ASSERT I think the goal for this `#ifdef` block is to eliminate even the `if (depth() == 1)` in product builds. Yes, most of the code is dead, but it is safer not to rely on it. Leave it as is. ------------- Marked as reviewed by shade (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/16669#pullrequestreview-1802720853 PR Review Comment: https://git.openjdk.org/jdk/pull/16669#discussion_r1440702469 PR Review Comment: https://git.openjdk.org/jdk/pull/16669#discussion_r1440685800 From kvn at openjdk.org Wed Jan 3 18:00:39 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 3 Jan 2024 18:00:39 GMT Subject: RFR: 8322959: vectorapi: get wrong argument for `limit` in indexPartiallyInUpperRange intrinsic In-Reply-To: References: Message-ID: On Wed, 3 Jan 2024 17:10:07 GMT, Sandhya Viswanathan wrote: >> Hi, >> Can you review this simple fix for indexPartiallyInUpperRange intrinsic? >> Thanks. > > src/hotspot/share/opto/vectorIntrinsics.cpp line 3151: > >> 3149: >> 3150: Node* offset = argument(3); >> 3151: Node* limit = argument(4); > > The offset is of long type so will take 2 spots (3 and 4) of argument. So limit will be argument(5). The original code (limit = argument(5)) looks correct to me. @sviswa7 is right. @Hamlin-Li do you have a test case where the value is wrong?
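The slot arithmetic behind this comment can be sketched in Java. The sketch is illustrative only: `slotIndices` is a hypothetical helper, not HotSpot code, and the parameter list is an assumed shape (three one-slot arguments, then a `long` offset, then the limit) chosen to match the indices discussed above. It relies only on the JVM rule that `long` and `double` values occupy two slots.

```java
import java.util.ArrayList;
import java.util.List;

public class ArgumentSlots {
    // Returns the first slot index of each parameter, given JVM type descriptors.
    // 'J' (long) and 'D' (double) take two slots; everything else takes one.
    static List<Integer> slotIndices(String... descriptors) {
        List<Integer> starts = new ArrayList<>();
        int slot = 0;
        for (String d : descriptors) {
            starts.add(slot);
            slot += (d.equals("J") || d.equals("D")) ? 2 : 1;
        }
        return starts;
    }

    public static void main(String[] args) {
        // Hypothetical parameter list: two references, an int, a long offset, a long limit.
        List<Integer> starts =
            slotIndices("Ljava/lang/Class;", "Ljava/lang/Class;", "I", "J", "J");
        System.out.println("offset slot = " + starts.get(3));
        System.out.println("limit slot  = " + starts.get(4));
    }
}
```

Under these assumptions the offset starts at slot 3 and occupies slots 3 and 4, so the next argument starts at slot 5, which is the reasoning given for keeping `argument(5)`.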
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17247#discussion_r1440743534 From kvn at openjdk.org Wed Jan 3 18:39:02 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 3 Jan 2024 18:39:02 GMT Subject: RFR: JDK-8321599 Data loss in AVX3 Base64 decoding [v6] In-Reply-To: References: Message-ID: On Wed, 20 Dec 2023 16:28:03 GMT, Scott Gibbons wrote: >> Fix for looking for padding characters within the encoded string. Was not adding start offset to length, so was looking at potentially freed or uninitialized memory. >> >> Tested teir1 and with testcase supplied with JBS issue. >> >> The problem will only occur when all of the following are true: >> 1. The source offset of the string to be decoded is != 0. >> 2. The characters at the beginning of the string (minus the offset) plus the string length mod 64 are either "=" or "==". >> 3. The string is >= 32 characters. >> 4. The string is not MIME encoded. >> >> If any of these conditions are not met, the decode works as expected. This was due to omitting the source offset of the string when checking for padding characters. > > Scott Gibbons has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 10 additional commits since the last revision: > > - Merge branch 'openjdk:master' into Base64-fix > - Updated copyright year > - Updated copyright year > - Revert code size change - wa for an experiment only. > - Added some comments to the test > - Merge branch 'openjdk:master' into Base64-fix > - Merge branch 'Base64-fix' of https://github.com/asgibbons/jdk into Base64-fix > - Merge branch 'openjdk:master' into Base64-fix > - Added tests for proper length and padding checks > - Fix for JDK-8321599 Looks reasonable. Please, update copyright year to 2024 in source file and test. ------------- Marked as reviewed by kvn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/17039#pullrequestreview-1802869126 From kvn at openjdk.org Wed Jan 3 19:45:44 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 3 Jan 2024 19:45:44 GMT Subject: RFR: 8310190: C2 SuperWord: AlignVector is broken, generates misaligned packs [v58] In-Reply-To: References: Message-ID: On Wed, 3 Jan 2024 09:01:31 GMT, Emanuel Peter wrote: >> I want to push this in JDK23. >> After this fix here, I'm doing [this refactoring](https://github.com/openjdk/jdk/pull/16620). >> >> To calm your nerves: most of the changes are in auto-generated tests, and tests in general. >> >> **Context** >> >> `-XX:+AlignVector` ensures that SuperWord only creates LoadVector and StoreVector that can be memory aligned. This is achieved by iterating in the pre-loop until we reach the alignment boundary, then we can start the main loop properly aligned. However, this is not possible in all cases, sometimes some memory accesses cannot be guaranteed to be aligned, and we need to reject vectorization (at least partially, for some of the packs). >> >> Alignment is split into two tasks: >> - Alignment Correctness Checks: only relevant if `-XX:+AlignVector`. Need to reject vectorization if alignment is not possible. We must check if the address of the vector load/store is aligned with (divisible by) `ObjectAlignmentInBytes`. >> - Alignment by adjusting pre-loop limit: alignment is desirable even if `-XX:-AlignVector`. We would like to align the vectors with their vector width. >> >> **Problem** >> >> I have recently found a bug with our AlignVector [JDK-8310190](https://bugs.openjdk.org/browse/JDK-8310190). >> In that bug, we perform a misaligned memory vector access, which results in a `SIGBUS` on an ARM32 machine. >> Thanks @fg1417 for confirming this! >> Hence, we need to fix the alignment correctness checks. 
>> >> While working on this task, I also found some bugs in the "alignment by adjusting pre-loop limit": there were cases where it did not align the vectors correctly. >> >> **Problem Details** >> >> Reproducer: >> >> >> static void test(short[] a, short[] b, short mask) { >> for (int i = 0; i < RANGE; i+=8) { >> // Problematic for AlignVector >> b[i+0] = (short)(a[i+0] & mask); // best_memref, align 0 >> >> b[i+3] = (short)(a[i+3] & mask); // pack at offset 6 bytes >> b[i+4] = (short)(a[i+4] & mask); >> b[i+5] = (short)(a[i+5] & mask); >> b[i+6] = (short)(a[i+6] & mask); >> } >> } >> >> >> During `SuperWord::find_adjacent_refs` we used to check if the references are expected to be aligned. For that, we look at each "group" of references (eg all `LoadS`) and take the reference with the lowest offset. For that chosen reference, we check if it is alignable. If yes, we accept all references of that group, if no we reject all. >> >> This is problemati... > > Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 124 commits: > > - Merge branch 'JDK-8311586' of https://github.com/eme64/jdk into JDK-8311586 > - Apply suggestions from code review by Christian > > Co-authored-by: Christian Hagedorn > - fix copyright year 2024 > - Merge branch 'master' into JDK-8311586 > - more comments in SuperWord::adjust_pre_loop_limit_to_align_main_loop_vectors > - comments about modulo positive / negative values > - Apply suggestions from code review from Christian > > Co-authored-by: Christian Hagedorn > - more small fixes by Christian > - fix for yesterday's reviews by Christian > - improve case analysis empty / constrained / trivial > - ... and 114 more: https://git.openjdk.org/jdk/compare/06dd7353...d01a0cd9 Few comments. 
src/hotspot/share/opto/chaitin.cpp line 1795: > 1793: // See if already computed; if so return it > 1794: if( derived_base_map[derived->_idx] ) > 1795: return derived_base_map[derived->_idx]; Please fix code style for these lines since you are touching this code. Spacing and missing {}. src/hotspot/share/opto/chaitin.cpp line 1797: > 1795: return derived_base_map[derived->_idx]; > 1796: > 1797: if (derived->is_Mach() && derived->as_Mach()->ideal_Opcode() == Op_VerifyVectorAlignment) { Missing #ifdef ASSERT src/hotspot/share/opto/compile.cpp line 1059: > 1057: > 1058: if (AllowVectorizeOnDemand) { > 1059: if (has_method() && _directive->VectorizeOption) { This seems unrelated. Please explain it. src/hotspot/share/opto/compile.cpp line 3713: > 3711: // to ObjectAlignmentInBytes. Hence, even if multiple arrays are accessed in > 3712: // a loop we can expect at least the following alignment: > 3713: jlong guaranteed_alignment = MIN2(vector_width, (jlong)ObjectAlignmentInBytes); This is a more relaxed check than the actual alignment required. As I understand it, this is because it checks only the base address of the array and not the actual memory address that the vector instruction accesses (which is (base,index,offset)). It is useful but does not guarantee correct alignment of vector access instructions. Consider using the `lea` instruction on x86 to load the memory address into a register and check it.
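To make the concern above concrete, here is a small Java sketch of the difference between checking the base address and checking the effective address of an access. It is not HotSpot code: `effectiveAddress`, `isAligned`, and the numeric values (an 8-byte-aligned base, 2-byte `short` elements, an 8-byte vector, and no array header) are assumptions picked for illustration.

```java
public class EffectiveAddressAlignment {
    // Effective address of a vector access, modeled as base + index * scale + offset.
    static long effectiveAddress(long base, long index, long scale, long offset) {
        return base + index * scale + offset;
    }

    // True if the address is divisible by the (power-of-two) alignment.
    static boolean isAligned(long address, long alignment) {
        return (address & (alignment - 1)) == 0;
    }

    public static void main(String[] args) {
        long base = 0x1000;    // assumed array base, 8-byte aligned
        long vectorWidth = 8;  // assumed vector width in bytes
        long scale = 2;        // bytes per short element

        // Checking only the base address succeeds.
        System.out.println(isAligned(base, vectorWidth));
        // Element index 3 sits 6 bytes past the base, so an 8-byte access there is misaligned.
        System.out.println(isAligned(effectiveAddress(base, 3, scale, 0), vectorWidth));
        // Element index 4 sits 8 bytes past the base and is aligned again.
        System.out.println(isAligned(effectiveAddress(base, 4, scale, 0), vectorWidth));
    }
}
```

With these assumed values, a base-only check passes while the access at element 3 does not, which is why a check on the computed address is stricter than a check on the base.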
src/hotspot/share/opto/machnode.cpp line 360: > 358: } > 359: > 360: if (base != nullptr && base->is_Mach() && base->as_Mach()->ideal_Opcode() == Op_VerifyVectorAlignment) { Missing #ifdef ASSERT src/hotspot/share/opto/superword.cpp line 674: > 672: "packset empty or we find the alignment reference"); > 673: > 674: if (TraceSuperWord) { Missing #ifndef PRODUCT src/hotspot/share/opto/superword.cpp line 1605: > 1603: compress_packset(); > 1604: > 1605: if (TraceSuperWord) { Missing #ifndef PRODUCT ------------- PR Review: https://git.openjdk.org/jdk/pull/14785#pullrequestreview-1802885677 PR Review Comment: https://git.openjdk.org/jdk/pull/14785#discussion_r1440813017 PR Review Comment: https://git.openjdk.org/jdk/pull/14785#discussion_r1440791887 PR Review Comment: https://git.openjdk.org/jdk/pull/14785#discussion_r1440792545 PR Review Comment: https://git.openjdk.org/jdk/pull/14785#discussion_r1440864022 PR Review Comment: https://git.openjdk.org/jdk/pull/14785#discussion_r1440796064 PR Review Comment: https://git.openjdk.org/jdk/pull/14785#discussion_r1440826791 PR Review Comment: https://git.openjdk.org/jdk/pull/14785#discussion_r1440828355 From sgibbons at openjdk.org Wed Jan 3 19:51:02 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Wed, 3 Jan 2024 19:51:02 GMT Subject: RFR: JDK-8321599 Data loss in AVX3 Base64 decoding [v7] In-Reply-To: References: Message-ID: > Fix for looking for padding characters within the encoded string. Was not adding start offset to length, so was looking at potentially freed or uninitialized memory. > > Tested teir1 and with testcase supplied with JBS issue. > > The problem will only occur when all of the following are true: > 1. The source offset of the string to be decoded is != 0. > 2. The characters at the beginning of the string (minus the offset) plus the string length mod 64 are either "=" or "==". > 3. The string is >= 32 characters. > 4. The string is not MIME encoded. 
> > If any of these conditions are not met, the decode works as expected. This was due to omitting the source offset of the string when checking for padding characters. Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: Fixed copyrights ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17039/files - new: https://git.openjdk.org/jdk/pull/17039/files/ba60ac59..5f0e0d59 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17039&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17039&range=05-06 Stats: 2 lines in 2 files changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/17039.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17039/head:pull/17039 PR: https://git.openjdk.org/jdk/pull/17039 From kvn at openjdk.org Wed Jan 3 19:53:39 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 3 Jan 2024 19:53:39 GMT Subject: RFR: 8310190: C2 SuperWord: AlignVector is broken, generates misaligned packs [v58] In-Reply-To: References: Message-ID: On Wed, 3 Jan 2024 19:41:57 GMT, Vladimir Kozlov wrote: >> Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 124 commits: >> >> - Merge branch 'JDK-8311586' of https://github.com/eme64/jdk into JDK-8311586 >> - Apply suggestions from code review by Christian >> >> Co-authored-by: Christian Hagedorn >> - fix copyright year 2024 >> - Merge branch 'master' into JDK-8311586 >> - more comments in SuperWord::adjust_pre_loop_limit_to_align_main_loop_vectors >> - comments about modulo positive / negative values >> - Apply suggestions from code review from Christian >> >> Co-authored-by: Christian Hagedorn >> - more small fixes by Christian >> - fix for yesterday's reviews by Christian >> - improve case analysis empty / constrained / trivial >> - ... 
and 114 more: https://git.openjdk.org/jdk/compare/06dd7353...d01a0cd9 > > src/hotspot/share/opto/compile.cpp line 3713: > >> 3711: // to ObjectAlignmentInBytes. Hence, even if multiple arrays are accessed in >> 3712: // a loop we can expect at least the following alignment: >> 3713: jlong guaranteed_alignment = MIN2(vector_width, (jlong)ObjectAlignmentInBytes); > > This is a more relaxed check than the actual alignment required. As I understand it, this is because it checks only the base address of the array and not the actual memory address that the vector instruction accesses (which is (base,index,offset)). > It is useful but does not guarantee correct alignment of vector access instructions. > > Consider using the `lea` instruction on x86 to load the memory address into a register and check it. Maybe hack only the `loadV` and `storeV` instructions in the .ad file to use `lea` and do the check. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14785#discussion_r1440873324 From xliu at openjdk.org Wed Jan 3 20:04:17 2024 From: xliu at openjdk.org (Xin Liu) Date: Wed, 3 Jan 2024 20:04:17 GMT Subject: RFR: 8320128: Clean up Parse constructor for OSR [v3] In-Reply-To: <0mxhhLR9DDdpiBhbXWPxCmMzo_OI9DYiIX36xGhxC_c=.1eed99b9-2dc3-4403-b3c1-22b6ebed6079@github.com> References: <0mxhhLR9DDdpiBhbXWPxCmMzo_OI9DYiIX36xGhxC_c=.1eed99b9-2dc3-4403-b3c1-22b6ebed6079@github.com> Message-ID: > There's a special case for the constructor of Parse. If the current compilation is OSR and it is handling the top-level method (depth() == 1), then > > 1. _tf = C->tf(); > 2. _entry_bci = C->entry_bci(); > 3. _flow = method()->get_osr_flow_analysis(_entry_bci); > > We don't need to assign those member data twice. We can also factor out _flow->failing() for the special case and normal cases. > > It's worth mentioning that we can't save ciTypeFlow computation because > get_osr_flow_analysis(_entry_bci) actually needs get_flow_analysis(method()).
Xin Liu has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: - Use atomic logline and resume #ifdef ASSERT. - Merge branch 'master' into JDK-8320128 - Update according to reviewer's feedback. - 8320128: Clean up Parse constructor for OSR ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16669/files - new: https://git.openjdk.org/jdk/pull/16669/files/1f7c956c..ec89638c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16669&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16669&range=01-02 Stats: 789833 lines in 4137 files changed: 177560 ins; 537711 del; 74562 mod Patch: https://git.openjdk.org/jdk/pull/16669.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16669/head:pull/16669 PR: https://git.openjdk.org/jdk/pull/16669 From xliu at openjdk.org Wed Jan 3 20:14:31 2024 From: xliu at openjdk.org (Xin Liu) Date: Wed, 3 Jan 2024 20:14:31 GMT Subject: RFR: 8320128: Clean up Parse constructor for OSR [v3] In-Reply-To: References: <0mxhhLR9DDdpiBhbXWPxCmMzo_OI9DYiIX36xGhxC_c=.1eed99b9-2dc3-4403-b3c1-22b6ebed6079@github.com> Message-ID: On Wed, 3 Jan 2024 17:03:56 GMT, Aleksey Shipilev wrote: >> Xin Liu has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: >> >> - Use atomic logline and resume #ifdef ASSERT. >> - Merge branch 'master' into JDK-8320128 >> - Update according to reviewer's feedback. >> - 8320128: Clean up Parse constructor for OSR > > src/hotspot/share/opto/parse1.cpp line 529: > >> 527: } >> 528: >> 529: #ifdef ASSERT > > I think the goal for this `#ifdef` block is to eliminate even the `if (depth() == 1)` in product builds. 
Yes, most of the code is dead, but it is safer not to rely on it. Leave it as is. I tried to improve readability by reducing macros. okay. I bring it back. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16669#discussion_r1440895697 From kbarrett at openjdk.org Wed Jan 3 20:16:28 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Wed, 3 Jan 2024 20:16:28 GMT Subject: RFR: 8322879: Eliminate -Wparentheses warnings in x86-32 code In-Reply-To: References: Message-ID: On Wed, 3 Jan 2024 07:19:18 GMT, Vladimir Kozlov wrote: >> Please review this trivial change to eliminate a -Wparentheses warning. >> This involved simply adding parentheses to make the implicit operator >> precedence explicit. >> >> Testing: Locally (linux-x64) cross-compiled for linux-x86. Also ran GHA with >> -Wparentheses enabled along with this and other changes needed to make that >> work. > > Trivial. Thanks for reviews, @vnkozlov and @shipilev . ------------- PR Comment: https://git.openjdk.org/jdk/pull/17237#issuecomment-1875913222 From kbarrett at openjdk.org Wed Jan 3 20:16:29 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Wed, 3 Jan 2024 20:16:29 GMT Subject: Integrated: 8322879: Eliminate -Wparentheses warnings in x86-32 code In-Reply-To: References: Message-ID: On Wed, 3 Jan 2024 05:10:50 GMT, Kim Barrett wrote: > Please review this trivial change to eliminate a -Wparentheses warning. > This involved simply adding parentheses to make the implicit operator > precedence explicit. > > Testing: Locally (linux-x64) cross-compiled for linux-x86. Also ran GHA with > -Wparentheses enabled along with this and other changes needed to make that > work. This pull request has now been integrated. 
Changeset: 30a0c61d Author: Kim Barrett URL: https://git.openjdk.org/jdk/commit/30a0c61de080a0cc52ec163095fe0f02f324474e Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod 8322879: Eliminate -Wparentheses warnings in x86-32 code Reviewed-by: kvn, shade ------------- PR: https://git.openjdk.org/jdk/pull/17237 From kvn at openjdk.org Wed Jan 3 20:32:23 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 3 Jan 2024 20:32:23 GMT Subject: RFR: JDK-8321599 Data loss in AVX3 Base64 decoding [v7] In-Reply-To: References: Message-ID: On Wed, 3 Jan 2024 19:51:02 GMT, Scott Gibbons wrote: >> Fix for looking for padding characters within the encoded string. Was not adding start offset to length, so was looking at potentially freed or uninitialized memory. >> >> Tested teir1 and with testcase supplied with JBS issue. >> >> The problem will only occur when all of the following are true: >> 1. The source offset of the string to be decoded is != 0. >> 2. The characters at the beginning of the string (minus the offset) plus the string length mod 64 are either "=" or "==". >> 3. The string is >= 32 characters. >> 4. The string is not MIME encoded. >> >> If any of these conditions are not met, the decode works as expected. This was due to omitting the source offset of the string when checking for padding characters. > > Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: > > Fixed copyrights Good. ------------- PR Comment: https://git.openjdk.org/jdk/pull/17039#issuecomment-1875932205 From never at openjdk.org Wed Jan 3 20:42:21 2024 From: never at openjdk.org (Tom Rodriguez) Date: Wed, 3 Jan 2024 20:42:21 GMT Subject: RFR: 8322636: [JVMCI] HotSpotSpeculationLog can be inconsistent across a single compile [v2] In-Reply-To: References: Message-ID: On Tue, 2 Jan 2024 13:37:19 GMT, David Leopoldseder wrote: >> This PR fixes a subtle inconsistency in `HotSpotSpeculationLog` . 
>> >> Normal uses of `HotSpotSpeculationLog` work by using a `SpeculationReason` and asking the speculation log via `maySpeculate` if the speculation can be performed, i.e., whether it has not already failed for the given method. An example for this can be seen in Graal https://github.com/oracle/graal/blob/master/compiler/src/jdk.graal.compiler/src/jdk/graal/compiler/nodes/loop/CountedLoopInfo.java#L591C15-L591C15 >> The implicit assumption is that the speculation log, `HotSpotSpeculationLog` in particular, collects failed speculations at the beginning of a compile and then stays consistent during the compile. Why is that? - Because if there are new failed speculations added to the failed speculations during the compile - the compiler would speculate again on those in an inconsistent way. E.g. at the beginning of a compile a certain speculation has not failed yet and the compiler thinks it can do optimization xyz using a speculation - later during the compilation process it consults the speculation log but gets a different answer. All those inconsistent speculations that already failed will anyway later fail code installation in JVMCI (they will throw a bailout during `HotSpotCodeCacheProvider#installCode` https://github.com/openjdk/jdk/blob/master/src/jdk.internal.vm.ci/share/classes/jdk/vm/ci/hotspot/HotSpotSpeculationLog.java#L192 ). Thus, we should at least return a consistent result during a compile. >> The problem for consistency here, which also causes trouble on the Graal side, is that `maySpeculate` itself can collect failed speculations if there have not been any previously, i.e., `failedSpeculations == null`. >> In order to make the speculation log consistent across an entire JVMCI compile this PR removes the collection of failed speculations in `maySpeculate`.
> > David Leopoldseder has updated the pull request incrementally with one additional commit since the last revision: > > 8322636: [JVMCI] HotSpotSpeculationLog add javadoc to maySpeculate So I looked more closely the HotSpot and substrate implementations and I'm not sure we can currently align the implementation and the javadoc. In the HotSpot world, HotSpotSpeculationLog is a compiler local object that reads data from the real speculation data that's kept in the MDO. This means that it has full control over when collectFailedSpeculations is called. SubstrateSpeculationLog is the actual log so if two threads are operating on the same log then one of them could see the effects of a call to collectFailedSpeculations by the other thread. Maybe in practice 2 threads never do this because it would mean they are compiling the same root method but it doesn't seem guaranteed. installCode on substrate also doesn't perform the speculation log check that HotSpot does. So maybe we punt on javadoc updates for now. ------------- PR Comment: https://git.openjdk.org/jdk/pull/17183#issuecomment-1875942561 From sgibbons at openjdk.org Wed Jan 3 21:18:15 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Wed, 3 Jan 2024 21:18:15 GMT Subject: RFR: JDK-8321599 Data loss in AVX3 Base64 decoding [v8] In-Reply-To: References: Message-ID: > Fix for looking for padding characters within the encoded string. Was not adding start offset to length, so was looking at potentially freed or uninitialized memory. > > Tested teir1 and with testcase supplied with JBS issue. > > The problem will only occur when all of the following are true: > 1. The source offset of the string to be decoded is != 0. > 2. The characters at the beginning of the string (minus the offset) plus the string length mod 64 are either "=" or "==". > 3. The string is >= 32 characters. > 4. The string is not MIME encoded. > > If any of these conditions are not met, the decode works as expected. 
This was due to omitting the source offset of the string when checking for padding characters. Scott Gibbons has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 12 additional commits since the last revision: - Merge branch 'openjdk:master' into Base64-fix - Fixed copyrights - Merge branch 'openjdk:master' into Base64-fix - Updated copyright year - Updated copyright year - Revert code size change - wa for an experiment only. - Added some comments to the test - Merge branch 'openjdk:master' into Base64-fix - Merge branch 'Base64-fix' of https://github.com/asgibbons/jdk into Base64-fix - Merge branch 'openjdk:master' into Base64-fix - ... and 2 more: https://git.openjdk.org/jdk/compare/919ef219...dbccc16e ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17039/files - new: https://git.openjdk.org/jdk/pull/17039/files/5f0e0d59..dbccc16e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17039&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17039&range=06-07 Stats: 2960 lines in 243 files changed: 1658 ins; 531 del; 771 mod Patch: https://git.openjdk.org/jdk/pull/17039.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17039/head:pull/17039 PR: https://git.openjdk.org/jdk/pull/17039 From duke at openjdk.org Wed Jan 3 21:18:48 2024 From: duke at openjdk.org (Eric Murphy) Date: Wed, 3 Jan 2024 21:18:48 GMT Subject: RFR: 8309130: x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays) [v42] In-Reply-To: References: <3wco3meaBNwjfDWtVvkkoRfgG7-Wu1XZJTfJFduX5LE=.adbcd599-bcab-45a8-896f-cd2c65510352@github.com> <_MGkyOjyeyCIOE_HpYGCpzN3zN6bJEtaMGo_3T66e7M=.446e6122-c301-4dd9-9704-b72606275f4c@github.com> Message-ID: On Sun, 15 Oct 2023 07:40:06 GMT, himichael wrote: > sing a physical machine, I am using a virtual machine, this virtual machine supports the AVX512 
instruction set. > How do I open libsimdsort ? @himichael Did you ever resolve your issue? I am using JDK22 from SDKMan and have the same errors: java -Xlog:library [0.013s][info][library] Loaded library libjsvml.so, handle 0x00007fc9a40229b0 [0.024s][info][library] Failed to find JNI_OnLoad_nio in library with handle 0x00007fca6b2dc220 [0.024s][info][library] Loaded library /home/eric/.sdkman/candidates/java/22.ea.29-open/lib/libnio.so, handle 0x00007fca6419b9b0 [0.024s][info][library] Found JNI_OnLoad in library with handle 0x00007fca6419b9b0 [0.024s][info][library] Found Java_sun_nio_fs_UnixNativeDispatcher_init in library with handle 0x00007fca6419b9b0 [0.024s][info][library] Found Java_sun_nio_fs_UnixNativeDispatcher_getcwd in library with handle 0x00007fca6419b9b0 [0.025s][info][library] Failed to find JNI_OnLoad_jimage in library with handle 0x00007fca6b2dc220 [0.025s][info][library] Loaded library /home/eric/.sdkman/candidates/java/22.ea.29-open/lib/libjimage.so, handle 0x00007fca64006380 [0.025s][info][library] Failed to find JNI_OnLoad in library with handle 0x00007fca64006380 [0.025s][info][library] Failed to find Java_jdk_internal_jimage_NativeImageBuffer_getNativeMap in library with handle 0x00007fca6419b9b0 [0.025s][info][library] Found Java_jdk_internal_jimage_NativeImageBuffer_getNativeMap in library with handle 0x00007fca64006380 ------------- PR Comment: https://git.openjdk.org/jdk/pull/14227#issuecomment-1875978761 From duke at openjdk.org Wed Jan 3 22:21:07 2024 From: duke at openjdk.org (Joshua Cao) Date: Wed, 3 Jan 2024 22:21:07 GMT Subject: RFR: 8322976: Remove reference to transform_no_reclaim Message-ID: Passes hotspot:tier1 locally ------------- Commit messages: - 8322976: Remove reference to transform_no_reclaim Changes: https://git.openjdk.org/jdk/pull/17255/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=17255&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8322976 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: 
https://git.openjdk.org/jdk/pull/17255.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17255/head:pull/17255 PR: https://git.openjdk.org/jdk/pull/17255 From duke at openjdk.org Wed Jan 3 22:21:07 2024 From: duke at openjdk.org (Joshua Cao) Date: Wed, 3 Jan 2024 22:21:07 GMT Subject: RFR: 8322976: Remove reference to transform_no_reclaim In-Reply-To: References: Message-ID: On Wed, 3 Jan 2024 22:12:03 GMT, Joshua Cao wrote: > Passes hotspot:tier1 locally It's worth considering a hard cap here. For example, calling `apply_ideal` at most eight times might be sufficient for almost all cases. ------------- PR Comment: https://git.openjdk.org/jdk/pull/17255#issuecomment-1876041941 From sgibbons at openjdk.org Thu Jan 4 01:39:39 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Thu, 4 Jan 2024 01:39:39 GMT Subject: Integrated: JDK-8321599 Data loss in AVX3 Base64 decoding In-Reply-To: References: Message-ID: On Fri, 8 Dec 2023 20:56:52 GMT, Scott Gibbons wrote: > Fix for looking for padding characters within the encoded string. Was not adding start offset to length, so was looking at potentially freed or uninitialized memory. > > Tested tier1 and with testcase supplied with JBS issue. > > The problem will only occur when all of the following are true: > 1. The source offset of the string to be decoded is != 0. > 2. The characters at the beginning of the string (minus the offset) plus the string length mod 64 are either "=" or "==". > 3. The string is >= 32 characters. > 4. The string is not MIME encoded. > > If any of these conditions are not met, the decode works as expected. This was due to omitting the source offset of the string when checking for padding characters. This pull request has now been integrated.
Changeset: 13c11487 Author: Scott Gibbons Committer: Sandhya Viswanathan URL: https://git.openjdk.org/jdk/commit/13c11487f7126a370d9ce8e62f661ea83eedefe6 Stats: 124 lines in 2 files changed: 121 ins; 0 del; 3 mod 8321599: Data loss in AVX3 Base64 decoding Reviewed-by: sviswanathan, kvn ------------- PR: https://git.openjdk.org/jdk/pull/17039 From sviswanathan at openjdk.org Thu Jan 4 01:45:28 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Thu, 4 Jan 2024 01:45:28 GMT Subject: RFR: JDK-8321599 Data loss in AVX3 Base64 decoding [v7] In-Reply-To: References: Message-ID: On Wed, 3 Jan 2024 20:29:56 GMT, Vladimir Kozlov wrote: >> Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: >> >> Fixed copyrights > > Good. Thanks a lot @vnkozlov. ------------- PR Comment: https://git.openjdk.org/jdk/pull/17039#issuecomment-1876195513 From jbhateja at openjdk.org Thu Jan 4 05:33:35 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 4 Jan 2024 05:33:35 GMT Subject: RFR: 8322768: Optimize non-subword vector compress and expand APIs for AVX2 target. Message-ID: <_udOCEVG86x9V_WvYqFTaYnvmXdiZ7LzqxzR-D_ygYs=.db3ed37e-a8fa-413b-83c6-a785aba072ff@github.com> Hi, This patch optimizes the non-subword vector compress and expand APIs for x86 AVX2-only targets. Upcoming E-core Xeons (Sierra Forest) and hybrid CPUs support only the AVX2 instruction set. These operations are used very frequently in columnar database filters. The implementation uses a lookup table to record permute indices. The table index is computed from the mask argument of the compress/expand operation. Following are the performance numbers of the JMH micro included with the patch.
System : Intel(R) Xeon(R) Platinum 8480+ (Sapphire Rapids)

Baseline:
Benchmark                                 (size)   Mode  Cnt     Score  Error   Units
ColumnFilterBenchmark.filterDoubleColumn    1024  thrpt    2   142.767         ops/ms
ColumnFilterBenchmark.filterDoubleColumn    2047  thrpt    2    71.436         ops/ms
ColumnFilterBenchmark.filterDoubleColumn    4096  thrpt    2    35.992         ops/ms
ColumnFilterBenchmark.filterFloatColumn     1024  thrpt    2   182.151         ops/ms
ColumnFilterBenchmark.filterFloatColumn     2047  thrpt    2    91.096         ops/ms
ColumnFilterBenchmark.filterFloatColumn     4096  thrpt    2    44.757         ops/ms
ColumnFilterBenchmark.filterIntColumn       1024  thrpt    2   184.099         ops/ms
ColumnFilterBenchmark.filterIntColumn       2047  thrpt    2    91.981         ops/ms
ColumnFilterBenchmark.filterIntColumn       4096  thrpt    2    45.170         ops/ms
ColumnFilterBenchmark.filterLongColumn      1024  thrpt    2   148.017         ops/ms
ColumnFilterBenchmark.filterLongColumn      2047  thrpt    2    73.516         ops/ms
ColumnFilterBenchmark.filterLongColumn      4096  thrpt    2    36.844         ops/ms

Withopt:
Benchmark                                 (size)   Mode  Cnt     Score  Error   Units
ColumnFilterBenchmark.filterDoubleColumn    1024  thrpt    2  2051.707         ops/ms
ColumnFilterBenchmark.filterDoubleColumn    2047  thrpt    2   914.072         ops/ms
ColumnFilterBenchmark.filterDoubleColumn    4096  thrpt    2   489.898         ops/ms
ColumnFilterBenchmark.filterFloatColumn     1024  thrpt    2  5324.195         ops/ms
ColumnFilterBenchmark.filterFloatColumn     2047  thrpt    2  2587.229         ops/ms
ColumnFilterBenchmark.filterFloatColumn     4096  thrpt    2  1278.665         ops/ms
ColumnFilterBenchmark.filterIntColumn       1024  thrpt    2  4149.384         ops/ms
ColumnFilterBenchmark.filterIntColumn       2047  thrpt    2  1791.170         ops/ms
ColumnFilterBenchmark.filterIntColumn       4096  thrpt    2   974.888         ops/ms
ColumnFilterBenchmark.filterLongColumn      1024  thrpt    2  1128.281         ops/ms
ColumnFilterBenchmark.filterLongColumn      2047  thrpt    2   686.334         ops/ms
ColumnFilterBenchmark.filterLongColumn      4096  thrpt    2   337.170         ops/ms

Kindly review and share your feedback. Best Regards, Jatin ------------- Commit messages: - 8322768: Optimize non-subword vector compress and expand APIs for AVX2 target.
Changes: https://git.openjdk.org/jdk/pull/17261/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=17261&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8322768 Stats: 336 lines in 10 files changed: 323 ins; 8 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/17261.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17261/head:pull/17261 PR: https://git.openjdk.org/jdk/pull/17261 From jbhateja at openjdk.org Thu Jan 4 05:39:01 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 4 Jan 2024 05:39:01 GMT Subject: RFR: 8322768: Optimize non-subword vector compress and expand APIs for AVX2 target. [v2] In-Reply-To: <_udOCEVG86x9V_WvYqFTaYnvmXdiZ7LzqxzR-D_ygYs=.db3ed37e-a8fa-413b-83c6-a785aba072ff@github.com> References: <_udOCEVG86x9V_WvYqFTaYnvmXdiZ7LzqxzR-D_ygYs=.db3ed37e-a8fa-413b-83c6-a785aba072ff@github.com> Message-ID: > Hi, > > Patch optimizes non-subword vector compress and expand APIs for x86 AVX2 only targets. > Upcoming E-core Xeons (Sierra Forest) and Hybrid CPUs only support AVX2 instruction set. > These are very frequently used operation in columnar database filter operation. > > Implementation uses a lookup table to record permute indices. Table index is computed using > mask argument of compress/expand operation. > > Following are the performance number of JMH micro included with the patch. 
> > > System : Intel(R) Xeon(R) Platinum 8480+ (Sapphire Rapids) > > Baseline: > Benchmark (size) Mode Cnt Score Error Units > ColumnFilterBenchmark.filterDoubleColumn 1024 thrpt 2 142.767 ops/ms > ColumnFilterBenchmark.filterDoubleColumn 2047 thrpt 2 71.436 ops/ms > ColumnFilterBenchmark.filterDoubleColumn 4096 thrpt 2 35.992 ops/ms > ColumnFilterBenchmark.filterFloatColumn 1024 thrpt 2 182.151 ops/ms > ColumnFilterBenchmark.filterFloatColumn 2047 thrpt 2 91.096 ops/ms > ColumnFilterBenchmark.filterFloatColumn 4096 thrpt 2 44.757 ops/ms > ColumnFilterBenchmark.filterIntColumn 1024 thrpt 2 184.099 ops/ms > ColumnFilterBenchmark.filterIntColumn 2047 thrpt 2 91.981 ops/ms > ColumnFilterBenchmark.filterIntColumn 4096 thrpt 2 45.170 ops/ms > ColumnFilterBenchmark.filterLongColumn 1024 thrpt 2 148.017 ops/ms > ColumnFilterBenchmark.filterLongColumn 2047 thrpt 2 73.516 ops/ms > ColumnFilterBenchmark.filterLongColumn 4096 thrpt 2 36.844 ops/ms > > Withopt: > Benchmark (size) Mode Cnt Score Error Units > ColumnFilterBenchmark.filterDoubleColumn 1024 thrpt 2 2051.707 ops/ms > ColumnFilterBenchmark.filterDoubleColumn 2047 thrpt 2 914.072 ops/ms > ColumnFilterBenchmark.filterDoubleColumn 4096 thrpt 2 489.898 ops/ms > ColumnFilterBenchmark.filterFloatColumn 1024 thrpt 2 5324.195 ops/ms > ColumnFilterBenchmark.filterFloatColumn 2047 thrpt 2 2587.229 ops/ms > ColumnFilterBenchmark.filterFloatColumn 4096 thrpt 2 1278.665 ops/ms > ColumnFilterBenchmark.filterIntColumn 1024 thrpt 2 4149.384 ops/ms > ColumnFilterBenchmark.filterIntColumn 2047 thrpt 2 1791.170 ops/ms > ColumnFilterBenchmark.filterIntColumn ... Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Updating copyright year of modified files. 
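The lookup-table idea in the description ("a lookup table to record permute indices", indexed by the mask) can be sketched in scalar Java. This is an illustration only, not the patch's stub code: the class `CompressLookupSketch`, the 4-lane width and the use of -1 for unused lanes are assumptions made for the example. It models the compress direction, where selected lanes are packed into the low lanes and the remaining lanes are zero.

```java
/** Scalar illustration of mask-indexed permute tables; hypothetical, not the JDK stub. */
final class CompressLookupSketch {
    static final int LANES = 4; // assumed lane count, e.g. four 64-bit lanes in 256 bits

    // PERMUTE[mask][i] is the source lane that lands in destination lane i, or -1 for "zero".
    static final int[][] PERMUTE = buildTable();

    static int[][] buildTable() {
        int[][] table = new int[1 << LANES][LANES];
        for (int mask = 0; mask < table.length; mask++) {
            int next = 0;
            for (int lane = 0; lane < LANES; lane++) {
                if ((mask & (1 << lane)) != 0) {
                    table[mask][next++] = lane; // selected lanes are packed towards lane 0
                }
            }
            while (next < LANES) {
                table[mask][next++] = -1;       // remaining destination lanes become zero
            }
        }
        return table;
    }

    /** Compresses src according to mask with a single table lookup plus a permute. */
    static int[] compress(int[] src, int mask) {
        int[] row = PERMUTE[mask & ((1 << LANES) - 1)];
        int[] dst = new int[LANES];
        for (int i = 0; i < LANES; i++) {
            dst[i] = row[i] >= 0 ? src[row[i]] : 0;
        }
        return dst;
    }
}
```

For example, `compress(new int[] {10, 20, 30, 40}, 0b1010)` selects lanes 1 and 3 and yields `{20, 40, 0, 0}`. The table has only 2^LANES rows, which is why this approach is practical for the wider (non-subword) element types.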
------------- Changes: - all: https://git.openjdk.org/jdk/pull/17261/files - new: https://git.openjdk.org/jdk/pull/17261/files/3f2b6105..6bd9b0ad Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17261&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17261&range=00-01 Stats: 6 lines in 6 files changed: 0 ins; 0 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/17261.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17261/head:pull/17261 PR: https://git.openjdk.org/jdk/pull/17261 From thartmann at openjdk.org Thu Jan 4 06:19:39 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 4 Jan 2024 06:19:39 GMT Subject: [jdk22] RFR: 8321599: Data loss in AVX3 Base64 decoding Message-ID: Hi all, This pull request contains a backport of commit [13c11487](https://github.com/openjdk/jdk/commit/13c11487f7126a370d9ce8e62f661ea83eedefe6) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository. The commit being backported was authored by Scott Gibbons on 4 Jan 2024 and was reviewed by Sandhya Viswanathan and Vladimir Kozlov. Thanks! ------------- Commit messages: - Backport 13c11487f7126a370d9ce8e62f661ea83eedefe6 Changes: https://git.openjdk.org/jdk22/pull/28/files Webrev: https://webrevs.openjdk.org/?repo=jdk22&pr=28&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8321599 Stats: 124 lines in 2 files changed: 121 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk22/pull/28.diff Fetch: git fetch https://git.openjdk.org/jdk22.git pull/28/head:pull/28 PR: https://git.openjdk.org/jdk22/pull/28 From epeter at openjdk.org Thu Jan 4 07:00:51 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 4 Jan 2024 07:00:51 GMT Subject: RFR: 8310190: C2 SuperWord: AlignVector is broken, generates misaligned packs [v58] In-Reply-To: References: Message-ID: On Wed, 3 Jan 2024 19:50:49 GMT, Vladimir Kozlov wrote: >> src/hotspot/share/opto/compile.cpp line 3713: >> >>> 3711: // to ObjectAlignmentInBytes. 
Hence, even if multiple arrays are accessed in >>> 3712: // a loop we can expect at least the following alignment: >>> 3713: jlong guaranteed_alignment = MIN2(vector_width, (jlong)ObjectAlignmentInBytes); >> >> This is a more relaxed check than the actual alignment required. As I understand it, this is because it checks only the base address of the array and not the actual memory address that the vector instruction accesses (which is (base,index,offset)). >> It is useful but does not guarantee correct alignment of vector access instructions. >> >> Consider using `lea` instruction on x86 to load memory address into register and check it. > > Maybe hack only `loadV` and `storeV` instructions in .ad file to use `lea` and do the check. I don't understand this comment. The `LoadVector` and `StoreVector` both have a `MemNode::Address` input, which I think is the memory address. The address itself usually consists of `AddP` nodes, which do the (base, index, offset) computation. These nodes later can be folded into the load/store itself, or be computed with a `lea`. I simply take the address value, check it for alignment and pass it on to the load/store.
Take this example:

public class Test {
    static int RANGE = 1024*64;

    public static void main(String[] strArr) {
        int a[] = new int[RANGE];
        test0(a);
    }

    static void test0(int[] a) {
        for (int i = 0; i < RANGE; i++) {
            a[i]++;
        }
    }
}

`./java -XX:CompileCommand=compileonly,Test::test* -XX:+TraceSuperWord -Xcomp -XX:+PrintIdeal -XX:+AlignVector -XX:+VerifyAlignVector -XX:CompileCommand=print,Test::test* Test.java`

This looks like the main loop:

  ;; B22: # out( B22 B23 ) <- in( B21 B22 ) Loop( B22-B22 inner post of N743) Freq: 4.49988
  0x00007f83c8bb2f68: lea 0x10(%rbp,%rbx,4),%r10 ;*iaload {reexecute=0 rethrow=0 return_oop=0}
                                                 ; - Test::test0@12 (line 11)
  0x00007f83c8bb2f6d: mov %r10,%r8
  0x00007f83c8bb2f70: test $0x7,%r8b
  0x00007f83c8bb2f74: je 0x00007f83c8bb2f8a
  0x00007f83c8bb2f76: movabs $0x7f83d77e2fc8,%rdi ; {external_word}
  0x00007f83c8bb2f80: and $0xfffffffffffffff0,%rsp
  0x00007f83c8bb2f84: callq 0x00007f83d71a0162 ; {runtime_call MacroAssembler::debug64(char*, long, long*)}
  0x00007f83c8bb2f89: hlt
  0x00007f83c8bb2f8a: test $0x7,%r10b
  0x00007f83c8bb2f8e: je 0x00007f83c8bb2fa4
  0x00007f83c8bb2f90: movabs $0x7f83d77e2fc8,%rdi ; {external_word}
  0x00007f83c8bb2f9a: and $0xfffffffffffffff0,%rsp
  0x00007f83c8bb2f9e: callq 0x00007f83d71a0162 ; {runtime_call MacroAssembler::debug64(char*, long, long*)}
  0x00007f83c8bb2fa3: hlt
  0x00007f83c8bb2fa4: vpaddd (%r10),%zmm5,%zmm0
  0x00007f83c8bb2faa: vmovdqu32 %zmm0,(%r8) ;*iastore {reexecute=0 rethrow=0 return_oop=0}
                                            ; - Test::test0@15 (line 11)
  0x00007f83c8bb2fb0: add $0x10,%ebx ;*iinc {reexecute=0 rethrow=0 return_oop=0}
                                     ; - Test::test0@16 (line 10)
  0x00007f83c8bb2fb3: cmp %r11d,%ebx
  0x00007f83c8bb2fb6: jl 0x00007f83c8bb2f68 ;*if_icmpge {reexecute=0 rethrow=0 return_oop=0}
                                            ; - Test::test0@6 (line 10)

What I see: `lea` computes address, stores to register `r10`. Move value to `r8`, do alignment check `test $0x7,%r8b`, which checks for 8 byte alignment.
We do the same check again with `r10b`, since we use the same address for load and store. And then we directly load/store with those register values: vpaddd (%r10),%zmm5,%zmm0 vmovdqu32 %zmm0,(%r8) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14785#discussion_r1441398603 From epeter at openjdk.org Thu Jan 4 07:00:48 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 4 Jan 2024 07:00:48 GMT Subject: RFR: 8310190: C2 SuperWord: AlignVector is broken, generates misaligned packs [v59] In-Reply-To: References: Message-ID: > I want to push this in JDK23. > After this fix here, I'm doing [this refactoring](https://github.com/openjdk/jdk/pull/16620). > > To calm your nerves: most of the changes are in auto-generated tests, and tests in general. > > **Context** > > `-XX:+AlignVector` ensures that SuperWord only creates LoadVector and StoreVector that can be memory aligned. This is achieved by iterating in the pre-loop until we reach the alignment boundary, then we can start the main loop properly aligned. However, this is not possible in all cases, sometimes some memory accesses cannot be guaranteed to be aligned, and we need to reject vectorization (at least partially, for some of the packs). > > Alignment is split into two tasks: > - Alignment Correctness Checks: only relevant if `-XX:+AlignVector`. Need to reject vectorization if alignment is not possible. We must check if the address of the vector load/store is aligned with (divisible by) `ObjectAlignmentInBytes`. > - Alignment by adjusting pre-loop limit: alignment is desirable even if `-XX:-AlignVector`. We would like to align the vectors with their vector width. > > **Problem** > > I have recently found a bug with our AlignVector [JDK-8310190](https://bugs.openjdk.org/browse/JDK-8310190). > In that bug, we perform a misaligned memory vector access, which results in a `SIGBUS` on an ARM32 machine. > Thanks @fg1417 for confirming this! 
> Hence, we need to fix the alignment correctness checks. > > While working on this task, I also found some bugs in the "alignment by adjusting pre-loop limit": there were cases where it did not align the vectors correctly. > > **Problem Details** > > Reproducer: > > > static void test(short[] a, short[] b, short mask) { > for (int i = 0; i < RANGE; i+=8) { > // Problematic for AlignVector > b[i+0] = (short)(a[i+0] & mask); // best_memref, align 0 > > b[i+3] = (short)(a[i+3] & mask); // pack at offset 6 bytes > b[i+4] = (short)(a[i+4] & mask); > b[i+5] = (short)(a[i+5] & mask); > b[i+6] = (short)(a[i+6] & mask); > } > } > > > During `SuperWord::find_adjacent_refs` we used to check if the references are expected to be aligned. For that, we look at each "group" of references (eg all `LoadS`) and take the reference with the lowest offset. For that chosen reference, we check if it is alignable. If yes, we accept all references of that group, if no we reject all. > > This is problematic as shown in this example. We have references at index offset `0, 3, 4, 5, 6`, and by... 
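The problem with judging a whole group by its lowest-offset reference can be seen with plain offset arithmetic. The following is a minimal sketch, not SuperWord code: the class `PackAlignmentSketch`, the 16-byte array base offset and the 8-byte alignment requirement are assumptions chosen for illustration. It models the reproducer's `short` accesses with stride `i += 8` and asks whether any number of pre-loop iterations can align both the reference at `i+0` and the pack starting at `i+3`.

```java
/** Offset arithmetic for the reproducer; hypothetical helper, not part of C2. */
final class PackAlignmentSketch {
    static final int SHORT_BYTES = 2;
    static final int STRIDE_ELEMS = 8;       // i += 8 in the reproducer
    static final int ARRAY_BASE_OFFSET = 16; // assumed header size, for illustration only

    /** Byte offset (from an assumed 8-byte aligned object start) after k pre-loop iterations. */
    static long byteOffset(int preLoopIters, int elemOffset) {
        return ARRAY_BASE_OFFSET + (long) (preLoopIters * STRIDE_ELEMS + elemOffset) * SHORT_BYTES;
    }

    /** Same test as `test $0x7, reg` for alignment == 8; alignment must be a power of two. */
    static boolean isAligned(long byteOffset, int alignment) {
        return (byteOffset & (alignment - 1)) == 0;
    }

    /** True if some pre-loop iteration count k in [0, maxIters] aligns both references. */
    static boolean canAlignBoth(int elemOffsetA, int elemOffsetB, int alignment, int maxIters) {
        for (int k = 0; k <= maxIters; k++) {
            if (isAligned(byteOffset(k, elemOffsetA), alignment)
                    && isAligned(byteOffset(k, elemOffsetB), alignment)) {
                return true;
            }
        }
        return false;
    }
}
```

Under these assumptions the stride is 16 bytes, so each reference keeps the same remainder modulo 8 in every iteration. The reference at `i+0` is always aligned, the pack at `i+3` (6 bytes further) never is, and `canAlignBoth(0, 3, 8, 64)` returns false. Accepting the group because its lowest-offset member is alignable would therefore be wrong for the pack at offset 6 bytes.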
Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: some minor changes for Vladimir ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14785/files - new: https://git.openjdk.org/jdk/pull/14785/files/d01a0cd9..aef48ab4 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14785&range=58 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14785&range=57-58 Stats: 11 lines in 3 files changed: 9 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/14785.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14785/head:pull/14785 PR: https://git.openjdk.org/jdk/pull/14785 From chagedorn at openjdk.org Thu Jan 4 07:24:25 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 4 Jan 2024 07:24:25 GMT Subject: [jdk22] RFR: 8321599: Data loss in AVX3 Base64 decoding In-Reply-To: References: Message-ID: On Thu, 4 Jan 2024 06:13:11 GMT, Tobias Hartmann wrote: > Hi all, > > This pull request contains a backport of commit [13c11487](https://github.com/openjdk/jdk/commit/13c11487f7126a370d9ce8e62f661ea83eedefe6) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository. > > The commit being backported was authored by Scott Gibbons on 4 Jan 2024 and was reviewed by Sandhya Viswanathan and Vladimir Kozlov. > > Thanks! Looks good! ------------- Marked as reviewed by chagedorn (Reviewer). 
PR Review: https://git.openjdk.org/jdk22/pull/28#pullrequestreview-1803615430 From thartmann at openjdk.org Thu Jan 4 07:52:21 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 4 Jan 2024 07:52:21 GMT Subject: [jdk22] RFR: 8321599: Data loss in AVX3 Base64 decoding In-Reply-To: References: Message-ID: <32lBHdvCnLiTa1NAYA40iq4Cq8YrZSBhLkkHr8qOgvY=.16f829c4-bc5e-4f76-adc3-2f54441c7a01@github.com> On Thu, 4 Jan 2024 06:13:11 GMT, Tobias Hartmann wrote: > Hi all, > > This pull request contains a backport of commit [13c11487](https://github.com/openjdk/jdk/commit/13c11487f7126a370d9ce8e62f661ea83eedefe6) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository. > > The commit being backported was authored by Scott Gibbons on 4 Jan 2024 and was reviewed by Sandhya Viswanathan and Vladimir Kozlov. > > Thanks! Thanks, Christian! ------------- PR Comment: https://git.openjdk.org/jdk22/pull/28#issuecomment-1876649448 From roland at openjdk.org Thu Jan 4 08:12:22 2024 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 4 Jan 2024 08:12:22 GMT Subject: RFR: 8320649: C2: Optimize scoped values [v4] In-Reply-To: References: Message-ID: > This change implements C2 optimizations for calls to > ScopedValue.get(). Indeed, in: > > > v1 = scopedValue.get(); > ... > v2 = scopedValue.get(); > > > `v2` can be replaced by `v1` and the second call to `get()` can be > optimized out. That's true whatever is between the 2 calls unless a > new mapping for `scopedValue` is created in between (when that happens > no optimizations is performed for the method being compiled). Hoisting > a `get()` call out of loop for a loop invariant `scopedValue` should > also be legal in most cases. > > `ScopedValue.get()` is implemented in java code as a 2 step process. A > cache is attached to the current thread object. If the `ScopedValue` > object is in the cache then the result from `get()` is read from > there. 
Otherwise a slow call is performed that also inserts the > mapping in the cache. The cache itself is lazily allocated. One > `ScopedValue` can be hashed to 2 different indexes in the cache. On a > cache probe, both indexes are checked. As a consequence, the process > of probing the cache is a multi-step process (check if the cache is > present, check first index, check second index if first index > failed). If the cache is populated early on, then when the method that > calls `ScopedValue.get()` is compiled, the profile reports the slow path > as never taken and only the read from the cache is compiled. > > To perform the optimizations, I added 3 new node types to C2: > > - the pair > ScopedValueGetHitsInCacheNode/ScopedValueGetLoadFromCacheNode for > the cache probe > > - a cfg node ScopedValueGetResultNode to help locate the result of the > `get()` call in the IR graph. > > In pseudo code, once the nodes are inserted, the code of a `get()` is: > > > hits_in_the_cache = ScopedValueGetHitsInCache(scopedValue) > if (hits_in_the_cache) { > res = ScopedValueGetLoadFromCache(hits_in_the_cache); > } else { > res = ..; //slow call possibly inlined. Subgraph can be arbitrarily complex > } > res = ScopedValueGetResult(res) > > > In the snippet: > > > v1 = scopedValue.get(); > ... > v2 = scopedValue.get(); > > > Replacing `v2` by `v1` is then done by starting from the > `ScopedValueGetResult` node for the second `get()` and looking for a > dominating `ScopedValueGetResult` for the same `ScopedValue` > object. When one is found, it is used as a replacement. Eliminating > the second `get()` call is achieved by making > `ScopedValueGetHitsInCache` always successful if there's a dominating > `ScopedValueGetResult` and replacing its companion > `ScopedValueGetLoadFromCache` by the dominating > `ScopedValueGetResult`. > > Hoisting a `g...
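The multi-step cache probe described above (cache present, first index, second index, then slow path) can be modelled with a small toy class. This is only a sketch under stated assumptions and not the JDK's `ScopedValue` implementation: the class `TwoProbeCacheSketch`, the slot count, the hash derivation and the `bind` method are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of a lazily allocated cache probed at two indexes; not the JDK implementation. */
final class TwoProbeCacheSketch {
    private static final int SLOTS = 16; // assumed size, must be a power of two

    // Slow-path source of truth, standing in for the real binding lookup.
    private final Map<Object, Object> bindings = new HashMap<>();
    // Lazily allocated; slot s holds its key at [2*s] and its value at [2*s + 1].
    private Object[] cache;
    // Counts slow-path executions so the effect of the cache is observable.
    int slowCalls;

    /** Creates a new mapping; the cache is dropped so stale entries cannot be read. */
    void bind(Object key, Object value) {
        bindings.put(key, value);
        cache = null;
    }

    Object get(Object key) {
        int hash = System.identityHashCode(key);
        int first = hash & (SLOTS - 1);
        int second = (hash >>> 4) & (SLOTS - 1);
        if (cache != null) {                        // step 1: is the cache present?
            if (cache[2 * first] == key) {          // step 2: probe the first index
                return cache[2 * first + 1];
            }
            if (cache[2 * second] == key) {         // step 3: probe the second index
                return cache[2 * second + 1];
            }
        }
        return slowGet(key, first);                 // step 4: slow path, also fills the cache
    }

    private Object slowGet(Object key, int slot) {
        slowCalls++;
        if (cache == null) {
            cache = new Object[2 * SLOTS];
        }
        Object value = bindings.get(key);
        cache[2 * slot] = key;
        cache[2 * slot + 1] = value;
        return value;
    }
}
```

In this model a second `get` for the same key hits the cache and leaves `slowCalls` unchanged, which mirrors the profile observation in the description: once the cache is populated, only the cached read is exercised.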
Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: merge fix ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16966/files - new: https://git.openjdk.org/jdk/pull/16966/files/c7d1fe84..28fa7f74 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16966&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16966&range=02-03 Stats: 2 lines in 1 file changed: 0 ins; 1 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/16966.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16966/head:pull/16966 PR: https://git.openjdk.org/jdk/pull/16966 From roland at openjdk.org Thu Jan 4 08:08:24 2024 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 4 Jan 2024 08:08:24 GMT Subject: RFR: 8319793: C2 compilation fails with "Bad graph detected in build_loop_late" after JDK-8279888 [v8] In-Reply-To: References: Message-ID: On Wed, 3 Jan 2024 14:46:10 GMT, Aleksey Shipilev wrote: > There are a couple of GHA failures, and those are probably resolved in current master. It would be helpful if you can pull from current master and get a clean run. Done. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16886#issuecomment-1876674519 From epeter at openjdk.org Thu Jan 4 08:18:38 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 4 Jan 2024 08:18:38 GMT Subject: RFR: 8310190: C2 SuperWord: AlignVector is broken, generates misaligned packs [v58] In-Reply-To: References: Message-ID: <9zUqzeJrnPyjjEC0_F9Z5OWzHLqnTN_5c1bzOiK-LqA=.050c0992-7f7f-48f3-b2da-ec81aafa41a6@github.com> On Wed, 3 Jan 2024 18:38:16 GMT, Vladimir Kozlov wrote: >> Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 124 commits: >> >> - Merge branch 'JDK-8311586' of https://github.com/eme64/jdk into JDK-8311586 >> - Apply suggestions from code review by Christian >> >> Co-authored-by: Christian Hagedorn >> - fix copyright year 2024 >> - Merge branch 'master' into JDK-8311586 >> - more comments in SuperWord::adjust_pre_loop_limit_to_align_main_loop_vectors >> - comments about modulo positive / negative values >> - Apply suggestions from code review from Christian >> >> Co-authored-by: Christian Hagedorn >> - more small fixes by Christian >> - fix for yesterday's reviews by Christian >> - improve case analysis empty / constrained / trivial >> - ... and 114 more: https://git.openjdk.org/jdk/compare/06dd7353...d01a0cd9 > > src/hotspot/share/opto/compile.cpp line 1059: > >> 1057: >> 1058: if (AllowVectorizeOnDemand) { >> 1059: if (has_method() && _directive->VectorizeOption) { > > This seems no related. Please explain it. This is my justification in the PR description: > Other Details > > I made VectorizeDebugOption a debug print only flag now. Before this fix, it also had the same effect as VectorizeOption (which ensures that only nodes from the same original pre-unrolling node are packed, preventing hand-unrolled code to be vectorized but enabling some edge cases to be vectorized that would not otherwise vectorize). > > I added is_trace_align_vector with bit 128, since 64 was recently used for is_trace_loop_reverse, removed with [JDK-8309204](https://bugs.openjdk.org/browse/JDK-8309204). > > I plan to refactor VectorizeDebugOption soon, as it now has a few subflags / bits that are not used. I may also refactor how TraceSuperWord works in general. Filed [JDK-8317572](https://bugs.openjdk.org/browse/JDK-8317572). 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14785#discussion_r1441455977 From epeter at openjdk.org Thu Jan 4 08:28:36 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 4 Jan 2024 08:28:36 GMT Subject: RFR: 8310190: C2 SuperWord: AlignVector is broken, generates misaligned packs [v58] In-Reply-To: References: Message-ID: On Thu, 4 Jan 2024 06:55:46 GMT, Emanuel Peter wrote: >> May be hack only `loadV` and `storeV` instructions in .ad file to use `lea` and do the check. > > I don't understand this comment. > The `LoadVector` and `StoreVector` have both a `MemNode::Address` input, which I think it the memory address. > The address itself usually consists of `AddP` nodes, which do the (base, index, offset) computation. These nodes later can be folded into the load/store itself, or be computed with a `lea`. > I simply take the address value, check it for alignment and pass it on to the load/store. > > Take this example: > > public class Test { > static int RANGE = 1024*64; > > public static void main(String[] strArr) { > int a[] = new int[RANGE]; > test0(a); > } > > static void test0(int[] a) { > for (int i = 0; i < RANGE; i++) { > a[i]++; > } > } > } > > > `./java -XX:CompileCommand=compileonly,Test::test* -XX:+TraceSuperWord -Xcomp -XX:+PrintIdeal -XX:+AlignVector -XX:+VerifyAlignVector -XX:CompileCommand=print,Test::test* Test.java` > > This looks like the main loop: > > ;; B22: # out( B22 B23 ) <- in( B21 B22 ) Loop( B22-B22 inner post of N743) Freq: 4.49988 > 0x00007f83c8bb2f68: lea 0x10(%rbp,%rbx,4),%r10 ;*iaload {reexecute=0 rethrow=0 return_oop=0} > ; - Test::test0 at 12 (line 11) > 0x00007f83c8bb2f6d: mov %r10,%r8 > 0x00007f83c8bb2f70: test $0x7,%r8b > 0x00007f83c8bb2f74: je 0x00007f83c8bb2f8a > 0x00007f83c8bb2f76: movabs $0x7f83d77e2fc8,%rdi ; {external_word} > 0x00007f83c8bb2f80: and $0xfffffffffffffff0,%rsp > 0x00007f83c8bb2f84: callq 0x00007f83d71a0162 ; {runtime_call MacroAssembler::debug64(char*, long, long*)} > 
0x00007f83c8bb2f89: hlt > 0x00007f83c8bb2f8a: test $0x7,%r10b > 0x00007f83c8bb2f8e: je 0x00007f83c8bb2fa4 > 0x00007f83c8bb2f90: movabs $0x7f83d77e2fc8,%rdi ; {external_word} > 0x00007f83c8bb2f9a: and $0xfffffffffffffff0,%rsp > 0x00007f83c8bb2f9e: callq 0x00007f83d71a0162 ; {runtime_call MacroAssembler::debug64(char*, long, long*)} > 0x00007f83c8bb2fa3: hlt > 0x00007f83c8bb2fa4: vpaddd (%r10),%zmm5,%zmm0 > 0x00007f83c8bb2faa: vmovdqu32 %zmm0,(%r8) ;*iastore {reexecute=0 rethrow=0 return_oop=0} > ; - Test::test0 at 15 (line 11) > 0x00007f83c8bb2fb0: add $0x10,%ebx ;*iinc {reexecute=0 rethrow=0 return_oop=0} > ; - Test::test0 at 16 (line 10) > 0x00007f83c8bb2fb3: cmp %r11d... And without `-XX:-VerifyAlignVector` ;; B20: # out( B20 B21 ) <- in( B19 B20 ) Loop( B20-B20 inner post of N743) Freq: 4.49988 0x00007ff22cbb2924: vpaddd 0x10(%rbx,%r13,4),%zmm0,%zmm1 0x00007ff22cbb292f: vmovdqu32 %zmm1,0x10(%rbx,%r13,4) ;*iastore {reexecute=0 rethrow=0 return_oop=0} ; - Test::test0 at 15 (line 11) 0x00007ff22cbb293a: add $0x10,%r13d ;*iinc {reexecute=0 rethrow=0 return_oop=0} ; - Test::test0 at 16 (line 10) 0x00007ff22cbb293e: cmp %r10d,%r13d 0x00007ff22cbb2941: jl 0x00007ff22cbb2924 ;*if_icmpge {reexecute=0 rethrow=0 return_oop=0} ; - Test::test0 at 6 (line 10) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14785#discussion_r1441463566 From shade at openjdk.org Thu Jan 4 08:51:29 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 4 Jan 2024 08:51:29 GMT Subject: RFR: 8320128: Clean up Parse constructor for OSR [v3] In-Reply-To: References: <0mxhhLR9DDdpiBhbXWPxCmMzo_OI9DYiIX36xGhxC_c=.1eed99b9-2dc3-4403-b3c1-22b6ebed6079@github.com> Message-ID: <2T5HC8Dvq7RUNkNeYMgFWk5niXTiPnE2k4RDlE3BJZs=.1ec18356-631f-4028-af37-f2ad0d8ec05c@github.com> On Wed, 3 Jan 2024 20:04:17 GMT, Xin Liu wrote: >> There's a special case for the constructor of Parse. If current compilation is OSR and it is handling the top-level method(depth() == 1), then >> >> 1. 
_tf = C->tf(); >> 2. _entry_bci = C->entry_bci(); >> 3. _flow = method()->get_osr_flow_analysis(_entry_bci); >> >> We don't need to assign those member data twice. We can also factor out _flow->failing() for the special case and normal cases. >> >> It's worth mentioning that we can't save ciTypeFlow computation because >> get_osr_flow_analysis(_entry_bci) actually needs get_flow_analysis(method()). > > Xin Liu has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: > > - Use atomic logline and resume #ifdef ASSERT. > - Merge branch 'master' into JDK-8320128 > - Update according to reviewer's feedback. > - 8320128: Clean up Parse constructor for OSR src/hotspot/share/opto/parse1.cpp line 511: > 509: if (PrintOpto && (Verbose || WizardMode)) { > 510: if (is_osr_parse()) { > 511: tty->print("OSR @%d type flow bailout: %s", _entry_bci, _flow->failure_reason()); Should be `print_cr`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16669#discussion_r1441483574 From shade at openjdk.org Thu Jan 4 08:47:22 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 4 Jan 2024 08:47:22 GMT Subject: RFR: 8322694: C1: Handle Constant and IfOp in NullCheckEliminator [v2] In-Reply-To: References: <1vvyuwLRjlWItKwCyighjCSM5SNbO4CSEE59hQtCU24=.b4783e52-328a-4ce3-8c92-7b736cea7546@github.com> Message-ID: On Wed, 3 Jan 2024 13:32:32 GMT, Denghui Dong wrote: >> src/hotspot/share/c1/c1_Optimizer.cpp line 888: >> >>> 886: mark_visitable(instr); >>> 887: if (instr->is_pinned() || instr->can_trap() || (instr->as_NullCheck() != nullptr) >>> 888: || (instr->as_Constant() != nullptr && instr->as_Constant()->type()->is_object())) { >> >> Is this just `instr->as_ObjectConstant() != nullptr`? > > Do you mean `instr->type()->as_ObjectConstant() != nullptr`? 
> But we should include other Constants (e.g. `ArrayConstant`, `InstanceConstant`), and those classes don't implement `as_ObjectConstant` Ah, OK then. Yes, I thought ObjectConstant includes ArrayConstant and InstanceConstant. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17191#discussion_r1441479817 From davleopo at openjdk.org Thu Jan 4 08:56:24 2024 From: davleopo at openjdk.org (David Leopoldseder) Date: Thu, 4 Jan 2024 08:56:24 GMT Subject: RFR: 8322636: [JVMCI] HotSpotSpeculationLog can be inconsistent across a single compile [v2] In-Reply-To: References: Message-ID: <-1NI3HcMGiPeVKGOxV7AYi9Zd_hVjO7OEhLOIebDCxc=.d51be9ba-3353-471a-82d5-fdbe6bf74271@github.com> On Tue, 2 Jan 2024 13:37:19 GMT, David Leopoldseder wrote: >> This PR fixes a subtle inconsistency in `HotSpotSpeculationLog` . >> >> Normal uses of `HotSpotSpeculationLog` work by using a `SpeculationReason` and asking the speculation log via `maySpeculate` if the speculation can be performed, i.e., if it failed before for the given method. An example for this can be seen in Graal https://github.com/oracle/graal/blob/master/compiler/src/jdk.graal.compiler/src/jdk/graal/compiler/nodes/loop/CountedLoopInfo.java#L591C15-L591C15 >> The implicit assumption is that the speculation log, `HotSpotSpeculationLog` in particular collects failed speculations at the beginning of a compile and then stays consistent during the compile. Why is that? - Because if there are new failed speculations added to the failed speculations during the compile - the compiler would speculate again on those in an inconsistent way. E.g. at the beginning of a compile a certain speculation has not failed yet and the compiler thinks it can do optimization xyz using a speculation - later during the compilation process it consults the speculation log but gets a different answer. 
All those inconsistent speculations that already failed will anyway later fail code installation in JVMCI (they will throw a bailout during `HotSpotCodeCacheProvider#installCode` https://github.com/openjdk/jdk/blob/master/src/jdk.internal.vm.ci/share/classes/jdk/vm/ci/hotspot/HotSpotSpeculationLog.java#L192 ). Thus, we should at least return a consistent result during a compile. >> The problem for consistency here, which also causes trouble on the Graal side, is that `maySpeculate` itself can collect failed speculations if there have not been any previously, i.e., `failedSpeculations == null`. >> In order to make the speculation log consistent across an entire JVMCI compile this PR removes the collection of failed speculations in `maySpeculate`. > > David Leopoldseder has updated the pull request incrementally with one additional commit since the last revision: > > 8322636: [JVMCI] HotSpotSpeculationLog add javadoc to maySpeculate I did not consider the substrate runtime compilation use case - that may actually lead to the same inconsistency error we have seen here. Probably not relevant now, but if it ever pops up we will need to relax the invariant on the Graal side. Regarding doc changes - what is our final call now? (a) drop all new doc again or (b) keep (whatever) form of the new doc I added? - It's HotSpot-specific, so not wrong. ------------- PR Comment: https://git.openjdk.org/jdk/pull/17183#issuecomment-1876726238 From mli at openjdk.org Thu Jan 4 09:17:28 2024 From: mli at openjdk.org (Hamlin Li) Date: Thu, 4 Jan 2024 09:17:28 GMT Subject: RFR: 8322959: vectorapi: get wrong argument for `limit` in indexPartiallyInUpperRange intrinsic In-Reply-To: References: Message-ID: On Wed, 3 Jan 2024 17:10:07 GMT, Sandhya Viswanathan wrote: >> Hi, >> Can you review this simple fix for indexPartiallyInUpperRange intrinsic? >> Thanks. 
> > src/hotspot/share/opto/vectorIntrinsics.cpp line 3151: > >> 3149: >> 3150: Node* offset = argument(3); >> 3151: Node* limit = argument(4); > > The offset is of long type so will take 2 spots (3 and 4) of argument. So limit will be argument(5). The original code (limit = argument(5)) looks correct to me. @sviswa7 Oh, thanks for correct me, I did not realise this. @vnkozlov I did run the tests `test/jdk/jdk/incubator/vector` and `test/hotspot/jtreg/compiler/vectorapi/` after applying the patch. I thought it's because this intrinsic is not covered yet, but seems I'm wrong. I will close this pr and bug later. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17247#discussion_r1441509274 From mli at openjdk.org Thu Jan 4 09:17:29 2024 From: mli at openjdk.org (Hamlin Li) Date: Thu, 4 Jan 2024 09:17:29 GMT Subject: Withdrawn: 8322959: vectorapi: get wrong argument for `limit` in indexPartiallyInUpperRange intrinsic In-Reply-To: References: Message-ID: On Wed, 3 Jan 2024 16:05:48 GMT, Hamlin Li wrote: > Hi, > Can you review this simple fix for indexPartiallyInUpperRange intrinsic? > Thanks. This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.org/jdk/pull/17247 From thartmann at openjdk.org Thu Jan 4 09:19:21 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 4 Jan 2024 09:19:21 GMT Subject: [jdk22] Integrated: 8321599: Data loss in AVX3 Base64 decoding In-Reply-To: References: Message-ID: On Thu, 4 Jan 2024 06:13:11 GMT, Tobias Hartmann wrote: > Hi all, > > This pull request contains a backport of commit [13c11487](https://github.com/openjdk/jdk/commit/13c11487f7126a370d9ce8e62f661ea83eedefe6) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository. > > The commit being backported was authored by Scott Gibbons on 4 Jan 2024 and was reviewed by Sandhya Viswanathan and Vladimir Kozlov. > > Thanks! This pull request has now been integrated. 
Changeset: b8c88a3e Author: Tobias Hartmann URL: https://git.openjdk.org/jdk22/commit/b8c88a3e9129bd2f976a8c7631d754fed0765324 Stats: 124 lines in 2 files changed: 121 ins; 0 del; 3 mod 8321599: Data loss in AVX3 Base64 decoding Reviewed-by: chagedorn Backport-of: 13c11487f7126a370d9ce8e62f661ea83eedefe6 ------------- PR: https://git.openjdk.org/jdk22/pull/28 From epeter at openjdk.org Thu Jan 4 10:42:37 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 4 Jan 2024 10:42:37 GMT Subject: RFR: 8310190: C2 SuperWord: AlignVector is broken, generates misaligned packs [v58] In-Reply-To: <9zUqzeJrnPyjjEC0_F9Z5OWzHLqnTN_5c1bzOiK-LqA=.050c0992-7f7f-48f3-b2da-ec81aafa41a6@github.com> References: <9zUqzeJrnPyjjEC0_F9Z5OWzHLqnTN_5c1bzOiK-LqA=.050c0992-7f7f-48f3-b2da-ec81aafa41a6@github.com> Message-ID: On Thu, 4 Jan 2024 08:15:58 GMT, Emanuel Peter wrote: >> src/hotspot/share/opto/compile.cpp line 1059: >> >>> 1057: >>> 1058: if (AllowVectorizeOnDemand) { >>> 1059: if (has_method() && _directive->VectorizeOption) { >> >> This seems no related. Please explain it. > > This is my justification in the PR description: > >> Other Details >> >> I made VectorizeDebugOption a debug print only flag now. Before this fix, it also had the same effect as VectorizeOption (which ensures that only nodes from the same original pre-unrolling node are packed, preventing hand-unrolled code to be vectorized but enabling some edge cases to be vectorized that would not otherwise vectorize). >> >> I added is_trace_align_vector with bit 128, since 64 was recently used for is_trace_loop_reverse, removed with [JDK-8309204](https://bugs.openjdk.org/browse/JDK-8309204). >> >> I plan to refactor VectorizeDebugOption soon, as it now has a few subflags / bits that are not used. I may also refactor how TraceSuperWord works in general. Filed [JDK-8317572](https://bugs.openjdk.org/browse/JDK-8317572). If you really want, then I can not touch `VectorizeDebugOption` at all, i.e. 
not activate `is_trace_align_vector` with that flag, but instead simply use `TraceSuperWord` (that might be a little verbose though). I already have the CSR for [JDK-8315361](https://bugs.openjdk.org/browse/JDK-8315361), so that I can remove `VectorizeDebugOption`. This has 2 effects: 1. remove the product effect of `VectorizeDebugOption`, which is the same effect as enabling `VectorizeOption`. 2. introduce a more general auto-vectorization tracing flag that allows more fine-grained control for debug printing. My idea here was to simply add the alignment tracing to `VectorizeDebugOption`. But currently one cannot enable that tracing without having the side-effects that also `VectorizeOption` has. Hence, I already now remove that product-side effect. @vnkozlov what do you think? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14785#discussion_r1441592772 From shade at openjdk.org Thu Jan 4 10:49:22 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 4 Jan 2024 10:49:22 GMT Subject: RFR: 8322976: Remove reference to transform_no_reclaim In-Reply-To: References: Message-ID: On Wed, 3 Jan 2024 22:12:03 GMT, Joshua Cao wrote: > Passes hotspot:tier1 locally Looks good and trivial. ------------- Marked as reviewed by shade (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/17255#pullrequestreview-1803913477 From epeter at openjdk.org Thu Jan 4 12:38:22 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 4 Jan 2024 12:38:22 GMT Subject: RFR: 8322735: C2: minor improvements of bubble sort used in SuperWord::packset_sort In-Reply-To: References: Message-ID: On Mon, 25 Dec 2023 15:34:55 GMT, Denghui Dong wrote: > A minor improvement could be made for bubble sort in SuperWord::packset_sort to reduce the comparison count in bad cases. > > See https://en.wikipedia.org/wiki/Bubble_sort @D-D-H can you explain what this improves? 
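For readers following the bubble sort question above, here is a minimal Java sketch of the textbook optimization described in the linked Wikipedia article: remember where the last swap happened and stop the next pass there. It is only an illustration under that assumption. It is not the code from PR 17190 or from `SuperWord::packset_sort`, and the class and method names are hypothetical.

```java
// Hypothetical illustration; not taken from HotSpot's superword.cpp.
final class BubbleSortSketch {
    // Sorts 'a' ascending in place and returns how many comparisons were made.
    static int sort(int[] a) {
        int comparisons = 0;
        int n = a.length;              // elements at index >= n are already in final position
        while (n > 1) {
            int lastSwap = 0;          // index of the last element moved in this pass
            for (int i = 1; i < n; i++) {
                comparisons++;
                if (a[i - 1] > a[i]) {
                    int tmp = a[i - 1];
                    a[i - 1] = a[i];
                    a[i] = tmp;
                    lastSwap = i;
                }
            }
            // Nothing at or after lastSwap was out of order, so the next pass can stop there.
            // If no swap happened, lastSwap is 0 and the loop ends.
            n = lastSwap;
        }
        return comparisons;
    }

    public static void main(String[] args) {
        int[] reversed = {4, 3, 2, 1};
        int[] sorted = {1, 2, 3, 4};
        System.out.println("reversed input: " + sort(reversed) + " comparisons");
        System.out.println("sorted input:   " + sort(sorted) + " comparisons");
    }
}
```

With this bound, an already sorted input needs a single pass, while a fully reversed input still needs the usual n*(n-1)/2 comparisons.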
------------- PR Comment: https://git.openjdk.org/jdk/pull/17190#issuecomment-1877028706 From thartmann at openjdk.org Thu Jan 4 12:44:36 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 4 Jan 2024 12:44:36 GMT Subject: RFR: 8310844: [AArch64] C1 compilation fails because monitor offset in OSR buffer is too large for immediate Message-ID: <6vNa0ZYwKL0HXjHjWTyIe6kB5hza1aaXlJbxM4qh8-w=.edc55dc9-08ff-4861-9038-d4b150bcf4f7@github.com> [JDK-8287349](https://bugs.openjdk.org/browse/JDK-8287349) changed the code in `LIR_Assembler::osr_entry()` to use a single `ldp` instruction instead of two `ldr` instructions to load the monitor lock and object from the OSR state. This is not correct because the `ldp` instruction only supports a [7-bit signed immediate value](https://developer.arm.com/documentation/den0024/a/The-A64-instruction-set/Memory-access-instructions/Accessing-multiple-memory-locations). If the offset is larger, for example due to a large number of locals as in `TestLargeMonitorOffset::test`, we hit the `Field too big for insn` guarantee. I suggest to revert [JDK-8287349](https://bugs.openjdk.org/browse/JDK-8287349). I also found two unrelated bugs when working on the reproducer: [JDK-8322992](https://bugs.openjdk.org/browse/JDK-8322992) (javac) and [JDK-8322996](https://bugs.openjdk.org/browse/JDK-8322996) (C2). 
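The immediate-range limit mentioned above can be sketched as a small check. For the 64-bit register form of `ldp` with a signed offset, the 7-bit immediate is scaled by 8, so the byte offset must be a multiple of 8 between -512 and 504. The Java below models only that architectural rule; the class and method names are hypothetical and this is not HotSpot's assembler API.

```java
// Hypothetical model of the A64 encoding rule; not HotSpot code.
final class LdpOffsetSketch {
    private static final int SCALE = 8;      // bytes per 64-bit register
    private static final int IMM7_MIN = -64; // smallest signed 7-bit value
    private static final int IMM7_MAX = 63;  // largest signed 7-bit value

    // True if 'byteOffset' can be encoded directly in a 64-bit ldp/stp signed-offset form.
    static boolean fitsLdp64(long byteOffset) {
        return (byteOffset % SCALE) == 0
            && byteOffset >= (long) IMM7_MIN * SCALE
            && byteOffset <= (long) IMM7_MAX * SCALE;
    }

    public static void main(String[] args) {
        for (long off : new long[] {0, 504, 512, -512, 4, 4096}) {
            System.out.println(off + " -> " + fitsLdp64(off));
        }
    }
}
```

An OSR buffer slot that sits beyond 504 bytes, as can happen with many locals, fails this check, which is consistent with the `Field too big for insn` guarantee described above.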
Thanks, Tobias ------------- Commit messages: - 8310844: [AArch64] C1 compilation fails because monitor offset in OSR buffer is too large for immediate Changes: https://git.openjdk.org/jdk/pull/17266/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=17266&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8310844 Stats: 150 lines in 2 files changed: 147 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/17266.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17266/head:pull/17266 PR: https://git.openjdk.org/jdk/pull/17266 From thartmann at openjdk.org Thu Jan 4 13:09:22 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 4 Jan 2024 13:09:22 GMT Subject: RFR: 8322781: C1: Debug build crash in GraphBuilder::vmap() when print stats [v3] In-Reply-To: References: Message-ID: On Wed, 3 Jan 2024 07:50:01 GMT, Denghui Dong wrote: >> Hi, >> >> Could I have a review of this fix patch that fixes a crash problem in the debug build when -XX:+PrintValueNumbering -XX:+Verbose -XX:-UseLocalValueNumbering >> >> Thanks > > Denghui Dong has updated the pull request incrementally with two additional commits since the last revision: > > - update > - update Thanks, looks good to me. ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/17205#pullrequestreview-1804124535 From thartmann at openjdk.org Thu Jan 4 13:07:25 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 4 Jan 2024 13:07:25 GMT Subject: RFR: 8322976: Remove reference to transform_no_reclaim In-Reply-To: References: Message-ID: On Wed, 3 Jan 2024 22:12:03 GMT, Joshua Cao wrote: > Passes hotspot:tier1 locally Looks good. ------------- Marked as reviewed by thartmann (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/17255#pullrequestreview-1804119363 From ddong at openjdk.org Thu Jan 4 13:14:34 2024 From: ddong at openjdk.org (Denghui Dong) Date: Thu, 4 Jan 2024 13:14:34 GMT Subject: RFR: 8322781: C1: Debug build crash in GraphBuilder::vmap() when print stats [v3] In-Reply-To: References: Message-ID: On Wed, 3 Jan 2024 07:50:01 GMT, Denghui Dong wrote: >> Hi, >> >> Could I have a review of this fix patch that fixes a crash problem in the debug build when -XX:+PrintValueNumbering -XX:+Verbose -XX:-UseLocalValueNumbering >> >> Thanks > > Denghui Dong has updated the pull request incrementally with two additional commits since the last revision: > > - update > - update Thanks for the review. ------------- PR Comment: https://git.openjdk.org/jdk/pull/17205#issuecomment-1877073238 From ddong at openjdk.org Thu Jan 4 13:14:35 2024 From: ddong at openjdk.org (Denghui Dong) Date: Thu, 4 Jan 2024 13:14:35 GMT Subject: Integrated: 8322781: C1: Debug build crash in GraphBuilder::vmap() when print stats In-Reply-To: References: Message-ID: On Fri, 29 Dec 2023 15:02:21 GMT, Denghui Dong wrote: > Hi, > > Could I have a review of this fix patch that fixes a crash problem in the debug build when -XX:+PrintValueNumbering -XX:+Verbose -XX:-UseLocalValueNumbering > > Thanks This pull request has now been integrated. Changeset: 27d5f5c2 Author: Denghui Dong URL: https://git.openjdk.org/jdk/commit/27d5f5c237910bc3d2df62367d2e0a83c1132885 Stats: 14 lines in 2 files changed: 13 ins; 0 del; 1 mod 8322781: C1: Debug build crash in GraphBuilder::vmap() when print stats Reviewed-by: kvn, thartmann, shade ------------- PR: https://git.openjdk.org/jdk/pull/17205 From epeter at openjdk.org Thu Jan 4 13:45:26 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 4 Jan 2024 13:45:26 GMT Subject: RFR: 8322768: Optimize non-subword vector compress and expand APIs for AVX2 target. 
[v2] In-Reply-To: References: <_udOCEVG86x9V_WvYqFTaYnvmXdiZ7LzqxzR-D_ygYs=.db3ed37e-a8fa-413b-83c6-a785aba072ff@github.com> Message-ID: On Thu, 4 Jan 2024 05:39:01 GMT, Jatin Bhateja wrote: >> Hi, >> >> Patch optimizes non-subword vector compress and expand APIs for x86 AVX2 only targets. >> Upcoming E-core Xeons (Sierra Forest) and Hybrid CPUs only support AVX2 instruction set. >> These are very frequently used APIs in columnar database filter operation. >> >> Implementation uses a lookup table to record permute indices. Table index is computed using >> mask argument of compress/expand operation. >> >> Following are the performance number of JMH micro included with the patch. >> >> >> System : Intel(R) Xeon(R) Platinum 8480+ (Sapphire Rapids) >> >> Baseline: >> Benchmark (size) Mode Cnt Score Error Units >> ColumnFilterBenchmark.filterDoubleColumn 1024 thrpt 2 142.767 ops/ms >> ColumnFilterBenchmark.filterDoubleColumn 2047 thrpt 2 71.436 ops/ms >> ColumnFilterBenchmark.filterDoubleColumn 4096 thrpt 2 35.992 ops/ms >> ColumnFilterBenchmark.filterFloatColumn 1024 thrpt 2 182.151 ops/ms >> ColumnFilterBenchmark.filterFloatColumn 2047 thrpt 2 91.096 ops/ms >> ColumnFilterBenchmark.filterFloatColumn 4096 thrpt 2 44.757 ops/ms >> ColumnFilterBenchmark.filterIntColumn 1024 thrpt 2 184.099 ops/ms >> ColumnFilterBenchmark.filterIntColumn 2047 thrpt 2 91.981 ops/ms >> ColumnFilterBenchmark.filterIntColumn 4096 thrpt 2 45.170 ops/ms >> ColumnFilterBenchmark.filterLongColumn 1024 thrpt 2 148.017 ops/ms >> ColumnFilterBenchmark.filterLongColumn 2047 thrpt 2 73.516 ops/ms >> ColumnFilterBenchmark.filterLongColumn 4096 thrpt 2 36.844 ops/ms >> >> Withopt: >> Benchmark (size) Mode Cnt Score Error Units >> ColumnFilterBenchmark.filterDoubleColumn 1024 thrpt 2 2051.707 ops/ms >> ColumnFilterBenchmark.filterDoubleColumn 2047 thrpt 2 914.072 ops/ms >> ColumnFilterBenchmark.filterDoubleColumn 4096 thrpt 2 489.898 ops/ms >> ColumnFilterBenchmark.filterFloatColumn 1024 thrpt 2 5324.195 
ops/ms >> ColumnFilterBenchmark.filterFloatColumn 2047 thrpt 2 2587.229 ops/ms >> ColumnFilterBenchmark.filterFloatColumn 4096 thrpt 2 1278.665 ops/ms >> ColumnFilterBenchmark.filterIntColumn 1024 thrpt 2 4149.384 ops/ms >> ColumnFilterBenchmark.filterIntColumn 2047 thrpt ... > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Updating copyright year of modified files. @jatin-bhateja this looks like a great improvement! I have a few questions and requests below. FYI, this feels very inspiring. I'm dreaming of a day where we could do this filtering in the auto-vectorizer directly. src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 5303: > 5301: // Blend the results with zero vector using permute vector as mask, its > 5302: // non-participating lanes holds a -1 value. > 5303: vblendvps(dst, dst, xtmp, permv, vec_enc); would you mind adding a few more comments to explain what happens here? I would also really appreciate more expressive register/variable names. I think you are basically converting the `mask` to a permutation `permv`, by a lookup in the table. Then you permute the `src` and blend it with a -1 vector, so that the unused (high) lanes are -1. xtmp -> min_one rtmp -> table_index rscratch -> table_adr src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 5307: > 5305: assert(bt == T_LONG || bt == T_DOUBLE, ""); > 5306: vmovmskpd(rtmp, mask, vec_enc); > 5307: shlq(rtmp, 5); Might this need to be 6? If I understand right, then you want to have a 64bit stride, hence 2^6, right? If that is correct, then this did not show in your tests, and you need a regression test anyway. 
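The mask-to-permutation lookup that the first comment in this review describes can be sketched in scalar Java. This is an illustration only, assuming 8 integer lanes and a table with one row per mask value; the real stub's table layout, element width and row stride are not shown in this thread, and all names below are hypothetical.

```java
// Hypothetical scalar model of "compress via permutation lookup table"; not the stub's layout.
final class CompressTableSketch {
    static final int LANES = 8;
    // TABLE[mask][k] is the source lane that lands in output lane k, or -1 if lane k is unused.
    static final int[][] TABLE = buildTable();

    static int[][] buildTable() {
        int[][] table = new int[1 << LANES][LANES];
        for (int mask = 0; mask < table.length; mask++) {
            int k = 0;
            for (int lane = 0; lane < LANES; lane++) {
                if ((mask & (1 << lane)) != 0) {
                    table[mask][k++] = lane;   // selected lanes are packed to the low end
                }
            }
            while (k < LANES) {
                table[mask][k++] = -1;         // marks a non-participating lane
            }
        }
        return table;
    }

    // Permute 'src' by the row for 'mask', then "blend" unused lanes with zero.
    static int[] compress(int[] src, int mask) {
        int[] perm = TABLE[mask];
        int[] dst = new int[LANES];
        for (int k = 0; k < LANES; k++) {
            dst[k] = perm[k] < 0 ? 0 : src[perm[k]];
        }
        return dst;
    }

    public static void main(String[] args) {
        int[] src = {10, 11, 12, 13, 14, 15, 16, 17};
        System.out.println(java.util.Arrays.toString(compress(src, 0b10100101)));
    }
}
```

The -1 entries play the role that the review comment attributes to the non-participating lanes of `permv`: they tell the blend step which output lanes must be zeroed.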
src/hotspot/cpu/x86/c2_MacroAssembler_x86.hpp line 488: > 486: KRegister ktmp1, int vec_enc); > 487: > 488: Remove useless empty line. src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 957: > 955: __ align(CodeEntryAlignment); > 956: StubCodeMark mark(this, "StubRoutines", stub_name); > 957: address start = __ pc(); Could you please add some comments here explaining why you are filling the data like this? Presumably, you are emitting 32 bits and 64 bits respectively, right? So the cells have different sizes, correct? test/micro/org/openjdk/bench/jdk/incubator/vector/ColumnFilterBenchmark.java line 76: > 74: longinCol = new long[size]; > 75: longoutCol = new long[size]; > 76: lpivot = size / 2; I'd be interested to see what happens if you move up or down the "density" of elements that you accept. Would simple branching code (relying on branch prediction) be faster if the density is low enough, i.e. if we take almost no elements? Though maybe that is not a compiler problem but a user problem? test/micro/org/openjdk/bench/jdk/incubator/vector/ColumnFilterBenchmark.java line 94: > 92: IntVector vec = IntVector.fromArray(ispecies, intinCol, i); > 93: VectorMask pred = vec.compare(VectorOperators.GT, ipivot); > 94: vec.compress(pred).intoArray(intoutCol, j); Could there be equivalent `expand` tests? ------------- Changes requested by epeter (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/17261#pullrequestreview-1804121213 PR Review Comment: https://git.openjdk.org/jdk/pull/17261#discussion_r1441749005 PR Review Comment: https://git.openjdk.org/jdk/pull/17261#discussion_r1441761312 PR Review Comment: https://git.openjdk.org/jdk/pull/17261#discussion_r1441724949 PR Review Comment: https://git.openjdk.org/jdk/pull/17261#discussion_r1441759984 PR Review Comment: https://git.openjdk.org/jdk/pull/17261#discussion_r1441753158 PR Review Comment: https://git.openjdk.org/jdk/pull/17261#discussion_r1441729256 From epeter at openjdk.org Thu Jan 4 13:45:27 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 4 Jan 2024 13:45:27 GMT Subject: RFR: 8322768: Optimize non-subword vector compress and expand APIs for AVX2 target. [v2] In-Reply-To: References: <_udOCEVG86x9V_WvYqFTaYnvmXdiZ7LzqxzR-D_ygYs=.db3ed37e-a8fa-413b-83c6-a785aba072ff@github.com> Message-ID: On Thu, 4 Jan 2024 13:09:30 GMT, Emanuel Peter wrote: >> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: >> >> Updating copyright year of modified files. > > test/micro/org/openjdk/bench/jdk/incubator/vector/ColumnFilterBenchmark.java line 94: > >> 92: IntVector vec = IntVector.fromArray(ispecies, intinCol, i); >> 93: VectorMask pred = vec.compare(VectorOperators.GT, ipivot); >> 94: vec.compress(pred).intoArray(intoutCol, j); > > Could there be equivalent `expand` tests? And what about some result verification? Or is there another test that does that? 
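One way to approach the verification question above is a scalar reference that a test could compare vectorized output against. The sketch below is an assumption about how such an oracle might look, not code from the PR or from the existing Vector API tests; the class and method names are hypothetical. The `expand` model follows the documented lane semantics: consecutive source lanes, starting at lane 0, are placed into the lanes where the mask is set, and the other lanes are zero.

```java
// Hypothetical scalar oracles for checking compress/expand-style results.
final class ColumnFilterReference {
    // Copies every element of 'in' that is greater than 'pivot' to the front of 'out';
    // returns how many elements were kept.
    static int filterGreaterThan(int[] in, int pivot, int[] out) {
        int j = 0;
        for (int v : in) {
            if (v > pivot) {
                out[j++] = v;
            }
        }
        return j;
    }

    // Scalar model of a single expand: set lanes receive src[0], src[1], ... in order.
    static int[] expand(int[] src, boolean[] mask) {
        int[] dst = new int[mask.length];
        int next = 0;
        for (int lane = 0; lane < mask.length; lane++) {
            if (mask[lane]) {
                dst[lane] = src[next++];
            }
        }
        return dst;
    }

    public static void main(String[] args) {
        int[] out = new int[5];
        int kept = filterGreaterThan(new int[] {5, 1, 9, 3, 7}, 4, out);
        System.out.println(kept + " kept: " + java.util.Arrays.toString(out));
    }
}
```

A regression test could run the vectorized loop and this scalar loop over the same random input and compare both the kept count and the first `kept` output elements.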
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17261#discussion_r1441750595 From aph at openjdk.org Thu Jan 4 14:11:25 2024 From: aph at openjdk.org (Andrew Haley) Date: Thu, 4 Jan 2024 14:11:25 GMT Subject: RFR: 8310844: [AArch64] C1 compilation fails because monitor offset in OSR buffer is too large for immediate In-Reply-To: <6vNa0ZYwKL0HXjHjWTyIe6kB5hza1aaXlJbxM4qh8-w=.edc55dc9-08ff-4861-9038-d4b150bcf4f7@github.com> References: <6vNa0ZYwKL0HXjHjWTyIe6kB5hza1aaXlJbxM4qh8-w=.edc55dc9-08ff-4861-9038-d4b150bcf4f7@github.com> Message-ID: On Thu, 4 Jan 2024 12:39:18 GMT, Tobias Hartmann wrote: > [JDK-8287349](https://bugs.openjdk.org/browse/JDK-8287349) changed the code in `LIR_Assembler::osr_entry()` to use a single `ldp` instruction instead of two `ldr` instructions to load the monitor lock and object from the OSR state. This is not correct because the `ldp` instruction only supports a [7-bit signed immediate value](https://developer.arm.com/documentation/den0024/a/The-A64-instruction-set/Memory-access-instructions/Accessing-multiple-memory-locations). If the offset is larger, for example due to a large number of locals as in `TestLargeMonitorOffset::test`, we hit the `Field too big for insn` guarantee. > > I suggest to revert [JDK-8287349](https://bugs.openjdk.org/browse/JDK-8287349). > > I also found two unrelated bugs when working on the reproducer: [JDK-8322992](https://bugs.openjdk.org/browse/JDK-8322992) (javac) and [JDK-8322996](https://bugs.openjdk.org/browse/JDK-8322996) (C2). > > Thanks, > Tobias src/hotspot/cpu/aarch64/c1_LIRAssembler_aarch64.cpp line 289: > 287: __ ldr(r19, Address(OSR_buf, slot_offset + 1*BytesPerWord)); > 288: __ str(r19, frame_map()->address_for_monitor_object(i)); > 289: } The macro assembler automagically fuses `ldr` pairs. 
It'd be better to fix this with: --- a/src/hotspot/cpu/aarch64/c1_LIRAssembler_aarch64.cpp +++ b/src/hotspot/cpu/aarch64/c1_LIRAssembler_aarch64.cpp @@ -282,7 +282,8 @@ void LIR_Assembler::osr_entry() { __ bind(L); } #endif - __ ldp(r19, r20, Address(OSR_buf, slot_offset)); + __ ldr(r19, Address(OSR_buf, slot_offset)); + __ ldr(r20, Address(OSR_buf, slot_offset + BytesPerWord)); __ str(r19, frame_map()->address_for_monitor_lock(i)); __ str(r20, frame_map()->address_for_monitor_object(i)); } ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17266#discussion_r1441789599 From thartmann at openjdk.org Thu Jan 4 14:20:39 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 4 Jan 2024 14:20:39 GMT Subject: RFR: 8310844: [AArch64] C1 compilation fails because monitor offset in OSR buffer is too large for immediate [v2] In-Reply-To: <6vNa0ZYwKL0HXjHjWTyIe6kB5hza1aaXlJbxM4qh8-w=.edc55dc9-08ff-4861-9038-d4b150bcf4f7@github.com> References: <6vNa0ZYwKL0HXjHjWTyIe6kB5hza1aaXlJbxM4qh8-w=.edc55dc9-08ff-4861-9038-d4b150bcf4f7@github.com> Message-ID: <2Hnw1VDu1Ml2XB-VfUV8l-t8dtrMzPNh3RyNPfV-tgA=.fc400df2-9529-4bdd-8e0c-fb49ca4f0b48@github.com> > [JDK-8287349](https://bugs.openjdk.org/browse/JDK-8287349) changed the code in `LIR_Assembler::osr_entry()` to use a single `ldp` instruction instead of two `ldr` instructions to load the monitor lock and object from the OSR state. This is not correct because the `ldp` instruction only supports a [7-bit signed immediate value](https://developer.arm.com/documentation/den0024/a/The-A64-instruction-set/Memory-access-instructions/Accessing-multiple-memory-locations). If the offset is larger, for example due to a large number of locals as in `TestLargeMonitorOffset::test`, we hit the `Field too big for insn` guarantee. > > I suggest to revert [JDK-8287349](https://bugs.openjdk.org/browse/JDK-8287349). 
> > I also found two unrelated bugs when working on the reproducer: [JDK-8322992](https://bugs.openjdk.org/browse/JDK-8322992) (javac) and [JDK-8322996](https://bugs.openjdk.org/browse/JDK-8322996) (C2). > > Thanks, > Tobias Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision: Adjusted according to review ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17266/files - new: https://git.openjdk.org/jdk/pull/17266/files/de6684fd..f888a56d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17266&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17266&range=00-01 Stats: 4 lines in 1 file changed: 1 ins; 1 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/17266.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17266/head:pull/17266 PR: https://git.openjdk.org/jdk/pull/17266 From thartmann at openjdk.org Thu Jan 4 14:20:41 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 4 Jan 2024 14:20:41 GMT Subject: RFR: 8310844: [AArch64] C1 compilation fails because monitor offset in OSR buffer is too large for immediate [v2] In-Reply-To: References: <6vNa0ZYwKL0HXjHjWTyIe6kB5hza1aaXlJbxM4qh8-w=.edc55dc9-08ff-4861-9038-d4b150bcf4f7@github.com> Message-ID: On Thu, 4 Jan 2024 14:08:51 GMT, Andrew Haley wrote: >> Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision: >> >> Adjusted according to review > > src/hotspot/cpu/aarch64/c1_LIRAssembler_aarch64.cpp line 289: > >> 287: __ ldr(r19, Address(OSR_buf, slot_offset + 1*BytesPerWord)); >> 288: __ str(r19, frame_map()->address_for_monitor_object(i)); >> 289: } > > The macro assembler automagically fuses `ldr` pairs. 
It'd be better to fix this with: > > > --- a/src/hotspot/cpu/aarch64/c1_LIRAssembler_aarch64.cpp > +++ b/src/hotspot/cpu/aarch64/c1_LIRAssembler_aarch64.cpp > @@ -282,7 +282,8 @@ void LIR_Assembler::osr_entry() { > __ bind(L); > } > #endif > - __ ldp(r19, r20, Address(OSR_buf, slot_offset)); > + __ ldr(r19, Address(OSR_buf, slot_offset)); > + __ ldr(r20, Address(OSR_buf, slot_offset + BytesPerWord)); > __ str(r19, frame_map()->address_for_monitor_lock(i)); > __ str(r20, frame_map()->address_for_monitor_object(i)); > } Thanks for the review. I adjusted the fix accordingly. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17266#discussion_r1441798599 From aph at openjdk.org Thu Jan 4 15:36:22 2024 From: aph at openjdk.org (Andrew Haley) Date: Thu, 4 Jan 2024 15:36:22 GMT Subject: RFR: 8310844: [AArch64] C1 compilation fails because monitor offset in OSR buffer is too large for immediate [v2] In-Reply-To: <2Hnw1VDu1Ml2XB-VfUV8l-t8dtrMzPNh3RyNPfV-tgA=.fc400df2-9529-4bdd-8e0c-fb49ca4f0b48@github.com> References: <6vNa0ZYwKL0HXjHjWTyIe6kB5hza1aaXlJbxM4qh8-w=.edc55dc9-08ff-4861-9038-d4b150bcf4f7@github.com> <2Hnw1VDu1Ml2XB-VfUV8l-t8dtrMzPNh3RyNPfV-tgA=.fc400df2-9529-4bdd-8e0c-fb49ca4f0b48@github.com> Message-ID: On Thu, 4 Jan 2024 14:20:39 GMT, Tobias Hartmann wrote: >> [JDK-8287349](https://bugs.openjdk.org/browse/JDK-8287349) changed the code in `LIR_Assembler::osr_entry()` to use a single `ldp` instruction instead of two `ldr` instructions to load the monitor lock and object from the OSR state. This is not correct because the `ldp` instruction only supports a [7-bit signed immediate value](https://developer.arm.com/documentation/den0024/a/The-A64-instruction-set/Memory-access-instructions/Accessing-multiple-memory-locations). If the offset is larger, for example due to a large number of locals as in `TestLargeMonitorOffset::test`, we hit the `Field too big for insn` guarantee. 
>> >> I suggest to revert [JDK-8287349](https://bugs.openjdk.org/browse/JDK-8287349). >> >> I also found two unrelated bugs when working on the reproducer: [JDK-8322992](https://bugs.openjdk.org/browse/JDK-8322992) (javac) and [JDK-8322996](https://bugs.openjdk.org/browse/JDK-8322996) (C2). >> >> Thanks, >> Tobias > > Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision: > > Adjusted according to review Marked as reviewed by aph (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/17266#pullrequestreview-1804415472 From thartmann at openjdk.org Thu Jan 4 15:44:21 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 4 Jan 2024 15:44:21 GMT Subject: RFR: 8310844: [AArch64] C1 compilation fails because monitor offset in OSR buffer is too large for immediate [v2] In-Reply-To: <2Hnw1VDu1Ml2XB-VfUV8l-t8dtrMzPNh3RyNPfV-tgA=.fc400df2-9529-4bdd-8e0c-fb49ca4f0b48@github.com> References: <6vNa0ZYwKL0HXjHjWTyIe6kB5hza1aaXlJbxM4qh8-w=.edc55dc9-08ff-4861-9038-d4b150bcf4f7@github.com> <2Hnw1VDu1Ml2XB-VfUV8l-t8dtrMzPNh3RyNPfV-tgA=.fc400df2-9529-4bdd-8e0c-fb49ca4f0b48@github.com> Message-ID: On Thu, 4 Jan 2024 14:20:39 GMT, Tobias Hartmann wrote: >> [JDK-8287349](https://bugs.openjdk.org/browse/JDK-8287349) changed the code in `LIR_Assembler::osr_entry()` to use a single `ldp` instruction instead of two `ldr` instructions to load the monitor lock and object from the OSR state. This is not correct because the `ldp` instruction only supports a [7-bit signed immediate value](https://developer.arm.com/documentation/den0024/a/The-A64-instruction-set/Memory-access-instructions/Accessing-multiple-memory-locations). If the offset is larger, for example due to a large number of locals as in `TestLargeMonitorOffset::test`, we hit the `Field too big for insn` guarantee. >> >> I suggest to revert [JDK-8287349](https://bugs.openjdk.org/browse/JDK-8287349). 
>> >> I also found two unrelated bugs when working on the reproducer: [JDK-8322992](https://bugs.openjdk.org/browse/JDK-8322992) (javac) and [JDK-8322996](https://bugs.openjdk.org/browse/JDK-8322996) (C2). >> >> Thanks, >> Tobias > > Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision: > > Adjusted according to review Thanks for the review, Andrew. ------------- PR Comment: https://git.openjdk.org/jdk/pull/17266#issuecomment-1877311832 From aph at openjdk.org Thu Jan 4 15:51:21 2024 From: aph at openjdk.org (Andrew Haley) Date: Thu, 4 Jan 2024 15:51:21 GMT Subject: RFR: 8310844: [AArch64] C1 compilation fails because monitor offset in OSR buffer is too large for immediate [v2] In-Reply-To: <2Hnw1VDu1Ml2XB-VfUV8l-t8dtrMzPNh3RyNPfV-tgA=.fc400df2-9529-4bdd-8e0c-fb49ca4f0b48@github.com> References: <6vNa0ZYwKL0HXjHjWTyIe6kB5hza1aaXlJbxM4qh8-w=.edc55dc9-08ff-4861-9038-d4b150bcf4f7@github.com> <2Hnw1VDu1Ml2XB-VfUV8l-t8dtrMzPNh3RyNPfV-tgA=.fc400df2-9529-4bdd-8e0c-fb49ca4f0b48@github.com> Message-ID: <1bw-NMQZIKilTwHkcwrOxVeSIYYPI2WEHiCIRnYvFEc=.3da3a815-2fad-4619-ac08-399324ca7e63@github.com> On Thu, 4 Jan 2024 14:20:39 GMT, Tobias Hartmann wrote: >> [JDK-8287349](https://bugs.openjdk.org/browse/JDK-8287349) changed the code in `LIR_Assembler::osr_entry()` to use a single `ldp` instruction instead of two `ldr` instructions to load the monitor lock and object from the OSR state. This is not correct because the `ldp` instruction only supports a [7-bit signed immediate value](https://developer.arm.com/documentation/den0024/a/The-A64-instruction-set/Memory-access-instructions/Accessing-multiple-memory-locations). If the offset is larger, for example due to a large number of locals as in `TestLargeMonitorOffset::test`, we hit the `Field too big for insn` guarantee. >> >> I suggest to revert [JDK-8287349](https://bugs.openjdk.org/browse/JDK-8287349). 
>> >> I also found two unrelated bugs when working on the reproducer: [JDK-8322992](https://bugs.openjdk.org/browse/JDK-8322992) (javac) and [JDK-8322996](https://bugs.openjdk.org/browse/JDK-8322996) (C2). >> >> Thanks, >> Tobias > > Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision: > > Adjusted according to review I looked through the history and I see this bug is my fault, and your fix will have to be back ported to all releases. Argh! Thanks for fixing it. ------------- PR Comment: https://git.openjdk.org/jdk/pull/17266#issuecomment-1877322652 From ddong at openjdk.org Thu Jan 4 15:54:22 2024 From: ddong at openjdk.org (Denghui Dong) Date: Thu, 4 Jan 2024 15:54:22 GMT Subject: RFR: 8322735: C2: minor improvements of bubble sort used in SuperWord::packset_sort In-Reply-To: References: Message-ID: On Thu, 4 Jan 2024 12:36:01 GMT, Emanuel Peter wrote: > And a more fundamental question: Why do we need this improvement? Do you see any timing bottleneck and improvement? And what is faster: bubbling up or down? > > And do you know why we sort at all in `extend_packlist` and why we do it again and again? Sorry, I don't know the theory or implementation of `superword`. (I hope to grasp it someday...) I just found it when browsing the code. This change is trivial; if you think it's unnecessary, I'm fine with closing it. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/17190#issuecomment-1877328019 From thartmann at openjdk.org Thu Jan 4 15:56:21 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 4 Jan 2024 15:56:21 GMT Subject: RFR: 8310844: [AArch64] C1 compilation fails because monitor offset in OSR buffer is too large for immediate [v2] In-Reply-To: <2Hnw1VDu1Ml2XB-VfUV8l-t8dtrMzPNh3RyNPfV-tgA=.fc400df2-9529-4bdd-8e0c-fb49ca4f0b48@github.com> References: <6vNa0ZYwKL0HXjHjWTyIe6kB5hza1aaXlJbxM4qh8-w=.edc55dc9-08ff-4861-9038-d4b150bcf4f7@github.com> <2Hnw1VDu1Ml2XB-VfUV8l-t8dtrMzPNh3RyNPfV-tgA=.fc400df2-9529-4bdd-8e0c-fb49ca4f0b48@github.com> Message-ID: On Thu, 4 Jan 2024 14:20:39 GMT, Tobias Hartmann wrote: >> [JDK-8287349](https://bugs.openjdk.org/browse/JDK-8287349) changed the code in `LIR_Assembler::osr_entry()` to use a single `ldp` instruction instead of two `ldr` instructions to load the monitor lock and object from the OSR state. This is not correct because the `ldp` instruction only supports a [7-bit signed immediate value](https://developer.arm.com/documentation/den0024/a/The-A64-instruction-set/Memory-access-instructions/Accessing-multiple-memory-locations). If the offset is larger, for example due to a large number of locals as in `TestLargeMonitorOffset::test`, we hit the `Field too big for insn` guarantee. >> >> I suggest to revert [JDK-8287349](https://bugs.openjdk.org/browse/JDK-8287349). >> >> I also found two unrelated bugs when working on the reproducer: [JDK-8322992](https://bugs.openjdk.org/browse/JDK-8322992) (javac) and [JDK-8322996](https://bugs.openjdk.org/browse/JDK-8322996) (C2). >> >> Thanks, >> Tobias > > Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision: > > Adjusted according to review But, as I mentioned in the description, it's a regression from [JDK-8287349](https://bugs.openjdk.org/browse/JDK-8287349), right? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/17266#issuecomment-1877332118 From aph at openjdk.org Thu Jan 4 16:08:22 2024 From: aph at openjdk.org (Andrew Haley) Date: Thu, 4 Jan 2024 16:08:22 GMT Subject: RFR: 8310844: [AArch64] C1 compilation fails because monitor offset in OSR buffer is too large for immediate [v2] In-Reply-To: References: <6vNa0ZYwKL0HXjHjWTyIe6kB5hza1aaXlJbxM4qh8-w=.edc55dc9-08ff-4861-9038-d4b150bcf4f7@github.com> <2Hnw1VDu1Ml2XB-VfUV8l-t8dtrMzPNh3RyNPfV-tgA=.fc400df2-9529-4bdd-8e0c-fb49ca4f0b48@github.com> Message-ID: On Thu, 4 Jan 2024 15:53:55 GMT, Tobias Hartmann wrote: > But, as I mentioned in the description, it's a regression from [JDK-8287349](https://bugs.openjdk.org/browse/JDK-8287349), right? Yeah, that's true. A "trivial performance fix," as was said at the time. Memo to myself: there are no trivial performance fixes. ------------- PR Comment: https://git.openjdk.org/jdk/pull/17266#issuecomment-1877354409 From adinn at openjdk.org Thu Jan 4 16:17:21 2024 From: adinn at openjdk.org (Andrew Dinn) Date: Thu, 4 Jan 2024 16:17:21 GMT Subject: RFR: 8310844: [AArch64] C1 compilation fails because monitor offset in OSR buffer is too large for immediate [v2] In-Reply-To: References: <6vNa0ZYwKL0HXjHjWTyIe6kB5hza1aaXlJbxM4qh8-w=.edc55dc9-08ff-4861-9038-d4b150bcf4f7@github.com> Message-ID: On Thu, 4 Jan 2024 14:17:25 GMT, Tobias Hartmann wrote: >> src/hotspot/cpu/aarch64/c1_LIRAssembler_aarch64.cpp line 289: >> >>> 287: __ ldr(r19, Address(OSR_buf, slot_offset + 1*BytesPerWord)); >>> 288: __ str(r19, frame_map()->address_for_monitor_object(i)); >>> 289: } >> >> The macro assembler automagically fuses `ldr` pairs. 
It'd be better to fix this with: >> >> >> --- a/src/hotspot/cpu/aarch64/c1_LIRAssembler_aarch64.cpp >> +++ b/src/hotspot/cpu/aarch64/c1_LIRAssembler_aarch64.cpp >> @@ -282,7 +282,8 @@ void LIR_Assembler::osr_entry() { >> __ bind(L); >> } >> #endif >> - __ ldp(r19, r20, Address(OSR_buf, slot_offset)); >> + __ ldr(r19, Address(OSR_buf, slot_offset)); >> + __ ldr(r20, Address(OSR_buf, slot_offset + BytesPerWord)); >> __ str(r19, frame_map()->address_for_monitor_lock(i)); >> __ str(r20, frame_map()->address_for_monitor_object(i)); >> } > > Thanks for the review. I adjusted the fix accordingly. I'm not sure why the recommended adjustment is needed. The macro assembler does fuse pairs of adjacent ldr instructions into an ldp but only when the sizes match and the offsets fit into the requisite number of bits. So, if the two ldr instructions are generated next to each other the macro assembler should only convert to ldp *where appropriate*. Am I missing something here? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17266#discussion_r1441959403 From adinn at openjdk.org Thu Jan 4 16:24:22 2024 From: adinn at openjdk.org (Andrew Dinn) Date: Thu, 4 Jan 2024 16:24:22 GMT Subject: RFR: 8310844: [AArch64] C1 compilation fails because monitor offset in OSR buffer is too large for immediate [v2] In-Reply-To: References: <6vNa0ZYwKL0HXjHjWTyIe6kB5hza1aaXlJbxM4qh8-w=.edc55dc9-08ff-4861-9038-d4b150bcf4f7@github.com> Message-ID: On Thu, 4 Jan 2024 16:14:41 GMT, Andrew Dinn wrote: >> Thanks for the review. I adjusted the fix accordingly. > > I'm not sure why the recommended adjustment is needed. The macro assembler does fuse pairs of adjacent ldr instructions into an ldp but only when the sizes match and the offsets fit into the requisite number of bits. > > So, if the two ldr instructions are generated next to each other the macro assembler should only convert to ldp *where appropriate*. Am I missing something here? Doh, sorry - I misread Andrew's proposed code!
Ignore the noise. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17266#discussion_r1441978216 From aph at openjdk.org Thu Jan 4 16:24:25 2024 From: aph at openjdk.org (Andrew Haley) Date: Thu, 4 Jan 2024 16:24:25 GMT Subject: RFR: 8310844: [AArch64] C1 compilation fails because monitor offset in OSR buffer is too large for immediate [v2] In-Reply-To: References: <6vNa0ZYwKL0HXjHjWTyIe6kB5hza1aaXlJbxM4qh8-w=.edc55dc9-08ff-4861-9038-d4b150bcf4f7@github.com> Message-ID: On Thu, 4 Jan 2024 14:17:25 GMT, Tobias Hartmann wrote: >> src/hotspot/cpu/aarch64/c1_LIRAssembler_aarch64.cpp line 289: >> >>> 287: __ ldr(r19, Address(OSR_buf, slot_offset + 1*BytesPerWord)); >>> 288: __ str(r19, frame_map()->address_for_monitor_object(i)); >>> 289: } >> >> The macro assembler automagically fuses `ldr` pairs. It'd be better to fix this with: >> >> >> --- a/src/hotspot/cpu/aarch64/c1_LIRAssembler_aarch64.cpp >> +++ b/src/hotspot/cpu/aarch64/c1_LIRAssembler_aarch64.cpp >> @@ -282,7 +282,8 @@ void LIR_Assembler::osr_entry() { >> __ bind(L); >> } >> #endif >> - __ ldp(r19, r20, Address(OSR_buf, slot_offset)); >> + __ ldr(r19, Address(OSR_buf, slot_offset)); >> + __ ldr(r20, Address(OSR_buf, slot_offset + BytesPerWord)); >> __ str(r19, frame_map()->address_for_monitor_lock(i)); >> __ str(r20, frame_map()->address_for_monitor_object(i)); >> } > > Thanks for the review. I adjusted the fix accordingly. Yes, the problem @TobiHartmann is fixing is that we currently use `ldp`, but in very rare cases`ldp` can't reach, so the fix we need is to change one `ldp` to two `ldr`s. In almost all cases, macroassembler will merge the `ldr`s. 
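[Editorial sketch, not part of the original message.] The "can't reach" case can be illustrated with the offset ranges of the two A64 encodings involved. The snippet below is a minimal sketch under stated assumptions: it only models the 64-bit register forms (a signed 7-bit immediate scaled by 8 for `ldp`, an unsigned 12-bit immediate scaled by 8 for `ldr`), and the class and method names (`A64OffsetRanges`, `fitsLdp`, `fitsLdr`) are hypothetical. It does not reproduce what HotSpot's assembler or macro assembler actually does.

```java
// Illustrative only: offset ranges of the A64 64-bit LDP and LDR (unsigned
// offset) encodings. Names are hypothetical; this is not HotSpot code.
final class A64OffsetRanges {
    static final int WORD = 8; // bytes per 64-bit register

    // LDP Xt1, Xt2, [Xn, #imm]: signed 7-bit immediate, scaled by 8.
    static boolean fitsLdp(long offset) {
        return offset % WORD == 0 && offset >= -64 * WORD && offset <= 63 * WORD;
    }

    // LDR Xt, [Xn, #imm]: unsigned 12-bit immediate, scaled by 8.
    static boolean fitsLdr(long offset) {
        return offset % WORD == 0 && offset >= 0 && offset <= 4095 * WORD;
    }

    public static void main(String[] args) {
        long[] offsets = {0, 504, 512, 4096, 32760};
        for (long off : offsets) {
            System.out.println(off + ": ldp=" + fitsLdp(off) + " ldr=" + fitsLdr(off));
        }
    }
}
```

Under these assumptions, an offset such as 512 is already out of range for a single `ldp` while still being encodable as two separate `ldr` loads, which is the situation the fix addresses.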
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17266#discussion_r1441983810 From epeter at openjdk.org Thu Jan 4 16:25:36 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 4 Jan 2024 16:25:36 GMT Subject: RFR: 8319793: C2 compilation fails with "Bad graph detected in build_loop_late" after JDK-8279888 [v9] In-Reply-To: References: Message-ID: On Wed, 3 Jan 2024 15:53:04 GMT, Roland Westrelin wrote: >> Range check smearing and range check predication make an array access >> dependent on 2 (or more in the case of RC smearing) conditions. As a >> consequence, if a range check can be eliminated because there's an >> identical dominating range check, the control dependent nodes that >> could float and become dependent on the dominating range check cannot >> be allowed to float because there's a risk that they would then bypass >> one of the checks that make the access legal. >> >> `IfNode::dominated_by()` and `PhaseIdealLoop::dominated_by()` have >> logic to prevent this: nodes that are control dependent on a range >> check or predicate are not allowed to float. This is however not >> sufficient as demonstrated by the test cases. >> >> In `TestArrayAccessAboveRCAfterSmearingOrPredication.testRangeCheckSmearing()`: >> >> >> v += array[i]; >> if (flag2) { >> if (flag3) { >> field = 0x42; >> } >> } >> if (flagField == 1) { >> v += array[i]; >> } >> >> >> The range check for the second `array[i]` load is replaced by the >> dominating range check for the first `array[i]` but because the second >> `array[i]` load could really be dependent on multiple range checks (in >> case smearing happened which is not the case here), c2 doesn't allow >> the second `array[i]` to float when the second range check is >> removed. The second `array[i]` is then control dependent on: >> >> >> if (flagField == 1) { >> >> >> which is next found to be dominated by the same test: >> >> >> if (flag == 1) { >> >> >> and is removed. 
However nothing in `dominated_by()` treats node >> dependent on tests that are not range check or predicates >> specially. So the second `array[i]` is allowed to float and become >> dependent on: >> >> >> if (flag == 1) { >> >> >> which is above the range check for that access. The test method in its >> last invocation is passed an index for the array access that's widely >> out of range. The array load happens before the range check and >> crashes the VM. `testLoopPredication()` is a similar test where array >> loads become dependent on predicates and end up above range checks. >> >> `TestArrayAccessCastIIAboveRC.java` is the test case from the bug >> where for similar reasons a range check `CastII` ends up above its >> range check, becomes top because its input becomes some integer that >> conflicts with its... > > Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 11 commits: > > - Merge branch 'master' into JDK-8319793 > - review > - Revert "Update src/hotspot/share/opto/castnode.hpp" > > This reverts commit 356c91cca911ed486f9f87f3eff53ce21e1e3ec9. > - Revert "Update src/hotspot/share/opto/memnode.hpp" > > This reverts commit bdb731ea562f314f44d327f7243ef5cf9ad40b2e. > - review > - Update src/hotspot/share/opto/memnode.hpp > > Co-authored-by: Christian Hagedorn > - Update src/hotspot/share/opto/castnode.hpp > > Co-authored-by: Christian Hagedorn > - Update src/hotspot/share/opto/castnode.cpp > > Co-authored-by: Christian Hagedorn > - Update src/hotspot/share/opto/ifnode.cpp > > Co-authored-by: Christian Hagedorn > - Merge branch 'master' into JDK-8319793 > - ... and 1 more: https://git.openjdk.org/jdk/compare/15519285...dbe3c4c1 @rwestrel thanks for all the work! Generally I'm very happy with the approach. I mostly left suggestions for better comments and improved naming. 
src/hotspot/share/opto/ifnode.cpp line 573: > 571: // that these Loads/Casts do not float above any of the dominating checks (even when the lowest dominating check is > 572: // later replaced by yet another dominating check), we need to pin them at the lowest dominating check. > 573: proj->pin_array_loads(igvn); `pin_array_loads` suggests we only care about `Load`. But the comment suggests otherwise. I would also appreciate if the comment said why there are now multiple dependencies. Actually, the problem is that we **would** have multiple dependency, but we only have one dependency input we can set, hence forgetting about the others. Pinning makes sure that there is no bypassing of dependencies, right? src/hotspot/share/opto/ifnode.cpp line 1501: > 1499: > 1500: //------------------------------dominated_by----------------------------------- > 1501: Node* IfNode::dominated_by(Node* prev_dom, PhaseIterGVN* igvn, bool range_check_smearing) { I suggest that you replace `range_check_smearing` with `pin_dependencies` or similar. Basically say what it will do in this method, rather than what is the use case. Then add a comment above what is a usecase, and more comments in the case where you call it with `true`. Because the range-check-smearing is not happening here but outside. src/hotspot/share/opto/ifnode.cpp line 1517: > 1515: prev_dom = idom; > 1516: } > 1517: Can you say what exactly this did, and why it is safe to remove? src/hotspot/share/opto/ifnode.cpp line 1541: > 1539: // control dependent nodes end up at the lowest/nearest dominating check in the graph. To ensure that these > 1540: // Loads/Casts do not float above any of the dominating checks (even when the lowest dominating check is later > 1541: // replaced by yet another dominating check), we need to pin them at the lowest dominating check. I like this comment. A picture would be a really nice addition. RC[0] -> true ... RC[6] -> false RC[0] -> true ... 
RC[3] -> false ctrl dependent node x, assuming array[3] is safe. x is first dependent on RC[3], which is now smeared to RC[0] (the lower one) and RC[6]. Now we discover that the lower RC[0] is dominated by the upper one, and skip RC[6]. Now x is only dependent on RC[6], which is true, and does not first check RC[6], which it should check. I suggest you move all of this to where the range-check-smearing happens. src/hotspot/share/opto/ifnode.cpp line 1805: > 1803: --i; > 1804: } > 1805: } This logic looks like it would not just pin array loads, but really any node that has `depends_only_on_test`. That could also be `CastII` or even other nodes like `LoadKlass`, right? If that is true, you should rename this method to something more precise. src/hotspot/share/opto/ifnode.cpp line 1958: > 1956: return nullptr; > 1957: } > 1958: Why is it ok to delay this to post-loop-opts? Does it not prevent moving some CFG from being eliminated out of loops? Would be nice to have a little justification comment. src/hotspot/share/opto/loopopts.cpp line 308: > 306: // IGVN worklist for later cleanup. Move control-dependent data Nodes on the > 307: // live path up to the dominating control. > 308: void PhaseIdealLoop::dominated_by(IfProjNode* prevdom, IfNode* iff, bool flip, bool range_check_predicate) { Can we also rename `range_check_predicate` -> `must_pin_dependencies`, so that it says what it does? And then add a comment to say that it is on when we are doing range check predication, and hence the eliminated RC lay between two predicates, and hence has a dependency on both. src/hotspot/share/opto/loopopts.cpp line 356: > 354: _igvn.replace_input_of(cd, 0, prevdom); > 355: if (range_check_predicate) { > 356: // Loads and range check Cast nodes that are control dependent on this range check (that is about to be removed) Here we should now be talking about range check predicates, and not just range checks, right? 
src/hotspot/share/opto/loopopts.cpp line 361: > 359: return; // Let IGVN transformation change control dependence. > 360: } > 361: Why it ok to remove this bailout? src/hotspot/share/opto/memnode.cpp line 851: > 849: return !Type::cmp( _type, ((LoadNode&)n)._type ) && > 850: _control_dependency == ((LoadNode&)n)._control_dependency && > 851: _mo == ((LoadNode&)n)._mo; might look nicer if you cast `n` once -> `load` and then use that. src/hotspot/share/opto/node.hpp line 1140: > 1138: virtual Node* pin_for_array_access() const { > 1139: return nullptr; > 1140: } Can you please add a comment, what this method is for? ------------- Changes requested by epeter (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/16886#pullrequestreview-1804393994 PR Review Comment: https://git.openjdk.org/jdk/pull/16886#discussion_r1441946602 PR Review Comment: https://git.openjdk.org/jdk/pull/16886#discussion_r1441920140 PR Review Comment: https://git.openjdk.org/jdk/pull/16886#discussion_r1441949862 PR Review Comment: https://git.openjdk.org/jdk/pull/16886#discussion_r1441930213 PR Review Comment: https://git.openjdk.org/jdk/pull/16886#discussion_r1441935806 PR Review Comment: https://git.openjdk.org/jdk/pull/16886#discussion_r1441953152 PR Review Comment: https://git.openjdk.org/jdk/pull/16886#discussion_r1441963806 PR Review Comment: https://git.openjdk.org/jdk/pull/16886#discussion_r1441959206 PR Review Comment: https://git.openjdk.org/jdk/pull/16886#discussion_r1441972024 PR Review Comment: https://git.openjdk.org/jdk/pull/16886#discussion_r1441979826 PR Review Comment: https://git.openjdk.org/jdk/pull/16886#discussion_r1441895016 From epeter at openjdk.org Thu Jan 4 16:25:37 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 4 Jan 2024 16:25:37 GMT Subject: RFR: 8319793: C2 compilation fails with "Bad graph detected in build_loop_late" after JDK-8279888 [v9] In-Reply-To: References: Message-ID: On Thu, 4 Jan 2024 16:09:05 GMT, Emanuel Peter wrote: >> Roland Westrelin 
has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 11 commits: >> >> - Merge branch 'master' into JDK-8319793 >> - review >> - Revert "Update src/hotspot/share/opto/castnode.hpp" >> >> This reverts commit 356c91cca911ed486f9f87f3eff53ce21e1e3ec9. >> - Revert "Update src/hotspot/share/opto/memnode.hpp" >> >> This reverts commit bdb731ea562f314f44d327f7243ef5cf9ad40b2e. >> - review >> - Update src/hotspot/share/opto/memnode.hpp >> >> Co-authored-by: Christian Hagedorn >> - Update src/hotspot/share/opto/castnode.hpp >> >> Co-authored-by: Christian Hagedorn >> - Update src/hotspot/share/opto/castnode.cpp >> >> Co-authored-by: Christian Hagedorn >> - Update src/hotspot/share/opto/ifnode.cpp >> >> Co-authored-by: Christian Hagedorn >> - Merge branch 'master' into JDK-8319793 >> - ... and 1 more: https://git.openjdk.org/jdk/compare/15519285...dbe3c4c1 > > src/hotspot/share/opto/ifnode.cpp line 1958: > >> 1956: return nullptr; >> 1957: } >> 1958: > > Why is it ok to delay this to post-loop-opts? Does it not prevent moving some CFG from being eliminated out of loops? Would be nice to have a little justification comment. Ah. Does this mean that if there are multiple RangeCheck in a loop, where some could be smeared, these are not smeared, and then we have more RangeChecks to eliminate out of the loop? Maybe in the end this all comes down to the same anyway. Just wondering. > src/hotspot/share/opto/node.hpp line 1140: > >> 1138: virtual Node* pin_for_array_access() const { >> 1139: return nullptr; >> 1140: } > > Can you please add a comment, what this method is for? Effectively, you want to replace some nodes, such as `Load` and `CastII` into pinned nodes, which have `StrongDependency` or `UnknownControl`. In either case, this means that we will not allow these to float any more. Generally, I'm not really happy with the name of `UnknownControl`. Sounds like the control is unknown. 
In what sense is it unknown, after all we have a control and want the Load to be pinned to it...? Maybe then we could rename `pin_for_array_access` -> `make_pinned`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16886#discussion_r1441954551 PR Review Comment: https://git.openjdk.org/jdk/pull/16886#discussion_r1441907080 From kvn at openjdk.org Thu Jan 4 16:39:38 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 4 Jan 2024 16:39:38 GMT Subject: RFR: 8310190: C2 SuperWord: AlignVector is broken, generates misaligned packs [v58] In-Reply-To: References: <9zUqzeJrnPyjjEC0_F9Z5OWzHLqnTN_5c1bzOiK-LqA=.050c0992-7f7f-48f3-b2da-ec81aafa41a6@github.com> Message-ID: <1oTqtU5lm0As9tKfnWuGNh2sHXfQimLdMCzV2g1D2ho=.6a5a20f7-3ab3-43fb-b640-fa043131fef8@github.com> On Thu, 4 Jan 2024 10:39:11 GMT, Emanuel Peter wrote: >> This is my justification in the PR description: >> >>> Other Details >>> >>> I made VectorizeDebugOption a debug print only flag now. Before this fix, it also had the same effect as VectorizeOption (which ensures that only nodes from the same original pre-unrolling node are packed, preventing hand-unrolled code to be vectorized but enabling some edge cases to be vectorized that would not otherwise vectorize). >>> >>> I added is_trace_align_vector with bit 128, since 64 was recently used for is_trace_loop_reverse, removed with [JDK-8309204](https://bugs.openjdk.org/browse/JDK-8309204). >>> >>> I plan to refactor VectorizeDebugOption soon, as it now has a few subflags / bits that are not used. I may also refactor how TraceSuperWord works in general. Filed [JDK-8317572](https://bugs.openjdk.org/browse/JDK-8317572). > > If you really want, then I can not touch `VectorizeDebugOption` at all, i.e. not activate `is_trace_align_vector` with that flag, but instead simply use `TraceSuperWord` (that might be a little verbose though). 
> > I already have the CSR for [JDK-8315361](https://bugs.openjdk.org/browse/JDK-8315361), so that I can remove `VectorizeDebugOption`. This has 2 effects: > 1. remove the product effect of `VectorizeDebugOption`, which is the same effect as enabling `VectorizeOption`. > 2. introduce a more general auto-vectorization tracing flag that allows more fine-grained control for debug printing. > > My idea here was to simply add the alignment tracing to `VectorizeDebugOption`. But currently one cannot enable that tracing without having the side-effects that also `VectorizeOption` has. Hence, I already now remove that product-side effect. > > @vnkozlov what do you think? I missed that in your long description ;^) I agree with your suggestion. The option was indeed strange: mixing prints with effects on code. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14785#discussion_r1442000827 From kvn at openjdk.org Thu Jan 4 16:47:35 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 4 Jan 2024 16:47:35 GMT Subject: RFR: 8310190: C2 SuperWord: AlignVector is broken, generates misaligned packs [v58] In-Reply-To: References: Message-ID: <9qbGMjWjQfwA5r6XimWz_GZUdQEgo6GEzY-uYwIoP4s=.9c68a4f4-a82b-4760-a428-3b5025eda61b@github.com> On Thu, 4 Jan 2024 08:25:25 GMT, Emanuel Peter wrote: >> I don't understand this comment. >> The `LoadVector` and `StoreVector` have both a `MemNode::Address` input, which I think is the memory address. >> The address itself usually consists of `AddP` nodes, which do the (base, index, offset) computation. These nodes later can be folded into the load/store itself, or be computed with a `lea`. >> I simply take the address value, check it for alignment and pass it on to the load/store.
>> >> Take this example: >> >> public class Test { >> static int RANGE = 1024*64; >> >> public static void main(String[] strArr) { >> int a[] = new int[RANGE]; >> test0(a); >> } >> >> static void test0(int[] a) { >> for (int i = 0; i < RANGE; i++) { >> a[i]++; >> } >> } >> } >> >> >> `./java -XX:CompileCommand=compileonly,Test::test* -XX:+TraceSuperWord -Xcomp -XX:+PrintIdeal -XX:+AlignVector -XX:+VerifyAlignVector -XX:CompileCommand=print,Test::test* Test.java` >> >> This looks like the main loop: >> >> ;; B22: # out( B22 B23 ) <- in( B21 B22 ) Loop( B22-B22 inner post of N743) Freq: 4.49988 >> 0x00007f83c8bb2f68: lea 0x10(%rbp,%rbx,4),%r10 ;*iaload {reexecute=0 rethrow=0 return_oop=0} >> ; - Test::test0 at 12 (line 11) >> 0x00007f83c8bb2f6d: mov %r10,%r8 >> 0x00007f83c8bb2f70: test $0x7,%r8b >> 0x00007f83c8bb2f74: je 0x00007f83c8bb2f8a >> 0x00007f83c8bb2f76: movabs $0x7f83d77e2fc8,%rdi ; {external_word} >> 0x00007f83c8bb2f80: and $0xfffffffffffffff0,%rsp >> 0x00007f83c8bb2f84: callq 0x00007f83d71a0162 ; {runtime_call MacroAssembler::debug64(char*, long, long*)} >> 0x00007f83c8bb2f89: hlt >> 0x00007f83c8bb2f8a: test $0x7,%r10b >> 0x00007f83c8bb2f8e: je 0x00007f83c8bb2fa4 >> 0x00007f83c8bb2f90: movabs $0x7f83d77e2fc8,%rdi ; {external_word} >> 0x00007f83c8bb2f9a: and $0xfffffffffffffff0,%rsp >> 0x00007f83c8bb2f9e: callq 0x00007f83d71a0162 ; {runtime_call MacroAssembler::debug64(char*, long, long*)} >> 0x00007f83c8bb2fa3: hlt >> 0x00007f83c8bb2fa4: vpaddd (%r10),%zmm5,%zmm0 >> 0x00007f83c8bb2faa: vmovdqu32 %zmm0,(%r8) ;*iastore {reexecute=0 rethrow=0 return_oop=0} >> ; - Test::test0 at 15 (line 11) >> 0x00007f83c8bb2fb0: add $0x10,%ebx ;*iinc {reexecute=0 rethrow=0 return_oop=0} >> ... 
> > And without `-XX:-VerifyAlignVector` > > > ;; B20: # out( B20 B21 ) <- in( B19 B20 ) Loop( B20-B20 inner post of N743) Freq: 4.49988 > 0x00007ff22cbb2924: vpaddd 0x10(%rbx,%r13,4),%zmm0,%zmm1 > 0x00007ff22cbb292f: vmovdqu32 %zmm1,0x10(%rbx,%r13,4) ;*iastore {reexecute=0 rethrow=0 return_oop=0} > ; - Test::test0 at 15 (line 11) > 0x00007ff22cbb293a: add $0x10,%r13d ;*iinc {reexecute=0 rethrow=0 return_oop=0} > ; - Test::test0 at 16 (line 10) > 0x00007ff22cbb293e: cmp %r10d,%r13d > 0x00007ff22cbb2941: jl 0x00007ff22cbb2924 ;*if_icmpge {reexecute=0 rethrow=0 return_oop=0} > ; - Test::test0 at 6 (line 10) Can you show assembler code for simple load and store instructions (move data from one array to another)? My concern is that LoadV and StoreV are defined only with `memory` input: instruct loadV(vec dst, memory mem) %{ match(Set dst (LoadVector mem)); I would assume it will be embedded memory only. But C2 may be smart enough to generate `lea` if it sees not AddP node. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14785#discussion_r1442017423 From kvn at openjdk.org Thu Jan 4 16:54:37 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 4 Jan 2024 16:54:37 GMT Subject: RFR: 8310190: C2 SuperWord: AlignVector is broken, generates misaligned packs [v58] In-Reply-To: <9qbGMjWjQfwA5r6XimWz_GZUdQEgo6GEzY-uYwIoP4s=.9c68a4f4-a82b-4760-a428-3b5025eda61b@github.com> References: <9qbGMjWjQfwA5r6XimWz_GZUdQEgo6GEzY-uYwIoP4s=.9c68a4f4-a82b-4760-a428-3b5025eda61b@github.com> Message-ID: <8FqVnqkZB8_FsKx6kJfkJN8Oi90W6hAeBNl8uQy_QwE=.7d6061b5-4a3d-4d3f-aaa4-1c30151d066d@github.com> On Thu, 4 Jan 2024 16:45:11 GMT, Vladimir Kozlov wrote: >> And without `-XX:-VerifyAlignVector` >> >> >> ;; B20: # out( B20 B21 ) <- in( B19 B20 ) Loop( B20-B20 inner post of N743) Freq: 4.49988 >> 0x00007ff22cbb2924: vpaddd 0x10(%rbx,%r13,4),%zmm0,%zmm1 >> 0x00007ff22cbb292f: vmovdqu32 %zmm1,0x10(%rbx,%r13,4) ;*iastore {reexecute=0 rethrow=0 return_oop=0} >> ; - Test::test0 
at 15 (line 11) >> 0x00007ff22cbb293a: add $0x10,%r13d ;*iinc {reexecute=0 rethrow=0 return_oop=0} >> ; - Test::test0 at 16 (line 10) >> 0x00007ff22cbb293e: cmp %r10d,%r13d >> 0x00007ff22cbb2941: jl 0x00007ff22cbb2924 ;*if_icmpge {reexecute=0 rethrow=0 return_oop=0} >> ; - Test::test0 at 6 (line 10) > > Can you show assembler code for simple load and store instructions (move data from one array to another)? > My concern is that LoadV and StoreV are defined only with `memory` input: > > instruct loadV(vec dst, memory mem) %{ > match(Set dst (LoadVector mem)); > > I would assume it will be embedded memory only. But C2 may be smart enough to generate `lea` if it sees not AddP node. Also why your assembler example have tested alignment twice for the same address? May be because the same array's element for load and store?: 0x00007f83c8bb2f6d: mov %r10,%r8 0x00007f83c8bb2f70: test $0x7,%r8b 0x00007f83c8bb2f74: je 0x00007f83c8bb2f8a ... 0x00007f83c8bb2f8a: test $0x7,%r10b 0x00007f83c8bb2f8e: je 0x00007f83c8bb2fa4 No need to optimize I think since it is only for debugging. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14785#discussion_r1442025069 From duke at openjdk.org Thu Jan 4 16:57:31 2024 From: duke at openjdk.org (Joshua Cao) Date: Thu, 4 Jan 2024 16:57:31 GMT Subject: Integrated: 8322976: Remove reference to transform_no_reclaim In-Reply-To: References: Message-ID: On Wed, 3 Jan 2024 22:12:03 GMT, Joshua Cao wrote: > Passes hotspot:tier1 locally This pull request has now been integrated. 
Changeset: ade40741 Author: Joshua Cao Committer: Xin Liu URL: https://git.openjdk.org/jdk/commit/ade40741cab0b5e4d8519a55ebcd51e386999f5d Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod 8322976: Remove reference to transform_no_reclaim Reviewed-by: shade, thartmann ------------- PR: https://git.openjdk.org/jdk/pull/17255 From xliu at openjdk.org Thu Jan 4 17:06:38 2024 From: xliu at openjdk.org (Xin Liu) Date: Thu, 4 Jan 2024 17:06:38 GMT Subject: RFR: 8320128: Clean up Parse constructor for OSR [v4] In-Reply-To: <0mxhhLR9DDdpiBhbXWPxCmMzo_OI9DYiIX36xGhxC_c=.1eed99b9-2dc3-4403-b3c1-22b6ebed6079@github.com> References: <0mxhhLR9DDdpiBhbXWPxCmMzo_OI9DYiIX36xGhxC_c=.1eed99b9-2dc3-4403-b3c1-22b6ebed6079@github.com> Message-ID: <-QQZANSu7PVpZ3CUX7kbp00f_xkhjiy7la3Z3gXutKA=.a1c938e4-12e0-44b8-a2dd-7cc0c0c9f09f@github.com> > There's a special case for the constructor of Parse. If current compilation is OSR and it is handling the top-level method(depth() == 1), then > > 1. _tf = C->tf(); > 2. _entry_bci = C->entry_bci(); > 3. _flow = method()->get_osr_flow_analysis(_entry_bci); > > We don't need to assign those member data twice. We can also factor out _flow->failing() for the special case and normal cases. > > It's worth mentioning that we can't save ciTypeFlow computation because > get_osr_flow_analysis(_entry_bci) actually needs get_flow_analysis(method()). Xin Liu has updated the pull request incrementally with one additional commit since the last revision: Use print_cr for the log message. 
------------- Changes: - all: https://git.openjdk.org/jdk/pull/16669/files - new: https://git.openjdk.org/jdk/pull/16669/files/ec89638c..1e566c97 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16669&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16669&range=02-03 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/16669.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16669/head:pull/16669 PR: https://git.openjdk.org/jdk/pull/16669 From xliu at openjdk.org Thu Jan 4 17:06:45 2024 From: xliu at openjdk.org (Xin Liu) Date: Thu, 4 Jan 2024 17:06:45 GMT Subject: RFR: 8320128: Clean up Parse constructor for OSR [v3] In-Reply-To: <2T5HC8Dvq7RUNkNeYMgFWk5niXTiPnE2k4RDlE3BJZs=.1ec18356-631f-4028-af37-f2ad0d8ec05c@github.com> References: <0mxhhLR9DDdpiBhbXWPxCmMzo_OI9DYiIX36xGhxC_c=.1eed99b9-2dc3-4403-b3c1-22b6ebed6079@github.com> <2T5HC8Dvq7RUNkNeYMgFWk5niXTiPnE2k4RDlE3BJZs=.1ec18356-631f-4028-af37-f2ad0d8ec05c@github.com> Message-ID: On Thu, 4 Jan 2024 08:48:59 GMT, Aleksey Shipilev wrote: >> Xin Liu has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: >> >> - Use atomic logline and resume #ifdef ASSERT. >> - Merge branch 'master' into JDK-8320128 >> - Update according to reviewer's feedback. >> - 8320128: Clean up Parse constructor for OSR > > src/hotspot/share/opto/parse1.cpp line 511: > >> 509: if (PrintOpto && (Verbose || WizardMode)) { >> 510: if (is_osr_parse()) { >> 511: tty->print("OSR @%d type flow bailout: %s", _entry_bci, _flow->failure_reason()); > > Should be `print_cr`. Sorry, I would have discovered this by myself. updated! 
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/16669#discussion_r1442038930

From never at openjdk.org Thu Jan 4 17:13:23 2024
From: never at openjdk.org (Tom Rodriguez)
Date: Thu, 4 Jan 2024 17:13:23 GMT
Subject: RFR: 8322636: [JVMCI] HotSpotSpeculationLog can be inconsistent across a single compile [v2]
In-Reply-To:
References:
Message-ID:

On Tue, 2 Jan 2024 13:37:19 GMT, David Leopoldseder wrote:

>> This PR fixes a subtle inconsistency in `HotSpotSpeculationLog`.
>>
>> Normal uses of `HotSpotSpeculationLog` work by using a `SpeculationReason` and asking the speculation log via `maySpeculate` whether the speculation can be performed, i.e., whether it has not already failed for the given method. An example for this can be seen in Graal https://github.com/oracle/graal/blob/master/compiler/src/jdk.graal.compiler/src/jdk/graal/compiler/nodes/loop/CountedLoopInfo.java#L591C15-L591C15
>> The implicit assumption is that the speculation log, `HotSpotSpeculationLog` in particular, collects failed speculations at the beginning of a compile and then stays consistent during the compile. Why is that? Because if new failed speculations are added to the failed speculations during the compile, the compiler would speculate again on those in an inconsistent way. E.g. at the beginning of a compile a certain speculation has not failed yet and the compiler thinks it can do optimization xyz using a speculation; later during the compilation process it consults the speculation log but gets a different answer. All those inconsistent speculations that already failed will anyway later fail code installation in jvmci (they will throw a bailout during `HotSpotCodeCacheProvider#installCode` https://github.com/openjdk/jdk/blob/master/src/jdk.internal.vm.ci/share/classes/jdk/vm/ci/hotspot/HotSpotSpeculationLog.java#L192 ). Thus, we should at least return a consistent result during a compile.
>> The problem for consistency here, that also makes troubles on the graal side, is that `maySpeculate` itself can collect failed speculations if there have not been any previously, i.e., `failedSpeculations == null`. >> In order to make the speculation log consistent across an entire JVMCI compile this PR removes the collection of failed speculations in `maySpeculate`. > > David Leopoldseder has updated the pull request incrementally with one additional commit since the last revision: > > 8322636: [JVMCI] HotSpotSpeculationLog add javadoc to maySpeculate I think for now lets just stick with your updates. It does seem like the substrate runtime compilation case is potentially exposed to the original problem but we should address that separately. ------------- Marked as reviewed by never (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/17183#pullrequestreview-1804639539 From shade at openjdk.org Thu Jan 4 17:19:28 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 4 Jan 2024 17:19:28 GMT Subject: RFR: 8320128: Clean up Parse constructor for OSR [v4] In-Reply-To: <-QQZANSu7PVpZ3CUX7kbp00f_xkhjiy7la3Z3gXutKA=.a1c938e4-12e0-44b8-a2dd-7cc0c0c9f09f@github.com> References: <0mxhhLR9DDdpiBhbXWPxCmMzo_OI9DYiIX36xGhxC_c=.1eed99b9-2dc3-4403-b3c1-22b6ebed6079@github.com> <-QQZANSu7PVpZ3CUX7kbp00f_xkhjiy7la3Z3gXutKA=.a1c938e4-12e0-44b8-a2dd-7cc0c0c9f09f@github.com> Message-ID: <1HiTgoB9oAtaZk5ubbAw6fW5c6c4XAgEjIV0xAtOc5Q=.08d9d0f2-0d76-4fbe-bc2a-4c80037aad43@github.com> On Thu, 4 Jan 2024 17:06:38 GMT, Xin Liu wrote: >> There's a special case for the constructor of Parse. If current compilation is OSR and it is handling the top-level method(depth() == 1), then >> >> 1. _tf = C->tf(); >> 2. _entry_bci = C->entry_bci(); >> 3. _flow = method()->get_osr_flow_analysis(_entry_bci); >> >> We don't need to assign those member data twice. We can also factor out _flow->failing() for the special case and normal cases. 
>> >> It's worth mentioning that we can't save ciTypeFlow computation because >> get_osr_flow_analysis(_entry_bci) actually needs get_flow_analysis(method()). > > Xin Liu has updated the pull request incrementally with one additional commit since the last revision: > > Use print_cr for the log message. src/hotspot/share/opto/parse1.cpp line 414: > 412: if (PrintCompilation || PrintOpto) { > 413: // Make sure I have an inline tree, so I can print messages about it. > 414: InlineTree::find_subtree_from_root(C->ilt(), caller, parse_method); Reading this again, you sure that we don't need `caller->caller()` on `is_osr_parse()` path? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16669#discussion_r1442052577 From xliu at openjdk.org Thu Jan 4 20:13:25 2024 From: xliu at openjdk.org (Xin Liu) Date: Thu, 4 Jan 2024 20:13:25 GMT Subject: RFR: 8320128: Clean up Parse constructor for OSR [v4] In-Reply-To: <1HiTgoB9oAtaZk5ubbAw6fW5c6c4XAgEjIV0xAtOc5Q=.08d9d0f2-0d76-4fbe-bc2a-4c80037aad43@github.com> References: <0mxhhLR9DDdpiBhbXWPxCmMzo_OI9DYiIX36xGhxC_c=.1eed99b9-2dc3-4403-b3c1-22b6ebed6079@github.com> <-QQZANSu7PVpZ3CUX7kbp00f_xkhjiy7la3Z3gXutKA=.a1c938e4-12e0-44b8-a2dd-7cc0c0c9f09f@github.com> <1HiTgoB9oAtaZk5ubbAw6fW5c6c4XAgEjIV0xAtOc5Q=.08d9d0f2-0d76-4fbe-bc2a-4c80037aad43@github.com> Message-ID: <4O0GWJlH3hqZoS74u-cG95rPRNZkzhTj-uINcKxHXNk=.aeca474c-bae3-4216-b968-92233a949c83@github.com> On Thu, 4 Jan 2024 17:16:20 GMT, Aleksey Shipilev wrote: >> Xin Liu has updated the pull request incrementally with one additional commit since the last revision: >> >> Use print_cr for the log message. > > src/hotspot/share/opto/parse1.cpp line 414: > >> 412: if (PrintCompilation || PrintOpto) { >> 413: // Make sure I have an inline tree, so I can print messages about it. >> 414: InlineTree::find_subtree_from_root(C->ilt(), caller, parse_method); > > Reading this again, you sure that we don't need `caller->caller()` on `is_osr_parse()` path? 
First of all, is_osr_parse() was false at line 415 because _entry_bci was assigned to InvocationEntryBci right before. That's why I use *caller* directly.

Even if we consider building an InlineTree for OSR, I don't think caller->caller() is correct.
I explain this in item 2 here.
https://github.com/openjdk/jdk/pull/16669#issuecomment-1820258714

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/16669#discussion_r1442207206

From shade at openjdk.org Thu Jan 4 20:19:26 2024
From: shade at openjdk.org (Aleksey Shipilev)
Date: Thu, 4 Jan 2024 20:19:26 GMT
Subject: RFR: 8320128: Clean up Parse constructor for OSR [v4]
In-Reply-To: <-QQZANSu7PVpZ3CUX7kbp00f_xkhjiy7la3Z3gXutKA=.a1c938e4-12e0-44b8-a2dd-7cc0c0c9f09f@github.com>
References: <0mxhhLR9DDdpiBhbXWPxCmMzo_OI9DYiIX36xGhxC_c=.1eed99b9-2dc3-4403-b3c1-22b6ebed6079@github.com> <-QQZANSu7PVpZ3CUX7kbp00f_xkhjiy7la3Z3gXutKA=.a1c938e4-12e0-44b8-a2dd-7cc0c0c9f09f@github.com>
Message-ID:

On Thu, 4 Jan 2024 17:06:38 GMT, Xin Liu wrote:

>> There's a special case for the constructor of Parse. If current compilation is OSR and it is handling the top-level method(depth() == 1), then
>>
>> 1. _tf = C->tf();
>> 2. _entry_bci = C->entry_bci();
>> 3. _flow = method()->get_osr_flow_analysis(_entry_bci);
>>
>> We don't need to assign those member data twice. We can also factor out _flow->failing() for the special case and normal cases.
>>
>> It's worth mentioning that we can't save ciTypeFlow computation because
>> get_osr_flow_analysis(_entry_bci) actually needs get_flow_analysis(method()).
>
> Xin Liu has updated the pull request incrementally with one additional commit since the last revision:
>
>   Use print_cr for the log message.

What testing was done here? I suggest at least `tier{1,2,3}` to capture surprises.
------------- PR Comment: https://git.openjdk.org/jdk/pull/16669#issuecomment-1877705645 From shade at openjdk.org Thu Jan 4 20:19:28 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 4 Jan 2024 20:19:28 GMT Subject: RFR: 8320128: Clean up Parse constructor for OSR [v4] In-Reply-To: <4O0GWJlH3hqZoS74u-cG95rPRNZkzhTj-uINcKxHXNk=.aeca474c-bae3-4216-b968-92233a949c83@github.com> References: <0mxhhLR9DDdpiBhbXWPxCmMzo_OI9DYiIX36xGhxC_c=.1eed99b9-2dc3-4403-b3c1-22b6ebed6079@github.com> <-QQZANSu7PVpZ3CUX7kbp00f_xkhjiy7la3Z3gXutKA=.a1c938e4-12e0-44b8-a2dd-7cc0c0c9f09f@github.com> <1HiTgoB9oAtaZk5ubbAw6fW5c6c4XAgEjIV0xAtOc5Q=.08d9d0f2-0d76-4fbe-bc2a-4c80037aad43@github.com> <4O0GWJlH3hqZoS74u-cG95rPRNZkzhTj-uINcKxHXNk=.aeca474c-bae3-4216-b968-92233a949c83@github.com> Message-ID: On Thu, 4 Jan 2024 20:10:56 GMT, Xin Liu wrote: >> src/hotspot/share/opto/parse1.cpp line 414: >> >>> 412: if (PrintCompilation || PrintOpto) { >>> 413: // Make sure I have an inline tree, so I can print messages about it. >>> 414: InlineTree::find_subtree_from_root(C->ilt(), caller, parse_method); >> >> Reading this again, you sure that we don't need `caller->caller()` on `is_osr_parse()` path? > > first of all, is_osr_parse() was false at line 415 because _entry_bci was assigned to InvocationEntryBci right before. That's why I use *caller* directly. > > Even we consider to build InlineTree for OSR, I don't think caller->caller() is correct. > I explain this in item 2 here. > https://github.com/openjdk/jdk/pull/16669#issuecomment-1820258714 Ah OK, trippy... All good then. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16669#discussion_r1442211363 From dlong at openjdk.org Fri Jan 5 01:01:26 2024 From: dlong at openjdk.org (Dean Long) Date: Fri, 5 Jan 2024 01:01:26 GMT Subject: RFR: 8320649: C2: Optimize scoped values [v4] In-Reply-To: References: Message-ID: On Thu, 4 Jan 2024 08:12:22 GMT, Roland Westrelin wrote: >> This change implements C2 optimizations for calls to >> ScopedValue.get(). Indeed, in: >> >> >> v1 = scopedValue.get(); >> ... >> v2 = scopedValue.get(); >> >> >> `v2` can be replaced by `v1` and the second call to `get()` can be >> optimized out. That's true whatever is between the 2 calls unless a >> new mapping for `scopedValue` is created in between (when that happens >> no optimizations is performed for the method being compiled). Hoisting >> a `get()` call out of loop for a loop invariant `scopedValue` should >> also be legal in most cases. >> >> `ScopedValue.get()` is implemented in java code as a 2 step process. A >> cache is attached to the current thread object. If the `ScopedValue` >> object is in the cache then the result from `get()` is read from >> there. Otherwise a slow call is performed that also inserts the >> mapping in the cache. The cache itself is lazily allocated. One >> `ScopedValue` can be hashed to 2 different indexes in the cache. On a >> cache probe, both indexes are checked. As a consequence, the process >> of probing the cache is a multi step process (check if the cache is >> present, check first index, check second index if first index >> failed). If the cache is populated early on, then when the method that >> calls `ScopedValue.get()` is compiled, profile reports the slow path >> as never taken and only the read from the cache is compiled. 
>>
>> To perform the optimizations, I added 3 new node types to C2:
>>
>> - the pair
>> ScopedValueGetHitsInCacheNode/ScopedValueGetLoadFromCacheNode for
>> the cache probe
>>
>> - a cfg node ScopedValueGetResultNode to help locate the result of the
>> `get()` call in the IR graph.
>>
>> In pseudo code, once the nodes are inserted, the code of a `get()` is:
>>
>>
>> hits_in_the_cache = ScopedValueGetHitsInCache(scopedValue)
>> if (hits_in_the_cache) {
>>   res = ScopedValueGetLoadFromCache(hits_in_the_cache);
>> } else {
>>   res = ..; //slow call possibly inlined. Subgraph can be arbitrarily complex
>> }
>> res = ScopedValueGetResult(res)
>>
>>
>> In the snippet:
>>
>>
>> v1 = scopedValue.get();
>> ...
>> v2 = scopedValue.get();
>>
>>
>> Replacing `v2` by `v1` is then done by starting from the
>> `ScopedValueGetResult` node for the second `get()` and looking for a
>> dominating `ScopedValueGetResult` for the same `ScopedValue`
>> object. When one is found, it is used as a replacement. Eliminating
>> the second `get()` call is achieved by making
>> `ScopedValueGetHitsInCache` always successful if there's a dominating
>> `Scoped...
>
> Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision:
>
>   merge fix

I'm not a C2 expert, so my high-level comments might not all make sense, but here goes.
- My first reaction was why does this need to be so complicated? Can't we treat the slow path and cache implementation as a black box, and just common up the high-level get()? In my mind the ScopedValue get should be similar to a read from a "condy" dynamic constant.
- The reason for breaking up the operations into individual nodes seems to be because of the profiling information. So I'm wondering how much this helps, given the added complexity.
- I would expect multiple get() calls in the same method or in loops to be rare and/or programmer errors.
The real advantage seems to be when the binding and the get() can be connected from caller to callee through inlining.
- Are we able to optimize a get() on a constant/final ScopedValue into a simple array load at a constant offset?
- Needing to do things like treat ScopedValueGetHitsInCache as always successful gives me a bad feeling for some reason, and seems unnecessary if we did more at a higher (macro?) level rather than eagerly expanding the high-level operation into individual nodes.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/16966#issuecomment-1877968950

From jbhateja at openjdk.org Fri Jan 5 07:08:35 2024
From: jbhateja at openjdk.org (Jatin Bhateja)
Date: Fri, 5 Jan 2024 07:08:35 GMT
Subject: RFR: 8322768: Optimize non-subword vector compress and expand APIs for AVX2 target. [v3]
In-Reply-To: <_udOCEVG86x9V_WvYqFTaYnvmXdiZ7LzqxzR-D_ygYs=.db3ed37e-a8fa-413b-83c6-a785aba072ff@github.com>
References: <_udOCEVG86x9V_WvYqFTaYnvmXdiZ7LzqxzR-D_ygYs=.db3ed37e-a8fa-413b-83c6-a785aba072ff@github.com>
Message-ID:

> Hi,
>
> Patch optimizes non-subword vector compress and expand APIs for x86 AVX2 only targets.
> Upcoming E-core Xeons (Sierra Forest) and Hybrid CPUs only support AVX2 instruction set.
> These are very frequently used APIs in columnar database filter operation.
>
> Implementation uses a lookup table to record permute indices. Table index is computed using
> mask argument of compress/expand operation.
>
> Following are the performance numbers of the JMH micro included with the patch.
> > > System : Intel(R) Xeon(R) Platinum 8480+ (Sapphire Rapids) > > Baseline: > Benchmark (size) Mode Cnt Score Error Units > ColumnFilterBenchmark.filterDoubleColumn 1024 thrpt 2 142.767 ops/ms > ColumnFilterBenchmark.filterDoubleColumn 2047 thrpt 2 71.436 ops/ms > ColumnFilterBenchmark.filterDoubleColumn 4096 thrpt 2 35.992 ops/ms > ColumnFilterBenchmark.filterFloatColumn 1024 thrpt 2 182.151 ops/ms > ColumnFilterBenchmark.filterFloatColumn 2047 thrpt 2 91.096 ops/ms > ColumnFilterBenchmark.filterFloatColumn 4096 thrpt 2 44.757 ops/ms > ColumnFilterBenchmark.filterIntColumn 1024 thrpt 2 184.099 ops/ms > ColumnFilterBenchmark.filterIntColumn 2047 thrpt 2 91.981 ops/ms > ColumnFilterBenchmark.filterIntColumn 4096 thrpt 2 45.170 ops/ms > ColumnFilterBenchmark.filterLongColumn 1024 thrpt 2 148.017 ops/ms > ColumnFilterBenchmark.filterLongColumn 2047 thrpt 2 73.516 ops/ms > ColumnFilterBenchmark.filterLongColumn 4096 thrpt 2 36.844 ops/ms > > Withopt: > Benchmark (size) Mode Cnt Score Error Units > ColumnFilterBenchmark.filterDoubleColumn 1024 thrpt 2 2051.707 ops/ms > ColumnFilterBenchmark.filterDoubleColumn 2047 thrpt 2 914.072 ops/ms > ColumnFilterBenchmark.filterDoubleColumn 4096 thrpt 2 489.898 ops/ms > ColumnFilterBenchmark.filterFloatColumn 1024 thrpt 2 5324.195 ops/ms > ColumnFilterBenchmark.filterFloatColumn 2047 thrpt 2 2587.229 ops/ms > ColumnFilterBenchmark.filterFloatColumn 4096 thrpt 2 1278.665 ops/ms > ColumnFilterBenchmark.filterIntColumn 1024 thrpt 2 4149.384 ops/ms > ColumnFilterBenchmark.filterIntColumn 2047 thrpt 2 1791.170 ops/ms > ColumnFilterBenchmark.filterIntColumn 4096... Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Review comments resolutions. 
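
The lookup-table scheme summarized in the patch description above can be sketched in scalar Java. This is only an illustration under stated assumptions, not the code from the patch: the class name `CompressTableSketch`, the helper names, and the use of -1 as filler for unused slots are all hypothetical. It models one 8-lane int vector, where row `mask` of the table holds the source lane for each output lane, and where a row occupies 32 bytes (8 ints of 4 bytes, the same row size as 4 longs of 8 bytes), so the row's byte offset is `mask << 5`.

```java
public class CompressTableSketch {
    static final int LANES = 8; // 8 int lanes, as in a 256-bit AVX2 vector

    // Row m lists the positions of the set bits of m in ascending order.
    // Unused slots are filled with -1 (an assumption of this sketch).
    static int[][] buildPermuteTable() {
        int[][] table = new int[1 << LANES][LANES];
        for (int mask = 0; mask < table.length; mask++) {
            int out = 0;
            for (int lane = 0; lane < LANES; lane++) {
                if ((mask & (1 << lane)) != 0) {
                    table[mask][out++] = lane;
                }
            }
            while (out < LANES) {
                table[mask][out++] = -1;
            }
        }
        return table;
    }

    // Byte offset of a row from the table start: 8 ints * 4 bytes = 32 = 1 << 5.
    static int rowByteOffset(int mask) {
        return mask << 5;
    }

    // Scalar stand-in for "permute by the looked-up row, zero the remaining lanes".
    static int[] compress(int[] vec, int mask, int[][] table) {
        int[] row = table[mask];
        int[] res = new int[LANES];
        for (int i = 0; i < LANES; i++) {
            res[i] = row[i] >= 0 ? vec[row[i]] : 0;
        }
        return res;
    }

    public static void main(String[] args) {
        int[][] table = buildPermuteTable();
        int[] vec = {10, 20, 30, 40, 50, 60, 70, 80};
        int mask = 0b10100101; // lanes 0, 2, 5 and 7 selected
        System.out.println(java.util.Arrays.toString(compress(vec, mask, table)));
    }
}
```

With the mask in `main`, the selected lanes 0, 2, 5 and 7 are packed into the low four output lanes and the remaining lanes are zero, which matches the lane semantics of the Vector API `compress` operation.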
------------- Changes: - all: https://git.openjdk.org/jdk/pull/17261/files - new: https://git.openjdk.org/jdk/pull/17261/files/6bd9b0ad..ea0aa0b4 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17261&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17261&range=01-02 Stats: 49 lines in 4 files changed: 44 ins; 1 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/17261.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17261/head:pull/17261 PR: https://git.openjdk.org/jdk/pull/17261 From jbhateja at openjdk.org Fri Jan 5 07:08:37 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 5 Jan 2024 07:08:37 GMT Subject: RFR: 8322768: Optimize non-subword vector compress and expand APIs for AVX2 target. [v2] In-Reply-To: References: <_udOCEVG86x9V_WvYqFTaYnvmXdiZ7LzqxzR-D_ygYs=.db3ed37e-a8fa-413b-83c6-a785aba072ff@github.com> Message-ID: On Thu, 4 Jan 2024 13:41:40 GMT, Emanuel Peter wrote: >> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: >> >> Updating copyright year of modified files. > > src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 5307: > >> 5305: assert(bt == T_LONG || bt == T_DOUBLE, ""); >> 5306: vmovmskpd(rtmp, mask, vec_enc); >> 5307: shlq(rtmp, 5); > > Might this need to be 6? If I understand right, then you want to have a 64bit stride, hence 2^6, right? > If that is correct, then this did not show in your tests, and you need a regression test anyway. This computes the byte offset from start of the table, both integer and long permute table have same row sizes, 8 int elements vs 4 long elements. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17261#discussion_r1442555037 From jbhateja at openjdk.org Fri Jan 5 07:08:39 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 5 Jan 2024 07:08:39 GMT Subject: RFR: 8322768: Optimize non-subword vector compress and expand APIs for AVX2 target. 
[v2] In-Reply-To: References: <_udOCEVG86x9V_WvYqFTaYnvmXdiZ7LzqxzR-D_ygYs=.db3ed37e-a8fa-413b-83c6-a785aba072ff@github.com> Message-ID: <2ykpNJfFJYotuyI59zfh966TIQYUF2Id6NR56zpq_Vw=.6b3b5411-e086-45df-8666-2496ac013548@github.com> On Thu, 4 Jan 2024 13:30:24 GMT, Emanuel Peter wrote: >> test/micro/org/openjdk/bench/jdk/incubator/vector/ColumnFilterBenchmark.java line 94: >> >>> 92: IntVector vec = IntVector.fromArray(ispecies, intinCol, i); >>> 93: VectorMask pred = vec.compare(VectorOperators.GT, ipivot); >>> 94: vec.compress(pred).intoArray(intoutCol, j); >> >> Could there be equivalent `expand` tests? > > And what about some result verification? Or is there another test that does that? We do have extensive functional tests for compress/expand APIs in [test/jdk/jdk/incubator/vector](https://github.com/openjdk/jdk/tree/master/test/jdk/jdk/incubator/vector) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17261#discussion_r1442554968 From jbhateja at openjdk.org Fri Jan 5 07:08:40 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 5 Jan 2024 07:08:40 GMT Subject: RFR: 8322768: Optimize non-subword vector compress and expand APIs for AVX2 target. [v2] In-Reply-To: <2ykpNJfFJYotuyI59zfh966TIQYUF2Id6NR56zpq_Vw=.6b3b5411-e086-45df-8666-2496ac013548@github.com> References: <_udOCEVG86x9V_WvYqFTaYnvmXdiZ7LzqxzR-D_ygYs=.db3ed37e-a8fa-413b-83c6-a785aba072ff@github.com> <2ykpNJfFJYotuyI59zfh966TIQYUF2Id6NR56zpq_Vw=.6b3b5411-e086-45df-8666-2496ac013548@github.com> Message-ID: On Fri, 5 Jan 2024 07:03:26 GMT, Jatin Bhateja wrote: >> And what about some result verification? Or is there another test that does that? > > We do have extensive functional tests for compress/expand APIs in [test/jdk/jdk/incubator/vector](https://github.com/openjdk/jdk/tree/master/test/jdk/jdk/incubator/vector) > Could there be equivalent `expand` tests? 
Here are the performance number for existing [VectorAPI JMH micros.](https://github.com/openjdk/panama-vector/tree/vectorIntrinsics/test/micro/org/openjdk/bench/jdk/incubator/vector/operation) ![image](https://github.com/openjdk/jdk/assets/59989778/4b260814-3d3c-4e9b-b81a-61492ea48cce) ![image](https://github.com/openjdk/jdk/assets/59989778/50048281-ad50-44f6-a875-308e02537be2) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17261#discussion_r1442556253 From jbhateja at openjdk.org Fri Jan 5 07:11:23 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 5 Jan 2024 07:11:23 GMT Subject: RFR: 8322768: Optimize non-subword vector compress and expand APIs for AVX2 target. [v2] In-Reply-To: References: <_udOCEVG86x9V_WvYqFTaYnvmXdiZ7LzqxzR-D_ygYs=.db3ed37e-a8fa-413b-83c6-a785aba072ff@github.com> Message-ID: <_guczAND7qope6gMYcZVaolzJE0FnlRfhm9RsgFS5eY=.15982e8f-229f-4d8d-a184-06a62288775a@github.com> On Thu, 4 Jan 2024 13:33:08 GMT, Emanuel Peter wrote: >> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: >> >> Updating copyright year of modified files. > > test/micro/org/openjdk/bench/jdk/incubator/vector/ColumnFilterBenchmark.java line 76: > >> 74: longinCol = new long[size]; >> 75: longoutCol = new long[size]; >> 76: lpivot = size / 2; > > I'd be interested to see what happens if you move up or down the "density" of elements that you accept. Would the simple branch prediction be faster if the density is low enough, i.e. we almost take no element. > > Though maybe that is not compiler problem but a user-problem? Included fuzzy filter micro with varying mask density. 
![image](https://github.com/openjdk/jdk/assets/59989778/a6af21cc-36c0-4503-aeb3-e66b862da2e1) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17261#discussion_r1442557565 From thartmann at openjdk.org Fri Jan 5 07:16:23 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Fri, 5 Jan 2024 07:16:23 GMT Subject: RFR: 8310844: [AArch64] C1 compilation fails because monitor offset in OSR buffer is too large for immediate [v2] In-Reply-To: References: <6vNa0ZYwKL0HXjHjWTyIe6kB5hza1aaXlJbxM4qh8-w=.edc55dc9-08ff-4861-9038-d4b150bcf4f7@github.com> <2Hnw1VDu1Ml2XB-VfUV8l-t8dtrMzPNh3RyNPfV-tgA=.fc400df2-9529-4bdd-8e0c-fb49ca4f0b48@github.com> Message-ID: On Thu, 4 Jan 2024 16:05:49 GMT, Andrew Haley wrote: > Memo to myself: there are no trivial performance fixes I'll copy that memo, it did look harmless at the time. ------------- PR Comment: https://git.openjdk.org/jdk/pull/17266#issuecomment-1878231823 From thartmann at openjdk.org Fri Jan 5 07:16:24 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Fri, 5 Jan 2024 07:16:24 GMT Subject: RFR: 8310844: [AArch64] C1 compilation fails because monitor offset in OSR buffer is too large for immediate [v2] In-Reply-To: References: <6vNa0ZYwKL0HXjHjWTyIe6kB5hza1aaXlJbxM4qh8-w=.edc55dc9-08ff-4861-9038-d4b150bcf4f7@github.com> Message-ID: On Thu, 4 Jan 2024 16:19:22 GMT, Andrew Dinn wrote: >> I'm not sure why the recommended adjustment is needed. The macro assembler does fuse pairs of adjacent ldr instructions into an ldp but only when the sizes match and the offsets fit into the requisite number of bits. >> >> So, if the two ldr instrctions ar egenerated next to each other the macroasembler should only convert to ldp *where appropriate*. Am I missing something here? > > Doh, sorry - I misread Andrew's proposed code! Ignore the noise. Thanks for looking at this @adinn. Right, the macro assembler merge magic is nice, I didn't know about it. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17266#discussion_r1442560775 From epeter at openjdk.org Fri Jan 5 08:25:25 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 5 Jan 2024 08:25:25 GMT Subject: RFR: 8322735: C2: minor improvements of bubble sort used in SuperWord::packset_sort In-Reply-To: References: Message-ID: On Mon, 25 Dec 2023 15:34:55 GMT, Denghui Dong wrote: > A minor improvement could be made for bubble sort in SuperWord::packset_sort to reduce the comparison count in bad cases. > > See https://en.wikipedia.org/wiki/Bubble_sort Ok. This change is fine with me. Thanks for taking the time to look into this :) I was just curious what was your motivation. I may completely redo this code once I remove the alignment constraints (here used for sorting), but that will have to be decided in a few months. Please do the renaming, and then I can run testing and give you my approval. ------------- PR Comment: https://git.openjdk.org/jdk/pull/17190#issuecomment-1878293104 From epeter at openjdk.org Fri Jan 5 08:28:37 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 5 Jan 2024 08:28:37 GMT Subject: RFR: 8310190: C2 SuperWord: AlignVector is broken, generates misaligned packs [v58] In-Reply-To: <1oTqtU5lm0As9tKfnWuGNh2sHXfQimLdMCzV2g1D2ho=.6a5a20f7-3ab3-43fb-b640-fa043131fef8@github.com> References: <9zUqzeJrnPyjjEC0_F9Z5OWzHLqnTN_5c1bzOiK-LqA=.050c0992-7f7f-48f3-b2da-ec81aafa41a6@github.com> <1oTqtU5lm0As9tKfnWuGNh2sHXfQimLdMCzV2g1D2ho=.6a5a20f7-3ab3-43fb-b640-fa043131fef8@github.com> Message-ID: On Thu, 4 Jan 2024 16:36:19 GMT, Vladimir Kozlov wrote: >> If you really want, then I can not touch `VectorizeDebugOption` at all, i.e. not activate `is_trace_align_vector` with that flag, but instead simply use `TraceSuperWord` (that might be a little verbose though). >> >> I already have the CSR for [JDK-8315361](https://bugs.openjdk.org/browse/JDK-8315361), so that I can remove `VectorizeDebugOption`. 
This has 2 effects:
>> 1. remove the product effect of `VectorizeDebugOption`, which is the same effect as enabling `VectorizeOption`.
>> 2. introduce a more general auto-vectorization tracing flag that allows more fine-grained control for debug printing.
>>
>> My idea here was to simply add the alignment tracing to `VectorizeDebugOption`. But currently one cannot enable that tracing without having the side-effects that also `VectorizeOption` has. Hence, I already now remove that product-side effect.
>>
>> @vnkozlov what do you think?
>
> I missed that in your long description ;^)
> I agree with your suggestion. The option was indeed strange: mixing prints with effects on code.

Ok, great, I will leave it then.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/14785#discussion_r1442607787

From epeter at openjdk.org Fri Jan 5 08:38:36 2024
From: epeter at openjdk.org (Emanuel Peter)
Date: Fri, 5 Jan 2024 08:38:36 GMT
Subject: RFR: 8310190: C2 SuperWord: AlignVector is broken, generates misaligned packs [v58]
In-Reply-To: <8FqVnqkZB8_FsKx6kJfkJN8Oi90W6hAeBNl8uQy_QwE=.7d6061b5-4a3d-4d3f-aaa4-1c30151d066d@github.com>
References: <9qbGMjWjQfwA5r6XimWz_GZUdQEgo6GEzY-uYwIoP4s=.9c68a4f4-a82b-4760-a428-3b5025eda61b@github.com> <8FqVnqkZB8_FsKx6kJfkJN8Oi90W6hAeBNl8uQy_QwE=.7d6061b5-4a3d-4d3f-aaa4-1c30151d066d@github.com>
Message-ID:

On Thu, 4 Jan 2024 16:51:19 GMT, Vladimir Kozlov wrote:

>> Can you show assembler code for simple load and store instructions (move data from one array to another)?
>> My concern is that LoadV and StoreV are defined only with `memory` input:
>>
>> instruct loadV(vec dst, memory mem) %{
>>   match(Set dst (LoadVector mem));
>>
>> I would assume it will be embedded memory only. But C2 may be smart enough to generate `lea` if it sees not AddP node.
>
> Also, why does your assembler example test alignment twice for the same address?
May be because the same array's element for load and store?: > > 0x00007f83c8bb2f6d: mov %r10,%r8 > 0x00007f83c8bb2f70: test $0x7,%r8b > 0x00007f83c8bb2f74: je 0x00007f83c8bb2f8a > ... > 0x00007f83c8bb2f8a: test $0x7,%r10b > 0x00007f83c8bb2f8e: je 0x00007f83c8bb2fa4 > > No need to optimize I think since it is only for debugging. @vnkozlov > Can you show assembler code for simple load and store instructions (move data from one array to another)? Here the example with simple load -> store with two different arrays: public class Test { static int RANGE = 1024*64; public static void main(String[] strArr) { int a[] = new int[RANGE]; int b[] = new int[RANGE]; test0(a, b); } static void test0(int[] a, int[] b) { for (int i = 0; i < RANGE; i++) { a[i] = b[i]; } } } With `-XX:+VerifyAlignVector`: `./java -XX:CompileCommand=compileonly,Test::test* -XX:+TraceSuperWord -Xcomp -XX:+PrintIdeal -XX:+AlignVector -XX:+VerifyAlignVector -XX:CompileCommand=print,Test::test* Test.java` ;; B32: # out( B32 B33 ) <- in( B31 B32 ) Loop( B32-B32 inner post of N1028) Freq: 4.49976 0x00007fbef8bb31ec: movslq %ebx,%r10 0x00007fbef8bb31ef: shl $0x2,%r10 ;*iaload {reexecute=0 rethrow=0 return_oop=0} ; - Test::test0 at 13 (line 12) 0x00007fbef8bb31f3: lea 0x10(%r13,%r10,1),%r8 0x00007fbef8bb31f8: lea 0x10(%r11,%r10,1),%r10 0x00007fbef8bb31fd: test $0x7,%r8b 0x00007fbef8bb3201: je 0x00007fbef8bb3217 0x00007fbef8bb3203: movabs $0x7fbf08c15fc8,%rdi ; {external_word} 0x00007fbef8bb320d: and $0xfffffffffffffff0,%rsp 0x00007fbef8bb3211: callq 0x00007fbf085d3162 ; {runtime_call MacroAssembler::debug64(char*, long, long*)} 0x00007fbef8bb3216: hlt 0x00007fbef8bb3217: vmovdqu32 (%r8),%zmm0 0x00007fbef8bb321d: test $0x7,%r10b 0x00007fbef8bb3221: je 0x00007fbef8bb3237 0x00007fbef8bb3223: movabs $0x7fbf08c15fc8,%rdi ; {external_word} 0x00007fbef8bb322d: and $0xfffffffffffffff0,%rsp 0x00007fbef8bb3231: callq 0x00007fbf085d3162 ; {runtime_call MacroAssembler::debug64(char*, long, long*)} 0x00007fbef8bb3236: 
hlt 0x00007fbef8bb3237: vmovdqu32 %zmm0,(%r10) ;*iastore {reexecute=0 rethrow=0 return_oop=0} ; - Test::test0 at 14 (line 12) 0x00007fbef8bb323d: add $0x10,%ebx ;*iinc {reexecute=0 rethrow=0 return_oop=0} ; - Test::test0 at 15 (line 11) 0x00007fbef8bb3240: cmp %r9d,%ebx 0x00007fbef8bb3243: jl 0x00007fbef8bb31ec ;*if_icmpge {reexecute=0 rethrow=0 return_oop=0} ; - Test::test0 at 6 (line 11) With `-XX:-VerifyAlignVector`: `./java -XX:CompileCommand=compileonly,Test::test* -XX:+TraceSuperWord -Xcomp -XX:+PrintIdeal -XX:+AlignVector -XX:-VerifyAlignVector -XX:CompileCommand=print,Test::test* Test.java` ;; B30: # out( B30 B31 ) <- in( B29 B30 ) Loop( B30-B30 inner post of N1028) Freq: 4.49976 0x00007f90e4bb2ab8: vmovdqu32 0x10(%rbx,%r13,4),%zmm0 0x00007f90e4bb2ac3: vmovdqu32 %zmm0,0x10(%rcx,%r13,4) ;*iastore {reexecute=0 rethrow=0 return_oop=0} ; - Test::test0 at 14 (line 12) 0x00007f90e4bb2ace: add $0x10,%r13d ;*iinc {reexecute=0 rethrow=0 return_oop=0} ; - Test::test0 at 15 (line 11) 0x00007f90e4bb2ad2: cmp %r11d,%r13d 0x00007f90e4bb2ad5: jl 0x00007f90e4bb2ab8 ;*if_icmpge {reexecute=0 rethrow=0 return_oop=0} ; - Test::test0 at 6 (line 11) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14785#discussion_r1442616543 From epeter at openjdk.org Fri Jan 5 08:51:37 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 5 Jan 2024 08:51:37 GMT Subject: RFR: 8310190: C2 SuperWord: AlignVector is broken, generates misaligned packs [v58] In-Reply-To: References: <9qbGMjWjQfwA5r6XimWz_GZUdQEgo6GEzY-uYwIoP4s=.9c68a4f4-a82b-4760-a428-3b5025eda61b@github.com> <8FqVnqkZB8_FsKx6kJfkJN8Oi90W6hAeBNl8uQy_QwE=.7d6061b5-4a3d-4d3f-aaa4-1c30151d066d@github.com> Message-ID: On Fri, 5 Jan 2024 08:35:46 GMT, Emanuel Peter wrote: >> Also why your assembler example have tested alignment twice for the same address? 
May be because the same array's element for load and store?: >> >> 0x00007f83c8bb2f6d: mov %r10,%r8 >> 0x00007f83c8bb2f70: test $0x7,%r8b >> 0x00007f83c8bb2f74: je 0x00007f83c8bb2f8a >> ... >> 0x00007f83c8bb2f8a: test $0x7,%r10b >> 0x00007f83c8bb2f8e: je 0x00007f83c8bb2fa4 >> >> No need to optimize I think since it is only for debugging. > > @vnkozlov >> Can you show assembler code for simple load and store instructions (move data from one array to another)? > > Here the example with simple load -> store with two different arrays: > > public class Test { > static int RANGE = 1024*64; > > public static void main(String[] strArr) { > int a[] = new int[RANGE]; > int b[] = new int[RANGE]; > test0(a, b); > } > > static void test0(int[] a, int[] b) { > for (int i = 0; i < RANGE; i++) { > a[i] = b[i]; > } > } > } > > > With `-XX:+VerifyAlignVector`: > `./java -XX:CompileCommand=compileonly,Test::test* -XX:+TraceSuperWord -Xcomp -XX:+PrintIdeal -XX:+AlignVector -XX:+VerifyAlignVector -XX:CompileCommand=print,Test::test* Test.java` > > > ;; B32: # out( B32 B33 ) <- in( B31 B32 ) Loop( B32-B32 inner post of N1028) Freq: 4.49976 > 0x00007fbef8bb31ec: movslq %ebx,%r10 > 0x00007fbef8bb31ef: shl $0x2,%r10 ;*iaload {reexecute=0 rethrow=0 return_oop=0} > ; - Test::test0 at 13 (line 12) > 0x00007fbef8bb31f3: lea 0x10(%r13,%r10,1),%r8 > 0x00007fbef8bb31f8: lea 0x10(%r11,%r10,1),%r10 > 0x00007fbef8bb31fd: test $0x7,%r8b > 0x00007fbef8bb3201: je 0x00007fbef8bb3217 > 0x00007fbef8bb3203: movabs $0x7fbf08c15fc8,%rdi ; {external_word} > 0x00007fbef8bb320d: and $0xfffffffffffffff0,%rsp > 0x00007fbef8bb3211: callq 0x00007fbf085d3162 ; {runtime_call MacroAssembler::debug64(char*, long, long*)} > 0x00007fbef8bb3216: hlt > 0x00007fbef8bb3217: vmovdqu32 (%r8),%zmm0 > 0x00007fbef8bb321d: test $0x7,%r10b > 0x00007fbef8bb3221: je 0x00007fbef8bb3237 > 0x00007fbef8bb3223: movabs $0x7fbf08c15fc8,%rdi ; {external_word} > 0x00007fbef8bb322d: and $0xfffffffffffffff0,%rsp > 0x00007fbef8bb3231: callq 
0x00007fbf085d3162 ; {runtime_call MacroAssembler::debug64(char*, long, long*)}
> 0x00007fbef8bb3236: hlt
> 0x00007fbef8bb3237: vmovdqu32 %zmm0,(%r10) ;*iastore {reexecute=0 rethrow=0 return_oop=0}
>                                            ; - Test::test0 at 14 (line 12)
> 0x00007fbef8bb323d: add $0x10,%ebx ;*iinc {reexecute=0 rethrow=0 return_oop=0}
>                                    ; - Test::test0 at 15 (line 11)
> 0x00007fbef8bb3240: cmp %r9d,%ebx
> 0x00007fbef8bb3243: jl 0x00007fbef8bb31ec ;*if_icmpge {reexecute=0 rethrow...

> My concern is that LoadV and StoreV are defined only with memory input

// Indirect Memory Operand
operand indirect(any_RegP reg)
%{
  constraint(ALLOC_IN_RC(ptr_reg));
  match(reg);

  format %{ "[$reg]" %}
  interface(MEMORY_INTER) %{
    base($reg);
    index(0x4);
    scale(0x0);
    disp(0x0);
  %}
%}

opclass memory(indirect, indOffset8, indOffset32, indIndexOffset, indIndex,
               indIndexScale, indPosIndexScale, indIndexScaleOffset, indPosIndexOffset, indPosIndexScaleOffset,
               indCompressedOopOffset,
               indirectNarrow, indOffset8Narrow, indOffset32Narrow,
               indIndexOffsetNarrow, indIndexNarrow, indIndexScaleNarrow,
               indIndexScaleOffsetNarrow, indPosIndexOffsetNarrow, indPosIndexScaleOffsetNarrow);

It seems that `memory` summarizes many different patterns. One of them is the `indirect` one, which simply loads the address from a register. In our case this address was computed by a `lea`, then used in the alignment verification, and then passed on as `memory` to load / store.
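
For readers following the listing: the `test $0x7,%r8b` / `je` pair checks that the low three bits of the computed address are zero, i.e. that the address is a multiple of 8. A minimal Java model of that check is sketched below. The class name, the helper `isAligned`, and the base address are hypothetical and only serve the illustration; the `0x10` displacement and the 4-byte element scale are taken from the `lea` and `shl $0x2` instructions in the listing.

```java
class AlignCheckSketch {
    // Hypothetical helper: true if 'address' is a multiple of 'alignment'.
    // 'alignment' must be a power of two, so (alignment - 1) is a low-bit mask,
    // just like the immediate 0x7 in "test $0x7, reg".
    static boolean isAligned(long address, long alignment) {
        return (address & (alignment - 1)) == 0;
    }

    public static void main(String[] args) {
        long base = 0x1000L;   // assumed array base address, for illustration only
        long header = 0x10L;   // displacement used by the lea in the listing
        for (int i = 0; i < 64; i += 16) {   // the vector loop advances by 16 ints
            long addr = base + header + 4L * i;
            System.out.println("0x" + Long.toHexString(addr)
                    + " 8-byte aligned: " + isAligned(addr, 8));
        }
    }
}
```

In the generated code the failing case does not return a boolean; it falls through to the `debug64` call and `hlt` shown above.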
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14785#discussion_r1442625027 From epeter at openjdk.org Fri Jan 5 08:51:38 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 5 Jan 2024 08:51:38 GMT Subject: RFR: 8310190: C2 SuperWord: AlignVector is broken, generates misaligned packs [v58] In-Reply-To: References: <9qbGMjWjQfwA5r6XimWz_GZUdQEgo6GEzY-uYwIoP4s=.9c68a4f4-a82b-4760-a428-3b5025eda61b@github.com> <8FqVnqkZB8_FsKx6kJfkJN8Oi90W6hAeBNl8uQy_QwE=.7d6061b5-4a3d-4d3f-aaa4-1c30151d066d@github.com> Message-ID: On Fri, 5 Jan 2024 08:47:09 GMT, Emanuel Peter wrote: >> @vnkozlov >>> Can you show assembler code for simple load and store instructions (move data from one array to another)? >> >> Here the example with simple load -> store with two different arrays: >> >> public class Test { >> static int RANGE = 1024*64; >> >> public static void main(String[] strArr) { >> int a[] = new int[RANGE]; >> int b[] = new int[RANGE]; >> test0(a, b); >> } >> >> static void test0(int[] a, int[] b) { >> for (int i = 0; i < RANGE; i++) { >> a[i] = b[i]; >> } >> } >> } >> >> >> With `-XX:+VerifyAlignVector`: >> `./java -XX:CompileCommand=compileonly,Test::test* -XX:+TraceSuperWord -Xcomp -XX:+PrintIdeal -XX:+AlignVector -XX:+VerifyAlignVector -XX:CompileCommand=print,Test::test* Test.java` >> >> >> ;; B32: # out( B32 B33 ) <- in( B31 B32 ) Loop( B32-B32 inner post of N1028) Freq: 4.49976 >> 0x00007fbef8bb31ec: movslq %ebx,%r10 >> 0x00007fbef8bb31ef: shl $0x2,%r10 ;*iaload {reexecute=0 rethrow=0 return_oop=0} >> ; - Test::test0 at 13 (line 12) >> 0x00007fbef8bb31f3: lea 0x10(%r13,%r10,1),%r8 >> 0x00007fbef8bb31f8: lea 0x10(%r11,%r10,1),%r10 >> 0x00007fbef8bb31fd: test $0x7,%r8b >> 0x00007fbef8bb3201: je 0x00007fbef8bb3217 >> 0x00007fbef8bb3203: movabs $0x7fbf08c15fc8,%rdi ; {external_word} >> 0x00007fbef8bb320d: and $0xfffffffffffffff0,%rsp >> 0x00007fbef8bb3211: callq 0x00007fbf085d3162 ; {runtime_call MacroAssembler::debug64(char*, long, long*)} >> 
0x00007fbef8bb3216: hlt >> 0x00007fbef8bb3217: vmovdqu32 (%r8),%zmm0 >> 0x00007fbef8bb321d: test $0x7,%r10b >> 0x00007fbef8bb3221: je 0x00007fbef8bb3237 >> 0x00007fbef8bb3223: movabs $0x7fbf08c15fc8,%rdi ; {external_word} >> 0x00007fbef8bb322d: and $0xfffffffffffffff0,%rsp >> 0x00007fbef8bb3231: callq 0x00007fbf085d3162 ; {runtime_call MacroAssembler::debug64(char*, long, long*)} >> 0x00007fbef8bb3236: hlt >> 0x00007fbef8bb3237: vmovdqu32 %zmm0,(%r10) ;*iastore {reexecute=0 rethrow=0 return_oop=0} >> ; - Test::test0 at 14 (line 12) >> 0x00007fbef8bb323d: add $0x10,%ebx ;*iinc {reexecute=0 rethrow=0 return_oop=0} >> ; - Test::test0 at 15 (line 11) >> 0x00007fbef8bb3240: cmp %r9... > >> My concern is that LoadV and StoreV are defined only with memory input > > > // Indirect Memory Operand > operand indirect(any_RegP reg) > %{ > constraint(ALLOC_IN_RC(ptr_reg)); > match(reg); > > format %{ "[$reg]" %} > interface(MEMORY_INTER) %{ > base($reg); > index(0x4); > scale(0x0); > disp(0x0); > %} > %} > > > > opclass memory(indirect, indOffset8, indOffset32, indIndexOffset, indIndex, > indIndexScale, indPosIndexScale, indIndexScaleOffset, indPosIndexOffset, indPosIndexScaleOffset, > indCompressedOopOffset, > indirectNarrow, indOffset8Narrow, indOffset32Narrow, > indIndexOffsetNarrow, indIndexNarrow, indIndexScaleNarrow, > indIndexScaleOffsetNarrow, indPosIndexOffsetNarrow, indPosIndexScaleOffsetNarrow); > > It seems that `memory` summarizes many different patterns. One of them is the `indirect` one, which simply loads the address from a register. In our case this address was computed by a `lea`, then used in the alignment verification, and then passed on as `memory` to load / store. > Also why your assembler example have tested alignment twice for the same address? May be because the same array's element for load and store? Yes, exactly. I emit verification for every loadV / storeV. 
And since it is debug only, and only with the extra flag `-XX:+VerifyAlignVector` I thought optimizing is not necessary. And it seems you agree with that :)

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/14785#discussion_r1442626289

From chagedorn at openjdk.org Fri Jan 5 08:52:28 2024
From: chagedorn at openjdk.org (Christian Hagedorn)
Date: Fri, 5 Jan 2024 08:52:28 GMT
Subject: RFR: 8319793: C2 compilation fails with "Bad graph detected in build_loop_late" after JDK-8279888 [v9]
In-Reply-To:
References:
Message-ID: <0467mugqEs3f6AxCG9krY6cAXXuqRJKptnwEaLJ0gIA=.ce7c02be-1d51-470f-b309-380f1de83f30@github.com>

On Wed, 3 Jan 2024 15:53:04 GMT, Roland Westrelin wrote:

>> Range check smearing and range check predication make an array access
>> dependent on 2 (or more in the case of RC smearing) conditions. As a
>> consequence, if a range check can be eliminated because there's an
>> identical dominating range check, the control dependent nodes that
>> could float and become dependent on the dominating range check cannot
>> be allowed to float because there's a risk that they would then bypass
>> one of the checks that make the access legal.
>>
>> `IfNode::dominated_by()` and `PhaseIdealLoop::dominated_by()` have
>> logic to prevent this: nodes that are control dependent on a range
>> check or predicate are not allowed to float. This is however not
>> sufficient as demonstrated by the test cases.
>> >> In `TestArrayAccessAboveRCAfterSmearingOrPredication.testRangeCheckSmearing()`: >> >> >> v += array[i]; >> if (flag2) { >> if (flag3) { >> field = 0x42; >> } >> } >> if (flagField == 1) { >> v += array[i]; >> } >> >> >> The range check for the second `array[i]` load is replaced by the >> dominating range check for the first `array[i]` but because the second >> `array[i]` load could really be dependent on multiple range checks (in >> case smearing happened which is not the case here), c2 doesn't allow >> the second `array[i]` to float when the second range check is >> removed. The second `array[i]` is then control dependent on: >> >> >> if (flagField == 1) { >> >> >> which is next found to be dominated by the same test: >> >> >> if (flag == 1) { >> >> >> and is removed. However nothing in `dominated_by()` treats node >> dependent on tests that are not range check or predicates >> specially. So the second `array[i]` is allowed to float and become >> dependent on: >> >> >> if (flag == 1) { >> >> >> which is above the range check for that access. The test method in its >> last invocation is passed an index for the array access that's widely >> out of range. The array load happens before the range check and >> crashes the VM. `testLoopPredication()` is a similar test where array >> loads become dependent on predicates and end up above range checks. >> >> `TestArrayAccessCastIIAboveRC.java` is the test case from the bug >> where for similar reasons a range check `CastII` ends up above its >> range check, becomes top because its input becomes some integer that >> conflicts with its... > > Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 11 commits: > > - Merge branch 'master' into JDK-8319793 > - review > - Revert "Update src/hotspot/share/opto/castnode.hpp" > > This reverts commit 356c91cca911ed486f9f87f3eff53ce21e1e3ec9. 
> - Revert "Update src/hotspot/share/opto/memnode.hpp" > > This reverts commit bdb731ea562f314f44d327f7243ef5cf9ad40b2e. > - review > - Update src/hotspot/share/opto/memnode.hpp > > Co-authored-by: Christian Hagedorn > - Update src/hotspot/share/opto/castnode.hpp > > Co-authored-by: Christian Hagedorn > - Update src/hotspot/share/opto/castnode.cpp > > Co-authored-by: Christian Hagedorn > - Update src/hotspot/share/opto/ifnode.cpp > > Co-authored-by: Christian Hagedorn > - Merge branch 'master' into JDK-8319793 > - ... and 1 more: https://git.openjdk.org/jdk/compare/15519285...dbe3c4c1 src/hotspot/share/opto/loopopts.cpp line 345: > 343: > 344: if (dp == nullptr) > 345: return; Since we bail out above if `iff->outcnt() != 2` (can it even be that we have an `If` at this point which does not have 2 out projections?) this bailout seems redundant. Looks like it was only added due to a parfait report with https://github.com/openjdk/jdk/commit/25c4a7fccdbdaa9da0a7aa5e04e80966138fe42c. Maybe we can remove that as well and change `proj_out_or_null()` back to `proj_out()` (not sure though if parfait will then report this again). But could also be done separately. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16886#discussion_r1442626740 From ddong at openjdk.org Fri Jan 5 08:57:33 2024 From: ddong at openjdk.org (Denghui Dong) Date: Fri, 5 Jan 2024 08:57:33 GMT Subject: RFR: 8322735: C2: minor improvements of bubble sort used in SuperWord::packset_sort [v2] In-Reply-To: References: Message-ID: <1W_HcoiwEnXLvtqprc7L8mghr0BRH3a_ITn6Cerzb_c=.f6c90e24-7d41-418d-bc50-41bff396663b@github.com> > A minor improvement could be made for bubble sort in SuperWord::packset_sort to reduce the comparison count in bad cases. 
> > See https://en.wikipedia.org/wiki/Bubble_sort Denghui Dong has updated the pull request incrementally with one additional commit since the last revision: update ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17190/files - new: https://git.openjdk.org/jdk/pull/17190/files/7d64bd8d..ba53ed56 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17190&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17190&range=00-01 Stats: 3 lines in 1 file changed: 0 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/17190.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17190/head:pull/17190 PR: https://git.openjdk.org/jdk/pull/17190 From ddong at openjdk.org Fri Jan 5 08:57:35 2024 From: ddong at openjdk.org (Denghui Dong) Date: Fri, 5 Jan 2024 08:57:35 GMT Subject: RFR: 8322735: C2: minor improvements of bubble sort used in SuperWord::packset_sort In-Reply-To: References: Message-ID: On Fri, 5 Jan 2024 08:22:23 GMT, Emanuel Peter wrote: > Please do the renaming, and then I can run testing and give you my approval. Updated. Thanks. ------------- PR Comment: https://git.openjdk.org/jdk/pull/17190#issuecomment-1878329186 From davleopo at openjdk.org Fri Jan 5 09:02:23 2024 From: davleopo at openjdk.org (David Leopoldseder) Date: Fri, 5 Jan 2024 09:02:23 GMT Subject: RFR: 8322636: [JVMCI] HotSpotSpeculationLog can be inconsistent across a single compile [v2] In-Reply-To: References: Message-ID: On Wed, 3 Jan 2024 20:39:21 GMT, Tom Rodriguez wrote: >> David Leopoldseder has updated the pull request incrementally with one additional commit since the last revision: >> >> 8322636: [JVMCI] HotSpotSpeculationLog add javadoc to maySpeculate > > So I looked more closely the HotSpot and substrate implementations and I'm not sure we can currently align the implementation and the javadoc. In the HotSpot world, HotSpotSpeculationLog is a compiler local object that reads data from the real speculation data that's kept in the MDO. 
This means that it has full control over when collectFailedSpeculations is called. SubstrateSpeculationLog is the actual log so if two threads are operating on the same log then one of them could see the effects of a call to collectFailedSpeculations by the other thread. Maybe in practice 2 threads never do this because it would mean they are compiling the same root method but it doesn't seem guaranteed. installCode on substrate also doesn't perform the speculation log check that HotSpot does. So maybe we punt on javadoc updates for now.

@tkrodriguez please sponsor

-------------

PR Comment: https://git.openjdk.org/jdk/pull/17183#issuecomment-1878335348

From roland at openjdk.org Fri Jan 5 09:17:27 2024
From: roland at openjdk.org (Roland Westrelin)
Date: Fri, 5 Jan 2024 09:17:27 GMT
Subject: RFR: 8319793: C2 compilation fails with "Bad graph detected in build_loop_late" after JDK-8279888 [v9]
In-Reply-To:
References:
Message-ID: <6tg6hV9e4ZDXHm5x22pLlKgs1hC6wLyIU6Jr14oJafY=.b20d08b1-1734-4136-a447-4c36aa92fb68@github.com>

On Thu, 4 Jan 2024 15:32:21 GMT, Emanuel Peter wrote:

> Generally, I'm not really happy with the name of `UnknownControl`. Sounds like the control is unknown. In what sense is it unknown, after all we have a control and want the Load to be pinned to it...?

`UnknownControl` was not added by this change.

> Maybe then we could rename `pin_for_array_access` -> `make_pinned`.

But `make_pinned` seems to imply that it operates on any node type when it only does something for a subset of nodes (those used for array accesses).
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/16886#discussion_r1442649362

From thartmann at openjdk.org Fri Jan 5 09:34:30 2024
From: thartmann at openjdk.org (Tobias Hartmann)
Date: Fri, 5 Jan 2024 09:34:30 GMT
Subject: RFR: 8323012: C2 fails with fatal error: no reachable node should have no use
Message-ID:

Incorrect refactoring from [JDK-8322490](https://bugs.openjdk.org/browse/JDK-8322490), probably a copy-paste error, leads to a dead `CastPP` and might have other not yet observed effects. See JBS for details.

I wasn't able to extract a regression test for this but it reliably reproduced with replay compilation.

Thanks,
Tobias

-------------

Commit messages:
 - 8323012: C2 fails with fatal error: no reachable node should have no use

Changes: https://git.openjdk.org/jdk/pull/17276/files
 Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=17276&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8323012
  Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/17276.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/17276/head:pull/17276

PR: https://git.openjdk.org/jdk/pull/17276

From roland at openjdk.org Fri Jan 5 09:48:27 2024
From: roland at openjdk.org (Roland Westrelin)
Date: Fri, 5 Jan 2024 09:48:27 GMT
Subject: RFR: 8319793: C2 compilation fails with "Bad graph detected in build_loop_late" after JDK-8279888 [v9]
In-Reply-To:
References:
Message-ID:

On Thu, 4 Jan 2024 16:03:16 GMT, Emanuel Peter wrote:

> Actually, the problem is that we **would** have multiple dependency, but we only have one dependency input we can set, hence forgetting about the others. Pinning makes sure that there is no bypassing of dependencies, right?

Right.
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/16886#discussion_r1442677055

From roland at openjdk.org Fri Jan 5 09:55:30 2024
From: roland at openjdk.org (Roland Westrelin)
Date: Fri, 5 Jan 2024 09:55:30 GMT
Subject: RFR: 8319793: C2 compilation fails with "Bad graph detected in build_loop_late" after JDK-8279888 [v9]
In-Reply-To:
References:
Message-ID:

On Thu, 4 Jan 2024 16:10:24 GMT, Emanuel Peter wrote:

>> src/hotspot/share/opto/ifnode.cpp line 1958:
>>
>>> 1956:     return nullptr;
>>> 1957:   }
>>> 1958:
>>
>> Why is it ok to delay this to post-loop-opts? Does it not prevent moving some CFG from being eliminated out of loops? Would be nice to have a little justification comment.
>
> Ah. Does this mean that if there are multiple RangeCheck in a loop, where some could be smeared, these are not smeared, and then we have more RangeChecks to eliminate out of the loop? Maybe in the end this all comes down to the same anyway. Just wondering.

> Why is it ok to delay this to post-loop-opts? Does it not prevent moving some CFG from being eliminated out of loops? Would be nice to have a little justification comment.

Maybe. With this fix, range check smearing requires pinning nodes. So running it early also has a drawback: it can cause nodes that would otherwise float to be pinned. The way I see it, range check smearing is a local optimization for cases where range checks can't be eliminated some other way so running it late should not make a difference. If the range check is in a loop and predication removes it then running RC smearing early doesn't make a difference. If the range check is part of a range check sequence that can only be optimized by RC smearing then having a longer range check sequence for the duration of loop opts probably makes no difference.
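
As a rough source-level picture of what range check smearing means here, the sketch below contrasts three accesses that each carry their own bounds check with a version guarded by a single widened test covering `i .. i+2`. This is only a conceptual model under my own assumptions, not what C2 emits: the class and method names are hypothetical, and the fallback call merely stands in for the slow path that the compiled code would take when the widened test fails.

```java
class RangeCheckSmearingSketch {
    // Three accesses, each with its own implicit bounds check.
    static int sumPerAccessChecks(int[] a, int i) {
        return a[i] + a[i + 1] + a[i + 2];
    }

    // Conceptual "smeared" form: one widened test guards all three accesses.
    // Each access below now relies on a check that covers more than its own index,
    // which is why the discussion above cares about keeping such accesses
    // below the checks they depend on.
    static int sumWidenedCheck(int[] a, int i) {
        if (i < 0 || i > a.length - 3) {
            // Stand-in for the slow path: redo the work with per-access checks,
            // so out-of-range indices still throw as usual.
            return sumPerAccessChecks(a, i);
        }
        return a[i] + a[i + 1] + a[i + 2];
    }

    public static void main(String[] args) {
        int[] a = {1, 2, 3, 4, 5};
        System.out.println(sumWidenedCheck(a, 2)); // 3 + 4 + 5 = 12
        try {
            sumWidenedCheck(a, 3);
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("index 3 is rejected, same as with per-access checks");
        }
    }
}
```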
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/16886#discussion_r1442683646

From roland at openjdk.org Fri Jan 5 10:00:37 2024
From: roland at openjdk.org (Roland Westrelin)
Date: Fri, 5 Jan 2024 10:00:37 GMT
Subject: RFR: 8319793: C2 compilation fails with "Bad graph detected in build_loop_late" after JDK-8279888 [v9]
In-Reply-To:
References:
Message-ID: <6uOq3OJeUPIG2SMHYTKnIA-GHPTIQTobNmvCuKrFNUM=.3e37dadd-2982-423b-86bc-bed54366068a@github.com>

On Thu, 4 Jan 2024 16:18:16 GMT, Emanuel Peter wrote:

>> Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 11 commits:
>>
>> - Merge branch 'master' into JDK-8319793
>> - review
>> - Revert "Update src/hotspot/share/opto/castnode.hpp"
>>
>>   This reverts commit 356c91cca911ed486f9f87f3eff53ce21e1e3ec9.
>> - Revert "Update src/hotspot/share/opto/memnode.hpp"
>>
>>   This reverts commit bdb731ea562f314f44d327f7243ef5cf9ad40b2e.
>> - review
>> - Update src/hotspot/share/opto/memnode.hpp
>>
>>   Co-authored-by: Christian Hagedorn
>> - Update src/hotspot/share/opto/castnode.hpp
>>
>>   Co-authored-by: Christian Hagedorn
>> - Update src/hotspot/share/opto/castnode.cpp
>>
>>   Co-authored-by: Christian Hagedorn
>> - Update src/hotspot/share/opto/ifnode.cpp
>>
>>   Co-authored-by: Christian Hagedorn
>> - Merge branch 'master' into JDK-8319793
>> - ... and 1 more: https://git.openjdk.org/jdk/compare/15519285...dbe3c4c1
>
> src/hotspot/share/opto/loopopts.cpp line 361:
>
>> 359:       return; // Let IGVN transformation change control dependence.
>> 360:     }
>> 361:
>
> Why is it ok to remove this bailout?

It's: "IfNode::dominated_by() and PhaseIdealLoop::dominated_by() have logic to prevent this: nodes that are control dependent on a range check or predicate are not allowed to float." that I mentioned in the fix description. It's the way array access nodes are currently prevented from floating above the range checks they depend on.
It's flawed, replaced by pinning of the array access nodes in the patch. So this logic is no longer useful. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16886#discussion_r1442687996 From roland at openjdk.org Fri Jan 5 10:03:32 2024 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 5 Jan 2024 10:03:32 GMT Subject: RFR: 8319793: C2 compilation fails with "Bad graph detected in build_loop_late" after JDK-8279888 [v9] In-Reply-To: <0467mugqEs3f6AxCG9krY6cAXXuqRJKptnwEaLJ0gIA=.ce7c02be-1d51-470f-b309-380f1de83f30@github.com> References: <0467mugqEs3f6AxCG9krY6cAXXuqRJKptnwEaLJ0gIA=.ce7c02be-1d51-470f-b309-380f1de83f30@github.com> Message-ID: <01kR0e8YhFsKsSlClNnbE2A4IDAeJn1q2Xxs3gNxGcU=.0cad5e28-7f54-4f76-b386-56788c94e932@github.com> On Fri, 5 Jan 2024 08:49:12 GMT, Christian Hagedorn wrote: >> Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 11 commits: >> >> - Merge branch 'master' into JDK-8319793 >> - review >> - Revert "Update src/hotspot/share/opto/castnode.hpp" >> >> This reverts commit 356c91cca911ed486f9f87f3eff53ce21e1e3ec9. >> - Revert "Update src/hotspot/share/opto/memnode.hpp" >> >> This reverts commit bdb731ea562f314f44d327f7243ef5cf9ad40b2e. >> - review >> - Update src/hotspot/share/opto/memnode.hpp >> >> Co-authored-by: Christian Hagedorn >> - Update src/hotspot/share/opto/castnode.hpp >> >> Co-authored-by: Christian Hagedorn >> - Update src/hotspot/share/opto/castnode.cpp >> >> Co-authored-by: Christian Hagedorn >> - Update src/hotspot/share/opto/ifnode.cpp >> >> Co-authored-by: Christian Hagedorn >> - Merge branch 'master' into JDK-8319793 >> - ... 
and 1 more: https://git.openjdk.org/jdk/compare/15519285...dbe3c4c1
>
> src/hotspot/share/opto/loopopts.cpp line 345:
>
>> 343:
>> 344:   if (dp == nullptr)
>> 345:     return;
>
> Since we bail out above if `iff->outcnt() != 2` (can it even be that we have an `If` at this point which does not have 2 out projections?) this bailout seems redundant. Looks like it was only added due to a parfait report with https://github.com/openjdk/jdk/commit/25c4a7fccdbdaa9da0a7aa5e04e80966138fe42c. Maybe we can remove that as well and change `proj_out_or_null()` back to `proj_out()` (not sure though if parfait will then report this again). But could also be done separately.

Thanks for the details. Why would it have been necessary before but no longer necessary now? What is it that has changed so parfait would not complain?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/16886#discussion_r1442690872

From epeter at openjdk.org Fri Jan 5 10:05:22 2024
From: epeter at openjdk.org (Emanuel Peter)
Date: Fri, 5 Jan 2024 10:05:22 GMT
Subject: RFR: 8322768: Optimize non-subword vector compress and expand APIs for AVX2 target. [v3]
In-Reply-To:
References: <_udOCEVG86x9V_WvYqFTaYnvmXdiZ7LzqxzR-D_ygYs=.db3ed37e-a8fa-413b-83c6-a785aba072ff@github.com>
Message-ID:

On Fri, 5 Jan 2024 07:08:35 GMT, Jatin Bhateja wrote:

>> Hi,
>>
>> Patch optimizes non-subword vector compress and expand APIs for x86 AVX2 only targets.
>> Upcoming E-core Xeons (Sierra Forest) and Hybrid CPUs only support AVX2 instruction set.
>> These are very frequently used APIs in columnar database filter operation.
>>
>> Implementation uses a lookup table to record permute indices. Table index is computed using
>> mask argument of compress/expand operation.
>>
>> Following are the performance numbers of the JMH micro included with the patch.
>>
>>
>> System : Intel(R) Xeon(R) Platinum 8480+ (Sapphire Rapids)
>>
>> Baseline:
>> Benchmark                                 (size)   Mode  Cnt     Score  Error   Units
>> ColumnFilterBenchmark.filterDoubleColumn    1024  thrpt    2   142.767         ops/ms
>> ColumnFilterBenchmark.filterDoubleColumn    2047  thrpt    2    71.436         ops/ms
>> ColumnFilterBenchmark.filterDoubleColumn    4096  thrpt    2    35.992         ops/ms
>> ColumnFilterBenchmark.filterFloatColumn     1024  thrpt    2   182.151         ops/ms
>> ColumnFilterBenchmark.filterFloatColumn     2047  thrpt    2    91.096         ops/ms
>> ColumnFilterBenchmark.filterFloatColumn     4096  thrpt    2    44.757         ops/ms
>> ColumnFilterBenchmark.filterIntColumn       1024  thrpt    2   184.099         ops/ms
>> ColumnFilterBenchmark.filterIntColumn       2047  thrpt    2    91.981         ops/ms
>> ColumnFilterBenchmark.filterIntColumn       4096  thrpt    2    45.170         ops/ms
>> ColumnFilterBenchmark.filterLongColumn      1024  thrpt    2   148.017         ops/ms
>> ColumnFilterBenchmark.filterLongColumn      2047  thrpt    2    73.516         ops/ms
>> ColumnFilterBenchmark.filterLongColumn      4096  thrpt    2    36.844         ops/ms
>>
>> Withopt:
>> Benchmark                                 (size)   Mode  Cnt     Score  Error   Units
>> ColumnFilterBenchmark.filterDoubleColumn    1024  thrpt    2  2051.707         ops/ms
>> ColumnFilterBenchmark.filterDoubleColumn    2047  thrpt    2   914.072         ops/ms
>> ColumnFilterBenchmark.filterDoubleColumn    4096  thrpt    2   489.898         ops/ms
>> ColumnFilterBenchmark.filterFloatColumn     1024  thrpt    2  5324.195         ops/ms
>> ColumnFilterBenchmark.filterFloatColumn     2047  thrpt    2  2587.229         ops/ms
>> ColumnFilterBenchmark.filterFloatColumn     4096  thrpt    2  1278.665         ops/ms
>> ColumnFilterBenchmark.filterIntColumn       1024  thrpt    2  4149.384         ops/ms
>> ColumnFilterBenchmark.filterIntColumn       2047  thrpt ...
>
> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:
>
>   Review comments resolutions.

Thanks for the updates!

One more idea: Your AVX2 solution has a lot of cost for converting the mask to a permutation. Might it make sense to split this off into a separate vector-node, so that it can float out of a loop if the mask is invariant?
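
To make the mask-to-permutation step concrete, here is a minimal scalar Java model of the lookup table that the quoted PR description mentions, for the 8-int case: one row per 8-bit mask value, each row listing the indices of the set mask bits in ascending order and padded with -1. The class and method names are hypothetical and this is only a sketch of the idea, not the stub code from the patch; zero-filling the unused lanes follows the Vector API `compress` semantics.

```java
import java.util.Arrays;

class CompressPermuteTableSketch {
    // Builds 256 rows of 8 ints. Row 'mask' holds the lane indices of the set bits
    // of 'mask', packed to the front; unused slots hold -1.
    // With 4-byte ints a row is 32 bytes, so a row's byte offset is (mask << 5).
    static int[][] buildCompressTable() {
        int[][] table = new int[256][8];
        for (int mask = 0; mask < 256; mask++) {
            int ctr = 0;
            for (int j = 0; j < 8; j++) {
                if ((mask & (1 << j)) != 0) {
                    table[mask][ctr++] = j;
                }
            }
            while (ctr < 8) {
                table[mask][ctr++] = -1;
            }
        }
        return table;
    }

    // Scalar model of an 8-lane compress: selected lanes move to the low lanes,
    // the remaining lanes are zero.
    static int[] compress(int[] src, int mask, int[][] table) {
        int[] perm = table[mask & 0xFF];
        int[] dst = new int[8];
        for (int k = 0; k < 8; k++) {
            dst[k] = perm[k] >= 0 ? src[perm[k]] : 0;
        }
        return dst;
    }

    public static void main(String[] args) {
        int[][] table = buildCompressTable();
        int[] src = {10, 11, 12, 13, 14, 15, 16, 17};
        // Bits 0, 2, 5 and 7 are set in 0b10100101.
        System.out.println(Arrays.toString(compress(src, 0b10100101, table)));
        // prints [10, 12, 15, 17, 0, 0, 0, 0]
    }
}
```

If the mask is loop-invariant, the row lookup in this model depends only on `mask`, which is the part the suggestion above would like to see hoisted out of the loop.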
src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 963:

> 961:   // or a -1 (default) value.
> 962:   for (int i = 0; i < 256; i++) {
> 963:     int tmp = i;

Why is `tmp` needed? Would it not be better to replace `i` with `mask` (i.e. the bit pattern that is then translated to a permutation)?

src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 966:

> 964:     int ctr = 0;
> 965:     for (int j = 0; j < 8; j++) {
> 966:       if (tmp & (1 << j)) {

Suggestion:

      if (mask & (1 << j)) {

would be much more readable

-------------

Changes requested by epeter (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/17261#pullrequestreview-1805616736
PR Review Comment: https://git.openjdk.org/jdk/pull/17261#discussion_r1442664755
PR Review Comment: https://git.openjdk.org/jdk/pull/17261#discussion_r1442668939

From epeter at openjdk.org Fri Jan 5 10:05:23 2024
From: epeter at openjdk.org (Emanuel Peter)
Date: Fri, 5 Jan 2024 10:05:23 GMT
Subject: RFR: 8322768: Optimize non-subword vector compress and expand APIs for AVX2 target. [v2]
In-Reply-To:
References: <_udOCEVG86x9V_WvYqFTaYnvmXdiZ7LzqxzR-D_ygYs=.db3ed37e-a8fa-413b-83c6-a785aba072ff@github.com>
Message-ID:

On Fri, 5 Jan 2024 07:03:34 GMT, Jatin Bhateja wrote:

>> src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 5307:
>>
>>> 5305:   assert(bt == T_LONG || bt == T_DOUBLE, "");
>>> 5306:   vmovmskpd(rtmp, mask, vec_enc);
>>> 5307:   shlq(rtmp, 5);
>>
>> Might this need to be 6? If I understand right, then you want to have a 64bit stride, hence 2^6, right?
>> If that is correct, then this did not show in your tests, and you need a regression test anyway.
>
> This computes the byte offset from start of the table, both integer and long permute table have same row sizes, 8 int elements vs 4 long elements.

Ah, I understand now. Maybe leave a comment for that?
>> test/micro/org/openjdk/bench/jdk/incubator/vector/ColumnFilterBenchmark.java line 76: >> >>> 74: longinCol = new long[size]; >>> 75: longoutCol = new long[size]; >>> 76: lpivot = size / 2; >> >> I'd be interested to see what happens if you move up or down the "density" of elements that you accept. Would the simple branch prediction be faster if the density is low enough, i.e. we almost take no element. >> >> Though maybe that is not compiler problem but a user-problem? > > Included fuzzy filter micro with varying mask density. > ![image](https://github.com/openjdk/jdk/assets/59989778/a6af21cc-36c0-4503-aeb3-e66b862da2e1) You are using `VectorMask pred = VectorMask.fromLong(ispecies, maskctr++);`. That basically systematically iterates over all masks, which is nice for a correctness test. But that would use different density inside one test run, right? The average over the loop is still at `50%`, correct? I was thinking more a run where the percentage over the whole loop is lower than maybe `1%`. That would get us to a point where maybe the branch prediction of non-vectorized code might be faster, what do you think? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17261#discussion_r1442670411 PR Review Comment: https://git.openjdk.org/jdk/pull/17261#discussion_r1442676633 From epeter at openjdk.org Fri Jan 5 10:05:24 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 5 Jan 2024 10:05:24 GMT Subject: RFR: 8322768: Optimize non-subword vector compress and expand APIs for AVX2 target. [v2] In-Reply-To: References: <_udOCEVG86x9V_WvYqFTaYnvmXdiZ7LzqxzR-D_ygYs=.db3ed37e-a8fa-413b-83c6-a785aba072ff@github.com> Message-ID: On Fri, 5 Jan 2024 09:37:55 GMT, Emanuel Peter wrote: >> This computes the byte offset from start of the table, both integer and long permute table have same row sizes, 8 int elements vs 4 long elements. > > Ah, I understand now. Maybe leave a comment for that? 
I would say something like this:

Given a `mask`, we compute the index into the permutation table, and load the corresponding `permutation` (4 long elements).

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/17261#discussion_r1442688495

From epeter at openjdk.org Fri Jan 5 10:05:27 2024
From: epeter at openjdk.org (Emanuel Peter)
Date: Fri, 5 Jan 2024 10:05:27 GMT
Subject: RFR: 8322768: Optimize non-subword vector compress and expand APIs for AVX2 target. [v2]
In-Reply-To:
References: <_udOCEVG86x9V_WvYqFTaYnvmXdiZ7LzqxzR-D_ygYs=.db3ed37e-a8fa-413b-83c6-a785aba072ff@github.com>
Message-ID:

On Thu, 4 Jan 2024 13:40:19 GMT, Emanuel Peter wrote:

>> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   Updating copyright year of modified files.
>
> src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 957:
>
>> 955:   __ align(CodeEntryAlignment);
>> 956:   StubCodeMark mark(this, "StubRoutines", stub_name);
>> 957:   address start = __ pc();
>
> Could you please add some comments here why you are filling the data like this?
> Presumably, you are emitting 32 bits and 64 bits respectively, right? So the cells have different size, correct?

Thanks for the comment addition!

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/17261#discussion_r1442665042

From epeter at openjdk.org Fri Jan 5 10:05:28 2024
From: epeter at openjdk.org (Emanuel Peter)
Date: Fri, 5 Jan 2024 10:05:28 GMT
Subject: RFR: 8322768: Optimize non-subword vector compress and expand APIs for AVX2 target.
[v2] In-Reply-To: References: <_udOCEVG86x9V_WvYqFTaYnvmXdiZ7LzqxzR-D_ygYs=.db3ed37e-a8fa-413b-83c6-a785aba072ff@github.com> <2ykpNJfFJYotuyI59zfh966TIQYUF2Id6NR56zpq_Vw=.6b3b5411-e086-45df-8666-2496ac013548@github.com> Message-ID: <8arXva3XJTvJpbElEu8ubw6SF58TL2hVlAgoJFZ3_6s=.c6bd79f0-ecd1-4d26-8294-40f8e99bf59c@github.com> On Fri, 5 Jan 2024 07:05:51 GMT, Jatin Bhateja wrote: >> We do have extensive functional tests for compress/expand APIs in [test/jdk/jdk/incubator/vector](https://github.com/openjdk/jdk/tree/master/test/jdk/jdk/incubator/vector) > >> Could there be equivalent `expand` tests? > > Here are the performance number for existing [VectorAPI JMH micros.](https://github.com/openjdk/panama-vector/tree/vectorIntrinsics/test/micro/org/openjdk/bench/jdk/incubator/vector/operation) > > ![image](https://github.com/openjdk/jdk/assets/59989778/4b260814-3d3c-4e9b-b81a-61492ea48cce) > ![image](https://github.com/openjdk/jdk/assets/59989778/50048281-ad50-44f6-a875-308e02537be2) Ah, excellent. Thanks for the numbers! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17261#discussion_r1442673797 From epeter at openjdk.org Fri Jan 5 10:05:28 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 5 Jan 2024 10:05:28 GMT Subject: RFR: 8322768: Optimize non-subword vector compress and expand APIs for AVX2 target. [v2] In-Reply-To: References: <_udOCEVG86x9V_WvYqFTaYnvmXdiZ7LzqxzR-D_ygYs=.db3ed37e-a8fa-413b-83c6-a785aba072ff@github.com> Message-ID: On Fri, 5 Jan 2024 09:31:50 GMT, Emanuel Peter wrote: >> src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 957: >> >>> 955: __ align(CodeEntryAlignment); >>> 956: StubCodeMark mark(this, "StubRoutines", stub_name); >>> 957: address start = __ pc(); >> >> Could you please add some comments here why you are filling the data like this? >> Presumably, you are emitting 32 bits and 64 bits respectively, right? So the cells have different size, correct? > > Thanks for the comment addition! 
Improvement suggestion: For a vector with 8 ints, we get `2^8 = 256` many bit patterns for the mask. The table has a row for each `mask` value, consisting of 8 ints, which provide the valid permute index corresponding to set bit position in the `mask`, or a -1 (default) value. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17261#discussion_r1442668440 From epeter at openjdk.org Fri Jan 5 10:16:21 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 5 Jan 2024 10:16:21 GMT Subject: RFR: 8322735: C2: minor improvements of bubble sort used in SuperWord::packset_sort In-Reply-To: References: Message-ID: On Fri, 5 Jan 2024 08:54:36 GMT, Denghui Dong wrote: >> Ok. This change is fine with me. Thanks for taking the time to look into this :) >> >> I was just curious what was your motivation. I may completely redo this code once I remove the alignment constraints (here used for sorting), but that will have to be decided in a few months. >> >> Please do the renaming, and then I can run testing and give you my approval. > >> Please do the renaming, and then I can run testing and give you my approval. > > Updated. Thanks. @D-D-H nice. Testing running. ------------- PR Comment: https://git.openjdk.org/jdk/pull/17190#issuecomment-1878430341 From aph at openjdk.org Fri Jan 5 10:19:22 2024 From: aph at openjdk.org (Andrew Haley) Date: Fri, 5 Jan 2024 10:19:22 GMT Subject: RFR: 8320649: C2: Optimize scoped values [v4] In-Reply-To: References: Message-ID: On Thu, 4 Jan 2024 08:12:22 GMT, Roland Westrelin wrote: >> This change implements C2 optimizations for calls to >> ScopedValue.get(). Indeed, in: >> >> >> v1 = scopedValue.get(); >> ... >> v2 = scopedValue.get(); >> >> >> `v2` can be replaced by `v1` and the second call to `get()` can be >> optimized out. That's true whatever is between the 2 calls unless a >> new mapping for `scopedValue` is created in between (when that happens >> no optimizations is performed for the method being compiled). 
Hoisting >> a `get()` call out of loop for a loop invariant `scopedValue` should >> also be legal in most cases. >> >> `ScopedValue.get()` is implemented in java code as a 2 step process. A >> cache is attached to the current thread object. If the `ScopedValue` >> object is in the cache then the result from `get()` is read from >> there. Otherwise a slow call is performed that also inserts the >> mapping in the cache. The cache itself is lazily allocated. One >> `ScopedValue` can be hashed to 2 different indexes in the cache. On a >> cache probe, both indexes are checked. As a consequence, the process >> of probing the cache is a multi step process (check if the cache is >> present, check first index, check second index if first index >> failed). If the cache is populated early on, then when the method that >> calls `ScopedValue.get()` is compiled, profile reports the slow path >> as never taken and only the read from the cache is compiled. >> >> To perform the optimizations, I added 3 new node types to C2: >> >> - the pair >> ScopedValueGetHitsInCacheNode/ScopedValueGetLoadFromCacheNode for >> the cache probe >> >> - a cfg node ScopedValueGetResultNode to help locate the result of the >> `get()` call in the IR graph. >> >> In pseudo code, once the nodes are inserted, the code of a `get()` is: >> >> >> hits_in_the_cache = ScopedValueGetHitsInCache(scopedValue) >> if (hits_in_the_cache) { >> res = ScopedValueGetLoadFromCache(hits_in_the_cache); >> } else { >> res = ..; //slow call possibly inlined. Subgraph can be arbitray complex >> } >> res = ScopedValueGetResult(res) >> >> >> In the snippet: >> >> >> v1 = scopedValue.get(); >> ... >> v2 = scopedValue.get(); >> >> >> Replacing `v2` by `v1` is then done by starting from the >> `ScopedValueGetResult` node for the second `get()` and looking for a >> dominating `ScopedValueGetResult` for the same `ScopedValue` >> object. When one is found, it is used as a replacement. 
Eliminating >> the second `get()` call is achieved by making >> `ScopedValueGetHitsInCache` always successful if there's a dominating >> `Scoped... > > Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: > > merge fix A couple of answers: > I'm not a C2 expert, so my high-level comments might not all make sense, but here goes. > > * I would expect multiple get() calls in the same method or in loops to be rare and/or programmer errors. The real advantage seems to be when the binding and the get() can be connected from caller to callee through inlining. Binding and get() are usually separated by a long way. It's a common pattern to use get() inside a loop when a ScopedValue is used to hold a capability object which is private within a library context. > * Are we able to optimize a get() on a constant/final ScopedValue into a simple array load at a constant offset? Maybe I'm misunderstanding this question, but that's what the scoped value cache does. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16966#issuecomment-1878433272 From chagedorn at openjdk.org Fri Jan 5 10:37:30 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 5 Jan 2024 10:37:30 GMT Subject: RFR: 8319793: C2 compilation fails with "Bad graph detected in build_loop_late" after JDK-8279888 [v9] In-Reply-To: <01kR0e8YhFsKsSlClNnbE2A4IDAeJn1q2Xxs3gNxGcU=.0cad5e28-7f54-4f76-b386-56788c94e932@github.com> References: <0467mugqEs3f6AxCG9krY6cAXXuqRJKptnwEaLJ0gIA=.ce7c02be-1d51-470f-b309-380f1de83f30@github.com> <01kR0e8YhFsKsSlClNnbE2A4IDAeJn1q2Xxs3gNxGcU=.0cad5e28-7f54-4f76-b386-56788c94e932@github.com> Message-ID: On Fri, 5 Jan 2024 10:00:57 GMT, Roland Westrelin wrote: >> src/hotspot/share/opto/loopopts.cpp line 345: >> >>> 343: >>> 344: if (dp == nullptr) >>> 345: return; >> >> Since we bail out above if `iff->outcnt() != 2` (can it even be that we have an `If` at this point which does not have 2 out projections?) 
this bailout seems redundant. Looks like it was only added due to a parfait report with https://github.com/openjdk/jdk/commit/25c4a7fccdbdaa9da0a7aa5e04e80966138fe42c. Maybe we can remove that as well and change `proj_out_or_null()` back to `proj_out()` (not sure though if parfait will then report this again). But could also be done separately. > > Thanks for the details. Why would it have been necessary before but no longer necessary now? What is it that has changed so parfait would not complain? Unfortunately, the report details are no longer available today. I think the fix back then should have been to treat it as a false positive, since it cannot happen that `dp` is null - even though parfait fails to prove that (it probably still cannot). I'm not exactly sure how parfait works and how we could ensure that it will not complain about this again but maybe adding an assert that `dp` is not null would help. Anyway, this should not block this PR and might be better handled separately. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16886#discussion_r1442720560 From chagedorn at openjdk.org Fri Jan 5 10:39:21 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 5 Jan 2024 10:39:21 GMT Subject: RFR: 8323012: C2 fails with fatal error: no reachable node should have no use In-Reply-To: References: Message-ID: On Fri, 5 Jan 2024 09:29:15 GMT, Tobias Hartmann wrote: > Incorrect refactoring from [JDK-8322490](https://bugs.openjdk.org/browse/JDK-8322490), probably a copy-paste error, leads to a dead `CastPP` and might have other not yet observed effects. See JBS for details. > > I wasn't able to extract a regression test for this but it reliably reproduced with replay compilation. > > Thanks, > Tobias Nice catch - that was hard to spot. Looks good and trivial. ------------- Marked as reviewed by chagedorn (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/17276#pullrequestreview-1805713175 From thartmann at openjdk.org Fri Jan 5 10:48:25 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Fri, 5 Jan 2024 10:48:25 GMT Subject: RFR: 8320310: CompiledMethod::has_monitors flag can be incorrect [v4] In-Reply-To: References: <4efExybeWDkEbcsckI1Qdz8kpYFqd-Rbmt7oiWz5qlo=.d8d38d0e-affa-48dc-b963-45f958041c4e@github.com> Message-ID: On Mon, 11 Dec 2023 18:38:55 GMT, Jorn Vernee wrote: >> Currently, the `CompiledMethod::has_monitors` flag is set when either a `monitorenter` is parsed by C1, and `monitorexit` is parsed by C1 or C2 during method compilation. However, not necessarily every bytecode of a method is parsed, which means that we could miss all `monitorenter`/`monitorexit` byte codes in a method, while it actually does use monitors. This can lead to situations where a thread holds a monitor, but `has_monitors` for all frames is set to `false`, leading to an assertion failure in 'freeze_internal' in continuationFreezeThaw.cpp: >> >> assert(monitors_on_stack(current) == ((current->held_monitor_count() - current->jni_monitor_count()) > 0), >> "Held monitor count and locks on stack invariant: " INT64_FORMAT " JNI: " INT64_FORMAT, (int64_t)current->held_monitor_count(), (int64_t)current->jni_monitor_count()); >> >> The proposed fix is to rely on `Method::has_monitor_bytecodes` to set the `has_monitors` flag when compiling, which is immune to issues where not all byte codes of a method are parsed during compilation. We can follow the pattern established for `has_reserved_stack_access`, which is similar. >> >> Note that this PR is based on: https://github.com/openjdk/jdk/pull/16416 which disables the assertion. The goal of this PR is to fix the issue, and then re-enable the assertion. 
>> >> Testing: Tier 1-4, `java/lang/Thread/virtual/stress/PinALot.java` > > Jorn Vernee has updated the pull request incrementally with one additional commit since the last revision: > > re-enable assert again Looks good to me. ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/16799#pullrequestreview-1805725501 From chagedorn at openjdk.org Fri Jan 5 10:58:41 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 5 Jan 2024 10:58:41 GMT Subject: RFR: 8310711: [IR Framework] Remove safepoint while printing handling [v2] In-Reply-To: References: Message-ID: <278BZfCuvI5xJSWh2PvZtONVaZ2QjxkWKj1NifCbFYE=.af0a47ab-63a6-47bd-953b-4b0756107227@github.com> > This clean-up PR removes the handling of the `` message in the IR framework. It is no longer required since we dump the output of `PrintIdeal` to the hotspot_pid file differently since [JDK-8306922](https://bugs.openjdk.org/browse/JDK-8306922). There is no interrupting `` message anymore. I removed the corresponding now unneeded code together with the previously added test case for it. > > Testing: tier1-4 > > Thanks, > Christian Christian Hagedorn has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains three additional commits since the last revision: - Update copyright year - Merge branch 'master' into JDK-8310711 - 8310711: [IR Framework] Remove safepoint while printing handling ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16921/files - new: https://git.openjdk.org/jdk/pull/16921/files/ed5ef1fd..38f00cc3 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16921&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16921&range=00-01 Stats: 121382 lines in 2594 files changed: 66365 ins; 45354 del; 9663 mod Patch: https://git.openjdk.org/jdk/pull/16921.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16921/head:pull/16921 PR: https://git.openjdk.org/jdk/pull/16921 From thartmann at openjdk.org Fri Jan 5 11:02:23 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Fri, 5 Jan 2024 11:02:23 GMT Subject: RFR: 8322077: Add Ideal transformation: (~a) | (~b) => ~(a & b) [v4] In-Reply-To: <5KKTYmY7dJ4nW0OEQ2UPuloIWOYc-pI9M8HRjoaRzw4=.f5eda6f9-c38c-4a2b-9690-cbb1791a2622@github.com> References: <5KKTYmY7dJ4nW0OEQ2UPuloIWOYc-pI9M8HRjoaRzw4=.f5eda6f9-c38c-4a2b-9690-cbb1791a2622@github.com> Message-ID: On Fri, 15 Dec 2023 23:35:56 GMT, Zhiqiang Zang wrote: >> Hello, >> >> (~a) | (~b) => ~(a & b) is a widely seen pattern, for example it is implemented for LLVM [here](https://github.com/llvm/llvm-project/blob/397f1ce9efb4eea1ee10fe4833f733b8c7abd878/llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp#L1617C28-L1617C28); however it is missing in current implementation of hotspot. This pull request adds this transformation and associated tests. >> >> Thanks. > > Zhiqiang Zang has updated the pull request incrementally with one additional commit since the last revision: > > untabify. Looks good to me. Please update the copyright dates. I submitted testing and will report back once it passed. 
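
For readers following the thread: the rewrite under review, `(~a) | (~b) => ~(a & b)`, is De Morgan's law applied bitwise. The following is a minimal Java sketch that checks the identity on a handful of `int` values; the class and method names are hypothetical and are not taken from the patch, which operates on C2 IR nodes rather than Java source.

```java
public class DeMorganSketch {
    // Shape before the rewrite: two negations feeding an OR.
    static int orOfNots(int a, int b) {
        return (~a) | (~b);
    }

    // Shape after the rewrite: one AND followed by a single negation.
    static int notOfAnd(int a, int b) {
        return ~(a & b);
    }

    public static void main(String[] args) {
        int[] samples = {0, 1, -1, 42, Integer.MIN_VALUE, Integer.MAX_VALUE, 0x55555555};
        for (int a : samples) {
            for (int b : samples) {
                if (orOfNots(a, b) != notOfAnd(a, b)) {
                    throw new AssertionError("mismatch for a=" + a + ", b=" + b);
                }
            }
        }
        System.out.println("identity holds on all sampled pairs");
    }
}
```

The rewritten form needs one fewer operation, which is presumably why the pattern is worth recognizing.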
------------- PR Review: https://git.openjdk.org/jdk/pull/16334#pullrequestreview-1805744288 From thartmann at openjdk.org Fri Jan 5 11:07:22 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Fri, 5 Jan 2024 11:07:22 GMT Subject: RFR: 8322589: Add Ideal transformation: (~a) & (~b) => ~(a | b) In-Reply-To: <-pwjKEB97C-bM068JQN0PY1hl65IzcuQZfHzRoKu92g=.d6118773-20d2-46cd-9284-5168c9334bb5@github.com> References: <-pwjKEB97C-bM068JQN0PY1hl65IzcuQZfHzRoKu92g=.d6118773-20d2-46cd-9284-5168c9334bb5@github.com> Message-ID: On Tue, 24 Oct 2023 04:49:20 GMT, Zhiqiang Zang wrote: > Hello, > > `(~a) & (~b) => ~(a | b)` is a widely seen pattern, for example it is implemented for LLVM [here](https://github.com/llvm/llvm-project/blob/397f1ce9efb4eea1ee10fe4833f733b8c7abd878/llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp#L1616C28-L1616C28); however it is missing in current implementation of hotspot. This pull request adds this transformation and associated tests. > > Thanks. I think dedicated methods like you used in https://github.com/openjdk/jdk/pull/16334 would be good. Please also update the copyright dates. ------------- PR Review: https://git.openjdk.org/jdk/pull/16333#pullrequestreview-1805750056 From thartmann at openjdk.org Fri Jan 5 11:16:30 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Fri, 5 Jan 2024 11:16:30 GMT Subject: RFR: 8323012: C2 fails with fatal error: no reachable node should have no use In-Reply-To: References: Message-ID: On Fri, 5 Jan 2024 09:29:15 GMT, Tobias Hartmann wrote: > Incorrect refactoring from [JDK-8322490](https://bugs.openjdk.org/browse/JDK-8322490), probably a copy-paste error, leads to a dead `CastPP` and might have other not yet observed effects. See JBS for details. > > I wasn't able to extract a regression test for this but it reliably reproduced with replay compilation. > > Thanks, > Tobias Thanks Christian! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/17276#issuecomment-1878501253 From thartmann at openjdk.org Fri Jan 5 11:16:31 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Fri, 5 Jan 2024 11:16:31 GMT Subject: Integrated: 8323012: C2 fails with fatal error: no reachable node should have no use In-Reply-To: References: Message-ID: On Fri, 5 Jan 2024 09:29:15 GMT, Tobias Hartmann wrote: > Incorrect refactoring from [JDK-8322490](https://bugs.openjdk.org/browse/JDK-8322490), probably a copy-paste error, leads to a dead `CastPP` and might have other not yet observed effects. See JBS for details. > > I wasn't able to extract a regression test for this but it reliably reproduced with replay compilation. > > Thanks, > Tobias This pull request has now been integrated. Changeset: 78623c95 Author: Tobias Hartmann URL: https://git.openjdk.org/jdk/commit/78623c95f2a3954384963c4c761d2e4e5f4aefed Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod 8323012: C2 fails with fatal error: no reachable node should have no use Reviewed-by: chagedorn ------------- PR: https://git.openjdk.org/jdk/pull/17276 From thartmann at openjdk.org Fri Jan 5 11:17:25 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Fri, 5 Jan 2024 11:17:25 GMT Subject: RFR: 8320128: Clean up Parse constructor for OSR [v4] In-Reply-To: <-QQZANSu7PVpZ3CUX7kbp00f_xkhjiy7la3Z3gXutKA=.a1c938e4-12e0-44b8-a2dd-7cc0c0c9f09f@github.com> References: <0mxhhLR9DDdpiBhbXWPxCmMzo_OI9DYiIX36xGhxC_c=.1eed99b9-2dc3-4403-b3c1-22b6ebed6079@github.com> <-QQZANSu7PVpZ3CUX7kbp00f_xkhjiy7la3Z3gXutKA=.a1c938e4-12e0-44b8-a2dd-7cc0c0c9f09f@github.com> Message-ID: <5FSKfaW9bdkNn6Wr7MTr0-A3Zouqm8veKGyC9y11-vo=.b3aacd6e-492b-4e31-b04b-0de64a06cc9b@github.com> On Thu, 4 Jan 2024 17:06:38 GMT, Xin Liu wrote: >> There's a special case for the constructor of Parse. If current compilation is OSR and it is handling the top-level method(depth() == 1), then >> >> 1. _tf = C->tf(); >> 2. 
_entry_bci = C->entry_bci(); >> 3. _flow = method()->get_osr_flow_analysis(_entry_bci); >> >> We don't need to assign those member data twice. We can also factor out _flow->failing() for the special case and normal cases. >> >> It's worth mentioning that we can't save ciTypeFlow computation because >> get_osr_flow_analysis(_entry_bci) actually needs get_flow_analysis(method()). > > Xin Liu has updated the pull request incrementally with one additional commit since the last revision: > > Use print_cr for the log message. I performed some testing. Submitted it again and will report back once it passed. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16669#issuecomment-1878504674 From shade at openjdk.org Fri Jan 5 11:36:34 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Fri, 5 Jan 2024 11:36:34 GMT Subject: Withdrawn: 8321137: Relax ICStub alignment In-Reply-To: References: Message-ID: On Thu, 30 Nov 2023 18:31:24 GMT, Aleksey Shipilev wrote: > WIP, submitting for others to poke holes in it. > > Similarly to [JDK-8284578](https://bugs.openjdk.org/browse/JDK-8284578), we would like to handle `ICStub` alignment. Currently, the small stub that takes only 24 bytes of code is covered by 128 bytes on AArch64. This is due to the same thing fixed by [JDK-8284578](https://bugs.openjdk.org/browse/JDK-8284578) for interpreter codelets: aligning twice the `CodeEntryAlignment`. > > 128 bytes per `ICStub` means we deplete 10K `ICBuffer` with only 79 stubs. This actually happens multiple times even on a simple `HelloWorld.java` invocation that invokes some javac code, causing `ICBufferFull` safepoints. We can increase `ICBuffer` size, especially after [JDK-8314220](https://bugs.openjdk.org/browse/JDK-8314220), but we cannot do this without limits, since it eats up code cache. 
> > But if we assume that code entry alignment is not a strict requirement, and used to improve performance for frequently used code, then maybe we do not have to over-align the IC stub, given it is probably only used during IC transitions? It would significantly improve `ICStub` footprint and require smaller `ICBuffer`. > > Current patch affects ICStub size in different ways on different platforms, since current size is effectively 2x`CodeEntryAlignment`, and new size is cache line size: > - AArch64: 128 -> 64 bytes > - x86_64: 64 -> 64 bytes > - PPC64: 512 -> 128 bytes > - S390X: 128 -> 256 bytes (!) > - ARM: 32 -> 64 bytes (!) > - Zero: > > Additional testing: > - [x] Linux x86_64 server fastdebug `tier1 tier2 tier3` > - [x] Linux AArch64 server fastdebug `tier1 tier2 tier3` This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.org/jdk/pull/16911 From bulasevich at openjdk.org Fri Jan 5 11:37:44 2024 From: bulasevich at openjdk.org (Boris Ulasevich) Date: Fri, 5 Jan 2024 11:37:44 GMT Subject: RFR: 8322858: compiler/c2/aarch64/TestFarJump.java fails on AArch64 due to unexpected PrintAssembly output Message-ID: <2CaDCuUHVZ7kwRvW-aKzLtsSM0yJ5crh-DZ5QbHevRU=.28845f90-4c77-4875-b88d-8d5af411ea17@github.com> Test checks for the ADRP instruction in the [Exception Handler] section of the TestFarJump::main PrintAssembly output. The -XX:CompileOnly=TestFarJump::main option is not sufficient to restrict PrintAssembly output to a specific method. 
A number of method assemblies precede the output of the TestFarJump::main method: - java.lang.invoke.MethodHandle::linkToStatic(LLLLLLL)L - java.lang.invoke.MethodHandle::linkToStatic(LLLL)V - java.lang.invoke.MethodHandle::invokeBasic(LLLLLL)L - java.lang.invoke.MethodHandle::linkToSpecial(LLLLLLLL)L - java.lang.invoke.MethodHandle::linkToSpecial(LLLLL)V - java.lang.invoke.MethodHandle::invokeBasic(LLL)L - java.lang.invoke.MethodHandle::linkToSpecial(LLLLL)L - java.lang.invoke.MethodHandle::linkToStatic(LL)L - java.lang.invoke.MethodHandle::linkToSpecial(LL)V - java.lang.invoke.MethodHandle::invokeBasic()L - java.lang.invoke.MethodHandle::linkToSpecial(LL)L - java.lang.invoke.MethodHandle::linkToStatic(LLL)L - java.lang.invoke.MethodHandle::linkToStatic(LLLL)L - java.lang.invoke.MethodHandle::linkToSpecial(LLL)V - java.lang.invoke.MethodHandle::invokeBasic(L)L - java.lang.invoke.MethodHandle::linkToSpecial(LLL)L - java.lang.invoke.MethodHandle::linkToStatic(LLL)V - java.lang.invoke.MethodHandle::linkToStatic(LL)I - jdk.internal.vm.Continuation::enterSpecial - compiler.c2.aarch64.TestFarJump::main With this change, I use the -XX:CompileCommand=option,TestFarJump::main,bool,PrintAssembly,true option to restrict the PrintAssembly output to a specific method. Now the only [Exception Handler] in the listing is the section of TestFarJump::main method. 
------------- Commit messages: - 8322858: compiler/c2/aarch64/TestFarJump.java fails on AArch64 due to unexpected PrintAssembly output Changes: https://git.openjdk.org/jdk/pull/17278/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=17278&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8322858 Stats: 3 lines in 1 file changed: 0 ins; 1 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/17278.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17278/head:pull/17278 PR: https://git.openjdk.org/jdk/pull/17278 From roland at openjdk.org Fri Jan 5 12:47:22 2024 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 5 Jan 2024 12:47:22 GMT Subject: RFR: 8320649: C2: Optimize scoped values [v4] In-Reply-To: References: Message-ID: On Fri, 5 Jan 2024 00:59:03 GMT, Dean Long wrote: > I'm not a C2 expert, so my high-level comments might not all make sense, but here goes. > > * My first reaction was why does this need to be so complicated? That's a fair reaction. > Can't we treat the slow path and cache implementation as a black box, and just common up the high-level get()? In my mind the ScopedValue get should be similar to a read from a "condy" dynamic constant. Initially, I thought about delaying the inlining of `get()` methods and simply having a pass that looks for `get()` calls with the same inputs. I don't think that works well because the current late inlining framework can't delay inlining very late. We don't run loop opts before we're done with inlining for instance. If we wanted to hoist a call out of a loop we would need loop opts. For instance, it's likely a call to `get()` depends on a null check that we would need to hoist first. The other thing about optimizing `get()` calls is that they are heavy weight nodes (a high level `get()` macro node would be very similar to a `get()` call node whichever way you look at it). We don't know how to hoist a call out of a loop.
A call acts as a barrier on the entire memory state and gets in the way of memory optimizations. If the profile reports the slow path to be never taken then the shape of the `get()` becomes lighter weight. It doesn't disrupt other optimizations. Probing the cache acts as a load + test which we know how to hoist from a loop. It felt to me that it would be fairly common for the slow path to not be needed and given the shape without the slow path is much easier to optimize, it was important to be able to expose early on if the slow path was there or not. > > * The reason for breaking up the operations into individual nodes seems to be because of the profiling information. So I'm wondering how much this helps, given the added complexity. The thing about `get()` is that in simple cases, it optimizes well because of profile data. A `get()` call once inlined can essentially be hoisted out of a loop if all goes well. It doesn't take much for simple optimizations on `get()` to not happen anymore. The goal of this patch is to bring consistency and have optimizations work well in all sorts of scenarios. But it would be hard to sell if the simple cases don't work as well as they do without the patch. And I believe that requires profile data. > > * I would expect multiple get() calls in the same method or in loops to be rare and/or programmer errors. The real advantage seems to be when the binding and the get() can be connected from caller to callee through inlining. Eliminating `get()` calls with the same inputs may not be common in Java code but that transformation is a building block for optimizations. Hoisting a `get()` out of a loop can be achieved by peeling one iteration and letting the `get()` from the loop body be removed because it's redundant with the one from the peeled iteration. Also, code that C2 optimizes once inlining has happened and dead paths have been trimmed doesn't necessarily look like the Java code the programmer wrote.
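
The peeling argument can be pictured at the source level. The sketch below is only an analogy: `lookup()` is a hypothetical stand-in for a loop-invariant `ScopedValue.get()`, and the counter exists purely to make the number of lookups observable. C2 performs the equivalent steps on its IR, not on Java source, and the real transformation also has to account for the cache probe and slow path discussed above.

```java
public class PeelingSketch {
    // Counts how many times the (hypothetical) lookup runs.
    static int lookups = 0;

    // Hypothetical stand-in for a loop-invariant ScopedValue.get().
    static int lookup() {
        lookups++;
        return 7;
    }

    // Original shape: the lookup sits inside the loop body.
    static int original(int n) {
        int sum = 0;
        for (int i = 0; i < n; i++) {
            sum += lookup() * i;
        }
        return sum;
    }

    // After peeling the first iteration: the lookup in the peeled iteration
    // dominates the loop, so the one in the body is redundant and reuses v.
    static int peeled(int n) {
        int sum = 0;
        if (n > 0) {
            int v = lookup();   // peeled iteration, i == 0
            sum += v * 0;
            for (int i = 1; i < n; i++) {
                sum += v * i;   // redundant lookup replaced by v
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        lookups = 0;
        int a = original(5);
        int afterOriginal = lookups;
        lookups = 0;
        int b = peeled(5);
        System.out.println(a + " " + b + " lookups: " + afterOriginal + " vs " + lookups);
    }
}
```

Both methods compute the same sum, but the peeled form performs the lookup once instead of once per iteration, which is the effect of hoisting.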
> > * Needing to do things like treat ScopedValueGetHitsInCache as always successful gives me a bad feeling for some reason, and seems unnecessary if we did more at a higher (macro?) level rather than eagerly expanding the high-level operation into individual nodes. I think my comments above cover that one. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16966#issuecomment-1878609794 From aph at openjdk.org Fri Jan 5 13:34:23 2024 From: aph at openjdk.org (Andrew Haley) Date: Fri, 5 Jan 2024 13:34:23 GMT Subject: RFR: 8322858: compiler/c2/aarch64/TestFarJump.java fails on AArch64 due to unexpected PrintAssembly output In-Reply-To: <2CaDCuUHVZ7kwRvW-aKzLtsSM0yJ5crh-DZ5QbHevRU=.28845f90-4c77-4875-b88d-8d5af411ea17@github.com> References: <2CaDCuUHVZ7kwRvW-aKzLtsSM0yJ5crh-DZ5QbHevRU=.28845f90-4c77-4875-b88d-8d5af411ea17@github.com> Message-ID: <93wsMJPSK3Sk7jSR4J8QbHq7T56rUZTP0Y1kHYrUc6U=.7621639c-39f4-4fb7-a6df-9bff8419e86a@github.com> On Fri, 5 Jan 2024 11:32:16 GMT, Boris Ulasevich wrote: > Test checks for the ADRP instruction in the [Exception Handler] section of the TestFarJump::main PrintAssembly output. > > The -XX:CompileOnly=TestFarJump::main option is not sufficient to restrict PrintAssembly output to a specific method.
A number of method assemblies precede the output of the TestFarJump::main method: > - java.lang.invoke.MethodHandle::linkToStatic(LLLLLLL)L > - java.lang.invoke.MethodHandle::linkToStatic(LLLL)V > - java.lang.invoke.MethodHandle::invokeBasic(LLLLLL)L > - java.lang.invoke.MethodHandle::linkToSpecial(LLLLLLLL)L > - java.lang.invoke.MethodHandle::linkToSpecial(LLLLL)V > - java.lang.invoke.MethodHandle::invokeBasic(LLL)L > - java.lang.invoke.MethodHandle::linkToSpecial(LLLLL)L > - java.lang.invoke.MethodHandle::linkToStatic(LL)L > - java.lang.invoke.MethodHandle::linkToSpecial(LL)V > - java.lang.invoke.MethodHandle::invokeBasic()L > - java.lang.invoke.MethodHandle::linkToSpecial(LL)L > - java.lang.invoke.MethodHandle::linkToStatic(LLL)L > - java.lang.invoke.MethodHandle::linkToStatic(LLLL)L > - java.lang.invoke.MethodHandle::linkToSpecial(LLL)V > - java.lang.invoke.MethodHandle::invokeBasic(L)L > - java.lang.invoke.MethodHandle::linkToSpecial(LLL)L > - java.lang.invoke.MethodHandle::linkToStatic(LLL)V > - java.lang.invoke.MethodHandle::linkToStatic(LL)I > - jdk.internal.vm.Continuation::enterSpecial > - compiler.c2.aarch64.TestFarJump::main > > With this change, I use the -XX:CompileCommand=option,TestFarJump::main,bool,PrintAssembly,true option to restrict the PrintAssembly output to a specific method. Now the only [Exception Handler] in the listing is the section of TestFarJump::main method. Marked as reviewed by aph (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/17278#pullrequestreview-1805971972 From chagedorn at openjdk.org Fri Jan 5 13:41:23 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 5 Jan 2024 13:41:23 GMT Subject: RFR: 8310844: [AArch64] C1 compilation fails because monitor offset in OSR buffer is too large for immediate [v2] In-Reply-To: <2Hnw1VDu1Ml2XB-VfUV8l-t8dtrMzPNh3RyNPfV-tgA=.fc400df2-9529-4bdd-8e0c-fb49ca4f0b48@github.com> References: <6vNa0ZYwKL0HXjHjWTyIe6kB5hza1aaXlJbxM4qh8-w=.edc55dc9-08ff-4861-9038-d4b150bcf4f7@github.com> <2Hnw1VDu1Ml2XB-VfUV8l-t8dtrMzPNh3RyNPfV-tgA=.fc400df2-9529-4bdd-8e0c-fb49ca4f0b48@github.com> Message-ID: On Thu, 4 Jan 2024 14:20:39 GMT, Tobias Hartmann wrote: >> [JDK-8287349](https://bugs.openjdk.org/browse/JDK-8287349) changed the code in `LIR_Assembler::osr_entry()` to use a single `ldp` instruction instead of two `ldr` instructions to load the monitor lock and object from the OSR state. This is not correct because the `ldp` instruction only supports a [7-bit signed immediate value](https://developer.arm.com/documentation/den0024/a/The-A64-instruction-set/Memory-access-instructions/Accessing-multiple-memory-locations). If the offset is larger, for example due to a large number of locals as in `TestLargeMonitorOffset::test`, we hit the `Field too big for insn` guarantee. >> >> I suggest to revert [JDK-8287349](https://bugs.openjdk.org/browse/JDK-8287349). >> >> I also found two unrelated bugs when working on the reproducer: [JDK-8322992](https://bugs.openjdk.org/browse/JDK-8322992) (javac) and [JDK-8322996](https://bugs.openjdk.org/browse/JDK-8322996) (C2). >> >> Thanks, >> Tobias > > Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision: > > Adjusted according to review Looks good. ------------- Marked as reviewed by chagedorn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/17266#pullrequestreview-1805986676 From thartmann at openjdk.org Fri Jan 5 13:51:36 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Fri, 5 Jan 2024 13:51:36 GMT Subject: RFR: 8310844: [AArch64] C1 compilation fails because monitor offset in OSR buffer is too large for immediate [v2] In-Reply-To: <2Hnw1VDu1Ml2XB-VfUV8l-t8dtrMzPNh3RyNPfV-tgA=.fc400df2-9529-4bdd-8e0c-fb49ca4f0b48@github.com> References: <6vNa0ZYwKL0HXjHjWTyIe6kB5hza1aaXlJbxM4qh8-w=.edc55dc9-08ff-4861-9038-d4b150bcf4f7@github.com> <2Hnw1VDu1Ml2XB-VfUV8l-t8dtrMzPNh3RyNPfV-tgA=.fc400df2-9529-4bdd-8e0c-fb49ca4f0b48@github.com> Message-ID: On Thu, 4 Jan 2024 14:20:39 GMT, Tobias Hartmann wrote: >> [JDK-8287349](https://bugs.openjdk.org/browse/JDK-8287349) changed the code in `LIR_Assembler::osr_entry()` to use a single `ldp` instruction instead of two `ldr` instructions to load the monitor lock and object from the OSR state. This is not correct because the `ldp` instruction only supports a [7-bit signed immediate value](https://developer.arm.com/documentation/den0024/a/The-A64-instruction-set/Memory-access-instructions/Accessing-multiple-memory-locations). If the offset is larger, for example due to a large number of locals as in `TestLargeMonitorOffset::test`, we hit the `Field too big for insn` guarantee. >> >> I suggest to revert [JDK-8287349](https://bugs.openjdk.org/browse/JDK-8287349). >> >> I also found two unrelated bugs when working on the reproducer: [JDK-8322992](https://bugs.openjdk.org/browse/JDK-8322992) (javac) and [JDK-8322996](https://bugs.openjdk.org/browse/JDK-8322996) (C2). >> >> Thanks, >> Tobias > > Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision: > > Adjusted according to review Thanks, Christian! 
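
The range limit behind the `Field too big for insn` guarantee can be sketched as a simple check. This is an illustration only, assuming the 64-bit register form of `ldp`, where the 7-bit signed immediate is scaled by 8 bytes; that scaling is an assumption taken from the A64 encoding and is not stated in the thread, and the class and method names are hypothetical rather than HotSpot code.

```java
public class LdpOffsetSketch {
    // Returns true if byteOffset is encodable as the immediate of a
    // 64-bit ldp/stp, assuming a signed 7-bit field scaled by 8 bytes.
    static boolean fitsLdp64(long byteOffset) {
        if (byteOffset % 8 != 0) {
            return false;               // must be a multiple of the access size
        }
        long scaled = byteOffset / 8;   // value that would go into the 7-bit field
        return scaled >= -64 && scaled <= 63;
    }

    public static void main(String[] args) {
        long[] offsets = {0, 8, 504, 512, -512, -520, 12};
        for (long off : offsets) {
            System.out.println(off + " -> " + fitsLdp64(off));
        }
    }
}
```

Under that assumption the encodable window is only a few hundred bytes wide, so a method with many locals can push the monitor offset outside it, whereas separate `ldr` instructions accept a larger unsigned offset.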
------------- PR Comment: https://git.openjdk.org/jdk/pull/17266#issuecomment-1878684009 From thartmann at openjdk.org Fri Jan 5 13:51:37 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Fri, 5 Jan 2024 13:51:37 GMT Subject: Integrated: 8310844: [AArch64] C1 compilation fails because monitor offset in OSR buffer is too large for immediate In-Reply-To: <6vNa0ZYwKL0HXjHjWTyIe6kB5hza1aaXlJbxM4qh8-w=.edc55dc9-08ff-4861-9038-d4b150bcf4f7@github.com> References: <6vNa0ZYwKL0HXjHjWTyIe6kB5hza1aaXlJbxM4qh8-w=.edc55dc9-08ff-4861-9038-d4b150bcf4f7@github.com> Message-ID: <5L1hf9r4_BrbzI-pXsVoaB7MFhbNkuppE1N8Jp6lV8I=.d9660cd6-962b-4ec1-99d7-5b94ae67d88c@github.com> On Thu, 4 Jan 2024 12:39:18 GMT, Tobias Hartmann wrote: > [JDK-8287349](https://bugs.openjdk.org/browse/JDK-8287349) changed the code in `LIR_Assembler::osr_entry()` to use a single `ldp` instruction instead of two `ldr` instructions to load the monitor lock and object from the OSR state. This is not correct because the `ldp` instruction only supports a [7-bit signed immediate value](https://developer.arm.com/documentation/den0024/a/The-A64-instruction-set/Memory-access-instructions/Accessing-multiple-memory-locations). If the offset is larger, for example due to a large number of locals as in `TestLargeMonitorOffset::test`, we hit the `Field too big for insn` guarantee. > > I suggest to revert [JDK-8287349](https://bugs.openjdk.org/browse/JDK-8287349). > > I also found two unrelated bugs when working on the reproducer: [JDK-8322992](https://bugs.openjdk.org/browse/JDK-8322992) (javac) and [JDK-8322996](https://bugs.openjdk.org/browse/JDK-8322996) (C2). > > Thanks, > Tobias This pull request has now been integrated. 
Changeset: ade21a96
Author:    Tobias Hartmann
URL:       https://git.openjdk.org/jdk/commit/ade21a965f8a5fc889cd48bba76fad507bdeddf5
Stats:     149 lines in 2 files changed: 147 ins; 0 del; 2 mod

8310844: [AArch64] C1 compilation fails because monitor offset in OSR buffer is too large for immediate

Reviewed-by: aph, chagedorn

-------------

PR: https://git.openjdk.org/jdk/pull/17266

From shade at openjdk.org Fri Jan 5 14:43:41 2024
From: shade at openjdk.org (Aleksey Shipilev)
Date: Fri, 5 Jan 2024 14:43:41 GMT
Subject: RFR: 8323065: Unneccesary CodeBlob lookup in CompiledIC::internal_set_ic_destination
Message-ID:

I was looking at the hot path for IC stub cleaning (happens at safepoint), and one obvious thing is that we look up the `CodeBlob` from `call->instruction_address()` only to assert that it is a compiled one. The lookup used to be protected by `#ifdef ASSERT` before [JDK-8212681](https://bugs.openjdk.org/browse/JDK-8212681), which pulled it out of that block to be used in Mutex: https://hg.openjdk.org/jdk/jdk/rev/d6dc479bcdd3#l15.62 ...which is now gone after [JDK-8214257](https://bugs.openjdk.org/browse/JDK-8214257). This fix reinstates the `ASSERT` block.

There are small improvements (~1..10us) for safepoint cleanup on small ad-hoc tests. But since this whole thing involves looking up things in code cache, it may cost quite a lot.
------------- Commit messages: - Fix Changes: https://git.openjdk.org/jdk/pull/17281/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=17281&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8323065 Stats: 4 lines in 1 file changed: 3 ins; 1 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/17281.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17281/head:pull/17281 PR: https://git.openjdk.org/jdk/pull/17281 From epeter at openjdk.org Fri Jan 5 15:36:22 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 5 Jan 2024 15:36:22 GMT Subject: RFR: 8319690: [AArch64] C2 compilation hits offset_ok_for_immed: assert "c2 compiler bug" In-Reply-To: References: <16J-lJ2AceGTVcRWBcP15yKcwO-1IA1XsngyOuNjf7k=.0776f081-ae2c-4279-87cf-d909806c2bc4@github.com> <4KRCqYxn02wMjYDuN3_HbYWxc9BDtYMNd40bNrJ4K8w=.5362dda4-8d30-48fd-8915-909eb15a6023@github.com> Message-ID: <82i5GmtoNQdveJShSuWQa7dGHszWLCLVbsJNC6Mulx4=.bd5b27f5-098b-4350-abe1-98af81f0bb3e@github.com> On Thu, 7 Dec 2023 06:45:30 GMT, Fei Gao wrote: >> After this change, `immIOffset` and `immLOffset` appear to be obsolete. > >> After this change, `immIOffset` and `immLOffset` appear to be obsolete. > > Removed them in the new commit. Thanks! @fg1417 what is the state on this? The example here may look like a strange edge-case. But with my plans this may become more common: [JDK-8318446](https://bugs.openjdk.org/browse/JDK-8318446) C2: implement StoreNode::Ideal_merge_stores I will merge stores, which can then look like unaligned stores of a larger type. This bug here blocks my progress. 
I'm not in a hurry, just curious if there is now a plan how to proceed here ;) ------------- PR Comment: https://git.openjdk.org/jdk/pull/16991#issuecomment-1878857862 From epeter at openjdk.org Fri Jan 5 16:03:27 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 5 Jan 2024 16:03:27 GMT Subject: RFR: 8322589: Add Ideal transformation: (~a) & (~b) => ~(a | b) In-Reply-To: <-pwjKEB97C-bM068JQN0PY1hl65IzcuQZfHzRoKu92g=.d6118773-20d2-46cd-9284-5168c9334bb5@github.com> References: <-pwjKEB97C-bM068JQN0PY1hl65IzcuQZfHzRoKu92g=.d6118773-20d2-46cd-9284-5168c9334bb5@github.com> Message-ID: On Tue, 24 Oct 2023 04:49:20 GMT, Zhiqiang Zang wrote: > Hello, > > `(~a) & (~b) => ~(a | b)` is a widely seen pattern, for example it is implemented for LLVM [here](https://github.com/llvm/llvm-project/blob/397f1ce9efb4eea1ee10fe4833f733b8c7abd878/llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp#L1616C28-L1616C28); however it is missing in current implementation of hotspot. This pull request adds this transformation and associated tests. > > Thanks. Looks like a good idea :) I left a few suggestions below. src/hotspot/share/opto/mulnode.cpp line 617: > 615: && phase->type(in(1)->in(2)) == TypeInt::MINUS_1 > 616: && in(2)->Opcode() == Op_XorI > 617: && in(1)->in(2) == in(2)->in(2)) { minor code style issue: please take the `&&` to the end of the line. That is what I usually see. It also makes reading the lines easier, as they are aligned with the first line. src/hotspot/share/opto/mulnode.cpp line 618: > 616: && in(2)->Opcode() == Op_XorI > 617: && in(1)->in(2) == in(2)->in(2)) { > 618: return new XorINode(phase->transform(new OrINode(in(1)->in(1), in(2)->in(1))), in(1)->in(2)); The nesting of this line is difficult to read. I suggest you take multiple lines and name intermediate results with something helpful. 
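
A minimal Java sketch of the identity behind this transformation, for `int` and `long` operands. The class and method names (`DeMorganCheck`, `before`, `after`, `notViaXor`) are hypothetical and are not taken from the patch or its tests. `notViaXor` only illustrates that `~x` equals `x ^ -1`, which is the XOR-with-minus-one shape the quoted condition checks for (`Op_XorI`, `TypeInt::MINUS_1`).

```java
public class DeMorganCheck {
    // ~x expressed as x ^ -1 (illustrative helper, hypothetical name).
    static int notViaXor(int x) {
        return x ^ -1;
    }

    // Shape before the transformation: (~a) & (~b).
    static int before(int a, int b) {
        return (~a) & (~b);
    }

    // Shape after the transformation: ~(a | b).
    static int after(int a, int b) {
        return ~(a | b);
    }

    // Same pair of shapes for long operands.
    static long beforeL(long a, long b) {
        return (~a) & (~b);
    }

    static long afterL(long a, long b) {
        return ~(a | b);
    }

    public static void main(String[] args) {
        int[] samples = {0, 1, -1, 42, Integer.MIN_VALUE, Integer.MAX_VALUE};
        for (int a : samples) {
            for (int b : samples) {
                if (before(a, b) != after(a, b)) {
                    throw new AssertionError("int mismatch for " + a + ", " + b);
                }
                if (beforeL(a, b) != afterL(a, b)) {
                    throw new AssertionError("long mismatch for " + a + ", " + b);
                }
            }
        }
        System.out.println("identity holds for all sample pairs");
    }
}
```

Passing two different operands, as in the `assertResult(a, b)` suggestion, exercises more than the degenerate `a == b` case.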
test/hotspot/jtreg/compiler/c2/irTests/AndINodeIdealizationTests.java line 50: > 48: > 49: assertResult(0, 0); > 50: assertResult(a, a); Suggestion: assertResult(a, b); I assume you wanted this? Otherwise `b` is useless ;) test/hotspot/jtreg/compiler/c2/irTests/AndLNodeIdealizationTests.java line 50: > 48: > 49: assertResult(0, 0); > 50: assertResult(a, a); Suggestion: assertResult(a, b); ------------- Changes requested by epeter (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/16333#pullrequestreview-1806210975 PR Review Comment: https://git.openjdk.org/jdk/pull/16333#discussion_r1443035127 PR Review Comment: https://git.openjdk.org/jdk/pull/16333#discussion_r1443038716 PR Review Comment: https://git.openjdk.org/jdk/pull/16333#discussion_r1443043016 PR Review Comment: https://git.openjdk.org/jdk/pull/16333#discussion_r1443043862 From epeter at openjdk.org Fri Jan 5 16:03:29 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 5 Jan 2024 16:03:29 GMT Subject: RFR: 8322589: Add Ideal transformation: (~a) & (~b) => ~(a | b) In-Reply-To: References: <-pwjKEB97C-bM068JQN0PY1hl65IzcuQZfHzRoKu92g=.d6118773-20d2-46cd-9284-5168c9334bb5@github.com> Message-ID: <2Xcm4mZkW4mN9d8LBmLSkwEM3Hq4I0Vy8NEZz9HL70Y=.6b3f843b-e28c-4f12-b422-12e982d81f6c@github.com> On Fri, 5 Jan 2024 15:51:14 GMT, Emanuel Peter wrote: >> Hello, >> >> `(~a) & (~b) => ~(a | b)` is a widely seen pattern, for example it is implemented for LLVM [here](https://github.com/llvm/llvm-project/blob/397f1ce9efb4eea1ee10fe4833f733b8c7abd878/llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp#L1616C28-L1616C28); however it is missing in current implementation of hotspot. This pull request adds this transformation and associated tests. >> >> Thanks. 
> > src/hotspot/share/opto/mulnode.cpp line 617: > >> 615: && phase->type(in(1)->in(2)) == TypeInt::MINUS_1 >> 616: && in(2)->Opcode() == Op_XorI >> 617: && in(1)->in(2) == in(2)->in(2)) { > > minor code style issue: please take the `&&` to the end of the line. That is what I usually see. It also makes reading the lines easier, as they are aligned with the first line. Suggestion: && phase->type(in(2)->in(2)) == TypeInt::MINUS_1) { Could be nice for symmetry. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16333#discussion_r1443037674 From epeter at openjdk.org Fri Jan 5 16:25:27 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 5 Jan 2024 16:25:27 GMT Subject: RFR: 8322077: Add Ideal transformation: (~a) | (~b) => ~(a & b) [v4] In-Reply-To: <5KKTYmY7dJ4nW0OEQ2UPuloIWOYc-pI9M8HRjoaRzw4=.f5eda6f9-c38c-4a2b-9690-cbb1791a2622@github.com> References: <5KKTYmY7dJ4nW0OEQ2UPuloIWOYc-pI9M8HRjoaRzw4=.f5eda6f9-c38c-4a2b-9690-cbb1791a2622@github.com> Message-ID: On Fri, 15 Dec 2023 23:35:56 GMT, Zhiqiang Zang wrote: >> Hello, >> >> (~a) | (~b) => ~(a & b) is a widely seen pattern, for example it is implemented for LLVM [here](https://github.com/llvm/llvm-project/blob/397f1ce9efb4eea1ee10fe4833f733b8c7abd878/llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp#L1617C28-L1617C28); however it is missing in current implementation of hotspot. This pull request adds this transformation and associated tests. >> >> Thanks. > > Zhiqiang Zang has updated the pull request incrementally with one additional commit since the last revision: > > untabify. Looks like a good idea. Left a few comments. I would have merged this with https://github.com/openjdk/jdk/pull/16333, since it is essentially the symmetric case. But leave it separate now. It would be nice to have some shared tests, where both optimizations need to be combined. 
Like: `(~a | ~b) & (~c | ~d)` -> `~(a & b) & ~(c & d)` -> `~((a & b) | (c & d))` src/hotspot/share/opto/addnode.cpp line 787: > 785: } > 786: return nullptr; > 787: } If you are going to use this also for your changes in https://github.com/openjdk/jdk/pull/16333, then you probably want this to go into a shared file. src/hotspot/share/opto/addnode.cpp line 816: > 814: return make_not(phase, > 815: phase->transform(new AndINode(in(1)->in(1), in(2)->in(1))), > 816: T_INT); I'd put the `AndI` node on a separate line. Call it `add_a_b` or similar. Then you can transform on the next line. And then on a third line the `make_not`. ------------- Changes requested by epeter (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/16334#pullrequestreview-1806238724 PR Review Comment: https://git.openjdk.org/jdk/pull/16334#discussion_r1443055241 PR Review Comment: https://git.openjdk.org/jdk/pull/16334#discussion_r1443052473 From kvn at openjdk.org Fri Jan 5 18:45:29 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 5 Jan 2024 18:45:29 GMT Subject: RFR: 8323012: C2 fails with fatal error: no reachable node should have no use In-Reply-To: References: Message-ID: <5P6XCWehuGjMZUGRNFcu8Z7bPP6t95Z8PYYojMwWi5I=.56eae905-629e-4488-992d-41d5b9dd5f67@github.com> On Fri, 5 Jan 2024 09:29:15 GMT, Tobias Hartmann wrote: > Incorrect refactoring from [JDK-8322490](https://bugs.openjdk.org/browse/JDK-8322490), probably a copy-paste error, leads to a dead `CastPP` and might have other not yet observed effects. See JBS for details. > > I wasn't able to extract a regression test for this but it reliably reproduced with replay compilation. > > Thanks, > Tobias Good. 
------------- PR Review: https://git.openjdk.org/jdk/pull/17276#pullrequestreview-1806644365 From kvn at openjdk.org Fri Jan 5 18:55:39 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 5 Jan 2024 18:55:39 GMT Subject: RFR: 8310190: C2 SuperWord: AlignVector is broken, generates misaligned packs [v59] In-Reply-To: References: Message-ID: On Thu, 4 Jan 2024 07:00:48 GMT, Emanuel Peter wrote: >> I want to push this in JDK23. >> After this fix here, I'm doing [this refactoring](https://github.com/openjdk/jdk/pull/16620). >> >> To calm your nerves: most of the changes are in auto-generated tests, and tests in general. >> >> **Context** >> >> `-XX:+AlignVector` ensures that SuperWord only creates LoadVector and StoreVector that can be memory aligned. This is achieved by iterating in the pre-loop until we reach the alignment boundary, then we can start the main loop properly aligned. However, this is not possible in all cases, sometimes some memory accesses cannot be guaranteed to be aligned, and we need to reject vectorization (at least partially, for some of the packs). >> >> Alignment is split into two tasks: >> - Alignment Correctness Checks: only relevant if `-XX:+AlignVector`. Need to reject vectorization if alignment is not possible. We must check if the address of the vector load/store is aligned with (divisible by) `ObjectAlignmentInBytes`. >> - Alignment by adjusting pre-loop limit: alignment is desirable even if `-XX:-AlignVector`. We would like to align the vectors with their vector width. >> >> **Problem** >> >> I have recently found a bug with our AlignVector [JDK-8310190](https://bugs.openjdk.org/browse/JDK-8310190). >> In that bug, we perform a misaligned memory vector access, which results in a `SIGBUS` on an ARM32 machine. >> Thanks @fg1417 for confirming this! >> Hence, we need to fix the alignment correctness checks. 
>> >> While working on this task, I also found some bugs in the "alignment by adjusting pre-loop limit": there were cases where it did not align the vectors correctly. >> >> **Problem Details** >> >> Reproducer: >> >> >> static void test(short[] a, short[] b, short mask) { >> for (int i = 0; i < RANGE; i+=8) { >> // Problematic for AlignVector >> b[i+0] = (short)(a[i+0] & mask); // best_memref, align 0 >> >> b[i+3] = (short)(a[i+3] & mask); // pack at offset 6 bytes >> b[i+4] = (short)(a[i+4] & mask); >> b[i+5] = (short)(a[i+5] & mask); >> b[i+6] = (short)(a[i+6] & mask); >> } >> } >> >> >> During `SuperWord::find_adjacent_refs` we used to check if the references are expected to be aligned. For that, we look at each "group" of references (eg all `LoadS`) and take the reference with the lowest offset. For that chosen reference, we check if it is alignable. If yes, we accept all references of that group, if no we reject all. >> >> This is problemati... > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > some minor changes for Vladimir Okay, so C2 is smart enough to use `lea` when needed. Good. No more questions. ------------- PR Comment: https://git.openjdk.org/jdk/pull/14785#issuecomment-1879123189 From davleopo at openjdk.org Fri Jan 5 19:03:37 2024 From: davleopo at openjdk.org (David Leopoldseder) Date: Fri, 5 Jan 2024 19:03:37 GMT Subject: Integrated: 8322636: [JVMCI] HotSpotSpeculationLog can be inconsistent across a single compile In-Reply-To: References: Message-ID: On Fri, 22 Dec 2023 09:55:16 GMT, David Leopoldseder wrote: > This PR fixes a subtle inconsistency in `HotSpotSpeculationLog` . > > Normal uses of `HotSpotSpeculationLog` work by using a `SpeculationReason` and asking the speculation log via `maySpeculate` if the speculation can be performed, i.e., if it failed before for the given method. 
An example of this can be seen in Graal: https://github.com/oracle/graal/blob/master/compiler/src/jdk.graal.compiler/src/jdk/graal/compiler/nodes/loop/CountedLoopInfo.java#L591C15-L591C15
> The implicit assumption is that the speculation log, `HotSpotSpeculationLog` in particular, collects failed speculations at the beginning of a compile and then stays consistent during the compile. Why is that? Because if new failed speculations are added to the failed speculations during the compile, the compiler would speculate on those again in an inconsistent way. E.g. at the beginning of a compile a certain speculation has not failed yet and the compiler thinks it can do optimization xyz using a speculation; later during the compilation process it consults the speculation log but gets a different answer. All those inconsistent speculations that already failed will fail code installation in JVMCI later anyway (they will throw a bailout during `HotSpotCodeCacheProvider#installCode` https://github.com/openjdk/jdk/blob/master/src/jdk.internal.vm.ci/share/classes/jdk/vm/ci/hotspot/HotSpotSpeculationLog.java#L192 ). Thus, we should at least return a consistent result during a compile.
> The consistency problem here, which also causes trouble on the Graal side, is that `maySpeculate` itself can collect failed speculations if there have not been any previously, i.e., `failedSpeculations == null`.
> In order to make the speculation log consistent across an entire JVMCI compile, this PR removes the collection of failed speculations in `maySpeculate`.

This pull request has now been integrated.
Changeset: 35a1b77d Author: David Leopoldseder Committer: Tom Rodriguez URL: https://git.openjdk.org/jdk/commit/35a1b77da541e4df3c4d1bab0825ea39e653808c Stats: 9 lines in 1 file changed: 6 ins; 3 del; 0 mod 8322636: [JVMCI] HotSpotSpeculationLog can be inconsistent across a single compile Reviewed-by: dnsimon, never ------------- PR: https://git.openjdk.org/jdk/pull/17183 From kvn at openjdk.org Fri Jan 5 19:05:38 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 5 Jan 2024 19:05:38 GMT Subject: RFR: 8310190: C2 SuperWord: AlignVector is broken, generates misaligned packs [v59] In-Reply-To: References: Message-ID: On Thu, 4 Jan 2024 07:00:48 GMT, Emanuel Peter wrote: >> I want to push this in JDK23. >> After this fix here, I'm doing [this refactoring](https://github.com/openjdk/jdk/pull/16620). >> >> To calm your nerves: most of the changes are in auto-generated tests, and tests in general. >> >> **Context** >> >> `-XX:+AlignVector` ensures that SuperWord only creates LoadVector and StoreVector that can be memory aligned. This is achieved by iterating in the pre-loop until we reach the alignment boundary, then we can start the main loop properly aligned. However, this is not possible in all cases, sometimes some memory accesses cannot be guaranteed to be aligned, and we need to reject vectorization (at least partially, for some of the packs). >> >> Alignment is split into two tasks: >> - Alignment Correctness Checks: only relevant if `-XX:+AlignVector`. Need to reject vectorization if alignment is not possible. We must check if the address of the vector load/store is aligned with (divisible by) `ObjectAlignmentInBytes`. >> - Alignment by adjusting pre-loop limit: alignment is desirable even if `-XX:-AlignVector`. We would like to align the vectors with their vector width. >> >> **Problem** >> >> I have recently found a bug with our AlignVector [JDK-8310190](https://bugs.openjdk.org/browse/JDK-8310190). 
>> In that bug, we perform a misaligned memory vector access, which results in a `SIGBUS` on an ARM32 machine. >> Thanks @fg1417 for confirming this! >> Hence, we need to fix the alignment correctness checks. >> >> While working on this task, I also found some bugs in the "alignment by adjusting pre-loop limit": there were cases where it did not align the vectors correctly. >> >> **Problem Details** >> >> Reproducer: >> >> >> static void test(short[] a, short[] b, short mask) { >> for (int i = 0; i < RANGE; i+=8) { >> // Problematic for AlignVector >> b[i+0] = (short)(a[i+0] & mask); // best_memref, align 0 >> >> b[i+3] = (short)(a[i+3] & mask); // pack at offset 6 bytes >> b[i+4] = (short)(a[i+4] & mask); >> b[i+5] = (short)(a[i+5] & mask); >> b[i+6] = (short)(a[i+6] & mask); >> } >> } >> >> >> During `SuperWord::find_adjacent_refs` we used to check if the references are expected to be aligned. For that, we look at each "group" of references (eg all `LoadS`) and take the reference with the lowest offset. For that chosen reference, we check if it is alignable. If yes, we accept all references of that group, if no we reject all. >> >> This is problemati... > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > some minor changes for Vladimir Looks good. ------------- Marked as reviewed by kvn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/14785#pullrequestreview-1806680693 From duke at openjdk.org Fri Jan 5 20:32:39 2024 From: duke at openjdk.org (Zhiqiang Zang) Date: Fri, 5 Jan 2024 20:32:39 GMT Subject: RFR: 8322589: Add Ideal transformation: (~a) & (~b) => ~(a | b) [v2] In-Reply-To: <-pwjKEB97C-bM068JQN0PY1hl65IzcuQZfHzRoKu92g=.d6118773-20d2-46cd-9284-5168c9334bb5@github.com> References: <-pwjKEB97C-bM068JQN0PY1hl65IzcuQZfHzRoKu92g=.d6118773-20d2-46cd-9284-5168c9334bb5@github.com> Message-ID: > Hello, > > `(~a) & (~b) => ~(a | b)` is a widely seen pattern, for example it is implemented for LLVM [here](https://github.com/llvm/llvm-project/blob/397f1ce9efb4eea1ee10fe4833f733b8c7abd878/llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp#L1616C28-L1616C28); however it is missing in current implementation of hotspot. This pull request adds this transformation and associated tests. > > Thanks. Zhiqiang Zang has updated the pull request incrementally with one additional commit since the last revision: update the copyright dates. 
------------- Changes: - all: https://git.openjdk.org/jdk/pull/16333/files - new: https://git.openjdk.org/jdk/pull/16333/files/cf2edb46..5072eb14 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16333&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16333&range=00-01 Stats: 4 lines in 4 files changed: 0 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/16333.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16333/head:pull/16333 PR: https://git.openjdk.org/jdk/pull/16333 From duke at openjdk.org Fri Jan 5 20:45:41 2024 From: duke at openjdk.org (Zhiqiang Zang) Date: Fri, 5 Jan 2024 20:45:41 GMT Subject: RFR: 8322589: Add Ideal transformation: (~a) & (~b) => ~(a | b) [v3] In-Reply-To: <-pwjKEB97C-bM068JQN0PY1hl65IzcuQZfHzRoKu92g=.d6118773-20d2-46cd-9284-5168c9334bb5@github.com> References: <-pwjKEB97C-bM068JQN0PY1hl65IzcuQZfHzRoKu92g=.d6118773-20d2-46cd-9284-5168c9334bb5@github.com> Message-ID: <8DbGA3jHE7obdtiXueTWKSwKbQ7-q66G09lcOmaAcu8=.9a4d4e9f-523e-444e-a740-ca23a12f45f2@github.com> > Hello, > > `(~a) & (~b) => ~(a | b)` is a widely seen pattern, for example it is implemented for LLVM [here](https://github.com/llvm/llvm-project/blob/397f1ce9efb4eea1ee10fe4833f733b8c7abd878/llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp#L1616C28-L1616C28); however it is missing in current implementation of hotspot. This pull request adds this transformation and associated tests. > > Thanks. Zhiqiang Zang has updated the pull request incrementally with one additional commit since the last revision: address comments. 
------------- Changes: - all: https://git.openjdk.org/jdk/pull/16333/files - new: https://git.openjdk.org/jdk/pull/16333/files/5072eb14..3b95720a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16333&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16333&range=01-02 Stats: 9 lines in 3 files changed: 2 ins; 0 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/16333.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16333/head:pull/16333 PR: https://git.openjdk.org/jdk/pull/16333 From xliu at openjdk.org Fri Jan 5 20:47:24 2024 From: xliu at openjdk.org (Xin Liu) Date: Fri, 5 Jan 2024 20:47:24 GMT Subject: RFR: 8320128: Clean up Parse constructor for OSR [v4] In-Reply-To: References: <0mxhhLR9DDdpiBhbXWPxCmMzo_OI9DYiIX36xGhxC_c=.1eed99b9-2dc3-4403-b3c1-22b6ebed6079@github.com> <-QQZANSu7PVpZ3CUX7kbp00f_xkhjiy7la3Z3gXutKA=.a1c938e4-12e0-44b8-a2dd-7cc0c0c9f09f@github.com> Message-ID: On Thu, 4 Jan 2024 20:16:48 GMT, Aleksey Shipilev wrote: >> Xin Liu has updated the pull request incrementally with one additional commit since the last revision: >> >> Use print_cr for the log message. > > What testing was done here? I suggest at least `tier{1,2,3}` to capture surprises. hi, @shipilev I ran tier1~3 yesterday. It only had 2 failures: 1. java/util/Base64/TestEncodingDecodingLength.java (I guess it's due to out of memory JDK-8295153) 2. 
sun/security/pkcs11/Provider/MultipleLogins.sh (unsupported OS: Linux-amd64-64, please initialize NSS library location, skipping test)

==============================
Test summary
==============================
   TEST                                TOTAL  PASS  FAIL ERROR
   jtreg:test/hotspot/jtreg:tier1       2511  2511     0     0
>> jtreg:test/jdk:tier1                 2400  2399     1     0 <<
   jtreg:test/langtools:tier1           4458  4458     0     0
   jtreg:test/jaxp:tier1                   0     0     0     0
   jtreg:test/lib-test:tier1              32    32     0     0
   jtreg:test/hotspot/jtreg:tier2        742   742     0     0
>> jtreg:test/jdk:tier2                 4081  4080     1     0 <<
   jtreg:test/langtools:tier2             11    11     0     0
   jtreg:test/jaxp:tier2                 512   512     0     0
   jtreg:test/hotspot/jtreg:tier3        256   256     0     0
   jtreg:test/jdk:tier3                 1434  1434     0     0
   jtreg:test/langtools:tier3              0     0     0     0
   jtreg:test/jaxp:tier3                   0     0     0     0
==============================
TEST FAILURE

-------------

PR Comment: https://git.openjdk.org/jdk/pull/16669#issuecomment-1879235918

From duke at openjdk.org Fri Jan 5 21:37:36 2024
From: duke at openjdk.org (Zhiqiang Zang)
Date: Fri, 5 Jan 2024 21:37:36 GMT
Subject: RFR: 8322589: Add Ideal transformation: (~a) & (~b) => ~(a | b) [v4]
In-Reply-To: <-pwjKEB97C-bM068JQN0PY1hl65IzcuQZfHzRoKu92g=.d6118773-20d2-46cd-9284-5168c9334bb5@github.com>
References: <-pwjKEB97C-bM068JQN0PY1hl65IzcuQZfHzRoKu92g=.d6118773-20d2-46cd-9284-5168c9334bb5@github.com>
Message-ID:

> Hello,
>
> `(~a) & (~b) => ~(a | b)` is a widely seen pattern, for example it is implemented for LLVM [here](https://github.com/llvm/llvm-project/blob/397f1ce9efb4eea1ee10fe4833f733b8c7abd878/llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp#L1616C28-L1616C28); however it is missing in current implementation of hotspot. This pull request adds this transformation and associated tests.
>
> Thanks.

Zhiqiang Zang has updated the pull request incrementally with one additional commit since the last revision:

  use utility functions.
------------- Changes: - all: https://git.openjdk.org/jdk/pull/16333/files - new: https://git.openjdk.org/jdk/pull/16333/files/3b95720a..6ee5f182 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16333&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16333&range=02-03 Stats: 34 lines in 3 files changed: 23 ins; 4 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/16333.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16333/head:pull/16333 PR: https://git.openjdk.org/jdk/pull/16333 From duke at openjdk.org Fri Jan 5 21:43:39 2024 From: duke at openjdk.org (Zhiqiang Zang) Date: Fri, 5 Jan 2024 21:43:39 GMT Subject: RFR: 8322589: Add Ideal transformation: (~a) & (~b) => ~(a | b) [v5] In-Reply-To: <-pwjKEB97C-bM068JQN0PY1hl65IzcuQZfHzRoKu92g=.d6118773-20d2-46cd-9284-5168c9334bb5@github.com> References: <-pwjKEB97C-bM068JQN0PY1hl65IzcuQZfHzRoKu92g=.d6118773-20d2-46cd-9284-5168c9334bb5@github.com> Message-ID: > Hello, > > `(~a) & (~b) => ~(a | b)` is a widely seen pattern, for example it is implemented for LLVM [here](https://github.com/llvm/llvm-project/blob/397f1ce9efb4eea1ee10fe4833f733b8c7abd878/llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp#L1616C28-L1616C28); however it is missing in current implementation of hotspot. This pull request adds this transformation and associated tests. > > Thanks. Zhiqiang Zang has updated the pull request incrementally with one additional commit since the last revision: update the copyright dates. 
------------- Changes: - all: https://git.openjdk.org/jdk/pull/16333/files - new: https://git.openjdk.org/jdk/pull/16333/files/6ee5f182..ecb2098b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16333&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16333&range=03-04 Stats: 2 lines in 2 files changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/16333.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16333/head:pull/16333 PR: https://git.openjdk.org/jdk/pull/16333 From duke at openjdk.org Fri Jan 5 21:57:38 2024 From: duke at openjdk.org (Zhiqiang Zang) Date: Fri, 5 Jan 2024 21:57:38 GMT Subject: RFR: 8322077: Add Ideal transformation: (~a) | (~b) => ~(a & b) [v5] In-Reply-To: References: Message-ID: > Hello, > > (~a) | (~b) => ~(a & b) is a widely seen pattern, for example it is implemented for LLVM [here](https://github.com/llvm/llvm-project/blob/397f1ce9efb4eea1ee10fe4833f733b8c7abd878/llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp#L1617C28-L1617C28); however it is missing in current implementation of hotspot. This pull request adds this transformation and associated tests. > > Thanks. Zhiqiang Zang has updated the pull request incrementally with two additional commits since the last revision: - update the copyright dates. - address comments. 
------------- Changes: - all: https://git.openjdk.org/jdk/pull/16334/files - new: https://git.openjdk.org/jdk/pull/16334/files/8697e399..154c69e5 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16334&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16334&range=03-04 Stats: 54 lines in 5 files changed: 23 ins; 18 del; 13 mod Patch: https://git.openjdk.org/jdk/pull/16334.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16334/head:pull/16334 PR: https://git.openjdk.org/jdk/pull/16334 From duke at openjdk.org Fri Jan 5 22:18:22 2024 From: duke at openjdk.org (Joshua Cao) Date: Fri, 5 Jan 2024 22:18:22 GMT Subject: RFR: 8319850: PrintInlining should print which methods are late inlines In-Reply-To: References: Message-ID: <_UiWT_w5oDMw_UTviUeL7vmlicXvi9c983ARz4FXcYo=.ef453e4e-9992-4520-9f2f-53a5fbc94cb1@github.com> On Thu, 16 Nov 2023 12:07:10 GMT, Roland Westrelin wrote: >> I can see why its confusing. I reworded the JBS title and added more to the summary. >> >> >> This confused me when first starting looking at compilation units. I would see a method reported as inlined, but in the early compilation phases, I still see the method call. I was not aware of late inlines. I think it would be a nice enhancement for PrintInlining to report which methods are late inlined. >> >> >> Yes, `PrintInlining` reports late inlines, but I think it would be nice for it to explicitly state which inlines are late inlines. I want to print `late inline`. >> >>> There's an open bug to clean it up: https://bugs.openjdk.org/browse/JDK-8039555 FWIW, I gave it a try at some point but I couldn't find a better solution. >> >> I can echo this issue, the inlining code does feel a little messy. I hope this patch does not make it worse, I'd say it keeps the messiness the same. > >> Yes, `PrintInlining` reports late inlines, but I think it would be nice for it to explicitly state which inlines are late inlines. I want to print `late inline`. 
> > I get it now and that looks reasonable to me. What about method handle invokes and late inlining of virtual calls. For those 2, the call site is initially found to not be a candidate for inlining and only later the compiler finds that it can inline. Does your change cover those 2 cases? @rwestrel could you take a look? ------------- PR Comment: https://git.openjdk.org/jdk/pull/16595#issuecomment-1879318818 From dlong at openjdk.org Fri Jan 5 23:10:23 2024 From: dlong at openjdk.org (Dean Long) Date: Fri, 5 Jan 2024 23:10:23 GMT Subject: RFR: 8320649: C2: Optimize scoped values [v4] In-Reply-To: References: Message-ID: <_s5v6PDwZFV4oLrpaNVKf-hBoB73NjCw2r_uMzK5XlQ=.1bfc49db-6786-45ed-a2ca-8e719a910a6b@github.com> On Fri, 5 Jan 2024 12:45:02 GMT, Roland Westrelin wrote: >> I'm not a C2 expert, so my high-level comments might not all make sense, but here goes. >> - My first reaction was why does this need to be so complicated? Can't we treat the slow path and cache implementation as a black box, and just common up the high-level get()? In my mind the ScopedValue get should be similar to a read from a "condy" dynamic constant. >> - The reason for breaking up the operations into individual nodes seems to be because of the profiling information. So I'm wondering how much this helps, given the added complexity. >> - I would expect multiple get() calls in the same method or in loops to be rare and/or programmer errors. The real advantage seems to be when the binding and the get() can be connected from caller to callee through inlining. >> - Are we able to optimize a get() on a constant/final ScopedValue into a simple array load at a constant offset? >> - Needing to do things like treat ScopedValueGetHitsInCache as always successful give be a bad feeling for some reason, and seem unnecessary if we did more at a higher (macro?) level rather than eagerly expanding the high-level operation into individual nodes. 
>
>> I'm not a C2 expert, so my high-level comments might not all make sense, but here goes.
>>
>> * My first reaction was why does this need to be so complicated?
>
> That's a fair reaction.
>
>> Can't we treat the slow path and cache implementation as a black box, and just common up the high-level get()? In my mind the ScopedValue get should be similar to a read from a "condy" dynamic constant.
>
> Initially, I thought about delaying the inlining of `get()` methods and simply having a pass that looks for `get()` calls with the same inputs. I don't think that works well because the current late inlining framework can't delay inlining very late. We don't run loop opts before we're done with inlining, for instance. If we wanted to hoist a call out of a loop we would need loop opts. For instance, it's likely a call to `get()` depends on a null check that we would need to hoist first.
>
> The other thing about optimizing `get()` calls is that they are heavyweight nodes (a high-level `get()` macro node would be very similar to a `get()` call node whichever way you look at it). We don't know how to hoist a call out of a loop. A call acts as a barrier on the entire memory state and gets in the way of memory optimizations. If the profile reports the slow path to be never taken then the shape of the `get()` becomes lighter weight. It doesn't disrupt other optimizations. Probing the cache acts as a load + test which we know how to hoist from a loop.
>
> It felt to me that it would be fairly common for the slow path to not be needed and, given the shape without the slow path is much easier to optimize, it was important to be able to expose early on if the slow path was there or not.
>
>>
>> * The reason for breaking up the operations into individual nodes seems to be because of the profiling information. So I'm wondering how much this helps, given the added complexity.
>
> The thing about `get()` is that in simple cases, it optimizes well because of profile data.
A `get()` call once inlined can essentially be hoisted out of loop if all goes well. It doesn't take much for simple optimizations on `get()` to not happen anymore. The goal of this patch is to bring consistency and have optimizations work well in all sort of scenarios. But it would be hard to sell if the simple cases don't work as well as they do without the patch. And I believe that requires profile data. > >> >> * I would expect multiple get() calls in the same method or in loops to be rare and/or programmer errors. The re... Thanks @rwestrel, that helps. I have no objections to this change, but I don't understand C2 enough to do a deeper review. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16966#issuecomment-1879363796 From duke at openjdk.org Fri Jan 5 23:12:48 2024 From: duke at openjdk.org (Joshua Cao) Date: Fri, 5 Jan 2024 23:12:48 GMT Subject: RFR: 8323095: Expand TraceOptoParse block output abbreviations Message-ID: `e` -> `exception block` `lphd` -> `loop head` Also removing an unnecessary space. The successor ids have a space before them. 
Examples from `java -Xcomp -XX:+TraceOptoParse -version`: Parsing block #8 at bci [33,39), successors: 9 16(exception block) loop head ------------- Commit messages: - 8323095: Expand TraceOptoParse block output abbreviations Changes: https://git.openjdk.org/jdk/pull/17289/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=17289&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8323095 Stats: 3 lines in 1 file changed: 0 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/17289.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17289/head:pull/17289 PR: https://git.openjdk.org/jdk/pull/17289 From dlong at openjdk.org Fri Jan 5 23:13:21 2024 From: dlong at openjdk.org (Dean Long) Date: Fri, 5 Jan 2024 23:13:21 GMT Subject: RFR: 8320649: C2: Optimize scoped values [v4] In-Reply-To: References: Message-ID: On Fri, 5 Jan 2024 10:16:24 GMT, Andrew Haley wrote: > Maybe I'm misunderstanding this question, but that's what the scoped value cache does. @theRealAph I guess it boils down to whether the hash value can be treated as a compile-time constant, which seems possible because it's marked final. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16966#issuecomment-1879365710 From duke at openjdk.org Fri Jan 5 23:43:43 2024 From: duke at openjdk.org (Zhiqiang Zang) Date: Fri, 5 Jan 2024 23:43:43 GMT Subject: RFR: 8322077: Add Ideal transformation: (~a) | (~b) => ~(a & b) [v6] In-Reply-To: References: Message-ID: > Hello, > > (~a) | (~b) => ~(a & b) is a widely seen pattern, for example it is implemented for LLVM [here](https://github.com/llvm/llvm-project/blob/397f1ce9efb4eea1ee10fe4833f733b8c7abd878/llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp#L1617C28-L1617C28); however it is missing in current implementation of hotspot. This pull request adds this transformation and associated tests. > > Thanks. 
Zhiqiang Zang has updated the pull request incrementally with one additional commit since the last revision: remove unused code from tests. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16334/files - new: https://git.openjdk.org/jdk/pull/16334/files/154c69e5..f7e57ce4 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16334&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16334&range=04-05 Stats: 2 lines in 2 files changed: 0 ins; 2 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/16334.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16334/head:pull/16334 PR: https://git.openjdk.org/jdk/pull/16334 From dlong at openjdk.org Fri Jan 5 23:45:22 2024 From: dlong at openjdk.org (Dean Long) Date: Fri, 5 Jan 2024 23:45:22 GMT Subject: RFR: 8323065: Unneccesary CodeBlob lookup in CompiledIC::internal_set_ic_destination In-Reply-To: References: Message-ID: On Fri, 5 Jan 2024 14:37:16 GMT, Aleksey Shipilev wrote: > I was looking at hotpath for IC stub cleaning (happens at safepoint), and one obvious thing is that we look-up `CodeBlob` from `call->instruction_address()` only to assert that is compiled one. It used to be protected by `#ifdef ASSERT` before [JDK-8212681](https://bugs.openjdk.org/browse/JDK-8212681), and pulled from it to be used in Mutex in JDK 12: https://hg.openjdk.org/jdk/jdk/rev/d6dc479bcdd3#l15.62 And the Mutex was shortly gone after [JDK-8214257](https://bugs.openjdk.org/browse/JDK-8214257). So we are exposing this code to product binaries since JDK 12. > > This fix reinstates the `ASSERT` block again. There are small improvements (~1..10us) for safepoint cleanup on small ad-hoc tests in release builds on my Mac. But since this whole thing involves looking up things in code cache, it may cost quite a lot. Marked as reviewed by dlong (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/17281#pullrequestreview-1807085790 From duke at openjdk.org Sat Jan 6 00:20:45 2024 From: duke at openjdk.org (Zhiqiang Zang) Date: Sat, 6 Jan 2024 00:20:45 GMT Subject: RFR: 8322077: Add Ideal transformation: (~a) | (~b) => ~(a & b) [v7] In-Reply-To: References: Message-ID: > Hello, > > (~a) | (~b) => ~(a & b) is a widely seen pattern, for example it is implemented for LLVM [here](https://github.com/llvm/llvm-project/blob/397f1ce9efb4eea1ee10fe4833f733b8c7abd878/llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp#L1617C28-L1617C28); however it is missing in current implementation of hotspot. This pull request adds this transformation and associated tests. > > Thanks. Zhiqiang Zang has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains seven commits: - remove unused code from tests. - update the copyright dates. - address comments. - untabify. - use common helpful functions. - include bug id. - include new optimization and tests. 
------------- Changes: https://git.openjdk.org/jdk/pull/16334/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=16334&range=06 Stats: 151 lines in 3 files changed: 151 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/16334.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16334/head:pull/16334 PR: https://git.openjdk.org/jdk/pull/16334 From duke at openjdk.org Sat Jan 6 00:44:07 2024 From: duke at openjdk.org (Zhiqiang Zang) Date: Sat, 6 Jan 2024 00:44:07 GMT Subject: RFR: 8322077: Add Ideal transformation: (~a) | (~b) => ~(a & b) [v8] In-Reply-To: References: Message-ID: <7nWuqQFeSjlCr0JfuC9sxty9HNF37Ytrz6H6EWTApZg=.b9ef1659-5815-4540-828e-69f1f450afe0@github.com> > Hello, > > (~a) | (~b) => ~(a & b) is a widely seen pattern, for example it is implemented for LLVM [here](https://github.com/llvm/llvm-project/blob/397f1ce9efb4eea1ee10fe4833f733b8c7abd878/llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp#L1617C28-L1617C28); however it is missing in current implementation of hotspot. This pull request adds this transformation and associated tests. > > Thanks. Zhiqiang Zang has updated the pull request incrementally with one additional commit since the last revision: Add tests for using De Morgan's Law for both optimizations. 
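For readers following the thread, the two rewrites and their combination can be checked with a small Java sketch. The class and method names below are hypothetical and are not the tests added by the PR. The sketch only exercises the bitwise identities on `int` values; it does not show or verify the C2 Ideal transformation itself.

```java
import java.util.Random;

// Hypothetical helper, not part of the PR: checks the bitwise identities only.
final class DeMorganSketch {
    // (~a) | (~b)  is equivalent to  ~(a & b)
    static int orOfNots(int a, int b)  { return (~a) | (~b); }
    static int notOfAnd(int a, int b)  { return ~(a & b); }

    // (~a) & (~b)  is equivalent to  ~(a | b)
    static int andOfNots(int a, int b) { return (~a) & (~b); }
    static int notOfOr(int a, int b)   { return ~(a | b); }

    // Combined case suggested in review:
    // (~a | ~b) & (~c | ~d)  ->  ~(a & b) & ~(c & d)  ->  ~((a & b) | (c & d))
    static int combinedBefore(int a, int b, int c, int d) {
        return (~a | ~b) & (~c | ~d);
    }
    static int combinedAfter(int a, int b, int c, int d) {
        return ~((a & b) | (c & d));
    }

    // True when all three identities agree for the given inputs.
    static boolean identitiesHold(int a, int b, int c, int d) {
        return orOfNots(a, b) == notOfAnd(a, b)
            && andOfNots(a, b) == notOfOr(a, b)
            && combinedBefore(a, b, c, d) == combinedAfter(a, b, c, d);
    }

    public static void main(String[] args) {
        Random rnd = new Random(42); // fixed seed so the run is repeatable
        for (int i = 0; i < 1_000_000; i++) {
            int a = rnd.nextInt(), b = rnd.nextInt();
            int c = rnd.nextInt(), d = rnd.nextInt();
            if (!identitiesHold(a, b, c, d)) {
                throw new AssertionError("identity failed for " + a + ", " + b + ", " + c + ", " + d);
            }
        }
        System.out.println("identities held for all sampled inputs");
    }
}
```

Random sampling is only a sanity check here; the identities follow directly from De Morgan's laws applied bit by bit.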
------------- Changes: - all: https://git.openjdk.org/jdk/pull/16334/files - new: https://git.openjdk.org/jdk/pull/16334/files/15a38bda..d8ed0f35 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16334&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16334&range=06-07 Stats: 218 lines in 2 files changed: 218 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/16334.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16334/head:pull/16334 PR: https://git.openjdk.org/jdk/pull/16334 From duke at openjdk.org Sat Jan 6 00:44:29 2024 From: duke at openjdk.org (Zhiqiang Zang) Date: Sat, 6 Jan 2024 00:44:29 GMT Subject: RFR: 8322589: Add Ideal transformation: (~a) & (~b) => ~(a | b) [v5] In-Reply-To: References: <-pwjKEB97C-bM068JQN0PY1hl65IzcuQZfHzRoKu92g=.d6118773-20d2-46cd-9284-5168c9334bb5@github.com> Message-ID: On Fri, 5 Jan 2024 16:00:22 GMT, Emanuel Peter wrote: >> Zhiqiang Zang has updated the pull request incrementally with one additional commit since the last revision: >> >> update the copyright dates. > > Looks like a good idea :) > I left a few suggestions below. @eme64 @TobiHartmann Thanks for the comments. All addressed. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16333#issuecomment-1879462996 From duke at openjdk.org Sat Jan 6 00:47:28 2024 From: duke at openjdk.org (Zhiqiang Zang) Date: Sat, 6 Jan 2024 00:47:28 GMT Subject: RFR: 8322077: Add Ideal transformation: (~a) | (~b) => ~(a & b) [v4] In-Reply-To: References: <5KKTYmY7dJ4nW0OEQ2UPuloIWOYc-pI9M8HRjoaRzw4=.f5eda6f9-c38c-4a2b-9690-cbb1791a2622@github.com> Message-ID: <2yuELMZxtnZVtWL2rhdtdzwIjg0tgZEi0csMHasBXtQ=.3e62b735-b4e5-4bfe-9594-bfb66e19205c@github.com> On Fri, 5 Jan 2024 16:22:38 GMT, Emanuel Peter wrote: >> Zhiqiang Zang has refreshed the contents of this pull request, and previous commits have been removed. Incremental views are not available. > > Looks like a good idea. Left a few comments. 
> > I would have merged this with https://github.com/openjdk/jdk/pull/16333, since it is essentially the symmetric case. But leave it separate now. > > It would be nice to have some shared tests, where both optimizations need to be combined. Like: > `(~a | ~b) & (~c | ~d)` -> `~(a & b) & ~(c & d)` -> `~((a & b) | (c & d))` @eme64 @TobiHartmann Thanks for the comments. All addressed. I rebased this PR onto #16333 so I was able to add these tests for using both optimizations. (the history was messed up). ------------- PR Comment: https://git.openjdk.org/jdk/pull/16334#issuecomment-1879464432 From aph at openjdk.org Sat Jan 6 10:10:23 2024 From: aph at openjdk.org (Andrew Haley) Date: Sat, 6 Jan 2024 10:10:23 GMT Subject: RFR: 8320649: C2: Optimize scoped values [v4] In-Reply-To: References: Message-ID: On Fri, 5 Jan 2024 23:10:37 GMT, Dean Long wrote: > > Maybe I'm misunderstanding this question, but that's what the scoped value cache does. > > @theRealAph I guess it boils down to whether the hash value can be treated as a compile-time constant, which seems possible because it's marked final. It always has been in the tests I've done. One of the interesting challenges with this work has been to make sure scoped value performance doesn't regress. A great advantage of this PR is that a dedicated scoped value optimization helps to make such regressions less likely. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16966#issuecomment-1879624947 From igavrilin at openjdk.org Sat Jan 6 16:27:41 2024 From: igavrilin at openjdk.org (Ilya Gavrilin) Date: Sat, 6 Jan 2024 16:27:41 GMT Subject: RFR: 8322790: RISC-V: Tune costs for shuffles with no conversion [v2] In-Reply-To: References: Message-ID: > Hi all, please review this small change to RISC-V nodes insertion costs. 
> Now we have several nodes which provide shuffles without conversion: https://github.com/openjdk/jdk/blob/32d80e2caf6063b58128bd5f3dc87b276f3bd0cb/src/hotspot/cpu/riscv/riscv.ad#L8525-L8741 > On most RISC-V cpu`s we prefer reg<->reg operations, because they are faster, but now stack<->reg operations used (for details about reasons, please, visit connected jbs issue). > After changing insertion costs reg<->reg operations selected, and we can see performance improvements for benchmarks, which use such shuffles (tested on thead C910 board): > | Benchmark | Upstream build (ops/ms) | Patched build (ops/ms) | difference (%) | > |:-----------------------------------:|:-----------------------:|:----------------------:|:--------------:| > | MathBench.doubleToRawLongBitsDouble | 30935.139 | 32171.761 | +4.00 | > | StrictMathBench.ceilDouble | 24682.810 | 29782.050 | +20.66 | > | StrictMathBench.cosDouble | 6948.309 | 6938.276 | -0.14 | > | StrictMathBench.expDouble | 6816.143 | 7211.021 | +5.79 | > | StrictMathBench.floorDouble | 30699.630 | 34189.509 | +11.37 | > | StrictMathBench.maxDouble | 35157.355 | 34675.191 | -1.37 | > | StrictMathBench.minDouble | 35192.135 | 35183.015 | -0.03 | > | StrictMathBench.sinDouble | 6698.405 | 6721.809 | +0.35 | > > New benchmark for changed nodes: > > --- a/test/micro/org/openjdk/bench/java/lang/MathBench.java > +++ b/test/micro/org/openjdk/bench/java/lang/MathBench.java > @@ -540,4 +540,11 @@ public class MathBench { > return Math.ulp(float7); > } > > + @Benchmark > + public long doubleToRawLongBitsDouble() { > + double dbl162Dot5 = double81 * 2.0d + double0Dot5; > + double dbl3 = double2 + double1; > + return Double.doubleToRawLongBits(dbl162Dot5) + Double.doubleToRawLongBits(dbl3); > + } > + Ilya Gavrilin has updated the pull request incrementally with one additional commit since the last revision: Revert some costs changes ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17206/files - new: 
https://git.openjdk.org/jdk/pull/17206/files/31066965..ae8bca99 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17206&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17206&range=00-01 Stats: 8 lines in 1 file changed: 0 ins; 0 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/17206.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17206/head:pull/17206 PR: https://git.openjdk.org/jdk/pull/17206 From igavrilin at openjdk.org Sat Jan 6 16:27:43 2024 From: igavrilin at openjdk.org (Ilya Gavrilin) Date: Sat, 6 Jan 2024 16:27:43 GMT Subject: RFR: 8322790: RISC-V: Tune costs for shuffles with no conversion [v2] In-Reply-To: References: Message-ID: On Wed, 3 Jan 2024 05:19:43 GMT, Fei Yang wrote: >> those nodes need to go below 100 which then starts looking ugly > > Seems that the performance gain is still there (tested on lichee-pi-4a board) when reverting part of the changes. I haven't checked the JIT code though. Try this addon change: > > [addon-change.diff.txt](https://github.com/openjdk/jdk/files/13815870/addon-change.diff.txt) Thanks, reverting some changes still leaves good generation. I have performed some more benchmarks on thead board, in all cases necessary instructions are generated in JIT code. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17206#discussion_r1443795840 From igavrilin at openjdk.org Sat Jan 6 16:30:21 2024 From: igavrilin at openjdk.org (Ilya Gavrilin) Date: Sat, 6 Jan 2024 16:30:21 GMT Subject: RFR: 8322790: RISC-V: Tune costs for shuffles with no conversion In-Reply-To: References: Message-ID: On Sat, 30 Dec 2023 20:07:13 GMT, Ilya Gavrilin wrote: > Hi all, please review this small change to RISC-V nodes insertion costs. 
> Now we have several nodes which provide shuffles without conversion: https://github.com/openjdk/jdk/blob/32d80e2caf6063b58128bd5f3dc87b276f3bd0cb/src/hotspot/cpu/riscv/riscv.ad#L8525-L8741 > On most RISC-V cpu`s we prefer reg<->reg operations, because they are faster, but now stack<->reg operations used (for details about reasons, please, visit connected jbs issue). > After changing insertion costs reg<->reg operations selected, and we can see performance improvements for benchmarks, which use such shuffles (tested on thead C910 board): > | Benchmark | Upstream build (ops/ms) | Patched build (ops/ms) | difference (%) | > |:-----------------------------------:|:-----------------------:|:----------------------:|:--------------:| > | MathBench.doubleToRawLongBitsDouble | 30935.139 | 32171.761 | +4.00 | > | StrictMathBench.ceilDouble | 24682.810 | 29782.050 | +20.66 | > | StrictMathBench.cosDouble | 6948.309 | 6938.276 | -0.14 | > | StrictMathBench.expDouble | 6816.143 | 7211.021 | +5.79 | > | StrictMathBench.floorDouble | 30699.630 | 34189.509 | +11.37 | > | StrictMathBench.maxDouble | 35157.355 | 34675.191 | -1.37 | > | StrictMathBench.minDouble | 35192.135 | 35183.015 | -0.03 | > | StrictMathBench.sinDouble | 6698.405 | 6721.809 | +0.35 | > > New benchmark for changed nodes: > > --- a/test/micro/org/openjdk/bench/java/lang/MathBench.java > +++ b/test/micro/org/openjdk/bench/java/lang/MathBench.java > @@ -540,4 +540,11 @@ public class MathBench { > return Math.ulp(float7); > } > > + @Benchmark > + public long doubleToRawLongBitsDouble() { > + double dbl162Dot5 = double81 * 2.0d + double0Dot5; > + double dbl3 = double2 + double1; > + return Double.doubleToRawLongBits(dbl162Dot5) + Double.doubleToRawLongBits(dbl3); > + } > + Thanks @RealFYang for suggested changes, performed some additional tests on thead board, also checked JIT code for some tests. 
| Benchmark                                | Upstream  | Old patch | Current patch |
|------------------------------------------|-----------|-----------|---------------|
| lang.MathBench.doubleToRawLongBitsDouble | 30495.868 | 32332.48  | 31635.15      |
| lang.MathBench.longBitsToDoubleLong      | 35161.101 | 34542.878 | 34146.705     |
| lang.StrictMathBench.ceilDouble          | 24272.224 | 29797.862 | 29094.981     |
| lang.StrictMathBench.cosDouble           | 6967.161  | 6930.468  | 6960.957      |
| lang.StrictMathBench.expDouble           | 6812.605  | 7211.988  | 7123.429      |
| lang.StrictMathBench.floorDouble         | 29893.151 | 34193.412 | 33257.669     |
| lang.StrictMathBench.maxDouble           | 34684.497 | 35194.694 | 35199.944     |
| lang.StrictMathBench.minDouble           | 34692.521 | 34673.531 | 34678.324     |
| lang.StrictMathBench.sinDouble           | 6769.593  | 6714.003  | 6736.884      |
| math.FpRoundingBenchmark.testnativeceil  | 67.801    | 115.6     | 116.822       |
| math.FpRoundingBenchmark.testnativefloor | 71.745    | 116.59    | 116.662       |

Additional benchmarks:

diff --git a/test/micro/org/openjdk/bench/java/lang/MathBench.java b/test/micro/org/openjdk/bench/java/lang/MathBench.java
index 27d8033b8b7..fd39cc58222 100644
--- a/test/micro/org/openjdk/bench/java/lang/MathBench.java
+++ b/test/micro/org/openjdk/bench/java/lang/MathBench.java
@@ -540,4 +540,17 @@ public class MathBench {
         return Math.ulp(float7);
     }
 
+    @Benchmark
+    public long doubleToRawLongBitsDouble() {
+        double dbl162Dot5 = double81 * 2.0d + double0Dot5;
+        double dbl3 = double2 + double1;
+        return Double.doubleToRawLongBits(dbl162Dot5) + Double.doubleToRawLongBits(dbl3);
+    }
+
+    @Benchmark
+    public double longBitsToDoubleLong() {
+        long lng14 = long13 + long1;
+        long lng750 = long747 + 3;
+        return Double.longBitsToDouble(lng14) + Double.longBitsToDouble(lng750);
+    }
 }
diff --git a/test/micro/org/openjdk/bench/java/math/FpRoundingBenchmark.java b/test/micro/org/openjdk/bench/java/math/FpRoundingBenchmark.java
index cf0eed32e07..3687f43b886 100644
--- a/test/micro/org/openjdk/bench/java/math/FpRoundingBenchmark.java
+++ b/test/micro/org/openjdk/bench/java/math/FpRoundingBenchmark.java
@@ -75,4 +75,16 @@ public class FpRoundingBenchmark {
         for (int i = 0; i < TESTSIZE; i++)
             Res[i] = Math.rint(DargV1[i]);
     }
+
+    @Benchmark
+    public void testnativeceil(Blackhole bh) {
+        for (int i = 0; i < TESTSIZE; i++)
+            Res[i] = StrictMath.ceil(DargV1[i]);
+    }
+
+    @Benchmark
+    public void testnativefloor(Blackhole bh) {
+        for (int i = 0; i < TESTSIZE; i++)
+            Res[i] = StrictMath.floor(DargV1[i]);
+    }
 }

-------------

PR Comment: https://git.openjdk.org/jdk/pull/17206#issuecomment-1879745479

From aph at openjdk.org Sat Jan 6 17:46:21 2024
From: aph at openjdk.org (Andrew Haley)
Date: Sat, 6 Jan 2024 17:46:21 GMT
Subject: RFR: 8319690: [AArch64] C2 compilation hits offset_ok_for_immed: assert "c2 compiler bug"
In-Reply-To:
References: <16J-lJ2AceGTVcRWBcP15yKcwO-1IA1XsngyOuNjf7k=.0776f081-ae2c-4279-87cf-d909806c2bc4@github.com> <4KRCqYxn02wMjYDuN3_HbYWxc9BDtYMNd40bNrJ4K8w=.5362dda4-8d30-48fd-8915-909eb15a6023@github.com>
Message-ID:

On Thu, 7 Dec 2023 06:45:30 GMT, Fei Gao wrote:

>> After this change, `immIOffset` and `immLOffset` appear to be obsolete.
>
>> After this change, `immIOffset` and `immLOffset` appear to be obsolete.
>
> Removed them in the new commit. Thanks!

> @fg1417 what is the state on this?
>
> The example here may look like a strange edge-case. But with my plans this may become more common: [JDK-8318446](https://bugs.openjdk.org/browse/JDK-8318446) C2: implement StoreNode::Ideal_merge_stores
>
> I will merge stores, which can then look like unaligned stores of a larger type. This bug here blocks my progress. I'm not in a hurry, just curious if there is now a plan how to proceed here ;)

The problem with this PR is that the code is way too complex for such a simple problem. The port is correct as it is, in the release build. The only problem is an assertion. We could simply remove that assertion, but if it were me I'd fix the problem properly.
Both @dean-long and I have suggested ways to improve this patch with less code. If @fg1417 decides to drop this PR I'll fix it. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16991#issuecomment-1879765450 From qamai at openjdk.org Sun Jan 7 15:52:17 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Sun, 7 Jan 2024 15:52:17 GMT Subject: RFR: 8282365: Consolidate and improve division by constant idealizations [v43] In-Reply-To: References: Message-ID: > This patch implements idealisation for unsigned divisions to change a division by a constant into a series of multiplication and shift. I also change the idealisation of `DivI` to get a more efficient series when the magic constant overflows an int32. > > In general, the idea behind a signed division transformation is that for a positive constant `d`, we would need to find constants `c` and `m` so that: > > floor(x / d) = floor(x * c / 2**m) for 0 < x < 2**(N - 1) (1) > ceil(x / d) = floor(x * c / 2**m) + 1 for -2**(N - 1) <= x < 0 (2) > > The implementation in the original book takes into consideration that the machine may not be able to perform the full multiplication `x * c`, so the constant overflow and we need to add back the dividend as in `DivLNode::Ideal` cases. However, for int32 division, `x * c` cannot overflow an int64. As a result, it is always feasible to just calculate the product and shift the result. > > For unsigned multiplication, the situation is somewhat trickier because the condition needs to be twice as strong (the condition (1) and (2) above are mostly the same). This results in the magic constant `c` calculated based on the method presented in Hacker's Delight by Henry S. Warren, Jr. may overflow an uintN. For int division, we can depend on the theorem devised by Arch D. 
Robison in N-Bit Unsigned Division Via N-Bit Multiply-Add, which states that there exists either: > > c1 in uint32 and m1, such that floor(x / d) = floor(x * c1 / 2**m1) for 0 < x < 2**32 (3) > c2 in uint32 and m2, such that floor(x / d) = floor((x + 1) * c2 / 2**m2) for 0 < x < 2**32 (4) > > which means that either `x * c1` never overflows an uint64 or `(x + 1) * c2` never overflows an uint64. And we can perform a full multiplication. > > For longs, there is no way to do a full multiplication so we do some basic transformations to achieve a computable formula. The details I have written as comments in the overflow case. > > More tests are added to cover the possible patterns. > > Please take a look and have some reviews. Thank you very much. Quan Anh Mai has updated the pull request incrementally with two additional commits since the last revision: - parentheses - another round of reviews ------------- Changes: - all: https://git.openjdk.org/jdk/pull/9947/files - new: https://git.openjdk.org/jdk/pull/9947/files/0f2c57c7..bba52b74 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=9947&range=42 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=9947&range=41-42 Stats: 18 lines in 3 files changed: 8 ins; 2 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/9947.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/9947/head:pull/9947 PR: https://git.openjdk.org/jdk/pull/9947 From qamai at openjdk.org Sun Jan 7 15:52:20 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Sun, 7 Jan 2024 15:52:20 GMT Subject: RFR: 8282365: Consolidate and improve division by constant idealizations [v42] In-Reply-To: References: Message-ID: On Tue, 2 Jan 2024 09:23:37 GMT, Stefan Karlsson wrote: >> Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: >> >> power of 2 > > test/hotspot/gtest/opto/test_constant_division.cpp line 29: > >> 27: #include "runtime/os.hpp" >> 28: #include "utilities/growableArray.hpp" >> 29: 
#include > > Move include. I was told that `unittest.hpp` should come last so this is the order, I have added a line between JDK header and stdlib header as well as resolved your other comments. Thanks a lot. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/9947#discussion_r1444028106 From qamai at openjdk.org Sun Jan 7 15:52:22 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Sun, 7 Jan 2024 15:52:22 GMT Subject: RFR: 8282365: Consolidate and improve division by constant idealizations [v42] In-Reply-To: References: Message-ID: On Sat, 30 Dec 2023 18:36:19 GMT, Kim Barrett wrote: >> Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: >> >> power of 2 > > test/hotspot/gtest/opto/test_constant_division.cpp line 33: > >> 31: >> 32: // Generate a random positive integer of type T in a way that biases >> 33: // towards smaller values > > Why is there a bias toward smaller numbers? Maybe it should be named differently to indicate that bias? Because we are dealing with inputs of division so it makes more sense to have them following somewhat a reciprocal distribution. > test/hotspot/gtest/opto/test_constant_division.cpp line 54: > >> 52: template <> >> 53: julong random() { >> 54: juint bits = juint(os::random()) % 63 + 1; > > This change (`&` => `%`, and the similar change below) go a long way toward explaining why I couldn't > puzzle out what this function was intended to do. Note that `&` has lower precedence than `+`, so the > earlier version was masking with 64. The new version doesn't have that operator precedence mistake, > though I'd prefer the precedence be made explicit using parens. Yes that was my mistake, have added parentheses. > test/hotspot/gtest/opto/test_constant_division.cpp line 132: > >> 130: for (int i = 0; i < iter_num;) { >> 131: UT d = random(); >> 132: if ((d & (d - 1)) == 0) { > > We have `is_power_of_2` for this. 
This catches `d == 0` also so using `is_power_of_2` is a little misleading I think. > test/hotspot/gtest/opto/test_constant_division.cpp line 139: > >> 137: UT N_pos = random(); >> 138: if (N_neg < d && N_pos < d) { >> 139: continue; > > With sufficiently bad luck, we could spin here for a long time. (Similarly, though much less likely above with > the power-of-2 case.) That doesn't seem great. Of course, if one does count these skipped cases against > the iteration limit then with sufficiently bad luck one might not test anything. Rather than skipping the test > here, could you instead modify one of the values and proceed with the test? Yes I have done that, thanks a lot for your suggestions. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/9947#discussion_r1444028650 PR Review Comment: https://git.openjdk.org/jdk/pull/9947#discussion_r1444028423 PR Review Comment: https://git.openjdk.org/jdk/pull/9947#discussion_r1444028703 PR Review Comment: https://git.openjdk.org/jdk/pull/9947#discussion_r1444028776 From qamai at openjdk.org Sun Jan 7 16:22:55 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Sun, 7 Jan 2024 16:22:55 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v4] In-Reply-To: References: Message-ID: > Hi, > > This patch adds unsigned bounds and known bits constraints to `TypeInt` and `TypeLong`. This opens more transformation opportunities in an elegant manner as well as helps avoid some ad-hoc rules in Hotspot. The new constraints are applied to identity and value calls of the common nodes (Add, Sub, L/R/URShift, And, Or, Xor, bit counting, Cmp, Bool, ConvI2L/L2I), the detailed ideas for each node will be presented below. > > In general, a `TypeInt/Long` represents a set of values `x` that satisfies: `x s>= lo && x s<= hi && x u>= ulo && x u<= uhi && (x & zeros) == 0 && (~x & ones) == 0`. These constraints are not independent, e.g. 
an int that lies in [0, 3] in signed domain must also lie in [0, 3] in unsigned domain and have all bits but the last 2 being unset. As a result, we must normalize the constraints (tighten the constraints so that they are optimal) before constructing a `TypeInt/Long` instance. > > Please kindly review, thanks very much. > > Testing > > - [x] GHA > - [x] Linux x64, tier 1-4 Quan Anh Mai has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 26 commits: - Merge branch 'master' into improvevalue - improve add/sub implementation - Merge branch 'master' into improvevalue - typo - whitespace - fix tests for x86_32 - fix widen of ConvI2L - problem lists - format - comment - ... and 16 more: https://git.openjdk.org/jdk/compare/faa9c690...de1bac2e ------------- Changes: https://git.openjdk.org/jdk/pull/15440/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=15440&range=03 Stats: 3753 lines in 35 files changed: 1953 ins; 1234 del; 566 mod Patch: https://git.openjdk.org/jdk/pull/15440.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15440/head:pull/15440 PR: https://git.openjdk.org/jdk/pull/15440 From kbarrett at openjdk.org Mon Jan 8 01:05:39 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Mon, 8 Jan 2024 01:05:39 GMT Subject: RFR: 8322816: RISC-V: Incorrect guarantee in patch_vtype [v2] In-Reply-To: References: Message-ID: > Please review this change that fixes a test for a guarantee. This also > removes a -Wparentheses warning when those are enabled (which is how the > problem was discovered). > > The problem is that operator precedence groups the sub-expressions differently > than intended. The fix is to override the operator precedence by adding > parentheses to achieve the intended grouping. > > Testing: Local (linux-x64) cross-build for linux-riscv with this change plus > -Wparentheses enabled and other changes to allow that to work. 
> > Requesting someone from the riscv porters to properly test this. Kim Barrett has updated the pull request incrementally with one additional commit since the last revision: guarantee !vill ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17215/files - new: https://git.openjdk.org/jdk/pull/17215/files/a3723801..ab335602 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17215&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17215&range=00-01 Stats: 4 lines in 1 file changed: 0 ins; 2 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/17215.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17215/head:pull/17215 PR: https://git.openjdk.org/jdk/pull/17215 From kbarrett at openjdk.org Mon Jan 8 01:05:40 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Mon, 8 Jan 2024 01:05:40 GMT Subject: RFR: 8322816: RISC-V: Incorrect guarantee in patch_vtype [v2] In-Reply-To: References: <2Dii1vRWzgzjyBGHrx_tH28SlYMBWUZ0h2mO7D5so_4=.817b5636-aa8a-4c6d-807a-29b1855a59a7@github.com> Message-ID: On Wed, 3 Jan 2024 01:59:00 GMT, Fei Yang wrote: >> Rather than removing the guarantee, wouldn't it be better to guarantee/assert `vill == 0`? >> Although looking at uses, that argument is a bool, so it should be `guarantee(!vill, ...)`. > > Hi, Yes, that's better. Maybe: `guarantee(!vill, "should be");` I've changed the guarantee as discussed. There are further cleanups possible here, but I'll leave that to the riscv port maintainers. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17215#discussion_r1444109348 From fyang at openjdk.org Mon Jan 8 02:08:22 2024 From: fyang at openjdk.org (Fei Yang) Date: Mon, 8 Jan 2024 02:08:22 GMT Subject: RFR: 8322816: RISC-V: Incorrect guarantee in patch_vtype [v2] In-Reply-To: References: Message-ID: On Mon, 8 Jan 2024 01:05:39 GMT, Kim Barrett wrote: >> Please review this change that fixes a test for a guarantee. 
This also >> removes a -Wparentheses warning when those are enabled (which is how the >> problem was discovered). >> >> The problem is that operator precedence groups the sub-expressions differently >> than intended. The fix is to override the operator precedence by adding >> parentheses to achieve the intended grouping. >> >> Testing: Local (linux-x64) cross-build for linux-riscv with this change plus >> -Wparentheses enabled and other changes to allow that to work. >> >> Requesting someone from the riscv porters to properly test this. > > Kim Barrett has updated the pull request incrementally with one additional commit since the last revision: > > guarantee !vill Marked as reviewed by fyang (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/17215#pullrequestreview-1807946542 From xliu at openjdk.org Mon Jan 8 03:24:33 2024 From: xliu at openjdk.org (Xin Liu) Date: Mon, 8 Jan 2024 03:24:33 GMT Subject: RFR: 8322982: CTW fails to build after 8308753 Message-ID: This patch fixes the build error of CTW by sidelining ModuleInfoWriter.java. ModuleInfoWriter uses Class-File API, which has transitioned to preview. If we really need to compile it, we have to append --enable-preview and --source N. The fact is CTW itself doesn't depend on ModuleInfoWriter. I think it's easier to maintain CTW if we filter it out in Makefile. 
------------- Commit messages: - 8322982: CTW fails to build after 8308753 Changes: https://git.openjdk.org/jdk/pull/17292/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=17292&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8322982 Stats: 3 lines in 1 file changed: 1 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/17292.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17292/head:pull/17292 PR: https://git.openjdk.org/jdk/pull/17292 From kbarrett at openjdk.org Mon Jan 8 05:37:45 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Mon, 8 Jan 2024 05:37:45 GMT Subject: RFR: 8322759: Eliminate -Wparentheses warnings in compiler code [v2] In-Reply-To: <496tGkQ1KUCrW1IHOETyvhqopkNYjsEoupxjo0Ze3Wg=.0223f494-fd34-4e4a-a31a-5030603f2113@github.com> References: <496tGkQ1KUCrW1IHOETyvhqopkNYjsEoupxjo0Ze3Wg=.0223f494-fd34-4e4a-a31a-5030603f2113@github.com> Message-ID: <-w16Kse74yx2EiWCorBtcKf1KXA1Rh5q-6Ze2T_qors=.06ead22b-ec5a-4859-888c-f0e3a283d7f3@github.com> > Please review this change to eliminate some -Wparentheses warnings. This > involved simply adding a few parentheses to make some implicit operator > precedence explicit. > > This change addresses non-C2 parts of the compiler component. > > Testing: mach5 tier1 > > Also ran mach5 tier1 with these changes in conjunction enabling -Wparentheses > and other changes needed to make that work. Kim Barrett has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains four additional commits since the last revision: - Merge branch 'master' into compiler-wparentheses - simplify asserts - update copyrights for new year - fix -Wparentheses warnings in non-C2 compiler code ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17200/files - new: https://git.openjdk.org/jdk/pull/17200/files/b2a4515a..e130bb2f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17200&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17200&range=00-01 Stats: 5096 lines in 438 files changed: 2760 ins; 933 del; 1403 mod Patch: https://git.openjdk.org/jdk/pull/17200.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17200/head:pull/17200 PR: https://git.openjdk.org/jdk/pull/17200 From kbarrett at openjdk.org Mon Jan 8 05:37:47 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Mon, 8 Jan 2024 05:37:47 GMT Subject: RFR: 8322759: Eliminate -Wparentheses warnings in compiler code [v2] In-Reply-To: References: <496tGkQ1KUCrW1IHOETyvhqopkNYjsEoupxjo0Ze3Wg=.0223f494-fd34-4e4a-a31a-5030603f2113@github.com> Message-ID: On Wed, 3 Jan 2024 12:07:10 GMT, Aleksey Shipilev wrote: >> Kim Barrett has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: >> >> - Merge branch 'master' into compiler-wparentheses >> - simplify asserts >> - update copyrights for new year >> - fix -Wparentheses warnings in non-C2 compiler code > > src/hotspot/share/compiler/compilerDefinitions.inline.hpp line 60: > >> 58: >> 59: inline bool CompilerConfig::is_c1_or_interpreter_only_no_jvmci() { >> 60: assert((is_jvmci_compiler() && is_jvmci()) || !is_jvmci_compiler(), "JVMCI compiler implies enabled JVMCI"); > > This looks like simply: > > > assert(!is_jvmci_compiler() || is_jvmci(), "JVMCI compiler implies enabled JVMCI"); Agreed. Changed accordingly. 
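For readers following the review suggestion above: the rewrite from `(A && B) || !A` to `!A || B` keeps the meaning, since both forms say "A implies B". A minimal sketch that checks this exhaustively; the function names are made up for illustration and are not HotSpot code:

```cpp
#include <cassert>

// Original shape of the assertion condition: (a && b) || !a
bool implies_verbose(bool a, bool b) {
    return (a && b) || !a;
}

// Simplified shape suggested in review: !a || b, i.e. "a implies b"
bool implies_simple(bool a, bool b) {
    return !a || b;
}

// Compare both forms over all four possible input combinations.
bool forms_agree() {
    for (int a = 0; a <= 1; ++a) {
        for (int b = 0; b <= 1; ++b) {
            if (implies_verbose(a != 0, b != 0) != implies_simple(a != 0, b != 0)) {
                return false;
            }
        }
    }
    return true;
}
```

Calling `forms_agree()` returns true, so the simplified asserts fire in exactly the same situations as the originals.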
> src/hotspot/share/compiler/compilerDefinitions.inline.hpp line 117: > >> 115: // Tiered is basically C1 & (C2 | JVMCI) minus all the odd cases with restrictions. >> 116: inline bool CompilerConfig::is_tiered() { >> 117: assert((is_c1_simple_only() && is_c1_only()) || !is_c1_simple_only(), "c1 simple mode must imply c1-only mode"); > > Ditto, > > > assert(!is_c1_simple_only() || is_c1_only(), "c1 simple mode must imply c1-only mode"); Agreed. Changed accordingly. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17200#discussion_r1444184964 PR Review Comment: https://git.openjdk.org/jdk/pull/17200#discussion_r1444185000 From jbhateja at openjdk.org Mon Jan 8 06:09:22 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Mon, 8 Jan 2024 06:09:22 GMT Subject: RFR: 8322768: Optimize non-subword vector compress and expand APIs for AVX2 target. [v3] In-Reply-To: References: <_udOCEVG86x9V_WvYqFTaYnvmXdiZ7LzqxzR-D_ygYs=.db3ed37e-a8fa-413b-83c6-a785aba072ff@github.com> Message-ID: On Fri, 5 Jan 2024 10:02:28 GMT, Emanuel Peter wrote: > Thanks for the updates! > > One more idea: Your AVX2 solution has a lot of cost for converting the mask to a permutation. Might it make sense to split this off into a separate vector-node, so that it can float out of a loop if the mask is invariant? CompressV / ExpandV only accepts two inputs, vector to be operated on and mask under which operation is performed, permute table based implementation is specific to x86 backend implementation. ------------- PR Comment: https://git.openjdk.org/jdk/pull/17261#issuecomment-1880430502 From jbhateja at openjdk.org Mon Jan 8 06:09:24 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Mon, 8 Jan 2024 06:09:24 GMT Subject: RFR: 8322768: Optimize non-subword vector compress and expand APIs for AVX2 target. 
[v2] In-Reply-To: References: <_udOCEVG86x9V_WvYqFTaYnvmXdiZ7LzqxzR-D_ygYs=.db3ed37e-a8fa-413b-83c6-a785aba072ff@github.com> Message-ID: <1GHGK7AGinCMKjFIB5oadUP0jiZrC39Z0hncAS3H-9Y=.eb617125-1f51-4cff-889e-b15321f5c72b@github.com> On Fri, 5 Jan 2024 09:45:11 GMT, Emanuel Peter wrote: > You are using `VectorMask pred = VectorMask.fromLong(ispecies, maskctr++);`. That basically systematically iterates over all masks, which is nice for a correctness test. But that would use different density inside one test run, right? The average over the loop is still at `50%`, correct? > > I was thinking more a run where the percentage over the whole loop is lower than maybe `1%`. That would get us to a point where maybe the branch prediction of non-vectorized code might be faster, what do you think? An imperative loop compression will check each mask bit to select compressible lane. Therefore mask with low or high density of set bits should show similar performance. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17261#discussion_r1444196848 From jbhateja at openjdk.org Mon Jan 8 06:23:46 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Mon, 8 Jan 2024 06:23:46 GMT Subject: RFR: 8322768: Optimize non-subword vector compress and expand APIs for AVX2 target. [v4] In-Reply-To: <_udOCEVG86x9V_WvYqFTaYnvmXdiZ7LzqxzR-D_ygYs=.db3ed37e-a8fa-413b-83c6-a785aba072ff@github.com> References: <_udOCEVG86x9V_WvYqFTaYnvmXdiZ7LzqxzR-D_ygYs=.db3ed37e-a8fa-413b-83c6-a785aba072ff@github.com> Message-ID: > Hi, > > Patch optimizes non-subword vector compress and expand APIs for x86 AVX2 only targets. > Upcoming E-core Xeons (Sierra Forest) and Hybrid CPUs only support AVX2 instruction set. > These are very frequently used APIs in columnar database filter operation. > > Implementation uses a lookup table to record permute indices. Table index is computed using > mask argument of compress/expand operation. 
> > Following are the performance number of JMH micro included with the patch. > > > System : Intel(R) Xeon(R) Platinum 8480+ (Sapphire Rapids) > > Baseline: > Benchmark (size) Mode Cnt Score Error Units > ColumnFilterBenchmark.filterDoubleColumn 1024 thrpt 2 142.767 ops/ms > ColumnFilterBenchmark.filterDoubleColumn 2047 thrpt 2 71.436 ops/ms > ColumnFilterBenchmark.filterDoubleColumn 4096 thrpt 2 35.992 ops/ms > ColumnFilterBenchmark.filterFloatColumn 1024 thrpt 2 182.151 ops/ms > ColumnFilterBenchmark.filterFloatColumn 2047 thrpt 2 91.096 ops/ms > ColumnFilterBenchmark.filterFloatColumn 4096 thrpt 2 44.757 ops/ms > ColumnFilterBenchmark.filterIntColumn 1024 thrpt 2 184.099 ops/ms > ColumnFilterBenchmark.filterIntColumn 2047 thrpt 2 91.981 ops/ms > ColumnFilterBenchmark.filterIntColumn 4096 thrpt 2 45.170 ops/ms > ColumnFilterBenchmark.filterLongColumn 1024 thrpt 2 148.017 ops/ms > ColumnFilterBenchmark.filterLongColumn 2047 thrpt 2 73.516 ops/ms > ColumnFilterBenchmark.filterLongColumn 4096 thrpt 2 36.844 ops/ms > > Withopt: > Benchmark (size) Mode Cnt Score Error Units > ColumnFilterBenchmark.filterDoubleColumn 1024 thrpt 2 2051.707 ops/ms > ColumnFilterBenchmark.filterDoubleColumn 2047 thrpt 2 914.072 ops/ms > ColumnFilterBenchmark.filterDoubleColumn 4096 thrpt 2 489.898 ops/ms > ColumnFilterBenchmark.filterFloatColumn 1024 thrpt 2 5324.195 ops/ms > ColumnFilterBenchmark.filterFloatColumn 2047 thrpt 2 2587.229 ops/ms > ColumnFilterBenchmark.filterFloatColumn 4096 thrpt 2 1278.665 ops/ms > ColumnFilterBenchmark.filterIntColumn 1024 thrpt 2 4149.384 ops/ms > ColumnFilterBenchmark.filterIntColumn 2047 thrpt 2 1791.170 ops/ms > ColumnFilterBenchmark.filterIntColumn 4096... Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Review suggestions incorporated. 
------------- Changes: - all: https://git.openjdk.org/jdk/pull/17261/files - new: https://git.openjdk.org/jdk/pull/17261/files/ea0aa0b4..257a6351 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17261&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17261&range=02-03 Stats: 24 lines in 1 file changed: 2 ins; 2 del; 20 mod Patch: https://git.openjdk.org/jdk/pull/17261.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17261/head:pull/17261 PR: https://git.openjdk.org/jdk/pull/17261 From thartmann at openjdk.org Mon Jan 8 06:55:24 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 8 Jan 2024 06:55:24 GMT Subject: RFR: 8320128: Clean up Parse constructor for OSR [v4] In-Reply-To: <-QQZANSu7PVpZ3CUX7kbp00f_xkhjiy7la3Z3gXutKA=.a1c938e4-12e0-44b8-a2dd-7cc0c0c9f09f@github.com> References: <0mxhhLR9DDdpiBhbXWPxCmMzo_OI9DYiIX36xGhxC_c=.1eed99b9-2dc3-4403-b3c1-22b6ebed6079@github.com> <-QQZANSu7PVpZ3CUX7kbp00f_xkhjiy7la3Z3gXutKA=.a1c938e4-12e0-44b8-a2dd-7cc0c0c9f09f@github.com> Message-ID: <_LWzDNl5A61rNi5D-W0kgE3nFG5dScUQ8KO1TtqMCKw=.ad6c34cc-1bd8-4bb8-9949-00f0ce09a432@github.com> On Thu, 4 Jan 2024 17:06:38 GMT, Xin Liu wrote: >> There's a special case for the constructor of Parse. If current compilation is OSR and it is handling the top-level method(depth() == 1), then >> >> 1. _tf = C->tf(); >> 2. _entry_bci = C->entry_bci(); >> 3. _flow = method()->get_osr_flow_analysis(_entry_bci); >> >> We don't need to assign those member data twice. We can also factor out _flow->failing() for the special case and normal cases. >> >> It's worth mentioning that we can't save ciTypeFlow computation because >> get_osr_flow_analysis(_entry_bci) actually needs get_flow_analysis(method()). > > Xin Liu has updated the pull request incrementally with one additional commit since the last revision: > > Use print_cr for the log message. All tests passed. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/16669#issuecomment-1880464791 From thartmann at openjdk.org Mon Jan 8 06:58:21 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 8 Jan 2024 06:58:21 GMT Subject: RFR: 8322694: C1: Handle Constant and IfOp in NullCheckEliminator [v2] In-Reply-To: References: <1vvyuwLRjlWItKwCyighjCSM5SNbO4CSEE59hQtCU24=.b4783e52-328a-4ce3-8c92-7b736cea7546@github.com> Message-ID: On Wed, 3 Jan 2024 13:37:21 GMT, Denghui Dong wrote: >> This patch adds support for Constant and IfOp in NullCheckEliminator to eliminate more null checks. >> >> testing: tier1-4 in progress > > Denghui Dong has updated the pull request incrementally with one additional commit since the last revision: > > update Looks good to me. ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/17191#pullrequestreview-1808102434 From thartmann at openjdk.org Mon Jan 8 07:01:25 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 8 Jan 2024 07:01:25 GMT Subject: RFR: 8322589: Add Ideal transformation: (~a) & (~b) => ~(a | b) [v5] In-Reply-To: References: <-pwjKEB97C-bM068JQN0PY1hl65IzcuQZfHzRoKu92g=.d6118773-20d2-46cd-9284-5168c9334bb5@github.com> Message-ID: On Fri, 5 Jan 2024 21:43:39 GMT, Zhiqiang Zang wrote: >> Hello, >> >> `(~a) & (~b) => ~(a | b)` is a widely seen pattern, for example it is implemented for LLVM [here](https://github.com/llvm/llvm-project/blob/397f1ce9efb4eea1ee10fe4833f733b8c7abd878/llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp#L1616C28-L1616C28); however it is missing in the current implementation of HotSpot. This pull request adds this transformation and associated tests. >> >> Thanks. > > Zhiqiang Zang has updated the pull request incrementally with one additional commit since the last revision: > > update the copyright dates. Looks good to me otherwise.
src/hotspot/share/opto/mulnode.cpp line 615: > 613: // Convert "(~a) & (~b)" into "~(a | b)" > 614: if (AddNode::is_not(phase, in(1), T_INT) && AddNode::is_not(phase, in(2), T_INT)) { > 615: Node *or_a_b = new OrINode(in(1)->in(1), in(2)->in(1)); Suggestion: Node* or_a_b = new OrINode(in(1)->in(1), in(2)->in(1)); Same below. ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/16333#pullrequestreview-1808105721 PR Review Comment: https://git.openjdk.org/jdk/pull/16333#discussion_r1444223718 From thartmann at openjdk.org Mon Jan 8 07:05:27 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 8 Jan 2024 07:05:27 GMT Subject: RFR: 8322589: Add Ideal transformation: (~a) & (~b) => ~(a | b) [v5] In-Reply-To: References: <-pwjKEB97C-bM068JQN0PY1hl65IzcuQZfHzRoKu92g=.d6118773-20d2-46cd-9284-5168c9334bb5@github.com> Message-ID: On Fri, 5 Jan 2024 21:43:39 GMT, Zhiqiang Zang wrote: >> Hello, >> >> `(~a) & (~b) => ~(a | b)` is a widely seen pattern, for example it is implemented for LLVM [here](https://github.com/llvm/llvm-project/blob/397f1ce9efb4eea1ee10fe4833f733b8c7abd878/llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp#L1616C28-L1616C28); however it is missing in current implementation of hotspot. This pull request adds this transformation and associated tests. >> >> Thanks. > > Zhiqiang Zang has updated the pull request incrementally with one additional commit since the last revision: > > update the copyright dates. src/hotspot/share/opto/addnode.hpp line 84: > 82: // Utility function to check if the given node is a NOT operation, > 83: // i.e., n == m ^ (-1). > 84: static bool is_not(PhaseGVN* phase, Node* n, BasicType bt); Could these be made non-static? 
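As background for the transformation and the `is_not` helper quoted above, here is a small standalone sketch of the two bit-level facts they rely on: NOT is the same as XOR with -1, and `(~a) & (~b)` equals `~(a | b)`. This is plain integer arithmetic with illustrative function names, not the C2 node code:

```cpp
#include <cassert>
#include <cstdint>

// NOT written as XOR with all ones (-1), the shape the is_not() comment describes.
int32_t not_via_xor(int32_t m) {
    return m ^ -1;
}

// Left-hand side of the transformation: (~a) & (~b)
int32_t and_of_nots(int32_t a, int32_t b) {
    return (~a) & (~b);
}

// Right-hand side of the transformation: ~(a | b)
int32_t not_of_or(int32_t a, int32_t b) {
    return ~(a | b);
}

// Check both identities on a handful of sample values, including the extremes.
bool identities_hold_on_samples() {
    const int32_t samples[] = {0, 1, -1, 42, INT32_MIN, INT32_MAX, 0x55555555, 0x0F0F0F0F};
    for (int32_t a : samples) {
        if (not_via_xor(a) != ~a) {
            return false;
        }
        for (int32_t b : samples) {
            if (and_of_nots(a, b) != not_of_or(a, b)) {
                return false;
            }
        }
    }
    return true;
}
```

The rewritten form needs one NOT instead of two, which is presumably why the pattern is worth canonicalizing.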
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16333#discussion_r1444226697 From rehn at openjdk.org Mon Jan 8 07:30:25 2024 From: rehn at openjdk.org (Robbin Ehn) Date: Mon, 8 Jan 2024 07:30:25 GMT Subject: RFR: 8322790: RISC-V: Tune costs for shuffles with no conversion [v2] In-Reply-To: References: Message-ID: On Sat, 6 Jan 2024 16:27:41 GMT, Ilya Gavrilin wrote:

>> Hi all, please review this small change to RISC-V nodes insertion costs.
>> Now we have several nodes which provide shuffles without conversion: https://github.com/openjdk/jdk/blob/32d80e2caf6063b58128bd5f3dc87b276f3bd0cb/src/hotspot/cpu/riscv/riscv.ad#L8525-L8741
>> On most RISC-V CPUs we prefer reg<->reg operations, because they are faster, but currently stack<->reg operations are used (for details about the reasons, please visit the connected JBS issue).
>> After changing the insertion costs, reg<->reg operations are selected, and we can see performance improvements for benchmarks which use such shuffles (tested on thead C910 board):
>> | Benchmark                           | Upstream build (ops/ms) | Patched build (ops/ms) | difference (%) |
>> |:-----------------------------------:|:-----------------------:|:----------------------:|:--------------:|
>> | MathBench.doubleToRawLongBitsDouble | 30935.139               | 32171.761              | +4.00          |
>> | StrictMathBench.ceilDouble          | 24682.810               | 29782.050              | +20.66         |
>> | StrictMathBench.cosDouble           | 6948.309                | 6938.276               | -0.14          |
>> | StrictMathBench.expDouble           | 6816.143                | 7211.021               | +5.79          |
>> | StrictMathBench.floorDouble         | 30699.630               | 34189.509              | +11.37         |
>> | StrictMathBench.maxDouble           | 35157.355               | 34675.191              | -1.37          |
>> | StrictMathBench.minDouble           | 35192.135               | 35183.015              | -0.03          |
>> | StrictMathBench.sinDouble           | 6698.405                | 6721.809               | +0.35          |
>>
>> New benchmark for changed nodes:
>>
>> --- a/test/micro/org/openjdk/bench/java/lang/MathBench.java
>> +++ b/test/micro/org/openjdk/bench/java/lang/MathBench.java
>> @@ -540,4 +540,11 @@ public class MathBench {
>>          return Math.ulp(float7);
>>      }
>>
>> +    @Benchmark
>> +    public long doubleToRawLongBitsDouble() {
>> +        double dbl162Dot5 = double81 * 2.0d + double0Dot5;
>> +        double dbl3 = double2 + double1;
>> +        return Double.doubleToRawLongBits(dbl162Dot5) + Double.doubleToRawLongBits(dbl3);
>> +    }
>> +
>
> Ilya Gavrilin has updated the pull request incrementally with one additional commit since the last revision:
>
> Revert some costs changes

Still reasonable to me. ------------- Marked as reviewed by rehn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/17206#pullrequestreview-1808157228 From epeter at openjdk.org Mon Jan 8 07:47:25 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 8 Jan 2024 07:47:25 GMT Subject: RFR: 8322589: Add Ideal transformation: (~a) & (~b) => ~(a | b) [v5] In-Reply-To: References: <-pwjKEB97C-bM068JQN0PY1hl65IzcuQZfHzRoKu92g=.d6118773-20d2-46cd-9284-5168c9334bb5@github.com> Message-ID: On Mon, 8 Jan 2024 07:02:50 GMT, Tobias Hartmann wrote: >> Zhiqiang Zang has updated the pull request incrementally with one additional commit since the last revision: >> >> update the copyright dates. > > src/hotspot/share/opto/addnode.hpp line 84: > >> 82: // Utility function to check if the given node is a NOT operation, >> 83: // i.e., n == m ^ (-1). >> 84: static bool is_not(PhaseGVN* phase, Node* n, BasicType bt); > > Could these be made non-static? Hmm, I agree with this idea. `n->is_not(...)` would really be nicer. You'd probably have to move the two methods to `node.hpp`.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16333#discussion_r1444251176 From epeter at openjdk.org Mon Jan 8 07:50:25 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 8 Jan 2024 07:50:25 GMT Subject: RFR: 8322077: Add Ideal transformation: (~a) | (~b) => ~(a & b) [v8] In-Reply-To: <7nWuqQFeSjlCr0JfuC9sxty9HNF37Ytrz6H6EWTApZg=.b9ef1659-5815-4540-828e-69f1f450afe0@github.com> References: <7nWuqQFeSjlCr0JfuC9sxty9HNF37Ytrz6H6EWTApZg=.b9ef1659-5815-4540-828e-69f1f450afe0@github.com> Message-ID: On Sat, 6 Jan 2024 00:44:07 GMT, Zhiqiang Zang wrote: >> Hello, >> >> (~a) | (~b) => ~(a & b) is a widely seen pattern, for example it is implemented for LLVM [here](https://github.com/llvm/llvm-project/blob/397f1ce9efb4eea1ee10fe4833f733b8c7abd878/llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp#L1617C28-L1617C28); however it is missing in current implementation of hotspot. This pull request adds this transformation and associated tests. >> >> Thanks. > > Zhiqiang Zang has updated the pull request incrementally with one additional commit since the last revision: > > Add tests for using De Morgan's Law for both optimizations. Nice, looks much better, thanks for the updates! ------------- PR Comment: https://git.openjdk.org/jdk/pull/16334#issuecomment-1880515343 From kbarrett at openjdk.org Mon Jan 8 08:00:39 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Mon, 8 Jan 2024 08:00:39 GMT Subject: RFR: 8323115: x86-32: Incorrect predicates for cmov instruct transforms with UseSSE Message-ID: <7BcH08lab4xxFOxm0UAfhHKVyBBEhRWtIPEUJuOIlmo=.2c5b0918-19ac-4cb3-b2f5-11131ce73422@github.com> Please review this change that fixes generation of CMOV by C2 as controlled by UseSSE. The predicates controlling that generation were using implicit operator precedence that didn't have the expected grouping. Fixed by adding parentheses to make the desired grouping explicit. 
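To illustrate the kind of grouping problem described here: in C++, `&&` binds tighter than `||`, so `a && b || c` is parsed as `(a && b) || c`. The sketch below uses a made-up predicate (the parameter names `use_sse`, `is_lt` and `is_ge` are assumptions for illustration, not the actual predicates from the x86_32 AD file) to show how the parsed and intended groupings can differ:

```cpp
#include <cassert>

// How the compiler groups `use_sse >= 1 && is_lt || is_ge`:
// && binds tighter than ||, so the SSE check only guards the first test.
// The parentheses are written out here so the grouping is visible.
bool predicate_as_parsed(int use_sse, bool is_lt, bool is_ge) {
    return (use_sse >= 1 && is_lt) || is_ge;
}

// Grouping that was presumably intended: the SSE check guards both tests.
bool predicate_as_intended(int use_sse, bool is_lt, bool is_ge) {
    return use_sse >= 1 && (is_lt || is_ge);
}
```

With `use_sse == 0`, `is_lt == false` and `is_ge == true`, the parsed form is true while the intended form is false, which is the sort of mismatch `-Wparentheses` is designed to flag.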
Testing: Ran GHA with -Wparentheses enabled along with this and other changes needed to make that work. ------------- Commit messages: - fix predicates for cmov with UseSSE Changes: https://git.openjdk.org/jdk/pull/17296/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=17296&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8323115 Stats: 13 lines in 1 file changed: 0 ins; 0 del; 13 mod Patch: https://git.openjdk.org/jdk/pull/17296.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17296/head:pull/17296 PR: https://git.openjdk.org/jdk/pull/17296 From thartmann at openjdk.org Mon Jan 8 08:39:27 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 8 Jan 2024 08:39:27 GMT Subject: RFR: 8322589: Add Ideal transformation: (~a) & (~b) => ~(a | b) [v5] In-Reply-To: References: <-pwjKEB97C-bM068JQN0PY1hl65IzcuQZfHzRoKu92g=.d6118773-20d2-46cd-9284-5168c9334bb5@github.com> Message-ID: On Fri, 5 Jan 2024 21:43:39 GMT, Zhiqiang Zang wrote: >> Hello, >> >> `(~a) & (~b) => ~(a | b)` is a widely seen pattern, for example it is implemented for LLVM [here](https://github.com/llvm/llvm-project/blob/397f1ce9efb4eea1ee10fe4833f733b8c7abd878/llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp#L1616C28-L1616C28); however it is missing in current implementation of hotspot. This pull request adds this transformation and associated tests. >> >> Thanks. > > Zhiqiang Zang has updated the pull request incrementally with one additional commit since the last revision: > > update the copyright dates. 
src/hotspot/share/opto/addnode.cpp line 260: > 258: } > 259: > 260: AddNode* AddNode::make_not(PhaseGVN* phase, Node*n, BasicType bt) { Suggestion: AddNode* AddNode::make_not(PhaseGVN* phase, Node* n, BasicType bt) { ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16333#discussion_r1444288412 From thartmann at openjdk.org Mon Jan 8 08:40:26 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 8 Jan 2024 08:40:26 GMT Subject: RFR: 8322077: Add Ideal transformation: (~a) | (~b) => ~(a & b) [v8] In-Reply-To: <7nWuqQFeSjlCr0JfuC9sxty9HNF37Ytrz6H6EWTApZg=.b9ef1659-5815-4540-828e-69f1f450afe0@github.com> References: <7nWuqQFeSjlCr0JfuC9sxty9HNF37Ytrz6H6EWTApZg=.b9ef1659-5815-4540-828e-69f1f450afe0@github.com> Message-ID: On Sat, 6 Jan 2024 00:44:07 GMT, Zhiqiang Zang wrote: >> Hello, >> >> (~a) | (~b) => ~(a & b) is a widely seen pattern, for example it is implemented for LLVM [here](https://github.com/llvm/llvm-project/blob/397f1ce9efb4eea1ee10fe4833f733b8c7abd878/llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp#L1617C28-L1617C28); however it is missing in current implementation of hotspot. This pull request adds this transformation and associated tests. >> >> Thanks. > > Zhiqiang Zang has updated the pull request incrementally with one additional commit since the last revision: > > Add tests for using De Morgan's Law for both optimizations. Looks good to me otherwise. 
test/hotspot/jtreg/compiler/c2/irTests/DeMorganLawIntTests.java line 31: > 29: * @test > 30: * @bug 8322077 > 31: * @summary Test that Ideal transformations on the De Morgan's Law performe Suggestion: * @summary Test that Ideal transformations on the De Morgan's Law perform test/hotspot/jtreg/compiler/c2/irTests/DeMorganLawLongTests.java line 31: > 29: * @test > 30: * @bug 8322077 > 31: * @summary Test that Ideal transformations on the De Morgan's Law performe Suggestion: * @summary Test that Ideal transformations on the De Morgan's Law perform ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/16334#pullrequestreview-1808302290 PR Review Comment: https://git.openjdk.org/jdk/pull/16334#discussion_r1444279955 PR Review Comment: https://git.openjdk.org/jdk/pull/16334#discussion_r1444280127 From epeter at openjdk.org Mon Jan 8 08:49:23 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 8 Jan 2024 08:49:23 GMT Subject: RFR: 8322735: C2: minor improvements of bubble sort used in SuperWord::packset_sort [v2] In-Reply-To: <1W_HcoiwEnXLvtqprc7L8mghr0BRH3a_ITn6Cerzb_c=.f6c90e24-7d41-418d-bc50-41bff396663b@github.com> References: <1W_HcoiwEnXLvtqprc7L8mghr0BRH3a_ITn6Cerzb_c=.f6c90e24-7d41-418d-bc50-41bff396663b@github.com> Message-ID: On Fri, 5 Jan 2024 08:57:33 GMT, Denghui Dong wrote: >> A minor improvement could be made for bubble sort in SuperWord::packset_sort to reduce the comparison count in bad cases. >> >> See https://en.wikipedia.org/wiki/Bubble_sort > > Denghui Dong has updated the pull request incrementally with one additional commit since the last revision: > > update @D-D-H testing passed. Looks good. Thanks for the change! ------------- Marked as reviewed by epeter (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/17190#pullrequestreview-1808357678 From ddong at openjdk.org Mon Jan 8 09:24:20 2024 From: ddong at openjdk.org (Denghui Dong) Date: Mon, 8 Jan 2024 09:24:20 GMT Subject: RFR: 8322735: C2: minor improvements of bubble sort used in SuperWord::packset_sort In-Reply-To: References: Message-ID: On Fri, 5 Jan 2024 10:14:08 GMT, Emanuel Peter wrote: >>> Please do the renaming, and then I can run testing and give you my approval. >> >> Updated. Thanks. > > @D-D-H nice. Testing running. @eme64 Thank you! Do I need a second review? ------------- PR Comment: https://git.openjdk.org/jdk/pull/17190#issuecomment-1880632132 From epeter at openjdk.org Mon Jan 8 09:24:23 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 8 Jan 2024 09:24:23 GMT Subject: RFR: 8322735: C2: minor improvements of bubble sort used in SuperWord::packset_sort [v2] In-Reply-To: <1W_HcoiwEnXLvtqprc7L8mghr0BRH3a_ITn6Cerzb_c=.f6c90e24-7d41-418d-bc50-41bff396663b@github.com> References: <1W_HcoiwEnXLvtqprc7L8mghr0BRH3a_ITn6Cerzb_c=.f6c90e24-7d41-418d-bc50-41bff396663b@github.com> Message-ID: On Fri, 5 Jan 2024 08:57:33 GMT, Denghui Dong wrote: >> A minor improvement could be made for bubble sort in SuperWord::packset_sort to reduce the comparison count in bad cases. >> >> See https://en.wikipedia.org/wiki/Bubble_sort > > Denghui Dong has updated the pull request incrementally with one additional commit since the last revision: > > update Yes, I think that would be preferable, even though this is not a very complicated fix.
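The thread does not spell out which bubble sort refinement the patch applies, so the following is only a sketch of the textbook variant from the linked Wikipedia article: each pass remembers the position of its last swap and the next pass stops there, so already-sorted tails are never compared again. It sorts a plain `std::vector<int>` and returns the comparison count; it is not the `SuperWord::packset_sort` code.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Bubble sort that shrinks the unsorted range to the position of the last swap.
// Returns the number of element comparisons performed.
std::size_t bubble_sort_optimized(std::vector<int>& v) {
    std::size_t comparisons = 0;
    std::size_t n = v.size();
    while (n > 1) {
        std::size_t last_swap = 0;
        for (std::size_t i = 1; i < n; ++i) {
            ++comparisons;
            if (v[i - 1] > v[i]) {
                std::swap(v[i - 1], v[i]);
                last_swap = i;
            }
        }
        // Elements at index last_swap and beyond are in their final positions.
        // If no swap happened, last_swap is 0 and the loop ends.
        n = last_swap;
    }
    return comparisons;
}
```

On already-sorted input this does a single pass of `n - 1` comparisons and stops, where the naive version would still run all passes.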
------------- PR Comment: https://git.openjdk.org/jdk/pull/17190#issuecomment-1880633452 From stefank at openjdk.org Mon Jan 8 09:28:49 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Mon, 8 Jan 2024 09:28:49 GMT Subject: RFR: 8282365: Consolidate and improve division by constant idealizations [v42] In-Reply-To: References: Message-ID: On Sun, 7 Jan 2024 15:44:44 GMT, Quan Anh Mai wrote: >> test/hotspot/gtest/opto/test_constant_division.cpp line 29: >> >>> 27: #include "runtime/os.hpp" >>> 28: #include "utilities/growableArray.hpp" >>> 29: #include >> >> Move include. > > I was told that `unittest.hpp` should come last so this is the order, I have added a line between JDK header and stdlib header as well as resolved your other comments. Thanks a lot. The rules around the include lines in our tests and what we currently have in the tests are messy at the moment. We should fix that when we find the time to. For HotSpot source code files the includes should be structured as: hotspot includes blank line system includes There are some deviations from that, but those should be cleaned up instead of used as a precedent. For our tests we should add "unittest.hpp" at the end of the "hotspot includes" section. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/9947#discussion_r1444334222 From thartmann at openjdk.org Mon Jan 8 09:33:34 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 8 Jan 2024 09:33:34 GMT Subject: RFR: 8310711: [IR Framework] Remove safepoint while printing handling [v2] In-Reply-To: <278BZfCuvI5xJSWh2PvZtONVaZ2QjxkWKj1NifCbFYE=.af0a47ab-63a6-47bd-953b-4b0756107227@github.com> References: <278BZfCuvI5xJSWh2PvZtONVaZ2QjxkWKj1NifCbFYE=.af0a47ab-63a6-47bd-953b-4b0756107227@github.com> Message-ID: On Fri, 5 Jan 2024 10:58:41 GMT, Christian Hagedorn wrote: >> This clean-up PR removes the handling of the `` message in the IR framework.
It is no longer required since we dump the output of `PrintIdeal` to the hotspot_pid file differently since [JDK-8306922](https://bugs.openjdk.org/browse/JDK-8306922). There is no interrupting `` message anymore. I removed the corresponding now unneeded code together with the previously added test case for it. >> >> Testing: tier1-4 >> >> Thanks, >> Christian > > Christian Hagedorn has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: > > - Update copyright year > - Merge branch 'master' into JDK-8310711 > - 8310711: [IR Framework] Remove safepoint while printing handling Marked as reviewed by thartmann (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/16921#pullrequestreview-1808483045 From epeter at openjdk.org Mon Jan 8 09:33:35 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 8 Jan 2024 09:33:35 GMT Subject: RFR: 8310711: [IR Framework] Remove safepoint while printing handling [v2] In-Reply-To: <278BZfCuvI5xJSWh2PvZtONVaZ2QjxkWKj1NifCbFYE=.af0a47ab-63a6-47bd-953b-4b0756107227@github.com> References: <278BZfCuvI5xJSWh2PvZtONVaZ2QjxkWKj1NifCbFYE=.af0a47ab-63a6-47bd-953b-4b0756107227@github.com> Message-ID: On Fri, 5 Jan 2024 10:58:41 GMT, Christian Hagedorn wrote: >> This clean-up PR removes the handling of the `` message in the IR framework. It is no longer required since we dump the output of `PrintIdeal` to the hotspot_pid file differently since [JDK-8306922](https://bugs.openjdk.org/browse/JDK-8306922). There is no interrupting `` message anymore. I removed the corresponding now unneeded code together with the previously added test case for it. >> >> Testing: tier1-4 >> >> Thanks, >> Christian > > Christian Hagedorn has updated the pull request with a new target base due to a merge or a rebase. 
The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: > > - Update copyright year > - Merge branch 'master' into JDK-8310711 > - 8310711: [IR Framework] Remove safepoint while printing handling Marked as reviewed by epeter (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/16921#pullrequestreview-1808483801 From ddong at openjdk.org Mon Jan 8 09:40:22 2024 From: ddong at openjdk.org (Denghui Dong) Date: Mon, 8 Jan 2024 09:40:22 GMT Subject: RFR: 8322735: C2: minor improvements of bubble sort used in SuperWord::packset_sort [v2] In-Reply-To: References: <1W_HcoiwEnXLvtqprc7L8mghr0BRH3a_ITn6Cerzb_c=.f6c90e24-7d41-418d-bc50-41bff396663b@github.com> Message-ID: On Mon, 8 Jan 2024 09:21:56 GMT, Emanuel Peter wrote: > Yes, I think that would be preferrable, even though this is not a very complicated fix. Okay. Thanks. ------------- PR Comment: https://git.openjdk.org/jdk/pull/17190#issuecomment-1880655516 From thartmann at openjdk.org Mon Jan 8 09:45:49 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 8 Jan 2024 09:45:49 GMT Subject: [jdk22] RFR: 8310844: [AArch64] C1 compilation fails because monitor offset in OSR buffer is too large for immediate Message-ID: Hi all, This pull request contains a backport of commit [ade21a96](https://github.com/openjdk/jdk/commit/ade21a965f8a5fc889cd48bba76fad507bdeddf5) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository. The commit being backported was authored by Tobias Hartmann on 5 Jan 2024 and was reviewed by Andrew Haley and Christian Hagedorn. Thanks! 
------------- Commit messages: - Backport ade21a965f8a5fc889cd48bba76fad507bdeddf5 Changes: https://git.openjdk.org/jdk22/pull/38/files Webrev: https://webrevs.openjdk.org/?repo=jdk22&pr=38&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8310844 Stats: 149 lines in 2 files changed: 147 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk22/pull/38.diff Fetch: git fetch https://git.openjdk.org/jdk22.git pull/38/head:pull/38 PR: https://git.openjdk.org/jdk22/pull/38 From chagedorn at openjdk.org Mon Jan 8 09:45:49 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 8 Jan 2024 09:45:49 GMT Subject: [jdk22] RFR: 8310844: [AArch64] C1 compilation fails because monitor offset in OSR buffer is too large for immediate In-Reply-To: References: Message-ID: <6RC5XTDso8lKKf66R1NODDilXgqJ0mpO_08yXAmFJuw=.51a89b78-6aed-4d65-bd2b-f5e40145db61@github.com> On Mon, 8 Jan 2024 09:36:25 GMT, Tobias Hartmann wrote: > Hi all, > > This pull request contains a backport of commit [ade21a96](https://github.com/openjdk/jdk/commit/ade21a965f8a5fc889cd48bba76fad507bdeddf5) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository. > > The commit being backported was authored by Tobias Hartmann on 5 Jan 2024 and was reviewed by Andrew Haley and Christian Hagedorn. > > Thanks! Looks good! ------------- Marked as reviewed by chagedorn (Reviewer). 
PR Review: https://git.openjdk.org/jdk22/pull/38#pullrequestreview-1808522531 From thartmann at openjdk.org Mon Jan 8 09:45:50 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 8 Jan 2024 09:45:50 GMT Subject: [jdk22] RFR: 8310844: [AArch64] C1 compilation fails because monitor offset in OSR buffer is too large for immediate In-Reply-To: References: Message-ID: On Mon, 8 Jan 2024 09:36:25 GMT, Tobias Hartmann wrote: > Hi all, > > This pull request contains a backport of commit [ade21a96](https://github.com/openjdk/jdk/commit/ade21a965f8a5fc889cd48bba76fad507bdeddf5) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository. > > The commit being backported was authored by Tobias Hartmann on 5 Jan 2024 and was reviewed by Andrew Haley and Christian Hagedorn. > > Thanks! Thanks, Christian! ------------- PR Comment: https://git.openjdk.org/jdk22/pull/38#issuecomment-1880663155 From kbarrett at openjdk.org Mon Jan 8 09:47:49 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Mon, 8 Jan 2024 09:47:49 GMT Subject: RFR: 8282365: Consolidate and improve division by constant idealizations [v42] In-Reply-To: References: Message-ID: On Mon, 8 Jan 2024 09:25:53 GMT, Stefan Karlsson wrote: >> I was told that `unittest.hpp` should come last so this is the order, I have added a line between JDK header and stdlib header as well as resolved your other comments. Thanks a lot. > > The rules around the include lines in our tests and what we currently have in the tests are messy at the moment. We should fix that when we find the time. > > For HotSpot source code files the includes should be structured as: > > hotspot includes > blank line > system includes > > > There are some deviations from that, but those should be cleaned up instead of used as a precedent. For our tests we should add "unittest.hpp" at the end of the "hotspot includes" section.
In the Oracle-internal discussion of include order from about a year ago, there was not a consensus decision about the position of "unittest.hpp". There was a concern that in some cases it really was required to be last for some technical reason. That needed (and still needs) investigation. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/9947#discussion_r1444358649 From shade at openjdk.org Mon Jan 8 09:53:22 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 8 Jan 2024 09:53:22 GMT Subject: RFR: 8323115: x86-32: Incorrect predicates for cmov instruct transforms with UseSSE In-Reply-To: <7BcH08lab4xxFOxm0UAfhHKVyBBEhRWtIPEUJuOIlmo=.2c5b0918-19ac-4cb3-b2f5-11131ce73422@github.com> References: <7BcH08lab4xxFOxm0UAfhHKVyBBEhRWtIPEUJuOIlmo=.2c5b0918-19ac-4cb3-b2f5-11131ce73422@github.com> Message-ID: On Mon, 8 Jan 2024 07:55:21 GMT, Kim Barrett wrote: > Please review this change that fixes generation of CMOV by C2 as controlled by > UseSSE. The predicates controlling that generation were using implicit > operator precedence that didn't have the expected grouping. Fixed by adding > parentheses to make the desired grouping explicit. > > Testing: Ran GHA with -Wparentheses enabled along with this and other changes > needed to make that work. Oh, ouch. ------------- Marked as reviewed by shade (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/17296#pullrequestreview-1808549197 From thartmann at openjdk.org Mon Jan 8 10:09:23 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 8 Jan 2024 10:09:23 GMT Subject: RFR: 8323095: Expand TraceOptoParse block output abbreviations In-Reply-To: References: Message-ID: On Fri, 5 Jan 2024 23:07:12 GMT, Joshua Cao wrote: > `e` -> `exception block` > `lphd` -> `loop head` > > Also removing an unnecessary space. The successor ids have a space before them. 
> > Examples from `java -Xcomp -XX:+TraceOptoParse -version`: > > > Parsing block #8 at bci [33,39), successors: 9 16(exception block) loop head Looks good to me. ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/17289#pullrequestreview-1808606516 From thartmann at openjdk.org Mon Jan 8 10:11:21 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 8 Jan 2024 10:11:21 GMT Subject: RFR: 8323065: Unneccesary CodeBlob lookup in CompiledIC::internal_set_ic_destination In-Reply-To: References: Message-ID: On Fri, 5 Jan 2024 14:37:16 GMT, Aleksey Shipilev wrote: > I was looking at hotpath for IC stub cleaning (happens at safepoint), and one obvious thing is that we look-up `CodeBlob` from `call->instruction_address()` only to assert that is compiled one. It used to be protected by `#ifdef ASSERT` before [JDK-8212681](https://bugs.openjdk.org/browse/JDK-8212681), and pulled from it to be used in Mutex in JDK 12: https://hg.openjdk.org/jdk/jdk/rev/d6dc479bcdd3#l15.62 And the Mutex was shortly gone after [JDK-8214257](https://bugs.openjdk.org/browse/JDK-8214257). So we are exposing this code to product binaries since JDK 12. > > This fix reinstates the `ASSERT` block again. There are small improvements (~1..10us) for safepoint cleanup on small ad-hoc tests in release builds on my Mac. But since this whole thing involves looking up things in code cache, it may cost quite a lot. Looks good to me too. ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/17281#pullrequestreview-1808611926 From thartmann at openjdk.org Mon Jan 8 10:19:22 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 8 Jan 2024 10:19:22 GMT Subject: RFR: 8320649: C2: Optimize scoped values [v4] In-Reply-To: References: Message-ID: On Thu, 4 Jan 2024 08:12:22 GMT, Roland Westrelin wrote: >> This change implements C2 optimizations for calls to >> ScopedValue.get(). 
Indeed, in:
>>
>>     v1 = scopedValue.get();
>>     ...
>>     v2 = scopedValue.get();
>>
>> `v2` can be replaced by `v1` and the second call to `get()` can be
>> optimized out. That's true whatever is between the 2 calls unless a
>> new mapping for `scopedValue` is created in between (when that happens
>> no optimization is performed for the method being compiled). Hoisting
>> a `get()` call out of loop for a loop invariant `scopedValue` should
>> also be legal in most cases.
>>
>> `ScopedValue.get()` is implemented in java code as a 2 step process. A
>> cache is attached to the current thread object. If the `ScopedValue`
>> object is in the cache then the result from `get()` is read from
>> there. Otherwise a slow call is performed that also inserts the
>> mapping in the cache. The cache itself is lazily allocated. One
>> `ScopedValue` can be hashed to 2 different indexes in the cache. On a
>> cache probe, both indexes are checked. As a consequence, the process
>> of probing the cache is a multi step process (check if the cache is
>> present, check first index, check second index if first index
>> failed). If the cache is populated early on, then when the method that
>> calls `ScopedValue.get()` is compiled, profile reports the slow path
>> as never taken and only the read from the cache is compiled.
>>
>> To perform the optimizations, I added 3 new node types to C2:
>>
>> - the pair
>>   ScopedValueGetHitsInCacheNode/ScopedValueGetLoadFromCacheNode for
>>   the cache probe
>>
>> - a cfg node ScopedValueGetResultNode to help locate the result of the
>>   `get()` call in the IR graph.
>>
>> In pseudo code, once the nodes are inserted, the code of a `get()` is:
>>
>>     hits_in_the_cache = ScopedValueGetHitsInCache(scopedValue)
>>     if (hits_in_the_cache) {
>>       res = ScopedValueGetLoadFromCache(hits_in_the_cache);
>>     } else {
>>       res = ..; //slow call possibly inlined. Subgraph can be arbitrarily complex
>>     }
>>     res = ScopedValueGetResult(res)
>>
>> In the snippet:
>>
>>     v1 = scopedValue.get();
>>     ...
>>     v2 = scopedValue.get();
>>
>> Replacing `v2` by `v1` is then done by starting from the
>> `ScopedValueGetResult` node for the second `get()` and looking for a
>> dominating `ScopedValueGetResult` for the same `ScopedValue`
>> object. When one is found, it is used as a replacement. Eliminating
>> the second `get()` call is achieved by making
>> `ScopedValueGetHitsInCache` always successful if there's a dominating
>> `Scoped...
>
> Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision:
>
>   merge fix

Tests all pass now.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/16966#issuecomment-1880715729

From qamai at openjdk.org Mon Jan 8 10:23:22 2024
From: qamai at openjdk.org (Quan Anh Mai)
Date: Mon, 8 Jan 2024 10:23:22 GMT
Subject: RFR: 8322768: Optimize non-subword vector compress and expand APIs for AVX2 target. [v3]
In-Reply-To: References: <_udOCEVG86x9V_WvYqFTaYnvmXdiZ7LzqxzR-D_ygYs=.db3ed37e-a8fa-413b-83c6-a785aba072ff@github.com> Message-ID: <6ipaD7eRW4J37zaeFEKVf2LUVE3C0LmZmoAeePCG2PE=.7bb8ff9a-638e-4e7f-bea2-a40a424004f0@github.com>

On Mon, 8 Jan 2024 06:06:22 GMT, Jatin Bhateja wrote:

>> Thanks for the updates!
>>
>> One more idea: Your AVX2 solution has a lot of cost for converting the mask to a permutation. Might it make sense to split this off into a separate vector-node, so that it can float out of a loop if the mask is invariant?
>
>> Thanks for the updates!
>>
>> One more idea: Your AVX2 solution has a lot of cost for converting the mask to a permutation. Might it make sense to split this off into a separate vector-node, so that it can float out of a loop if the mask is invariant?
> > CompressV / ExpandV only accepts two inputs, vector to be operated on and mask under which operation is performed, permute table based implementation is specific to x86 backend implementation. @jatin-bhateja I think you can expand them in the matcher into several `MachNode`s that will get scheduled separately. ------------- PR Comment: https://git.openjdk.org/jdk/pull/17261#issuecomment-1880724248 From shade at openjdk.org Mon Jan 8 10:29:30 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 8 Jan 2024 10:29:30 GMT Subject: RFR: 8323065: Unneccesary CodeBlob lookup in CompiledIC::internal_set_ic_destination In-Reply-To: References: Message-ID: On Fri, 5 Jan 2024 14:37:16 GMT, Aleksey Shipilev wrote: > I was looking at hotpath for IC stub cleaning (happens at safepoint), and one obvious thing is that we look-up `CodeBlob` from `call->instruction_address()` only to assert that is compiled one. It used to be protected by `#ifdef ASSERT` before [JDK-8212681](https://bugs.openjdk.org/browse/JDK-8212681), and pulled from it to be used in Mutex in JDK 12: https://hg.openjdk.org/jdk/jdk/rev/d6dc479bcdd3#l15.62 And the Mutex was shortly gone after [JDK-8214257](https://bugs.openjdk.org/browse/JDK-8214257). So we are exposing this code to product binaries since JDK 12. > > This fix reinstates the `ASSERT` block again. There are small improvements (~1..10us) for safepoint cleanup on small ad-hoc tests in release builds on my Mac. But since this whole thing involves looking up things in code cache, it may cost quite a lot. Thanks! I am going to integrate it then. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/17281#issuecomment-1880732904 From shade at openjdk.org Mon Jan 8 10:29:31 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 8 Jan 2024 10:29:31 GMT Subject: Integrated: 8323065: Unneccesary CodeBlob lookup in CompiledIC::internal_set_ic_destination In-Reply-To: References: Message-ID: On Fri, 5 Jan 2024 14:37:16 GMT, Aleksey Shipilev wrote: > I was looking at hotpath for IC stub cleaning (happens at safepoint), and one obvious thing is that we look-up `CodeBlob` from `call->instruction_address()` only to assert that is compiled one. It used to be protected by `#ifdef ASSERT` before [JDK-8212681](https://bugs.openjdk.org/browse/JDK-8212681), and pulled from it to be used in Mutex in JDK 12: https://hg.openjdk.org/jdk/jdk/rev/d6dc479bcdd3#l15.62 And the Mutex was shortly gone after [JDK-8214257](https://bugs.openjdk.org/browse/JDK-8214257). So we are exposing this code to product binaries since JDK 12. > > This fix reinstates the `ASSERT` block again. There are small improvements (~1..10us) for safepoint cleanup on small ad-hoc tests in release builds on my Mac. But since this whole thing involves looking up things in code cache, it may cost quite a lot. This pull request has now been integrated. 
Changeset: eb9e754b Author: Aleksey Shipilev URL: https://git.openjdk.org/jdk/commit/eb9e754b3a439cc3ce36c2c9393bc8b250343844 Stats: 4 lines in 1 file changed: 3 ins; 1 del; 0 mod 8323065: Unneccesary CodeBlob lookup in CompiledIC::internal_set_ic_destination Reviewed-by: dlong, thartmann ------------- PR: https://git.openjdk.org/jdk/pull/17281 From thartmann at openjdk.org Mon Jan 8 10:32:21 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 8 Jan 2024 10:32:21 GMT Subject: RFR: 8323115: x86-32: Incorrect predicates for cmov instruct transforms with UseSSE In-Reply-To: <7BcH08lab4xxFOxm0UAfhHKVyBBEhRWtIPEUJuOIlmo=.2c5b0918-19ac-4cb3-b2f5-11131ce73422@github.com> References: <7BcH08lab4xxFOxm0UAfhHKVyBBEhRWtIPEUJuOIlmo=.2c5b0918-19ac-4cb3-b2f5-11131ce73422@github.com> Message-ID: On Mon, 8 Jan 2024 07:55:21 GMT, Kim Barrett wrote: > Please review this change that fixes generation of CMOV by C2 as controlled by > UseSSE. The predicates controlling that generation were using implicit > operator precedence that didn't have the expected grouping. Fixed by adding > parentheses to make the desired grouping explicit. > > Testing: Ran GHA with -Wparentheses enabled along with this and other changes > needed to make that work. The fix looks good to me but it's concerning that we never hit this in testing. Maybe it never fails because the expanded instructions are guarded by `predicate (UseSSE ...` as well. ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/17296#pullrequestreview-1808685025 From epeter at openjdk.org Mon Jan 8 10:36:22 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 8 Jan 2024 10:36:22 GMT Subject: RFR: 8322768: Optimize non-subword vector compress and expand APIs for AVX2 target. 
[v4] In-Reply-To: References: <_udOCEVG86x9V_WvYqFTaYnvmXdiZ7LzqxzR-D_ygYs=.db3ed37e-a8fa-413b-83c6-a785aba072ff@github.com> Message-ID: <7x_AB9EVEuOwt5SldzxWgEKIqDG3ovw6ngBCjL4XKzU=.c8c79b8a-3023-42f5-b8d6-9ed6183d97f8@github.com>

On Mon, 8 Jan 2024 06:23:46 GMT, Jatin Bhateja wrote:

>> Hi,
>>
>> Patch optimizes non-subword vector compress and expand APIs for x86 AVX2 only targets.
>> Upcoming E-core Xeons (Sierra Forest) and Hybrid CPUs only support AVX2 instruction set.
>> These are very frequently used APIs in columnar database filter operation.
>>
>> Implementation uses a lookup table to record permute indices. Table index is computed using
>> mask argument of compress/expand operation.
>>
>> Following are the performance number of JMH micro included with the patch.
>>
>> System : Intel(R) Xeon(R) Platinum 8480+ (Sapphire Rapids)
>>
>> Baseline:
>> Benchmark                                 (size)   Mode  Cnt     Score   Error   Units
>> ColumnFilterBenchmark.filterDoubleColumn    1024  thrpt    2   142.767          ops/ms
>> ColumnFilterBenchmark.filterDoubleColumn    2047  thrpt    2    71.436          ops/ms
>> ColumnFilterBenchmark.filterDoubleColumn    4096  thrpt    2    35.992          ops/ms
>> ColumnFilterBenchmark.filterFloatColumn     1024  thrpt    2   182.151          ops/ms
>> ColumnFilterBenchmark.filterFloatColumn     2047  thrpt    2    91.096          ops/ms
>> ColumnFilterBenchmark.filterFloatColumn     4096  thrpt    2    44.757          ops/ms
>> ColumnFilterBenchmark.filterIntColumn       1024  thrpt    2   184.099          ops/ms
>> ColumnFilterBenchmark.filterIntColumn       2047  thrpt    2    91.981          ops/ms
>> ColumnFilterBenchmark.filterIntColumn       4096  thrpt    2    45.170          ops/ms
>> ColumnFilterBenchmark.filterLongColumn      1024  thrpt    2   148.017          ops/ms
>> ColumnFilterBenchmark.filterLongColumn      2047  thrpt    2    73.516          ops/ms
>> ColumnFilterBenchmark.filterLongColumn      4096  thrpt    2    36.844          ops/ms
>>
>> Withopt:
>> Benchmark                                 (size)   Mode  Cnt     Score   Error   Units
>> ColumnFilterBenchmark.filterDoubleColumn    1024  thrpt    2  2051.707          ops/ms
>> ColumnFilterBenchmark.filterDoubleColumn    2047  thrpt    2   914.072          ops/ms
>> ColumnFilterBenchmark.filterDoubleColumn    4096  thrpt    2   489.898          ops/ms
>> ColumnFilterBenchmark.filterFloatColumn     1024  thrpt    2  5324.195          ops/ms
>> ColumnFilterBenchmark.filterFloatColumn     2047  thrpt    2  2587.229          ops/ms
>> ColumnFilterBenchmark.filterFloatColumn     4096  thrpt    2  1278.665          ops/ms
>> ColumnFilterBenchmark.filterIntColumn       1024  thrpt    2  4149.384          ops/ms
>> ColumnFilterBenchmark.filterIntColumn       2047  thrpt ...
>
> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:
>
>   Review suggestions incorporated.

Exactly, like @merykitty suggests: you can do a platform-dependent expansion.

-------------

Changes requested by epeter (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/17261#pullrequestreview-1808218664

From epeter at openjdk.org Mon Jan 8 10:36:23 2024
From: epeter at openjdk.org (Emanuel Peter)
Date: Mon, 8 Jan 2024 10:36:23 GMT
Subject: RFR: 8322768: Optimize non-subword vector compress and expand APIs for AVX2 target. [v2]
In-Reply-To: References: <_udOCEVG86x9V_WvYqFTaYnvmXdiZ7LzqxzR-D_ygYs=.db3ed37e-a8fa-413b-83c6-a785aba072ff@github.com> Message-ID: <_IaxcZYOfUasnC-VujuwT4nFF3KVdEcKU2Pt92o5UO8=.bc760b2f-8ce9-4021-beb8-bfb19827cce3@github.com>

On Fri, 5 Jan 2024 09:35:34 GMT, Emanuel Peter wrote:

>> Thanks for the comment addition!
>
> Improvement suggestion:
> For a vector with 8 ints, we get `2^8 = 256` many bit patterns for the mask. The table has a row for each `mask` value, consisting of 8 ints, which provide the valid permute index corresponding to set bit position in the `mask`, or a -1 (default) value.

@jatin-bhateja thanks for the update!

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/17261#discussion_r1444256359

From epeter at openjdk.org Mon Jan 8 10:36:25 2024
From: epeter at openjdk.org (Emanuel Peter)
Date: Mon, 8 Jan 2024 10:36:25 GMT
Subject: RFR: 8322768: Optimize non-subword vector compress and expand APIs for AVX2 target.
[v2] In-Reply-To: <1GHGK7AGinCMKjFIB5oadUP0jiZrC39Z0hncAS3H-9Y=.eb617125-1f51-4cff-889e-b15321f5c72b@github.com> References: <_udOCEVG86x9V_WvYqFTaYnvmXdiZ7LzqxzR-D_ygYs=.db3ed37e-a8fa-413b-83c6-a785aba072ff@github.com> <1GHGK7AGinCMKjFIB5oadUP0jiZrC39Z0hncAS3H-9Y=.eb617125-1f51-4cff-889e-b15321f5c72b@github.com> Message-ID: On Mon, 8 Jan 2024 06:06:20 GMT, Jatin Bhateja wrote: >> You are using `VectorMask pred = VectorMask.fromLong(ispecies, maskctr++);`. >> That basically systematically iterates over all masks, which is nice for a correctness test. >> But that would use different density inside one test run, right? The average over the loop is still at `50%`, correct? >> >> I was thinking more of a run where the percentage over the whole loop is lower than maybe `1%`. That would get us to a point where maybe the branch prediction of non-vectorized code might be faster, what do you think? > >> You are using `VectorMask pred = VectorMask.fromLong(ispecies, maskctr++);`. That basically systematically iterates over all masks, which is nice for a correctness test. But that would use different density inside one test run, right? The average over the loop is still at `50%`, correct? >> >> I was thinking more of a run where the percentage over the whole loop is lower than maybe `1%`. That would get us to a point where maybe the branch prediction of non-vectorized code might be faster, what do you think? > > An imperative loop for compression will check each mask bit to select compressible lane. Therefore a mask with low or high density of set bits should show similar performance. Yes, IF it is vectorized, then there is no difference between high and low density. My concern was more whether vectorization is preferable over the scalar alternative in the low-density case, where branch prediction is more stable.
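
For readers following the thread, here is a minimal Java sketch of the two approaches being compared: the table of permute indices described earlier (256 rows of 8 ints, with -1 as the default entry) and the imperative per-bit loop. This is only an illustration under stated assumptions, not the code from the patch: the class and method names are hypothetical, plain `int[]` arrays stand in for vector registers, and unselected result lanes are set to zero to mirror the Vector API's documented `compress` behaviour.

```java
import java.util.Arrays;

// Hypothetical illustration only; none of these names come from the patch.
final class CompressTableSketch {
    static final int LANES = 8; // 8 int lanes, as in one 256-bit vector

    // Row `mask` lists, in ascending order, the lanes whose bit is set in `mask`.
    // Unused trailing slots keep the default value -1.
    static int[][] buildPermuteTable() {
        int[][] table = new int[1 << LANES][LANES];
        for (int mask = 0; mask < table.length; mask++) {
            Arrays.fill(table[mask], -1);
            int out = 0;
            for (int lane = 0; lane < LANES; lane++) {
                if ((mask & (1 << lane)) != 0) {
                    table[mask][out++] = lane;
                }
            }
        }
        return table;
    }

    // Table-driven compress: one row lookup, then a branch-free style gather.
    // Lanes without a valid permute index become zero.
    static int[] compressWithTable(int[] src, int mask, int[][] table) {
        int[] dst = new int[LANES];
        int[] row = table[mask];
        for (int i = 0; i < LANES; i++) {
            dst[i] = row[i] >= 0 ? src[row[i]] : 0;
        }
        return dst;
    }

    // Scalar reference: one data-dependent branch per mask bit.
    static int[] compressScalar(int[] src, int mask) {
        int[] dst = new int[LANES];
        int out = 0;
        for (int lane = 0; lane < LANES; lane++) {
            if ((mask & (1 << lane)) != 0) {
                dst[out++] = src[lane];
            }
        }
        return dst;
    }

    public static void main(String[] args) {
        int[][] table = buildPermuteTable();
        int[] src = {10, 11, 12, 13, 14, 15, 16, 17};
        int mask = 0b0010_0101; // lanes 0, 2 and 5 are selected
        System.out.println(Arrays.toString(table[mask]));                         // [0, 2, 5, -1, -1, -1, -1, -1]
        System.out.println(Arrays.toString(compressWithTable(src, mask, table))); // [10, 12, 15, 0, 0, 0, 0, 0]
        System.out.println(Arrays.toString(compressScalar(src, mask)));           // [10, 12, 15, 0, 0, 0, 0, 0]
    }
}
```

The table lookup does the same work for every mask, while the scalar loop takes one branch per mask bit; whether those branches predict well at very low mask density is the open question raised above, and this sketch does not measure it.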
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/17261#discussion_r1444257535

From fyang at openjdk.org Mon Jan 8 10:43:23 2024
From: fyang at openjdk.org (Fei Yang)
Date: Mon, 8 Jan 2024 10:43:23 GMT
Subject: RFR: 8322790: RISC-V: Tune costs for shuffles with no conversion [v2]
In-Reply-To: References: Message-ID:

On Sat, 6 Jan 2024 16:27:41 GMT, Ilya Gavrilin wrote:

>> Hi all, please review this small change to RISC-V nodes insertion costs.
>> Now we have several nodes which provide shuffles without conversion: https://github.com/openjdk/jdk/blob/32d80e2caf6063b58128bd5f3dc87b276f3bd0cb/src/hotspot/cpu/riscv/riscv.ad#L8525-L8741
>> On most RISC-V CPUs we prefer reg<->reg operations, because they are faster, but currently stack<->reg operations are used (for details about the reasons, please visit the connected JBS issue).
>> After changing insertion costs, reg<->reg operations are selected, and we can see performance improvements for benchmarks which use such shuffles (tested on thead C910 board):
>> | Benchmark | Upstream build (ops/ms) | Patched build (ops/ms) | difference (%) |
>> |:-----------------------------------:|:-----------------------:|:----------------------:|:--------------:|
>> | MathBench.doubleToRawLongBitsDouble | 30935.139 | 32171.761 | +4.00 |
>> | StrictMathBench.ceilDouble | 24682.810 | 29782.050 | +20.66 |
>> | StrictMathBench.cosDouble | 6948.309 | 6938.276 | -0.14 |
>> | StrictMathBench.expDouble | 6816.143 | 7211.021 | +5.79 |
>> | StrictMathBench.floorDouble | 30699.630 | 34189.509 | +11.37 |
>> | StrictMathBench.maxDouble | 35157.355 | 34675.191 | -1.37 |
>> | StrictMathBench.minDouble | 35192.135 | 35183.015 | -0.03 |
>> | StrictMathBench.sinDouble | 6698.405 | 6721.809 | +0.35 |
>>
>> New benchmark for changed nodes:
>>
>> --- a/test/micro/org/openjdk/bench/java/lang/MathBench.java
>> +++ b/test/micro/org/openjdk/bench/java/lang/MathBench.java
>> @@ -540,4 +540,11 @@ public class MathBench {
>>          return Math.ulp(float7);
>>      }
>>
>> +    @Benchmark
>> +    public long doubleToRawLongBitsDouble() {
>> +        double dbl162Dot5 = double81 * 2.0d + double0Dot5;
>> +        double dbl3 = double2 + double1;
>> +        return Double.doubleToRawLongBits(dbl162Dot5) + Double.doubleToRawLongBits(dbl3);
>> +    }
>> +
>
> Ilya Gavrilin has updated the pull request incrementally with one additional commit since the last revision:
>
>   Revert some costs changes

Marked as reviewed by fyang (Reviewer).

-------------

PR Review: https://git.openjdk.org/jdk/pull/17206#pullrequestreview-1808723779

From stuefe at openjdk.org Mon Jan 8 11:24:51 2024
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Mon, 8 Jan 2024 11:24:51 GMT
Subject: RFR: JDK-8318444: Write details about compilation bailouts into crash reports [v6]
In-Reply-To: References: Message-ID:

> A little debugging aid to help analyze broken bailout chains, mainly in C2 (C1 is pretty clean).
>
> A broken bailout chain occurs when code marks a compilation as failed, but then either that function itself or any of its caller functions fails to abort the compilation. That may cause crashes, e.g. [JDK-8318183](https://bugs.openjdk.org/browse/JDK-8318183) or [JDK-8318445](https://bugs.openjdk.org/browse/JDK-8318445).
>
> Now, if the compiler initiates a bailout, it stores some context information - compile id, time, and call stack. That information is stored as part of `Compile` or `Compilation`, depending on the compiler.
>
> If we crash later during the same compilation, we print out that information as part of the crash report. That way, we have two call stacks, and it is easy to spot where the compiler failed to heed the bailout.
>
> ---------
>
> Looks like this (from https://github.com/openjdk/jdk/pull/16248). The first call stack is the crash point. The second call stack is the point where the compiler bailout was initiated.
>
>
> Current CompileTask:
> C2:2574 45 45 843 4 sun.nio.fs.UnixPath::resolve (17 bytes)
>
> Stack: [0x00007fa608cb3000,0x00007fa608db4000], sp=0x00007fa608daf310, free space=1008k
> Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
> V [libjvm.so+0x631bb4] Unique_Node_List::push(Node*)+0x20 (node.hpp:1650)
> V [libjvm.so+0xb8ea65] ConnectionGraph::verify_ram_nodes(Compile*, Node*)+0x87 (escape.cpp:743)
> V [libjvm.so+0x960dda] Compile::Optimize()+0x956 (compile.cpp:2361)
> V [libjvm.so+0x959d6c] Compile::Compile(ciEnv*, ciMethod*, int, Options, DirectiveSet*)+0x165e (compile.cpp:860)
> V [libjvm.so+0x81bcd9] C2Compiler::compile_method(ciEnv*, ciMethod*, int, bool, DirectiveSet*)+0x203 (c2compiler.cpp:134)
> V [libjvm.so+0x97bf63] CompileBroker::invoke_compiler_on_method(CompileTask*)+0xac5 (compileBroker.cpp:2290)
> V [libjvm.so+0x97a981] CompileBroker::compiler_thread_loop()+0x411 (compileBroker.cpp:1951)
> V [libjvm.so+0x99ebc0] CompilerThread::thread_entry(JavaThread*, JavaThread*)+0x84 (compilerThread.cpp:61)
> V [libjvm.so+0xde0050] JavaThread::thread_main_inner()+0x15c (javaThread.cpp:720)
> V [libjvm.so+0xddfeea] JavaThread::run()+0x258 (javaThread.cpp:705)
> V [libjvm.so+0x15f5a04] Thread::call_run()+0x1a8 (thread.cpp:220)
> V [libjvm.so+0x12de0a2] thread_native_entry(Thread*)+0x1c3 (os_linux.cpp:785)
>
> siginfo: si_signo: 11 (SIGSEGV), si_code: 1 (SEGV_MAPERR), si_addr: 0x000000000000002...

Thomas Stuefe has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains 13 commits: - Merge branch 'openjdk:master' into JDK-8318444-Write-details-about-compilation-bailouts-into-crash-reports - Merge branch 'master' into JDK-8318444-Write-details-about-compilation-bailouts-into-crash-reports - Feedback Christian - Merge branch 'master' into JDK-8318444-Write-details-about-compilation-bailouts-into-crash-reports - Update src/hotspot/share/compiler/compilationFailureInfo.hpp Co-authored-by: Tobias Hartmann - Update src/hotspot/share/utilities/vmError.cpp Co-authored-by: Tobias Hartmann - reinstate elapsed time prefix in hs-err file - Merge branch 'openjdk:master' into JDK-8318444-Write-details-about-compilation-bailouts-into-crash-reports - wip - wip - ... and 3 more: https://git.openjdk.org/jdk/compare/eb9e754b...06f157c4 ------------- Changes: https://git.openjdk.org/jdk/pull/16247/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=16247&range=05 Stats: 236 lines in 11 files changed: 224 ins; 5 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/16247.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16247/head:pull/16247 PR: https://git.openjdk.org/jdk/pull/16247 From thartmann at openjdk.org Mon Jan 8 11:39:26 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 8 Jan 2024 11:39:26 GMT Subject: [jdk22] Integrated: 8310844: [AArch64] C1 compilation fails because monitor offset in OSR buffer is too large for immediate In-Reply-To: References: Message-ID: On Mon, 8 Jan 2024 09:36:25 GMT, Tobias Hartmann wrote: > Hi all, > > This pull request contains a backport of commit [ade21a96](https://github.com/openjdk/jdk/commit/ade21a965f8a5fc889cd48bba76fad507bdeddf5) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository. > > The commit being backported was authored by Tobias Hartmann on 5 Jan 2024 and was reviewed by Andrew Haley and Christian Hagedorn. > > Thanks! This pull request has now been integrated. 
Changeset: 0442d772 Author: Tobias Hartmann URL: https://git.openjdk.org/jdk22/commit/0442d772b0eb253aebf8638eb966957ab2b694c2 Stats: 149 lines in 2 files changed: 147 ins; 0 del; 2 mod 8310844: [AArch64] C1 compilation fails because monitor offset in OSR buffer is too large for immediate Reviewed-by: chagedorn Backport-of: ade21a965f8a5fc889cd48bba76fad507bdeddf5 ------------- PR: https://git.openjdk.org/jdk22/pull/38 From shade at openjdk.org Mon Jan 8 11:45:25 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 8 Jan 2024 11:45:25 GMT Subject: RFR: 8320128: Clean up Parse constructor for OSR [v4] In-Reply-To: <-QQZANSu7PVpZ3CUX7kbp00f_xkhjiy7la3Z3gXutKA=.a1c938e4-12e0-44b8-a2dd-7cc0c0c9f09f@github.com> References: <0mxhhLR9DDdpiBhbXWPxCmMzo_OI9DYiIX36xGhxC_c=.1eed99b9-2dc3-4403-b3c1-22b6ebed6079@github.com> <-QQZANSu7PVpZ3CUX7kbp00f_xkhjiy7la3Z3gXutKA=.a1c938e4-12e0-44b8-a2dd-7cc0c0c9f09f@github.com> Message-ID: On Thu, 4 Jan 2024 17:06:38 GMT, Xin Liu wrote: >> There's a special case for the constructor of Parse. If current compilation is OSR and it is handling the top-level method(depth() == 1), then >> >> 1. _tf = C->tf(); >> 2. _entry_bci = C->entry_bci(); >> 3. _flow = method()->get_osr_flow_analysis(_entry_bci); >> >> We don't need to assign those member data twice. We can also factor out _flow->failing() for the special case and normal cases. >> >> It's worth mentioning that we can't save ciTypeFlow computation because >> get_osr_flow_analysis(_entry_bci) actually needs get_flow_analysis(method()). > > Xin Liu has updated the pull request incrementally with one additional commit since the last revision: > > Use print_cr for the log message. All right then, I think we are good to go. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/16669#issuecomment-1880843848 From tholenstein at openjdk.org Mon Jan 8 11:48:32 2024 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Mon, 8 Jan 2024 11:48:32 GMT Subject: RFR: JDK-8277869: Maven POMs are using HTTP links where HTTPS is available Message-ID: Replace `http` with `https` in XML files of IdealGraphVisualizer and LogCompilation. Moreover, replace the old URL (./maven-v4_0_0.xsd) with the suggested one (./maven-4.0.0.xsd) for `xsi:schemaLocation` in pom.xml. Tested: IdealGraphVisualizer and LogCompilation build and run as expected. ------------- Commit messages: - replace http:// with https:// in IdealGraphVisualizer - LogCompilation use https and maven-4.0.0.xsd in pom.xml Changes: https://git.openjdk.org/jdk/pull/17302/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=17302&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8277869 Stats: 43 lines in 40 files changed: 1 ins; 1 del; 41 mod Patch: https://git.openjdk.org/jdk/pull/17302.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17302/head:pull/17302 PR: https://git.openjdk.org/jdk/pull/17302 From shade at openjdk.org Mon Jan 8 11:50:22 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 8 Jan 2024 11:50:22 GMT Subject: RFR: 8322759: Eliminate -Wparentheses warnings in compiler code [v2] In-Reply-To: <-w16Kse74yx2EiWCorBtcKf1KXA1Rh5q-6Ze2T_qors=.06ead22b-ec5a-4859-888c-f0e3a283d7f3@github.com> References: <496tGkQ1KUCrW1IHOETyvhqopkNYjsEoupxjo0Ze3Wg=.0223f494-fd34-4e4a-a31a-5030603f2113@github.com> <-w16Kse74yx2EiWCorBtcKf1KXA1Rh5q-6Ze2T_qors=.06ead22b-ec5a-4859-888c-f0e3a283d7f3@github.com> Message-ID: On Mon, 8 Jan 2024 05:37:45 GMT, Kim Barrett wrote: >> Please review this change to eliminate some -Wparentheses warnings. This >> involved simply adding a few parentheses to make some implicit operator >> precedence explicit. >> >> This change addresses non-C2 parts of the compiler component.
>> >> Testing: mach5 tier1 >> >> Also ran mach5 tier1 with these changes in conjunction enabling -Wparentheses >> and other changes needed to make that work. > > Kim Barrett has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: > > - Merge branch 'master' into compiler-wparentheses > - simplify asserts > - update copyrights for new year > - fix -Wparentheses warnings in non-C2 compiler code Looks reasonable, thanks! ------------- Marked as reviewed by shade (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/17200#pullrequestreview-1808919758 From shade at openjdk.org Mon Jan 8 11:55:24 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 8 Jan 2024 11:55:24 GMT Subject: RFR: 8322982: CTW fails to build after 8308753 In-Reply-To: References: Message-ID: On Mon, 8 Jan 2024 03:19:52 GMT, Xin Liu wrote: > This patch fixes the build error of CTW by sidelining ModuleInfoWriter.java. ModuleInfoWriter uses Class-File API, which has transitioned to preview. > If we really need to compile it, we have to append --enable-preview and --source N. > > The fact is CTW itself doesn't depend on ModuleInfoWriter. I think it's easier to maintain CTW if we filter it out in Makefile. 
test/hotspot/jtreg/testlibrary/ctw/Makefile line 50: > 48: $(TESTLIBRARY_DIR)/jtreg \ > 49: -maxdepth 1 -name '*.java') > 50: LIB_FILES=$(filter-out %ModuleInfoWriter.java, $(LIB_FILES_ORIG)) Looks reasonable, but I think you can chain these without introducing new variables: LIB_FILES = $(filter-out %ModuleInfoWriter.java, \ $(shell find $(TESTLIBRARY_DIR)/jdk/test/lib/ \ $(TESTLIBRARY_DIR)/jdk/test/lib/process \ $(TESTLIBRARY_DIR)/jdk/test/lib/util \ $(TESTLIBRARY_DIR)/jtreg \ -maxdepth 1 -name '*.java')) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17292#discussion_r1444501336 From shade at openjdk.org Mon Jan 8 12:13:21 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 8 Jan 2024 12:13:21 GMT Subject: RFR: 8323115: x86-32: Incorrect predicates for cmov instruct transforms with UseSSE In-Reply-To: References: <7BcH08lab4xxFOxm0UAfhHKVyBBEhRWtIPEUJuOIlmo=.2c5b0918-19ac-4cb3-b2f5-11131ce73422@github.com> Message-ID: On Mon, 8 Jan 2024 10:29:35 GMT, Tobias Hartmann wrote: > The fix looks good to me but it's concerning that we never hit this in testing. Maybe it never fails because the expanded instructions are guarded by `predicate (UseSSE ...` as well. My experience fixing bugs in these FPU-related match rules is that it takes either a combination of code shape and relevant hardware (one that defaults to an unusual `UseSSE <= 2`), or specific testing that runs with lower `UseSSE`. I think I was one of the few remaining people who ran x86_32 with `-XX:UseSSE=0`, for example, but finally stopped. I think going forward we would just need to require `UseSSE >= 2` for x86_32, like for x86_64, making these issues go away.
------------- PR Comment: https://git.openjdk.org/jdk/pull/17296#issuecomment-1880883043 From chagedorn at openjdk.org Mon Jan 8 13:00:34 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 8 Jan 2024 13:00:34 GMT Subject: RFR: 8310711: [IR Framework] Remove safepoint while printing handling [v2] In-Reply-To: References: <278BZfCuvI5xJSWh2PvZtONVaZ2QjxkWKj1NifCbFYE=.af0a47ab-63a6-47bd-953b-4b0756107227@github.com> Message-ID: On Mon, 8 Jan 2024 09:30:53 GMT, Tobias Hartmann wrote: >> Christian Hagedorn has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: >> >> - Update copyright year >> - Merge branch 'master' into JDK-8310711 >> - 8310711: [IR Framework] Remove safepoint while printing handling > > Marked as reviewed by thartmann (Reviewer). Thanks for the re-review @TobiHartmann @eme64! ------------- PR Comment: https://git.openjdk.org/jdk/pull/16921#issuecomment-1880957797 From chagedorn at openjdk.org Mon Jan 8 13:00:36 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 8 Jan 2024 13:00:36 GMT Subject: Integrated: 8310711: [IR Framework] Remove safepoint while printing handling In-Reply-To: References: Message-ID: On Fri, 1 Dec 2023 12:47:48 GMT, Christian Hagedorn wrote: > This clean-up PR removes the handling of the `<!-- safepoint while printing -->` message in the IR framework. It is no longer required because we have dumped the output of `PrintIdeal` to the hotspot_pid file differently since [JDK-8306922](https://bugs.openjdk.org/browse/JDK-8306922). There is no interrupting `<!-- safepoint while printing -->` message anymore. I removed the corresponding now unneeded code together with the previously added test case for it. > > Testing: tier1-4 > > Thanks, > Christian This pull request has now been integrated.
Changeset: 458e563c Author: Christian Hagedorn URL: https://git.openjdk.org/jdk/commit/458e563cd994f5e0f590c2144e8ed35d020d53d6 Stats: 461 lines in 6 files changed: 0 ins; 457 del; 4 mod 8310711: [IR Framework] Remove safepoint while printing handling Reviewed-by: thartmann, epeter ------------- PR: https://git.openjdk.org/jdk/pull/16921 From jvernee at openjdk.org Mon Jan 8 13:45:31 2024 From: jvernee at openjdk.org (Jorn Vernee) Date: Mon, 8 Jan 2024 13:45:31 GMT Subject: RFR: 8320310: CompiledMethod::has_monitors flag can be incorrect [v4] In-Reply-To: References: <4efExybeWDkEbcsckI1Qdz8kpYFqd-Rbmt7oiWz5qlo=.d8d38d0e-affa-48dc-b963-45f958041c4e@github.com> Message-ID: On Mon, 11 Dec 2023 18:38:55 GMT, Jorn Vernee wrote: >> Currently, the `CompiledMethod::has_monitors` flag is set when either a `monitorenter` is parsed by C1, and `monitorexit` is parsed by C1 or C2 during method compilation. However, not necessarily every bytecode of a method is parsed, which means that we could miss all `monitorenter`/`monitorexit` byte codes in a method, while it actually does use monitors. This can lead to situations where a thread holds a monitor, but `has_monitors` for all frames is set to `false`, leading to an assertion failure in 'freeze_internal' in continuationFreezeThaw.cpp: >> >> assert(monitors_on_stack(current) == ((current->held_monitor_count() - current->jni_monitor_count()) > 0), >> "Held monitor count and locks on stack invariant: " INT64_FORMAT " JNI: " INT64_FORMAT, (int64_t)current->held_monitor_count(), (int64_t)current->jni_monitor_count()); >> >> The proposed fix is to rely on `Method::has_monitor_bytecodes` to set the `has_monitors` flag when compiling, which is immune to issues where not all byte codes of a method are parsed during compilation. We can follow the pattern established for `has_reserved_stack_access`, which is similar. >> >> Note that this PR is based on: https://github.com/openjdk/jdk/pull/16416 which disables the assertion. 
The goal of this PR is to fix the issue, and then re-enable the assertion. >> >> Testing: Tier 1-4, `java/lang/Thread/virtual/stress/PinALot.java` > > Jorn Vernee has updated the pull request incrementally with one additional commit since the last revision: > > re-enable assert again Thanks for the reviews! ------------- PR Comment: https://git.openjdk.org/jdk/pull/16799#issuecomment-1881031735 From stuefe at openjdk.org Mon Jan 8 13:50:39 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Mon, 8 Jan 2024 13:50:39 GMT Subject: Integrated: JDK-8318444: Write details about compilation bailouts into crash reports In-Reply-To: References: Message-ID: On Wed, 18 Oct 2023 13:32:32 GMT, Thomas Stuefe wrote: > A little debugging aid to help analyze broken bailout chains, mainly in C2 (C1 is pretty clean). > > A broken bailout chain occurs when code marks a compilation as failed, but then either that function itself or any of its caller functions fails to abort the compilation. That may cause crashes, e.g. [JDK-8318183](https://bugs.openjdk.org/browse/JDK-8318183) or [JDK-8318445](https://bugs.openjdk.org/browse/JDK-8318445). > > Now, if the compiler initiates a bailout, it stores some context information - compile id, time, and call stack. That information is stored as part of `Compile` or `Compilation`, depending on the compiler. > > If we crash later during the same compilation, we print out that information as part of the crash report. That way, we have two call stacks, and it is easy to spot where the compiler failed to heed the bailout. > > --------- > > Looks like this (from https://github.com/openjdk/jdk/pull/16248). The first call stack is the crash point. The second call stack is the point where the compiler bailout was initiated. 
> > > Current CompileTask: > C2:2574 45 45 843 4 sun.nio.fs.UnixPath::resolve (17 bytes) > > Stack: [0x00007fa608cb3000,0x00007fa608db4000], sp=0x00007fa608daf310, free space=1008k > Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code) > V [libjvm.so+0x631bb4] Unique_Node_List::push(Node*)+0x20 (node.hpp:1650) > V [libjvm.so+0xb8ea65] ConnectionGraph::verify_ram_nodes(Compile*, Node*)+0x87 (escape.cpp:743) > V [libjvm.so+0x960dda] Compile::Optimize()+0x956 (compile.cpp:2361) > V [libjvm.so+0x959d6c] Compile::Compile(ciEnv*, ciMethod*, int, Options, DirectiveSet*)+0x165e (compile.cpp:860) > V [libjvm.so+0x81bcd9] C2Compiler::compile_method(ciEnv*, ciMethod*, int, bool, DirectiveSet*)+0x203 (c2compiler.cpp:134) > V [libjvm.so+0x97bf63] CompileBroker::invoke_compiler_on_method(CompileTask*)+0xac5 (compileBroker.cpp:2290) > V [libjvm.so+0x97a981] CompileBroker::compiler_thread_loop()+0x411 (compileBroker.cpp:1951) > V [libjvm.so+0x99ebc0] CompilerThread::thread_entry(JavaThread*, JavaThread*)+0x84 (compilerThread.cpp:61) > V [libjvm.so+0xde0050] JavaThread::thread_main_inner()+0x15c (javaThread.cpp:720) > V [libjvm.so+0xddfeea] JavaThread::run()+0x258 (javaThread.cpp:705) > V [libjvm.so+0x15f5a04] Thread::call_run()+0x1a8 (thread.cpp:220) > V [libjvm.so+0x12de0a2] thread_native_entry(Thread*)+0x1c3 (os_linux.cpp:785) > > siginfo: si_signo: 11 (SIGSEGV), si_code: 1 (SEGV_MAPERR), si_addr: 0x000000000000002... This pull request has now been integrated. 
Changeset: c90768c9 Author: Thomas Stuefe URL: https://git.openjdk.org/jdk/commit/c90768c93b26771bb8f4bdbe855d054ad089b337 Stats: 236 lines in 11 files changed: 224 ins; 5 del; 7 mod 8318444: Write details about compilation bailouts into crash reports Reviewed-by: thartmann, chagedorn ------------- PR: https://git.openjdk.org/jdk/pull/16247 From chagedorn at openjdk.org Mon Jan 8 14:41:47 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 8 Jan 2024 14:41:47 GMT Subject: RFR: 8310190: C2 SuperWord: AlignVector is broken, generates misaligned packs [v59] In-Reply-To: References: Message-ID: <7koVi33WJ_-H7Z69jVF3kWIFB8cGzKlLW2chbzv2WZc=.cfdfda1c-1c7c-446a-bb9e-4df94643a566@github.com> On Thu, 4 Jan 2024 07:00:48 GMT, Emanuel Peter wrote: >> I want to push this in JDK23. >> After this fix here, I'm doing [this refactoring](https://github.com/openjdk/jdk/pull/16620). >> >> To calm your nerves: most of the changes are in auto-generated tests, and tests in general. >> >> **Context** >> >> `-XX:+AlignVector` ensures that SuperWord only creates LoadVector and StoreVector that can be memory aligned. This is achieved by iterating in the pre-loop until we reach the alignment boundary, then we can start the main loop properly aligned. However, this is not possible in all cases, sometimes some memory accesses cannot be guaranteed to be aligned, and we need to reject vectorization (at least partially, for some of the packs). >> >> Alignment is split into two tasks: >> - Alignment Correctness Checks: only relevant if `-XX:+AlignVector`. Need to reject vectorization if alignment is not possible. We must check if the address of the vector load/store is aligned with (divisible by) `ObjectAlignmentInBytes`. >> - Alignment by adjusting pre-loop limit: alignment is desirable even if `-XX:-AlignVector`. We would like to align the vectors with their vector width. 
>> >> **Problem** >> >> I have recently found a bug with our AlignVector [JDK-8310190](https://bugs.openjdk.org/browse/JDK-8310190). >> In that bug, we perform a misaligned memory vector access, which results in a `SIGBUS` on an ARM32 machine. >> Thanks @fg1417 for confirming this! >> Hence, we need to fix the alignment correctness checks. >> >> While working on this task, I also found some bugs in the "alignment by adjusting pre-loop limit": there were cases where it did not align the vectors correctly. >> >> **Problem Details** >> >> Reproducer: >> >> >> static void test(short[] a, short[] b, short mask) { >> for (int i = 0; i < RANGE; i+=8) { >> // Problematic for AlignVector >> b[i+0] = (short)(a[i+0] & mask); // best_memref, align 0 >> >> b[i+3] = (short)(a[i+3] & mask); // pack at offset 6 bytes >> b[i+4] = (short)(a[i+4] & mask); >> b[i+5] = (short)(a[i+5] & mask); >> b[i+6] = (short)(a[i+6] & mask); >> } >> } >> >> >> During `SuperWord::find_adjacent_refs` we used to check if the references are expected to be aligned. For that, we look at each "group" of references (eg all `LoadS`) and take the reference with the lowest offset. For that chosen reference, we check if it is alignable. If yes, we accept all references of that group, if no we reject all. >> >> This is problemati... > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > some minor changes for Vladimir Some last minor comments. Otherwise, looks good! 
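For reference, the byte offsets in the reproducer quoted above can be checked with a small Java sketch. It is only an illustration and not the `AlignmentSolver` logic from the patch: the class and method names are hypothetical, and it assumes that the first array element is itself aligned to the vector width and that a vector of four `short` values is 8 bytes wide.

```java
public class PackAlignmentSketch {
    static final int SHORT_BYTES = 2; // size of one short element

    // Hypothetical helper: can a reference a[i + indexOffset] be aligned to
    // vectorBytes by running some number of pre-loop iterations first?
    // Assumption: the first array element is aligned to vectorBytes.
    static boolean isAlignable(int indexOffset, int strideElems, int vectorBytes) {
        // The byte offset modulo vectorBytes repeats after at most vectorBytes
        // iterations, so trying that many pre-loop iteration counts is enough.
        for (int preIters = 0; preIters < vectorBytes; preIters++) {
            long byteOffset = (long) (preIters * strideElems + indexOffset) * SHORT_BYTES;
            if (byteOffset % vectorBytes == 0) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        int stride = 8;      // the reproducer uses i += 8
        int vectorBytes = 8; // four shorts per vector (assumed)
        // Index offset 0 is the lowest reference; index offset 3 starts the pack.
        for (int indexOffset : new int[] {0, 3}) {
            System.out.println("index offset " + indexOffset
                    + " -> byte offset " + indexOffset * SHORT_BYTES
                    + ", alignable: " + isAlignable(indexOffset, stride, vectorBytes));
        }
    }
}
```

Under these assumptions the reference at index offset 0 is alignable, while the pack starting at index offset 3 sits at byte offset 6 modulo 8 in every iteration, so accepting a whole group based on its lowest offset alone is not safe.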
src/hotspot/share/opto/chaitin.cpp line 1794: > 1792: Node* PhaseChaitin::find_base_for_derived(Node** derived_base_map, Node* derived, uint& maxlrg) { > 1793: // See if already computed; if so return it > 1794: if(derived_base_map[derived->_idx]) { Suggestion: if (derived_base_map[derived->_idx]) { src/hotspot/share/opto/superword.cpp line 1620: > 1618: > 1619: const MemNode* mem_ref = pack->at(0)->as_Mem(); > 1620: VPointer mem_ref_p(mem_ref, phase(), lpt(), nullptr, false); Since you renamed `p` -> `pack`, you should also rename this one to pack: Suggestion: VPointer mem_ref_pack(mem_ref, phase(), lpt(), nullptr, false); src/hotspot/share/opto/superword.cpp line 1630: > 1628: mem_ref_p.invar(), > 1629: mem_ref_p.invar_factor(), > 1630: mem_ref_p.scale_in_bytes(), Suggestion: AlignmentSolver solver(pack->at(0)->as_Mem(), pack->size(), mem_ref_pack.base(), mem_ref_pack.offset_in_bytes(), mem_ref_pack.invar(), mem_ref_pack.invar_factor(), mem_ref_pack.scale_in_bytes(), src/hotspot/share/opto/superword.cpp line 1702: > 1700: if (current->is_constrained()) { > 1701: // Solution is constrained (not trivial) > 1702: // -> must change pre-limit to acheive alignment Suggestion: // -> must change pre-limit to achieve alignment src/hotspot/share/opto/vectorization.cpp line 756: > 754: // We describe the 6 terms: > 755: // 1) The "base" of the address is the address of a Java object (e.g. array), > 756: // and as such ObjectAlignmentInBytes (a power of 2) aligned. We have Suggestion: // and as such ObjectAlignmentInBytes (a power of 2) aligned. We have src/hotspot/share/opto/vectorization.cpp line 934: > 932: // > 933: // Hence, pre_iter_C_const has a non-trivial (because x > 1) periodic (periodicity x) > 934: // solution, i.e it has a constrained solution. Suggestion: // solution, i.e. it has a constrained solution. 
src/hotspot/share/opto/vectorization.cpp line 947: > 945: // (C_const + C_pre * pre_iter_C_const) % aw != 0 > 946: // > 947: // This is in constradiction with (4a), and therefore there cannot be any solution, Suggestion: // This is in contradiction with (4a), and therefore there cannot be any solution, src/hotspot/share/opto/vectorization.cpp line 1038: > 1036: // sign(C_pre) = C_pre / abs(C_pre) = (C_pre > 0) ? 1 : -1, (7) > 1037: // > 1038: // We know that abs(C_pre) as well as aw are a powers of 2, and since (5) we can define integer q: Suggestion: // We know that abs(C_pre) as well as aw are powers of 2, and since (5) we can define integer q: ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/14785#pullrequestreview-1809074856 PR Review Comment: https://git.openjdk.org/jdk/pull/14785#discussion_r1444615477 PR Review Comment: https://git.openjdk.org/jdk/pull/14785#discussion_r1444624179 PR Review Comment: https://git.openjdk.org/jdk/pull/14785#discussion_r1444624796 PR Review Comment: https://git.openjdk.org/jdk/pull/14785#discussion_r1444632226 PR Review Comment: https://git.openjdk.org/jdk/pull/14785#discussion_r1444680843 PR Review Comment: https://git.openjdk.org/jdk/pull/14785#discussion_r1444699344 PR Review Comment: https://git.openjdk.org/jdk/pull/14785#discussion_r1444700628 PR Review Comment: https://git.openjdk.org/jdk/pull/14785#discussion_r1444711822 From chagedorn at openjdk.org Mon Jan 8 14:44:25 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 8 Jan 2024 14:44:25 GMT Subject: RFR: 8323095: Expand TraceOptoParse block output abbreviations In-Reply-To: References: Message-ID: On Fri, 5 Jan 2024 23:07:12 GMT, Joshua Cao wrote: > `e` -> `exception block` > `lphd` -> `loop head` > > Also removing an unnecessary space. The successor ids have a space before them. 
> > Examples from `java -Xcomp -XX:+TraceOptoParse -version`: > > > Parsing block #8 at bci [33,39), successors: 9 16(exception block) loop head Looks good to me, too. ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/17289#pullrequestreview-1809259801 From fgao at openjdk.org Mon Jan 8 14:46:41 2024 From: fgao at openjdk.org (Fei Gao) Date: Mon, 8 Jan 2024 14:46:41 GMT Subject: RFR: 8319690: [AArch64] C2 compilation hits offset_ok_for_immed: assert "c2 compiler bug" In-Reply-To: References: <16J-lJ2AceGTVcRWBcP15yKcwO-1IA1XsngyOuNjf7k=.0776f081-ae2c-4279-87cf-d909806c2bc4@github.com> <4KRCqYxn02wMjYDuN3_HbYWxc9BDtYMNd40bNrJ4K8w=.5362dda4-8d30-48fd-8915-909eb15a6023@github.com> Message-ID: On Sat, 6 Jan 2024 17:44:04 GMT, Andrew Haley wrote: >>> After this change, `immIOffset` and `immLOffset` appear to be obsolete. >> >> Removed them in the new commit. Thanks! > >> @fg1417 what is the state on this? >> >> The example here may look like a strange edge-case. But with my plans this may become more common: [JDK-8318446](https://bugs.openjdk.org/browse/JDK-8318446) C2: implement StoreNode::Ideal_merge_stores >> >> I will merge stores, which can then look like unaligned stores of a larger type. This bug here blocks my progress. I'm not in a hurry, just curious if there is now a plan how to proceed here ;) > > The problem with this PR is that the code is way too complex for such a simple problem. The port is correct as it is, in the release build. > > The only problem is an assertion. We could simply remove that assertion, but if it were me I'd fix the problem properly. Both @dean-long and I have suggested ways to improve this patch with less code. If @fg1417 decides to drop this PR I'll fix it. Sorry, I can't work on this right now. @theRealAph could you help to push the changes please? Thanks. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/16991#issuecomment-1881144322 From fgao at openjdk.org Mon Jan 8 14:46:42 2024 From: fgao at openjdk.org (Fei Gao) Date: Mon, 8 Jan 2024 14:46:42 GMT Subject: Withdrawn: 8319690: [AArch64] C2 compilation hits offset_ok_for_immed: assert "c2 compiler bug" In-Reply-To: <16J-lJ2AceGTVcRWBcP15yKcwO-1IA1XsngyOuNjf7k=.0776f081-ae2c-4279-87cf-d909806c2bc4@github.com> References: <16J-lJ2AceGTVcRWBcP15yKcwO-1IA1XsngyOuNjf7k=.0776f081-ae2c-4279-87cf-d909806c2bc4@github.com> Message-ID: On Wed, 6 Dec 2023 06:24:59 GMT, Fei Gao wrote: > On LP64 systems, if the heap can be moved into low virtual address space (below 4GB) and the heap size is smaller than the interesting threshold of 4 GB, we can use unscaled decoding pattern for narrow klass decoding. It means that a generic field reference can be decoded by: > > cast<64> (32-bit compressed reference) + field_offset > > > When the `field_offset` is an immediate, on aarch64 platform, the unscaled decoding pattern can match perfectly with a direct addressing mode, i.e., `base_plus_offset`, supported by `LDR/STR` instructions. But for certain data width, not all immediates can be encoded in the instruction field of `LDR/STR` [[1]](https://github.com/openjdk/jdk/blob/8db7bad992a0f31de9c7e00c2657c18670539102/src/hotspot/cpu/aarch64/assembler_aarch64.inline.hpp#L33). The ranges are different as data widths vary. > > For example, when we try to load a value of long type at offset of `1030`, the address expression is `(AddP (DecodeN base) 1030)`. Before the patch, the expression was matching with `operand indOffIN()`. But, for 64-bit `LDR/STR`, signed immediate byte offset must be in the range -256 to 255 or positive immediate byte offset must be a multiple of 8 in the range 0 to 32760 [[2]](https://developer.arm.com/documentation/ddi0602/2023-09/Base-Instructions/LDR--immediate---Load-Register--immediate--?lang=en). `1030` can't be encoded in the instruction field. 
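The encodability rule described above can be sketched in Java as follows. This is a hedged illustration rather than HotSpot's code: `fitsLdrStrImmediate` is a hypothetical name, and the method models only the two immediate forms mentioned in the text (a signed 9-bit unscaled byte offset, and an unsigned 12-bit offset scaled by the access size).

```java
public class ImmOffsetCheck {
    // Hypothetical helper: log2Size is log2 of the access size in bytes
    // (0 for 8-bit, 1 for 16-bit, 2 for 32-bit, 3 for 64-bit accesses).
    static boolean fitsLdrStrImmediate(long offset, int log2Size) {
        long mask = (1L << log2Size) - 1;
        if (offset < 0 || (offset & mask) != 0) {
            // Negative or not a multiple of the access size:
            // only the unscaled signed 9-bit form applies.
            return offset >= -256 && offset <= 255;
        }
        // Non-negative multiple of the access size: the scaled
        // unsigned 12-bit form applies.
        return (offset >> log2Size) <= 4095;
    }

    public static void main(String[] args) {
        System.out.println(fitsLdrStrImmediate(1030, 3));  // false: not a multiple of 8, above 255
        System.out.println(fitsLdrStrImmediate(1032, 3));  // true: 1032 / 8 = 129
        System.out.println(fitsLdrStrImmediate(32760, 3)); // true: largest scaled 64-bit offset
        System.out.println(fitsLdrStrImmediate(-256, 3));  // true: smallest unscaled offset
    }
}
```

With this model, offset `1030` is rejected for a 64-bit access but accepted for a 16-bit access (`log2Size = 1`), which illustrates why the valid range depends on the data width.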
So, after matching, when we do checking for instruction encoding, the assertion would fail. > > In this patch, we're going to filter out invalid immediates when deciding if current addressing mode can be matched as `base_plus_offset`. We introduce `indOffIN4/indOffLN4` and `indOffIN8/indOffLN8` for 32-bit data type and 64-bit data type separately in the patch. E.g., for `memory4`, we remove the generic `indOffIN/indOffLN`, which matches wrong unscaled immediate range, and replace them with `indOffIN4/indOffLN4` instead. > > Since 8-bit and 16-bit `LDR/STR` instructions also support the unscaled decoding pattern, we add the addressing mode in the lists of `memory1` and `memory2` by introducing `indOffIN1/indOffLN1` and `indOffIN2/indOffLN2`. > > We also remove unused operands `indOffI/indOffl/indOffIN/indOffLN` to avoid misuse. > > Tier 1-3 passed on aarch64. This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.org/jdk/pull/16991 From roland at openjdk.org Mon Jan 8 14:49:00 2024 From: roland at openjdk.org (Roland Westrelin) Date: Mon, 8 Jan 2024 14:49:00 GMT Subject: RFR: 8319793: C2 compilation fails with "Bad graph detected in build_loop_late" after JDK-8279888 [v10] In-Reply-To: References: Message-ID: > Range check smearing and range check predication make an array access > dependent on 2 (or more in the case of RC smearing) conditions. As a > consequence, if a range check can be eliminated because there's an > identical dominating range check, the control dependent nodes that > could float and become dependent on the dominating range check cannot > be allowed to float because there's a risk that they would then bypass > one of the checks that make the access legal. > > `IfNode::dominated_by()` and `PhaseIdealLoop::dominated_by()` have > logic to prevent this: nodes that are control dependent on a range > check or predicate are not allowed to float. This is however not > sufficient as demonstrated by the test cases. 
> > In `TestArrayAccessAboveRCAfterSmearingOrPredication.testRangeCheckSmearing()`: > > > v += array[i]; > if (flag2) { > if (flag3) { > field = 0x42; > } > } > if (flagField == 1) { > v += array[i]; > } > > > The range check for the second `array[i]` load is replaced by the > dominating range check for the first `array[i]` but because the second > `array[i]` load could really be dependent on multiple range checks (in > case smearing happened which is not the case here), c2 doesn't allow > the second `array[i]` to float when the second range check is > removed. The second `array[i]` is then control dependent on: > > > if (flagField == 1) { > > > which is next found to be dominated by the same test: > > > if (flag == 1) { > > > and is removed. However nothing in `dominated_by()` treats node > dependent on tests that are not range check or predicates > specially. So the second `array[i]` is allowed to float and become > dependent on: > > > if (flag == 1) { > > > which is above the range check for that access. The test method in its > last invocation is passed an index for the array access that's widely > out of range. The array load happens before the range check and > crashes the VM. `testLoopPredication()` is a similar test where array > loads become dependent on predicates and end up above range checks. > > `TestArrayAccessCastIIAboveRC.java` is the test case from the bug > where for similar reasons a range check `CastII` ends up above its > range check, becomes top because its input becomes some integer that > conflicts with its type (but there's no condition to catch it). The > graph becomes broken and c2 crashes. > > Logic in the `dominated_by()` methods ... 
Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: review ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16886/files - new: https://git.openjdk.org/jdk/pull/16886/files/dbe3c4c1..2cc6f1d3 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16886&range=09 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16886&range=08-09 Stats: 64 lines in 10 files changed: 29 ins; 0 del; 35 mod Patch: https://git.openjdk.org/jdk/pull/16886.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16886/head:pull/16886 PR: https://git.openjdk.org/jdk/pull/16886 From epeter at openjdk.org Mon Jan 8 14:53:07 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 8 Jan 2024 14:53:07 GMT Subject: RFR: 8310190: C2 SuperWord: AlignVector is broken, generates misaligned packs [v60] In-Reply-To: References: Message-ID: > I want to push this in JDK23. > After this fix here, I'm doing [this refactoring](https://github.com/openjdk/jdk/pull/16620). > > To calm your nerves: most of the changes are in auto-generated tests, and tests in general. > > **Context** > > `-XX:+AlignVector` ensures that SuperWord only creates LoadVector and StoreVector that can be memory aligned. This is achieved by iterating in the pre-loop until we reach the alignment boundary, then we can start the main loop properly aligned. However, this is not possible in all cases, sometimes some memory accesses cannot be guaranteed to be aligned, and we need to reject vectorization (at least partially, for some of the packs). > > Alignment is split into two tasks: > - Alignment Correctness Checks: only relevant if `-XX:+AlignVector`. Need to reject vectorization if alignment is not possible. We must check if the address of the vector load/store is aligned with (divisible by) `ObjectAlignmentInBytes`. > - Alignment by adjusting pre-loop limit: alignment is desirable even if `-XX:-AlignVector`. 
We would like to align the vectors with their vector width. > > **Problem** > > I have recently found a bug with our AlignVector [JDK-8310190](https://bugs.openjdk.org/browse/JDK-8310190). > In that bug, we perform a misaligned memory vector access, which results in a `SIGBUS` on an ARM32 machine. > Thanks @fg1417 for confirming this! > Hence, we need to fix the alignment correctness checks. > > While working on this task, I also found some bugs in the "alignment by adjusting pre-loop limit": there were cases where it did not align the vectors correctly. > > **Problem Details** > > Reproducer: > > > static void test(short[] a, short[] b, short mask) { > for (int i = 0; i < RANGE; i+=8) { > // Problematic for AlignVector > b[i+0] = (short)(a[i+0] & mask); // best_memref, align 0 > > b[i+3] = (short)(a[i+3] & mask); // pack at offset 6 bytes > b[i+4] = (short)(a[i+4] & mask); > b[i+5] = (short)(a[i+5] & mask); > b[i+6] = (short)(a[i+6] & mask); > } > } > > > During `SuperWord::find_adjacent_refs` we used to check if the references are expected to be aligned. For that, we look at each "group" of references (eg all `LoadS`) and take the reference with the lowest offset. For that chosen reference, we check if it is alignable. If yes, we accept all references of that group, if no we reject all. > > This is problematic as shown in this example. We have references at index offset `0, 3, 4, 5, 6`, and by... 
Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: Apply suggestions from code review Thanks to Christian Co-authored-by: Christian Hagedorn ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14785/files - new: https://git.openjdk.org/jdk/pull/14785/files/aef48ab4..76630041 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14785&range=59 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14785&range=58-59 Stats: 6 lines in 3 files changed: 0 ins; 0 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/14785.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14785/head:pull/14785 PR: https://git.openjdk.org/jdk/pull/14785 From epeter at openjdk.org Mon Jan 8 14:53:08 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 8 Jan 2024 14:53:08 GMT Subject: RFR: 8310190: C2 SuperWord: AlignVector is broken, generates misaligned packs [v59] In-Reply-To: <7koVi33WJ_-H7Z69jVF3kWIFB8cGzKlLW2chbzv2WZc=.cfdfda1c-1c7c-446a-bb9e-4df94643a566@github.com> References: <7koVi33WJ_-H7Z69jVF3kWIFB8cGzKlLW2chbzv2WZc=.cfdfda1c-1c7c-446a-bb9e-4df94643a566@github.com> Message-ID: On Mon, 8 Jan 2024 13:19:23 GMT, Christian Hagedorn wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> some minor changes for Vladimir > > src/hotspot/share/opto/superword.cpp line 1620: > >> 1618: >> 1619: const MemNode* mem_ref = pack->at(0)->as_Mem(); >> 1620: VPointer mem_ref_p(mem_ref, phase(), lpt(), nullptr, false); > > Since you renamed `p` -> `pack`, you should also rename this one to pack: > Suggestion: > > VPointer mem_ref_pack(mem_ref, phase(), lpt(), nullptr, false); I keep it without your suggestion. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14785#discussion_r1444766758 From epeter at openjdk.org Mon Jan 8 14:53:08 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 8 Jan 2024 14:53:08 GMT Subject: RFR: 8310190: C2 SuperWord: AlignVector is broken, generates misaligned packs [v59] In-Reply-To: References: <7koVi33WJ_-H7Z69jVF3kWIFB8cGzKlLW2chbzv2WZc=.cfdfda1c-1c7c-446a-bb9e-4df94643a566@github.com> Message-ID: On Mon, 8 Jan 2024 14:48:01 GMT, Emanuel Peter wrote: >> src/hotspot/share/opto/superword.cpp line 1620: >> >>> 1618: >>> 1619: const MemNode* mem_ref = pack->at(0)->as_Mem(); >>> 1620: VPointer mem_ref_p(mem_ref, phase(), lpt(), nullptr, false); >> >> Since you renamed `p` -> `pack`, you should also rename this one to pack: >> Suggestion: >> >> VPointer mem_ref_pack(mem_ref, phase(), lpt(), nullptr, false); > > I keep it without your suggestion. The idea is that it is the pointer of the mem_ref. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14785#discussion_r1444769346 From roland at openjdk.org Mon Jan 8 14:58:29 2024 From: roland at openjdk.org (Roland Westrelin) Date: Mon, 8 Jan 2024 14:58:29 GMT Subject: RFR: 8319850: PrintInlining should print which methods are late inlines [v2] In-Reply-To: <42h7t16pyeYV2jszIztjGu0JE2ZZWnnJCiyRd2s2oLg=.fffb35a5-e208-442c-9157-ec5d3fcaa31d@github.com> References: <42h7t16pyeYV2jszIztjGu0JE2ZZWnnJCiyRd2s2oLg=.fffb35a5-e208-442c-9157-ec5d3fcaa31d@github.com> Message-ID: <_0ZJL7u55Fcg1yID2yjH4DHPkrgKTKeekpYtWG1YsAI=.e9caec05-a88d-4123-832d-6699a1990e49@github.com> On Thu, 7 Dec 2023 22:51:50 GMT, Joshua Cao wrote: >> I'm not 100% sure if this covers all cases of late inlines. >> >> Passes jtreg tier1 locally on my Linux machine with a fastdebug build.
With sample Java programs and -XX:+PrintInlining, I can see >> >> >> @ 15 java.lang.Float::valueOf (9 bytes) late inline (boxing method) > > Joshua Cao has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains three commits: > > - 8319850: PrintInlining should report late inlines > - Revert "8319850: PrintInlining should report late inlines" > > This reverts commit c5bfb832ff989261b6b2c98f26017c6491fe3067. > - 8319850: PrintInlining should report late inlines When `InlineTree::ok_to_inline()` is called, some diagnostic message is recorded for the call site. Do I understand right that with this patch, if the call is inlined late, then that message is dropped and replaced by a new "late inline.." message? If that's the case, isn't it the case that sometimes the `InlineTree::ok_to_inline()` has some useful information that's lost when late inlining happens? ------------- PR Comment: https://git.openjdk.org/jdk/pull/16595#issuecomment-1881167177 From jvernee at openjdk.org Mon Jan 8 14:58:33 2024 From: jvernee at openjdk.org (Jorn Vernee) Date: Mon, 8 Jan 2024 14:58:33 GMT Subject: Integrated: 8320310: CompiledMethod::has_monitors flag can be incorrect In-Reply-To: <4efExybeWDkEbcsckI1Qdz8kpYFqd-Rbmt7oiWz5qlo=.d8d38d0e-affa-48dc-b963-45f958041c4e@github.com> References: <4efExybeWDkEbcsckI1Qdz8kpYFqd-Rbmt7oiWz5qlo=.d8d38d0e-affa-48dc-b963-45f958041c4e@github.com> Message-ID: On Thu, 23 Nov 2023 15:55:07 GMT, Jorn Vernee wrote: > Currently, the `CompiledMethod::has_monitors` flag is set when either a `monitorenter` is parsed by C1, and `monitorexit` is parsed by C1 or C2 during method compilation. However, not necessarily every bytecode of a method is parsed, which means that we could miss all `monitorenter`/`monitorexit` byte codes in a method, while it actually does use monitors. 
This can lead to situations where a thread holds a monitor, but `has_monitors` for all frames is set to `false`, leading to an assertion failure in 'freeze_internal' in continuationFreezeThaw.cpp: > > assert(monitors_on_stack(current) == ((current->held_monitor_count() - current->jni_monitor_count()) > 0), > "Held monitor count and locks on stack invariant: " INT64_FORMAT " JNI: " INT64_FORMAT, (int64_t)current->held_monitor_count(), (int64_t)current->jni_monitor_count()); > > The proposed fix is to rely on `Method::has_monitor_bytecodes` to set the `has_monitors` flag when compiling, which is immune to issues where not all byte codes of a method are parsed during compilation. We can follow the pattern established for `has_reserved_stack_access`, which is similar. > > Note that this PR is based on: https://github.com/openjdk/jdk/pull/16416 which disables the assertion. The goal of this PR is to fix the issue, and then re-enable the assertion. > > Testing: Tier 1-4, `java/lang/Thread/virtual/stress/PinALot.java` This pull request has now been integrated. Changeset: c8fa3e21 Author: Jorn Vernee URL: https://git.openjdk.org/jdk/commit/c8fa3e21e6a4fd7846932b545a1748cc1dc6d9f1 Stats: 48 lines in 5 files changed: 9 ins; 17 del; 22 mod 8320310: CompiledMethod::has_monitors flag can be incorrect Reviewed-by: vlivanov, thartmann ------------- PR: https://git.openjdk.org/jdk/pull/16799 From roland at openjdk.org Mon Jan 8 15:01:12 2024 From: roland at openjdk.org (Roland Westrelin) Date: Mon, 8 Jan 2024 15:01:12 GMT Subject: RFR: 8319793: C2 compilation fails with "Bad graph detected in build_loop_late" after JDK-8279888 [v11] In-Reply-To: References: Message-ID: > Range check smearing and range check predication make an array access > dependent on 2 (or more in the case of RC smearing) conditions. 
As a
> consequence, if a range check can be eliminated because there's an
> identical dominating range check, the control dependent nodes that
> could float and become dependent on the dominating range check cannot
> be allowed to float because there's a risk that they would then bypass
> one of the checks that make the access legal.
>
> `IfNode::dominated_by()` and `PhaseIdealLoop::dominated_by()` have
> logic to prevent this: nodes that are control dependent on a range
> check or predicate are not allowed to float. This is however not
> sufficient as demonstrated by the test cases.
>
> In `TestArrayAccessAboveRCAfterSmearingOrPredication.testRangeCheckSmearing()`:
>
>     v += array[i];
>     if (flag2) {
>         if (flag3) {
>             field = 0x42;
>         }
>     }
>     if (flagField == 1) {
>         v += array[i];
>     }
>
> The range check for the second `array[i]` load is replaced by the
> dominating range check for the first `array[i]` but because the second
> `array[i]` load could really be dependent on multiple range checks (in
> case smearing happened which is not the case here), c2 doesn't allow
> the second `array[i]` to float when the second range check is
> removed. The second `array[i]` is then control dependent on:
>
>     if (flagField == 1) {
>
> which is next found to be dominated by the same test:
>
>     if (flag == 1) {
>
> and is removed. However nothing in `dominated_by()` treats node
> dependent on tests that are not range check or predicates
> specially. So the second `array[i]` is allowed to float and become
> dependent on:
>
>     if (flag == 1) {
>
> which is above the range check for that access. The test method in its
> last invocation is passed an index for the array access that's widely
> out of range. The array load happens before the range check and
> crashes the VM. `testLoopPredication()` is a similar test where array
> loads become dependent on predicates and end up above range checks.
> > `TestArrayAccessCastIIAboveRC.java` is the test case from the bug > where for similar reasons a range check `CastII` ends up above its > range check, becomes top because its input becomes some integer that > conflicts with its type (but there's no condition to catch it). The > graph becomes broken and c2 crashes. > > Logic in the `dominated_by()` methods ... Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: whitespaces ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16886/files - new: https://git.openjdk.org/jdk/pull/16886/files/2cc6f1d3..51231631 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16886&range=10 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16886&range=09-10 Stats: 6 lines in 1 file changed: 0 ins; 0 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/16886.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16886/head:pull/16886 PR: https://git.openjdk.org/jdk/pull/16886 From igavrilin at openjdk.org Mon Jan 8 15:52:24 2024 From: igavrilin at openjdk.org (Ilya Gavrilin) Date: Mon, 8 Jan 2024 15:52:24 GMT Subject: RFR: 8322790: RISC-V: Tune costs for shuffles with no conversion [v2] In-Reply-To: References: Message-ID: <1PfPMl6oI_lYd-rw0LevGwVDph6ffIrIM_gZ2ikL0D0=.1e57ac0b-e14e-4b84-9920-71c18df0ecbe@github.com> On Mon, 8 Jan 2024 07:27:46 GMT, Robbin Ehn wrote: >> Ilya Gavrilin has updated the pull request incrementally with one additional commit since the last revision: >> >> Revert some costs changes > > Still reasonable to me. @robehn @RealFYang Thanks for your reviews. 
-------------

PR Comment: https://git.openjdk.org/jdk/pull/17206#issuecomment-1881311125

From igavrilin at openjdk.org Mon Jan 8 15:56:33 2024
From: igavrilin at openjdk.org (Ilya Gavrilin)
Date: Mon, 8 Jan 2024 15:56:33 GMT
Subject: Integrated: 8322790: RISC-V: Tune costs for shuffles with no conversion
In-Reply-To:
References:
Message-ID:

On Sat, 30 Dec 2023 20:07:13 GMT, Ilya Gavrilin wrote:

> Hi all, please review this small change to RISC-V nodes insertion costs.
> Now we have several nodes which provide shuffles without conversion: https://github.com/openjdk/jdk/blob/32d80e2caf6063b58128bd5f3dc87b276f3bd0cb/src/hotspot/cpu/riscv/riscv.ad#L8525-L8741
> On most RISC-V cpu`s we prefer reg<->reg operations, because they are faster, but now stack<->reg operations used (for details about reasons, please, visit connected jbs issue).
> After changing insertion costs reg<->reg operations selected, and we can see performance improvements for benchmarks, which use such shuffles (tested on thead C910 board):
>
> | Benchmark                           | Upstream build (ops/ms) | Patched build (ops/ms) | difference (%) |
> |:-----------------------------------:|:-----------------------:|:----------------------:|:--------------:|
> | MathBench.doubleToRawLongBitsDouble | 30935.139               | 32171.761              | +4.00          |
> | StrictMathBench.ceilDouble          | 24682.810               | 29782.050              | +20.66         |
> | StrictMathBench.cosDouble           | 6948.309                | 6938.276               | -0.14          |
> | StrictMathBench.expDouble           | 6816.143                | 7211.021               | +5.79          |
> | StrictMathBench.floorDouble         | 30699.630               | 34189.509              | +11.37         |
> | StrictMathBench.maxDouble           | 35157.355               | 34675.191              | -1.37          |
> | StrictMathBench.minDouble           | 35192.135               | 35183.015              | -0.03          |
> | StrictMathBench.sinDouble           | 6698.405                | 6721.809               | +0.35          |
>
> New benchmark for changed nodes:
>
> --- a/test/micro/org/openjdk/bench/java/lang/MathBench.java
> +++ b/test/micro/org/openjdk/bench/java/lang/MathBench.java
> @@ -540,4 +540,11 @@ public class MathBench {
>          return Math.ulp(float7);
>      }
>
> +    @Benchmark
> +    public long doubleToRawLongBitsDouble() {
> +        double dbl162Dot5 = double81 * 2.0d + double0Dot5;
> +        double dbl3 = double2 + double1;
> +        return Double.doubleToRawLongBits(dbl162Dot5) + Double.doubleToRawLongBits(dbl3);
> +    }
> +

This pull request has now been integrated.

Changeset: 2acb5bd9
Author: Ilya Gavrilin
Committer: Vladimir Kempik
URL: https://git.openjdk.org/jdk/commit/2acb5bd9924511b58b0e57ea9eb6c2dee9fd3ee8
Stats: 5 lines in 1 file changed: 1 ins; 0 del; 4 mod

8322790: RISC-V: Tune costs for shuffles with no conversion

Reviewed-by: rehn, fyang

-------------

PR: https://git.openjdk.org/jdk/pull/17206

From roland at openjdk.org Mon Jan 8 16:12:34 2024
From: roland at openjdk.org (Roland Westrelin)
Date: Mon, 8 Jan 2024 16:12:34 GMT
Subject: RFR: 8319793: C2 compilation fails with "Bad graph detected in build_loop_late" after JDK-8279888 [v9]
In-Reply-To:
References:
Message-ID: <7FURGQ8UskC-HVN6r-ne-VJnCH8XjxV7BAX5dIYiJhw=.7a559dc9-f6ef-4602-b784-66d2256a210d@github.com>

On Thu, 4 Jan 2024 16:22:38 GMT, Emanuel Peter wrote:

> I mostly left suggestions for better comments and improved naming.

Thanks for reviewing this. I pushed an update with more comments/some renaming following your suggestions.
------------- PR Comment: https://git.openjdk.org/jdk/pull/16886#issuecomment-1881379045 From epeter at openjdk.org Mon Jan 8 16:12:55 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 8 Jan 2024 16:12:55 GMT Subject: RFR: 8310190: C2 SuperWord: AlignVector is broken, generates misaligned packs [v57] In-Reply-To: References: <4g4SbB2RBLU-ZFcrH_ukdqC_QSoSvibNGanasAFl-lw=.731266a6-9974-402e-954e-e441706426ab@github.com> Message-ID: On Fri, 22 Dec 2023 15:40:47 GMT, Christian Hagedorn wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> Apply suggestions from code review by Christian >> >> Co-authored-by: Christian Hagedorn > > Thanks a lot Emanuel for all the discussions and for addressing all my comments online and offline :-) > > It looks very good now and it's easy to follow the logic. The proofs are great and really helpful to better understand the (rather simple in the end) code for proving and calculating the alignment solutions. Thanks for putting the extra effort in here. > > I will have another complete look at the entire PR in the new year. But I think it looks good! Thanks @chhagedorn for all the help to get this over the line :) Thanks @vnkozlov for the review! ------------- PR Comment: https://git.openjdk.org/jdk/pull/14785#issuecomment-1881377644 From epeter at openjdk.org Mon Jan 8 16:12:57 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 8 Jan 2024 16:12:57 GMT Subject: Integrated: 8310190: C2 SuperWord: AlignVector is broken, generates misaligned packs In-Reply-To: References: Message-ID: On Thu, 6 Jul 2023 14:13:01 GMT, Emanuel Peter wrote: > I want to push this in JDK23. > After this fix here, I'm doing [this refactoring](https://github.com/openjdk/jdk/pull/16620). > > To calm your nerves: most of the changes are in auto-generated tests, and tests in general. 
> > **Context** > > `-XX:+AlignVector` ensures that SuperWord only creates LoadVector and StoreVector that can be memory aligned. This is achieved by iterating in the pre-loop until we reach the alignment boundary, then we can start the main loop properly aligned. However, this is not possible in all cases, sometimes some memory accesses cannot be guaranteed to be aligned, and we need to reject vectorization (at least partially, for some of the packs). > > Alignment is split into two tasks: > - Alignment Correctness Checks: only relevant if `-XX:+AlignVector`. Need to reject vectorization if alignment is not possible. We must check if the address of the vector load/store is aligned with (divisible by) `ObjectAlignmentInBytes`. > - Alignment by adjusting pre-loop limit: alignment is desirable even if `-XX:-AlignVector`. We would like to align the vectors with their vector width. > > **Problem** > > I have recently found a bug with our AlignVector [JDK-8310190](https://bugs.openjdk.org/browse/JDK-8310190). > In that bug, we perform a misaligned memory vector access, which results in a `SIGBUS` on an ARM32 machine. > Thanks @fg1417 for confirming this! > Hence, we need to fix the alignment correctness checks. > > While working on this task, I also found some bugs in the "alignment by adjusting pre-loop limit": there were cases where it did not align the vectors correctly. > > **Problem Details** > > Reproducer: > > > static void test(short[] a, short[] b, short mask) { > for (int i = 0; i < RANGE; i+=8) { > // Problematic for AlignVector > b[i+0] = (short)(a[i+0] & mask); // best_memref, align 0 > > b[i+3] = (short)(a[i+3] & mask); // pack at offset 6 bytes > b[i+4] = (short)(a[i+4] & mask); > b[i+5] = (short)(a[i+5] & mask); > b[i+6] = (short)(a[i+6] & mask); > } > } > > > During `SuperWord::find_adjacent_refs` we used to check if the references are expected to be aligned. 
For that, we look at each "group" of references (eg all `LoadS`) and take the reference with the lowest offset. For that chosen reference, we check if it is alignable. If yes, we accept all references of that group, if no we reject all. > > This is problematic as shown in this example. We have references at index offset `0, 3, 4, 5, 6`, and by... This pull request has now been integrated. Changeset: 827c71da Author: Emanuel Peter URL: https://git.openjdk.org/jdk/commit/827c71dac9a5732f70bc7341743bce314cad302f Stats: 8892 lines in 23 files changed: 7569 ins; 362 del; 961 mod 8310190: C2 SuperWord: AlignVector is broken, generates misaligned packs Co-authored-by: Christian Hagedorn Reviewed-by: kvn, chagedorn ------------- PR: https://git.openjdk.org/jdk/pull/14785 From kxu at openjdk.org Mon Jan 8 17:45:31 2024 From: kxu at openjdk.org (Kangcheng Xu) Date: Mon, 8 Jan 2024 17:45:31 GMT Subject: RFR: 8320237 C2: late inlining of method handle invoke causes duplicate lines in PrintInlining output Message-ID: This PR resolves [JDK-8320237](https://bugs.openjdk.org/browse/JDK-8320237) The original behavior produces both a failure and success message upon late inlining which is confusing. The patch removes the failure message if inlining was successful. Huge thanks to @rwestrel for reporting and working out a solution. Unit test `test/hotspot/jtreg/compiler/inlining/TestDuplicatedLateInliningOutput.java` is added and passing. 
------------- Commit messages: - update test summary, requirements, and VM flags - Merge branch 'master' into JDK-8320237 - make regex whitespace consistent - 8320237: C2: late inlining of method handle invoke causes duplicate lines in PrintInlining output Changes: https://git.openjdk.org/jdk/pull/17147/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=17147&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8320237 Stats: 186 lines in 2 files changed: 186 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/17147.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17147/head:pull/17147 PR: https://git.openjdk.org/jdk/pull/17147 From xliu at openjdk.org Mon Jan 8 18:53:38 2024 From: xliu at openjdk.org (Xin Liu) Date: Mon, 8 Jan 2024 18:53:38 GMT Subject: RFR: 8322982: CTW fails to build after 8308753 [v2] In-Reply-To: References: Message-ID: > This patch fixes the build error of CTW by sidelining ModuleInfoWriter.java. ModuleInfoWriter uses Class-File API, which has transitioned to preview. > If we really need to compile it, we have to append --enable-preview and --source N. > > The fact is CTW itself doesn't depend on ModuleInfoWriter. I think it's easier to maintain CTW if we filter it out in Makefile. Xin Liu has updated the pull request incrementally with one additional commit since the last revision: Combine two functions into one. 
------------- Changes: - all: https://git.openjdk.org/jdk/pull/17292/files - new: https://git.openjdk.org/jdk/pull/17292/files/efd4e973..5ac1d9f1 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17292&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17292&range=00-01 Stats: 4 lines in 1 file changed: 1 ins; 1 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/17292.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17292/head:pull/17292 PR: https://git.openjdk.org/jdk/pull/17292 From xliu at openjdk.org Mon Jan 8 18:56:34 2024 From: xliu at openjdk.org (Xin Liu) Date: Mon, 8 Jan 2024 18:56:34 GMT Subject: Integrated: 8320128: Clean up Parse constructor for OSR In-Reply-To: <0mxhhLR9DDdpiBhbXWPxCmMzo_OI9DYiIX36xGhxC_c=.1eed99b9-2dc3-4403-b3c1-22b6ebed6079@github.com> References: <0mxhhLR9DDdpiBhbXWPxCmMzo_OI9DYiIX36xGhxC_c=.1eed99b9-2dc3-4403-b3c1-22b6ebed6079@github.com> Message-ID: On Wed, 15 Nov 2023 07:01:35 GMT, Xin Liu wrote: > There's a special case for the constructor of Parse. If current compilation is OSR and it is handling the top-level method(depth() == 1), then > > 1. _tf = C->tf(); > 2. _entry_bci = C->entry_bci(); > 3. _flow = method()->get_osr_flow_analysis(_entry_bci); > > We don't need to assign those member data twice. We can also factor out _flow->failing() for the special case and normal cases. > > It's worth mentioning that we can't save ciTypeFlow computation because > get_osr_flow_analysis(_entry_bci) actually needs get_flow_analysis(method()). This pull request has now been integrated. 
Changeset: d47393bd Author: Xin Liu URL: https://git.openjdk.org/jdk/commit/d47393bd8225e818f0f9cd45192a5e656018af11 Stats: 45 lines in 2 files changed: 19 ins; 17 del; 9 mod 8320128: Clean up Parse constructor for OSR Reviewed-by: thartmann, shade ------------- PR: https://git.openjdk.org/jdk/pull/16669 From shade at openjdk.org Mon Jan 8 19:28:24 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 8 Jan 2024 19:28:24 GMT Subject: RFR: 8322982: CTW fails to build after 8308753 [v2] In-Reply-To: References: Message-ID: On Mon, 8 Jan 2024 18:53:38 GMT, Xin Liu wrote: >> This patch fixes the build error of CTW by sidelining ModuleInfoWriter.java. ModuleInfoWriter uses Class-File API, which has transitioned to preview. >> If we really need to compile it, we have to append --enable-preview and --source N. >> >> The fact is CTW itself doesn't depend on ModuleInfoWriter. I think it's easier to maintain CTW if we filter it out in Makefile. > > Xin Liu has updated the pull request incrementally with one additional commit since the last revision: > > Combine two functions into one. Looks fine to me. Marked as reviewed by shade (Reviewer). test/hotspot/jtreg/testlibrary/ctw/Makefile line 45: > 43: > 44: SRC_FILES = $(shell find $(SRC_DIR) -name '*.java') > 45: # Exclude ModuleInfoWriter.java to circumvent '--enable-preview'. Wording: `Exclude files that need --enable-preview to compile`. There would probably be more files later. ------------- Marked as reviewed by shade (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/17292#pullrequestreview-1809880221 PR Review: https://git.openjdk.org/jdk/pull/17292#pullrequestreview-1809881199 PR Review Comment: https://git.openjdk.org/jdk/pull/17292#discussion_r1445230097 From xliu at openjdk.org Mon Jan 8 19:48:34 2024 From: xliu at openjdk.org (Xin Liu) Date: Mon, 8 Jan 2024 19:48:34 GMT Subject: RFR: 8323095: Expand TraceOptoParse block output abbreviations In-Reply-To: References: Message-ID: On Fri, 5 Jan 2024 23:07:12 GMT, Joshua Cao wrote: > `e` -> `exception block` > `lphd` -> `loop head` > > Also removing an unnecessary space. The successor ids have a space before them. > > Examples from `java -Xcomp -XX:+TraceOptoParse -version`: > > > Parsing block #8 at bci [33,39), successors: 9 16(exception block) loop head LGMT. I am not a reviewer. ------------- Marked as reviewed by xliu (Committer). PR Review: https://git.openjdk.org/jdk/pull/17289#pullrequestreview-1809905645 From duke at openjdk.org Mon Jan 8 19:48:35 2024 From: duke at openjdk.org (Joshua Cao) Date: Mon, 8 Jan 2024 19:48:35 GMT Subject: Integrated: 8323095: Expand TraceOptoParse block output abbreviations In-Reply-To: References: Message-ID: On Fri, 5 Jan 2024 23:07:12 GMT, Joshua Cao wrote: > `e` -> `exception block` > `lphd` -> `loop head` > > Also removing an unnecessary space. The successor ids have a space before them. > > Examples from `java -Xcomp -XX:+TraceOptoParse -version`: > > > Parsing block #8 at bci [33,39), successors: 9 16(exception block) loop head This pull request has now been integrated. 
Changeset: 24823ba6 Author: Joshua Cao Committer: Xin Liu URL: https://git.openjdk.org/jdk/commit/24823ba647d4bf412586372cd5076f35bbc131a5 Stats: 3 lines in 1 file changed: 0 ins; 0 del; 3 mod 8323095: Expand TraceOptoParse block output abbreviations Reviewed-by: thartmann, chagedorn, xliu ------------- PR: https://git.openjdk.org/jdk/pull/17289 From xliu at openjdk.org Mon Jan 8 20:08:37 2024 From: xliu at openjdk.org (Xin Liu) Date: Mon, 8 Jan 2024 20:08:37 GMT Subject: RFR: 8322982: CTW fails to build after 8308753 [v3] In-Reply-To: References: Message-ID: > This patch fixes the build error of CTW by sidelining ModuleInfoWriter.java. ModuleInfoWriter uses Class-File API, which has transitioned to preview. > If we really need to compile it, we have to append --enable-preview and --source N. > > The fact is CTW itself doesn't depend on ModuleInfoWriter. I think it's easier to maintain CTW if we filter it out in Makefile. Xin Liu has updated the pull request incrementally with one additional commit since the last revision: Wording and also remove add-modules required by ModuleInfoWriter.java ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17292/files - new: https://git.openjdk.org/jdk/pull/17292/files/5ac1d9f1..7978052e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17292&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17292&range=01-02 Stats: 5 lines in 1 file changed: 0 ins; 4 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/17292.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17292/head:pull/17292 PR: https://git.openjdk.org/jdk/pull/17292 From xliu at openjdk.org Mon Jan 8 20:08:40 2024 From: xliu at openjdk.org (Xin Liu) Date: Mon, 8 Jan 2024 20:08:40 GMT Subject: RFR: 8322982: CTW fails to build after 8308753 [v2] In-Reply-To: References: Message-ID: On Mon, 8 Jan 2024 19:25:54 GMT, Aleksey Shipilev wrote: >> Xin Liu has updated the pull request incrementally with one additional commit since the last revision: 
>> >> Combine two functions into one. > > test/hotspot/jtreg/testlibrary/ctw/Makefile line 45: > >> 43: >> 44: SRC_FILES = $(shell find $(SRC_DIR) -name '*.java') >> 45: # Exclude ModuleInfoWriter.java to circumvent '--enable-preview'. > > Wording: `Exclude files that need --enable-preview to compile`. There would probably be more files later. I took a look at LIB_FILES. Only 'ModuleInfoWriter.java' depends on advanced APIs. It was added to testlibrary in [JDK-8304163](https://bugs.openjdk.org/browse/JDK-8304163). Yes, we may need to exclude more files in the future. Currently, Makefile selects LIB_FILES using wildcard matching. If it's necessary, we need to define LIB_FILES explicitly. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17292#discussion_r1445271062 From kvn at openjdk.org Mon Jan 8 20:50:21 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 8 Jan 2024 20:50:21 GMT Subject: RFR: JDK-8277869: Maven POMs are using HTTP links where HTTPS is available In-Reply-To: References: Message-ID: <_yC54VkkHOUc9a7YC6Wf-7QjqTiJkA9ieAWMlwJYApQ=.032ae2c6-0471-4b8f-bf78-dd57fb6c90db@github.com> On Mon, 8 Jan 2024 10:29:38 GMT, Tobias Holenstein wrote: > Replace `http` with `https` in xml files of IdealGraphVisualizer and LogCompilation. > Moreover, replace the old URL (./maven-v4_0_0.xsd) with of the suggested one (./maven-4.0.0.xsd) for `xsi:schemaLocation` in pom.xml. > > Tested: IdealGraphVisualizer and LogCompilation build and run as expected. Looks good. ------------- Marked as reviewed by kvn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/17302#pullrequestreview-1809988510 From kvn at openjdk.org Mon Jan 8 21:07:24 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 8 Jan 2024 21:07:24 GMT Subject: RFR: 8322735: C2: minor improvements of bubble sort used in SuperWord::packset_sort [v2] In-Reply-To: <1W_HcoiwEnXLvtqprc7L8mghr0BRH3a_ITn6Cerzb_c=.f6c90e24-7d41-418d-bc50-41bff396663b@github.com> References: <1W_HcoiwEnXLvtqprc7L8mghr0BRH3a_ITn6Cerzb_c=.f6c90e24-7d41-418d-bc50-41bff396663b@github.com> Message-ID: On Fri, 5 Jan 2024 08:57:33 GMT, Denghui Dong wrote: >> A minor improvement could be made for bubble sort in SuperWord::packset_sort to reduce the comparison count in bad cases. >> >> See https://en.wikipedia.org/wiki/Bubble_sort > > Denghui Dong has updated the pull request incrementally with one additional commit since the last revision: > > update Clever. src/hotspot/share/opto/superword.cpp line 3526: > 3524: // only swap when we find something to swap > 3525: if (alignment(q_low->at(0)) > alignment(q_i->at(0))) { > 3526: Node_List* t = q_i; Why you need this local `t`? src/hotspot/share/opto/superword.cpp line 3529: > 3527: *(_packset.adr_at(i)) = q_low; > 3528: *(_packset.adr_at(i-1)) = q_i; > 3529: max_swap_index = i; So we not using `i+1` here because all previous values should be < than `i`'s Right? 
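For readers following the review thread, the bubble-sort refinement being discussed (remembering where the last swap happened so the next pass can stop there) can be sketched roughly as below. This is only an illustration on a plain `int[]`, not the HotSpot C++ code from `SuperWord::packset_sort`; the class name `LastSwapBubbleSort` and the variable names are made up for this sketch.

```java
import java.util.Arrays;

public class LastSwapBubbleSort {
    // Sorts in place, ascending. After a pass, every element at or beyond the
    // index of the last swap is already in its final position, so the next
    // pass only scans up to that index.
    public static void sort(int[] a) {
        int limit = a.length;              // exclusive bound of the unsorted prefix
        while (limit > 1) {
            int maxSwapIndex = 0;          // stays 0 if the pass performs no swap
            for (int i = 1; i < limit; i++) {
                // only swap when we find something to swap
                if (a[i - 1] > a[i]) {
                    int t = a[i - 1];
                    a[i - 1] = a[i];
                    a[i] = t;
                    maxSwapIndex = i;      // remember the highest index touched
                }
            }
            limit = maxSwapIndex;          // nothing beyond this needs rechecking
        }
    }

    public static void main(String[] args) {
        int[] data = {5, 1, 4, 2, 8, 8, 9};
        sort(data);
        System.out.println(Arrays.toString(data));
    }
}
```

On nearly sorted input the bound shrinks quickly, which is where the saving in comparisons comes from; the worst case stays quadratic.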
------------- PR Review: https://git.openjdk.org/jdk/pull/17190#pullrequestreview-1810006561 PR Review Comment: https://git.openjdk.org/jdk/pull/17190#discussion_r1445326103 PR Review Comment: https://git.openjdk.org/jdk/pull/17190#discussion_r1445331241 From kbarrett at openjdk.org Mon Jan 8 21:29:36 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Mon, 8 Jan 2024 21:29:36 GMT Subject: Integrated: 8322759: Eliminate -Wparentheses warnings in compiler code In-Reply-To: <496tGkQ1KUCrW1IHOETyvhqopkNYjsEoupxjo0Ze3Wg=.0223f494-fd34-4e4a-a31a-5030603f2113@github.com> References: <496tGkQ1KUCrW1IHOETyvhqopkNYjsEoupxjo0Ze3Wg=.0223f494-fd34-4e4a-a31a-5030603f2113@github.com> Message-ID: On Fri, 29 Dec 2023 03:33:11 GMT, Kim Barrett wrote: > Please review this change to eliminate some -Wparentheses warnings. This > involved simply adding a few parentheses to make some implicit operator > precedence explicit. > > This change addresses non-C2 parts of the compiler component. > > Testing: mach5 tier1 > > Also ran mach5 tier1 with these changes in conjunction enabling -Wparentheses > and other changes needed to make that work. This pull request has now been integrated. 
Changeset: ca9635df Author: Kim Barrett URL: https://git.openjdk.org/jdk/commit/ca9635df3357bf70b41645f619237b6d2068afb7 Stats: 16 lines in 5 files changed: 0 ins; 0 del; 16 mod 8322759: Eliminate -Wparentheses warnings in compiler code Reviewed-by: kvn, shade ------------- PR: https://git.openjdk.org/jdk/pull/17200 From kbarrett at openjdk.org Mon Jan 8 21:29:35 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Mon, 8 Jan 2024 21:29:35 GMT Subject: RFR: 8322759: Eliminate -Wparentheses warnings in compiler code [v2] In-Reply-To: <2WJkEZqCHKmE27ORwdudo3QC0JLzBxShw6HBBJ8k2qE=.4f172823-b930-418a-924d-578342d2c991@github.com> References: <496tGkQ1KUCrW1IHOETyvhqopkNYjsEoupxjo0Ze3Wg=.0223f494-fd34-4e4a-a31a-5030603f2113@github.com> <2WJkEZqCHKmE27ORwdudo3QC0JLzBxShw6HBBJ8k2qE=.4f172823-b930-418a-924d-578342d2c991@github.com> Message-ID: On Tue, 2 Jan 2024 20:13:47 GMT, Vladimir Kozlov wrote: >> Kim Barrett has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: >> >> - Merge branch 'master' into compiler-wparentheses >> - simplify asserts >> - update copyrights for new year >> - fix -Wparentheses warnings in non-C2 compiler code > > Looks good. Thanks for reviews @vnkozlov and @shipilev . ------------- PR Comment: https://git.openjdk.org/jdk/pull/17200#issuecomment-1881844523 From sviswanathan at openjdk.org Tue Jan 9 00:10:39 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Tue, 9 Jan 2024 00:10:39 GMT Subject: RFR: 8321712: C2: "failed: Multiple uses of register" in C2_MacroAssembler::vminmax_fp Message-ID: The C2_MacroAssembler::vminmax_fp and evminmax_fp were incorrectly checking for the two USE operand registers to be different and thus resulting in assertion failure. 
In x86_64.ad:

    instruct maxF_reg(legRegF dst, legRegF a, legRegF b, legRegF tmp, legRegF atmp, legRegF btmp) %{
      ...
      effect(USE a, USE b, TEMP tmp, TEMP atmp, TEMP btmp);
      ...
      __ vminmax_fp(Op_MaxV, T_FLOAT, $dst$$XMMRegister, $a$$XMMRegister, $b$$XMMRegister, $tmp$$XMMRegister, $atmp$$XMMRegister, $btmp$$XMMRegister, Assembler::AVX_128bit);
    %}

Changing the assert in vminmax_fp from:
    assert_different_registers(a, b, tmp, atmp, btmp);
to:
    assert_different_registers(a, tmp, atmp, btmp);
    assert_different_registers(b, tmp, atmp, btmp);
fixes the issue.

Similar change done in evminmax_fp.

Please review.

Best Regards,
Sandhya

-------------

Commit messages:
 - 8321712: C2: "failed: Multiple uses of register" in C2_MacroAssembler::vminmax_fp

Changes: https://git.openjdk.org/jdk/pull/17315/files
Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=17315&range=00
Issue: https://bugs.openjdk.org/browse/JDK-8321712
Stats: 4 lines in 1 file changed: 2 ins; 0 del; 2 mod
Patch: https://git.openjdk.org/jdk/pull/17315.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/17315/head:pull/17315

PR: https://git.openjdk.org/jdk/pull/17315

From duke at openjdk.org Tue Jan 9 01:52:50 2024
From: duke at openjdk.org (Zhiqiang Zang)
Date: Tue, 9 Jan 2024 01:52:50 GMT
Subject: RFR: 8322589: Add Ideal transformation: (~a) & (~b) => ~(a | b) [v6]
In-Reply-To: <-pwjKEB97C-bM068JQN0PY1hl65IzcuQZfHzRoKu92g=.d6118773-20d2-46cd-9284-5168c9334bb5@github.com>
References: <-pwjKEB97C-bM068JQN0PY1hl65IzcuQZfHzRoKu92g=.d6118773-20d2-46cd-9284-5168c9334bb5@github.com>
Message-ID:

> Hello,
>
> `(~a) & (~b) => ~(a | b)` is a widely seen pattern, for example it is implemented for LLVM [here](https://github.com/llvm/llvm-project/blob/397f1ce9efb4eea1ee10fe4833f733b8c7abd878/llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp#L1616C28-L1616C28); however it is missing in current implementation of hotspot. This pull request adds this transformation and associated tests.
>
> Thanks.
Zhiqiang Zang has updated the pull request incrementally with one additional commit since the last revision: Update src/hotspot/share/opto/addnode.cpp Co-authored-by: Tobias Hartmann ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16333/files - new: https://git.openjdk.org/jdk/pull/16333/files/ecb2098b..afa0737a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16333&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16333&range=04-05 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/16333.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16333/head:pull/16333 PR: https://git.openjdk.org/jdk/pull/16333 From duke at openjdk.org Tue Jan 9 01:53:50 2024 From: duke at openjdk.org (Zhiqiang Zang) Date: Tue, 9 Jan 2024 01:53:50 GMT Subject: RFR: 8322077: Add Ideal transformation: (~a) | (~b) => ~(a & b) [v9] In-Reply-To: References: Message-ID: <1X-pxmUfbW67Uog-E7xJBsSmO_fJHahJj16iR_ZL7Ds=.083a0765-0375-4792-b835-8a43aa7c46d2@github.com> > Hello, > > (~a) | (~b) => ~(a & b) is a widely seen pattern, for example it is implemented for LLVM [here](https://github.com/llvm/llvm-project/blob/397f1ce9efb4eea1ee10fe4833f733b8c7abd878/llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp#L1617C28-L1617C28); however it is missing in current implementation of hotspot. This pull request adds this transformation and associated tests. > > Thanks. 
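As a quick sanity check of the two bitwise identities these pull requests rely on, `(~a) & (~b) == ~(a | b)` and `(~a) | (~b) == ~(a & b)`, a small Java sketch follows. It only exercises the De Morgan laws on a handful of sample values; it is not the C2 `Ideal` transformation or the IR tests from the PRs, and the class and method names are invented for this illustration.

```java
public class DeMorganCheck {
    // (~a) & (~b) == ~(a | b), checked on int operands
    public static boolean andFormHolds(int a, int b) {
        return ((~a) & (~b)) == ~(a | b);
    }

    // (~a) | (~b) == ~(a & b), checked on long operands
    public static boolean orFormHolds(long a, long b) {
        return ((~a) | (~b)) == ~(a & b);
    }

    public static void main(String[] args) {
        int[] samples = {0, 1, -1, 42, Integer.MIN_VALUE, Integer.MAX_VALUE};
        for (int a : samples) {
            for (int b : samples) {
                if (!andFormHolds(a, b) || !orFormHolds(a, b)) {
                    throw new AssertionError("identity failed for " + a + ", " + b);
                }
            }
        }
        System.out.println("De Morgan identities hold for all samples");
    }
}
```

The rewritten forms use one negation instead of two, which is presumably why the pattern is worth canonicalizing in the compiler.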
Zhiqiang Zang has updated the pull request incrementally with two additional commits since the last revision: - Update test/hotspot/jtreg/compiler/c2/irTests/DeMorganLawLongTests.java Co-authored-by: Tobias Hartmann - Update test/hotspot/jtreg/compiler/c2/irTests/DeMorganLawIntTests.java Co-authored-by: Tobias Hartmann ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16334/files - new: https://git.openjdk.org/jdk/pull/16334/files/d8ed0f35..6eb29aef Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16334&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16334&range=07-08 Stats: 2 lines in 2 files changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/16334.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16334/head:pull/16334 PR: https://git.openjdk.org/jdk/pull/16334 From duke at openjdk.org Tue Jan 9 01:55:59 2024 From: duke at openjdk.org (Zhiqiang Zang) Date: Tue, 9 Jan 2024 01:55:59 GMT Subject: RFR: 8322589: Add Ideal transformation: (~a) & (~b) => ~(a | b) [v7] In-Reply-To: <-pwjKEB97C-bM068JQN0PY1hl65IzcuQZfHzRoKu92g=.d6118773-20d2-46cd-9284-5168c9334bb5@github.com> References: <-pwjKEB97C-bM068JQN0PY1hl65IzcuQZfHzRoKu92g=.d6118773-20d2-46cd-9284-5168c9334bb5@github.com> Message-ID: > Hello, > > `(~a) & (~b) => ~(a | b)` is a widely seen pattern, for example it is implemented for LLVM [here](https://github.com/llvm/llvm-project/blob/397f1ce9efb4eea1ee10fe4833f733b8c7abd878/llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp#L1616C28-L1616C28); however it is missing in current implementation of hotspot. This pull request adds this transformation and associated tests. > > Thanks. Zhiqiang Zang has updated the pull request incrementally with one additional commit since the last revision: address minor comments. 
------------- Changes: - all: https://git.openjdk.org/jdk/pull/16333/files - new: https://git.openjdk.org/jdk/pull/16333/files/afa0737a..c4fa2e40 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16333&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16333&range=05-06 Stats: 4 lines in 1 file changed: 0 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/16333.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16333/head:pull/16333 PR: https://git.openjdk.org/jdk/pull/16333 From kvn at openjdk.org Tue Jan 9 02:28:21 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 9 Jan 2024 02:28:21 GMT Subject: RFR: 8321712: C2: "failed: Multiple uses of register" in C2_MacroAssembler::vminmax_fp In-Reply-To: References: Message-ID: On Tue, 9 Jan 2024 00:01:04 GMT, Sandhya Viswanathan wrote: > The C2_MacroAssembler::vminmax_fp and evminmax_fp were incorrectly checking for the two USE operand registers to be different and thus resulting in assertion failure. > > In x86_64.ad: > > instruct maxF_reg(legRegF dst, legRegF a, legRegF b, legRegF tmp, legRegF atmp, legRegF btmp) %{ > ... > effect(USE a, USE b, TEMP tmp, TEMP atmp, TEMP btmp); > ... > __ vminmax_fp(Op_MaxV, T_FLOAT, $dst$$XMMRegister, $a$$XMMRegister, $b$$XMMRegister, $tmp$$XMMRegister, $atmp$$XMMRegister, $btmp$$XMMRegister, Assembler::AVX_128bit); > %} > > > Changing the assert in vminmax_fp from: > assert_different_registers(a, b, tmp, atmp, btmp); > to: > assert_different_registers(a, tmp, atmp, btmp); > assert_different_registers(b, tmp, atmp, btmp); > fixes the issue. > > Similar change done in evminmax_fp. > > Please review. > > Best Regards, > Sandhya Should we "short cut" code when registers are the same? 
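The effect of the relaxed assertion in the quoted change can be modelled with a small Java sketch. The real code is C++ in `C2_MacroAssembler`; the integer register numbers and the helper `allDifferent` below are illustrative assumptions only, standing in for `assert_different_registers`. The sketch shows that one check over `(a, b, tmp, atmp, btmp)` rejects `a == b`, while the two separate checks accept it and still reject any overlap with the temporaries.

```java
import java.util.HashSet;
import java.util.Set;

final class RegisterCheckSketch {
    // Hypothetical stand-in for assert_different_registers:
    // returns true if no register number appears twice.
    static boolean allDifferent(int... regs) {
        Set<Integer> seen = new HashSet<>();
        for (int r : regs) {
            if (!seen.add(r)) {
                return false;
            }
        }
        return true;
    }

    // Models the original single assertion, which also forbids a == b.
    static boolean oldCheck(int a, int b, int tmp, int atmp, int btmp) {
        return allDifferent(a, b, tmp, atmp, btmp);
    }

    // Models the two assertions from the fix: a and b may alias each other,
    // but neither may alias a temporary.
    static boolean newCheck(int a, int b, int tmp, int atmp, int btmp) {
        return allDifferent(a, tmp, atmp, btmp)
            && allDifferent(b, tmp, atmp, btmp);
    }

    public static void main(String[] args) {
        System.out.println("old, a == b:   " + oldCheck(1, 1, 2, 3, 4));
        System.out.println("new, a == b:   " + newCheck(1, 1, 2, 3, 4));
        System.out.println("new, b == tmp: " + newCheck(1, 2, 2, 3, 4));
    }
}
```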
------------- PR Review: https://git.openjdk.org/jdk/pull/17315#pullrequestreview-1810310673 From duke at openjdk.org Tue Jan 9 02:35:04 2024 From: duke at openjdk.org (Zhiqiang Zang) Date: Tue, 9 Jan 2024 02:35:04 GMT Subject: RFR: 8322589: Add Ideal transformation: (~a) & (~b) => ~(a | b) [v8] In-Reply-To: <-pwjKEB97C-bM068JQN0PY1hl65IzcuQZfHzRoKu92g=.d6118773-20d2-46cd-9284-5168c9334bb5@github.com> References: <-pwjKEB97C-bM068JQN0PY1hl65IzcuQZfHzRoKu92g=.d6118773-20d2-46cd-9284-5168c9334bb5@github.com> Message-ID: <-0O_jW7NWGynEROp33izEgAreJ1FQEjVOg4AA8h5E8E=.a85abda4-54a0-4fba-abfe-1d1628f8a9ca@github.com> > Hello, > > `(~a) & (~b) => ~(a | b)` is a widely seen pattern, for example it is implemented for LLVM [here](https://github.com/llvm/llvm-project/blob/397f1ce9efb4eea1ee10fe4833f733b8c7abd878/llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp#L1616C28-L1616C28); however it is missing in current implementation of hotspot. This pull request adds this transformation and associated tests. > > Thanks. Zhiqiang Zang has updated the pull request incrementally with one additional commit since the last revision: move the two helper functions to member functions of the node class. 
------------- Changes: - all: https://git.openjdk.org/jdk/pull/16333/files - new: https://git.openjdk.org/jdk/pull/16333/files/c4fa2e40..7a962d69 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16333&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16333&range=06-07 Stats: 53 lines in 5 files changed: 24 ins; 23 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/16333.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16333/head:pull/16333 PR: https://git.openjdk.org/jdk/pull/16333 From duke at openjdk.org Tue Jan 9 02:52:52 2024 From: duke at openjdk.org (Zhiqiang Zang) Date: Tue, 9 Jan 2024 02:52:52 GMT Subject: RFR: 8322077: Add Ideal transformation: (~a) | (~b) => ~(a & b) [v10] In-Reply-To: References: Message-ID: > Hello, > > (~a) | (~b) => ~(a & b) is a widely seen pattern, for example it is implemented for LLVM [here](https://github.com/llvm/llvm-project/blob/397f1ce9efb4eea1ee10fe4833f733b8c7abd878/llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp#L1617C28-L1617C28); however it is missing in current implementation of hotspot. This pull request adds this transformation and associated tests. > > Thanks. Zhiqiang Zang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 12 additional commits since the last revision: - adapt changes from the dependent pr. - Merge branch 'andnode-PNewDeMorganLawAndToOr' into ornode-PNewDeMorganLawOrToAnd - Update test/hotspot/jtreg/compiler/c2/irTests/DeMorganLawLongTests.java Co-authored-by: Tobias Hartmann - Update test/hotspot/jtreg/compiler/c2/irTests/DeMorganLawIntTests.java Co-authored-by: Tobias Hartmann - Add tests for using De Morgan's Law for both optimizations. - remove unused code from tests. - update the copyright dates. - address comments. - untabify. - use common helpful functions. - ... 
and 2 more: https://git.openjdk.org/jdk/compare/9fcae094...0c8d1077 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16334/files - new: https://git.openjdk.org/jdk/pull/16334/files/6eb29aef..0c8d1077 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16334&range=09 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16334&range=08-09 Stats: 60 lines in 5 files changed: 24 ins; 23 del; 13 mod Patch: https://git.openjdk.org/jdk/pull/16334.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16334/head:pull/16334 PR: https://git.openjdk.org/jdk/pull/16334 From duke at openjdk.org Tue Jan 9 02:53:58 2024 From: duke at openjdk.org (Zhiqiang Zang) Date: Tue, 9 Jan 2024 02:53:58 GMT Subject: RFR: 8322589: Add Ideal transformation: (~a) & (~b) => ~(a | b) [v9] In-Reply-To: <-pwjKEB97C-bM068JQN0PY1hl65IzcuQZfHzRoKu92g=.d6118773-20d2-46cd-9284-5168c9334bb5@github.com> References: <-pwjKEB97C-bM068JQN0PY1hl65IzcuQZfHzRoKu92g=.d6118773-20d2-46cd-9284-5168c9334bb5@github.com> Message-ID: <5z37bxWaSr9AFumvmDHQgPYfj_qz5P0XFifGU-j8Mjk=.5fa90579-6998-4a94-a4da-345d59a4f69e@github.com> > Hello, > > `(~a) & (~b) => ~(a | b)` is a widely seen pattern, for example it is implemented for LLVM [here](https://github.com/llvm/llvm-project/blob/397f1ce9efb4eea1ee10fe4833f733b8c7abd878/llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp#L1616C28-L1616C28); however it is missing in current implementation of hotspot. This pull request adds this transformation and associated tests. > > Thanks. Zhiqiang Zang has updated the pull request incrementally with one additional commit since the last revision: update copyright dates. 
------------- Changes: - all: https://git.openjdk.org/jdk/pull/16333/files - new: https://git.openjdk.org/jdk/pull/16333/files/7a962d69..3665de2f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16333&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16333&range=07-08 Stats: 2 lines in 2 files changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/16333.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16333/head:pull/16333 PR: https://git.openjdk.org/jdk/pull/16333 From ddong at openjdk.org Tue Jan 9 05:26:43 2024 From: ddong at openjdk.org (Denghui Dong) Date: Tue, 9 Jan 2024 05:26:43 GMT Subject: RFR: 8322735: C2: minor improvements of bubble sort used in SuperWord::packset_sort [v3] In-Reply-To: References: Message-ID: > A minor improvement could be made for bubble sort in SuperWord::packset_sort to reduce the comparison count in bad cases. > > See https://en.wikipedia.org/wiki/Bubble_sort Denghui Dong has updated the pull request incrementally with one additional commit since the last revision: update ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17190/files - new: https://git.openjdk.org/jdk/pull/17190/files/ba53ed56..c635b10d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17190&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17190&range=01-02 Stats: 1 line in 1 file changed: 0 ins; 1 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/17190.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17190/head:pull/17190 PR: https://git.openjdk.org/jdk/pull/17190 From ddong at openjdk.org Tue Jan 9 05:26:46 2024 From: ddong at openjdk.org (Denghui Dong) Date: Tue, 9 Jan 2024 05:26:46 GMT Subject: RFR: 8322735: C2: minor improvements of bubble sort used in SuperWord::packset_sort [v2] In-Reply-To: References: <1W_HcoiwEnXLvtqprc7L8mghr0BRH3a_ITn6Cerzb_c=.f6c90e24-7d41-418d-bc50-41bff396663b@github.com> Message-ID: On Mon, 8 Jan 2024 20:59:53 GMT, Vladimir Kozlov wrote: >> Denghui Dong 
has updated the pull request incrementally with one additional commit since the last revision: >> >> update > > src/hotspot/share/opto/superword.cpp line 3526: > >> 3524: // only swap when we find something to swap >> 3525: if (alignment(q_low->at(0)) > alignment(q_i->at(0))) { >> 3526: Node_List* t = q_i; > > Why you need this local `t`? Good catch. Deleted. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17190#discussion_r1445638467 From ddong at openjdk.org Tue Jan 9 05:30:22 2024 From: ddong at openjdk.org (Denghui Dong) Date: Tue, 9 Jan 2024 05:30:22 GMT Subject: RFR: 8322735: C2: minor improvements of bubble sort used in SuperWord::packset_sort [v2] In-Reply-To: References: <1W_HcoiwEnXLvtqprc7L8mghr0BRH3a_ITn6Cerzb_c=.f6c90e24-7d41-418d-bc50-41bff396663b@github.com> Message-ID: <15K1TZYYnVPyFf2zZD2hlqQI7ddz-U-1Ued9JNBq5vM=.816a182b-8604-4e6c-94e1-2145fc60cdfb@github.com> On Mon, 8 Jan 2024 21:05:03 GMT, Vladimir Kozlov wrote: >> Denghui Dong has updated the pull request incrementally with one additional commit since the last revision: >> >> update > > src/hotspot/share/opto/superword.cpp line 3529: > >> 3527: *(_packset.adr_at(i)) = q_low; >> 3528: *(_packset.adr_at(i-1)) = q_i; >> 3529: max_swap_index = i; > > So we not using `i+1` here because all previous values should be < than `i`'s > Right? Yes. The last `i`'s value is > previous values and values between `i` and end are already sorted. 
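The bound-shrinking idea discussed in this exchange can be sketched in Java as follows. This is a generic bubble sort over an `int[]`, not the `SuperWord::packset_sort` code itself; the class name and the comparison counter are assumptions added for illustration. After each pass, everything at or beyond the index of the last swap is already in its final position, so the next pass only needs to run up to that index.

```java
import java.util.Arrays;

final class BubbleSortSketch {
    // Sorts the array in place and returns the number of comparisons made.
    static int sort(int[] a) {
        int comparisons = 0;
        int n = a.length; // elements at index >= n are already in final position
        while (n > 1) {
            int maxSwapIndex = 0;
            for (int i = 1; i < n; i++) {
                comparisons++;
                if (a[i - 1] > a[i]) {
                    int t = a[i - 1];
                    a[i - 1] = a[i];
                    a[i] = t;
                    maxSwapIndex = i; // remember where the last swap happened
                }
            }
            // Nothing at or after maxSwapIndex moved after the last swap,
            // so that tail is sorted and can be skipped from now on.
            n = maxSwapIndex;
        }
        return comparisons;
    }

    public static void main(String[] args) {
        int[] data = {5, 1, 4, 2, 8};
        int comparisons = sort(data);
        System.out.println(Arrays.toString(data) + " in " + comparisons + " comparisons");
    }
}
```

For an input that is already sorted, the loop finishes after a single pass, and for an input with only an early inversion the second pass is skipped entirely.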
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17190#discussion_r1445640947 From duke at openjdk.org Tue Jan 9 05:54:39 2024 From: duke at openjdk.org (Zhiqiang Zang) Date: Tue, 9 Jan 2024 05:54:39 GMT Subject: RFR: 8322589: Add Ideal transformation: (~a) & (~b) => ~(a | b) [v10] In-Reply-To: <-pwjKEB97C-bM068JQN0PY1hl65IzcuQZfHzRoKu92g=.d6118773-20d2-46cd-9284-5168c9334bb5@github.com> References: <-pwjKEB97C-bM068JQN0PY1hl65IzcuQZfHzRoKu92g=.d6118773-20d2-46cd-9284-5168c9334bb5@github.com> Message-ID: <_lWTVDYsWmINZsi0bPleMs3F3n-WPgHbLoTfpu8sHSg=.0dde4ccd-77e8-4e0e-80ad-cb233b858579@github.com> > Hello, > > `(~a) & (~b) => ~(a | b)` is a widely seen pattern, for example it is implemented for LLVM [here](https://github.com/llvm/llvm-project/blob/397f1ce9efb4eea1ee10fe4833f733b8c7abd878/llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp#L1616C28-L1616C28); however it is missing in current implementation of hotspot. This pull request adds this transformation and associated tests. > > Thanks. Zhiqiang Zang has updated the pull request incrementally with one additional commit since the last revision: move make_not back to AddNode because it cannot compile for architecture arm, s390x, ppc64le. 
------------- Changes: - all: https://git.openjdk.org/jdk/pull/16333/files - new: https://git.openjdk.org/jdk/pull/16333/files/3665de2f..4ee8b089 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16333&range=09 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16333&range=08-09 Stats: 35 lines in 5 files changed: 15 ins; 16 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/16333.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16333/head:pull/16333 PR: https://git.openjdk.org/jdk/pull/16333 From duke at openjdk.org Tue Jan 9 06:02:24 2024 From: duke at openjdk.org (Zhiqiang Zang) Date: Tue, 9 Jan 2024 06:02:24 GMT Subject: RFR: 8322589: Add Ideal transformation: (~a) & (~b) => ~(a | b) [v5] In-Reply-To: References: <-pwjKEB97C-bM068JQN0PY1hl65IzcuQZfHzRoKu92g=.d6118773-20d2-46cd-9284-5168c9334bb5@github.com> Message-ID: On Mon, 8 Jan 2024 07:02:50 GMT, Tobias Hartmann wrote: >> Zhiqiang Zang has updated the pull request incrementally with one additional commit since the last revision: >> >> update the copyright dates. > > src/hotspot/share/opto/addnode.hpp line 84: > >> 82: // Utility function to check if the given node is a NOT operation, >> 83: // i.e., n == m ^ (-1). >> 84: static bool is_not(PhaseGVN* phase, Node* n, BasicType bt); > > Could these be made non-static? @TobiHartmann @eme64 I moved `is_not` but I was not able to move `make_not` to `node` class, because otherwise it would not compile for arm, s390x, ppc64le. /home/runner/work/jdk/jdk/src/hotspot/share/opto/node.cpp:1605:18: error: expected type-specifier before 'XorINode' 1605 | return new XorINode(this, phase->intcon(-1)); Please let me know if we still want to move `make_not`. Thanks. 
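For readers less familiar with the pattern mentioned in the quoted comment (`n == m ^ (-1)`), the following Java sketch models a NOT as an XOR with the all-ones constant on a toy expression tree, and applies the `(~a) & (~b) => ~(a | b)` rewrite to it. The `Node`, `Con` and `Bin` classes and the method names are invented for illustration and do not mirror HotSpot's node classes or the PR's actual `is_not` / `make_not` signatures.

```java
final class NotPatternSketch {
    // Minimal expression tree over int values.
    static abstract class Node {
        abstract int eval();
    }

    static final class Con extends Node {
        final int value;
        Con(int value) { this.value = value; }
        int eval() { return value; }
    }

    static final class Bin extends Node {
        final char op;
        final Node left;
        final Node right;
        Bin(char op, Node left, Node right) {
            this.op = op;
            this.left = left;
            this.right = right;
        }
        int eval() {
            switch (op) {
                case '&': return left.eval() & right.eval();
                case '|': return left.eval() | right.eval();
                case '^': return left.eval() ^ right.eval();
                default: throw new IllegalStateException("unknown op " + op);
            }
        }
    }

    // Builds ~n as n ^ (-1).
    static Node makeNot(Node n) {
        return new Bin('^', n, new Con(-1));
    }

    // Recognizes the shape m ^ (-1).
    static boolean isNot(Node n) {
        if (!(n instanceof Bin)) {
            return false;
        }
        Bin b = (Bin) n;
        return b.op == '^' && b.right instanceof Con && ((Con) b.right).value == -1;
    }

    // Rewrites (~a) & (~b) into ~(a | b); returns the input unchanged otherwise.
    static Node idealAnd(Node n) {
        if (n instanceof Bin) {
            Bin and = (Bin) n;
            if (and.op == '&' && isNot(and.left) && isNot(and.right)) {
                Node a = ((Bin) and.left).left;
                Node b = ((Bin) and.right).left;
                return makeNot(new Bin('|', a, b));
            }
        }
        return n;
    }

    public static void main(String[] args) {
        Node before = new Bin('&', makeNot(new Con(12)), makeNot(new Con(10)));
        Node after = idealAnd(before);
        System.out.println(before.eval() + " == " + after.eval());
    }
}
```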
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16333#discussion_r1445656863 From duke at openjdk.org Tue Jan 9 06:06:51 2024 From: duke at openjdk.org (Zhiqiang Zang) Date: Tue, 9 Jan 2024 06:06:51 GMT Subject: RFR: 8322077: Add Ideal transformation: (~a) | (~b) => ~(a & b) [v11] In-Reply-To: References: Message-ID: > Hello, > > (~a) | (~b) => ~(a & b) is a widely seen pattern, for example it is implemented for LLVM [here](https://github.com/llvm/llvm-project/blob/397f1ce9efb4eea1ee10fe4833f733b8c7abd878/llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp#L1617C28-L1617C28); however it is missing in current implementation of hotspot. This pull request adds this transformation and associated tests. > > Thanks. Zhiqiang Zang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 14 additional commits since the last revision: - adapt to new changes from the dependant pr. - Merge branch 'andnode-PNewDeMorganLawAndToOr' into ornode-PNewDeMorganLawOrToAnd - adapt changes from the dependent pr. - Merge branch 'andnode-PNewDeMorganLawAndToOr' into ornode-PNewDeMorganLawOrToAnd - Update test/hotspot/jtreg/compiler/c2/irTests/DeMorganLawLongTests.java Co-authored-by: Tobias Hartmann - Update test/hotspot/jtreg/compiler/c2/irTests/DeMorganLawIntTests.java Co-authored-by: Tobias Hartmann - Add tests for using De Morgan's Law for both optimizations. - remove unused code from tests. - update the copyright dates. - address comments. - ... 
and 4 more: https://git.openjdk.org/jdk/compare/851dbbb1...b21e242b ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16334/files - new: https://git.openjdk.org/jdk/pull/16334/files/0c8d1077..b21e242b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16334&range=10 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16334&range=09-10 Stats: 38 lines in 5 files changed: 15 ins; 16 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/16334.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16334/head:pull/16334 PR: https://git.openjdk.org/jdk/pull/16334 From jbhateja at openjdk.org Tue Jan 9 06:16:23 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 9 Jan 2024 06:16:23 GMT Subject: RFR: 8322768: Optimize non-subword vector compress and expand APIs for AVX2 target. [v2] In-Reply-To: References: <_udOCEVG86x9V_WvYqFTaYnvmXdiZ7LzqxzR-D_ygYs=.db3ed37e-a8fa-413b-83c6-a785aba072ff@github.com> <1GHGK7AGinCMKjFIB5oadUP0jiZrC39Z0hncAS3H-9Y=.eb617125-1f51-4cff-889e-b15321f5c72b@github.com> Message-ID: On Mon, 8 Jan 2024 07:55:00 GMT, Emanuel Peter wrote: >>> You are using `VectorMask pred = VectorMask.fromLong(ispecies, maskctr++);`. That basically systematically iterates over all masks, which is nice for a correctness test. But that would use different density inside one test run, right? The average over the loop is still at `50%`, correct? >>> >>> I was thinking more a run where the percentage over the whole loop is lower than maybe `1%`. That would get us to a point where maybe the branch prediction of non-vectorized code might be faster, what do you think? >> >> An imperative loop for compression will check each mask bit to select compressible lane. Therefore mask with low or high density of set bits should show similar performance. > > Yes, IF it is vectorized, then there is no difference between high and low density. 
> My concern was more if vectorization is preferable over the scalar alternative in the low-density case, where branch prediction is more stable.

At runtime we do need to scan the entire mask to pick the compressible lane corresponding to each set mask bit. Thus the loop overhead of the mask compare (BTW, masks are held in a vector register for AVX2 targets) and jump will be incurred anyway. In addition, for a sparsely populated mask we may incur an additional misprediction penalty for not taking the if block, which extracts an element from the appropriate source vector lane and inserts it into the destination vector lane. Overall, the vector solution will win for the most common cases with varying masks and also for very sparsely populated masks. Here is the result of setting just a single mask bit. I am in the process of updating the benchmark for 128-bit species and will update the patch.

@Benchmark
public void fuzzyFilterIntColumn() {
    int i = 0;
    int j = 0;
    long maskctr = 1;
    int endIndex = ispecies.loopBound(size);
    for (; i < endIndex; i += ispecies.length()) {
        IntVector vec = IntVector.fromArray(ispecies, intinCol, i);
        VectorMask<Integer> pred = VectorMask.fromLong(ispecies, 1);
        vec.compress(pred).intoArray(intoutCol, j);
        j += pred.trueCount();
    }
}

Baseline:
Benchmark                                   (size)   Mode  Cnt     Score  Error   Units
ColumnFilterBenchmark.fuzzyFilterIntColumn    1024  thrpt    2   379.059         ops/ms
ColumnFilterBenchmark.fuzzyFilterIntColumn    2047  thrpt    2   188.355         ops/ms
ColumnFilterBenchmark.fuzzyFilterIntColumn    4096  thrpt    2    95.315         ops/ms

Withopt:
Benchmark                                   (size)   Mode  Cnt     Score  Error   Units
ColumnFilterBenchmark.fuzzyFilterIntColumn    1024  thrpt    2  7390.074         ops/ms
ColumnFilterBenchmark.fuzzyFilterIntColumn    2047  thrpt    2  3483.247         ops/ms
ColumnFilterBenchmark.fuzzyFilterIntColumn    4096  thrpt    2  1823.817         ops/ms

------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17261#discussion_r1445666305 From kuaiwei.kw at alibaba-inc.com Tue Jan 9 06:23:59 2024 From: kuaiwei.kw at alibaba-inc.com (Kuai Wei) Date: Tue, 09 Jan 2024 14:23:59 +0800 Subject:
discuss about release barrier for final fields initialization Message-ID: <5e41f867-b31b-4ded-b737-8d0c869c8895.kuaiwei.kw@alibaba-inc.com>

Hi,

I ran some experiments on object allocation performance and found that on aarch64 N1, if an object has a final field, the allocation rate is about 75% of that for normal allocation. The cause is that C2 inserts a release membar in , which is translated to "dmb.ish" on aarch64. For normal allocation, a storestore membar is inserted and emitted as "dmb.ishst"; that makes the difference.

The JMH test is https://gist.github.com/kuaiwei/f71fba40df29991c93325a8600e34c13

java -jar target/benchmarks.jar -f 1 -wi 5 -w 3 -i 3 -r 3 testAlloc
...
Benchmark                       Mode  Cnt     Score    Error  Units
AllocFinal.testAlloc           thrpt    3  1167.903 ± 44.973  ops/s
AllocFinal.testAllocWithFinal  thrpt    3   915.330 ± 52.596  ops/s

I found that only C2 inserts the release membar; C1 inserts just a storestore for both final and normal allocation. According to Doug Lea's cookbook, https://gee.cs.oswego.edu/dl/jmm/cookbook.html, only storestore is required. Alex has a great post on this topic: https://shipilev.net/blog/2014/all-fields-are-final/ . It refers to a case that shows why loadstore is needed: https://www.hboehm.info/c++mm/no_write_fences.html

I checked this case and, IMO, it looks like some legacy architectures may break data dependencies and cause problems. As far as I know, the Alpha architecture is an example. I think it doesn't break on modern architectures. Is there another case I missed? If storestore is enough in this situation, I will send a PR to relax the barrier.

Thanks,
Kuai Wei
-------------- next part -------------- An HTML attachment was scrubbed...
URL: From jbhateja at openjdk.org Tue Jan 9 07:42:20 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 9 Jan 2024 07:42:20 GMT Subject: RFR: 8322768: Optimize non-subword vector compress and expand APIs for AVX2 target. [v3] In-Reply-To: <6ipaD7eRW4J37zaeFEKVf2LUVE3C0LmZmoAeePCG2PE=.7bb8ff9a-638e-4e7f-bea2-a40a424004f0@github.com> References: <_udOCEVG86x9V_WvYqFTaYnvmXdiZ7LzqxzR-D_ygYs=.db3ed37e-a8fa-413b-83c6-a785aba072ff@github.com> <6ipaD7eRW4J37zaeFEKVf2LUVE3C0LmZmoAeePCG2PE=.7bb8ff9a-638e-4e7f-bea2-a40a424004f0@github.com> Message-ID: On Mon, 8 Jan 2024 10:20:33 GMT, Quan Anh Mai wrote: >>> Thanks for the updates! >>> >>> One more idea: Your AVX2 solution has a lot of cost for converting the mask to a permutation. Might it make sense to split this off into a separate vector-node, so that it can float out of a loop if the mask is invariant? >> >> CompressV / ExpandV only accepts two inputs, vector to be operated on and mask under which operation is performed, permute table based implementation is specific to x86 backend implementation. > > @jatin-bhateja I think you can expand them in the matcher into several `MachNode`s that will get scheduled separately. > Exactly, like @merykitty suggests: you can do a platform-dependent expansion. Hi @merykitty, @eme64, in principle platform-specific lowering is a good idea wherever useful; our main concern here is to identify a loop-invariant constant mask in matcher patterns and save the cost of re-loading from a permute table index. The existing loop-invariant analysis moves invariant masks out of the loop, and GCM should be able to move the expanded load from the permute table out of the loop. But this looks very restrictive and will mainly be useful for a constant one-hot bit mask pattern. A constant mask may have more than one set bit, and in such a case we would need to generate multiple loads from permute tables and handle multiple expansion scenarios. I think we can defer that complexity for the time being.
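As a rough illustration of what a permute-table-based compress looks like, the Java sketch below builds one table row per mask value for a hypothetical 8-lane vector and applies it to an array. This only models the idea under discussion; the table layout, the lane count and all names are assumptions and are not taken from the x86 backend. Unselected output lanes are zero-filled, matching the documented semantics of the Vector API `compress` operation.

```java
import java.util.Arrays;

final class CompressPermuteSketch {
    static final int LANES = 8; // assumed lane count, e.g. 8 ints in a 256-bit vector

    // Table row for one mask: source lane indices of the set bits, packed at
    // the front; unused slots hold -1, meaning "write zero".
    static int[] permuteRow(int mask) {
        int[] row = new int[LANES];
        Arrays.fill(row, -1);
        int out = 0;
        for (int lane = 0; lane < LANES; lane++) {
            if ((mask & (1 << lane)) != 0) {
                row[out++] = lane;
            }
        }
        return row;
    }

    // One row for every possible mask value.
    static int[][] buildTable() {
        int[][] table = new int[1 << LANES][];
        for (int mask = 0; mask < table.length; mask++) {
            table[mask] = permuteRow(mask);
        }
        return table;
    }

    // Compress: look up the row for the mask, then permute the source lanes.
    static int[] compress(int[] src, int mask, int[][] table) {
        int[] row = table[mask];
        int[] dst = new int[LANES];
        for (int i = 0; i < LANES; i++) {
            dst[i] = row[i] >= 0 ? src[row[i]] : 0;
        }
        return dst;
    }

    public static void main(String[] args) {
        int[][] table = buildTable();
        int[] src = {10, 11, 12, 13, 14, 15, 16, 17};
        System.out.println(Arrays.toString(compress(src, 0b00100101, table)));
    }
}
```

With a loop-invariant mask, the `table[mask]` lookup is the part that could be hoisted out of the loop, which is the saving being weighed against the extra matcher complexity.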
------------- PR Comment: https://git.openjdk.org/jdk/pull/17261#issuecomment-1882549544 From roland at openjdk.org Tue Jan 9 07:46:22 2024 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 9 Jan 2024 07:46:22 GMT Subject: RFR: 8320237 C2: late inlining of method handle invoke causes duplicate lines in PrintInlining output In-Reply-To: References: Message-ID: <213kgE2Qkgv1LsELuvCGboaJ6IobOND34Hl5842a3dU=.b5561082-324a-4c96-995f-6dd43b7b3d97@github.com> On Mon, 18 Dec 2023 15:28:47 GMT, Kangcheng Xu wrote: > This PR resolves [JDK-8320237](https://bugs.openjdk.org/browse/JDK-8320237) > > The original behavior produces both a failure and success message upon late inlining which is confusing. The patch removes the failure message if inlining was successful. Huge thanks to @rwestrel for reporting and working out a solution. > > Unit test `test/hotspot/jtreg/compiler/inlining/TestDuplicatedLateInliningOutput.java` is added and passing. Looks good to me. ------------- Marked as reviewed by roland (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/17147#pullrequestreview-1810562097 From epeter at openjdk.org Tue Jan 9 08:08:42 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 9 Jan 2024 08:08:42 GMT Subject: RFR: 8315361: C2 SuperWord: refactor out loop analysis into shared auto-vectorization facility VLoopAnalyzer [v2] In-Reply-To: References: Message-ID: > This is a refactoring of `SuperWord`. > I intend to push it for JDK23, after [this bug fix](https://github.com/openjdk/jdk/pull/14785). > > **Goals** > > 1. Clean up `SuperWord`: disentangle different components, make them more **modular**. > 2. Make the loop analysis parts a **shared facility**, not just for SuperWord but also the post-loop-vectorizer ([JDK-8308994](https://bugs.openjdk.org/browse/JDK-8308994)). > 3. 
It is also a necessary step on my bigger plans for improvement with the C2 Auto-Vectorizer ([see my blog post](https://eme64.github.io/blog/2023/11/03/C2-AutoVectorizer-Improvement-Ideas.html)). > 4. Improve tracing in the auto-vectorization by making it more systematic. > > **Summary** > > - I wrote a summary of how C2 auto-vectorization with SuperWord works (please read!): > https://github.com/openjdk/jdk/blob/95fd361e60fc66eb91edad321662e508b2d1bdde/src/hotspot/share/opto/superword.hpp#L32-L177 > - I moved many `Superword` components out to `VLoop` and its subclass `VLoopAnalyzer`. The idea is that any vectorizer can use these facilities in the future. They are therefore made more modular, which should hopefully make future changes easier. These components are: > - Checking the pre-conditions for vectorization (e.g. no unwanted ctrl-flow). > - `VLoop::check_preconditions_helper` replaces code from old `SuperWord::transform_loop`. > - Running all submodules of `VLoopAnalyzer`: `VLoopAnalyzer::analyze_helper`. Replaces analysis part of `SuperWord::SLP_extract`. > - Finding and marking reductions -> `VLoopReductions` > - Detecting memory slices -> `VLoopMemorySlices` > - Analyzing the body -> `VLoopBody` (renamed `in_bb` -> `in_body`) > - Determining vector element types, and functions to determine the `vector_width` of a node -> `VLoopTypes` > - Constructing the dependence graph -> `VLoopDependenceGraph`. Replaces old `DepGraph` with all its components. > - New: CompileCommand option `TraceAutovectorization` > - Run with `-XX:CompileCommand=traceAutovectorization,*::*,help` to get a usage description. > - Replaced all printing with flags `TraceSuperWord` (and `Verbose`) and of `VectorizeDebug`. > - The advantage of a CompileCommand is that tracing can be applied selectively for only a limited set of java classes / methods. > - It uses tags, which are more readable than the `VectorizeDebug` bit-flags. 
These tags can be used for all parts of the vectorizer, but one can also target SuperWord specifically. > - I systematically added tracing at every point where vector... Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 83 commits: - manual merge - fix PRODUCT / DEBUG_ONLY guards - manual merge - fix whitespace issue - added CompileCommand TraceAutoVectorization Usage - add comments to trace flags - trace flag subtraction implemented - replace SuperWord with trace flags - refactor tracing for alignment - SuperWord algo summary - ... and 73 more: https://git.openjdk.org/jdk/compare/827c71da...e876d845 ------------- Changes: https://git.openjdk.org/jdk/pull/16620/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=16620&range=01 Stats: 3809 lines in 29 files changed: 1999 ins; 1307 del; 503 mod Patch: https://git.openjdk.org/jdk/pull/16620.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16620/head:pull/16620 PR: https://git.openjdk.org/jdk/pull/16620 From chagedorn at openjdk.org Tue Jan 9 08:36:35 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 9 Jan 2024 08:36:35 GMT Subject: RFR: 8319793: C2 compilation fails with "Bad graph detected in build_loop_late" after JDK-8279888 [v11] In-Reply-To: References: Message-ID: On Mon, 8 Jan 2024 15:01:12 GMT, Roland Westrelin wrote: >> Range check smearing and range check predication make an array access >> dependent on 2 (or more in the case of RC smearing) conditions. As a >> consequence, if a range check can be eliminated because there's an >> identical dominating range check, the control dependent nodes that >> could float and become dependent on the dominating range check cannot >> be allowed to float because there's a risk that they would then bypass >> one of the checks that make the access legal. 
>> >> `IfNode::dominated_by()` and `PhaseIdealLoop::dominated_by()` have >> logic to prevent this: nodes that are control dependent on a range >> check or predicate are not allowed to float. This is however not >> sufficient as demonstrated by the test cases. >> >> In `TestArrayAccessAboveRCAfterSmearingOrPredication.testRangeCheckSmearing()`: >> >> >> v += array[i]; >> if (flag2) { >> if (flag3) { >> field = 0x42; >> } >> } >> if (flagField == 1) { >> v += array[i]; >> } >> >> >> The range check for the second `array[i]` load is replaced by the >> dominating range check for the first `array[i]` but because the second >> `array[i]` load could really be dependent on multiple range checks (in >> case smearing happened which is not the case here), c2 doesn't allow >> the second `array[i]` to float when the second range check is >> removed. The second `array[i]` is then control dependent on: >> >> >> if (flagField == 1) { >> >> >> which is next found to be dominated by the same test: >> >> >> if (flag == 1) { >> >> >> and is removed. However nothing in `dominated_by()` treats node >> dependent on tests that are not range check or predicates >> specially. So the second `array[i]` is allowed to float and become >> dependent on: >> >> >> if (flag == 1) { >> >> >> which is above the range check for that access. The test method in its >> last invocation is passed an index for the array access that's widely >> out of range. The array load happens before the range check and >> crashes the VM. `testLoopPredication()` is a similar test where array >> loads become dependent on predicates and end up above range checks. >> >> `TestArrayAccessCastIIAboveRC.java` is the test case from the bug >> where for similar reasons a range check `CastII` ends up above its >> range check, becomes top because its input becomes some integer that >> conflicts with its... 
> > Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: > > whitespaces Apart from some minor comment improvement suggestions, the new comments and renaming look good. src/hotspot/share/opto/ifnode.cpp line 569: > 567: igvn->rehash_node_delayed(iff); > 568: iff->set_req_X(1, new_bol, igvn); > 569: // As part of range check smearing, this range check is widen. Loads and range check Cast nodes that are control Suggestion: // As part of range check smearing, this range check is widened. Loads and range check Cast nodes that are control src/hotspot/share/opto/loopPredicate.cpp line 1300: > 1298: // Eliminate the old If in the loop body > 1299: // If a range check is eliminated, data dependent nodes (Load and range check CastII nodes) are now dependent on 2 > 1300: // range check predicates (one for the start of the loop, one for the end) but we can only keep track of one control To follow the naming conventions added by the changes around JDK-8288981: Suggestion: // Hoisted Check Predicates (one for the start of the loop, one for the end) but we can only keep track of one control src/hotspot/share/opto/loopopts.cpp line 356: > 354: _igvn.replace_input_of(cd, 0, prevdom); > 355: if (pin_array_nodes) { > 356: // Because of range check predication, Loads and range check Cast nodes that are control dependent on this range Loop Predication? Suggestion: // Because of Loop Predication, Loads and range check Cast nodes that are control dependent on this range src/hotspot/share/opto/loopopts.cpp line 357: > 355: if (pin_array_nodes) { > 356: // Because of range check predication, Loads and range check Cast nodes that are control dependent on this range > 357: // check (that is about to be removed) now depend on multiple dominating range check predicates. After the Suggestion: // check (that is about to be removed) now depend on multiple dominating Hoisted Check Predicates. 
After the src/hotspot/share/opto/node.hpp line 1140: > 1138: // Returns a clone of the current node that's pinned (if the current node is not) for nodes found in array accesses > 1139: // (Load and range check CastII nodes). > 1140: // This is used when an array access is made dependent on 2 or more range checks (range check smearing or predication). Suggestion: // This is used when an array access is made dependent on 2 or more range checks (range check smearing or Loop Predication). ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/16886#pullrequestreview-1810631648 PR Review Comment: https://git.openjdk.org/jdk/pull/16886#discussion_r1445772859 PR Review Comment: https://git.openjdk.org/jdk/pull/16886#discussion_r1445770144 PR Review Comment: https://git.openjdk.org/jdk/pull/16886#discussion_r1445770896 PR Review Comment: https://git.openjdk.org/jdk/pull/16886#discussion_r1445771288 PR Review Comment: https://git.openjdk.org/jdk/pull/16886#discussion_r1445772291 From thartmann at openjdk.org Tue Jan 9 08:51:31 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 9 Jan 2024 08:51:31 GMT Subject: RFR: 8322077: Add Ideal transformation: (~a) | (~b) => ~(a & b) [v11] In-Reply-To: References: Message-ID: On Tue, 9 Jan 2024 06:06:51 GMT, Zhiqiang Zang wrote: >> Hello, >> >> (~a) | (~b) => ~(a & b) is a widely seen pattern, for example it is implemented for LLVM [here](https://github.com/llvm/llvm-project/blob/397f1ce9efb4eea1ee10fe4833f733b8c7abd878/llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp#L1617C28-L1617C28); however it is missing in current implementation of hotspot. This pull request adds this transformation and associated tests. >> >> Thanks. > > Zhiqiang Zang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains 14 additional commits since the last revision: > > - adapt to new changes from the dependant pr. > - Merge branch 'andnode-PNewDeMorganLawAndToOr' into ornode-PNewDeMorganLawOrToAnd > - adapt changes from the dependent pr. > - Merge branch 'andnode-PNewDeMorganLawAndToOr' into ornode-PNewDeMorganLawOrToAnd > - Update test/hotspot/jtreg/compiler/c2/irTests/DeMorganLawLongTests.java > > Co-authored-by: Tobias Hartmann > - Update test/hotspot/jtreg/compiler/c2/irTests/DeMorganLawIntTests.java > > Co-authored-by: Tobias Hartmann > - Add tests for using De Morgan's Law for both optimizations. > - remove unused code from tests. > - update the copyright dates. > - address comments. > - ... and 4 more: https://git.openjdk.org/jdk/compare/2acdb5e1...b21e242b Marked as reviewed by thartmann (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/16334#pullrequestreview-1810664512 From epeter at openjdk.org Tue Jan 9 08:52:00 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 9 Jan 2024 08:52:00 GMT Subject: RFR: 8315361: C2 SuperWord: refactor out loop analysis into shared auto-vectorization facility VLoopAnalyzer [v3] In-Reply-To: References: Message-ID: > This is a refactoring of `SuperWord`. > I intend to push it for JDK23, after [this bug fix](https://github.com/openjdk/jdk/pull/14785). > > **Goals** > > 1. Clean up `SuperWord`: disentangle different components, make them more **modular**. > 2. Make the loop analysis parts a **shared facility**, not just for SuperWord but also the post-loop-vectorizer ([JDK-8308994](https://bugs.openjdk.org/browse/JDK-8308994)). > 3. It is also a necessary step on my bigger plans for improvement with the C2 Auto-Vectorizer ([see my blog post](https://eme64.github.io/blog/2023/11/03/C2-AutoVectorizer-Improvement-Ideas.html)). > 4. Improve tracing in the auto-vectorization by making it more systematic. 
> > **Summary** > > - I wrote a summary of how C2 auto-vectorization with SuperWord works (please read!): > https://github.com/openjdk/jdk/blob/95fd361e60fc66eb91edad321662e508b2d1bdde/src/hotspot/share/opto/superword.hpp#L32-L177 > - I moved many `Superword` components out to `VLoop` and its subclass `VLoopAnalyzer`. The idea is that any vectorizer can use these facilities in the future. They are therefore made more modular, which should hopefully make future changes easier. These components are: > - Checking the pre-conditions for vectorization (e.g. no unwanted ctrl-flow). > - `VLoop::check_preconditions_helper` replaces code from old `SuperWord::transform_loop`. > - Running all submodules of `VLoopAnalyzer`: `VLoopAnalyzer::analyze_helper`. Replaces analysis part of `SuperWord::SLP_extract`. > - Finding and marking reductions -> `VLoopReductions` > - Detecting memory slices -> `VLoopMemorySlices` > - Analyzing the body -> `VLoopBody` (renamed `in_bb` -> `in_body`) > - Determining vector element types, and functions to determine the `vector_width` of a node -> `VLoopTypes` > - Constructing the dependence graph -> `VLoopDependenceGraph`. Replaces old `DepGraph` with all its components. > - New: CompileCommand option `TraceAutovectorization` > - Run with `-XX:CompileCommand=traceAutovectorization,*::*,help` to get a usage description. > - Replaced all printing with flags `TraceSuperWord` (and `Verbose`) and of `VectorizeDebug`. > - The advantage of a CompileCommand is that tracing can be applied selectively for only a limited set of java classes / methods. > - It uses tags, which are more readable than the `VectorizeDebug` bit-flags. These tags can be used for all parts of the vectorizer, but one can also target SuperWord specifically. > - I systematically added tracing at every point where vector... 
Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: error state for align vector ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16620/files - new: https://git.openjdk.org/jdk/pull/16620/files/e876d845..0831bb59 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16620&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16620&range=01-02 Stats: 25 lines in 2 files changed: 19 ins; 1 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/16620.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16620/head:pull/16620 PR: https://git.openjdk.org/jdk/pull/16620 From thartmann at openjdk.org Tue Jan 9 09:08:26 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 9 Jan 2024 09:08:26 GMT Subject: RFR: 8322589: Add Ideal transformation: (~a) & (~b) => ~(a | b) [v5] In-Reply-To: References: <-pwjKEB97C-bM068JQN0PY1hl65IzcuQZfHzRoKu92g=.d6118773-20d2-46cd-9284-5168c9334bb5@github.com> Message-ID: On Tue, 9 Jan 2024 05:59:26 GMT, Zhiqiang Zang wrote: >> src/hotspot/share/opto/addnode.hpp line 84: >> >>> 82: // Utility function to check if the given node is a NOT operation, >>> 83: // i.e., n == m ^ (-1). >>> 84: static bool is_not(PhaseGVN* phase, Node* n, BasicType bt); >> >> Could these be made non-static? > > @TobiHartmann @eme64 > I moved `is_not` but I was not able to move `make_not` to the `node` class, because otherwise it would not compile for arm, s390x, ppc64le. > > /home/runner/work/jdk/jdk/src/hotspot/share/opto/node.cpp:1605:18: error: expected type-specifier before 'XorINode' > 1605 | return new XorINode(this, phase->intcon(-1)); > > I do not see any similar uses of `new XorINode` in `node.cpp`, so I was hesitant to include new header files for the file. > Please let me know if we still want to move `make_not`. Thanks. I would say it's better to leave both methods as static methods then, for consistency. Thanks for giving it a try!
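As a minimal sketch of the arithmetic behind this thread (not code from the patch), the Java snippet below checks on a handful of values that `~m` equals `m ^ (-1)`, as the quoted `is_not` comment states, and that both De Morgan forms named in the two RFR subjects hold for `int` and `long`. The class and method names are hypothetical, and the check covers plain Java operators only, not C2 IR nodes.

```java
// Hypothetical illustration; none of these names come from the HotSpot sources.
final class DeMorganSketch {

    // Bitwise NOT written as XOR with all-ones, mirroring "n == m ^ (-1)".
    static int notViaXor(int m) {
        return m ^ -1;
    }

    static long notViaXor(long m) {
        return m ^ -1L;
    }

    // True if NOT-as-XOR and both De Morgan forms agree for this pair of ints.
    static boolean identitiesHold(int a, int b) {
        boolean notIsXor = (~a == notViaXor(a)) && (~b == notViaXor(b));
        boolean andForm = ((~a) & (~b)) == ~(a | b); // (~a) & (~b) => ~(a | b)
        boolean orForm = ((~a) | (~b)) == ~(a & b);  // (~a) | (~b) => ~(a & b)
        return notIsXor && andForm && orForm;
    }

    // Same check for longs.
    static boolean identitiesHold(long a, long b) {
        boolean notIsXor = (~a == notViaXor(a)) && (~b == notViaXor(b));
        boolean andForm = ((~a) & (~b)) == ~(a | b);
        boolean orForm = ((~a) | (~b)) == ~(a & b);
        return notIsXor && andForm && orForm;
    }

    public static void main(String[] args) {
        int[] samples = {0, 1, -1, 42, 0x55555555, Integer.MIN_VALUE, Integer.MAX_VALUE};
        for (int a : samples) {
            for (int b : samples) {
                if (!identitiesHold(a, b) || !identitiesHold((long) a << 20, (long) b << 20)) {
                    throw new AssertionError("identity failed for " + a + ", " + b);
                }
            }
        }
        System.out.println("identities hold for all sample pairs");
    }
}
```

The rewritten forms need one NOT instead of two, which is presumably why the transformation is worth doing; the thread itself does not quantify the benefit.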
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16333#discussion_r1445810287 From thartmann at openjdk.org Tue Jan 9 09:12:23 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 9 Jan 2024 09:12:23 GMT Subject: RFR: JDK-8277869: Maven POMs are using HTTP links where HTTPS is available In-Reply-To: References: Message-ID: On Mon, 8 Jan 2024 10:29:38 GMT, Tobias Holenstein wrote: > Replace `http` with `https` in XML files of IdealGraphVisualizer and LogCompilation. > Moreover, replace the old URL (./maven-v4_0_0.xsd) with the suggested one (./maven-4.0.0.xsd) for `xsi:schemaLocation` in pom.xml. > > Tested: IdealGraphVisualizer and LogCompilation build and run as expected. Looks good to me too. ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/17302#pullrequestreview-1810703150 From thartmann at openjdk.org Tue Jan 9 09:18:23 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 9 Jan 2024 09:18:23 GMT Subject: RFR: 8322858: compiler/c2/aarch64/TestFarJump.java fails on AArch64 due to unexpected PrintAssembly output In-Reply-To: <2CaDCuUHVZ7kwRvW-aKzLtsSM0yJ5crh-DZ5QbHevRU=.28845f90-4c77-4875-b88d-8d5af411ea17@github.com> References: <2CaDCuUHVZ7kwRvW-aKzLtsSM0yJ5crh-DZ5QbHevRU=.28845f90-4c77-4875-b88d-8d5af411ea17@github.com> Message-ID: On Fri, 5 Jan 2024 11:32:16 GMT, Boris Ulasevich wrote: > The test checks for the ADRP instruction in the [Exception Handler] section of the TestFarJump::main PrintAssembly output. > > The -XX:CompileOnly=TestFarJump::main option is not sufficient to restrict PrintAssembly output to a specific method.
A number of method assemblies precede the output of the TestFarJump::main method: > - java.lang.invoke.MethodHandle::linkToStatic(LLLLLLL)L > - java.lang.invoke.MethodHandle::linkToStatic(LLLL)V > - java.lang.invoke.MethodHandle::invokeBasic(LLLLLL)L > - java.lang.invoke.MethodHandle::linkToSpecial(LLLLLLLL)L > - java.lang.invoke.MethodHandle::linkToSpecial(LLLLL)V > - java.lang.invoke.MethodHandle::invokeBasic(LLL)L > - java.lang.invoke.MethodHandle::linkToSpecial(LLLLL)L > - java.lang.invoke.MethodHandle::linkToStatic(LL)L > - java.lang.invoke.MethodHandle::linkToSpecial(LL)V > - java.lang.invoke.MethodHandle::invokeBasic()L > - java.lang.invoke.MethodHandle::linkToSpecial(LL)L > - java.lang.invoke.MethodHandle::linkToStatic(LLL)L > - java.lang.invoke.MethodHandle::linkToStatic(LLLL)L > - java.lang.invoke.MethodHandle::linkToSpecial(LLL)V > - java.lang.invoke.MethodHandle::invokeBasic(L)L > - java.lang.invoke.MethodHandle::linkToSpecial(LLL)L > - java.lang.invoke.MethodHandle::linkToStatic(LLL)V > - java.lang.invoke.MethodHandle::linkToStatic(LL)I > - jdk.internal.vm.Continuation::enterSpecial > - compiler.c2.aarch64.TestFarJump::main > > With this change, I use the -XX:CompileCommand=option,TestFarJump::main,bool,PrintAssembly,true option to restrict the PrintAssembly output to a specific method. Now the only [Exception Handler] in the listing is the section of TestFarJump::main method. Looks good to me too. ------------- Marked as reviewed by thartmann (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/17278#pullrequestreview-1810713559 From roland at openjdk.org Tue Jan 9 09:28:01 2024 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 9 Jan 2024 09:28:01 GMT Subject: RFR: 8319793: C2 compilation fails with "Bad graph detected in build_loop_late" after JDK-8279888 [v12] In-Reply-To: References: Message-ID: <7WZcrJtRFhNaGiw4c_ov_XW-dWcJ5-GdRy_8Vh2ikWA=.0cdfcad8-eb9b-43ee-b160-ad96f698a0b6@github.com> > Range check smearing and range check predication make an array access > dependent on 2 (or more in the case of RC smearing) conditions. As a > consequence, if a range check can be eliminated because there's an > identical dominating range check, the control dependent nodes that > could float and become dependent on the dominating range check cannot > be allowed to float because there's a risk that they would then bypass > one of the checks that make the access legal. > > `IfNode::dominated_by()` and `PhaseIdealLoop::dominated_by()` have > logic to prevent this: nodes that are control dependent on a range > check or predicate are not allowed to float. This is however not > sufficient as demonstrated by the test cases. > > In `TestArrayAccessAboveRCAfterSmearingOrPredication.testRangeCheckSmearing()`: > > > v += array[i]; > if (flag2) { > if (flag3) { > field = 0x42; > } > } > if (flagField == 1) { > v += array[i]; > } > > > The range check for the second `array[i]` load is replaced by the > dominating range check for the first `array[i]` but because the second > `array[i]` load could really be dependent on multiple range checks (in > case smearing happened which is not the case here), c2 doesn't allow > the second `array[i]` to float when the second range check is > removed. The second `array[i]` is then control dependent on: > > > if (flagField == 1) { > > > which is next found to be dominated by the same test: > > > if (flag == 1) { > > > and is removed. 
However nothing in `dominated_by()` treats node > dependent on tests that are not range check or predicates > specially. So the second `array[i]` is allowed to float and become > dependent on: > > > if (flag == 1) { > > > which is above the range check for that access. The test method in its > last invocation is passed an index for the array access that's widely > out of range. The array load happens before the range check and > crashes the VM. `testLoopPredication()` is a similar test where array > loads become dependent on predicates and end up above range checks. > > `TestArrayAccessCastIIAboveRC.java` is the test case from the bug > where for similar reasons a range check `CastII` ends up above its > range check, becomes top because its input becomes some integer that > conflicts with its type (but there's no condition to catch it). The > graph becomes broken and c2 crashes. > > Logic in the `dominated_by()` methods ... Roland Westrelin has updated the pull request incrementally with five additional commits since the last revision: - Update src/hotspot/share/opto/ifnode.cpp Co-authored-by: Christian Hagedorn - Update src/hotspot/share/opto/node.hpp Co-authored-by: Christian Hagedorn - Update src/hotspot/share/opto/loopopts.cpp Co-authored-by: Christian Hagedorn - Update src/hotspot/share/opto/loopopts.cpp Co-authored-by: Christian Hagedorn - Update src/hotspot/share/opto/loopPredicate.cpp Co-authored-by: Christian Hagedorn ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16886/files - new: https://git.openjdk.org/jdk/pull/16886/files/51231631..04a9d3a5 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16886&range=11 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16886&range=10-11 Stats: 5 lines in 4 files changed: 0 ins; 0 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/16886.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16886/head:pull/16886 PR: https://git.openjdk.org/jdk/pull/16886 From chagedorn at openjdk.org Tue Jan 
9 09:48:31 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 9 Jan 2024 09:48:31 GMT Subject: RFR: 8319793: C2 compilation fails with "Bad graph detected in build_loop_late" after JDK-8279888 [v12] In-Reply-To: <7WZcrJtRFhNaGiw4c_ov_XW-dWcJ5-GdRy_8Vh2ikWA=.0cdfcad8-eb9b-43ee-b160-ad96f698a0b6@github.com> References: <7WZcrJtRFhNaGiw4c_ov_XW-dWcJ5-GdRy_8Vh2ikWA=.0cdfcad8-eb9b-43ee-b160-ad96f698a0b6@github.com> Message-ID: <3tBQwAHsTPlkltypH11S3rj0Ptxaa1kZHTWFZMyWJYY=.19ccf21d-9029-4f77-928f-c9ef823e6b92@github.com> On Tue, 9 Jan 2024 09:28:01 GMT, Roland Westrelin wrote: >> Range check smearing and range check predication make an array access >> dependent on 2 (or more in the case of RC smearing) conditions. As a >> consequence, if a range check can be eliminated because there's an >> identical dominating range check, the control dependent nodes that >> could float and become dependent on the dominating range check cannot >> be allowed to float because there's a risk that they would then bypass >> one of the checks that make the access legal. >> >> `IfNode::dominated_by()` and `PhaseIdealLoop::dominated_by()` have >> logic to prevent this: nodes that are control dependent on a range >> check or predicate are not allowed to float. This is however not >> sufficient as demonstrated by the test cases. >> >> In `TestArrayAccessAboveRCAfterSmearingOrPredication.testRangeCheckSmearing()`: >> >> >> v += array[i]; >> if (flag2) { >> if (flag3) { >> field = 0x42; >> } >> } >> if (flagField == 1) { >> v += array[i]; >> } >> >> >> The range check for the second `array[i]` load is replaced by the >> dominating range check for the first `array[i]` but because the second >> `array[i]` load could really be dependent on multiple range checks (in >> case smearing happened which is not the case here), c2 doesn't allow >> the second `array[i]` to float when the second range check is >> removed. 
The second `array[i]` is then control dependent on: >> >> >> if (flagField == 1) { >> >> >> which is next found to be dominated by the same test: >> >> >> if (flag == 1) { >> >> >> and is removed. However nothing in `dominated_by()` treats node >> dependent on tests that are not range check or predicates >> specially. So the second `array[i]` is allowed to float and become >> dependent on: >> >> >> if (flag == 1) { >> >> >> which is above the range check for that access. The test method in its >> last invocation is passed an index for the array access that's widely >> out of range. The array load happens before the range check and >> crashes the VM. `testLoopPredication()` is a similar test where array >> loads become dependent on predicates and end up above range checks. >> >> `TestArrayAccessCastIIAboveRC.java` is the test case from the bug >> where for similar reasons a range check `CastII` ends up above its >> range check, becomes top because its input becomes some integer that >> conflicts with its... > > Roland Westrelin has updated the pull request incrementally with five additional commits since the last revision: > > - Update src/hotspot/share/opto/ifnode.cpp > > Co-authored-by: Christian Hagedorn > - Update src/hotspot/share/opto/node.hpp > > Co-authored-by: Christian Hagedorn > - Update src/hotspot/share/opto/loopopts.cpp > > Co-authored-by: Christian Hagedorn > - Update src/hotspot/share/opto/loopopts.cpp > > Co-authored-by: Christian Hagedorn > - Update src/hotspot/share/opto/loopPredicate.cpp > > Co-authored-by: Christian Hagedorn Marked as reviewed by chagedorn (Reviewer). 
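The code shape quoted in the description can be sketched as a small standalone Java program. This is a loose reconstruction for illustration only, not the actual `TestArrayAccessAboveRCAfterSmearingOrPredication` test: the class name, method name, and parameters are assumptions. It shows the behaviour the compiler has to preserve: the second `array[i]` needs no range check of its own, but an out-of-range index must still throw at the first access rather than be read by a load that floated above the check.

```java
// Hypothetical reconstruction of the quoted snippet; not the jtreg test itself.
final class RangeCheckOrderSketch {
    static int field;

    static int sumTwice(int[] array, int i, boolean flag2, boolean flag3, int flagField) {
        int v = 0;
        v += array[i];          // first access: its range check dominates the code below
        if (flag2) {
            if (flag3) {
                field = 0x42;
            }
        }
        if (flagField == 1) {
            v += array[i];      // second access: same index, so its own range check is redundant
        }
        return v;
    }

    // Returns true if the call fails with the exception Java semantics require.
    static boolean throwsOutOfBounds(int[] array, int i) {
        try {
            sumTwice(array, i, true, true, 1);
            return false;
        } catch (ArrayIndexOutOfBoundsException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        int[] array = {7};
        System.out.println(sumTwice(array, 0, false, false, 1));
        System.out.println(throwsOutOfBounds(array, 1000));
    }
}
```

Running this once in the interpreter proves nothing about C2; per the description, the real test invokes its method repeatedly and passes the out-of-range index only in the last invocation.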
------------- PR Review: https://git.openjdk.org/jdk/pull/16886#pullrequestreview-1810774372 From epeter at openjdk.org Tue Jan 9 10:12:53 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 9 Jan 2024 10:12:53 GMT Subject: RFR: 8315361: C2 SuperWord: refactor out loop analysis into shared auto-vectorization facility VLoopAnalyzer [v4] In-Reply-To: References: Message-ID: > This is a refactoring of `SuperWord`. > I intend to push it for JDK23, after [this bug fix](https://github.com/openjdk/jdk/pull/14785). > > **Goals** > > 1. Clean up `SuperWord`: disentangle different components, make them more **modular**. > 2. Make the loop analysis parts a **shared facility**, not just for SuperWord but also the post-loop-vectorizer ([JDK-8308994](https://bugs.openjdk.org/browse/JDK-8308994)). > 3. It is also a necessary step on my bigger plans for improvement with the C2 Auto-Vectorizer ([see my blog post](https://eme64.github.io/blog/2023/11/03/C2-AutoVectorizer-Improvement-Ideas.html)). > 4. Improve tracing in the auto-vectorization by making it more systematic. > > **Summary** > > - I wrote a summary of how C2 auto-vectorization with SuperWord works (please read!): > https://github.com/openjdk/jdk/blob/95fd361e60fc66eb91edad321662e508b2d1bdde/src/hotspot/share/opto/superword.hpp#L32-L177 > - I moved many `Superword` components out to `VLoop` and its subclass `VLoopAnalyzer`. The idea is that any vectorizer can use these facilities in the future. They are therefore made more modular, which should hopefully make future changes easier. These components are: > - Checking the pre-conditions for vectorization (e.g. no unwanted ctrl-flow). > - `VLoop::check_preconditions_helper` replaces code from old `SuperWord::transform_loop`. > - Running all submodules of `VLoopAnalyzer`: `VLoopAnalyzer::analyze_helper`. Replaces analysis part of `SuperWord::SLP_extract`. 
> - Finding and marking reductions -> `VLoopReductions` > - Detecting memory slices -> `VLoopMemorySlices` > - Analyzing the body -> `VLoopBody` (renamed `in_bb` -> `in_body`) > - Determining vector element types, and functions to determine the `vector_width` of a node -> `VLoopTypes` > - Constructing the dependence graph -> `VLoopDependenceGraph`. Replaces old `DepGraph` with all its components. > - New: CompileCommand option `TraceAutovectorization` > - Run with `-XX:CompileCommand=traceAutovectorization,*::*,help` to get a usage description. > - Replaced all printing with flags `TraceSuperWord` (and `Verbose`) and of `VectorizeDebug`. > - The advantage of a CompileCommand is that tracing can be applied selectively for only a limited set of java classes / methods. > - It uses tags, which are more readable than the `VectorizeDebug` bit-flags. These tags can be used for all parts of the vectorizer, but one can also target SuperWord specifically. > - I systematically added tracing at every point where vector... 
Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: move superword tracing ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16620/files - new: https://git.openjdk.org/jdk/pull/16620/files/0831bb59..99b577bd Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16620&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16620&range=02-03 Stats: 94 lines in 4 files changed: 23 ins; 32 del; 39 mod Patch: https://git.openjdk.org/jdk/pull/16620.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16620/head:pull/16620 PR: https://git.openjdk.org/jdk/pull/16620 From bulasevich at openjdk.org Tue Jan 9 10:36:30 2024 From: bulasevich at openjdk.org (Boris Ulasevich) Date: Tue, 9 Jan 2024 10:36:30 GMT Subject: RFR: 8322858: compiler/c2/aarch64/TestFarJump.java fails on AArch64 due to unexpected PrintAssembly output In-Reply-To: <2CaDCuUHVZ7kwRvW-aKzLtsSM0yJ5crh-DZ5QbHevRU=.28845f90-4c77-4875-b88d-8d5af411ea17@github.com> References: <2CaDCuUHVZ7kwRvW-aKzLtsSM0yJ5crh-DZ5QbHevRU=.28845f90-4c77-4875-b88d-8d5af411ea17@github.com> Message-ID: On Fri, 5 Jan 2024 11:32:16 GMT, Boris Ulasevich wrote: > Test checks for the ADRP instruction in the [Exception Handler] section of the TestFarJump::main PrintAssembly output. > > The -XX:CompileOnly=TestFarJump::main option is not sufficient to restrict PrintAssembly output to a specific method. 
A number of method assemblies precede the output of the TestFarJump::main method: > - java.lang.invoke.MethodHandle::linkToStatic(LLLLLLL)L > - java.lang.invoke.MethodHandle::linkToStatic(LLLL)V > - java.lang.invoke.MethodHandle::invokeBasic(LLLLLL)L > - java.lang.invoke.MethodHandle::linkToSpecial(LLLLLLLL)L > - java.lang.invoke.MethodHandle::linkToSpecial(LLLLL)V > - java.lang.invoke.MethodHandle::invokeBasic(LLL)L > - java.lang.invoke.MethodHandle::linkToSpecial(LLLLL)L > - java.lang.invoke.MethodHandle::linkToStatic(LL)L > - java.lang.invoke.MethodHandle::linkToSpecial(LL)V > - java.lang.invoke.MethodHandle::invokeBasic()L > - java.lang.invoke.MethodHandle::linkToSpecial(LL)L > - java.lang.invoke.MethodHandle::linkToStatic(LLL)L > - java.lang.invoke.MethodHandle::linkToStatic(LLLL)L > - java.lang.invoke.MethodHandle::linkToSpecial(LLL)V > - java.lang.invoke.MethodHandle::invokeBasic(L)L > - java.lang.invoke.MethodHandle::linkToSpecial(LLL)L > - java.lang.invoke.MethodHandle::linkToStatic(LLL)V > - java.lang.invoke.MethodHandle::linkToStatic(LL)I > - jdk.internal.vm.Continuation::enterSpecial > - compiler.c2.aarch64.TestFarJump::main > > With this change, I use the -XX:CompileCommand=option,TestFarJump::main,bool,PrintAssembly,true option to restrict the PrintAssembly output to a specific method. Now the only [Exception Handler] in the listing is the section of TestFarJump::main method. Tobias and Andrew, thank you! 
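The two options contrasted in the description can be put side by side in a short Java sketch that only assembles the flag strings. The helper and class names are hypothetical; the flag formats are copied from the message above, and any further options a real run needs are not shown.

```java
// Hypothetical helpers; only the flag text itself comes from the message above.
final class PrintAssemblyFlagSketch {

    // Restricts compilation to one method; per the description this alone
    // does not restrict PrintAssembly output to that method.
    static String compileOnly(String methodPattern) {
        return "-XX:CompileOnly=" + methodPattern;
    }

    // Enables PrintAssembly as a per-method option via CompileCommand.
    static String perMethodPrintAssembly(String methodPattern) {
        return "-XX:CompileCommand=option," + methodPattern + ",bool,PrintAssembly,true";
    }

    public static void main(String[] args) {
        System.out.println(compileOnly("TestFarJump::main"));
        System.out.println(perMethodPrintAssembly("TestFarJump::main"));
    }
}
```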
------------- PR Comment: https://git.openjdk.org/jdk/pull/17278#issuecomment-1882814890 From bulasevich at openjdk.org Tue Jan 9 10:36:31 2024 From: bulasevich at openjdk.org (Boris Ulasevich) Date: Tue, 9 Jan 2024 10:36:31 GMT Subject: Integrated: 8322858: compiler/c2/aarch64/TestFarJump.java fails on AArch64 due to unexpected PrintAssembly output In-Reply-To: <2CaDCuUHVZ7kwRvW-aKzLtsSM0yJ5crh-DZ5QbHevRU=.28845f90-4c77-4875-b88d-8d5af411ea17@github.com> References: <2CaDCuUHVZ7kwRvW-aKzLtsSM0yJ5crh-DZ5QbHevRU=.28845f90-4c77-4875-b88d-8d5af411ea17@github.com> Message-ID: <9QSoSVKI_YVlpOE47Akaz_gV9EeML20B-AlXu9CpGVY=.4bfab3ea-9719-4dce-b647-deb87b6ed107@github.com> On Fri, 5 Jan 2024 11:32:16 GMT, Boris Ulasevich wrote: > Test checks for the ADRP instruction in the [Exception Handler] section of the TestFarJump::main PrintAssembly output. > > The -XX:CompileOnly=TestFarJump::main option is not sufficient to restrict PrintAssembly output to a specific method. A number of method assemblies precede the output of the TestFarJump::main method: > - java.lang.invoke.MethodHandle::linkToStatic(LLLLLLL)L > - java.lang.invoke.MethodHandle::linkToStatic(LLLL)V > - java.lang.invoke.MethodHandle::invokeBasic(LLLLLL)L > - java.lang.invoke.MethodHandle::linkToSpecial(LLLLLLLL)L > - java.lang.invoke.MethodHandle::linkToSpecial(LLLLL)V > - java.lang.invoke.MethodHandle::invokeBasic(LLL)L > - java.lang.invoke.MethodHandle::linkToSpecial(LLLLL)L > - java.lang.invoke.MethodHandle::linkToStatic(LL)L > - java.lang.invoke.MethodHandle::linkToSpecial(LL)V > - java.lang.invoke.MethodHandle::invokeBasic()L > - java.lang.invoke.MethodHandle::linkToSpecial(LL)L > - java.lang.invoke.MethodHandle::linkToStatic(LLL)L > - java.lang.invoke.MethodHandle::linkToStatic(LLLL)L > - java.lang.invoke.MethodHandle::linkToSpecial(LLL)V > - java.lang.invoke.MethodHandle::invokeBasic(L)L > - java.lang.invoke.MethodHandle::linkToSpecial(LLL)L > - java.lang.invoke.MethodHandle::linkToStatic(LLL)V > - 
java.lang.invoke.MethodHandle::linkToStatic(LL)I > - jdk.internal.vm.Continuation::enterSpecial > - compiler.c2.aarch64.TestFarJump::main > > With this change, I use the -XX:CompileCommand=option,TestFarJump::main,bool,PrintAssembly,true option to restrict the PrintAssembly output to a specific method. Now the only [Exception Handler] in the listing is the section of TestFarJump::main method. This pull request has now been integrated. Changeset: 52a6c375 Author: Boris Ulasevich URL: https://git.openjdk.org/jdk/commit/52a6c37558fa970f595067bc1bb5bc2b710c3876 Stats: 3 lines in 1 file changed: 0 ins; 1 del; 2 mod 8322858: compiler/c2/aarch64/TestFarJump.java fails on AArch64 due to unexpected PrintAssembly output Reviewed-by: aph, thartmann ------------- PR: https://git.openjdk.org/jdk/pull/17278 From epeter at openjdk.org Tue Jan 9 10:49:57 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 9 Jan 2024 10:49:57 GMT Subject: RFR: 8315361: C2 SuperWord: refactor out loop analysis into shared auto-vectorization facility VLoopAnalyzer [v5] In-Reply-To: References: Message-ID: > This is a refactoring of `SuperWord`. > > **Goals** > > 1. Clean up `SuperWord`: disentangle different components, make them more **modular**. > 2. Make the loop analysis parts a **shared facility**, not just for SuperWord but also the post-loop-vectorizer ([JDK-8308994](https://bugs.openjdk.org/browse/JDK-8308994)). > 3. It is also a necessary step on my bigger plans for improvement with the C2 Auto-Vectorizer ([see my blog post](https://eme64.github.io/blog/2023/11/03/C2-AutoVectorizer-Improvement-Ideas.html)). > 4. Improve tracing in the auto-vectorization by making it more systematic. 
> > **Summary** > > - I wrote a summary of how C2 auto-vectorization with SuperWord works (please read!): > https://github.com/openjdk/jdk/blob/95fd361e60fc66eb91edad321662e508b2d1bdde/src/hotspot/share/opto/superword.hpp#L32-L177 > - I moved many `Superword` components out to `VLoop` and its subclass `VLoopAnalyzer`. The idea is that any vectorizer can use these facilities in the future. They are therefore made more modular, which should hopefully make future changes easier. These components are: > - Checking the pre-conditions for vectorization (e.g. no unwanted ctrl-flow). > - `VLoop::check_preconditions_helper` replaces code from old `SuperWord::transform_loop`. > - Running all submodules of `VLoopAnalyzer`: `VLoopAnalyzer::analyze_helper`. Replaces analysis part of `SuperWord::SLP_extract`. > - Finding and marking reductions -> `VLoopReductions` > - Detecting memory slices -> `VLoopMemorySlices` > - Analyzing the body -> `VLoopBody` (renamed `in_bb` -> `in_body`) > - Determining vector element types, and functions to determine the `vector_width` of a node -> `VLoopTypes` > - Constructing the dependence graph -> `VLoopDependenceGraph`. Replaces old `DepGraph` with all its components. > - New: CompileCommand option `TraceAutovectorization` > - Run with `-XX:CompileCommand=traceAutovectorization,*::*,help` to get a usage description. > - Replaced all printing with flags `TraceSuperWord` (and `Verbose`) and of `VectorizeDebug`. > - The advantage of a CompileCommand is that tracing can be applied selectively for only a limited set of java classes / methods. > - It uses tags, which are more readable than the `VectorizeDebug` bit-flags. These tags can be used for all parts of the vectorizer, but one can also target SuperWord specifically. > - I systematically added tracing at every point where vectorization (partially) fails (use tag `SW_REJECTIONS`). > - `TraceSuperWord` still works, and perfo... 
Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: fix debug / product guards for tracing, now consistently not_product ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16620/files - new: https://git.openjdk.org/jdk/pull/16620/files/99b577bd..28e0e4e0 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16620&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16620&range=03-04 Stats: 19 lines in 3 files changed: 0 ins; 1 del; 18 mod Patch: https://git.openjdk.org/jdk/pull/16620.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16620/head:pull/16620 PR: https://git.openjdk.org/jdk/pull/16620 From epeter at openjdk.org Tue Jan 9 10:57:40 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 9 Jan 2024 10:57:40 GMT Subject: RFR: 8315361: C2 SuperWord: refactor out loop analysis into shared auto-vectorization facility VLoopAnalyzer [v6] In-Reply-To: References: Message-ID: > This is a refactoring of `SuperWord`. > > **Goals** > > 1. Clean up `SuperWord`: disentangle different components, make them more **modular**. > 2. Make the loop analysis parts a **shared facility**, not just for SuperWord but also the post-loop-vectorizer ([JDK-8308994](https://bugs.openjdk.org/browse/JDK-8308994)). > 3. It is also a necessary step on my bigger plans for improvement with the C2 Auto-Vectorizer ([see my blog post](https://eme64.github.io/blog/2023/11/03/C2-AutoVectorizer-Improvement-Ideas.html)). > 4. Improve tracing in the auto-vectorization by making it more systematic. > > **Summary** > > - I wrote a summary of how C2 auto-vectorization with SuperWord works (please read!): > https://github.com/openjdk/jdk/blob/95fd361e60fc66eb91edad321662e508b2d1bdde/src/hotspot/share/opto/superword.hpp#L32-L177 > - I moved many `Superword` components out to `VLoop` and its subclass `VLoopAnalyzer`. The idea is that any vectorizer can use these facilities in the future. 
They are therefore made more modular, which should hopefully make future changes easier. These components are: > - Checking the pre-conditions for vectorization (e.g. no unwanted ctrl-flow). > - `VLoop::check_preconditions_helper` replaces code from old `SuperWord::transform_loop`. > - Running all submodules of `VLoopAnalyzer`: `VLoopAnalyzer::analyze_helper`. Replaces analysis part of `SuperWord::SLP_extract`. > - Finding and marking reductions -> `VLoopReductions` > - Detecting memory slices -> `VLoopMemorySlices` > - Analyzing the body -> `VLoopBody` (renamed `in_bb` -> `in_body`) > - Determining vector element types, and functions to determine the `vector_width` of a node -> `VLoopTypes` > - Constructing the dependence graph -> `VLoopDependenceGraph`. Replaces old `DepGraph` with all its components. > - New: CompileCommand option `TraceAutovectorization` > - Run with `-XX:CompileCommand=traceAutovectorization,*::*,help` to get a usage description. > - Replaced all printing with flags `TraceSuperWord` (and `Verbose`) and of `VectorizeDebug`. > - The advantage of a CompileCommand is that tracing can be applied selectively for only a limited set of java classes / methods. > - It uses tags, which are more readable than the `VectorizeDebug` bit-flags. These tags can be used for all parts of the vectorizer, but one can also target SuperWord specifically. > - I systematically added tracing at every point where vectorization (partially) fails (use tag `SW_REJECTIONS`). > - `TraceSuperWord` still works, and perfo... 
Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: beautify bailout on failure state ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16620/files - new: https://git.openjdk.org/jdk/pull/16620/files/28e0e4e0..c9079656 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16620&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16620&range=04-05 Stats: 16 lines in 1 file changed: 6 ins; 5 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/16620.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16620/head:pull/16620 PR: https://git.openjdk.org/jdk/pull/16620 From epeter at openjdk.org Tue Jan 9 11:43:40 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 9 Jan 2024 11:43:40 GMT Subject: RFR: 8315361: C2 SuperWord: refactor out loop analysis into shared auto-vectorization facility VLoopAnalyzer [v7] In-Reply-To: References: Message-ID: <_ltybJEddrDgVxjogJgcbodsqPTEoeZfWTZsX1v4Jvg=.43b52e44-c1ec-42d3-b117-de9bc28432e8@github.com> > This is a refactoring of `SuperWord`. > > **Goals** > > 1. Clean up `SuperWord`: disentangle different components, make them more **modular**. > 2. Make the loop analysis parts a **shared facility**, not just for SuperWord but also the post-loop-vectorizer ([JDK-8308994](https://bugs.openjdk.org/browse/JDK-8308994)). > 3. It is also a necessary step on my bigger plans for improvement with the C2 Auto-Vectorizer ([see my blog post](https://eme64.github.io/blog/2023/11/03/C2-AutoVectorizer-Improvement-Ideas.html)). > 4. Improve tracing in the auto-vectorization by making it more systematic. > > **Summary** > > - I wrote a summary of how C2 auto-vectorization with SuperWord works (please read!): > https://github.com/openjdk/jdk/blob/95fd361e60fc66eb91edad321662e508b2d1bdde/src/hotspot/share/opto/superword.hpp#L32-L177 > - I moved many `Superword` components out to `VLoop` and its subclass `VLoopAnalyzer`. 
The idea is that any vectorizer can use these facilities in the future. They are therefore made more modular, which should hopefully make future changes easier. These components are: > - Checking the pre-conditions for vectorization (e.g. no unwanted ctrl-flow). > - `VLoop::check_preconditions_helper` replaces code from old `SuperWord::transform_loop`. > - Running all submodules of `VLoopAnalyzer`: `VLoopAnalyzer::analyze_helper`. Replaces analysis part of `SuperWord::SLP_extract`. > - Finding and marking reductions -> `VLoopReductions` > - Detecting memory slices -> `VLoopMemorySlices` > - Analyzing the body -> `VLoopBody` (renamed `in_bb` -> `in_body`) > - Determining vector element types, and functions to determine the `vector_width` of a node -> `VLoopTypes` > - Constructing the dependence graph -> `VLoopDependenceGraph`. Replaces old `DepGraph` with all its components. > - New: CompileCommand option `TraceAutovectorization` > - Run with `-XX:CompileCommand=traceAutovectorization,*::*,help` to get a usage description. > - Replaced all printing with flags `TraceSuperWord` (and `Verbose`) and of `VectorizeDebug`. > - The advantage of a CompileCommand is that tracing can be applied selectively for only a limited set of java classes / methods. > - It uses tags, which are more readable than the `VectorizeDebug` bit-flags. These tags can be used for all parts of the vectorizer, but one can also target SuperWord specifically. > - I systematically added tracing at every point where vectorization (partially) fails (use tag `SW_REJECTIONS`). > - `TraceSuperWord` still works, and perfo... 
Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: product guard for TraceSuperWordLoopUnrollAnalysis tracing ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16620/files - new: https://git.openjdk.org/jdk/pull/16620/files/c9079656..1f5d4ef2 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16620&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16620&range=05-06 Stats: 6 lines in 1 file changed: 6 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/16620.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16620/head:pull/16620 PR: https://git.openjdk.org/jdk/pull/16620 From epeter at openjdk.org Tue Jan 9 11:47:29 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 9 Jan 2024 11:47:29 GMT Subject: RFR: 8315361: C2 SuperWord: refactor out loop analysis into shared auto-vectorization facility VLoopAnalyzer [v7] In-Reply-To: <_ltybJEddrDgVxjogJgcbodsqPTEoeZfWTZsX1v4Jvg=.43b52e44-c1ec-42d3-b117-de9bc28432e8@github.com> References: <_ltybJEddrDgVxjogJgcbodsqPTEoeZfWTZsX1v4Jvg=.43b52e44-c1ec-42d3-b117-de9bc28432e8@github.com> Message-ID: On Tue, 9 Jan 2024 11:43:40 GMT, Emanuel Peter wrote: >> This is a refactoring of `SuperWord`. >> >> **Goals** >> >> 1. Clean up `SuperWord`: disentangle different components, make them more **modular**. >> 2. Make the loop analysis parts a **shared facility**, not just for SuperWord but also the post-loop-vectorizer ([JDK-8308994](https://bugs.openjdk.org/browse/JDK-8308994)). >> 3. It is also a necessary step on my bigger plans for improvement with the C2 Auto-Vectorizer ([see my blog post](https://eme64.github.io/blog/2023/11/03/C2-AutoVectorizer-Improvement-Ideas.html)). >> 4. Improve tracing in the auto-vectorization by making it more systematic. 
>> >> **Summary** >> >> - I wrote a summary of how C2 auto-vectorization with SuperWord works (please read!): >> https://github.com/openjdk/jdk/blob/95fd361e60fc66eb91edad321662e508b2d1bdde/src/hotspot/share/opto/superword.hpp#L32-L177 >> - I moved many `Superword` components out to `VLoop` and its subclass `VLoopAnalyzer`. The idea is that any vectorizer can use these facilities in the future. They are therefore made more modular, which should hopefully make future changes easier. These components are: >> - Checking the pre-conditions for vectorization (e.g. no unwanted ctrl-flow). >> - `VLoop::check_preconditions_helper` replaces code from old `SuperWord::transform_loop`. >> - Running all submodules of `VLoopAnalyzer`: `VLoopAnalyzer::analyze_helper`. Replaces analysis part of `SuperWord::SLP_extract`. >> - Finding and marking reductions -> `VLoopReductions` >> - Detecting memory slices -> `VLoopMemorySlices` >> - Analyzing the body -> `VLoopBody` (renamed `in_bb` -> `in_body`) >> - Determining vector element types, and functions to determine the `vector_width` of a node -> `VLoopTypes` >> - Constructing the dependence graph -> `VLoopDependenceGraph`. Replaces old `DepGraph` with all its components. >> - New: CompileCommand option `TraceAutovectorization` >> - Run with `-XX:CompileCommand=traceAutovectorization,*::*,help` to get a usage description. >> - Replaced all printing with flags `TraceSuperWord` (and `Verbose`) and of `VectorizeDebug`. >> - The advantage of a CompileCommand is that tracing can be applied selectively for only a limited set of java classes / methods. >> - It uses tags, which are more readable than the `VectorizeDebug` bit-flags. These tags can be used for all parts of the vectorizer, but one can also target SuperWord specifically. >> - I systematically added tracing at every point where vectorization (partially) fails (use tag `SW_R... 
> > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > product guard for TraceSuperWordLoopUnrollAnalysis tracing @fg1417 @chhagedorn I merged in my other SuperWord change (AlignVector fix), and addressed the previous comments. Would you mind reviewing (again)? ------------- PR Comment: https://git.openjdk.org/jdk/pull/16620#issuecomment-1882920281 From chagedorn at openjdk.org Tue Jan 9 13:44:25 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 9 Jan 2024 13:44:25 GMT Subject: RFR: 8315361: C2 SuperWord: refactor out loop analysis into shared auto-vectorization facility VLoopAnalyzer [v7] In-Reply-To: <_ltybJEddrDgVxjogJgcbodsqPTEoeZfWTZsX1v4Jvg=.43b52e44-c1ec-42d3-b117-de9bc28432e8@github.com> References: <_ltybJEddrDgVxjogJgcbodsqPTEoeZfWTZsX1v4Jvg=.43b52e44-c1ec-42d3-b117-de9bc28432e8@github.com> Message-ID: On Tue, 9 Jan 2024 11:43:40 GMT, Emanuel Peter wrote: >> This is a refactoring of `SuperWord`. >> >> **Goals** >> >> 1. Clean up `SuperWord`: disentangle different components, make them more **modular**. >> 2. Make the loop analysis parts a **shared facility**, not just for SuperWord but also the post-loop-vectorizer ([JDK-8308994](https://bugs.openjdk.org/browse/JDK-8308994)). >> 3. It is also a necessary step on my bigger plans for improvement with the C2 Auto-Vectorizer ([see my blog post](https://eme64.github.io/blog/2023/11/03/C2-AutoVectorizer-Improvement-Ideas.html)). >> 4. Improve tracing in the auto-vectorization by making it more systematic. >> >> **Summary** >> >> - I wrote a summary of how C2 auto-vectorization with SuperWord works (please read!): >> https://github.com/openjdk/jdk/blob/95fd361e60fc66eb91edad321662e508b2d1bdde/src/hotspot/share/opto/superword.hpp#L32-L177 >> - I moved many `Superword` components out to `VLoop` and its subclass `VLoopAnalyzer`. The idea is that any vectorizer can use these facilities in the future. 
They are therefore made more modular, which should hopefully make future changes easier. These components are: >> - Checking the pre-conditions for vectorization (e.g. no unwanted ctrl-flow). >> - `VLoop::check_preconditions_helper` replaces code from old `SuperWord::transform_loop`. >> - Running all submodules of `VLoopAnalyzer`: `VLoopAnalyzer::analyze_helper`. Replaces analysis part of `SuperWord::SLP_extract`. >> - Finding and marking reductions -> `VLoopReductions` >> - Detecting memory slices -> `VLoopMemorySlices` >> - Analyzing the body -> `VLoopBody` (renamed `in_bb` -> `in_body`) >> - Determining vector element types, and functions to determine the `vector_width` of a node -> `VLoopTypes` >> - Constructing the dependence graph -> `VLoopDependenceGraph`. Replaces old `DepGraph` with all its components. >> - New: CompileCommand option `TraceAutovectorization` >> - Run with `-XX:CompileCommand=traceAutovectorization,*::*,help` to get a usage description. >> - Replaced all printing with flags `TraceSuperWord` (and `Verbose`) and of `VectorizeDebug`. >> - The advantage of a CompileCommand is that tracing can be applied selectively for only a limited set of java classes / methods. >> - It uses tags, which are more readable than the `VectorizeDebug` bit-flags. These tags can be used for all parts of the vectorizer, but one can also target SuperWord specifically. >> - I systematically added tracing at every point where vectorization (partially) fails (use tag `SW_R... > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > product guard for TraceSuperWordLoopUnrollAnalysis tracing Sure, I'll try to have a look later this week. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/16620#issuecomment-1883071528 From epeter at openjdk.org Tue Jan 9 13:54:42 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 9 Jan 2024 13:54:42 GMT Subject: RFR: 8319793: C2 compilation fails with "Bad graph detected in build_loop_late" after JDK-8279888 [v12] In-Reply-To: <7WZcrJtRFhNaGiw4c_ov_XW-dWcJ5-GdRy_8Vh2ikWA=.0cdfcad8-eb9b-43ee-b160-ad96f698a0b6@github.com> References: <7WZcrJtRFhNaGiw4c_ov_XW-dWcJ5-GdRy_8Vh2ikWA=.0cdfcad8-eb9b-43ee-b160-ad96f698a0b6@github.com> Message-ID: <9VeaoeqFVJApx5G4UWDoZM8UqZgm4lKlDiRxxZoha5c=.62c6c314-ae6c-42da-9ebd-de9d200b39ce@github.com> On Tue, 9 Jan 2024 09:28:01 GMT, Roland Westrelin wrote: >> Range check smearing and range check predication make an array access >> dependent on 2 (or more in the case of RC smearing) conditions. As a >> consequence, if a range check can be eliminated because there's an >> identical dominating range check, the control dependent nodes that >> could float and become dependent on the dominating range check cannot >> be allowed to float because there's a risk that they would then bypass >> one of the checks that make the access legal. >> >> `IfNode::dominated_by()` and `PhaseIdealLoop::dominated_by()` have >> logic to prevent this: nodes that are control dependent on a range >> check or predicate are not allowed to float. This is however not >> sufficient as demonstrated by the test cases. 
>> >> In `TestArrayAccessAboveRCAfterSmearingOrPredication.testRangeCheckSmearing()`: >> >> >> v += array[i]; >> if (flag2) { >> if (flag3) { >> field = 0x42; >> } >> } >> if (flagField == 1) { >> v += array[i]; >> } >> >> >> The range check for the second `array[i]` load is replaced by the >> dominating range check for the first `array[i]` but because the second >> `array[i]` load could really be dependent on multiple range checks (in >> case smearing happened which is not the case here), c2 doesn't allow >> the second `array[i]` to float when the second range check is >> removed. The second `array[i]` is then control dependent on: >> >> >> if (flagField == 1) { >> >> >> which is next found to be dominated by the same test: >> >> >> if (flag == 1) { >> >> >> and is removed. However nothing in `dominated_by()` treats node >> dependent on tests that are not range check or predicates >> specially. So the second `array[i]` is allowed to float and become >> dependent on: >> >> >> if (flag == 1) { >> >> >> which is above the range check for that access. The test method in its >> last invocation is passed an index for the array access that's widely >> out of range. The array load happens before the range check and >> crashes the VM. `testLoopPredication()` is a similar test where array >> loads become dependent on predicates and end up above range checks. >> >> `TestArrayAccessCastIIAboveRC.java` is the test case from the bug >> where for similar reasons a range check `CastII` ends up above its >> range check, becomes top because its input becomes some integer that >> conflicts with its... 
> > Roland Westrelin has updated the pull request incrementally with five additional commits since the last revision: > > - Update src/hotspot/share/opto/ifnode.cpp > > Co-authored-by: Christian Hagedorn > - Update src/hotspot/share/opto/node.hpp > > Co-authored-by: Christian Hagedorn > - Update src/hotspot/share/opto/loopopts.cpp > > Co-authored-by: Christian Hagedorn > - Update src/hotspot/share/opto/loopopts.cpp > > Co-authored-by: Christian Hagedorn > - Update src/hotspot/share/opto/loopPredicate.cpp > > Co-authored-by: Christian Hagedorn @rwestrel thanks for the update, I really like the comments now! Just one more comment suggestion and a single renaming idea. Otherwise LGTM ? src/hotspot/share/opto/cfgnode.hpp line 434: > 432: static Node* up_one_dom(Node* curr, bool linear_only = false); > 433: bool is_zero_trip_guard() const; > 434: Node* dominated_by(Node* prev_dom, PhaseIterGVN* igvn, bool pin_array_nodes); Suggestion: Node* dominated_by(Node* prev_dom, PhaseIterGVN* igvn, bool pin_array_access_nodes); src/hotspot/share/opto/ifnode.cpp line 1502: > 1500: > 1501: //------------------------------dominated_by----------------------------------- > 1502: Node* IfNode::dominated_by(Node* prev_dom, PhaseIterGVN* igvn, bool pin_array_nodes) { Suggestion: Node* IfNode::dominated_by(Node* prev_dom, PhaseIterGVN* igvn, bool pin_array_access_nodes) { src/hotspot/share/opto/ifnode.cpp line 1537: > 1535: // Do not rewire Div and Mod nodes which could have a zero divisor to avoid skipping their zero check. > 1536: igvn->replace_input_of(s, 0, data_target); // Move child to data-target > 1537: if (pin_array_nodes && data_target != top) { Suggestion: if (pin_array_access_nodes && data_target != top) { src/hotspot/share/opto/loopnode.hpp line 1510: > 1508: // Mark an IfNode as being dominated by a prior test, > 1509: // without actually altering the CFG (and hence IDOM info). 
> 1510: void dominated_by(IfProjNode* prevdom, IfNode* iff, bool flip = false, bool pin_array_nodes = false); Suggestion: void dominated_by(IfProjNode* prevdom, IfNode* iff, bool flip = false, bool pin_array_access_nodes = false); src/hotspot/share/opto/loopopts.cpp line 308: > 306: // IGVN worklist for later cleanup. Move control-dependent data Nodes on the > 307: // live path up to the dominating control. > 308: void PhaseIdealLoop::dominated_by(IfProjNode* prevdom, IfNode* iff, bool flip, bool pin_array_nodes) { Suggestion: void PhaseIdealLoop::dominated_by(IfProjNode* prevdom, IfNode* iff, bool flip, bool pin_array_access_nodes) { src/hotspot/share/opto/loopopts.cpp line 355: > 353: assert(cd->in(0) == dp, ""); > 354: _igvn.replace_input_of(cd, 0, prevdom); > 355: if (pin_array_nodes) { Suggestion: if (pin_array_access_nodes) { ------------- Changes requested by epeter (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/16886#pullrequestreview-1811165732 Marked as reviewed by epeter (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/16886#pullrequestreview-1811191003 PR Review Comment: https://git.openjdk.org/jdk/pull/16886#discussion_r1446097554 PR Review Comment: https://git.openjdk.org/jdk/pull/16886#discussion_r1446098611 PR Review Comment: https://git.openjdk.org/jdk/pull/16886#discussion_r1446098965 PR Review Comment: https://git.openjdk.org/jdk/pull/16886#discussion_r1446108306 PR Review Comment: https://git.openjdk.org/jdk/pull/16886#discussion_r1446108500 PR Review Comment: https://git.openjdk.org/jdk/pull/16886#discussion_r1446108919 From epeter at openjdk.org Tue Jan 9 13:54:43 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 9 Jan 2024 13:54:43 GMT Subject: RFR: 8319793: C2 compilation fails with "Bad graph detected in build_loop_late" after JDK-8279888 [v9] In-Reply-To: References: Message-ID: On Fri, 5 Jan 2024 09:53:09 GMT, Roland Westrelin wrote: >> Ah. 
Does this mean that if there are multiple RangeChecks in a loop, where some could be smeared, these are not smeared, and then we have more RangeChecks to eliminate out of the loop? Maybe in the end this all comes down to the same anyway. Just wondering. > >> Why is it ok to delay this to post-loop-opts? Does it not prevent moving some CFG from being eliminated out of loops? Would be nice to have a little justification comment. > > Maybe. With this fix, range check smearing requires pinning nodes. So running it early also has a drawback: it can cause nodes that would otherwise float to be pinned. The way I see it, range check smearing is a local optimization for cases where range checks can't be eliminated some other way so running it late should not make a difference. If the range check is in a loop and predication removes it then running RC smearing early doesn't make a difference. If the range check is part of a range check sequence that can only be optimized by RC smearing then having a longer range check sequence for the duration of loop opts probably makes no difference. @rwestrel would you mind explaining exactly that in a comment? Something like: We are about to perform range check smearing (i.e. remove this RangeCheck if it is dominated by two RangeChecks which have a range that covers this RangeCheck). This can cause nodes to be pinned. We want to avoid that and first allow RangeCheckElimination a chance to remove the RangeChecks from loops. Hence, we delay range check smearing until after loop opts.
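To illustrate the covering condition in the suggested comment, here is a minimal Java sketch. It is only a source-level model, not C2 code: the class `RangeCheckSmearingSketch` and the helpers `inRange` and `covered` are hypothetical names introduced for this example. It checks that when the range checks for `array[i + lo]` and `array[i + hi]` both pass, every access `array[i + k]` with `lo <= k <= hi` is in bounds as well.

```java
public class RangeCheckSmearingSketch {
    // Hypothetical model of a range check for array[i + offset]:
    // passes iff 0 <= i + offset < length (computed in long to avoid overflow).
    static boolean inRange(int i, int offset, int length) {
        long index = (long) i + offset;
        return index >= 0 && index < length;
    }

    // Hypothetical helper: true if offset k lies between the two checked
    // offsets lo and hi, i.e. the two checks together cover the check at k.
    static boolean covered(int lo, int hi, int k) {
        return lo <= k && k <= hi;
    }

    public static void main(String[] args) {
        int length = 10;
        int lo = 0;
        int hi = 3;
        for (int i = -5; i < 15; i++) {
            // Only reason about indices for which both dominating checks pass.
            if (inRange(i, lo, length) && inRange(i, hi, length)) {
                for (int k = lo; k <= hi; k++) {
                    if (covered(lo, hi, k) && !inRange(i, k, length)) {
                        throw new AssertionError("uncovered access at i=" + i + ", k=" + k);
                    }
                }
            }
        }
        System.out.println("all covered accesses are in bounds");
    }
}
```

The implication only holds once both dominating checks have passed, which is one way to see why an access that depends on a smeared check has to stay pinned below them.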
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16886#discussion_r1446106265 From epeter at openjdk.org Tue Jan 9 14:01:28 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 9 Jan 2024 14:01:28 GMT Subject: RFR: 8322589: Add Ideal transformation: (~a) & (~b) => ~(a | b) [v10] In-Reply-To: <_lWTVDYsWmINZsi0bPleMs3F3n-WPgHbLoTfpu8sHSg=.0dde4ccd-77e8-4e0e-80ad-cb233b858579@github.com> References: <-pwjKEB97C-bM068JQN0PY1hl65IzcuQZfHzRoKu92g=.d6118773-20d2-46cd-9284-5168c9334bb5@github.com> <_lWTVDYsWmINZsi0bPleMs3F3n-WPgHbLoTfpu8sHSg=.0dde4ccd-77e8-4e0e-80ad-cb233b858579@github.com> Message-ID: On Tue, 9 Jan 2024 05:54:39 GMT, Zhiqiang Zang wrote: >> Hello, >> >> `(~a) & (~b) => ~(a | b)` is a widely seen pattern, for example it is implemented for LLVM [here](https://github.com/llvm/llvm-project/blob/397f1ce9efb4eea1ee10fe4833f733b8c7abd878/llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp#L1616C28-L1616C28); however it is missing in the current implementation of HotSpot. This pull request adds this transformation and associated tests. >> >> Thanks. > > Zhiqiang Zang has updated the pull request incrementally with one additional commit since the last revision: > > move make_not back to AddNode because it cannot compile for architecture arm, s390x, ppc64le. LGTM, thanks for the work! ------------- Marked as reviewed by epeter (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/16333#pullrequestreview-1811202862 From epeter at openjdk.org Tue Jan 9 14:05:27 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 9 Jan 2024 14:05:27 GMT Subject: RFR: 8322077: Add Ideal transformation: (~a) | (~b) => ~(a & b) [v11] In-Reply-To: References: Message-ID: On Tue, 9 Jan 2024 06:06:51 GMT, Zhiqiang Zang wrote: >> Hello, >> >> (~a) | (~b) => ~(a & b) is a widely seen pattern, for example it is implemented for LLVM [here](https://github.com/llvm/llvm-project/blob/397f1ce9efb4eea1ee10fe4833f733b8c7abd878/llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp#L1617C28-L1617C28); however it is missing in current implementation of hotspot. This pull request adds this transformation and associated tests. >> >> Thanks. > > Zhiqiang Zang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 14 additional commits since the last revision: > > - adapt to new changes from the dependant pr. > - Merge branch 'andnode-PNewDeMorganLawAndToOr' into ornode-PNewDeMorganLawOrToAnd > - adapt changes from the dependent pr. > - Merge branch 'andnode-PNewDeMorganLawAndToOr' into ornode-PNewDeMorganLawOrToAnd > - Update test/hotspot/jtreg/compiler/c2/irTests/DeMorganLawLongTests.java > > Co-authored-by: Tobias Hartmann > - Update test/hotspot/jtreg/compiler/c2/irTests/DeMorganLawIntTests.java > > Co-authored-by: Tobias Hartmann > - Add tests for using De Morgan's Law for both optimizations. > - remove unused code from tests. > - update the copyright dates. > - address comments. > - ... and 4 more: https://git.openjdk.org/jdk/compare/25f84663...b21e242b LGTM, and thanks for the work! Please only integrate this once your other change is integrated, and merged into this one. Then wait for GHA to complete, and run your own testing. ------------- Marked as reviewed by epeter (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/16334#pullrequestreview-1811210531 From epeter at openjdk.org Tue Jan 9 14:16:25 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 9 Jan 2024 14:16:25 GMT Subject: RFR: 8322768: Optimize non-subword vector compress and expand APIs for AVX2 target. [v4] In-Reply-To: References: <_udOCEVG86x9V_WvYqFTaYnvmXdiZ7LzqxzR-D_ygYs=.db3ed37e-a8fa-413b-83c6-a785aba072ff@github.com> Message-ID: On Mon, 8 Jan 2024 06:23:46 GMT, Jatin Bhateja wrote: >> Hi, >> >> Patch optimizes non-subword vector compress and expand APIs for x86 AVX2 only targets. >> Upcoming E-core Xeons (Sierra Forest) and Hybrid CPUs only support AVX2 instruction set. >> These are very frequently used APIs in columnar database filter operation. >> >> Implementation uses a lookup table to record permute indices. Table index is computed using >> mask argument of compress/expand operation. >> >> Following are the performance number of JMH micro included with the patch. >> >> >> System : Intel(R) Xeon(R) Platinum 8480+ (Sapphire Rapids) >> >> Baseline: >> Benchmark (size) Mode Cnt Score Error Units >> ColumnFilterBenchmark.filterDoubleColumn 1024 thrpt 2 142.767 ops/ms >> ColumnFilterBenchmark.filterDoubleColumn 2047 thrpt 2 71.436 ops/ms >> ColumnFilterBenchmark.filterDoubleColumn 4096 thrpt 2 35.992 ops/ms >> ColumnFilterBenchmark.filterFloatColumn 1024 thrpt 2 182.151 ops/ms >> ColumnFilterBenchmark.filterFloatColumn 2047 thrpt 2 91.096 ops/ms >> ColumnFilterBenchmark.filterFloatColumn 4096 thrpt 2 44.757 ops/ms >> ColumnFilterBenchmark.filterIntColumn 1024 thrpt 2 184.099 ops/ms >> ColumnFilterBenchmark.filterIntColumn 2047 thrpt 2 91.981 ops/ms >> ColumnFilterBenchmark.filterIntColumn 4096 thrpt 2 45.170 ops/ms >> ColumnFilterBenchmark.filterLongColumn 1024 thrpt 2 148.017 ops/ms >> ColumnFilterBenchmark.filterLongColumn 2047 thrpt 2 73.516 ops/ms >> ColumnFilterBenchmark.filterLongColumn 4096 thrpt 2 36.844 ops/ms >> >> Withopt: >> Benchmark (size) Mode 
Cnt Score Error Units >> ColumnFilterBenchmark.filterDoubleColumn 1024 thrpt 2 2051.707 ops/ms >> ColumnFilterBenchmark.filterDoubleColumn 2047 thrpt 2 914.072 ops/ms >> ColumnFilterBenchmark.filterDoubleColumn 4096 thrpt 2 489.898 ops/ms >> ColumnFilterBenchmark.filterFloatColumn 1024 thrpt 2 5324.195 ops/ms >> ColumnFilterBenchmark.filterFloatColumn 2047 thrpt 2 2587.229 ops/ms >> ColumnFilterBenchmark.filterFloatColumn 4096 thrpt 2 1278.665 ops/ms >> ColumnFilterBenchmark.filterIntColumn 1024 thrpt 2 4149.384 ops/ms >> ColumnFilterBenchmark.filterIntColumn 2047 thrpt ... > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Review suggestions incorporated. I think we are almost there! src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 5291: > 5289: if (bt == T_INT || bt == T_FLOAT) { > 5290: vmovmskps(rtmp, mask, vec_enc); > 5291: shlq(rtmp, 5); Suggestion: shlq(rtmp, 5); // for 32 bit rows (8 int) src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 5309: > 5307: assert(bt == T_LONG || bt == T_DOUBLE, ""); > 5308: vmovmskpd(rtmp, mask, vec_enc); > 5309: shlq(rtmp, 5); Suggestion: shlq(rtmp, 5); // for 32 bit rows (4 long) src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 1018: > 1016: } else { > 1017: assert(esize == 64, ""); > 1018: // Loop to generate 16 x 4 int expand permute index table. A row is accessed Suggestion: // Loop to generate 16 x 4 long expand permute index table. A row is accessed ------------- Changes requested by epeter (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/17261#pullrequestreview-1811224600 PR Review Comment: https://git.openjdk.org/jdk/pull/17261#discussion_r1446133371 PR Review Comment: https://git.openjdk.org/jdk/pull/17261#discussion_r1446133800 PR Review Comment: https://git.openjdk.org/jdk/pull/17261#discussion_r1446132575 From epeter at openjdk.org Tue Jan 9 14:16:27 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 9 Jan 2024 14:16:27 GMT Subject: RFR: 8322768: Optimize non-subword vector compress and expand APIs for AVX2 target. [v2] In-Reply-To: References: <_udOCEVG86x9V_WvYqFTaYnvmXdiZ7LzqxzR-D_ygYs=.db3ed37e-a8fa-413b-83c6-a785aba072ff@github.com> <1GHGK7AGinCMKjFIB5oadUP0jiZrC39Z0hncAS3H-9Y=.eb617125-1f51-4cff-889e-b15321f5c72b@github.com> Message-ID: On Tue, 9 Jan 2024 06:13:44 GMT, Jatin Bhateja wrote: >> Yes, IF it is vectorized, then there is no difference between high and low density. My concern was more if vectorization is preferable over the scalar alternative in the low-density case, where branch prediction is more stable. > > At runtime we do need to scan the entire mask to pick the compressible lane corresponding to each set mask bit. Thus the loop overhead of the mask compare (BTW masks are held in a vector register for AVX2 targets) and jump will be incurred anyway; in addition, for a sparsely populated mask we may incur an additional misprediction penalty for not taking the if block, which extracts an element from the appropriate source vector lane and inserts it into the destination vector lane. Overall, the vector solution will win for the most common cases with varying masks, and also for very sparsely populated masks. Here is the result of setting just a single mask bit.
> > > @Benchmark > public void fuzzyFilterIntColumn() { > int i = 0; > int j = 0; > long maskctr = 1; > int endIndex = ispecies.loopBound(size); > for (; i < endIndex; i += ispecies.length()) { > IntVector vec = IntVector.fromArray(ispecies, intinCol, i); > VectorMask<Integer> pred = VectorMask.fromLong(ispecies, 1); > vec.compress(pred).intoArray(intoutCol, j); > j += pred.trueCount(); > } > } > > > Baseline: > Benchmark (size) Mode Cnt Score Error Units > ColumnFilterBenchmark.fuzzyFilterIntColumn 1024 thrpt 2 379.059 ops/ms > ColumnFilterBenchmark.fuzzyFilterIntColumn 2047 thrpt 2 188.355 ops/ms > ColumnFilterBenchmark.fuzzyFilterIntColumn 4096 thrpt 2 95.315 ops/ms > > > Withopt: > Benchmark (size) Mode Cnt Score Error Units > ColumnFilterBenchmark.fuzzyFilterIntColumn 1024 thrpt 2 7390.074 ops/ms > ColumnFilterBenchmark.fuzzyFilterIntColumn 2047 thrpt 2 3483.247 ops/ms > ColumnFilterBenchmark.fuzzyFilterIntColumn 4096 thrpt 2 1823.817 ops/ms Nice, thanks for the data! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17261#discussion_r1446138902 From rrich at openjdk.org Tue Jan 9 14:17:23 2024 From: rrich at openjdk.org (Richard Reingruber) Date: Tue, 9 Jan 2024 14:17:23 GMT Subject: RFR: 8290965: PPC64: Implement post-call NOPs [v2] In-Reply-To: References: Message-ID: On Sat, 23 Dec 2023 11:56:10 GMT, Richard Reingruber wrote: >> #### Implementation of post call nops (PCNs) on ppc64. >> >> Depends on https://github.com/openjdk/jdk/pull/17150 >> >> About post call nops: >> >> - instruction(s) at return addresses of compiled java calls >> - emitted iff vm continuations are enabled to support virtual threads >> - encode data that can be used to find the corresponding CodeBlob and oop map faster >> - mt-safe patchable to trigger deoptimization >> >> Background: >> >> - Frames in continuation StackChunks are not visited if their compiled method is made not entrant (in contrast to frames on stack).
>> Instead all PCNs of the compiled method are patched to trigger deoptimization when control returns to such frames. >> - With vm continuations, stacks are walked and inspected more frequently. This requires lookup of metadata like frame size and oop maps. As an optimization the offset of the CodeBlob to the PCN and the oop map slot are encoded as data in the PCN. >> >> Post call nops on ppc64 >> >> - 1 instruction, i.e. 4 bytes (either CMPI or CMPLI[1]) >> x86_64: 1 instruction, 8 bytes >> aarch64: 3 instructions, 12 bytes >> [1] 3.1.10 Fixed Point Compare Instructions in Power ISA 3.1B >> https://openpowerfoundation.org/specifications/isa/ >> >> - 26 bits data payload >> x86_64: 32 bits; aarch64: 32 bits >> - 9 bits dedicated to oop map slot. With 8 bits there were cases with SPECjvm2008 where the slot could not be encoded (on ppc64 and x86_64). >> x86_64: 8 bits; aarch64: 8 bits >> - 17 bits dedicated to cb offset. Effectively 19 bits due to instruction alignment. >> x86_64: 24 bits; aarch64: 24 bits >> - Also used when reconstructing the back chain after thawing continuation frames (see `Thaw::patch_caller_links`) >> >> - Refactored frame constructors to make use of fast CodeBlob lookup based on PCNs. >> The fast lookup may only be used if the pc is known to be in the code cache because `CodeCache::find_blob_fast` can yield wrong results if it finds instructions outside the code cache that look just like PCNs. Callers of the frame class constructors need to pass `frame::kind::native` in that case to avoid errors. Other platforms don't make this explicit which is a problem in my eyes. Picking the wrong constructor can cause errors when porting and in future development. >> >> - Currently only the PCNs in nmethods are initialized. Therefore we don't even try to make a fast lookup based on PCNs if we know the CodeBlob is, e.g., a RuntimeStub. To achieve this we call the frame cons...
> > Richard Reingruber has updated the pull request incrementally with one additional commit since the last revision: > > Fix comment > > Co-authored-by: Andrew Haley > _Mailing list message from [Andrew Haley](mailto:aph-open at littlepinkcloud.com) on [hotspot-compiler-dev](mailto:hotspot-compiler-dev at mail.openjdk.org):_ > > On 12/20/23 20:36, Richard Reingruber wrote: > > > | test/jdk/java/lang/Thread/virtual/stress/Skynet.java | ppc64le | x86_64 | > > |------------------------------------------------------|-----------|-----------| > > | PCN lookup success | 306955525 | 247185016 | > > | PCN lookup failure | 500975 | 421098 | > > | PCN decode success (C2) | 306951893 | 247181691 | > > | PCN decode failure | 3168 | 59 | > > | PCN patch success | 2080 | 2662 | > > | PCN patch cb offset failure | 0 | 0 | > > | PCN patch oopmap slot failure | 0 | 0 | > > These data are really interesting. How did you gather them? Thanks. This is the code for the stats based on master: https://github.com/openjdk/jdk/commit/c376fcc9099251a3f62edc246748f26d0a54e2c0 This is the version for this pr: https://github.com/openjdk/jdk/commit/ae2b6ba70bfdca6a58f9af6b3a675c0f2aec7d85 (Actually these are cleaner reimplementations of the original code) ------------- PR Comment: https://git.openjdk.org/jdk/pull/17171#issuecomment-1883125887 From jbhateja at openjdk.org Tue Jan 9 15:17:22 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 9 Jan 2024 15:17:22 GMT Subject: RFR: 8321712: C2: "failed: Multiple uses of register" in C2_MacroAssembler::vminmax_fp In-Reply-To: References: Message-ID: <8ZGiFoB4TkRgQSP67ekJ_Tw_uMnEyVNdU9GSa4bx69M=.f252a9b8-367c-49e6-916e-48dd0e6e936e@github.com> On Tue, 9 Jan 2024 02:25:15 GMT, Vladimir Kozlov wrote: > Should we "short cut" code when registers are the same? Hi @sviswa7, An identity transformation may be useful here to prevent generating MaxF/D in case both arguments are the same.
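As a rough illustration of why such an identity would be safe at the Java level, the following sketch checks that `Math.max(a, a)` and `Math.min(a, a)` return `a` for a handful of float values, including `NaN` and signed zeros. This is an assumption-laden sketch and not the proposed C2 change: the class `MaxIdentitySketch` and the helpers `sameValue` and `identityHolds` are hypothetical names, and covering `min` as well is an addition made here because the failing routine handles both min and max.

```java
public class MaxIdentitySketch {
    // Compares two floats by bit pattern with NaNs canonicalized, so NaN is
    // treated as equal to NaN and 0.0f is distinguished from -0.0f.
    static boolean sameValue(float x, float y) {
        return Float.floatToIntBits(x) == Float.floatToIntBits(y);
    }

    // Hypothetical helper: does max(a, a) == a and min(a, a) == a hold for this input?
    static boolean identityHolds(float a) {
        return sameValue(Math.max(a, a), a) && sameValue(Math.min(a, a), a);
    }

    public static void main(String[] args) {
        float[] samples = {
            0.0f, -0.0f, 1.5f, -3.25f, Float.NaN,
            Float.POSITIVE_INFINITY, Float.NEGATIVE_INFINITY,
            Float.MIN_VALUE, Float.MAX_VALUE
        };
        for (float a : samples) {
            if (!identityHolds(a)) {
                throw new AssertionError("identity does not hold for " + a);
            }
        }
        System.out.println("max(a, a) == a and min(a, a) == a for all samples");
    }
}
```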
------------- PR Comment: https://git.openjdk.org/jdk/pull/17315#issuecomment-1883238177 From qamai at openjdk.org Tue Jan 9 15:32:51 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Tue, 9 Jan 2024 15:32:51 GMT Subject: RFR: 8282365: Consolidate and improve division by constant idealizations [v42] In-Reply-To: References: Message-ID: On Mon, 8 Jan 2024 09:44:51 GMT, Kim Barrett wrote: >> The rules around the include lines in our tests and what we currently have in the tests are messy at the moment. We should fix that when we find the time to. >> >> For HotSpot source code files the includes should be structured as: >> >> hotspot includes >> blank line >> system includes >> >> >> There are some deviations from that, but those should be cleaned up instead of used as a precedent. For our tests we should add "unittest.hpp" at the end of the "hotspot includes" section. > > In the Oracle-internal discussion of include order from about a year ago, there was not a consensus > decision about the position of "unittest.hpp". There was a concern that in some cases it really was > required to be last for some technical reason. That needed (and still needs) investigation. I assume this means that the include order is good as it is now? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/9947#discussion_r1446242593 From stefank at openjdk.org Tue Jan 9 16:00:01 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Tue, 9 Jan 2024 16:00:01 GMT Subject: RFR: 8282365: Consolidate and improve division by constant idealizations [v42] In-Reply-To: References: Message-ID: On Tue, 9 Jan 2024 15:29:44 GMT, Quan Anh Mai wrote: >> In the Oracle-internal discussion of include order from about a year ago, there was not a consensus >> decision about the position of "unittest.hpp". There was a concern that in some cases it really was >> required to be last for some technical reason. That needed (and still needs) investigation.
> > I assume this means that the include order is good as it is now? Please update it to: #include "precompiled.hpp" #include "opto/divconstants.hpp" #include "runtime/os.hpp" #include "utilities/growableArray.hpp" #include "unittest.hpp" #include ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/9947#discussion_r1446279651 From duke at openjdk.org Tue Jan 9 16:47:08 2024 From: duke at openjdk.org (ArsenyBochkarev) Date: Tue, 9 Jan 2024 16:47:08 GMT Subject: RFR: 8317721: RISC-V: Implement CRC32 intrinsic [v4] In-Reply-To: References: Message-ID: > Hi everyone! Please review this port of [AArch64](https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp#L4224) `_updateBytesCRC32`, `_updateByteBufferCRC32` and `_updateCRC32` intrinsics. This patch introduces only the plain (non-vectorized, no Zbc) version. > > ### Correctness checks > > Tier 1/2 tests are ok. > > ### Performance results on T-Head board > > #### Results for enabled intrinsic: > > Used test is `test/micro/org/openjdk/bench/java/util/TestCRC32.java` > > | Benchmark | (count) | Mode | Cnt | Score | Error | Units | > | --- | ---- | ----- | --- | ---- | --- | ---- | > | CRC32.TestCRC32.testCRC32Update | 64 | thrpt | 24 | 3730.929 | 37.773 | ops/ms | > | CRC32.TestCRC32.testCRC32Update | 128 | thrpt | 24 | 2126.673 | 2.032 | ops/ms | > | CRC32.TestCRC32.testCRC32Update | 256 | thrpt | 24 | 1134.330 | 6.714 | ops/ms | > | CRC32.TestCRC32.testCRC32Update | 512 | thrpt | 24 | 584.017 | 2.267 | ops/ms | > | CRC32.TestCRC32.testCRC32Update | 2048 | thrpt | 24 | 151.173 | 0.346 | ops/ms | > | CRC32.TestCRC32.testCRC32Update | 16384 | thrpt | 24 | 19.113 | 0.008 | ops/ms | > | CRC32.TestCRC32.testCRC32Update | 65536 | thrpt | 24 | 4.647 | 0.022 | ops/ms | > > #### Results for disabled intrinsic: > > | Benchmark | (count) | Mode | Cnt | Score | Error | Units | > | --------------------------------------------------- | ---------- | --------- | ---- | ----------- | 
--------- | ---------- | > | CRC32.TestCRC32.testCRC32Update | 64 | thrpt | 15 | 798.365 | 35.486 | ops/ms | > | CRC32.TestCRC32.testCRC32Update | 128 | thrpt | 15 | 677.756 | 46.619 | ops/ms | > | CRC32.TestCRC32.testCRC32Update | 256 | thrpt | 15 | 552.781 | 27.143 | ops/ms | > | CRC32.TestCRC32.testCRC32Update | 512 | thrpt | 15 | 429.304 | 12.518 | ops/ms | > | CRC32.TestCRC32.testCRC32Update | 2048 | thrpt | 15 | 166.738 | 0.935 | ops/ms | > | CRC32.TestCRC32.testCRC32Update | 16384 | thrpt | 15 | 25.060 | 0.034 | ops/ms | > | CRC32.TestCRC32.testCRC32Update | 65536 | thrpt | 15 | 6.196 | 0.030 | ops/ms | ArsenyBochkarev has updated the pull request incrementally with four additional commits since the last revision: - Fix unroll size - Rename constants - Partially unroll loop - Optimize loop counter in L_by16_loop ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17046/files - new: https://git.openjdk.org/jdk/pull/17046/files/a59481b4..046d5530 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17046&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17046&range=02-03 Stats: 33 lines in 1 file changed: 7 ins; 1 del; 25 mod Patch: https://git.openjdk.org/jdk/pull/17046.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17046/head:pull/17046 PR: https://git.openjdk.org/jdk/pull/17046 From duke at openjdk.org Tue Jan 9 16:47:08 2024 From: duke at openjdk.org (ArsenyBochkarev) Date: Tue, 9 Jan 2024 16:47:08 GMT Subject: RFR: 8317721: RISC-V: Implement CRC32 intrinsic [v3] In-Reply-To: <_CysHDX3CV-ZM4ilLgHSRrcDk4DHDNe1ClAKFCV_uoM=.751d91bf-e7e0-4b78-8ff5-2b864c38dd73@github.com> References: <_CysHDX3CV-ZM4ilLgHSRrcDk4DHDNe1ClAKFCV_uoM=.751d91bf-e7e0-4b78-8ff5-2b864c38dd73@github.com> Message-ID: On Thu, 21 Dec 2023 22:20:15 GMT, ArsenyBochkarev wrote: >> Hi everyone! 
Please review this port of [AArch64](https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp#L4224) `_updateBytesCRC32`, `_updateByteBufferCRC32` and `_updateCRC32` intrinsics. This patch introduces only the plain (non-vectorized, no Zbc) version. >> >> ### Correctness checks >> >> Tier 1/2 tests are ok. >> >> ### Performance results on T-Head board >> >> #### Results for enabled intrinsic: >> >> Used test is `test/micro/org/openjdk/bench/java/util/TestCRC32.java` >> >> | Benchmark | (count) | Mode | Cnt | Score | Error | Units | >> | --- | ---- | ----- | --- | ---- | --- | ---- | >> | CRC32.TestCRC32.testCRC32Update | 64 | thrpt | 24 | 3730.929 | 37.773 | ops/ms | >> | CRC32.TestCRC32.testCRC32Update | 128 | thrpt | 24 | 2126.673 | 2.032 | ops/ms | >> | CRC32.TestCRC32.testCRC32Update | 256 | thrpt | 24 | 1134.330 | 6.714 | ops/ms | >> | CRC32.TestCRC32.testCRC32Update | 512 | thrpt | 24 | 584.017 | 2.267 | ops/ms | >> | CRC32.TestCRC32.testCRC32Update | 2048 | thrpt | 24 | 151.173 | 0.346 | ops/ms | >> | CRC32.TestCRC32.testCRC32Update | 16384 | thrpt | 24 | 19.113 | 0.008 | ops/ms | >> | CRC32.TestCRC32.testCRC32Update | 65536 | thrpt | 24 | 4.647 | 0.022 | ops/ms | >> >> #### Results for disabled intrinsic: >> >> | Benchmark | (count) | Mode | Cnt | Score | Error | Units | >> | --------------------------------------------------- | ---------- | --------- | ---- | ----------- | --------- | ---------- | >> | CRC32.TestCRC32.testCRC32Update | 64 | thrpt | 15 | 798.365 | 35.486 | ops/ms | >> | CRC32.TestCRC32.testCRC32Update | 128 | thrpt | 15 | 677.756 | 46.619 | ops/ms | >> | CRC32.TestCRC32.testCRC32Update | 256 | thrpt | 15 | 552.781 | 27.143 | ops/ms | >> | CRC32.TestCRC32.testCRC32Update | 512 | thrpt | 15 | 429.304 | 12.518 | ops/ms | >> | CRC32.TestCRC32.testCRC32Update | 2048 | thrpt | 15 | 166.738 | 0.935 | ops/ms | >> | CRC32.TestCRC32.testCRC32Update | 16384 | thrpt | 15 | 25.060 | 0.034 | ops/ms | >> | 
CRC32.TestCRC32.testCRC32Update | 65536 | thrpt | 15 | 6.196 | 0.030 | ops/ms | > > ArsenyBochkarev has updated the pull request incrementally with five additional commits since the last revision: > > - Use MacroAssembler::lwu instead of Assembler::lwu > - Save instruction when getting table3 address > - Left note on how table elements are accessed > - Fix comment for result register > - Remove unused L_by16 label Hello again everyone! I was able to optimize regressions for most cases on big amount of data by partially unrolling the big loop and disposing from loop counter (previously in `len` register). Results for `-XX:+UseZba` of StarFive VisionFive2 board: | Benchmark | (count) | Mode | Cnt | Score | Error | Units | | ------------------------------ | ------------ | --------- | ----- | ---------- | ----------- | ----- | | CRC32.TestCRC32.testCRC32Update | 64 | thrpt | 12 | 4215.728 | 3.972 | ops/ms | | CRC32.TestCRC32.testCRC32Update | 128 | thrpt | 12 | 2607.882 | 1.627 | ops/ms | | CRC32.TestCRC32.testCRC32Update | 256 | thrpt | 12 | 1364.899 | 8.857 | ops/ms | | CRC32.TestCRC32.testCRC32Update | 512 | thrpt | 12 | 704.316 | 3.222 | ops/ms | | CRC32.TestCRC32.testCRC32Update | 2048 | thrpt | 12 | 180.738 | 0.474 | ops/ms | | CRC32.TestCRC32.testCRC32Update | 16384 | thrpt | 12 | 22.722 | 0.059 | ops/ms | | CRC32.TestCRC32.testCRC32Update | 65536 | thrpt | 12 | 5.327 | 0.019 | ops/ms | while the results for `-XX:-UseCRC32Intrinsics` are [here](https://github.com/openjdk/jdk/pull/17046#issuecomment-1850364667) ------------- PR Comment: https://git.openjdk.org/jdk/pull/17046#issuecomment-1883404214 From jbhateja at openjdk.org Tue Jan 9 16:48:56 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 9 Jan 2024 16:48:56 GMT Subject: RFR: 8322768: Optimize non-subword vector compress and expand APIs for AVX2 target. 
[v5] In-Reply-To: <_udOCEVG86x9V_WvYqFTaYnvmXdiZ7LzqxzR-D_ygYs=.db3ed37e-a8fa-413b-83c6-a785aba072ff@github.com> References: <_udOCEVG86x9V_WvYqFTaYnvmXdiZ7LzqxzR-D_ygYs=.db3ed37e-a8fa-413b-83c6-a785aba072ff@github.com> Message-ID: > Hi, > > Patch optimizes non-subword vector compress and expand APIs for x86 AVX2 only targets. > Upcoming E-core Xeons (Sierra Forest) and Hybrid CPUs only support AVX2 instruction set. > These are very frequently used APIs in columnar database filter operation. > > Implementation uses a lookup table to record permute indices. Table index is computed using > mask argument of compress/expand operation. > > Following are the performance number of JMH micro included with the patch. > > > System : Intel(R) Xeon(R) Platinum 8480+ (Sapphire Rapids) > > Baseline: > Benchmark (size) Mode Cnt Score Error Units > ColumnFilterBenchmark.filterDoubleColumn 1024 thrpt 2 142.767 ops/ms > ColumnFilterBenchmark.filterDoubleColumn 2047 thrpt 2 71.436 ops/ms > ColumnFilterBenchmark.filterDoubleColumn 4096 thrpt 2 35.992 ops/ms > ColumnFilterBenchmark.filterFloatColumn 1024 thrpt 2 182.151 ops/ms > ColumnFilterBenchmark.filterFloatColumn 2047 thrpt 2 91.096 ops/ms > ColumnFilterBenchmark.filterFloatColumn 4096 thrpt 2 44.757 ops/ms > ColumnFilterBenchmark.filterIntColumn 1024 thrpt 2 184.099 ops/ms > ColumnFilterBenchmark.filterIntColumn 2047 thrpt 2 91.981 ops/ms > ColumnFilterBenchmark.filterIntColumn 4096 thrpt 2 45.170 ops/ms > ColumnFilterBenchmark.filterLongColumn 1024 thrpt 2 148.017 ops/ms > ColumnFilterBenchmark.filterLongColumn 2047 thrpt 2 73.516 ops/ms > ColumnFilterBenchmark.filterLongColumn 4096 thrpt 2 36.844 ops/ms > > Withopt: > Benchmark (size) Mode Cnt Score Error Units > ColumnFilterBenchmark.filterDoubleColumn 1024 thrpt 2 2051.707 ops/ms > ColumnFilterBenchmark.filterDoubleColumn 2047 thrpt 2 914.072 ops/ms > ColumnFilterBenchmark.filterDoubleColumn 4096 thrpt 2 489.898 ops/ms > ColumnFilterBenchmark.filterFloatColumn 1024 thrpt 2 
5324.195 ops/ms > ColumnFilterBenchmark.filterFloatColumn 2047 thrpt 2 2587.229 ops/ms > ColumnFilterBenchmark.filterFloatColumn 4096 thrpt 2 1278.665 ops/ms > ColumnFilterBenchmark.filterIntColumn 1024 thrpt 2 4149.384 ops/ms > ColumnFilterBenchmark.filterIntColumn 2047 thrpt 2 1791.170 ops/ms > ColumnFilterBenchmark.filterIntColumn 4096... Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Using emulated variable blend E-Core optimized instruction. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17261/files - new: https://git.openjdk.org/jdk/pull/17261/files/257a6351..c3f1c50e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17261&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17261&range=03-04 Stats: 28 lines in 4 files changed: 18 ins; 0 del; 10 mod Patch: https://git.openjdk.org/jdk/pull/17261.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17261/head:pull/17261 PR: https://git.openjdk.org/jdk/pull/17261 From roland at openjdk.org Tue Jan 9 16:51:02 2024 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 9 Jan 2024 16:51:02 GMT Subject: RFR: 8319793: C2 compilation fails with "Bad graph detected in build_loop_late" after JDK-8279888 [v13] In-Reply-To: References: Message-ID: > Range check smearing and range check predication make an array access > dependent on 2 (or more in the case of RC smearing) conditions. As a > consequence, if a range check can be eliminated because there's an > identical dominating range check, the control dependent nodes that > could float and become dependent on the dominating range check cannot > be allowed to float because there's a risk that they would then bypass > one of the checks that make the access legal. > > `IfNode::dominated_by()` and `PhaseIdealLoop::dominated_by()` have > logic to prevent this: nodes that are control dependent on a range > check or predicate are not allowed to float. 
This is however not > sufficient as demonstrated by the test cases. > > In `TestArrayAccessAboveRCAfterSmearingOrPredication.testRangeCheckSmearing()`: > > > v += array[i]; > if (flag2) { > if (flag3) { > field = 0x42; > } > } > if (flagField == 1) { > v += array[i]; > } > > > The range check for the second `array[i]` load is replaced by the > dominating range check for the first `array[i]` but because the second > `array[i]` load could really be dependent on multiple range checks (in > case smearing happened which is not the case here), c2 doesn't allow > the second `array[i]` to float when the second range check is > removed. The second `array[i]` is then control dependent on: > > > if (flagField == 1) { > > > which is next found to be dominated by the same test: > > > if (flag == 1) { > > > and is removed. However nothing in `dominated_by()` treats node > dependent on tests that are not range check or predicates > specially. So the second `array[i]` is allowed to float and become > dependent on: > > > if (flag == 1) { > > > which is above the range check for that access. The test method in its > last invocation is passed an index for the array access that's widely > out of range. The array load happens before the range check and > crashes the VM. `testLoopPredication()` is a similar test where array > loads become dependent on predicates and end up above range checks. > > `TestArrayAccessCastIIAboveRC.java` is the test case from the bug > where for similar reasons a range check `CastII` ends up above its > range check, becomes top because its input becomes some integer that > conflicts with its type (but there's no condition to catch it). The > graph becomes broken and c2 crashes. > > Logic in the `dominated_by()` methods ... 
Roland Westrelin has updated the pull request incrementally with six additional commits since the last revision: - Update src/hotspot/share/opto/loopopts.cpp Co-authored-by: Emanuel Peter - Update src/hotspot/share/opto/loopopts.cpp Co-authored-by: Emanuel Peter - Update src/hotspot/share/opto/loopnode.hpp Co-authored-by: Emanuel Peter - Update src/hotspot/share/opto/ifnode.cpp Co-authored-by: Emanuel Peter - Update src/hotspot/share/opto/ifnode.cpp Co-authored-by: Emanuel Peter - Update src/hotspot/share/opto/cfgnode.hpp Co-authored-by: Emanuel Peter ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16886/files - new: https://git.openjdk.org/jdk/pull/16886/files/04a9d3a5..372021b6 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16886&range=12 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16886&range=11-12 Stats: 6 lines in 4 files changed: 0 ins; 0 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/16886.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16886/head:pull/16886 PR: https://git.openjdk.org/jdk/pull/16886 From duke at openjdk.org Tue Jan 9 16:56:50 2024 From: duke at openjdk.org (Zhiqiang Zang) Date: Tue, 9 Jan 2024 16:56:50 GMT Subject: RFR: 8322589: Add Ideal transformation: (~a) & (~b) => ~(a | b) [v11] In-Reply-To: <-pwjKEB97C-bM068JQN0PY1hl65IzcuQZfHzRoKu92g=.d6118773-20d2-46cd-9284-5168c9334bb5@github.com> References: <-pwjKEB97C-bM068JQN0PY1hl65IzcuQZfHzRoKu92g=.d6118773-20d2-46cd-9284-5168c9334bb5@github.com> Message-ID: > Hello, > > `(~a) & (~b) => ~(a | b)` is a widely seen pattern, for example it is implemented for LLVM [here](https://github.com/llvm/llvm-project/blob/397f1ce9efb4eea1ee10fe4833f733b8c7abd878/llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp#L1616C28-L1616C28); however it is missing in current implementation of hotspot. This pull request adds this transformation and associated tests. > > Thanks. 
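
The rewrite proposed above is De Morgan's law on bit vectors. A minimal Java sketch that checks the equivalence on a few sample values; `DeMorganDemo` and its method names are hypothetical and only illustrate the pattern, they are not taken from the pull request:

```java
final class DeMorganDemo {
    // Original shape: two NOTs followed by one AND (three operations).
    static int andOfNots(int a, int b) {
        return (~a) & (~b);
    }

    // Transformed shape: one OR followed by one NOT (two operations).
    static int notOfOr(int a, int b) {
        return ~(a | b);
    }

    // Same pair of shapes for 64-bit values.
    static long andOfNotsLong(long a, long b) {
        return (~a) & (~b);
    }

    static long notOfOrLong(long a, long b) {
        return ~(a | b);
    }

    public static void main(String[] args) {
        int[] samples = {0, 1, -1, 42, 0x55555555, Integer.MIN_VALUE, Integer.MAX_VALUE};
        boolean allEqual = true;
        for (int a : samples) {
            for (int b : samples) {
                allEqual &= andOfNots(a, b) == notOfOr(a, b);
                allEqual &= andOfNotsLong(a, b) == notOfOrLong(a, b);
            }
        }
        System.out.println("both shapes agree on all samples: " + allEqual);
    }
}
```

The two shapes compute the same value for every input, so the benefit of the transformation is only that the second shape needs one operation fewer.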
Zhiqiang Zang has updated the pull request incrementally with three additional commits since the last revision: - Revert "move the two helper functions to member functions of the node class." This reverts commit 7a962d69ac687c0476e54cd004037f2ebb0800ba. - Revert "update copyright dates." This reverts commit 3665de2f3d38b4fce3ba968ec6247dd2576ecc32. - Revert "move make_not back to AddNode because it cannot compile for architecture arm, s390x, ppc64le." This reverts commit 4ee8b089b6f119b1692dcd2c512d249d92734697. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16333/files - new: https://git.openjdk.org/jdk/pull/16333/files/4ee8b089..65942221 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16333&range=10 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16333&range=09-10 Stats: 21 lines in 5 files changed: 8 ins; 8 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/16333.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16333/head:pull/16333 PR: https://git.openjdk.org/jdk/pull/16333 From roland at openjdk.org Tue Jan 9 17:02:48 2024 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 9 Jan 2024 17:02:48 GMT Subject: RFR: 8319793: C2 compilation fails with "Bad graph detected in build_loop_late" after JDK-8279888 [v14] In-Reply-To: References: Message-ID: > Range check smearing and range check predication make an array access > dependent on 2 (or more in the case of RC smearing) conditions. As a > consequence, if a range check can be eliminated because there's an > identical dominating range check, the control dependent nodes that > could float and become dependent on the dominating range check cannot > be allowed to float because there's a risk that they would then bypass > one of the checks that make the access legal. > > `IfNode::dominated_by()` and `PhaseIdealLoop::dominated_by()` have > logic to prevent this: nodes that are control dependent on a range > check or predicate are not allowed to float. 
This is however not > sufficient as demonstrated by the test cases. > > In `TestArrayAccessAboveRCAfterSmearingOrPredication.testRangeCheckSmearing()`: > > > v += array[i]; > if (flag2) { > if (flag3) { > field = 0x42; > } > } > if (flagField == 1) { > v += array[i]; > } > > > The range check for the second `array[i]` load is replaced by the > dominating range check for the first `array[i]` but because the second > `array[i]` load could really be dependent on multiple range checks (in > case smearing happened which is not the case here), c2 doesn't allow > the second `array[i]` to float when the second range check is > removed. The second `array[i]` is then control dependent on: > > > if (flagField == 1) { > > > which is next found to be dominated by the same test: > > > if (flag == 1) { > > > and is removed. However nothing in `dominated_by()` treats node > dependent on tests that are not range check or predicates > specially. So the second `array[i]` is allowed to float and become > dependent on: > > > if (flag == 1) { > > > which is above the range check for that access. The test method in its > last invocation is passed an index for the array access that's widely > out of range. The array load happens before the range check and > crashes the VM. `testLoopPredication()` is a similar test where array > loads become dependent on predicates and end up above range checks. > > `TestArrayAccessCastIIAboveRC.java` is the test case from the bug > where for similar reasons a range check `CastII` ends up above its > range check, becomes top because its input becomes some integer that > conflicts with its type (but there's no condition to catch it). The > graph becomes broken and c2 crashes. > > Logic in the `dominated_by()` methods ... 
Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: review ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16886/files - new: https://git.openjdk.org/jdk/pull/16886/files/372021b6..998d030e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16886&range=13 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16886&range=12-13 Stats: 4 lines in 1 file changed: 4 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/16886.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16886/head:pull/16886 PR: https://git.openjdk.org/jdk/pull/16886 From roland at openjdk.org Tue Jan 9 17:02:50 2024 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 9 Jan 2024 17:02:50 GMT Subject: RFR: 8319793: C2 compilation fails with "Bad graph detected in build_loop_late" after JDK-8279888 [v12] In-Reply-To: <9VeaoeqFVJApx5G4UWDoZM8UqZgm4lKlDiRxxZoha5c=.62c6c314-ae6c-42da-9ebd-de9d200b39ce@github.com> References: <7WZcrJtRFhNaGiw4c_ov_XW-dWcJ5-GdRy_8Vh2ikWA=.0cdfcad8-eb9b-43ee-b160-ad96f698a0b6@github.com> <9VeaoeqFVJApx5G4UWDoZM8UqZgm4lKlDiRxxZoha5c=.62c6c314-ae6c-42da-9ebd-de9d200b39ce@github.com> Message-ID: On Tue, 9 Jan 2024 13:51:47 GMT, Emanuel Peter wrote: >> Roland Westrelin has updated the pull request incrementally with five additional commits since the last revision: >> >> - Update src/hotspot/share/opto/ifnode.cpp >> >> Co-authored-by: Christian Hagedorn >> - Update src/hotspot/share/opto/node.hpp >> >> Co-authored-by: Christian Hagedorn >> - Update src/hotspot/share/opto/loopopts.cpp >> >> Co-authored-by: Christian Hagedorn >> - Update src/hotspot/share/opto/loopopts.cpp >> >> Co-authored-by: Christian Hagedorn >> - Update src/hotspot/share/opto/loopPredicate.cpp >> >> Co-authored-by: Christian Hagedorn > > Otherwise LGTM ? @eme64 @chhagedorn thanks for the suggestions. I made the change you requested. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/16886#issuecomment-1883434825 From epeter at openjdk.org Tue Jan 9 17:10:36 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 9 Jan 2024 17:10:36 GMT Subject: RFR: 8319793: C2 compilation fails with "Bad graph detected in build_loop_late" after JDK-8279888 [v14] In-Reply-To: References: Message-ID: On Tue, 9 Jan 2024 17:02:48 GMT, Roland Westrelin wrote: >> Range check smearing and range check predication make an array access >> dependent on 2 (or more in the case of RC smearing) conditions. As a >> consequence, if a range check can be eliminated because there's an >> identical dominating range check, the control dependent nodes that >> could float and become dependent on the dominating range check cannot >> be allowed to float because there's a risk that they would then bypass >> one of the checks that make the access legal. >> >> `IfNode::dominated_by()` and `PhaseIdealLoop::dominated_by()` have >> logic to prevent this: nodes that are control dependent on a range >> check or predicate are not allowed to float. This is however not >> sufficient as demonstrated by the test cases. >> >> In `TestArrayAccessAboveRCAfterSmearingOrPredication.testRangeCheckSmearing()`: >> >> >> v += array[i]; >> if (flag2) { >> if (flag3) { >> field = 0x42; >> } >> } >> if (flagField == 1) { >> v += array[i]; >> } >> >> >> The range check for the second `array[i]` load is replaced by the >> dominating range check for the first `array[i]` but because the second >> `array[i]` load could really be dependent on multiple range checks (in >> case smearing happened which is not the case here), c2 doesn't allow >> the second `array[i]` to float when the second range check is >> removed. The second `array[i]` is then control dependent on: >> >> >> if (flagField == 1) { >> >> >> which is next found to be dominated by the same test: >> >> >> if (flag == 1) { >> >> >> and is removed. 
However nothing in `dominated_by()` treats node >> dependent on tests that are not range check or predicates >> specially. So the second `array[i]` is allowed to float and become >> dependent on: >> >> >> if (flag == 1) { >> >> >> which is above the range check for that access. The test method in its >> last invocation is passed an index for the array access that's widely >> out of range. The array load happens before the range check and >> crashes the VM. `testLoopPredication()` is a similar test where array >> loads become dependent on predicates and end up above range checks. >> >> `TestArrayAccessCastIIAboveRC.java` is the test case from the bug >> where for similar reasons a range check `CastII` ends up above its >> range check, becomes top because its input becomes some integer that >> conflicts with its... > > Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: > > review Thanks, still LGTM ------------- Marked as reviewed by epeter (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/16886#pullrequestreview-1811622944 From epeter at openjdk.org Tue Jan 9 17:21:06 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 9 Jan 2024 17:21:06 GMT Subject: RFR: 8315361: C2 SuperWord: refactor out loop analysis into shared auto-vectorization facility VLoopAnalyzer [v8] In-Reply-To: References: Message-ID: > This is a refactoring of `SuperWord`. > > **Goals** > > 1. Clean up `SuperWord`: disentangle different components, make them more **modular**. > 2. Make the loop analysis parts a **shared facility**, not just for SuperWord but also the post-loop-vectorizer ([JDK-8308994](https://bugs.openjdk.org/browse/JDK-8308994)). > 3. It is also a necessary step on my bigger plans for improvement with the C2 Auto-Vectorizer ([see my blog post](https://eme64.github.io/blog/2023/11/03/C2-AutoVectorizer-Improvement-Ideas.html)). > 4. Improve tracing in the auto-vectorization by making it more systematic. 
> > **Summary** > > - I wrote a summary of how C2 auto-vectorization with SuperWord works (please read!): > https://github.com/openjdk/jdk/blob/95fd361e60fc66eb91edad321662e508b2d1bdde/src/hotspot/share/opto/superword.hpp#L32-L177 > - I moved many `Superword` components out to `VLoop` and its subclass `VLoopAnalyzer`. The idea is that any vectorizer can use these facilities in the future. They are therefore made more modular, which should hopefully make future changes easier. These components are: > - Checking the pre-conditions for vectorization (e.g. no unwanted ctrl-flow). > - `VLoop::check_preconditions_helper` replaces code from old `SuperWord::transform_loop`. > - Running all submodules of `VLoopAnalyzer`: `VLoopAnalyzer::analyze_helper`. Replaces analysis part of `SuperWord::SLP_extract`. > - Finding and marking reductions -> `VLoopReductions` > - Detecting memory slices -> `VLoopMemorySlices` > - Analyzing the body -> `VLoopBody` (renamed `in_bb` -> `in_body`) > - Determining vector element types, and functions to determine the `vector_width` of a node -> `VLoopTypes` > - Constructing the dependence graph -> `VLoopDependenceGraph`. Replaces old `DepGraph` with all its components. > - New: CompileCommand option `TraceAutovectorization` > - Run with `-XX:CompileCommand=traceAutovectorization,*::*,help` to get a usage description. > - Replaced all printing with flags `TraceSuperWord` (and `Verbose`) and of `VectorizeDebug`. > - The advantage of a CompileCommand is that tracing can be applied selectively for only a limited set of java classes / methods. > - It uses tags, which are more readable than the `VectorizeDebug` bit-flags. These tags can be used for all parts of the vectorizer, but one can also target SuperWord specifically. > - I systematically added tracing at every point where vectorization (partially) fails (use tag `SW_REJECTIONS`). > - `TraceSuperWord` still works, and perfo... 
Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: update copyright for 2024 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16620/files - new: https://git.openjdk.org/jdk/pull/16620/files/1f5d4ef2..4302f58b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16620&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16620&range=06-07 Stats: 21 lines in 21 files changed: 0 ins; 0 del; 21 mod Patch: https://git.openjdk.org/jdk/pull/16620.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16620/head:pull/16620 PR: https://git.openjdk.org/jdk/pull/16620 From duke at openjdk.org Tue Jan 9 17:23:58 2024 From: duke at openjdk.org (Zhiqiang Zang) Date: Tue, 9 Jan 2024 17:23:58 GMT Subject: RFR: 8322077: Add Ideal transformation: (~a) | (~b) => ~(a & b) [v12] In-Reply-To: References: Message-ID: > Hello, > > (~a) | (~b) => ~(a & b) is a widely seen pattern, for example it is implemented for LLVM [here](https://github.com/llvm/llvm-project/blob/397f1ce9efb4eea1ee10fe4833f733b8c7abd878/llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp#L1617C28-L1617C28); however it is missing in current implementation of hotspot. This pull request adds this transformation and associated tests. > > Thanks. Zhiqiang Zang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 18 additional commits since the last revision: - update copyright dates. - Merge branch 'andnode-PNewDeMorganLawAndToOr' into ornode-PNewDeMorganLawOrToAnd - Revert "adapt changes from the dependent pr." This reverts commit 0c8d10776fe55a28ea9f60160b4f6f57aabad5ab. - Revert "adapt to new changes from the dependant pr." This reverts commit b21e242b306d9cf01e1ba84ebba47ef5b471d98e. - adapt to new changes from the dependant pr. 
- Merge branch 'andnode-PNewDeMorganLawAndToOr' into ornode-PNewDeMorganLawOrToAnd - adapt changes from the dependent pr. - Merge branch 'andnode-PNewDeMorganLawAndToOr' into ornode-PNewDeMorganLawOrToAnd - Update test/hotspot/jtreg/compiler/c2/irTests/DeMorganLawLongTests.java Co-authored-by: Tobias Hartmann - Update test/hotspot/jtreg/compiler/c2/irTests/DeMorganLawIntTests.java Co-authored-by: Tobias Hartmann - ... and 8 more: https://git.openjdk.org/jdk/compare/8ab76889...dc60a548 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16334/files - new: https://git.openjdk.org/jdk/pull/16334/files/b21e242b..dc60a548 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16334&range=11 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16334&range=10-11 Stats: 23 lines in 5 files changed: 8 ins; 8 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/16334.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16334/head:pull/16334 PR: https://git.openjdk.org/jdk/pull/16334 From qamai at openjdk.org Tue Jan 9 17:27:38 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Tue, 9 Jan 2024 17:27:38 GMT Subject: RFR: 8282365: Consolidate and improve division by constant idealizations [v44] In-Reply-To: References: Message-ID: > This patch implements idealisation for unsigned divisions to change a division by a constant into a series of multiplication and shift. I also change the idealisation of `DivI` to get a more efficient series when the magic constant overflows an int32. 
>
> In general, the idea behind a signed division transformation is that for a positive constant `d`, we need to find constants `c` and `m` so that:
>
>     floor(x / d) = floor(x * c / 2**m)        for 0 < x < 2**(N - 1)       (1)
>     ceil(x / d) = floor(x * c / 2**m) + 1     for -2**(N - 1) <= x < 0     (2)
>
> The implementation in the original book takes into consideration that the machine may not be able to perform the full multiplication `x * c`, so the constant may overflow and the dividend must be added back, as in the `DivLNode::Ideal` cases. However, for int32 division, `x * c` cannot overflow an int64. As a result, it is always feasible to just calculate the product and shift the result.
>
> For unsigned division, the situation is somewhat trickier because the condition needs to be twice as strong (conditions (1) and (2) above are mostly the same). As a result, the magic constant `c` computed with the method presented in Hacker's Delight by Henry S. Warren, Jr. may overflow a uintN. For int division, we can depend on the theorem devised by Arch D. Robison in N-Bit Unsigned Division Via N-Bit Multiply-Add, which states that there exists either:
>
>     c1 in uint32 and m1, such that floor(x / d) = floor(x * c1 / 2**m1)         for 0 < x < 2**32    (3)
>     c2 in uint32 and m2, such that floor(x / d) = floor((x + 1) * c2 / 2**m2)   for 0 < x < 2**32    (4)
>
> which means that either `x * c1` never overflows a uint64 or `(x + 1) * c2` never overflows a uint64, so we can perform a full multiplication.
>
> For longs, there is no way to do a full multiplication, so we do some basic transformations to achieve a computable formula. I have written the details as comments in the overflow case.
>
> More tests are added to cover the possible patterns.
>
> Please take a look and review. Thank you very much.
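Conditions (1) and (2) above can be checked on a small case. The following is a minimal sketch, not code from the patch: the class name, the divisor `d = 7`, and the constants `C = ceil(2**34 / 7)` and `M = 34` are assumptions chosen for illustration. Because `C` fits in an unsigned 32-bit value, the product `x * C` fits in a signed 64-bit `long` for every int32 `x`.

```java
class MagicDivDemo {
    // Hypothetical constants for d = 7, chosen for this sketch (not taken from the patch).
    // C = ceil(2^34 / 7) = 2454267027, which is below 2^32.
    static final long C = 2454267027L;
    static final int M = 34;

    // Signed division by 7 using one 64-bit multiply and one shift.
    static int divBy7(int x) {
        long product = (long) x * C;   // |x * C| < 2^31 * 2^32, so no int64 overflow
        int q = (int) (product >> M);  // arithmetic shift computes floor(x * C / 2^M)
        return x < 0 ? q + 1 : q;      // condition (2): add 1 for negative dividends
    }

    public static void main(String[] args) {
        int[] samples = {0, 1, 6, 7, 8, -1, -6, -7, -8, Integer.MAX_VALUE, Integer.MIN_VALUE};
        for (int x : samples) {
            // Java's '/' truncates toward zero, which is what (1) and (2) together describe.
            if (divBy7(x) != x / 7) {
                throw new AssertionError("mismatch at x = " + x);
            }
        }
        System.out.println("ok");
    }
}
```

The sketch only covers the case where the magic constant fits in 32 bits; the overflow cases discussed in the description need the extra handling that the patch implements.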
Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: update include order and license year ------------- Changes: - all: https://git.openjdk.org/jdk/pull/9947/files - new: https://git.openjdk.org/jdk/pull/9947/files/bba52b74..db80bd4a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=9947&range=43 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=9947&range=42-43 Stats: 15 lines in 13 files changed: 1 ins; 1 del; 13 mod Patch: https://git.openjdk.org/jdk/pull/9947.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/9947/head:pull/9947 PR: https://git.openjdk.org/jdk/pull/9947 From qamai at openjdk.org Tue Jan 9 17:27:39 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Tue, 9 Jan 2024 17:27:39 GMT Subject: RFR: 8282365: Consolidate and improve division by constant idealizations [v42] In-Reply-To: References: Message-ID: <111zTZKk5JMIxSLq08oZKKvd4xYn4ZKhH8VjOxX0BVI=.b3ed64da-03e1-4ebe-8af7-124a152292cd@github.com> On Tue, 9 Jan 2024 15:56:27 GMT, Stefan Karlsson wrote: >> I assume this means that the include order is good as it is now? > > Please update it to: > > #include "precompiled.hpp" > #include "opto/divconstants.hpp" > #include "runtime/os.hpp" > #include "utilities/growableArray.hpp" > #include "unittest.hpp" > > #include Got it, fixed. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/9947#discussion_r1446390806 From epeter at openjdk.org Tue Jan 9 17:35:21 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 9 Jan 2024 17:35:21 GMT Subject: RFR: 8321712: C2: "failed: Multiple uses of register" in C2_MacroAssembler::vminmax_fp In-Reply-To: References: Message-ID: On Tue, 9 Jan 2024 00:01:04 GMT, Sandhya Viswanathan wrote: > The C2_MacroAssembler::vminmax_fp and evminmax_fp were incorrectly checking for the two USE operand registers to be different and thus resulting in assertion failure. 
> > In x86_64.ad: > > instruct maxF_reg(legRegF dst, legRegF a, legRegF b, legRegF tmp, legRegF atmp, legRegF btmp) %{ > ... > effect(USE a, USE b, TEMP tmp, TEMP atmp, TEMP btmp); > ... > __ vminmax_fp(Op_MaxV, T_FLOAT, $dst$$XMMRegister, $a$$XMMRegister, $b$$XMMRegister, $tmp$$XMMRegister, $atmp$$XMMRegister, $btmp$$XMMRegister, Assembler::AVX_128bit); > %} > > > Changing the assert in vminmax_fp from: > assert_different_registers(a, b, tmp, atmp, btmp); > to: > assert_different_registers(a, tmp, atmp, btmp); > assert_different_registers(b, tmp, atmp, btmp); > fixes the issue. > > Similar change done in evminmax_fp. > > Please review. > > Best Regards, > Sandhya Also: do you have a regression test for this? ------------- PR Comment: https://git.openjdk.org/jdk/pull/17315#issuecomment-1883492250 From sviswanathan at openjdk.org Tue Jan 9 18:03:37 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Tue, 9 Jan 2024 18:03:37 GMT Subject: RFR: 8321712: C2: "failed: Multiple uses of register" in C2_MacroAssembler::vminmax_fp [v2] In-Reply-To: References: Message-ID: <1MyRVb7ND-RQj3XrqDQfJt7KGQ5tCAjRdRjUDVfkqOM=.db88d703-26b5-495b-9bb9-5acebce915e1@github.com> On Tue, 9 Jan 2024 02:25:15 GMT, Vladimir Kozlov wrote: >> Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision: >> >> review comments > > Should we "short cut" code when registers are the same? @vnkozlov @jatin-bhateja Your review comments are addressed, please take a look. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/17315#issuecomment-1883534160 From sviswanathan at openjdk.org Tue Jan 9 18:03:35 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Tue, 9 Jan 2024 18:03:35 GMT Subject: RFR: 8321712: C2: "failed: Multiple uses of register" in C2_MacroAssembler::vminmax_fp [v2] In-Reply-To: References: Message-ID: > The C2_MacroAssembler::vminmax_fp and evminmax_fp were incorrectly checking for the two USE operand registers to be different and thus resulting in assertion failure. > > In x86_64.ad: > > instruct maxF_reg(legRegF dst, legRegF a, legRegF b, legRegF tmp, legRegF atmp, legRegF btmp) %{ > ... > effect(USE a, USE b, TEMP tmp, TEMP atmp, TEMP btmp); > ... > __ vminmax_fp(Op_MaxV, T_FLOAT, $dst$$XMMRegister, $a$$XMMRegister, $b$$XMMRegister, $tmp$$XMMRegister, $atmp$$XMMRegister, $btmp$$XMMRegister, Assembler::AVX_128bit); > %} > > > Changing the assert in vminmax_fp from: > assert_different_registers(a, b, tmp, atmp, btmp); > to: > assert_different_registers(a, tmp, atmp, btmp); > assert_different_registers(b, tmp, atmp, btmp); > fixes the issue. > > Similar change done in evminmax_fp. > > Please review. 
> > Best Regards, > Sandhya Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision: review comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17315/files - new: https://git.openjdk.org/jdk/pull/17315/files/aee22d07..55c6e32e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17315&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17315&range=00-01 Stats: 23 lines in 3 files changed: 23 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/17315.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17315/head:pull/17315 PR: https://git.openjdk.org/jdk/pull/17315 From epeter at openjdk.org Tue Jan 9 18:06:23 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 9 Jan 2024 18:06:23 GMT Subject: RFR: 8321712: C2: "failed: Multiple uses of register" in C2_MacroAssembler::vminmax_fp [v2] In-Reply-To: <1MyRVb7ND-RQj3XrqDQfJt7KGQ5tCAjRdRjUDVfkqOM=.db88d703-26b5-495b-9bb9-5acebce915e1@github.com> References: <1MyRVb7ND-RQj3XrqDQfJt7KGQ5tCAjRdRjUDVfkqOM=.db88d703-26b5-495b-9bb9-5acebce915e1@github.com> Message-ID: <7i0OLV4-N2s3xtJ1cD0zcwFcqQQKWqfQ4ZlxKVaPcJA=.2dbe8082-120d-47d6-a131-656780d9b3e2@github.com> On Tue, 9 Jan 2024 18:00:59 GMT, Sandhya Viswanathan wrote: >> Should we "short cut" code when registers are the same? > > @vnkozlov @jatin-bhateja Your review comments are addressed, please take a look. @sviswa7 but is the "same address" not an indication of a missing ideal transformation? Hence, the assert may actually be ok, and the root cause be fixed in the ideal transformation. I think this maybe what @jatin-bhateja was suggesting. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/17315#issuecomment-1883537937 From sviswanathan at openjdk.org Tue Jan 9 18:14:21 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Tue, 9 Jan 2024 18:14:21 GMT Subject: RFR: 8321712: C2: "failed: Multiple uses of register" in C2_MacroAssembler::vminmax_fp [v2] In-Reply-To: <7i0OLV4-N2s3xtJ1cD0zcwFcqQQKWqfQ4ZlxKVaPcJA=.2dbe8082-120d-47d6-a131-656780d9b3e2@github.com> References: <1MyRVb7ND-RQj3XrqDQfJt7KGQ5tCAjRdRjUDVfkqOM=.db88d703-26b5-495b-9bb9-5acebce915e1@github.com> <7i0OLV4-N2s3xtJ1cD0zcwFcqQQKWqfQ4ZlxKVaPcJA=.2dbe8082-120d-47d6-a131-656780d9b3e2@github.com> Message-ID: On Tue, 9 Jan 2024 18:03:40 GMT, Emanuel Peter wrote: >> @vnkozlov @jatin-bhateja Your review comments are addressed, please take a look. > > @sviswa7 but is the "same address" not an indication of a missing ideal transformation? Hence, the assert may actually be ok, and the root cause be fixed in the ideal transformation. I think this maybe what @jatin-bhateja was suggesting. @eme64 Probably, but my goal here is limited. We have to fix this PR within RDP2 i.e. asap. That is why I kept the changes to minimum. On Vladimir's request I have added a minimum change to handle the case when a and b are same. ------------- PR Comment: https://git.openjdk.org/jdk/pull/17315#issuecomment-1883549079 From epeter at openjdk.org Tue Jan 9 18:17:22 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 9 Jan 2024 18:17:22 GMT Subject: RFR: 8321712: C2: "failed: Multiple uses of register" in C2_MacroAssembler::vminmax_fp [v2] In-Reply-To: References: <1MyRVb7ND-RQj3XrqDQfJt7KGQ5tCAjRdRjUDVfkqOM=.db88d703-26b5-495b-9bb9-5acebce915e1@github.com> <7i0OLV4-N2s3xtJ1cD0zcwFcqQQKWqfQ4ZlxKVaPcJA=.2dbe8082-120d-47d6-a131-656780d9b3e2@github.com> Message-ID: On Tue, 9 Jan 2024 18:11:18 GMT, Sandhya Viswanathan wrote: >> @sviswa7 but is the "same address" not an indication of a missing ideal transformation? 
Hence, the assert may actually be ok, and the root cause be fixed in the ideal transformation. I think this maybe what @jatin-bhateja was suggesting. > > @eme64 Probably, but my goal here is limited. We have to fix this PR within RDP2 i.e. asap. That is why I kept the changes to minimum. On Vladimir's request I have added a minimum change to handle the case when a and b are same. @sviswa7 Ok, I understand. But a regression test would still be good. We should just reduce the regression test attached to https://bugs.openjdk.org/browse/JDK-8322090, @TobiHartmann mentioned it on JIRA. I guess we can also file a follow up RFE to improve the Ideal transformations. ------------- PR Comment: https://git.openjdk.org/jdk/pull/17315#issuecomment-1883554067 From sviswanathan at openjdk.org Tue Jan 9 18:17:22 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Tue, 9 Jan 2024 18:17:22 GMT Subject: RFR: 8321712: C2: "failed: Multiple uses of register" in C2_MacroAssembler::vminmax_fp [v2] In-Reply-To: References: <1MyRVb7ND-RQj3XrqDQfJt7KGQ5tCAjRdRjUDVfkqOM=.db88d703-26b5-495b-9bb9-5acebce915e1@github.com> <7i0OLV4-N2s3xtJ1cD0zcwFcqQQKWqfQ4ZlxKVaPcJA=.2dbe8082-120d-47d6-a131-656780d9b3e2@github.com> Message-ID: <0EuE5ubzcizekbWuuI5KVIYG1VE_4mqtN-Paw3Z_UYU=.dddffc40-42c8-418b-b405-e415bd35099a@github.com> On Tue, 9 Jan 2024 18:14:55 GMT, Emanuel Peter wrote: >> @eme64 Probably, but my goal here is limited. We have to fix this PR within RDP2 i.e. asap. That is why I kept the changes to minimum. On Vladimir's request I have added a minimum change to handle the case when a and b are same. > > @sviswa7 Ok, I understand. But a regression test would still be good. We should just reduce the regression test attached to https://bugs.openjdk.org/browse/JDK-8322090, @TobiHartmann mentioned it on JIRA. > > I guess we can also file a follow up RFE to improve the Ideal transformations. @eme64 No, I don't have a regression test for this. 
I followed the ctw.sh mechanism provided in the bug report by Roland Westrelin to reproduce and verify. ------------- PR Comment: https://git.openjdk.org/jdk/pull/17315#issuecomment-1883554360 From epeter at openjdk.org Tue Jan 9 18:20:24 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 9 Jan 2024 18:20:24 GMT Subject: RFR: 8321712: C2: "failed: Multiple uses of register" in C2_MacroAssembler::vminmax_fp [v2] In-Reply-To: References: Message-ID: On Tue, 9 Jan 2024 18:03:35 GMT, Sandhya Viswanathan wrote: >> The C2_MacroAssembler::vminmax_fp and evminmax_fp were incorrectly checking for the two USE operand registers to be different and thus resulting in assertion failure. >> >> In x86_64.ad: >> >> instruct maxF_reg(legRegF dst, legRegF a, legRegF b, legRegF tmp, legRegF atmp, legRegF btmp) %{ >> ... >> effect(USE a, USE b, TEMP tmp, TEMP atmp, TEMP btmp); >> ... >> __ vminmax_fp(Op_MaxV, T_FLOAT, $dst$$XMMRegister, $a$$XMMRegister, $b$$XMMRegister, $tmp$$XMMRegister, $atmp$$XMMRegister, $btmp$$XMMRegister, Assembler::AVX_128bit); >> %} >> >> >> Changing the assert in vminmax_fp from: >> assert_different_registers(a, b, tmp, atmp, btmp); >> to: >> assert_different_registers(a, tmp, atmp, btmp); >> assert_different_registers(b, tmp, atmp, btmp); >> fixes the issue. >> >> Similar change done in evminmax_fp. >> >> Please review. >> >> Best Regards, >> Sandhya > > Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision: > > review comments https://bugs.openjdk.org/secure/attachment/107681/Test_276.java This is the regression test of the bug that is closed as duplicate of your issue, am I correct? This is the duplicate bug: https://bugs.openjdk.org/browse/JDK-8322090 Fails with: `assert(regs[i] != regs[j]) failed: Multiple uses of register: xmm3` You need to at least verify if this bug is fixed with your patch, otherwise we would need to re-open it, since it would not be a duplicate. 
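The difference between the original assert and the two replacement asserts quoted above can be modelled outside HotSpot. The following is a minimal sketch under stated assumptions: `allDifferent` is a hypothetical stand-in for `assert_different_registers`, registers are modelled as strings, and the register assignment in `main` is invented for illustration.

```java
import java.util.HashSet;
import java.util.Set;

class RegisterCheckDemo {
    // Hypothetical stand-in for assert_different_registers:
    // true only if no register appears twice in the list.
    static boolean allDifferent(String... regs) {
        Set<String> seen = new HashSet<>();
        for (String r : regs) {
            if (!seen.add(r)) {
                return false;
            }
        }
        return true;
    }

    // Shape of the original check: a and b sit in the same list,
    // so the check fails whenever both USE operands share a register.
    static boolean originalCheck(String a, String b, String tmp, String atmp, String btmp) {
        return allDifferent(a, b, tmp, atmp, btmp);
    }

    // Shape of the check from the PR description: a and b are each
    // compared against the temporaries, but not against each other.
    static boolean patchedCheck(String a, String b, String tmp, String atmp, String btmp) {
        return allDifferent(a, tmp, atmp, btmp) && allDifferent(b, tmp, atmp, btmp);
    }

    public static void main(String[] args) {
        // Invented assignment: both inputs allocated to the same register.
        if (originalCheck("xmm3", "xmm3", "xmm0", "xmm1", "xmm2")) {
            throw new AssertionError("original check should reject a == b");
        }
        if (!patchedCheck("xmm3", "xmm3", "xmm0", "xmm1", "xmm2")) {
            throw new AssertionError("patched check should accept a == b");
        }
        // A USE operand that aliases a TEMP is still rejected.
        if (patchedCheck("xmm3", "xmm4", "xmm3", "xmm1", "xmm2")) {
            throw new AssertionError("patched check should reject a == tmp");
        }
    }
}
```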
------------- PR Comment: https://git.openjdk.org/jdk/pull/17315#issuecomment-1883557719 From kvn at openjdk.org Tue Jan 9 19:57:23 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 9 Jan 2024 19:57:23 GMT Subject: RFR: 8321712: C2: "failed: Multiple uses of register" in C2_MacroAssembler::vminmax_fp [v2] In-Reply-To: References: Message-ID: On Tue, 9 Jan 2024 18:03:35 GMT, Sandhya Viswanathan wrote: >> The C2_MacroAssembler::vminmax_fp and evminmax_fp were incorrectly checking for the two USE operand registers to be different and thus resulting in assertion failure. >> >> In x86_64.ad: >> >> instruct maxF_reg(legRegF dst, legRegF a, legRegF b, legRegF tmp, legRegF atmp, legRegF btmp) %{ >> ... >> effect(USE a, USE b, TEMP tmp, TEMP atmp, TEMP btmp); >> ... >> __ vminmax_fp(Op_MaxV, T_FLOAT, $dst$$XMMRegister, $a$$XMMRegister, $b$$XMMRegister, $tmp$$XMMRegister, $atmp$$XMMRegister, $btmp$$XMMRegister, Assembler::AVX_128bit); >> %} >> >> >> Changing the assert in vminmax_fp from: >> assert_different_registers(a, b, tmp, atmp, btmp); >> to: >> assert_different_registers(a, tmp, atmp, btmp); >> assert_different_registers(b, tmp, atmp, btmp); >> fixes the issue. >> >> Similar change done in evminmax_fp. >> >> Please review. >> >> Best Regards, >> Sandhya > > Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision: > > review comments Actually Ideal transformation fix could be smaller than these changes. You will not need to change platform specific code. Hmm, may be NaN values could be a problem. Have to check for them as we do in other operations. Even suggested "short cut" (use move) could be wrong for NaN. Okay, lets go to the first version of these changes: only assert fix. And file separate RFE to make changes in Ideal graph. And we need regression test as @eme64 pointed. 
------------- PR Review: https://git.openjdk.org/jdk/pull/17315#pullrequestreview-1811885864 From kvn at openjdk.org Tue Jan 9 20:21:21 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 9 Jan 2024 20:21:21 GMT Subject: RFR: 8320237 C2: late inlining of method handle invoke causes duplicate lines in PrintInlining output In-Reply-To: References: Message-ID: On Mon, 18 Dec 2023 15:28:47 GMT, Kangcheng Xu wrote: > This PR resolves [JDK-8320237](https://bugs.openjdk.org/browse/JDK-8320237) > > The original behavior produces both a failure and success message upon late inlining which is confusing. The patch removes the failure message if inlining was successful. Huge thanks to @rwestrel for reporting and working out a solution. > > Unit test `test/hotspot/jtreg/compiler/inlining/TestDuplicatedLateInliningOutput.java` is added and passing. Looks good to me too. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/17147#pullrequestreview-1811918664 From duke at openjdk.org Tue Jan 9 21:24:26 2024 From: duke at openjdk.org (Zhiqiang Zang) Date: Tue, 9 Jan 2024 21:24:26 GMT Subject: RFR: 8322589: Add Ideal transformation: (~a) & (~b) => ~(a | b) [v5] In-Reply-To: References: <-pwjKEB97C-bM068JQN0PY1hl65IzcuQZfHzRoKu92g=.d6118773-20d2-46cd-9284-5168c9334bb5@github.com> Message-ID: On Mon, 8 Jan 2024 06:58:58 GMT, Tobias Hartmann wrote: >> Zhiqiang Zang has updated the pull request incrementally with one additional commit since the last revision: >> >> update the copyright dates. > > Looks good to me otherwise. @TobiHartmann @eme64 Thanks a lot for reviewing and all the comments. Can you sponsor when you get a chance? 
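The two identities behind these PRs, `(~a) & (~b) == ~(a | b)` and `(~a) | (~b) == ~(a & b)`, can be checked directly on Java values. This is a minimal sketch with hypothetical method names; it only demonstrates that both sides agree and is not the IR test from the pull request.

```java
class DeMorganDemo {
    // Left- and right-hand sides of (~a) & (~b) => ~(a | b), on ints.
    static int andOfNots(int a, int b) { return (~a) & (~b); }
    static int notOfOr(int a, int b)   { return ~(a | b); }

    // Left- and right-hand sides of (~a) | (~b) => ~(a & b), on longs.
    static long orOfNots(long a, long b) { return (~a) | (~b); }
    static long notOfAnd(long a, long b) { return ~(a & b); }

    public static void main(String[] args) {
        int[] ints = {0, 1, -1, 0x55555555, Integer.MIN_VALUE, Integer.MAX_VALUE};
        for (int a : ints) {
            for (int b : ints) {
                if (andOfNots(a, b) != notOfOr(a, b)) {
                    throw new AssertionError("int mismatch: " + a + ", " + b);
                }
                if (orOfNots(a, b) != notOfAnd(a, b)) {
                    throw new AssertionError("long mismatch: " + a + ", " + b);
                }
            }
        }
    }
}
```

The right-hand side of each identity uses one negation instead of two, which is the saving the transformation is after.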
------------- PR Comment: https://git.openjdk.org/jdk/pull/16333#issuecomment-1883815046 From sviswanathan at openjdk.org Tue Jan 9 22:00:37 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Tue, 9 Jan 2024 22:00:37 GMT Subject: RFR: 8321712: C2: "failed: Multiple uses of register" in C2_MacroAssembler::vminmax_fp [v3] In-Reply-To: References: Message-ID: <30j6-cR2RH4NxQzduweT7lsy9BaJ-q4OF52MA30N0vo=.557a3a38-7dac-4475-b186-538a45f57d10@github.com> > The C2_MacroAssembler::vminmax_fp and evminmax_fp were incorrectly checking for the two USE operand registers to be different and thus resulting in assertion failure. > > In x86_64.ad: > > instruct maxF_reg(legRegF dst, legRegF a, legRegF b, legRegF tmp, legRegF atmp, legRegF btmp) %{ > ... > effect(USE a, USE b, TEMP tmp, TEMP atmp, TEMP btmp); > ... > __ vminmax_fp(Op_MaxV, T_FLOAT, $dst$$XMMRegister, $a$$XMMRegister, $b$$XMMRegister, $tmp$$XMMRegister, $atmp$$XMMRegister, $btmp$$XMMRegister, Assembler::AVX_128bit); > %} > > > Changing the assert in vminmax_fp from: > assert_different_registers(a, b, tmp, atmp, btmp); > to: > assert_different_registers(a, tmp, atmp, btmp); > assert_different_registers(b, tmp, atmp, btmp); > fixes the issue. > > Similar change done in evminmax_fp. > > Please review. 
> > Best Regards, > Sandhya Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision: Retain only asserts ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17315/files - new: https://git.openjdk.org/jdk/pull/17315/files/55c6e32e..c5dac9b5 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17315&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17315&range=01-02 Stats: 23 lines in 3 files changed: 0 ins; 23 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/17315.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17315/head:pull/17315 PR: https://git.openjdk.org/jdk/pull/17315 From sviswanathan at openjdk.org Tue Jan 9 22:36:23 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Tue, 9 Jan 2024 22:36:23 GMT Subject: RFR: 8321712: C2: "failed: Multiple uses of register" in C2_MacroAssembler::vminmax_fp [v3] In-Reply-To: <30j6-cR2RH4NxQzduweT7lsy9BaJ-q4OF52MA30N0vo=.557a3a38-7dac-4475-b186-538a45f57d10@github.com> References: <30j6-cR2RH4NxQzduweT7lsy9BaJ-q4OF52MA30N0vo=.557a3a38-7dac-4475-b186-538a45f57d10@github.com> Message-ID: On Tue, 9 Jan 2024 22:00:37 GMT, Sandhya Viswanathan wrote: >> The C2_MacroAssembler::vminmax_fp and evminmax_fp were incorrectly checking for the two USE operand registers to be different and thus resulting in assertion failure. >> >> In x86_64.ad: >> >> instruct maxF_reg(legRegF dst, legRegF a, legRegF b, legRegF tmp, legRegF atmp, legRegF btmp) %{ >> ... >> effect(USE a, USE b, TEMP tmp, TEMP atmp, TEMP btmp); >> ... >> __ vminmax_fp(Op_MaxV, T_FLOAT, $dst$$XMMRegister, $a$$XMMRegister, $b$$XMMRegister, $tmp$$XMMRegister, $atmp$$XMMRegister, $btmp$$XMMRegister, Assembler::AVX_128bit); >> %} >> >> >> Changing the assert in vminmax_fp from: >> assert_different_registers(a, b, tmp, atmp, btmp); >> to: >> assert_different_registers(a, tmp, atmp, btmp); >> assert_different_registers(b, tmp, atmp, btmp); >> fixes the issue. 
>> >> Similar change done in evminmax_fp. >> >> Please review. >> >> Best Regards, >> Sandhya > > Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision: > > Retain only asserts RFE filed: https://bugs.openjdk.org/browse/JDK-8323429 ------------- PR Comment: https://git.openjdk.org/jdk/pull/17315#issuecomment-1883906977 From kbarrett at openjdk.org Tue Jan 9 22:36:50 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Tue, 9 Jan 2024 22:36:50 GMT Subject: RFR: 8323115: x86-32: Incorrect predicates for cmov instruct transforms with UseSSE [v2] In-Reply-To: <7BcH08lab4xxFOxm0UAfhHKVyBBEhRWtIPEUJuOIlmo=.2c5b0918-19ac-4cb3-b2f5-11131ce73422@github.com> References: <7BcH08lab4xxFOxm0UAfhHKVyBBEhRWtIPEUJuOIlmo=.2c5b0918-19ac-4cb3-b2f5-11131ce73422@github.com> Message-ID: > Please review this change that fixes generation of CMOV by C2 as controlled by > UseSSE. The predicates controlling that generation were using implicit > operator precedence that didn't have the expected grouping. Fixed by adding > parentheses to make the desired grouping explicit. > > Testing: Ran GHA with -Wparentheses enabled along with this and other changes > needed to make that work. Kim Barrett has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains two additional commits since the last revision: - Merge branch 'master' into x86-32-cmov-preds - fix predicates for cmov with UseSSE ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17296/files - new: https://git.openjdk.org/jdk/pull/17296/files/f2c5ba0d..6f49985d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17296&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17296&range=00-01 Stats: 13013 lines in 180 files changed: 10072 ins; 1583 del; 1358 mod Patch: https://git.openjdk.org/jdk/pull/17296.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17296/head:pull/17296 PR: https://git.openjdk.org/jdk/pull/17296 From kbarrett at openjdk.org Tue Jan 9 22:36:51 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Tue, 9 Jan 2024 22:36:51 GMT Subject: RFR: 8323115: x86-32: Incorrect predicates for cmov instruct transforms with UseSSE [v2] In-Reply-To: References: <7BcH08lab4xxFOxm0UAfhHKVyBBEhRWtIPEUJuOIlmo=.2c5b0918-19ac-4cb3-b2f5-11131ce73422@github.com> Message-ID: On Mon, 8 Jan 2024 12:11:11 GMT, Aleksey Shipilev wrote: >> The fix looks good to me but it's concerning that we never hit this in testing. Maybe it never fails because the expanded instructions are guarded by `predicate (UseSSE ...` as well. > >> The fix looks good to me but it's concerning that we never hit this in testing. Maybe it never fails because the expanded instructions are guarded by `predicate (UseSSE ...` as well. > > In my experience fixing bugs in these FPU-related match rules is that it takes a combination of code shape and relevant hardware (that defaults for unusual `UseSSE <= 2`), or specific testing that runs with lower `UseSSE`. I think I was one of the few remaining people who ran x86_32 with `-XX:UseSSE=0`, for example, but finally stopped. I think going forward we would just need to require `UseSSE >= 2` for x86_32, like for x86_64, making these issues go away. Thanks for reviews, @shipilev and @TobiHartmann . 
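The kind of grouping problem described here can be shown with a small predicate. This is an illustrative sketch only: the real predicates live in the x86_32 `.ad` file and are not reproduced in this thread, so the method names and the conditions `isLt` and `isGe` are hypothetical. Java is used because `&&` binds tighter than `||` there as well.

```java
class PredicatePrecedenceDemo {
    // Implicit grouping: parsed as (useSse >= 2 && isLt) || isGe.
    static boolean implicitGrouping(int useSse, boolean isLt, boolean isGe) {
        return useSse >= 2 && isLt || isGe;
    }

    // Explicit grouping: the UseSSE guard applies to both alternatives.
    static boolean explicitGrouping(int useSse, boolean isLt, boolean isGe) {
        return useSse >= 2 && (isLt || isGe);
    }

    public static void main(String[] args) {
        // With a low UseSSE value, the implicit form still matches via the
        // unguarded alternative, while the explicit form does not.
        if (!implicitGrouping(0, false, true)) {
            throw new AssertionError("implicit form should match");
        }
        if (explicitGrouping(0, false, true)) {
            throw new AssertionError("explicit form should not match");
        }
    }
}
```

Compiling with a warning such as `-Wparentheses`, as mentioned in the testing note, is one way such implicit groupings get flagged in C++ code.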
------------- PR Comment: https://git.openjdk.org/jdk/pull/17296#issuecomment-1883903590 From kbarrett at openjdk.org Tue Jan 9 22:36:52 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Tue, 9 Jan 2024 22:36:52 GMT Subject: Integrated: 8323115: x86-32: Incorrect predicates for cmov instruct transforms with UseSSE In-Reply-To: <7BcH08lab4xxFOxm0UAfhHKVyBBEhRWtIPEUJuOIlmo=.2c5b0918-19ac-4cb3-b2f5-11131ce73422@github.com> References: <7BcH08lab4xxFOxm0UAfhHKVyBBEhRWtIPEUJuOIlmo=.2c5b0918-19ac-4cb3-b2f5-11131ce73422@github.com> Message-ID: On Mon, 8 Jan 2024 07:55:21 GMT, Kim Barrett wrote: > Please review this change that fixes generation of CMOV by C2 as controlled by > UseSSE. The predicates controlling that generation were using implicit > operator precedence that didn't have the expected grouping. Fixed by adding > parentheses to make the desired grouping explicit. > > Testing: Ran GHA with -Wparentheses enabled along with this and other changes > needed to make that work. This pull request has now been integrated. Changeset: 28d8149c Author: Kim Barrett URL: https://git.openjdk.org/jdk/commit/28d8149c693a9470bbde4b1a27c4b9be6c5f365c Stats: 13 lines in 1 file changed: 0 ins; 0 del; 13 mod 8323115: x86-32: Incorrect predicates for cmov instruct transforms with UseSSE Reviewed-by: shade, thartmann ------------- PR: https://git.openjdk.org/jdk/pull/17296 From kbarrett at openjdk.org Tue Jan 9 22:52:25 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Tue, 9 Jan 2024 22:52:25 GMT Subject: RFR: 8322816: RISC-V: Incorrect guarantee in patch_vtype [v2] In-Reply-To: References: Message-ID: On Mon, 8 Jan 2024 01:05:39 GMT, Kim Barrett wrote: >> Please review this change that fixes a test for a guarantee. This also >> removes a -Wparentheses warning when those are enabled (which is how the >> problem was discovered). >> >> The problem is that operator precedence groups the sub-expressions differently >> than intended. 
The fix is to override the operator precedence by adding >> parentheses to achieve the intended grouping. >> >> Testing: Local (linux-x64) cross-build for linux-riscv with this change plus >> -Wparentheses enabled and other changes to allow that to work. >> >> Requesting someone from the riscv porters to properly test this. > > Kim Barrett has updated the pull request incrementally with one additional commit since the last revision: > > guarantee !vill Waiting for second review. ------------- PR Comment: https://git.openjdk.org/jdk/pull/17215#issuecomment-1883922445 From sviswanathan at openjdk.org Tue Jan 9 23:03:35 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Tue, 9 Jan 2024 23:03:35 GMT Subject: RFR: 8321712: C2: "failed: Multiple uses of register" in C2_MacroAssembler::vminmax_fp [v4] In-Reply-To: References: Message-ID: > The C2_MacroAssembler::vminmax_fp and evminmax_fp were incorrectly checking for the two USE operand registers to be different and thus resulting in assertion failure. > > In x86_64.ad: > > instruct maxF_reg(legRegF dst, legRegF a, legRegF b, legRegF tmp, legRegF atmp, legRegF btmp) %{ > ... > effect(USE a, USE b, TEMP tmp, TEMP atmp, TEMP btmp); > ... > __ vminmax_fp(Op_MaxV, T_FLOAT, $dst$$XMMRegister, $a$$XMMRegister, $b$$XMMRegister, $tmp$$XMMRegister, $atmp$$XMMRegister, $btmp$$XMMRegister, Assembler::AVX_128bit); > %} > > > Changing the assert in vminmax_fp from: > assert_different_registers(a, b, tmp, atmp, btmp); > to: > assert_different_registers(a, tmp, atmp, btmp); > assert_different_registers(b, tmp, atmp, btmp); > fixes the issue. > > Similar change done in evminmax_fp. > > Please review. 
> > Best Regards, > Sandhya Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision: add test case from vpaprotsk ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17315/files - new: https://git.openjdk.org/jdk/pull/17315/files/c5dac9b5..43462531 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17315&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17315&range=02-03 Stats: 11 lines in 1 file changed: 11 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/17315.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17315/head:pull/17315 PR: https://git.openjdk.org/jdk/pull/17315 From luhenry at openjdk.org Tue Jan 9 23:12:24 2024 From: luhenry at openjdk.org (Ludovic Henry) Date: Tue, 9 Jan 2024 23:12:24 GMT Subject: RFR: 8322816: RISC-V: Incorrect guarantee in patch_vtype [v2] In-Reply-To: References: Message-ID: On Mon, 8 Jan 2024 01:05:39 GMT, Kim Barrett wrote: >> Please review this change that fixes a test for a guarantee. This also >> removes a -Wparentheses warning when those are enabled (which is how the >> problem was discovered). >> >> The problem is that operator precedence groups the sub-expressions differently >> than intended. The fix is to override the operator precedence by adding >> parentheses to achieve the intended grouping. >> >> Testing: Local (linux-x64) cross-build for linux-riscv with this change plus >> -Wparentheses enabled and other changes to allow that to work. >> >> Requesting someone from the riscv porters to properly test this. > > Kim Barrett has updated the pull request incrementally with one additional commit since the last revision: > > guarantee !vill Marked as reviewed by luhenry (Committer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/17215#pullrequestreview-1812165209 From sviswanathan at openjdk.org Tue Jan 9 23:23:23 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Tue, 9 Jan 2024 23:23:23 GMT Subject: RFR: 8321712: C2: "failed: Multiple uses of register" in C2_MacroAssembler::vminmax_fp [v2] In-Reply-To: References: Message-ID: On Tue, 9 Jan 2024 19:55:03 GMT, Vladimir Kozlov wrote: >> Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision: >> >> review comments > > Actually Ideal transformation fix could be smaller than these changes. You will not need to change platform specific code. Hmm, may be NaN values could be a problem. Have to check for them as we do in other operations. Even suggested "short cut" (use move) could be wrong for NaN. > > Okay, lets go to the first version of these changes: only assert fix. And file separate RFE to make changes in Ideal graph. > > And we need regression test as @eme64 pointed. @vnkozlov I have reverted the changes to just asserts and added a test case to the existing test. The new test case fails without this PR and passes with the PR changes. @eme64 I have verified that [Test_276.java](https://bugs.openjdk.org/secure/attachment/107681/Test_276.java) fails without this PR with the given arguments in the [JBS Bug Entry](https://bugs.openjdk.org/browse/JDK-8322090) and passes with the PR changes. I have filed an [RFE](https://bugs.openjdk.org/browse/JDK-8323429) for future optimization as requested. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/17315#issuecomment-1883949886 From kvn at openjdk.org Tue Jan 9 23:36:26 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 9 Jan 2024 23:36:26 GMT Subject: RFR: 8321712: C2: "failed: Multiple uses of register" in C2_MacroAssembler::vminmax_fp [v4] In-Reply-To: References: Message-ID: On Tue, 9 Jan 2024 23:03:35 GMT, Sandhya Viswanathan wrote: >> The C2_MacroAssembler::vminmax_fp and evminmax_fp were incorrectly checking for the two USE operand registers to be different and thus resulting in assertion failure. >> >> In x86_64.ad: >> >> instruct maxF_reg(legRegF dst, legRegF a, legRegF b, legRegF tmp, legRegF atmp, legRegF btmp) %{ >> ... >> effect(USE a, USE b, TEMP tmp, TEMP atmp, TEMP btmp); >> ... >> __ vminmax_fp(Op_MaxV, T_FLOAT, $dst$$XMMRegister, $a$$XMMRegister, $b$$XMMRegister, $tmp$$XMMRegister, $atmp$$XMMRegister, $btmp$$XMMRegister, Assembler::AVX_128bit); >> %} >> >> >> Changing the assert in vminmax_fp from: >> assert_different_registers(a, b, tmp, atmp, btmp); >> to: >> assert_different_registers(a, tmp, atmp, btmp); >> assert_different_registers(b, tmp, atmp, btmp); >> fixes the issue. >> >> Similar change done in evminmax_fp. >> >> Please review. >> >> Best Regards, >> Sandhya > > Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision: > > add test case from vpaprotsk Looks good. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/17315#pullrequestreview-1812183690 From kvn at openjdk.org Tue Jan 9 23:36:40 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 9 Jan 2024 23:36:40 GMT Subject: RFR: 8322743: assert(held_monitor_count() == jni_monitor_count()) failed Message-ID: Corner case with a local (not escaped) object used for synchronization. C2 Escape Analysis thinks that it can eliminate locks for it. In most cases it is true but not in this case. 
    for (int i = 0; i < 2; ++i) {
        Object o = new Object();
        synchronized (o) { // monitorenter
            // Trigger OSR compilation
            for (int j = 0; j < 100_000; ++j) {

The test has a nested loop which triggers OSR compilation. The locked object comes from the Interpreter into compiled OSR code. During parsing, C2 creates another non-escaped object and correctly merges both together (with a Phi node) so that the non-escaped object is not scalar replaceable. Because it does not globally escape, EA still removes locks for it and, as a result, also for the merged locked object from the Interpreter, which is the bug. The fix is to check that the synchronized block does not have any associated escaped objects when EA decides if locks can be eliminated. Added a regression test prepared by @TobiHartmann. Tested tier1-5, xcomp and stress. Performance testing shows no difference.
------------- Commit messages: - Fix trailing and other spaces. - 8322743: assert(held_monitor_count() == jni_monitor_count()) failed Changes: https://git.openjdk.org/jdk/pull/17331/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=17331&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8322743 Stats: 132 lines in 6 files changed: 115 ins; 2 del; 15 mod Patch: https://git.openjdk.org/jdk/pull/17331.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17331/head:pull/17331 PR: https://git.openjdk.org/jdk/pull/17331 From sviswanathan at openjdk.org Tue Jan 9 23:46:45 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Tue, 9 Jan 2024 23:46:45 GMT Subject: RFR: 8321712: C2: "failed: Multiple uses of register" in C2_MacroAssembler::vminmax_fp [v5] In-Reply-To: References: Message-ID: <-rzXk3WJ9aOgrgoyIVDpCeiQLBkq5N5yQCYp_42oMEo=.df74c3a7-508d-48a1-baa2-796a2015bea1@github.com> > The C2_MacroAssembler::vminmax_fp and evminmax_fp were incorrectly checking for the two USE operand registers to be different and thus resulting in assertion failure.
> > In x86_64.ad: > > instruct maxF_reg(legRegF dst, legRegF a, legRegF b, legRegF tmp, legRegF atmp, legRegF btmp) %{ > ... > effect(USE a, USE b, TEMP tmp, TEMP atmp, TEMP btmp); > ... > __ vminmax_fp(Op_MaxV, T_FLOAT, $dst$$XMMRegister, $a$$XMMRegister, $b$$XMMRegister, $tmp$$XMMRegister, $atmp$$XMMRegister, $btmp$$XMMRegister, Assembler::AVX_128bit); > %} > > > Changing the assert in vminmax_fp from: > assert_different_registers(a, b, tmp, atmp, btmp); > to: > assert_different_registers(a, tmp, atmp, btmp); > assert_different_registers(b, tmp, atmp, btmp); > fixes the issue. > > Similar change done in evminmax_fp. > > Please review. > > Best Regards, > Sandhya Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision: copyright year update ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17315/files - new: https://git.openjdk.org/jdk/pull/17315/files/43462531..05f8cf81 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17315&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17315&range=03-04 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/17315.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17315/head:pull/17315 PR: https://git.openjdk.org/jdk/pull/17315 From sviswanathan at openjdk.org Tue Jan 9 23:46:47 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Tue, 9 Jan 2024 23:46:47 GMT Subject: RFR: 8321712: C2: "failed: Multiple uses of register" in C2_MacroAssembler::vminmax_fp [v4] In-Reply-To: References: Message-ID: On Tue, 9 Jan 2024 23:33:39 GMT, Vladimir Kozlov wrote: >> Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision: >> >> add test case from vpaprotsk > > Looks good. Thanks a lot @vnkozlov. 
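The effect of splitting the assert, as described in the quoted PR text, can be sketched with a small stand-alone model. This is an illustration only, not HotSpot code: registers are modelled as plain `int` ids, and `RegisterAssertSketch`, `allDifferent`, `oldCheck` and `newCheck` are hypothetical names.

```java
final class RegisterAssertSketch {
    // True when all register ids are pairwise distinct.
    static boolean allDifferent(int... regs) {
        for (int i = 0; i < regs.length; i++) {
            for (int j = i + 1; j < regs.length; j++) {
                if (regs[i] == regs[j]) {
                    return false;
                }
            }
        }
        return true;
    }

    // Models the original single assert: it also requires a != b.
    static boolean oldCheck(int a, int b, int tmp, int atmp, int btmp) {
        return allDifferent(a, b, tmp, atmp, btmp);
    }

    // Models the two asserts after the fix: a and b may coincide,
    // but neither may overlap with any of the TEMP registers.
    static boolean newCheck(int a, int b, int tmp, int atmp, int btmp) {
        return allDifferent(a, tmp, atmp, btmp)
            && allDifferent(b, tmp, atmp, btmp);
    }

    public static void main(String[] args) {
        // Both USE operands in the same register (id 1): only the old check rejects it.
        System.out.println("old, a == b:   " + oldCheck(1, 1, 2, 3, 4));
        System.out.println("new, a == b:   " + newCheck(1, 1, 2, 3, 4));
        // An input overlapping a temp is still rejected by the new check.
        System.out.println("new, b == tmp: " + newCheck(1, 2, 2, 3, 4));
    }
}
```

The point of the split is that the two USE inputs are only read, so sharing a register is harmless, while the TEMP registers are written and must stay distinct from both inputs and from each other.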
------------- PR Comment: https://git.openjdk.org/jdk/pull/17315#issuecomment-1883967275 From dean.long at oracle.com Wed Jan 10 00:11:10 2024 From: dean.long at oracle.com (dean.long at oracle.com) Date: Tue, 9 Jan 2024 16:11:10 -0800 Subject: discuss about release barrier for final fields initialization In-Reply-To: <5e41f867-b31b-4ded-b737-8d0c869c8895.kuaiwei.kw@alibaba-inc.com> References: <5e41f867-b31b-4ded-b737-8d0c869c8895.kuaiwei.kw@alibaba-inc.com> Message-ID: <581217fe-bb9b-4cbb-865b-316559ad8646@oracle.com> We only have https://bugs.openjdk.org/browse/JDK-8300148 for that. thanks, dl On 1/8/24 10:23 PM, Kuai Wei wrote:
> Hi,
>
> I made some experiments on object allocation performance. I found that on aarch64 N1, if an object has a final field, the allocation rate is about 75% of that of a normal allocation.
> The cause is that C2 inserts a release membar in , which is translated to "dmb.ish" on aarch64. For a normal allocation, a storestore membar is inserted and emitted as "dmb.ishst"; this makes the difference. The JMH test is
> https://gist.github.com/kuaiwei/f71fba40df29991c93325a8600e34c13
>
> java -jar target/benchmarks.jar -f 1 -wi 5 -w 3 -i 3 -r 3 testAlloc
> ...
>
> Benchmark                       Mode  Cnt     Score    Error  Units
> AllocFinal.testAlloc           thrpt    3  1167.903 ± 44.973  ops/s
> AllocFinal.testAllocWithFinal  thrpt    3   915.330 ± 52.596  ops/s
>
> I found that only C2 inserts a release membar; C1 just inserts a storestore for both final and normal allocation. According to Doug Lea's cookbook, https://gee.cs.oswego.edu/dl/jmm/cookbook.html, only storestore is required. Alex has a great post on this topic: https://shipilev.net/blog/2014/all-fields-are-final/. It refers to a case showing why loadstore is needed:
> https://www.hboehm.info/c++mm/no_write_fences.html
> I checked this case and, IMO, it looks like some legacy architectures may break data dependency and cause a problem. As far as I know, the Alpha architecture is an example. I think it doesn't break on modern architectures. Is there another case I missed?
>
> If storestore is enough in this situation, I will send a PR to loosen the barrier.
>
> Thanks,
> Kuai Wei
From kbarrett at openjdk.org Wed Jan 10 00:15:37 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Wed, 10 Jan 2024 00:15:37 GMT Subject: RFR: 8322816: RISC-V: Incorrect guarantee in patch_vtype [v3] In-Reply-To: References: Message-ID: > Please review this change that fixes a test for a guarantee. This also > removes a -Wparentheses warning when those are enabled (which is how the > problem was discovered). > > The problem is that operator precedence groups the sub-expressions differently > than intended. The fix is to override the operator precedence by adding > parentheses to achieve the intended grouping. > > Testing: Local (linux-x64) cross-build for linux-riscv with this change plus > -Wparentheses enabled and other changes to allow that to work. > > Requesting someone from the riscv porters to properly test this. Kim Barrett has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains three additional commits since the last revision: - Merge branch 'master' into riscv-paren-bug - guarantee !vill - fix subexpression grouping in patch_vtype guarantee ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17215/files - new: https://git.openjdk.org/jdk/pull/17215/files/ab335602..22fc7a2b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17215&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17215&range=01-02 Stats: 18102 lines in 593 files changed: 12845 ins; 2510 del; 2747 mod Patch: https://git.openjdk.org/jdk/pull/17215.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17215/head:pull/17215 PR: https://git.openjdk.org/jdk/pull/17215 From kbarrett at openjdk.org Wed Jan 10 00:15:37 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Wed, 10 Jan 2024 00:15:37 GMT Subject: RFR: 8322816: RISC-V: Incorrect guarantee in patch_vtype [v3] In-Reply-To: References: Message-ID: On Mon, 8 Jan 2024 02:06:05 GMT, Fei Yang wrote: >> Kim Barrett has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: >> >> - Merge branch 'master' into riscv-paren-bug >> - guarantee !vill >> - fix subexpression grouping in patch_vtype guarantee > > Marked as reviewed by fyang (Reviewer). Thanks for reviews, @RealFYang and @luhenry . ------------- PR Comment: https://git.openjdk.org/jdk/pull/17215#issuecomment-1883991703 From kbarrett at openjdk.org Wed Jan 10 00:21:30 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Wed, 10 Jan 2024 00:21:30 GMT Subject: Integrated: 8322816: RISC-V: Incorrect guarantee in patch_vtype In-Reply-To: References: Message-ID: On Tue, 2 Jan 2024 07:23:56 GMT, Kim Barrett wrote: > Please review this change that fixes a test for a guarantee. 
This also > removes a -Wparentheses warning when those are enabled (which is how the > problem was discovered). > > The problem is that operator precedence groups the sub-expressions differently > than intended. The fix is to override the operator precedence by adding > parentheses to achieve the intended grouping. > > Testing: Local (linux-x64) cross-build for linux-riscv with this change plus > -Wparentheses enabled and other changes to allow that to work. > > Requesting someone from the riscv porters to properly test this. This pull request has now been integrated. Changeset: f4ca41ad Author: Kim Barrett URL: https://git.openjdk.org/jdk/commit/f4ca41ad75fa78a08ff069ba0b6ac3596e35c23d Stats: 5 lines in 1 file changed: 0 ins; 2 del; 3 mod 8322816: RISC-V: Incorrect guarantee in patch_vtype Reviewed-by: fyang, luhenry ------------- PR: https://git.openjdk.org/jdk/pull/17215 From vlivanov at openjdk.org Wed Jan 10 01:00:33 2024 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Wed, 10 Jan 2024 01:00:33 GMT Subject: RFR: 8323115: x86-32: Incorrect predicates for cmov instruct transforms with UseSSE [v2] In-Reply-To: References: <7BcH08lab4xxFOxm0UAfhHKVyBBEhRWtIPEUJuOIlmo=.2c5b0918-19ac-4cb3-b2f5-11131ce73422@github.com> Message-ID: On Mon, 8 Jan 2024 12:11:11 GMT, Aleksey Shipilev wrote: > The fix looks good to me but it's concerning that we never hit this in testing. @TobiHartmann it looks more like the bug is benign since the predicates are effectively redundant. The AD instructions have different operands (`regFPR`/`regDPR` vs `regF`/`regD` which also have `UseSSE` predicates) , so they don't conflict at runtime. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/17296#issuecomment-1884026389 From cslucas at openjdk.org Wed Jan 10 01:29:40 2024 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Wed, 10 Jan 2024 01:29:40 GMT Subject: RFR: JDK-8323190: Segfault during deoptimization of C2-compiled code Message-ID: Currently, if `ReduceAllocationMerges` reduces an allocation merge that is used as a monitor C2 will SIGFAULT in `Process_OopMap_Node` because it's missing code to handle that case. This patch fixes C2 to properly handle reduced allocation merges that are used as monitors. Tested with Linux x86_64 hotspot_all. ------------- Commit messages: - Fix invalid location. Changes: https://git.openjdk.org/jdk/pull/17333/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=17333&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8323190 Stats: 88 lines in 2 files changed: 88 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/17333.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17333/head:pull/17333 PR: https://git.openjdk.org/jdk/pull/17333 From duke at openjdk.org Wed Jan 10 02:09:35 2024 From: duke at openjdk.org (Yude Lin) Date: Wed, 10 Jan 2024 02:09:35 GMT Subject: RFR: 8323122: AArch64: Increase itable stub size estimate Message-ID: Since [JDK-8307352](https://bugs.openjdk.org/browse/JDK-8307352), itable stub size has grown by 20 bytes on linux-aarch64. In particular, the "slop-counted" code increases from 100->120 bytes, where the current estimate is 124 bytes. I haven't found a case where it exceeds the estimate. For now this size is stable across the few linux-aarch64 configurations I ran with. It doesn't vary (for example) by different klass decoding schemes. But I think the idea of the estimate is that we can never know. I propose we increase the estimate to be safe. 
Passed hotspot/jtreg/:tier1 ------------- Commit messages: - 8323122: AArch64: Increase itable stub size estimate Changes: https://git.openjdk.org/jdk/pull/17336/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=17336&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8323122 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/17336.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17336/head:pull/17336 PR: https://git.openjdk.org/jdk/pull/17336 From dlong at openjdk.org Wed Jan 10 02:10:22 2024 From: dlong at openjdk.org (Dean Long) Date: Wed, 10 Jan 2024 02:10:22 GMT Subject: RFR: 8322743: assert(held_monitor_count() == jni_monitor_count()) failed In-Reply-To: References: Message-ID: On Tue, 9 Jan 2024 22:57:27 GMT, Vladimir Kozlov wrote: > Corner case with a local (not escaped) object used for synchronization. C2 Escape Analysis thinks that it can eliminate locks for it. In most cases it is true but not in this case. > > > for (int i = 0; i < 2; ++i) { > Object o = new Object(); > synchronized (o) { // monitorenter > // Trigger OSR compilation > for (int j = 0; j < 100_000; ++j) { > > The test has a nested loop which triggers OSR compilation. The locked object comes from the Interpreter into compiled OSR code. During parsing, C2 creates another non-escaped object and correctly merges both together (with a Phi node) so that the non-escaped object is not scalar replaceable. Because it does not globally escape, EA still removes locks for it and, as a result, also for the merged locked object from the Interpreter, which is the bug. > > The fix is to check that the synchronized block does not have any associated escaped objects when EA decides if locks can be eliminated. > > Added a regression test prepared by @TobiHartmann. Tested tier1-5, xcomp and stress. > Performance testing shows no difference. I'm wondering if there is a simpler solution.
What if in `Parse::load_interpreter_state` we mark the lock objects from the interpreter as global escape? ------------- PR Comment: https://git.openjdk.org/jdk/pull/17331#issuecomment-1884076429 From fyang at openjdk.org Wed Jan 10 06:11:24 2024 From: fyang at openjdk.org (Fei Yang) Date: Wed, 10 Jan 2024 06:11:24 GMT Subject: RFR: 8317721: RISC-V: Implement CRC32 intrinsic [v4] In-Reply-To: References: Message-ID: On Tue, 9 Jan 2024 16:47:08 GMT, ArsenyBochkarev wrote:
>> Hi everyone! Please review this port of [AArch64](https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp#L4224) `_updateBytesCRC32`, `_updateByteBufferCRC32` and `_updateCRC32` intrinsics. This patch introduces only the plain (non-vectorized, no Zbc) version.
>>
>> ### Correctness checks
>>
>> Tier 1/2 tests are ok.
>>
>> ### Performance results on T-Head board
>>
>> #### Results for enabled intrinsic:
>>
>> Used test is `test/micro/org/openjdk/bench/java/util/TestCRC32.java`
>>
>> | Benchmark | (count) | Mode | Cnt | Score | Error | Units |
>> | --- | --- | --- | --- | --- | --- | --- |
>> | CRC32.TestCRC32.testCRC32Update | 64 | thrpt | 24 | 3730.929 | 37.773 | ops/ms |
>> | CRC32.TestCRC32.testCRC32Update | 128 | thrpt | 24 | 2126.673 | 2.032 | ops/ms |
>> | CRC32.TestCRC32.testCRC32Update | 256 | thrpt | 24 | 1134.330 | 6.714 | ops/ms |
>> | CRC32.TestCRC32.testCRC32Update | 512 | thrpt | 24 | 584.017 | 2.267 | ops/ms |
>> | CRC32.TestCRC32.testCRC32Update | 2048 | thrpt | 24 | 151.173 | 0.346 | ops/ms |
>> | CRC32.TestCRC32.testCRC32Update | 16384 | thrpt | 24 | 19.113 | 0.008 | ops/ms |
>> | CRC32.TestCRC32.testCRC32Update | 65536 | thrpt | 24 | 4.647 | 0.022 | ops/ms |
>>
>> #### Results for disabled intrinsic:
>>
>> | Benchmark | (count) | Mode | Cnt | Score | Error | Units |
>> | --- | --- | --- | --- | --- | --- | --- |
>> | CRC32.TestCRC32.testCRC32Update | 64 | thrpt | 15 | 798.365 | 35.486 | ops/ms |
>> | CRC32.TestCRC32.testCRC32Update | 128 | thrpt | 15 | 677.756 | 46.619 | ops/ms |
>> | CRC32.TestCRC32.testCRC32Update | 256 | thrpt | 15 | 552.781 | 27.143 | ops/ms |
>> | CRC32.TestCRC32.testCRC32Update | 512 | thrpt | 15 | 429.304 | 12.518 | ops/ms |
>> | CRC32.TestCRC32.testCRC32Update | 2048 | thrpt | 15 | 166.738 | 0.935 | ops/ms |
>> | CRC32.TestCRC32.testCRC32Update | 16384 | thrpt | 15 | 25.060 | 0.034 | ops/ms |
>> | CRC32.TestCRC32.testCRC32Update | 65536 | thrpt | 15 | 6.196 | 0.030 | ops/ms |
>
> ArsenyBochkarev has updated the pull request incrementally with four additional commits since the last revision:
>
> - Fix unroll size
> - Rename constants
> - Partially unroll loop
> - Optimize loop counter in L_by16_loop

Hi, do we have performance numbers on other hardware platforms like unmatched? Thanks.

src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 4655:

> 4653: const Register table3 = c_rarg6;
> 4654:
> 4655: const Register tmp1 = t0;

As previously discussed elsewhere, it is error-prone to create aliases for scratch registers like `t0` and pass them as parameters to other assembler functions. It will be safer if we use `t0` directly in `kernel_crc32` and remove the `tmp` formal parameter of `kernel_crc32`.
------------- Changes requested by fyang (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/17046#pullrequestreview-1812473851 PR Review Comment: https://git.openjdk.org/jdk/pull/17046#discussion_r1446923292 From chagedorn at openjdk.org Wed Jan 10 07:20:31 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 10 Jan 2024 07:20:31 GMT Subject: RFR: 8319793: C2 compilation fails with "Bad graph detected in build_loop_late" after JDK-8279888 [v14] In-Reply-To: References: Message-ID: On Tue, 9 Jan 2024 17:02:48 GMT, Roland Westrelin wrote: >> Range check smearing and range check predication make an array access >> dependent on 2 (or more in the case of RC smearing) conditions.
As a >> consequence, if a range check can be eliminated because there's an >> identical dominating range check, the control dependent nodes that >> could float and become dependent on the dominating range check cannot >> be allowed to float because there's a risk that they would then bypass >> one of the checks that make the access legal. >> >> `IfNode::dominated_by()` and `PhaseIdealLoop::dominated_by()` have >> logic to prevent this: nodes that are control dependent on a range >> check or predicate are not allowed to float. This is however not >> sufficient as demonstrated by the test cases. >> >> In `TestArrayAccessAboveRCAfterSmearingOrPredication.testRangeCheckSmearing()`: >> >> >> v += array[i]; >> if (flag2) { >> if (flag3) { >> field = 0x42; >> } >> } >> if (flagField == 1) { >> v += array[i]; >> } >> >> >> The range check for the second `array[i]` load is replaced by the >> dominating range check for the first `array[i]` but because the second >> `array[i]` load could really be dependent on multiple range checks (in >> case smearing happened which is not the case here), c2 doesn't allow >> the second `array[i]` to float when the second range check is >> removed. The second `array[i]` is then control dependent on: >> >> >> if (flagField == 1) { >> >> >> which is next found to be dominated by the same test: >> >> >> if (flag == 1) { >> >> >> and is removed. However nothing in `dominated_by()` treats node >> dependent on tests that are not range check or predicates >> specially. So the second `array[i]` is allowed to float and become >> dependent on: >> >> >> if (flag == 1) { >> >> >> which is above the range check for that access. The test method in its >> last invocation is passed an index for the array access that's widely >> out of range. The array load happens before the range check and >> crashes the VM. `testLoopPredication()` is a similar test where array >> loads become dependent on predicates and end up above range checks. 
>> >> `TestArrayAccessCastIIAboveRC.java` is the test case from the bug >> where for similar reasons a range check `CastII` ends up above its >> range check, becomes top because its input becomes some integer that >> conflicts with its... > > Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: > > review Thanks for the updates, looks good! ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/16886#pullrequestreview-1812551628 From thartmann at openjdk.org Wed Jan 10 07:33:31 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 10 Jan 2024 07:33:31 GMT Subject: RFR: 8322077: Add Ideal transformation: (~a) | (~b) => ~(a & b) [v12] In-Reply-To: References: Message-ID: <8yZZsbOchNFkdPEmTKwxVZ_j_XjzWHV42j32ZXG9fAU=.e1899965-da0a-422c-8898-858c81ecb96b@github.com> On Tue, 9 Jan 2024 17:23:58 GMT, Zhiqiang Zang wrote: >> Hello, >> >> (~a) | (~b) => ~(a & b) is a widely seen pattern, for example it is implemented for LLVM [here](https://github.com/llvm/llvm-project/blob/397f1ce9efb4eea1ee10fe4833f733b8c7abd878/llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp#L1617C28-L1617C28); however it is missing in current implementation of hotspot. This pull request adds this transformation and associated tests. >> >> Thanks. > > Zhiqiang Zang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 18 additional commits since the last revision: > > - update copyright dates. > - Merge branch 'andnode-PNewDeMorganLawAndToOr' into ornode-PNewDeMorganLawOrToAnd > - Revert "adapt changes from the dependent pr." > > This reverts commit 0c8d10776fe55a28ea9f60160b4f6f57aabad5ab. > - Revert "adapt to new changes from the dependant pr." > > This reverts commit b21e242b306d9cf01e1ba84ebba47ef5b471d98e. 
> - adapt to new changes from the dependant pr. > - Merge branch 'andnode-PNewDeMorganLawAndToOr' into ornode-PNewDeMorganLawOrToAnd > - adapt changes from the dependent pr. > - Merge branch 'andnode-PNewDeMorganLawAndToOr' into ornode-PNewDeMorganLawOrToAnd > - Update test/hotspot/jtreg/compiler/c2/irTests/DeMorganLawLongTests.java > > Co-authored-by: Tobias Hartmann > - Update test/hotspot/jtreg/compiler/c2/irTests/DeMorganLawIntTests.java > > Co-authored-by: Tobias Hartmann > - ... and 8 more: https://git.openjdk.org/jdk/compare/f2cd45d6...dc60a548 Marked as reviewed by thartmann (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/16334#pullrequestreview-1812568240 From thartmann at openjdk.org Wed Jan 10 07:34:39 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 10 Jan 2024 07:34:39 GMT Subject: RFR: 8322589: Add Ideal transformation: (~a) & (~b) => ~(a | b) [v11] In-Reply-To: References: <-pwjKEB97C-bM068JQN0PY1hl65IzcuQZfHzRoKu92g=.d6118773-20d2-46cd-9284-5168c9334bb5@github.com> Message-ID: On Tue, 9 Jan 2024 16:56:50 GMT, Zhiqiang Zang wrote: >> Hello, >> >> `(~a) & (~b) => ~(a | b)` is a widely seen pattern, for example it is implemented for LLVM [here](https://github.com/llvm/llvm-project/blob/397f1ce9efb4eea1ee10fe4833f733b8c7abd878/llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp#L1616C28-L1616C28); however it is missing in current implementation of hotspot. This pull request adds this transformation and associated tests. >> >> Thanks. > > Zhiqiang Zang has updated the pull request incrementally with three additional commits since the last revision: > > - Revert "move the two helper functions to member functions of the node class." > > This reverts commit 7a962d69ac687c0476e54cd004037f2ebb0800ba. > - Revert "update copyright dates." > > This reverts commit 3665de2f3d38b4fce3ba968ec6247dd2576ecc32. > - Revert "move make_not back to AddNode because it cannot compile for architecture arm, s390x, ppc64le." 
> > This reverts commit 4ee8b089b6f119b1692dcd2c512d249d92734697. Thanks, looks good. ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/16333#pullrequestreview-1812567403 From duke at openjdk.org Wed Jan 10 07:34:42 2024 From: duke at openjdk.org (Zhiqiang Zang) Date: Wed, 10 Jan 2024 07:34:42 GMT Subject: Integrated: 8322589: Add Ideal transformation: (~a) & (~b) => ~(a | b) In-Reply-To: <-pwjKEB97C-bM068JQN0PY1hl65IzcuQZfHzRoKu92g=.d6118773-20d2-46cd-9284-5168c9334bb5@github.com> References: <-pwjKEB97C-bM068JQN0PY1hl65IzcuQZfHzRoKu92g=.d6118773-20d2-46cd-9284-5168c9334bb5@github.com> Message-ID: <-TukAXeNv1T-WEIfmKX8CfgtWfhnHflXws1wGZ78e5s=.68f2ac9d-2d85-4821-aa37-3375d52619d1@github.com> On Tue, 24 Oct 2023 04:49:20 GMT, Zhiqiang Zang wrote: > Hello, > > `(~a) & (~b) => ~(a | b)` is a widely seen pattern, for example it is implemented for LLVM [here](https://github.com/llvm/llvm-project/blob/397f1ce9efb4eea1ee10fe4833f733b8c7abd878/llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp#L1616C28-L1616C28); however it is missing in current implementation of hotspot. This pull request adds this transformation and associated tests. > > Thanks. This pull request has now been integrated. 
Changeset: 85692274 Author: Zhiqiang Zang Committer: Tobias Hartmann URL: https://git.openjdk.org/jdk/commit/856922747358291ed2e112c328fb776a7be2567d Stats: 132 lines in 6 files changed: 121 ins; 0 del; 11 mod 8322589: Add Ideal transformation: (~a) & (~b) => ~(a | b) Reviewed-by: thartmann, epeter ------------- PR: https://git.openjdk.org/jdk/pull/16333 From thartmann at openjdk.org Wed Jan 10 07:38:25 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 10 Jan 2024 07:38:25 GMT Subject: RFR: 8321712: C2: "failed: Multiple uses of register" in C2_MacroAssembler::vminmax_fp [v5] In-Reply-To: <-rzXk3WJ9aOgrgoyIVDpCeiQLBkq5N5yQCYp_42oMEo=.df74c3a7-508d-48a1-baa2-796a2015bea1@github.com> References: <-rzXk3WJ9aOgrgoyIVDpCeiQLBkq5N5yQCYp_42oMEo=.df74c3a7-508d-48a1-baa2-796a2015bea1@github.com> Message-ID: <4yFqfJwrep8NWbNTztQQvQY9dap5gQkLIrOFh6Od2Js=.98c4d04c-1b3b-4b71-9c3c-9c3ff021793e@github.com> On Tue, 9 Jan 2024 23:46:45 GMT, Sandhya Viswanathan wrote: >> The C2_MacroAssembler::vminmax_fp and evminmax_fp were incorrectly checking for the two USE operand registers to be different and thus resulting in assertion failure. >> >> In x86_64.ad: >> >> instruct maxF_reg(legRegF dst, legRegF a, legRegF b, legRegF tmp, legRegF atmp, legRegF btmp) %{ >> ... >> effect(USE a, USE b, TEMP tmp, TEMP atmp, TEMP btmp); >> ... >> __ vminmax_fp(Op_MaxV, T_FLOAT, $dst$$XMMRegister, $a$$XMMRegister, $b$$XMMRegister, $tmp$$XMMRegister, $atmp$$XMMRegister, $btmp$$XMMRegister, Assembler::AVX_128bit); >> %} >> >> >> Changing the assert in vminmax_fp from: >> assert_different_registers(a, b, tmp, atmp, btmp); >> to: >> assert_different_registers(a, tmp, atmp, btmp); >> assert_different_registers(b, tmp, atmp, btmp); >> fixes the issue. >> >> Similar change done in evminmax_fp. >> >> Please review. 
>> >> Best Regards, >> Sandhya > > Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision: > > copyright year update Looks good to me. Thanks for including a test. I submitted some testing and will report back once it passed. ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/17315#pullrequestreview-1812580424 From epeter at openjdk.org Wed Jan 10 07:50:26 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 10 Jan 2024 07:50:26 GMT Subject: RFR: 8322077: Add Ideal transformation: (~a) | (~b) => ~(a & b) [v4] In-Reply-To: <2yuELMZxtnZVtWL2rhdtdzwIjg0tgZEi0csMHasBXtQ=.3e62b735-b4e5-4bfe-9594-bfb66e19205c@github.com> References: <5KKTYmY7dJ4nW0OEQ2UPuloIWOYc-pI9M8HRjoaRzw4=.f5eda6f9-c38c-4a2b-9690-cbb1791a2622@github.com> <2yuELMZxtnZVtWL2rhdtdzwIjg0tgZEi0csMHasBXtQ=.3e62b735-b4e5-4bfe-9594-bfb66e19205c@github.com> Message-ID: On Sat, 6 Jan 2024 00:44:19 GMT, Zhiqiang Zang wrote: >> Looks like a good idea. Left a few comments. >> >> I would have merged this with https://github.com/openjdk/jdk/pull/16333, since it is essentially the symmetric case. But leave it separate now. >> >> It would be nice to have some shared tests, where both optimizations need to be combined. Like: >> `(~a | ~b) & (~c | ~d)` -> `~(a & b) & ~(c & d)` -> `~((a & b) | (c & d))` > > @eme64 @TobiHartmann Thanks for the comments. All addressed. > > I rebased this PR onto #16333 so I was able to add these tests for using both optimizations. (the history was messed up). @CptGit can you merge from master again, please? It looks now like you are pushing both the changes here and the ones from your previous PR. Once you did that, I'd like to run some testing before we push this. 
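The two identities and the combined chain quoted above, `(~a | ~b) & (~c | ~d)` -> `~((a & b) | (c & d))`, can be checked at the Java level with a small sketch. This only confirms that the rewrites preserve values on a set of sample inputs; it says nothing about which IR nodes C2 actually emits. The class `DeMorganCheck` and its method names are hypothetical and are not the IR tests from the PRs.

```java
final class DeMorganCheck {
    static int orOfNots(int a, int b)  { return (~a) | (~b); }
    static int notOfAnd(int a, int b)  { return ~(a & b); }
    static int andOfNots(int a, int b) { return (~a) & (~b); }
    static int notOfOr(int a, int b)   { return ~(a | b); }

    // The combined pattern suggested in the review, before and after both rewrites.
    static int combinedBefore(int a, int b, int c, int d) { return (~a | ~b) & (~c | ~d); }
    static int combinedAfter(int a, int b, int c, int d)  { return ~((a & b) | (c & d)); }

    // Exhaustively compares both forms over a small set of edge-case values.
    static boolean holdsForSamples() {
        int[] s = {0, 1, -1, 42, 0x55555555, 0xAAAAAAAA, Integer.MIN_VALUE, Integer.MAX_VALUE};
        for (int a : s) {
            for (int b : s) {
                if (orOfNots(a, b) != notOfAnd(a, b)) return false;
                if (andOfNots(a, b) != notOfOr(a, b)) return false;
                for (int c : s) {
                    for (int d : s) {
                        if (combinedBefore(a, b, c, d) != combinedAfter(a, b, c, d)) return false;
                    }
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println("identities hold on samples: " + holdsForSamples());
    }
}
```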
------------- PR Comment: https://git.openjdk.org/jdk/pull/16334#issuecomment-1884342502 From epeter at openjdk.org Wed Jan 10 07:59:26 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 10 Jan 2024 07:59:26 GMT Subject: RFR: 8321712: C2: "failed: Multiple uses of register" in C2_MacroAssembler::vminmax_fp [v5] In-Reply-To: <-rzXk3WJ9aOgrgoyIVDpCeiQLBkq5N5yQCYp_42oMEo=.df74c3a7-508d-48a1-baa2-796a2015bea1@github.com> References: <-rzXk3WJ9aOgrgoyIVDpCeiQLBkq5N5yQCYp_42oMEo=.df74c3a7-508d-48a1-baa2-796a2015bea1@github.com> Message-ID: On Tue, 9 Jan 2024 23:46:45 GMT, Sandhya Viswanathan wrote: >> The C2_MacroAssembler::vminmax_fp and evminmax_fp were incorrectly checking for the two USE operand registers to be different and thus resulting in assertion failure. >> >> In x86_64.ad: >> >> instruct maxF_reg(legRegF dst, legRegF a, legRegF b, legRegF tmp, legRegF atmp, legRegF btmp) %{ >> ... >> effect(USE a, USE b, TEMP tmp, TEMP atmp, TEMP btmp); >> ... >> __ vminmax_fp(Op_MaxV, T_FLOAT, $dst$$XMMRegister, $a$$XMMRegister, $b$$XMMRegister, $tmp$$XMMRegister, $atmp$$XMMRegister, $btmp$$XMMRegister, Assembler::AVX_128bit); >> %} >> >> >> Changing the assert in vminmax_fp from: >> assert_different_registers(a, b, tmp, atmp, btmp); >> to: >> assert_different_registers(a, tmp, atmp, btmp); >> assert_different_registers(b, tmp, atmp, btmp); >> fixes the issue. >> >> Similar change done in evminmax_fp. >> >> Please review. >> >> Best Regards, >> Sandhya > > Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision: > > copyright year update @sviswa7 Thanks for filing the follow up RFE! Nice, the reproducer fails without, and passes with your patch :) I also verified that the reproducer `Test_276.java` fails without, and passes with the patch. ==> LGTM ? (pending testing from Tobias) ------------- Marked as reviewed by epeter (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/17315#pullrequestreview-1812606665 From thartmann at openjdk.org Wed Jan 10 08:03:28 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 10 Jan 2024 08:03:28 GMT Subject: RFR: JDK-8323190: Segfault during deoptimization of C2-compiled code In-Reply-To: References: Message-ID: <6jsDdUuD6X9u8CT-4SUN_r2VUgPA4QuPCMLQyz-myaY=.52f1830e-0c87-44bf-931a-c272cabb74b4@github.com> On Wed, 10 Jan 2024 01:22:37 GMT, Cesar Soares Lucas wrote: > Currently, if `ReduceAllocationMerges` reduces an allocation merge that is used as a monitor C2 will SIGFAULT in `Process_OopMap_Node` because it's missing code to handle that case. This patch fixes C2 to properly handle reduced allocation merges that are used as monitors. > > Tested with Linux x86_64 hotspot_all. Thanks for quickly jumping on this, Cesar! The fix looks good to me. I also submitted testing and will report back once it passed. It's concerning though that we don't have any other test covering this. Would it make sense to extend `AllocationMergesTests.java` to cover some more variants? src/hotspot/share/opto/output.cpp line 1096: > 1094: > 1095: int merge_pointer_idx = smerge->merge_pointer_idx(youngest_jvms); > 1096: (void)FillLocArray(0, sfn, sfn->in(merge_pointer_idx), &deps, objs); Suggestion: FillLocArray(0, sfn, sfn->in(merge_pointer_idx), &deps, objs); Also below. I know that this is used in old code but I don't think it has any value. test/hotspot/jtreg/compiler/escapeAnalysis/TestInvalidLocation.java line 2: > 1: /* Copyright (c) 2023, Oracle and/or its affiliates. All rights reserved. > 2: * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER. Suggestion: /* * Copyright (c) 2024, Oracle and/or its affiliates. All rights reserved. * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER. 
test/hotspot/jtreg/compiler/escapeAnalysis/TestInvalidLocation.java line 27: > 25: * @bug 8323190 > 26: * @summary C2 Segfaults during code generation because of unhandled SafePointScalarMerge monitor debug info. > 27: * @run main/othervm -XX:+UnlockDiagnosticVMOptions -Xcomp -Xbatch -XX:+ReduceAllocationMerges TestInvalidLocation Suggestion: * @run main/othervm -XX:+UnlockDiagnosticVMOptions -Xcomp -XX:+ReduceAllocationMerges TestInvalidLocation `-Xcomp` implies `-Xbatch`. ------------- PR Review: https://git.openjdk.org/jdk/pull/17333#pullrequestreview-1812601748 PR Review Comment: https://git.openjdk.org/jdk/pull/17333#discussion_r1447003530 PR Review Comment: https://git.openjdk.org/jdk/pull/17333#discussion_r1447001783 PR Review Comment: https://git.openjdk.org/jdk/pull/17333#discussion_r1447002283 From thartmann at openjdk.org Wed Jan 10 08:09:22 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 10 Jan 2024 08:09:22 GMT Subject: RFR: 8320237 C2: late inlining of method handle invoke causes duplicate lines in PrintInlining output In-Reply-To: References: Message-ID: On Mon, 18 Dec 2023 15:28:47 GMT, Kangcheng Xu wrote: > This PR resolves [JDK-8320237](https://bugs.openjdk.org/browse/JDK-8320237) > > The original behavior produces both a failure and success message upon late inlining which is confusing. The patch removes the failure message if inlining was successful. Huge thanks to @rwestrel for reporting and working out a solution. > > Unit test `test/hotspot/jtreg/compiler/inlining/TestDuplicatedLateInliningOutput.java` is added and passing. Please do not sponsor this yet. We see various test failures. I'll follow-up shortly. ------------- Changes requested by thartmann (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/17147#pullrequestreview-1812622680 From thartmann at openjdk.org Wed Jan 10 08:16:23 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 10 Jan 2024 08:16:23 GMT Subject: RFR: 8320237 C2: late inlining of method handle invoke causes duplicate lines in PrintInlining output In-Reply-To: References: Message-ID: On Mon, 18 Dec 2023 15:28:47 GMT, Kangcheng Xu wrote: > This PR resolves [JDK-8320237](https://bugs.openjdk.org/browse/JDK-8320237) > > The original behavior produces both a failure and success message upon late inlining which is confusing. The patch removes the failure message if inlining was successful. Huge thanks to @rwestrel for reporting and working out a solution. > > Unit test `test/hotspot/jtreg/compiler/inlining/TestDuplicatedLateInliningOutput.java` is added and passing. `compiler/print/PrintInlining.java` fails with `-ea -esa -XX:CompileThreshold=100 -XX:+UnlockExperimentalVMOptions -server -XX:+TieredCompilation`: # A fatal error has been detected by the Java Runtime Environment: # # Internal Error (workspace/open/src/hotspot/share/opto/compile.cpp:4601), pid=418042, tid=418058 # assert(_print_inlining_stream->size() > 0) failed: missing inlining msg # # JRE version: Java(TM) SE Runtime Environment (23.0) (fastdebug build 23-internal-2024-01-10-0732483.tobias.hartmann.jdk2) # Java VM: Java HotSpot(TM) 64-Bit Server VM (fastdebug 23-internal-2024-01-10-0732483.tobias.hartmann.jdk2, compiled mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64) # Problematic frame: # V [libjvm.so+0x9db4e6] Compile::print_inlining_update_delayed(CallGenerator*)+0x1c6 Current CompileTask: C2:643 171 b 4 java.lang.String::substring (58 bytes) Stack: [0x00007f59706d4000,0x00007f59707d4000], sp=0x00007f59707cf220, free space=1004k Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code) V [libjvm.so+0x9db4e6] 
Compile::print_inlining_update_delayed(CallGenerator*)+0x1c6 (compile.cpp:4601) V [libjvm.so+0x844d7e] CallGenerator::do_late_inline_helper()+0x8ee (callGenerator.cpp:687) V [libjvm.so+0x9e1a52] Compile::inline_boxing_calls(PhaseIterGVN&)+0xc2 (compile.cpp:2026) V [libjvm.so+0x9e42e3] Compile::Optimize()+0x583 (compile.cpp:2276) V [libjvm.so+0x9e81a4] Compile::Compile(ciEnv*, ciMethod*, int, Options, DirectiveSet*)+0x1b44 (compile.cpp:860) V [libjvm.so+0x83d245] C2Compiler::compile_method(ciEnv*, ciMethod*, int, bool, DirectiveSet*)+0x1d5 (c2compiler.cpp:142) V [libjvm.so+0x9f3bbc] CompileBroker::invoke_compiler_on_method(CompileTask*)+0x92c (compileBroker.cpp:2299) V [libjvm.so+0x9f4848] CompileBroker::compiler_thread_loop()+0x468 (compileBroker.cpp:1958) V [libjvm.so+0xeb98ec] JavaThread::thread_main_inner()+0xcc (javaThread.cpp:721) V [libjvm.so+0x179b586] Thread::call_run()+0xb6 (thread.cpp:220) V [libjvm.so+0x14a8d47] thread_native_entry(Thread*)+0x127 (os_linux.cpp:789) `compiler/cha/StrengthReduceInterfaceCall.java` and `compiler/ciReplay/TestIncrementalInlining.java` fail as well. ------------- PR Comment: https://git.openjdk.org/jdk/pull/17147#issuecomment-1884373123 From tholenstein at openjdk.org Wed Jan 10 08:33:28 2024 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Wed, 10 Jan 2024 08:33:28 GMT Subject: RFR: JDK-8277869: Maven POMs are using HTTP links where HTTPS is available In-Reply-To: <_yC54VkkHOUc9a7YC6Wf-7QjqTiJkA9ieAWMlwJYApQ=.032ae2c6-0471-4b8f-bf78-dd57fb6c90db@github.com> References: <_yC54VkkHOUc9a7YC6Wf-7QjqTiJkA9ieAWMlwJYApQ=.032ae2c6-0471-4b8f-bf78-dd57fb6c90db@github.com> Message-ID: <96w5MYwXQAoIrPHpLChBCHwm9zACbJCq1UYQhGIhxXI=.08bf5e8b-a2ba-4b23-8734-73ec5743f115@github.com> On Mon, 8 Jan 2024 20:48:09 GMT, Vladimir Kozlov wrote: >> Replace `http` with `https` in xml files of IdealGraphVisualizer and LogCompilation. 
>> Moreover, replace the old URL (./maven-v4_0_0.xsd) with the suggested one (./maven-4.0.0.xsd) for `xsi:schemaLocation` in pom.xml. >> >> Tested: IdealGraphVisualizer and LogCompilation build and run as expected. > > Looks good. Thanks for the reviews @vnkozlov and @TobiHartmann ------------- PR Comment: https://git.openjdk.org/jdk/pull/17302#issuecomment-1884392529 From tholenstein at openjdk.org Wed Jan 10 08:33:32 2024 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Wed, 10 Jan 2024 08:33:32 GMT Subject: Integrated: JDK-8277869: Maven POMs are using HTTP links where HTTPS is available In-Reply-To: References: Message-ID: <-GmghZFPYEBDTNBFTXvcbWgk0bhUM8WD92ZZRImPJTI=.f1c52038-315e-445a-925c-c90a428136b2@github.com> On Mon, 8 Jan 2024 10:29:38 GMT, Tobias Holenstein wrote: > Replace `http` with `https` in xml files of IdealGraphVisualizer and LogCompilation. > Moreover, replace the old URL (./maven-v4_0_0.xsd) with the suggested one (./maven-4.0.0.xsd) for `xsi:schemaLocation` in pom.xml. > > Tested: IdealGraphVisualizer and LogCompilation build and run as expected. This pull request has now been integrated.
Changeset: 88378ed0 Author: Tobias Holenstein URL: https://git.openjdk.org/jdk/commit/88378ed0584c7eb0849b6fc1e361fd8ea0698caf Stats: 43 lines in 40 files changed: 1 ins; 1 del; 41 mod 8277869: Maven POMs are using HTTP links where HTTPS is available Reviewed-by: kvn, thartmann ------------- PR: https://git.openjdk.org/jdk/pull/17302 From rrich at openjdk.org Wed Jan 10 08:55:50 2024 From: rrich at openjdk.org (Richard Reingruber) Date: Wed, 10 Jan 2024 08:55:50 GMT Subject: RFR: 8322294: Cleanup NativePostCallNop [v5] In-Reply-To: <6LS57mCF2fgaosnyfnNydaqfT3cD3F42xsDOujG5SgY=.2db5f614-f64d-4fe4-8e68-1c06e70205d3@github.com> References: <6LS57mCF2fgaosnyfnNydaqfT3cD3F42xsDOujG5SgY=.2db5f614-f64d-4fe4-8e68-1c06e70205d3@github.com> Message-ID: > This is a refactoring/cleanup of `NativePostCallNop` that simplifies the ppc64 port (dependent pr https://github.com/openjdk/jdk/pull/17171). > > * `frame::get_oop_map()` is moved to shared code > > * encoding / decoding details of the oopmap slot and the CodeBlob offset are moved from shared code to the platform dependent implementations of `bool NativePostCallNop::patch(int32_t oopmap_slot, int32_t cb_offset)` and `bool NativePostCallNop::decode(int32_t& oopmap_slot, int32_t& cb_offset)` > > The change passed our CI testing. JTReg tests: tier1-4 of hotspot and jdk. All of Langtools and jaxp. SPECjvm2008, SPECjbb2015, Renaissance Suite, and SAP specific tests. > All testing was done with fastdebug and release builds on the main platforms and also on Linux/PPC64le and AIX. > > EDIT 2023-12-22: Statistics > > The statistical numbers were generated with release builds. For riscv64 I used qemu. > The variance is high on all platforms. Up to 80% I think. Numbers with fastdebug are also very different. > Nevertheless, they are consistent within one run, and I'd expect errors in encoding or decoding to manifest in the numbers. 
> > | test/jdk/java/lang/Thread/virtual/stress/Skynet.java | x86_64: base | x86_64: pr | aarch64: base | aarch64: pr | riscv64: base | riscv64: pr | > |------------------------------------------------------|--------------|------------|---------------|-------------|---------------|-------------| > | PCN lookup success | 17517455 | 15339681 | 13179049 | 15980253 | 19400110 | 30017193 | > | PCN lookup failure | 328164 | 372555 | 237617 | 138164 | 415341 | 586476 | > | PCN decode success | 17513991 | 15336485 | 13176061 | 15977651 | 19397398 | 30014226 | > | PCN decode failure | 3464 | 3196 | 2988 | 2602 | 2712 | 2967 | > | PCN patch success | 2676 | 2465 | 2459 | 2089 | 2214 | 2259 | > | PCN patch cb offset failure | 0 | 0 | 0 | 0 | 0 | 0 | > | PCN patch oopmap slot failure | 0 | 0 | 0 | 0 | 0 | 0 | > > > | SpecJVM2008 compiler.compiler with fix iterations | x86_64: base | x8... Richard Reingruber has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains five additional commits since the last revision: - Merge branch 'master' - Suggstion Andrew Co-authored-by: Andrew Haley - Add newline - Review Martin - 8322294: Cleanup NativePostCallNop ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17150/files - new: https://git.openjdk.org/jdk/pull/17150/files/6c1fd588..1dfa9628 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17150&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17150&range=03-04 Stats: 21335 lines in 773 files changed: 14969 ins; 3022 del; 3344 mod Patch: https://git.openjdk.org/jdk/pull/17150.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17150/head:pull/17150 PR: https://git.openjdk.org/jdk/pull/17150 From chagedorn at openjdk.org Wed Jan 10 09:02:22 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 10 Jan 2024 09:02:22 GMT Subject: RFR: JDK-8323190: Segfault during deoptimization of C2-compiled code In-Reply-To: References: Message-ID: On Wed, 10 Jan 2024 01:22:37 GMT, Cesar Soares Lucas wrote: > Currently, if `ReduceAllocationMerges` reduces an allocation merge that is used as a monitor, C2 will segfault in `Process_OopMap_Node` because it's missing code to handle that case. This patch fixes C2 to properly handle reduced allocation merges that are used as monitors. > > Tested with Linux x86_64 hotspot_all. src/hotspot/share/opto/output.cpp line 1092: > 1090: ObjectMergeValue* mv = (ObjectMergeValue*) sv_for_node_id(objs, smerge->_idx); > 1091: > 1092: if (mv == NULL) { You should replace `NULL` with `nullptr` here and below. This also seems wrong here where you took the code from: https://github.com/openjdk/jdk/blob/88378ed0584c7eb0849b6fc1e361fd8ea0698caf/src/hotspot/share/opto/output.cpp#L775-L796 On a separate note, the code looks almost identical. Could it be shared somehow?
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17333#discussion_r1447068114 From aph-open at littlepinkcloud.com Wed Jan 10 09:53:43 2024 From: aph-open at littlepinkcloud.com (Andrew Haley) Date: Wed, 10 Jan 2024 09:53:43 +0000 Subject: discuss about release barrier for final fields initialization In-Reply-To: <5e41f867-b31b-4ded-b737-8d0c869c8895.kuaiwei.kw@alibaba-inc.com> References: <5e41f867-b31b-4ded-b737-8d0c869c8895.kuaiwei.kw@alibaba-inc.com> Message-ID: <20965b9a-ec64-4d81-866e-b1a0a94ed1e0@littlepinkcloud.com> On 1/9/24 06:23, Kuai Wei wrote: > I found that only C2 inserts a release membar, while C1 just inserts a storestore for both final and normal allocation. In Doug Lea's cookbook https://gee.cs.oswego.edu/dl/jmm/cookbook.html > only storestore is required. Alex has a great post on this topic: https://shipilev.net/blog/2014/all-fields-are-final/ It refers to a case showing why loadstore is needed: https://www.hboehm.info/c++mm/no_write_fences.html > I checked this case and IMO it looks like some legacy architectures may break data dependency and cause problems. As far as I know, the Alpha architecture is an example. I think it doesn't > break on modern architectures. Is there another case I missed? I think it requires a very careful analysis of the compiler to be sure. The problem occurs if an optimizer knows what a store is going to do. If it does, then there's nothing to prevent a load from being elided, and your load dependency has gone. This isn't a problem with C1, because C1 doesn't do that kind of optimization. I don't know that C2 does either, or even whether it is allowed to do so. From what I remember of the conversation, we left the release barrier in because of an abundance of caution rather than any proof that a storestore was inadequate. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd.
https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From rrich at openjdk.org Wed Jan 10 10:38:25 2024 From: rrich at openjdk.org (Richard Reingruber) Date: Wed, 10 Jan 2024 10:38:25 GMT Subject: RFR: 8290965: PPC64: Implement post-call NOPs [v2] In-Reply-To: References: Message-ID: On Wed, 27 Dec 2023 18:27:22 GMT, Martin Doerr wrote: >> Richard Reingruber has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix comment >> >> Co-authored-by: Andrew Haley > > src/hotspot/cpu/ppc/frame_ppc.hpp line 414: > >> 412: // Constructors >> 413: inline frame(intptr_t* sp, intptr_t* fp, address pc); >> 414: inline frame(intptr_t* sp, address pc, kind knd = kind::nmethod); > > I think using `kind::nmethod` by default is potentially dangerous. The pc may be outside of the code cache and calling find_blob_fast would be unreliable. It's used by pns for debugging code. It doesn't look performance critical and we could use a conservative default. > I guess that we don't see issues because native code doesn't set bit 9 in CMPI/CMPLI. `pns` does not use this constructor. It uses `frame::frame(void* sp, void* fp, void* pc) : frame((intptr_t*)sp, (address)pc, kind::code_blob)`. So there's no problem. `pns` seems to be the only user of this one. It might be good to use `kind::native` there. Using `kind::native` (or `kind::unknow`) as default instead of `kind::nmethod` is potentially problematic since there might be locations in shared code that should set `kind::nmethod`. I think this requires a clean-up of the shared frame api.
Note also that using the wrong kind (wrong constructor on other platforms) hits the assertion in `CodeCache::find_blob_and_oopmap` (that's how I noticed that the distinction is actually needed :)) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17171#discussion_r1447187050 From mli at openjdk.org Wed Jan 10 10:41:23 2024 From: mli at openjdk.org (Hamlin Li) Date: Wed, 10 Jan 2024 10:41:23 GMT Subject: RFR: 8317721: RISC-V: Implement CRC32 intrinsic [v4] In-Reply-To: References: Message-ID: On Tue, 9 Jan 2024 16:47:08 GMT, ArsenyBochkarev wrote: >> Hi everyone! Please review this port of [AArch64](https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp#L4224) `_updateBytesCRC32`, `_updateByteBufferCRC32` and `_updateCRC32` intrinsics. This patch introduces only the plain (non-vectorized, no Zbc) version. >> >> ### Correctness checks >> >> Tier 1/2 tests are ok. >> >> ### Performance results on T-Head board >> >> #### Results for enabled intrinsic: >> >> Used test is `test/micro/org/openjdk/bench/java/util/TestCRC32.java` >> >> | Benchmark | (count) | Mode | Cnt | Score | Error | Units | >> | --- | ---- | ----- | --- | ---- | --- | ---- | >> | CRC32.TestCRC32.testCRC32Update | 64 | thrpt | 24 | 3730.929 | 37.773 | ops/ms | >> | CRC32.TestCRC32.testCRC32Update | 128 | thrpt | 24 | 2126.673 | 2.032 | ops/ms | >> | CRC32.TestCRC32.testCRC32Update | 256 | thrpt | 24 | 1134.330 | 6.714 | ops/ms | >> | CRC32.TestCRC32.testCRC32Update | 512 | thrpt | 24 | 584.017 | 2.267 | ops/ms | >> | CRC32.TestCRC32.testCRC32Update | 2048 | thrpt | 24 | 151.173 | 0.346 | ops/ms | >> | CRC32.TestCRC32.testCRC32Update | 16384 | thrpt | 24 | 19.113 | 0.008 | ops/ms | >> | CRC32.TestCRC32.testCRC32Update | 65536 | thrpt | 24 | 4.647 | 0.022 | ops/ms | >> >> #### Results for disabled intrinsic: >> >> | Benchmark | (count) | Mode | Cnt | Score | Error | Units | >> | --------------------------------------------------- | ---------- |
--------- | ---- | ----------- | --------- | ---------- | >> | CRC32.TestCRC32.testCRC32Update | 64 | thrpt | 15 | 798.365 | 35.486 | ops/ms | >> | CRC32.TestCRC32.testCRC32Update | 128 | thrpt | 15 | 677.756 | 46.619 | ops/ms | >> | CRC32.TestCRC32.testCRC32Update | 256 | thrpt | 15 | 552.781 | 27.143 | ops/ms | >> | CRC32.TestCRC32.testCRC32Update | 512 | thrpt | 15 | 429.304 | 12.518 | ops/ms | >> | CRC32.TestCRC32.testCRC32Update | 2048 | thrpt | 15 | 166.738 | 0.935 | ops/ms | >> | CRC32.TestCRC32.testCRC32Update | 16384 | thrpt | 15 | 25.060 | 0.034 | ops/ms | >> | CRC32.TestCRC32.testCRC32Update | 65536 | thrpt | 15 | 6.196 | 0.030 | ops/ms | > > ArsenyBochkarev has updated the pull request incrementally with four additional commits since the last revision: > > - Fix unroll size > - Rename constants > - Partially unroll loop > - Optimize loop counter in L_by16_loop Same the performance trend is that: the larger the data size, the closer the performance gap. when size is `65536`, there seems a little perf regression. So I wonder how it will behave when the size is bigger than 65536, and whether we need to consider the size bigger than 65536 depends on what's the expected regular data size of java CRC32, are the larger data size (equal or larger than 65536) common cases? ------------- PR Comment: https://git.openjdk.org/jdk/pull/17046#issuecomment-1884592765 From rrich at openjdk.org Wed Jan 10 12:20:45 2024 From: rrich at openjdk.org (Richard Reingruber) Date: Wed, 10 Jan 2024 12:20:45 GMT Subject: RFR: 8322294: Cleanup NativePostCallNop [v5] In-Reply-To: References: <6LS57mCF2fgaosnyfnNydaqfT3cD3F42xsDOujG5SgY=.2db5f614-f64d-4fe4-8e68-1c06e70205d3@github.com> Message-ID: On Wed, 10 Jan 2024 08:55:50 GMT, Richard Reingruber wrote: >> This is a refactoring/cleanup of `NativePostCallNop` that simplifies the ppc64 port (dependent pr https://github.com/openjdk/jdk/pull/17171). 
>> >> * `frame::get_oop_map()` is moved to shared code >> >> * encoding / decoding details of the oopmap slot and the CodeBlob offset are moved from shared code to the platform dependent implementations of `bool NativePostCallNop::patch(int32_t oopmap_slot, int32_t cb_offset)` and `bool NativePostCallNop::decode(int32_t& oopmap_slot, int32_t& cb_offset)` >> >> The change passed our CI testing. JTReg tests: tier1-4 of hotspot and jdk. All of Langtools and jaxp. SPECjvm2008, SPECjbb2015, Renaissance Suite, and SAP specific tests. >> All testing was done with fastdebug and release builds on the main platforms and also on Linux/PPC64le and AIX. >> >> EDIT 2023-12-22: Statistics >> >> The statistical numbers were generated with release builds. For riscv64 I used qemu. >> The variance is high on all platforms. Up to 80% I think. Numbers with fastdebug are also very different. >> Nevertheless, they are consistent within one run, and I'd expect errors in encoding or decoding to manifest in the numbers. >> >> | test/jdk/java/lang/Thread/virtual/stress/Skynet.java | x86_64: base | x86_64: pr | aarch64: base | aarch64: pr | riscv64: base | riscv64: pr | >> |------------------------------------------------------|--------------|------------|---------------|-------------|---------------|-------------| >> | PCN lookup success | 17517455 | 15339681 | 13179049 | 15980253 | 19400110 | 30017193 | >> | PCN lookup failure | 328164 | 372555 | 237617 | 138164 | 415341 | 586476 | >> | PCN decode success | 17513991 | 15336485 | 13176061 | 15977651 | 19397398 | 30014226 | >> | PCN decode failure | 3464 | 3196 | 2988 | 2602 | 2712 | 2967 | >> | PCN patch success | 2676 | 2465 | 2459 | 2089 | 2214 | 2259 | >> | PCN patch cb offset failure | 0 | 0 | 0 | 0 | 0 | 0 | >> | PCN patch oopmap slot failure | 0 | 0 | 0 | 0 | 0 | 0 | >> >> >> | SpecJVM2008 compil... > > Richard Reingruber has updated the pull request with a new target base due to a merge or a rebase. 
The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: > > - Merge branch 'master' > - Suggstion Andrew > > Co-authored-by: Andrew Haley > - Add newline > - Review Martin > - 8322294: Cleanup NativePostCallNop Tests are good after merging master. Shipping now... Thanks again for the feedback and reviews! ------------- PR Comment: https://git.openjdk.org/jdk/pull/17150#issuecomment-1884739328 From rrich at openjdk.org Wed Jan 10 12:20:46 2024 From: rrich at openjdk.org (Richard Reingruber) Date: Wed, 10 Jan 2024 12:20:46 GMT Subject: Integrated: 8322294: Cleanup NativePostCallNop In-Reply-To: <6LS57mCF2fgaosnyfnNydaqfT3cD3F42xsDOujG5SgY=.2db5f614-f64d-4fe4-8e68-1c06e70205d3@github.com> References: <6LS57mCF2fgaosnyfnNydaqfT3cD3F42xsDOujG5SgY=.2db5f614-f64d-4fe4-8e68-1c06e70205d3@github.com> Message-ID: On Mon, 18 Dec 2023 22:05:32 GMT, Richard Reingruber wrote: > This is a refactoring/cleanup of `NativePostCallNop` that simplifies the ppc64 port (dependent pr https://github.com/openjdk/jdk/pull/17171). > > * `frame::get_oop_map()` is moved to shared code > > * encoding / decoding details of the oopmap slot and the CodeBlob offset are moved from shared code to the platform dependent implementations of `bool NativePostCallNop::patch(int32_t oopmap_slot, int32_t cb_offset)` and `bool NativePostCallNop::decode(int32_t& oopmap_slot, int32_t& cb_offset)` > > The change passed our CI testing. JTReg tests: tier1-4 of hotspot and jdk. All of Langtools and jaxp. SPECjvm2008, SPECjbb2015, Renaissance Suite, and SAP specific tests. > All testing was done with fastdebug and release builds on the main platforms and also on Linux/PPC64le and AIX. > > EDIT 2023-12-22: Statistics > > The statistical numbers were generated with release builds. For riscv64 I used qemu. > The variance is high on all platforms. Up to 80% I think. Numbers with fastdebug are also very different. 
> Nevertheless, they are consistent within one run, and I'd expect errors in encoding or decoding to manifest in the numbers. > > | test/jdk/java/lang/Thread/virtual/stress/Skynet.java | x86_64: base | x86_64: pr | aarch64: base | aarch64: pr | riscv64: base | riscv64: pr | > |------------------------------------------------------|--------------|------------|---------------|-------------|---------------|-------------| > | PCN lookup success | 17517455 | 15339681 | 13179049 | 15980253 | 19400110 | 30017193 | > | PCN lookup failure | 328164 | 372555 | 237617 | 138164 | 415341 | 586476 | > | PCN decode success | 17513991 | 15336485 | 13176061 | 15977651 | 19397398 | 30014226 | > | PCN decode failure | 3464 | 3196 | 2988 | 2602 | 2712 | 2967 | > | PCN patch success | 2676 | 2465 | 2459 | 2089 | 2214 | 2259 | > | PCN patch cb offset failure | 0 | 0 | 0 | 0 | 0 | 0 | > | PCN patch oopmap slot failure | 0 | 0 | 0 | 0 | 0 | 0 | > > > | SpecJVM2008 compiler.compiler with fix iterations | x86_64: base | x8... This pull request has now been integrated. Changeset: 2e472fe7 Author: Richard Reingruber URL: https://git.openjdk.org/jdk/commit/2e472fe7ea98ca1f07a90d1ad6704e8b2bb3afcf Stats: 200 lines in 30 files changed: 51 ins; 114 del; 35 mod 8322294: Cleanup NativePostCallNop Reviewed-by: mdoerr, aph ------------- PR: https://git.openjdk.org/jdk/pull/17150 From rrich at openjdk.org Wed Jan 10 15:11:35 2024 From: rrich at openjdk.org (Richard Reingruber) Date: Wed, 10 Jan 2024 15:11:35 GMT Subject: RFR: 8290965: PPC64: Implement post-call NOPs [v3] In-Reply-To: References: Message-ID: > #### Implementation of post call nops (PCNs) on ppc64. 
> > Depends on https://github.com/openjdk/jdk/pull/17150 > > About post call nops: > > - instruction(s) at return addresses of compiled java calls > - emitted iff vm continuations are enabled to support virtual threads > - encode data that can be used to find the corresponding CodeBlob and oop map faster > - mt-safe patchable to trigger deoptimization > > Background: > > - Frames in continuation StackChunks are not visited if their compiled method is made not entrant (in contrast to frames on stack). > Instead all PCNs of the compiled method are patched to trigger deoptimization when control returns to such frames. > - With vm continuations, stacks are walked and inspected more frequently. This requires lookup of metadata like frame size and oop maps. As an optimization the offset of the CodeBlob to the PCN and the oop map slot are encoded as data in the PCN. > > Post call nops on ppc64 > > - 1 instruction, i.e. 4 bytes (either CMPI or CMPLI[1]) > x86_64: 1 instruction, 8 bytes > aarch64: 3 instructions, 12 bytes > [1] 3.1.10 Fixed Point Compare Instructions in Power ISA 3.1B > https://openpowerfoundation.org/specifications/isa/ > > - 26 bits data payload > x86_64: 32 bits; aarch64: 32 bits > - 9 bits dedicated to oop map slot. With 8 bits there were cases with SPECjvm2008 where the slot could not be encoded (on ppc64 and x86_64). > x86_64: 8 bits; aarch64: 8 bits > - 17 bits dedicated to cb offset. Effectively 19 bits due to instruction alignment. > x86_64: 24 bits; aarch64: 24 bits > - Also used when reconstructing the back chain after thawing continuation frames (see `Thaw::patch_caller_links`) > > - Refactored frame constructors to make use of fast CodeBlob lookup based on PCNs. > The fast lookup may only be used if the pc is known to be in the code cache because `CodeCache::find_blob_fast` can yield wrong results if it finds instructions outside the code cache that look just like PCNs.
Callers of the frame class constructors need to pass `frame::kind::native` in that case to avoid errors. Other platforms don't make this explicit which is a problem in my eyes. Picking the wrong constructor can cause errors when porting and in future development. > > - Currently only the PCNs in nmethods are initialized. Therefore we don't even try to make a fast lookup based on PCNs if we know the CodeBlob is, e.g., a RuntimeStub. To achieve this we call the frame constructor passing `frame::kind::code_blob`. > > #### Statistics > > > | SpecJVM2008... Richard Reingruber has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains four commits: - Merge branch 'master' - Fix comment Co-authored-by: Andrew Haley - 8290965: PPC64: Implement post-call NOPs - 8322294: Cleanup NativePostCallNop ------------- Changes: https://git.openjdk.org/jdk/pull/17171/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=17171&range=02 Stats: 133 lines in 13 files changed: 96 ins; 0 del; 37 mod Patch: https://git.openjdk.org/jdk/pull/17171.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17171/head:pull/17171 PR: https://git.openjdk.org/jdk/pull/17171 From rrich at openjdk.org Wed Jan 10 15:17:30 2024 From: rrich at openjdk.org (Richard Reingruber) Date: Wed, 10 Jan 2024 15:17:30 GMT Subject: RFR: 8290965: PPC64: Implement post-call NOPs [v2] In-Reply-To: References: Message-ID: On Wed, 27 Dec 2023 17:34:11 GMT, Martin Doerr wrote: >> Richard Reingruber has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix comment >> >> Co-authored-by: Andrew Haley > > src/hotspot/cpu/ppc/macroAssembler_ppc.cpp line 1191: > >> 1189: } >> 1190: // We use CMPI/CMPLI instructions to encode post call nops. >> 1191: // We set bit 9 to distinguish post call nops from real CMPI/CMPI instructions > > Should be CMPI/CMPLI. 
Maybe add that CMPI and CMPLI opcodes only differ in one bit which we use to encode data. Done. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17171#discussion_r1447525235 From rrich at openjdk.org Wed Jan 10 15:22:30 2024 From: rrich at openjdk.org (Richard Reingruber) Date: Wed, 10 Jan 2024 15:22:30 GMT Subject: RFR: 8290965: PPC64: Implement post-call NOPs [v2] In-Reply-To: References: Message-ID: <0K_jX-ImmJLihXnogDXZKQYhMn7JfdpRgRr_KVxGdcQ=.9c3d3e21-75c0-436c-978a-7907ad60ff95@github.com> On Wed, 27 Dec 2023 17:26:10 GMT, Martin Doerr wrote: > I think `kind::nmethod` should only be used if cb != nullptr which is not checked, here. Is this one performance critical? I don't quite understand: the purpose of using `kind::nmethod` is to allow for a fast lookup of the cb which is only done if cb != nullptr. See also my other response where `kind::nmethod` is default. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17171#discussion_r1447531558 From rrich at openjdk.org Wed Jan 10 15:55:23 2024 From: rrich at openjdk.org (Richard Reingruber) Date: Wed, 10 Jan 2024 15:55:23 GMT Subject: RFR: 8290965: PPC64: Implement post-call NOPs [v2] In-Reply-To: <0K_jX-ImmJLihXnogDXZKQYhMn7JfdpRgRr_KVxGdcQ=.9c3d3e21-75c0-436c-978a-7907ad60ff95@github.com> References: <0K_jX-ImmJLihXnogDXZKQYhMn7JfdpRgRr_KVxGdcQ=.9c3d3e21-75c0-436c-978a-7907ad60ff95@github.com> Message-ID: On Wed, 10 Jan 2024 15:19:38 GMT, Richard Reingruber wrote: > Is this one performance critical? This is a good question. Honestly I have difficulties understanding why PCNs should be performance critical at all. AFAIK frames are only iterated on the slow path when freezing/thawing. Maybe the slow path is not that uncommon, e.g. if StackChunks are visited by GC. I wanted to use `kind::nmethod` as default whenever possible in order not to miss a place that actually is performance critical.
See also https://github.com/openjdk/jdk/pull/8955#issuecomment-1142317441 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17171#discussion_r1447580465 From sviswanathan at openjdk.org Wed Jan 10 16:23:26 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Wed, 10 Jan 2024 16:23:26 GMT Subject: RFR: 8321712: C2: "failed: Multiple uses of register" in C2_MacroAssembler::vminmax_fp [v2] In-Reply-To: References: Message-ID: On Tue, 9 Jan 2024 18:17:33 GMT, Emanuel Peter wrote: >> Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision: >> >> review comments > > https://bugs.openjdk.org/secure/attachment/107681/Test_276.java > This is the regression test of the bug that is closed as duplicate of your issue, am I correct? > This is the duplicate bug: https://bugs.openjdk.org/browse/JDK-8322090 > > Fails with: `assert(regs[i] != regs[j]) failed: Multiple uses of register: xmm3` > > You need to at least verify if this bug is fixed with your patch, otherwise we would need to re-open it, since it would not be a duplicate. > > Culpable node seems to be: > `7274 MaxD === _ 363 363 [[ 4874 ]] !jvms: Test_276::mainTest @ bci:291 (line 1084)` Thanks a lot @eme64 and @TobiHartmann for the review, I will wait for the test results before integrating. ------------- PR Comment: https://git.openjdk.org/jdk/pull/17315#issuecomment-1885172150 From kxu at openjdk.org Wed Jan 10 16:37:44 2024 From: kxu at openjdk.org (Kangcheng Xu) Date: Wed, 10 Jan 2024 16:37:44 GMT Subject: RFR: 8320237 C2: late inlining of method handle invoke causes duplicate lines in PrintInlining output [v2] In-Reply-To: References: Message-ID: > This PR resolves [JDK-8320237](https://bugs.openjdk.org/browse/JDK-8320237) > > The original behavior produces both a failure and success message upon late inlining which is confusing. The patch removes the failure message if inlining was successful. 
Huge thanks to @rwestrel for reporting and working out a solution. > > Unit test `test/hotspot/jtreg/compiler/inlining/TestDuplicatedLateInliningOutput.java` is added and passing. Kangcheng Xu has updated the pull request incrementally with one additional commit since the last revision: fix VM crashes ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17147/files - new: https://git.openjdk.org/jdk/pull/17147/files/3e53d03a..94d78fa1 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17147&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17147&range=00-01 Stats: 3 lines in 1 file changed: 2 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/17147.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17147/head:pull/17147 PR: https://git.openjdk.org/jdk/pull/17147 From duke at openjdk.org Wed Jan 10 16:57:47 2024 From: duke at openjdk.org (Zhiqiang Zang) Date: Wed, 10 Jan 2024 16:57:47 GMT Subject: RFR: 8322077: Add Ideal transformation: (~a) | (~b) => ~(a & b) [v13] In-Reply-To: References: Message-ID: > Hello, > > (~a) | (~b) => ~(a & b) is a widely seen pattern, for example it is implemented for LLVM [here](https://github.com/llvm/llvm-project/blob/397f1ce9efb4eea1ee10fe4833f733b8c7abd878/llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp#L1617C28-L1617C28); however it is missing in current implementation of hotspot. This pull request adds this transformation and associated tests. > > Thanks. Zhiqiang Zang has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 34 commits: - Merge branch 'master' of https://github.com/openjdk/jdk into ornode-PNewDeMorganLawOrToAnd - update copyright dates. - Merge branch 'andnode-PNewDeMorganLawAndToOr' into ornode-PNewDeMorganLawOrToAnd - Revert "move the two helper functions to member functions of the node class." This reverts commit 7a962d69ac687c0476e54cd004037f2ebb0800ba. - Revert "update copyright dates." 
This reverts commit 3665de2f3d38b4fce3ba968ec6247dd2576ecc32. - Revert "move make_not back to AddNode because it cannot compile for architecture arm, s390x, ppc64le." This reverts commit 4ee8b089b6f119b1692dcd2c512d249d92734697. - Revert "adapt changes from the dependent pr." This reverts commit 0c8d10776fe55a28ea9f60160b4f6f57aabad5ab. - Revert "adapt to new changes from the dependant pr." This reverts commit b21e242b306d9cf01e1ba84ebba47ef5b471d98e. - adapt to new changes from the dependant pr. - Merge branch 'andnode-PNewDeMorganLawAndToOr' into ornode-PNewDeMorganLawOrToAnd - ... and 24 more: https://git.openjdk.org/jdk/compare/b86c3b7a...f908668b ------------- Changes: https://git.openjdk.org/jdk/pull/16334/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=16334&range=12 Stats: 369 lines in 5 files changed: 369 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/16334.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16334/head:pull/16334 PR: https://git.openjdk.org/jdk/pull/16334 From cslucas at openjdk.org Wed Jan 10 17:24:06 2024 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Wed, 10 Jan 2024 17:24:06 GMT Subject: RFR: JDK-8323190: Segfault during deoptimization of C2-compiled code [v2] In-Reply-To: References: Message-ID: > Currently, if `ReduceAllocationMerges` reduces an allocation merge that is used as a monitor C2 will SIGFAULT in `Process_OopMap_Node` because it's missing code to handle that case. This patch fixes C2 to properly handle reduced allocation merges that are used as monitors. > > Tested with Linux x86_64 hotspot_all. 
Cesar Soares Lucas has updated the pull request incrementally with three additional commits since the last revision: - Update test/hotspot/jtreg/compiler/escapeAnalysis/TestInvalidLocation.java Co-authored-by: Tobias Hartmann - Update src/hotspot/share/opto/output.cpp Co-authored-by: Tobias Hartmann - Update test/hotspot/jtreg/compiler/escapeAnalysis/TestInvalidLocation.java Co-authored-by: Tobias Hartmann ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17333/files - new: https://git.openjdk.org/jdk/pull/17333/files/95fe08dd..5e2f0089 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17333&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17333&range=00-01 Stats: 4 lines in 2 files changed: 1 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/17333.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17333/head:pull/17333 PR: https://git.openjdk.org/jdk/pull/17333 From phh at openjdk.org Wed Jan 10 17:32:25 2024 From: phh at openjdk.org (Paul Hohensee) Date: Wed, 10 Jan 2024 17:32:25 GMT Subject: RFR: 8322982: CTW fails to build after 8308753 [v3] In-Reply-To: References: Message-ID: On Mon, 8 Jan 2024 20:08:37 GMT, Xin Liu wrote: >> This patch fixes the build error of CTW by sidelining ModuleInfoWriter.java. ModuleInfoWriter uses Class-File API, which has transitioned to preview. >> If we really need to compile it, we have to append --enable-preview and --source N. >> >> The fact is CTW itself doesn't depend on ModuleInfoWriter. I think it's easier to maintain CTW if we filter it out in Makefile. > > Xin Liu has updated the pull request incrementally with one additional commit since the last revision: > > Wording and also remove add-modules required by ModuleInfoWriter.java Marked as reviewed by phh (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/17292#pullrequestreview-1813751239 From qamai at openjdk.org Wed Jan 10 18:08:39 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Wed, 10 Jan 2024 18:08:39 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v5] In-Reply-To: References: Message-ID: <6SOOtHaZrEN8UIKKLHBJniz_nUvZjRvV0TLofW7Xjxk=.debaf12a-b7c8-4b04-b217-3b8cd9b3c6f5@github.com> > Hi, > > This patch adds unsigned bounds and known bits constraints to `TypeInt` and `TypeLong`. This opens more transformation opportunities in an elegant manner as well as helps avoid some ad-hoc rules in Hotspot. The new constraints are applied to identity and value calls of the common nodes (Add, Sub, L/R/URShift, And, Or, Xor, bit counting, Cmp, Bool, ConvI2L/L2I), the detailed ideas for each node will be presented below. > > In general, a `TypeInt/Long` represents a set of values `x` that satisfies: `x s>= lo && x s<= hi && x u>= ulo && x u<= uhi && (x & zeros) == 0 && (~x & ones) == 0`. These constraints are not independent, e.g. an int that lies in [0, 3] in signed domain must also lie in [0, 3] in unsigned domain and have all bits but the last 2 being unset. As a result, we must normalize the constraints (tighten the constraints so that they are optimal) before constructing a `TypeInt/Long` instance. > > Please kindly review, thanks very much. > > Testing > > - [x] GHA > - [x] Linux x64, tier 1-4 Quan Anh Mai has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 27 commits: - Merge branch 'master' into improvevalue - Merge branch 'master' into improvevalue - improve add/sub implementation - Merge branch 'master' into improvevalue - typo - whitespace - fix tests for x86_32 - fix widen of ConvI2L - problem lists - format - ... 
and 17 more: https://git.openjdk.org/jdk/compare/f0169341...843ad076 ------------- Changes: https://git.openjdk.org/jdk/pull/15440/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=15440&range=04 Stats: 3692 lines in 35 files changed: 1895 ins; 1235 del; 562 mod Patch: https://git.openjdk.org/jdk/pull/15440.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15440/head:pull/15440 PR: https://git.openjdk.org/jdk/pull/15440 From jbhateja at openjdk.org Wed Jan 10 18:09:29 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 10 Jan 2024 18:09:29 GMT Subject: RFR: 8321712: C2: "failed: Multiple uses of register" in C2_MacroAssembler::vminmax_fp [v5] In-Reply-To: <-rzXk3WJ9aOgrgoyIVDpCeiQLBkq5N5yQCYp_42oMEo=.df74c3a7-508d-48a1-baa2-796a2015bea1@github.com> References: <-rzXk3WJ9aOgrgoyIVDpCeiQLBkq5N5yQCYp_42oMEo=.df74c3a7-508d-48a1-baa2-796a2015bea1@github.com> Message-ID: On Tue, 9 Jan 2024 23:46:45 GMT, Sandhya Viswanathan wrote: >> The C2_MacroAssembler::vminmax_fp and evminmax_fp were incorrectly checking for the two USE operand registers to be different and thus resulting in assertion failure. >> >> In x86_64.ad: >> >> instruct maxF_reg(legRegF dst, legRegF a, legRegF b, legRegF tmp, legRegF atmp, legRegF btmp) %{ >> ... >> effect(USE a, USE b, TEMP tmp, TEMP atmp, TEMP btmp); >> ... >> __ vminmax_fp(Op_MaxV, T_FLOAT, $dst$$XMMRegister, $a$$XMMRegister, $b$$XMMRegister, $tmp$$XMMRegister, $atmp$$XMMRegister, $btmp$$XMMRegister, Assembler::AVX_128bit); >> %} >> >> >> Changing the assert in vminmax_fp from: >> assert_different_registers(a, b, tmp, atmp, btmp); >> to: >> assert_different_registers(a, tmp, atmp, btmp); >> assert_different_registers(b, tmp, atmp, btmp); >> fixes the issue. >> >> Similar change done in evminmax_fp. >> >> Please review. 
>> >> Best Regards, >> Sandhya > > Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision: > > copyright year update Thanks for filing RFE, LGTM ------------- Marked as reviewed by jbhateja (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/17315#pullrequestreview-1813816523 From sviswanathan at openjdk.org Wed Jan 10 18:12:28 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Wed, 10 Jan 2024 18:12:28 GMT Subject: RFR: 8321712: C2: "failed: Multiple uses of register" in C2_MacroAssembler::vminmax_fp [v5] In-Reply-To: <8ZGiFoB4TkRgQSP67ekJ_Tw_uMnEyVNdU9GSa4bx69M=.f252a9b8-367c-49e6-916e-48dd0e6e936e@github.com> References: <8ZGiFoB4TkRgQSP67ekJ_Tw_uMnEyVNdU9GSa4bx69M=.f252a9b8-367c-49e6-916e-48dd0e6e936e@github.com> Message-ID: On Tue, 9 Jan 2024 15:14:33 GMT, Jatin Bhateja wrote: >> Should we "short cut" code when registers are the same? > >> Should we "short cut" code when registers are the same? > > Hi @sviswa7 , An identity transformation may be useful here to prevent generating MaxF/D in case both the arguments are same. Thanks a lot @jatin-bhateja for the review. 
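The assertion change discussed in this thread can be modelled outside HotSpot. The following is only a sketch in plain Java, not HotSpot code: `allDifferent` is a hypothetical stand-in for `assert_different_registers`, and the register names are illustrative strings. It shows why a `MaxD` node whose two inputs are the same value (so `a` and `b` share `xmm3`) trips the old five-way check but passes the two split checks, while a real clash between an input and a temp is still rejected.

```java
import java.util.HashSet;
import java.util.Set;

public class RegisterAssertSketch {
    // Hypothetical stand-in for assert_different_registers:
    // true only if all given register names are pairwise distinct.
    public static boolean allDifferent(String... regs) {
        Set<String> seen = new HashSet<>();
        for (String r : regs) {
            if (!seen.add(r)) {
                return false; // the same register was listed twice
            }
        }
        return true;
    }

    // Old check: a and b must differ from each other and from every temp.
    public static boolean oldCheck(String a, String b, String tmp, String atmp, String btmp) {
        return allDifferent(a, b, tmp, atmp, btmp);
    }

    // New check: a and b may alias each other, but neither may alias a temp.
    public static boolean newCheck(String a, String b, String tmp, String atmp, String btmp) {
        return allDifferent(a, tmp, atmp, btmp) && allDifferent(b, tmp, atmp, btmp);
    }

    public static void main(String[] args) {
        // Both USE operands in the same register, temps distinct.
        System.out.println(oldCheck("xmm3", "xmm3", "xmm0", "xmm1", "xmm2"));
        System.out.println(newCheck("xmm3", "xmm3", "xmm0", "xmm1", "xmm2"));
        // An input clashing with a temp must still be rejected.
        System.out.println(newCheck("xmm3", "xmm4", "xmm3", "xmm1", "xmm2"));
    }
}
```

Under these assumptions the old check rejects the aliased-input case and the new one accepts it, which matches the failure message "Multiple uses of register: xmm3" reported earlier in the thread.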
------------- PR Comment: https://git.openjdk.org/jdk/pull/17315#issuecomment-1885368974 From cslucas at openjdk.org Wed Jan 10 18:14:24 2024 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Wed, 10 Jan 2024 18:14:24 GMT Subject: RFR: JDK-8323190: Segfault during deoptimization of C2-compiled code [v2] In-Reply-To: References: Message-ID: On Wed, 10 Jan 2024 08:57:58 GMT, Christian Hagedorn wrote: >> Cesar Soares Lucas has updated the pull request incrementally with three additional commits since the last revision: >> >> - Update test/hotspot/jtreg/compiler/escapeAnalysis/TestInvalidLocation.java >> >> Co-authored-by: Tobias Hartmann >> - Update src/hotspot/share/opto/output.cpp >> >> Co-authored-by: Tobias Hartmann >> - Update test/hotspot/jtreg/compiler/escapeAnalysis/TestInvalidLocation.java >> >> Co-authored-by: Tobias Hartmann > > src/hotspot/share/opto/output.cpp line 1092: > >> 1090: ObjectMergeValue* mv = (ObjectMergeValue*) sv_for_node_id(objs, smerge->_idx); >> 1091: >> 1092: if (mv == NULL) { > > You should replace `NULL` with `nullptr` here and below. This also seems wrong here where you took the code from: > https://github.com/openjdk/jdk/blob/88378ed0584c7eb0849b6fc1e361fd8ea0698caf/src/hotspot/share/opto/output.cpp#L775-L796 > > On a separate note, the code looks almost identical. Could it be shared somehow? Thank you for reviewing @chhagedorn. I've converted the NULLs to nullptrs. However, I'll defer the refactoring of the identical code to a RFE - mainly because I'll have to backport the current patch and I'd like to keep it as minimal as possible. Please let me know if you disagree. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17333#discussion_r1447752271 From cslucas at openjdk.org Wed Jan 10 18:20:43 2024 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Wed, 10 Jan 2024 18:20:43 GMT Subject: RFR: JDK-8323190: Segfault during deoptimization of C2-compiled code [v3] In-Reply-To: <6jsDdUuD6X9u8CT-4SUN_r2VUgPA4QuPCMLQyz-myaY=.52f1830e-0c87-44bf-931a-c272cabb74b4@github.com> References: <6jsDdUuD6X9u8CT-4SUN_r2VUgPA4QuPCMLQyz-myaY=.52f1830e-0c87-44bf-931a-c272cabb74b4@github.com> Message-ID: On Wed, 10 Jan 2024 08:00:38 GMT, Tobias Hartmann wrote: > It's concerning though that we don't have any other test covering this. Would it make sense to extend AllocationMergesTests.java to cover some more variants? Thank you for reviewing @TobiHartmann ! I think `AllocationMergesTests.java` isn't the ideal place for these tests. The tests in `AllocationMergesTests.java` are for checking the IR shape after the optimization, the current issue was actually because of a problem emitting debug info for a compilation unit - it's not something that we can capture with the IR-framework I believe. In this other PR (https://github.com/openjdk/jdk/pull/15825) I have a test file that I think will be more appropriate for this kind of test. ------------- PR Comment: https://git.openjdk.org/jdk/pull/17333#issuecomment-1885378367 From cslucas at openjdk.org Wed Jan 10 18:20:42 2024 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Wed, 10 Jan 2024 18:20:42 GMT Subject: RFR: JDK-8323190: Segfault during deoptimization of C2-compiled code [v3] In-Reply-To: References: Message-ID: > Currently, if `ReduceAllocationMerges` reduces an allocation merge that is used as a monitor C2 will SIGFAULT in `Process_OopMap_Node` because it's missing code to handle that case. This patch fixes C2 to properly handle reduced allocation merges that are used as monitors. > > Tested with Linux x86_64 hotspot_all. 
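To make the phrase "an allocation merge that is used as a monitor" concrete, here is a minimal Java sketch of that source-level shape. It is not the regression test `TestInvalidLocation.java`, and the class and method names are made up for illustration. Whether C2 actually reduces this particular merge depends on flags and heuristics that this sketch does not control; it only shows two allocations joined at a control-flow merge, with the merged reference then used in `synchronized`.

```java
public class MergedMonitorSketch {
    // Small value holder; hypothetical, only for illustration.
    static final class Point {
        final int x;
        Point(int x) { this.x = x; }
    }

    public static int test(boolean cond, int v) {
        Point p = new Point(v);          // first allocation
        if (cond) {
            p = new Point(v + 1);        // second allocation
        }
        // Here p is a merge (a Phi in C2's IR) of the two allocations,
        // and the merged reference is used as the monitor object.
        synchronized (p) {
            return p.x;
        }
    }

    public static void main(String[] args) {
        int sum = 0;
        // Call often enough that a JIT compiler is likely to compile test().
        for (int i = 0; i < 20_000; i++) {
            sum += test((i & 1) == 0, i);
        }
        System.out.println(sum);
    }
}
```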
Cesar Soares Lucas has updated the pull request incrementally with one additional commit since the last revision: Convert NULL to nullptr. Remove type cast. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17333/files - new: https://git.openjdk.org/jdk/pull/17333/files/5e2f0089..8c21a4b5 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17333&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17333&range=01-02 Stats: 7 lines in 2 files changed: 0 ins; 0 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/17333.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17333/head:pull/17333 PR: https://git.openjdk.org/jdk/pull/17333 From kvn at openjdk.org Wed Jan 10 18:29:25 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 10 Jan 2024 18:29:25 GMT Subject: RFR: 8322743: assert(held_monitor_count() == jni_monitor_count()) failed In-Reply-To: References: Message-ID: On Wed, 10 Jan 2024 02:07:29 GMT, Dean Long wrote: > I'm wondering if there is a simpler solution. What if in `Parse::load_interpreter_state` we mark the lock objects from the interpreter as global escape? Thank you, Dean, for looking at the changes. You are correct, we can mark the created `BoxLock` node in `Parse::load_interpreter_state` as having an escaped object. But in the general case it could be only a dead path where such an object is referenced. Also, there could be other cases where EA thinks that the object escapes on one of the paths. I wanted to check the graph only after some transformations which happen before EA, and use EA to find escaped objects.
------------- PR Comment: https://git.openjdk.org/jdk/pull/17331#issuecomment-1885396689 From kvn at openjdk.org Wed Jan 10 18:55:27 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 10 Jan 2024 18:55:27 GMT Subject: RFR: 8322735: C2: minor improvements of bubble sort used in SuperWord::packset_sort [v3] In-Reply-To: References: Message-ID: On Tue, 9 Jan 2024 05:26:43 GMT, Denghui Dong wrote: >> A minor improvement could be made for bubble sort in SuperWord::packset_sort to reduce the comparison count in bad cases. >> >> See https://en.wikipedia.org/wiki/Bubble_sort > > Denghui Dong has updated the pull request incrementally with one additional commit since the last revision: > > update Good. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/17190#pullrequestreview-1813896857 From jbhateja at openjdk.org Wed Jan 10 19:20:26 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 10 Jan 2024 19:20:26 GMT Subject: RFR: 8322768: Optimize non-subword vector compress and expand APIs for AVX2 target. [v5] In-Reply-To: References: <_udOCEVG86x9V_WvYqFTaYnvmXdiZ7LzqxzR-D_ygYs=.db3ed37e-a8fa-413b-83c6-a785aba072ff@github.com> Message-ID: On Tue, 9 Jan 2024 16:48:56 GMT, Jatin Bhateja wrote: >> Hi, >> >> Patch optimizes non-subword vector compress and expand APIs for x86 AVX2 only targets. >> Upcoming E-core Xeons (Sierra Forest) and Hybrid CPUs only support AVX2 instruction set. >> These are very frequently used APIs in columnar database filter operation. >> >> Implementation uses a lookup table to record permute indices. Table index is computed using >> mask argument of compress/expand operation. >> >> Following are the performance number of JMH micro included with the patch. 
>> >> >> System : Intel(R) Xeon(R) Platinum 8480+ (Sapphire Rapids) >> >> Baseline: >> Benchmark (size) Mode Cnt Score Error Units >> ColumnFilterBenchmark.filterDoubleColumn 1024 thrpt 2 142.767 ops/ms >> ColumnFilterBenchmark.filterDoubleColumn 2047 thrpt 2 71.436 ops/ms >> ColumnFilterBenchmark.filterDoubleColumn 4096 thrpt 2 35.992 ops/ms >> ColumnFilterBenchmark.filterFloatColumn 1024 thrpt 2 182.151 ops/ms >> ColumnFilterBenchmark.filterFloatColumn 2047 thrpt 2 91.096 ops/ms >> ColumnFilterBenchmark.filterFloatColumn 4096 thrpt 2 44.757 ops/ms >> ColumnFilterBenchmark.filterIntColumn 1024 thrpt 2 184.099 ops/ms >> ColumnFilterBenchmark.filterIntColumn 2047 thrpt 2 91.981 ops/ms >> ColumnFilterBenchmark.filterIntColumn 4096 thrpt 2 45.170 ops/ms >> ColumnFilterBenchmark.filterLongColumn 1024 thrpt 2 148.017 ops/ms >> ColumnFilterBenchmark.filterLongColumn 2047 thrpt 2 73.516 ops/ms >> ColumnFilterBenchmark.filterLongColumn 4096 thrpt 2 36.844 ops/ms >> >> Withopt: >> Benchmark (size) Mode Cnt Score Error Units >> ColumnFilterBenchmark.filterDoubleColumn 1024 thrpt 2 2051.707 ops/ms >> ColumnFilterBenchmark.filterDoubleColumn 2047 thrpt 2 914.072 ops/ms >> ColumnFilterBenchmark.filterDoubleColumn 4096 thrpt 2 489.898 ops/ms >> ColumnFilterBenchmark.filterFloatColumn 1024 thrpt 2 5324.195 ops/ms >> ColumnFilterBenchmark.filterFloatColumn 2047 thrpt 2 2587.229 ops/ms >> ColumnFilterBenchmark.filterFloatColumn 4096 thrpt 2 1278.665 ops/ms >> ColumnFilterBenchmark.filterIntColumn 1024 thrpt 2 4149.384 ops/ms >> ColumnFilterBenchmark.filterIntColumn 2047 thrpt ... > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Using emulated variable blend E-Core optimized instruction. Following are the performance numbers for existing Vector API JMH micro benchmark over Meteor Lake - Crestmont E-cores. 
![image](https://github.com/openjdk/jdk/assets/59989778/dab762f8-2379-4fcf-90da-f765e907c6c1) ------------- PR Comment: https://git.openjdk.org/jdk/pull/17261#issuecomment-1885525420 From xliu at openjdk.org Wed Jan 10 19:44:31 2024 From: xliu at openjdk.org (Xin Liu) Date: Wed, 10 Jan 2024 19:44:31 GMT Subject: Integrated: 8322982: CTW fails to build after 8308753 In-Reply-To: References: Message-ID: On Mon, 8 Jan 2024 03:19:52 GMT, Xin Liu wrote: > This patch fixes the build error of CTW by sidelining ModuleInfoWriter.java. ModuleInfoWriter uses Class-File API, which has transitioned to preview. > If we really need to compile it, we have to append --enable-preview and --source N. > > The fact is CTW itself doesn't depend on ModuleInfoWriter. I think it's easier to maintain CTW if we filter it out in Makefile. This pull request has now been integrated. Changeset: d89602a5 Author: Xin Liu URL: https://git.openjdk.org/jdk/commit/d89602a53f173e4fc1e0aa10bb0ffdf7232456cb Stats: 8 lines in 1 file changed: 1 ins; 4 del; 3 mod 8322982: CTW fails to build after 8308753 Reviewed-by: shade, phh ------------- PR: https://git.openjdk.org/jdk/pull/17292 From duke at openjdk.org Wed Jan 10 20:32:25 2024 From: duke at openjdk.org (Zhiqiang Zang) Date: Wed, 10 Jan 2024 20:32:25 GMT Subject: RFR: 8322077: Add Ideal transformation: (~a) | (~b) => ~(a & b) [v4] In-Reply-To: References: <5KKTYmY7dJ4nW0OEQ2UPuloIWOYc-pI9M8HRjoaRzw4=.f5eda6f9-c38c-4a2b-9690-cbb1791a2622@github.com> <2yuELMZxtnZVtWL2rhdtdzwIjg0tgZEi0csMHasBXtQ=.3e62b735-b4e5-4bfe-9594-bfb66e19205c@github.com> Message-ID: On Wed, 10 Jan 2024 07:47:35 GMT, Emanuel Peter wrote: >> @eme64 @TobiHartmann Thanks for the comments. All addressed. >> >> I rebased this PR onto #16333 so I was able to add these tests for using both optimizations. (the history was messed up). > > @CptGit can you merge from master again, please? It looks now like you are pushing both the changes here and the ones from your previous PR. 
Once you did that, I'd like to run some testing before we push this. @eme64 Yes I merged. Looks clean now. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16334#issuecomment-1885669143 From kxu at openjdk.org Wed Jan 10 22:23:23 2024 From: kxu at openjdk.org (Kangcheng Xu) Date: Wed, 10 Jan 2024 22:23:23 GMT Subject: RFR: 8320237 C2: late inlining of method handle invoke causes duplicate lines in PrintInlining output In-Reply-To: References: Message-ID: On Wed, 10 Jan 2024 08:13:46 GMT, Tobias Hartmann wrote: >> This PR resolves [JDK-8320237](https://bugs.openjdk.org/browse/JDK-8320237) >> >> The original behavior produces both a failure and success message upon late inlining which is confusing. The patch removes the failure message if inlining was successful. Huge thanks to @rwestrel for reporting and working out a solution. >> >> Unit test `test/hotspot/jtreg/compiler/inlining/TestDuplicatedLateInliningOutput.java` is added and passing. > > `compiler/print/PrintInlining.java` fails with `-ea -esa -XX:CompileThreshold=100 -XX:+UnlockExperimentalVMOptions -server -XX:+TieredCompilation`: > > > # A fatal error has been detected by the Java Runtime Environment: > # > # Internal Error (workspace/open/src/hotspot/share/opto/compile.cpp:4601), pid=418042, tid=418058 > # assert(_print_inlining_stream->size() > 0) failed: missing inlining msg > # > # JRE version: Java(TM) SE Runtime Environment (23.0) (fastdebug build 23-internal-2024-01-10-0732483.tobias.hartmann.jdk2) > # Java VM: Java HotSpot(TM) 64-Bit Server VM (fastdebug 23-internal-2024-01-10-0732483.tobias.hartmann.jdk2, compiled mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64) > # Problematic frame: > # V [libjvm.so+0x9db4e6] Compile::print_inlining_update_delayed(CallGenerator*)+0x1c6 > > Current CompileTask: > C2:643 171 b 4 java.lang.String::substring (58 bytes) > > Stack: [0x00007f59706d4000,0x00007f59707d4000], sp=0x00007f59707cf220, free space=1004k > Native 
frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code) > V [libjvm.so+0x9db4e6] Compile::print_inlining_update_delayed(CallGenerator*)+0x1c6 (compile.cpp:4601) > V [libjvm.so+0x844d7e] CallGenerator::do_late_inline_helper()+0x8ee (callGenerator.cpp:687) > V [libjvm.so+0x9e1a52] Compile::inline_boxing_calls(PhaseIterGVN&)+0xc2 (compile.cpp:2026) > V [libjvm.so+0x9e42e3] Compile::Optimize()+0x583 (compile.cpp:2276) > V [libjvm.so+0x9e81a4] Compile::Compile(ciEnv*, ciMethod*, int, Options, DirectiveSet*)+0x1b44 (compile.cpp:860) > V [libjvm.so+0x83d245] C2Compiler::compile_method(ciEnv*, ciMethod*, int, bool, DirectiveSet*)+0x1d5 (c2compiler.cpp:142) > V [libjvm.so+0x9f3bbc] CompileBroker::invoke_compiler_on_method(CompileTask*)+0x92c (compileBroker.cpp:2299) > V [libjvm.so+0x9f4848] CompileBroker::compiler_thread_loop()+0x468 (compileBroker.cpp:1958) > V [libjvm.so+0xeb98ec] JavaThread::thread_main_inner()+0xcc (javaThread.cpp:721) > V [libjvm.so+0x179b586] Thread::call_run()+0xb6 (thread.cpp:220) > V [libjvm.so+0x14a8d47] thread_native_entry(Thread*)+0x127 (os_linux.cpp:789) > > > `compiler/cha/StrengthReduceInterfaceCall.java` and `compiler/ciReplay/TestIncrementalInlining.java` fail as well. @TobiHartmann Thanks for the report. The tests were crashing in fastdebug config with or without those specific flags. The latest commit should fix the problem. Please take a look. Thank you very much. ------------- PR Comment: https://git.openjdk.org/jdk/pull/17147#issuecomment-1885834913 From dlong at openjdk.org Wed Jan 10 23:05:21 2024 From: dlong at openjdk.org (Dean Long) Date: Wed, 10 Jan 2024 23:05:21 GMT Subject: RFR: 8322743: assert(held_monitor_count() == jni_monitor_count()) failed In-Reply-To: References: Message-ID: On Tue, 9 Jan 2024 22:57:27 GMT, Vladimir Kozlov wrote: > Corner case with a local (not escaped) object used for synchronization. C2 Escape Analysis thinks that it can eliminate locks for it. 
In most cases it is true but not in this case. > > > for (int i = 0; i < 2; ++i) { > Object o = new Object(); > synchronized (o) { // monitorenter > // Trigger OSR compilation > for (int j = 0; j < 100_000; ++j) { > > The test has nested loop which trigger OSR compilation. The locked object comes from Interpreter into compiled OSR code. During parsing C2 creates an other non escaped object and correctly merge both together (with Phi node) so that non escaped object is not scalar replaceable. Because it does not globally escapes EA still removes locks for it and, as result, also for merged locked object from Interpreter which is the bug. > > The fix is to check that synchronized block does not have any associated escaped objects when EA decides if locks can be eliminated. > > Added regression test prepared by @TobiHartmann. Tested tier1-5, xcomp and stress. > Performance testing show no difference. I was thinking that the OSR situation is similar to this: for (int i = 0; i < 2; ++i) { Object o = osr ? static_volatile_field /* black hole, can't eliminate */ : new Object() /* can eliminate */; synchronized (o) { // monitorenter // Trigger OSR compilation for (int j = 0; j < 100_000; ++j) { but maybe we can do better. If C2 can eliminate allocations/locks for non-escaping objects, and that works in one direction C2 --> interpreter (deopt), then the reverse direction, interpreter --> C2 (OSR) might also be made to work. In other words, I think we could eliminate the lock, even in the OSR case. We know from EA that the object coming from the interpreter does not escape, so if load_interpreter_state did the reverse of deopt, we would end up with a scalar-replaced object. Deopt does scalar-replaced object --> materialized, so OSR would need to do materialized --> scalar-replaced object. The fields of the scalar-replaced object would be populated from the fields of the interpreter object, but ignoring fields with a default (0) value. 
Assuming I'm right, and this could work, that doesn't mean it's worth doing. I'm just throwing this idea out mostly for completeness. ------------- PR Comment: https://git.openjdk.org/jdk/pull/17331#issuecomment-1885884165 From dlong at openjdk.org Wed Jan 10 23:45:22 2024 From: dlong at openjdk.org (Dean Long) Date: Wed, 10 Jan 2024 23:45:22 GMT Subject: RFR: 8322743: assert(held_monitor_count() == jni_monitor_count()) failed In-Reply-To: References: Message-ID: On Tue, 9 Jan 2024 22:57:27 GMT, Vladimir Kozlov wrote: > Corner case with a local (not escaped) object used for synchronization. C2 Escape Analysis thinks that it can eliminate locks for it. In most cases it is true but not in this case. > > > for (int i = 0; i < 2; ++i) { > Object o = new Object(); > synchronized (o) { // monitorenter > // Trigger OSR compilation > for (int j = 0; j < 100_000; ++j) { > > The test has nested loop which trigger OSR compilation. The locked object comes from Interpreter into compiled OSR code. During parsing C2 creates an other non escaped object and correctly merge both together (with Phi node) so that non escaped object is not scalar replaceable. Because it does not globally escapes EA still removes locks for it and, as result, also for merged locked object from Interpreter which is the bug. > > The fix is to check that synchronized block does not have any associated escaped objects when EA decides if locks can be eliminated. > > Added regression test prepared by @TobiHartmann. Tested tier1-5, xcomp and stress. > Performance testing show no difference. Nevermind, object fields from the interpreter could have any value, so my idea doesn't work. 
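For readers following the 8322743 discussion, the loop fragment quoted in this thread can be completed into a small runnable program. This is only a sketch and not the regression test attached to the PR: the loop body (`sum += j`) and the class name are invented, and whether OSR compilation actually kicks in depends on the VM and its flags. The shape is what matters: the long inner loop runs while the monitor on `o` is held, so if the method is OSR-compiled in the first outer iteration, the locked object enters compiled code from the interpreter, while the second iteration allocates a fresh `o` in compiled code.

```java
public class OsrLockSketch {
    public static long run() {
        long sum = 0;
        for (int i = 0; i < 2; ++i) {
            Object o = new Object();     // local object, never stored anywhere
            synchronized (o) {           // monitorenter
                // Long-running inner loop: a candidate for OSR compilation
                // while the monitor on o is still held.
                for (int j = 0; j < 100_000; ++j) {
                    sum += j;            // invented body, just to keep the loop alive
                }
            }                            // monitorexit
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

Each inner loop adds 0 + 1 + ... + 99999 = 4999950000, so `run()` returns 9999900000 regardless of whether the locks are eliminated; the bug discussed here affected monitor bookkeeping, not the computed value.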
------------- PR Comment: https://git.openjdk.org/jdk/pull/17331#issuecomment-1885931501 From kvn at openjdk.org Thu Jan 11 00:01:21 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 11 Jan 2024 00:01:21 GMT Subject: RFR: 8322743: assert(held_monitor_count() == jni_monitor_count()) failed In-Reply-To: References: Message-ID: On Tue, 9 Jan 2024 22:57:27 GMT, Vladimir Kozlov wrote: > Corner case with a local (not escaped) object used for synchronization. C2 Escape Analysis thinks that it can eliminate locks for it. In most cases it is true but not in this case. > > > for (int i = 0; i < 2; ++i) { > Object o = new Object(); > synchronized (o) { // monitorenter > // Trigger OSR compilation > for (int j = 0; j < 100_000; ++j) { > > The test has nested loop which trigger OSR compilation. The locked object comes from Interpreter into compiled OSR code. During parsing C2 creates an other non escaped object and correctly merge both together (with Phi node) so that non escaped object is not scalar replaceable. Because it does not globally escapes EA still removes locks for it and, as result, also for merged locked object from Interpreter which is the bug. > > The fix is to check that synchronized block does not have any associated escaped objects when EA decides if locks can be eliminated. > > Added regression test prepared by @TobiHartmann. Tested tier1-5, xcomp and stress. > Performance testing show no difference. "We know from EA that the object coming from the interpreter does not escape" - we don't know what happens in Interpreter to this object. There is no information where this object is coming from (no method and no bci info). We only know that we have monitor at slot 0 which uses this object. Yes, we can do bytecode analysis to determine that but it is a lot more code. There could be other, more complicated, ways to remove locks for this case. 
I was thinking about splitting `unlock(obj)` through Phi node to keep separate `unlock` for object coming from Interpreter. Unfortunately it is not enough. We need also to keep separate synchronization blocks defined by BoxLock node. Otherwise we still eliminate all locks/unlocks during locks elimination [macro.cpp#L1946](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/macro.cpp#L1946). Note, we can't eliminate only part of locks/unlocks associated with one synchronization block. Otherwise we can't guarantee that we have balanced locks and unlocks (we had bugs about it). So we either eliminate or keep all of them. I think my fix is conservative solution for this issue. ------------- PR Comment: https://git.openjdk.org/jdk/pull/17331#issuecomment-1885946951 From vlivanov at openjdk.org Thu Jan 11 00:17:23 2024 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Thu, 11 Jan 2024 00:17:23 GMT Subject: RFR: 8322743: assert(held_monitor_count() == jni_monitor_count()) failed In-Reply-To: References: Message-ID: On Wed, 10 Jan 2024 23:58:36 GMT, Vladimir Kozlov wrote: > I think my fix is conservative solution for this issue. It's still not clear to me why conservatively marking all objects coming from interpreter as globally escaped wouldn't work (what Dean initially proposed). My reading of your response is that it may be way too conservative: > But in general case it could be only dead path where such object is referenced. Is it your main concern? ------------- PR Comment: https://git.openjdk.org/jdk/pull/17331#issuecomment-1885966563 From kvn at openjdk.org Thu Jan 11 00:38:21 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 11 Jan 2024 00:38:21 GMT Subject: RFR: 8322743: assert(held_monitor_count() == jni_monitor_count()) failed In-Reply-To: References: Message-ID: On Thu, 11 Jan 2024 00:14:38 GMT, Vladimir Ivanov wrote: > > I think my fix is conservative solution for this issue. 
>
> It's still not clear to me why conservatively marking all objects coming from interpreter as globally escaped wouldn't work (what Dean initially proposed).

It would work only for this OSR case.

> My reading of your response is that it may be way too conservative:
>
> > But in general case it could be only dead path where such object is referenced.
>
> Is it your main concern?

First, I am concerned that marking the synchronization region as `has_escaped_object` during parsing, when we load the OSR state, could be premature: without that marking we might still be able to eliminate the locks later. That was my comment about the dead path.

Second, marking during OSR load might not be enough. We may get an escaped locked object in other cases too, and **not** checking all objects in EA would miss them. That may not be true and I may be paranoid.

I think my fix covers all cases.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/17331#issuecomment-1885987490

From qamai at openjdk.org Thu Jan 11 03:23:31 2024
From: qamai at openjdk.org (Quan Anh Mai)
Date: Thu, 11 Jan 2024 03:23:31 GMT
Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v5]
In-Reply-To: <6SOOtHaZrEN8UIKKLHBJniz_nUvZjRvV0TLofW7Xjxk=.debaf12a-b7c8-4b04-b217-3b8cd9b3c6f5@github.com>
References: <6SOOtHaZrEN8UIKKLHBJniz_nUvZjRvV0TLofW7Xjxk=.debaf12a-b7c8-4b04-b217-3b8cd9b3c6f5@github.com>
Message-ID:

On Wed, 10 Jan 2024 18:08:39 GMT, Quan Anh Mai wrote:

>> Hi,
>>
>> This patch adds unsigned bounds and known bits constraints to `TypeInt` and `TypeLong`. This opens more transformation opportunities in an elegant manner as well as helps avoid some ad-hoc rules in Hotspot. The new constraints are applied to identity and value calls of the common nodes (Add, Sub, L/R/URShift, And, Or, Xor, bit counting, Cmp, Bool, ConvI2L/L2I), the detailed ideas for each node will be presented below.
>> >> In general, a `TypeInt/Long` represents a set of values `x` that satisfies: `x s>= lo && x s<= hi && x u>= ulo && x u<= uhi && (x & zeros) == 0 && (~x & ones) == 0`. These constraints are not independent, e.g. an int that lies in [0, 3] in signed domain must also lie in [0, 3] in unsigned domain and have all bits but the last 2 being unset. As a result, we must normalize the constraints (tighten the constraints so that they are optimal) before constructing a `TypeInt/Long` instance. >> >> Please kindly review, thanks very much. >> >> Testing >> >> - [x] GHA >> - [x] Linux x64, tier 1-4 > > Quan Anh Mai has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 27 commits: > > - Merge branch 'master' into improvevalue > - Merge branch 'master' into improvevalue > - improve add/sub implementation > - Merge branch 'master' into improvevalue > - typo > - whitespace > - fix tests for x86_32 > - fix widen of ConvI2L > - problem lists > - format > - ... and 17 more: https://git.openjdk.org/jdk/compare/f0169341...843ad076 May someone give their opinion on this PR, please? Thanks a lot. ------------- PR Comment: https://git.openjdk.org/jdk/pull/15440#issuecomment-1886159291 From thartmann at openjdk.org Thu Jan 11 06:52:23 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 11 Jan 2024 06:52:23 GMT Subject: RFR: 8321712: C2: "failed: Multiple uses of register" in C2_MacroAssembler::vminmax_fp [v5] In-Reply-To: <-rzXk3WJ9aOgrgoyIVDpCeiQLBkq5N5yQCYp_42oMEo=.df74c3a7-508d-48a1-baa2-796a2015bea1@github.com> References: <-rzXk3WJ9aOgrgoyIVDpCeiQLBkq5N5yQCYp_42oMEo=.df74c3a7-508d-48a1-baa2-796a2015bea1@github.com> Message-ID: On Tue, 9 Jan 2024 23:46:45 GMT, Sandhya Viswanathan wrote: >> The C2_MacroAssembler::vminmax_fp and evminmax_fp were incorrectly checking for the two USE operand registers to be different and thus resulting in assertion failure. 
>> >> In x86_64.ad: >> >> instruct maxF_reg(legRegF dst, legRegF a, legRegF b, legRegF tmp, legRegF atmp, legRegF btmp) %{ >> ... >> effect(USE a, USE b, TEMP tmp, TEMP atmp, TEMP btmp); >> ... >> __ vminmax_fp(Op_MaxV, T_FLOAT, $dst$$XMMRegister, $a$$XMMRegister, $b$$XMMRegister, $tmp$$XMMRegister, $atmp$$XMMRegister, $btmp$$XMMRegister, Assembler::AVX_128bit); >> %} >> >> >> Changing the assert in vminmax_fp from: >> assert_different_registers(a, b, tmp, atmp, btmp); >> to: >> assert_different_registers(a, tmp, atmp, btmp); >> assert_different_registers(b, tmp, atmp, btmp); >> fixes the issue. >> >> Similar change done in evminmax_fp. >> >> Please review. >> >> Best Regards, >> Sandhya > > Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision: > > copyright year update Testing all passed. ------------- PR Comment: https://git.openjdk.org/jdk/pull/17315#issuecomment-1886401306 From chagedorn at openjdk.org Thu Jan 11 07:50:23 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 11 Jan 2024 07:50:23 GMT Subject: RFR: JDK-8323190: Segfault during deoptimization of C2-compiled code [v3] In-Reply-To: References: Message-ID: <-BFrEysV5oZgA6gf67EwVlmFC5Lkpv7V-N5HvQ6B_sI=.b9e08a64-8b6f-418e-8810-f908cbeb68c5@github.com> On Wed, 10 Jan 2024 18:11:14 GMT, Cesar Soares Lucas wrote: >> src/hotspot/share/opto/output.cpp line 1092: >> >>> 1090: ObjectMergeValue* mv = (ObjectMergeValue*) sv_for_node_id(objs, smerge->_idx); >>> 1091: >>> 1092: if (mv == NULL) { >> >> You should replace `NULL` with `nullptr` here and below. This also seems wrong here where you took the code from: >> https://github.com/openjdk/jdk/blob/88378ed0584c7eb0849b6fc1e361fd8ea0698caf/src/hotspot/share/opto/output.cpp#L775-L796 >> >> On a separate note, the code looks almost identical. Could it be shared somehow? > > Thank you for reviewing @chhagedorn. I've converted the NULLs to nullptrs. 
However, I'll defer the refactoring of the identical code to a RFE - mainly because I'll have to backport the current patch and I'd like to keep it as minimal as possible. Please let me know if you disagree. That's perfectly fine, especially since the code for the `is_SafePointScalarObject()` case was already duplicated before. So, we could change both in one go in a follow-up RFE. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17333#discussion_r1448432678 From chagedorn at openjdk.org Thu Jan 11 07:54:24 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 11 Jan 2024 07:54:24 GMT Subject: RFR: JDK-8323190: Segfault during deoptimization of C2-compiled code [v3] In-Reply-To: References: Message-ID: <5kN_x85owu9QMT9jIpfo-URWZFpJjUuaL97fVQfo1Zk=.c8bc54ac-8c2a-4ad8-a09e-a87b6dc354d9@github.com> On Wed, 10 Jan 2024 18:20:42 GMT, Cesar Soares Lucas wrote: >> Currently, if `ReduceAllocationMerges` reduces an allocation merge that is used as a monitor C2 will SIGFAULT in `Process_OopMap_Node` because it's missing code to handle that case. This patch fixes C2 to properly handle reduced allocation merges that are used as monitors. >> >> Tested with Linux x86_64 hotspot_all. > > Cesar Soares Lucas has updated the pull request incrementally with one additional commit since the last revision: > > Convert NULL to nullptr. Remove type cast. Update looks good, thanks. ------------- Marked as reviewed by chagedorn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/17333#pullrequestreview-1814889165 From epeter at openjdk.org Thu Jan 11 08:25:36 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 11 Jan 2024 08:25:36 GMT Subject: RFR: 8322077: Add Ideal transformation: (~a) | (~b) => ~(a & b) [v13] In-Reply-To: References: Message-ID: <_b5ODSBa8YhAf5i7hafehvmw40MAdi4z5yF0EicXBUE=.bb562b91-9751-4dab-a487-0e9961b1f199@github.com> On Wed, 10 Jan 2024 16:57:47 GMT, Zhiqiang Zang wrote: >> Hello, >> >> (~a) | (~b) => ~(a & b) is a widely seen pattern, for example it is implemented for LLVM [here](https://github.com/llvm/llvm-project/blob/397f1ce9efb4eea1ee10fe4833f733b8c7abd878/llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp#L1617C28-L1617C28); however it is missing in current implementation of hotspot. This pull request adds this transformation and associated tests. >> >> Thanks. > > Zhiqiang Zang has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 34 commits: > > - Merge branch 'master' of https://github.com/openjdk/jdk into ornode-PNewDeMorganLawOrToAnd > - update copyright dates. > - Merge branch 'andnode-PNewDeMorganLawAndToOr' into ornode-PNewDeMorganLawOrToAnd > - Revert "move the two helper functions to member functions of the node class." > > This reverts commit 7a962d69ac687c0476e54cd004037f2ebb0800ba. > - Revert "update copyright dates." > > This reverts commit 3665de2f3d38b4fce3ba968ec6247dd2576ecc32. > - Revert "move make_not back to AddNode because it cannot compile for architecture arm, s390x, ppc64le." > > This reverts commit 4ee8b089b6f119b1692dcd2c512d249d92734697. > - Revert "adapt changes from the dependent pr." > > This reverts commit 0c8d10776fe55a28ea9f60160b4f6f57aabad5ab. > - Revert "adapt to new changes from the dependant pr." > > This reverts commit b21e242b306d9cf01e1ba84ebba47ef5b471d98e. > - adapt to new changes from the dependant pr. 
> - Merge branch 'andnode-PNewDeMorganLawAndToOr' into ornode-PNewDeMorganLawOrToAnd > - ... and 24 more: https://git.openjdk.org/jdk/compare/b86c3b7a...f908668b Testing running for commit 34... ------------- PR Comment: https://git.openjdk.org/jdk/pull/16334#issuecomment-1886601525 From thartmann at openjdk.org Thu Jan 11 08:30:25 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 11 Jan 2024 08:30:25 GMT Subject: RFR: JDK-8323190: Segfault during deoptimization of C2-compiled code [v3] In-Reply-To: References: <6jsDdUuD6X9u8CT-4SUN_r2VUgPA4QuPCMLQyz-myaY=.52f1830e-0c87-44bf-931a-c272cabb74b4@github.com> Message-ID: On Wed, 10 Jan 2024 18:16:42 GMT, Cesar Soares Lucas wrote: > I think AllocationMergesTests.java isn't the ideal place for these tests. The tests in AllocationMergesTests.java are for checking the IR shape after the optimization, the current issue was actually because of a problem emitting debug info for a compilation unit - it's not something that we can capture with the IR-framework I believe Right, what I meant is that this issue shows that we don't have enough test coverage for all the cases optimized by [JDK-8287061](https://bugs.openjdk.org/browse/JDK-8287061). Or do we have an existing test for reduced allocation merges that are used as monitors? Ideally, we would have an IR framework test for the important cases that would then check both that the code is optimized as expected as well as that it's correct (no crash, correct result, ...). I'm fine with adding more tests with https://github.com/openjdk/jdk/pull/15825 though. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/17333#issuecomment-1886607265 From thartmann at openjdk.org Thu Jan 11 08:30:24 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 11 Jan 2024 08:30:24 GMT Subject: RFR: JDK-8323190: Segfault during deoptimization of C2-compiled code [v3] In-Reply-To: References: Message-ID: <_NKT-XZ-IwknOUt5nqBmxAFXrI2cOSYjQcFmuERRc4I=.149c1d56-f82f-4469-b2ac-6bca87e98f8c@github.com> On Wed, 10 Jan 2024 18:20:42 GMT, Cesar Soares Lucas wrote: >> Currently, if `ReduceAllocationMerges` reduces an allocation merge that is used as a monitor C2 will SIGFAULT in `Process_OopMap_Node` because it's missing code to handle that case. This patch fixes C2 to properly handle reduced allocation merges that are used as monitors. >> >> Tested with Linux x86_64 hotspot_all. > > Cesar Soares Lucas has updated the pull request incrementally with one additional commit since the last revision: > > Convert NULL to nullptr. Remove type cast. Looks good to me. ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/17333#pullrequestreview-1814951569 From rrich at openjdk.org Thu Jan 11 08:57:52 2024 From: rrich at openjdk.org (Richard Reingruber) Date: Thu, 11 Jan 2024 08:57:52 GMT Subject: RFR: 8290965: PPC64: Implement post-call NOPs [v4] In-Reply-To: References: Message-ID: > #### Implementation of post call nops (PCNs) on ppc64. > > Depends on https://github.com/openjdk/jdk/pull/17150 > > About post call nops: > > - instruction(s) at return addresses of compiled java calls > - emitted iff vm continuations are enabled to support virtual threads > - encode data that can be be used to find the corresponding CodeBlob and oop map faster > - mt-safe patchable to trigger deoptimization > > Background: > > - Frames in continuation StackChunks are not visited if their compiled method is made not entrant (in contrast to frames on stack). 
> Instead all PCNs of the compiled method are patched to trigger deoptimization when control returns to such frames. > - With vm continuations, stacks are walked and inspected more frequently. This requires lookup of metadata like frame size and oop maps. As an optimization the offset of the CodeBlob to the PCN and the oop map slot are encoded as data in the PCN. > > Post call nops on ppc64 > > - 1 instruction, i.e. 4 bytes (either CMPI or CMPLI[1]) > x86_64: 1 instruction, 8 bytes > aarch64: 3 instruction, 12 bytes > [1] 3.1.10 Fixed Point Compare Instructions in Power ISA 3.1B > https://openpowerfoundation.org/specifications/isa/ > > - 26 bits data payload > x86_64: 32 bits; aarch64: 32 bits > - 9 bits dedicated to oop map slot. With 8 bits there where cases with SPECjvm2008 where the slot could not be encoded (on ppc64 and x86_64). > x86_64: 8 bits; aarch64: 8 bits > - 17 bits dedicated to cb offset. Effectively 19 bits due to instruction alignment. > x86_64: 24 bits; aarch64: 24 bits > - Also used when reconstructing the back chain after thawing continuation frames (see `Thaw::patch_caller_links`) > > - Refactored frame constructors to make use of fast CodeBlob lookup based on PCNs. > The fast lookup may only be used if the pc is known to be in the code cache because `CodeCache::find_blob_fast` can yield wrong results if it finds instructions outside the code cache that look just like PCNs. Callers of the frame class constructors need to pass `frame::kind::native` in that case to avoid errors. Other platforms don't make this explicit which is a problem in my eyes. Picking the wrong constructor can cause errors when porting and in future development. > > - Currently only the PCNs in nmethods are initialized. Therefore we don't even try to make a fast lookup based on PCNs if we know the CodeBlob is, e.g., a RuntimeStub. To achieve this we call the frame constructor passing `frame::kind::code_blob`. > > #### Statistics > > > | SpecJVM2008... 
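As a rough illustration of the payload split described in the quoted text (9 bits for the oop map slot plus 17 bits for the CodeBlob offset, which covers 19 bits of byte offset because instructions are 4-byte aligned), here is a minimal Java sketch. The class name, the method names, and the position of each field inside the 26-bit payload are assumptions made for illustration only; this is not the HotSpot code, which is C++ and stores the payload in a CMPI/CMPLI instruction.

```java
// Hypothetical sketch: packs an oop map slot and a CodeBlob offset into 26 bits.
// The field order (slot in the high bits, offset in the low bits) is an assumption.
final class PostCallNopPayload {
    static final int SLOT_BITS = 9;    // oop map slot width, as stated in the PR text
    static final int OFFSET_BITS = 17; // cb offset width, as stated in the PR text
    static final int ALIGN_SHIFT = 2;  // instructions are 4 bytes, so 2 low bits are always zero

    /** Returns the packed payload, or -1 if the values cannot be encoded. */
    static int pack(int oopMapSlot, int cbOffsetBytes) {
        if (oopMapSlot < 0 || oopMapSlot >= (1 << SLOT_BITS)) {
            return -1; // slot does not fit in 9 bits
        }
        if (cbOffsetBytes < 0 || (cbOffsetBytes & ((1 << ALIGN_SHIFT) - 1)) != 0) {
            return -1; // offset must be non-negative and 4-byte aligned
        }
        int offsetInInstructions = cbOffsetBytes >>> ALIGN_SHIFT;
        if (offsetInInstructions >= (1 << OFFSET_BITS)) {
            return -1; // offset does not fit in 17 bits after dropping the alignment bits
        }
        return (oopMapSlot << OFFSET_BITS) | offsetInInstructions;
    }

    /** Extracts the oop map slot from a payload produced by pack(). */
    static int oopMapSlot(int payload) {
        return payload >>> OFFSET_BITS;
    }

    /** Extracts the CodeBlob offset in bytes from a payload produced by pack(). */
    static int cbOffsetBytes(int payload) {
        return (payload & ((1 << OFFSET_BITS) - 1)) << ALIGN_SHIFT;
    }
}
```

In this sketch a caller that gets -1 back would simply have no payload to use and would need some slower lookup instead; how the real code reacts to values that do not fit is not described in the text above.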
Richard Reingruber has updated the pull request incrementally with one additional commit since the last revision: Review Martin ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17171/files - new: https://git.openjdk.org/jdk/pull/17171/files/5852ea38..05fa480f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17171&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17171&range=02-03 Stats: 16 lines in 9 files changed: 0 ins; 0 del; 16 mod Patch: https://git.openjdk.org/jdk/pull/17171.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17171/head:pull/17171 PR: https://git.openjdk.org/jdk/pull/17171 From roland at openjdk.org Thu Jan 11 09:03:42 2024 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 11 Jan 2024 09:03:42 GMT Subject: RFR: 8319793: C2 compilation fails with "Bad graph detected in build_loop_late" after JDK-8279888 [v14] In-Reply-To: References: Message-ID: <0kbxDa-wMnnD6VR6S0O6DYDfoQ0BVeXTg1cx1CEheGI=.3e14d559-c3e4-4962-b3f2-2b45a5ce4771@github.com> On Tue, 9 Jan 2024 17:07:14 GMT, Emanuel Peter wrote: >> Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: >> >> review > > Thanks, still LGTM @eme64 @chhagedorn thanks for the reviews ------------- PR Comment: https://git.openjdk.org/jdk/pull/16886#issuecomment-1886660971 From roland at openjdk.org Thu Jan 11 09:03:45 2024 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 11 Jan 2024 09:03:45 GMT Subject: Integrated: 8319793: C2 compilation fails with "Bad graph detected in build_loop_late" after JDK-8279888 In-Reply-To: References: Message-ID: On Wed, 29 Nov 2023 15:52:59 GMT, Roland Westrelin wrote: > Range check smearing and range check predication make an array access > dependent on 2 (or more in the case of RC smearing) conditions. 
As a > consequence, if a range check can be eliminated because there's an > identical dominating range check, the control dependent nodes that > could float and become dependent on the dominating range check cannot > be allowed to float because there's a risk that they would then bypass > one of the checks that make the access legal. > > `IfNode::dominated_by()` and `PhaseIdealLoop::dominated_by()` have > logic to prevent this: nodes that are control dependent on a range > check or predicate are not allowed to float. This is however not > sufficient as demonstrated by the test cases. > > In `TestArrayAccessAboveRCAfterSmearingOrPredication.testRangeCheckSmearing()`: > > > v += array[i]; > if (flag2) { > if (flag3) { > field = 0x42; > } > } > if (flagField == 1) { > v += array[i]; > } > > > The range check for the second `array[i]` load is replaced by the > dominating range check for the first `array[i]` but because the second > `array[i]` load could really be dependent on multiple range checks (in > case smearing happened which is not the case here), c2 doesn't allow > the second `array[i]` to float when the second range check is > removed. The second `array[i]` is then control dependent on: > > > if (flagField == 1) { > > > which is next found to be dominated by the same test: > > > if (flag == 1) { > > > and is removed. However nothing in `dominated_by()` treats node > dependent on tests that are not range check or predicates > specially. So the second `array[i]` is allowed to float and become > dependent on: > > > if (flag == 1) { > > > which is above the range check for that access. The test method in its > last invocation is passed an index for the array access that's widely > out of range. The array load happens before the range check and > crashes the VM. `testLoopPredication()` is a similar test where array > loads become dependent on predicates and end up above range checks. 
> > `TestArrayAccessCastIIAboveRC.java` is the test case from the bug > where for similar reasons a range check `CastII` ends up above its > range check, becomes top because its input becomes some integer that > conflicts with its type (but there's no condition to catch it). The > graph becomes broken and c2 crashes. > > Logic in the `dominated_by()` methods ... This pull request has now been integrated. Changeset: b922f8d4 Author: Roland Westrelin URL: https://git.openjdk.org/jdk/commit/b922f8d45951250b7c39cb179b9bc1a8a6256a9e Stats: 400 lines in 14 files changed: 342 ins; 27 del; 31 mod 8319793: C2 compilation fails with "Bad graph detected in build_loop_late" after JDK-8279888 Reviewed-by: chagedorn, epeter ------------- PR: https://git.openjdk.org/jdk/pull/16886 From dlunden at openjdk.org Thu Jan 11 10:23:36 2024 From: dlunden at openjdk.org (Daniel =?UTF-8?B?THVuZMOpbg==?=) Date: Thu, 11 Jan 2024 10:23:36 GMT Subject: RFR: 8322996: BoxLockNode creation fails with assert(reg < CHUNK_SIZE) failed: sanity Message-ID: This changeset fixes an issue where deeply nested synchronized statements triggered an assert in C2. Changes: - Bail out on compilation when we create a `BoxLockNode` with a slot index that cannot fit in a `RegMask`. This is similar to how we handle the case when we do not have space to represent arguments in [`opto/matcher.cpp`](https://github.com/openjdk/jdk/blob/58b01dce054c50bcb5a28aad4c1b574acaa90f6d/src/hotspot/share/opto/matcher.cpp#L314-L318) - Add a regression test. 
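For readers unfamiliar with the pattern, the shape of code in question looks roughly like the Java sketch below. It is only an illustration with a hypothetical class name and four nesting levels; the depth needed to hit the assert, and the contents of the actual regression test, are not stated here and are presumably much larger.

```java
// Hypothetical sketch of nested synchronized statements. Every nesting level holds
// one more monitor at the same time, so each level needs its own lock slot.
final class NestedSynchronizeSketch {
    static int test(Object lock) {
        int depth = 0;
        synchronized (lock) {
            depth++;
            synchronized (lock) { // re-entering the same monitor is legal in Java
                depth++;
                synchronized (lock) {
                    depth++;
                    synchronized (lock) {
                        depth++;
                    }
                }
            }
        }
        return depth;
    }
}
```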
Testing:
- [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/7446998688)
- tier1, tier2, tier3, tier4, and tier5 on windows-x64, linux-x64, linux-aarch64, macosx-x64, and macosx-aarch64

-------------

Commit messages:
- Fix issue and add test case

Changes: https://git.openjdk.org/jdk/pull/17370/files
Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=17370&range=00
Issue: https://bugs.openjdk.org/browse/JDK-8322996
Stats: 251 lines in 4 files changed: 248 ins; 0 del; 3 mod
Patch: https://git.openjdk.org/jdk/pull/17370.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/17370/head:pull/17370

PR: https://git.openjdk.org/jdk/pull/17370

From kuaiwei.kw at alibaba-inc.com Thu Jan 11 11:58:26 2024
From: kuaiwei.kw at alibaba-inc.com (Kuai Wei)
Date: Thu, 11 Jan 2024 19:58:26 +0800
Subject: Re: discuss about release barrier for final fields initialization
In-Reply-To:
References: <5e41f867-b31b-4ded-b737-8d0c869c8895.kuaiwei.kw@alibaba-inc.com> <20965b9a-ec64-4d81-866e-b1a0a94ed1e0@littlepinkcloud.com>,
Message-ID: <822b0f93-ae9e-4196-8d20-1b87286b91d5.kuaiwei.kw@alibaba-inc.com>

Hi Andrew and Dean,

Thanks for the reply. I checked the previous discussion, but the root cause is still not clear to me. If you can provide more detail about the optimization, such as which load or load dependency would be elided, we can check whether there is a chance to detect or prevent it.

Here are some cases I am thinking about:

1) The loaded value is used by the final field store, like
   x.final_field = x.other + 1;
   It has a data dependency and cannot be reordered by the compiler.
2) Load from the final field after the final store:
   x.final_field = xxx; t = x.final_field;
   The loaded value is always the final value. It's safe to elide below the barrier.
3) Load from the final field before the final store:
   t = x.final_field; x.final_field = xxx;
   The load could be elided with a non-final value, but that looks like expected behavior.
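To make the three cases concrete at the source level, here is a small Java sketch. The class and member names are made up for illustration, and the sketch only shows the single-threaded shape of each case; it says nothing about which barriers C1 or C2 emit or which reorderings are permitted.

```java
// Hypothetical sketch of the three load/store shapes around a final field store.
final class FinalFieldCases {
    int other;
    final int finalField;
    int loadedBefore; // case 3: value loaded before the final store
    int loadedAfter;  // case 2: value loaded after the final store

    FinalFieldCases(int other) {
        this.other = other;
        // Case 3: load from the final field before the final store. The read goes
        // through a helper method because a direct read of a blank final field
        // before its assignment is rejected by the compiler. It sees the default 0.
        loadedBefore = peek();
        // Case 1: the stored value depends on a loaded value (data dependency).
        finalField = this.other + 1;
        // Case 2: load from the final field after the final store.
        loadedAfter = finalField;
    }

    private int peek() {
        return finalField;
    }
}
```

In a single thread, `new FinalFieldCases(41)` leaves `loadedBefore` at the default value 0 and both `finalField` and `loadedAfter` at 42. The open question in this thread is what another thread may observe when the object is published without synchronization.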
Thanks,
Kuai Wei

From: Andrew Haley
Date: Wed, Jan 10, 2024 at 5:54 PM
Subject: Re: discuss about release barrier for final fields initialization
To:

On 1/9/24 06:23, Kuai Wei wrote:
> I found only C2 will insert release membar and C1 just insert storestore for both final and normal allocation. In Doug Lea's cookbook https://gee.cs.oswego.edu/dl/jmm/cookbook.html
>
> Only storestore is required. Alex has a great post on this topic https://shipilev.net/blog/2014/all-fields-are-final/ . It referred a case why loadstore is needed. https://www.hboehm.info/c++mm/no_write_fences.html
>
> I checked this case and IMO it looks some legacy architecture may break data dependency and cause problem. As I know, alpha architecture is an example. I think it doesn't break on modern architecture. Is there other case I missed?

I think it requires a very careful analysis of the compiler to be sure. The problem occurs if an optimizer knows what a store is going to do. If it does, then there's nothing to prevent a load from being elided, and your load dependency has gone.

This isn't a problem with C1, because C1 doesn't do that kind of optimization. I don't know that C2 does either, or even whether it is allowed to do so.

From what I remember of the conversation, we left the release barrier in because of an abundance of caution rather than any proof that a storestore was inadequate.

--
Andrew Haley (he/him)
Java Platform Lead Engineer
Red Hat UK Ltd.
https://keybase.io/andrewhaley
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671

From shade at openjdk.org Thu Jan 11 12:23:35 2024
From: shade at openjdk.org (Aleksey Shipilev)
Date: Thu, 11 Jan 2024 12:23:35 GMT
Subject: RFR: 8323584: AArch64: Unnecessary ResourceMark in NativeCall::set_destination_mt_safe
Message-ID:

Was looking at `ICBuffer` cleaning paths that run at safepoint.
In `NativeCall::set_destination_mt_safe`, there is a `ResourceMark` that does not seem to have any purpose: no code in its scope uses resource allocation. There is adjacent dead code too. Looks like a development/debugging leftover.

Additional testing:
- [ ] Linux x86_64 AArch64 server fastdebug, `tier{1,2,3}`

-------------

Commit messages:
- Fix

Changes: https://git.openjdk.org/jdk/pull/17372/files
Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=17372&range=00
Issue: https://bugs.openjdk.org/browse/JDK-8323584
Stats: 3 lines in 1 file changed: 0 ins; 3 del; 0 mod
Patch: https://git.openjdk.org/jdk/pull/17372.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/17372/head:pull/17372

PR: https://git.openjdk.org/jdk/pull/17372

From epeter at openjdk.org Thu Jan 11 12:37:32 2024
From: epeter at openjdk.org (Emanuel Peter)
Date: Thu, 11 Jan 2024 12:37:32 GMT
Subject: RFR: 8323577: C2 SuperWord: remove AlignVector restrictions on IR tests added in JDK-8305055
Message-ID:

These IR rules were restricted in [JDK-8305055](https://bugs.openjdk.org/browse/JDK-8305055), [PR for comparison](https://git.openjdk.org/jdk/pull/13236).

This had to be done because those cases were no longer vectorized with `AlignVector` after my bugfix [JDK-8298935](https://bugs.openjdk.org/browse/JDK-8298935), they were "collateral damage". Before this bugfix, we would vectorize, even though the alignment constraints had rejected some memops, but then they were re-introduced during pair extension. This re-introduction led to this bug for other reasons. I proposed to restore vectorization (i.e. fix the "collateral damage") by improving alignment-constraints ([JDK-8303827](https://bugs.openjdk.org/browse/JDK-8303827)).

Since I had to already completely rework the alignment constraints because of a bug, I relaxed the constraints:
[JDK-8310190](https://bugs.openjdk.org/browse/JDK-8310190)

Now I can remove the restrictions on those rules.

-------------

Commit messages:
- 8323577

Changes: https://git.openjdk.org/jdk/pull/17369/files
Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=17369&range=00
Issue: https://bugs.openjdk.org/browse/JDK-8323577
Stats: 13 lines in 2 files changed: 0 ins; 12 del; 1 mod
Patch: https://git.openjdk.org/jdk/pull/17369.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/17369/head:pull/17369

PR: https://git.openjdk.org/jdk/pull/17369

From rcastanedalo at openjdk.org Thu Jan 11 12:44:24 2024
From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano)
Date: Thu, 11 Jan 2024 12:44:24 GMT
Subject: RFR: 8322996: BoxLockNode creation fails with assert(reg < CHUNK_SIZE) failed: sanity
In-Reply-To:
References:
Message-ID:

On Thu, 11 Jan 2024 10:19:12 GMT, Daniel Lundén wrote:

> This changeset fixes an issue where deeply nested synchronized statements triggered an assert in C2.
>
> Changes:
> - Bail out on compilation when we create a `BoxLockNode` with a slot index that cannot fit in a `RegMask`. This is similar to how we handle the case when we do not have space to represent arguments in [`opto/matcher.cpp`](https://github.com/openjdk/jdk/blob/58b01dce054c50bcb5a28aad4c1b574acaa90f6d/src/hotspot/share/opto/matcher.cpp#L314-L318)
> - Add a regression test.
>
> Testing:
> - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/7446998688)
> - tier1, tier2, tier3, tier4, and tier5 on windows-x64, linux-x64, linux-aarch64, macosx-x64, and macosx-aarch64

Changes requested by rcastanedalo (Reviewer).

src/hotspot/share/opto/locknode.cpp line 47:

> 45: init_flags(Flag_rematerialize);
> 46: OptoReg::Name reg = OptoReg::stack2reg(_slot);
> 47: if (!RegMask::can_represent_arg(reg)) {

I am not very familiar with this code, but would it be possible to use `!RegMask::can_represent(reg)` instead of `!RegMask::can_represent_arg(reg)` here? Or is it necessary to use the latter (which is stricter) for correctness?
test/hotspot/jtreg/compiler/c2/TestNestedSynchronize.java line 27: > 25: * @test > 26: * @bug 8322996 > 27: * @requires vm.debug I suggest removing this line for better test coverage, the test does not really require debug mode. test/hotspot/jtreg/compiler/c2/TestNestedSynchronize.java line 32: > 30: * > 31: * @run main/othervm -XX:CompileCommand=compileonly,compiler.c2.TestNestedSynchronize::test > 32: * -XX:-TieredCompilation -Xcomp No need to use `-XX:-TieredCompilation` here (already in `-Xcomp` mode). test/hotspot/jtreg/compiler/c2/TestNestedSynchronize.java line 36: > 34: */ > 35: > 36: package compiler.c2; The test case might fit better under `test/hotspot/jtreg/compiler/locks`. ------------- PR Review: https://git.openjdk.org/jdk/pull/17370#pullrequestreview-1815464703 PR Review Comment: https://git.openjdk.org/jdk/pull/17370#discussion_r1448790888 PR Review Comment: https://git.openjdk.org/jdk/pull/17370#discussion_r1448797933 PR Review Comment: https://git.openjdk.org/jdk/pull/17370#discussion_r1448798283 PR Review Comment: https://git.openjdk.org/jdk/pull/17370#discussion_r1448801274 From tholenstein at openjdk.org Thu Jan 11 12:45:21 2024 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Thu, 11 Jan 2024 12:45:21 GMT Subject: RFR: 8323584: AArch64: Unnecessary ResourceMark in NativeCall::set_destination_mt_safe In-Reply-To: References: Message-ID: <3Asv6oRvd6Ht-E7p-9l6zqfmNDsQgdTOlVVHM9sdqOo=.306bcd45-1b00-44ca-97d6-19976c6bd2f9@github.com> On Thu, 11 Jan 2024 12:17:21 GMT, Aleksey Shipilev wrote: > Was looking at `ICBuffer` cleaning paths that run at safepoint. In `NativeCall::set_destination_mt_safe`, there is a `ResourceMark` that does not seem to have any purpose: no code in its scope uses resource allocation. There is adjacent dead code too. Looks like a development/debugging leftover. > > Additional testing: > - [ ] Linux x86_64 AArch64 server fastdebug, `tier{1,2,3}` Thanks for removing! 
Looks good to me

-------------

Marked as reviewed by tholenstein (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/17372#pullrequestreview-1815483070

From rcastanedalo at openjdk.org Thu Jan 11 13:06:25 2024
From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano)
Date: Thu, 11 Jan 2024 13:06:25 GMT
Subject: RFR: 8323577: C2 SuperWord: remove AlignVector restrictions on IR tests added in JDK-8305055
In-Reply-To:
References:
Message-ID:

On Thu, 11 Jan 2024 10:16:14 GMT, Emanuel Peter wrote:

> These IR rules were restricted in [JDK-8305055](https://bugs.openjdk.org/browse/JDK-8305055), [PR for comparison](https://git.openjdk.org/jdk/pull/13236).
>
> This had to be done because those cases were no longer vectorized with `AlignVector` after my bugfix [JDK-8298935](https://bugs.openjdk.org/browse/JDK-8298935), they were "collateral damage". Before this bugfix, we would vectorize, even though the alignment constraints had rejected some memops, but then they were re-introduced during pair extension. This re-introduction led to this bug for other reasons. I proposed to restore vectorization (i.e. fix the "collateral damage") by improving alignment-constraints ([JDK-8303827](https://bugs.openjdk.org/browse/JDK-8303827)).
>
> Since I had to already completely rework the alignment constraints because of a bug, I relaxed the constraints:
> [JDK-8310190](https://bugs.openjdk.org/browse/JDK-8310190)
>
> Now I can remove the restrictions on those rules.

Looks good.

-------------

Marked as reviewed by rcastanedalo (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/17369#pullrequestreview-1815535313 From chagedorn at openjdk.org Thu Jan 11 13:17:25 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 11 Jan 2024 13:17:25 GMT Subject: RFR: 8323577: C2 SuperWord: remove AlignVector restrictions on IR tests added in JDK-8305055 In-Reply-To: References: Message-ID: On Thu, 11 Jan 2024 10:16:14 GMT, Emanuel Peter wrote: > These IR rules were restricted in [JDK-8305055](https://bugs.openjdk.org/browse/JDK-8305055), [PR for comparison](https://git.openjdk.org/jdk/pull/13236). > > This had to be done because those cases were no longer vectorized with `AlignVector` after my bugfix [JDK-8298935](https://bugs.openjdk.org/browse/JDK-8298935), they were "collateral damage". Before this bugfix, we would vectorize, even though the alignment constraints had rejected some memops, but then they were re-intriduced during pair extension. This re-introduction led to this bug for other reasons. I proposed to restore vectorization (i.e. fix the "collateral damage") by improving alignment-constraints ([JDK-8303827](https://bugs.openjdk.org/browse/JDK-8303827)). > > Since I had to already completely rework the alignment constraints because of a bug, I relaxed the constraints: > [JDK-8310190](https://bugs.openjdk.org/browse/JDK-8310190) > > Now I can remove the restrictions on those rules. Looks good! ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/17369#pullrequestreview-1815556911 From mdoerr at openjdk.org Thu Jan 11 13:21:28 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Thu, 11 Jan 2024 13:21:28 GMT Subject: RFR: 8290965: PPC64: Implement post-call NOPs [v4] In-Reply-To: References: Message-ID: On Thu, 11 Jan 2024 08:57:52 GMT, Richard Reingruber wrote: >> #### Implementation of post call nops (PCNs) on ppc64. 
>> >> Depends on https://github.com/openjdk/jdk/pull/17150 >> >> About post call nops: >> >> - instruction(s) at return addresses of compiled java calls >> - emitted iff vm continuations are enabled to support virtual threads >> - encode data that can be used to find the corresponding CodeBlob and oop map faster >> - mt-safe patchable to trigger deoptimization >> >> Background: >> >> - Frames in continuation StackChunks are not visited if their compiled method is made not entrant (in contrast to frames on stack). >> Instead all PCNs of the compiled method are patched to trigger deoptimization when control returns to such frames. >> - With vm continuations, stacks are walked and inspected more frequently. This requires lookup of metadata like frame size and oop maps. As an optimization the offset of the CodeBlob to the PCN and the oop map slot are encoded as data in the PCN. >> >> Post call nops on ppc64 >> >> - 1 instruction, i.e. 4 bytes (either CMPI or CMPLI[1]) >> x86_64: 1 instruction, 8 bytes >> aarch64: 3 instructions, 12 bytes >> [1] 3.1.10 Fixed Point Compare Instructions in Power ISA 3.1B >> https://openpowerfoundation.org/specifications/isa/ >> >> - 26 bits data payload >> x86_64: 32 bits; aarch64: 32 bits >> - 9 bits dedicated to oop map slot. With 8 bits there were cases with SPECjvm2008 where the slot could not be encoded (on ppc64 and x86_64). >> x86_64: 8 bits; aarch64: 8 bits >> - 17 bits dedicated to cb offset. Effectively 19 bits due to instruction alignment. >> x86_64: 24 bits; aarch64: 24 bits >> - Also used when reconstructing the back chain after thawing continuation frames (see `Thaw::patch_caller_links`) >> >> - Refactored frame constructors to make use of fast CodeBlob lookup based on PCNs. >> The fast lookup may only be used if the pc is known to be in the code cache because `CodeCache::find_blob_fast` can yield wrong results if it finds instructions outside the code cache that look just like PCNs. 
Callers of the frame class constructors need to pass `frame::kind::native` in that case to avoid errors. Other platforms don't make this explicit which is a problem in my eyes. Picking the wrong constructor can cause errors when porting and in future development. >> >> - Currently only the PCNs in nmethods are initialized. Therefore we don't even try to make a fast lookup based on PCNs if we know the CodeBlob is, e.g., a RuntimeStub. To achieve this we call the frame cons... > > Richard Reingruber has updated the pull request incrementally with one additional commit since the last revision: > > Review Martin Thanks for the updates! The constructors should still be used with care, but I think your code is at least as good as other platforms (rather better IMHO). ------------- Marked as reviewed by mdoerr (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/17171#pullrequestreview-1815566892 From dlunden at openjdk.org Thu Jan 11 13:51:25 2024 From: dlunden at openjdk.org (Daniel Lundén) Date: Thu, 11 Jan 2024 13:51:25 GMT Subject: RFR: 8322996: BoxLockNode creation fails with assert(reg < CHUNK_SIZE) failed: sanity In-Reply-To: References: Message-ID: <_hY0TrBgPqf3CtprnvKjNi3158j2U-49RGnP44f3p1c=.be4e2ce8-1163-4820-82c9-0a0ff2900dfa@github.com> On Thu, 11 Jan 2024 12:34:28 GMT, Roberto Castañeda Lozano wrote: >> This changeset fixes an issue where deeply nested synchronized statements triggered an assert in C2. >> >> Changes: >> - Bail out on compilation when we create a `BoxLockNode` with a slot index that cannot fit in a `RegMask`. This is similar to how we handle the case when we do not have space to represent arguments in [`opto/matcher.cpp`](https://github.com/openjdk/jdk/blob/58b01dce054c50bcb5a28aad4c1b574acaa90f6d/src/hotspot/share/opto/matcher.cpp#L314-L318) >> - Add a regression test. 
>> >> Testing: >> - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/7446998688) >> - tier1, tier2, tier3, tier4, and tier5 on windows-x64, linux-x64, linux-aarch64, macosx-x64, and macosx-aarch64 > > src/hotspot/share/opto/locknode.cpp line 47: > >> 45: init_flags(Flag_rematerialize); >> 46: OptoReg::Name reg = OptoReg::stack2reg(_slot); >> 47: if (!RegMask::can_represent_arg(reg)) { > > I am not very familiar with this code, but would it be possible to use `!RegMask::can_represent(reg)` instead of `!RegMask::can_represent_arg(reg)` here? Or is it necessary to use the latter (which is stricter) for correctness? That is a fair question, and I'm not sure what is the preferred solution. The number of stack slots for a monitor seems to be determined by [`sync_stack_slots`](https://github.com/dlunde/jdk/blob/06d6b4be9750a326f87acf04a3dc717e307d14d5/src/hotspot/share/opto/compile.hpp#L1166-L1167). If I'm not mistaken the value of `sync_stack_slots()` varies between platforms (on my machine it is `int Compile::sync_stack_slots() const { return 2; }`). Therefore, I don't think `can_represent` always works. However, we should maybe have a new function `can_represent_sync_entry` (or similar) instead of `can_represent_arg`. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17370#discussion_r1448893468 From tholenstein at openjdk.org Thu Jan 11 16:13:33 2024 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Thu, 11 Jan 2024 16:13:33 GMT Subject: RFR: JDK-8316756: C2 EA fails with "missing memory path" when encountering unsafe_arraycopy stub call Message-ID: Before https://github.com/openjdk/jdk/pull/5259 the graph of the following program looked like this during Escape Analysis: static int test() { MyClass obj = new MyClass(); // Non-escaping to trigger Escape Analysis UNSAFE.copyMemory(null, SRC_BASE, null, DST_BASE, 4); obj.x = 42; return obj.x; } With MemBarCPUOrder: working Setting the `RC_NARROW_MEM` flag in `LibraryCallKit::inline_unsafe_copyMemory()` removes the `428 MergeMem` node. Without a `MergeMem` node after `432 StoreB`, EA fails in Phase 2 of `ConnectionGraph::split_unique_types(...)` when trying to push the allocation's users on the appropriate worklist - `429 CallLeafNoFP` is not an expected user of `428 StoreB`. Therefore the assert `"EA: missing memory path"` is hit. 
Without MemBarCPUOrder and after setting `RC_NARROW_MEM`: failing ### Proposed Fix Dropping the `RC_NARROW_MEM` flag in `LibraryCallKit::inline_unsafe_copyMemory()` causes the introduction of a `MergeMem` between `StoreB` and `CallLeafNoFP`, so the corresponding code in EA doesn't encounter a `CallLeafNoFP` anymore: fixed Testing: Tier1-4 passed ------------- Commit messages: - JDK-8316756: C2 EA fails with "missing memory path" when encountering unsafe_arraycopy stub call Changes: https://git.openjdk.org/jdk/pull/17347/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=17347&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8316756 Stats: 106 lines in 2 files changed: 67 ins; 37 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/17347.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17347/head:pull/17347 PR: https://git.openjdk.org/jdk/pull/17347 From dlunden at openjdk.org Thu Jan 11 16:37:50 2024 From: dlunden at openjdk.org (Daniel Lundén) Date: Thu, 11 Jan 2024 16:37:50 GMT Subject: RFR: 8322996: BoxLockNode creation fails with assert(reg < CHUNK_SIZE) failed: sanity [v2] In-Reply-To: References: Message-ID: > This changeset fixes an issue where deeply nested synchronized statements triggered an assert in C2. > > Changes: > - Bail out on compilation when we create a `BoxLockNode` with a slot index that cannot fit in a `RegMask`. This is similar to how we handle the case when we do not have space to represent arguments in [`opto/matcher.cpp`](https://github.com/openjdk/jdk/blob/58b01dce054c50bcb5a28aad4c1b574acaa90f6d/src/hotspot/share/opto/matcher.cpp#L314-L318) > - Add a regression test. 
> > Testing: > - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/7446998688) > - tier1, tier2, tier3, tier4, and tier5 on windows-x64, linux-x64, linux-aarch64, macosx-x64, and macosx-aarch64 Daniel Lundén has updated the pull request incrementally with one additional commit since the last revision: Fixes after comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17370/files - new: https://git.openjdk.org/jdk/pull/17370/files/06d6b4be..735543d0 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17370&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17370&range=00-01 Stats: 10 lines in 3 files changed: 7 ins; 2 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/17370.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17370/head:pull/17370 PR: https://git.openjdk.org/jdk/pull/17370 From dlunden at openjdk.org Thu Jan 11 16:50:52 2024 From: dlunden at openjdk.org (Daniel Lundén) Date: Thu, 11 Jan 2024 16:50:52 GMT Subject: RFR: 8322996: BoxLockNode creation fails with assert(reg < CHUNK_SIZE) failed: sanity [v3] In-Reply-To: References: Message-ID: <66zey4Kt56pbcieRZ0AjBLemzwsAQcHCdDG1nlK-P1c=.14dc02d6-bef8-4239-9f39-02456756b7ea@github.com> > This changeset fixes an issue where deeply nested synchronized statements triggered an assert in C2. > > Changes: > - Bail out on compilation when we create a `BoxLockNode` with a slot index that cannot fit in a `RegMask`. This is similar to how we handle the case when we do not have space to represent arguments in [`opto/matcher.cpp`](https://github.com/openjdk/jdk/blob/58b01dce054c50bcb5a28aad4c1b574acaa90f6d/src/hotspot/share/opto/matcher.cpp#L314-L318) > - Add a regression test. 
> > Testing: > - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/7446998688) > - tier1, tier2, tier3, tier4, and tier5 on windows-x64, linux-x64, linux-aarch64, macosx-x64, and macosx-aarch64 Daniel Lundén has updated the pull request incrementally with one additional commit since the last revision: Remove superfluous -TieredCompilation ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17370/files - new: https://git.openjdk.org/jdk/pull/17370/files/735543d0..9ab6e561 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17370&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17370&range=01-02 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/17370.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17370/head:pull/17370 PR: https://git.openjdk.org/jdk/pull/17370 From sviswanathan at openjdk.org Thu Jan 11 16:57:34 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Thu, 11 Jan 2024 16:57:34 GMT Subject: Integrated: 8321712: C2: "failed: Multiple uses of register" in C2_MacroAssembler::vminmax_fp In-Reply-To: References: Message-ID: On Tue, 9 Jan 2024 00:01:04 GMT, Sandhya Viswanathan wrote: > The C2_MacroAssembler::vminmax_fp and evminmax_fp were incorrectly checking for the two USE operand registers to be different, thus resulting in an assertion failure. > > In x86_64.ad: > > instruct maxF_reg(legRegF dst, legRegF a, legRegF b, legRegF tmp, legRegF atmp, legRegF btmp) %{ > ... > effect(USE a, USE b, TEMP tmp, TEMP atmp, TEMP btmp); > ... > __ vminmax_fp(Op_MaxV, T_FLOAT, $dst$$XMMRegister, $a$$XMMRegister, $b$$XMMRegister, $tmp$$XMMRegister, $atmp$$XMMRegister, $btmp$$XMMRegister, Assembler::AVX_128bit); > %} > > > Changing the assert in vminmax_fp from: > assert_different_registers(a, b, tmp, atmp, btmp); > to: > assert_different_registers(a, tmp, atmp, btmp); > assert_different_registers(b, tmp, atmp, btmp); > fixes the issue. 
> > Similar change done in evminmax_fp. > > Please review. > > Best Regards, > Sandhya This pull request has now been integrated. Changeset: e10d1400 Author: Sandhya Viswanathan URL: https://git.openjdk.org/jdk/commit/e10d14004fa25998231ab1d2611b75aea9b5c67d Stats: 16 lines in 2 files changed: 14 ins; 0 del; 2 mod 8321712: C2: "failed: Multiple uses of register" in C2_MacroAssembler::vminmax_fp Co-authored-by: Volodymyr Paprotski Reviewed-by: kvn, thartmann, epeter, jbhateja ------------- PR: https://git.openjdk.org/jdk/pull/17315 From duke at openjdk.org Thu Jan 11 17:47:45 2024 From: duke at openjdk.org (Joshua Cao) Date: Thu, 11 Jan 2024 17:47:45 GMT Subject: RFR: 8323220: Reassociate loop invariants involved in Cmps and Add/Subs Message-ID: // inv1 == (x + inv2) => ( inv1 - inv2 ) == x // inv1 == (x - inv2) => ( inv1 + inv2 ) == x // inv1 == (inv2 - x) => (-inv1 + inv2 ) == x For example, fn(inv1, inv2) while(...) x = foobar() if inv1 == x + inv2 blackhole() We can transform this into fn(inv1, inv2) t = inv1 - inv2 while(...) x = foobar() if t == x blackhole() I have two examples in JDK source code 1. https://github.com/openjdk/jdk/blob/b78896b9aafcb15f453eaed6e154a5461581407b/src/java.base/share/classes/java/lang/invoke/LambdaFormEditor.java#L910. LHS `1` and RHS `pos` are both loop invariant 2. https://github.com/openjdk/jdk/blob/b78896b9aafcb15f453eaed6e154a5461581407b/src/jdk.zipfs/share/classes/jdk/nio/zipfs/ZipFileSystem.java#L1606. In separate transformation, the `>` is transformed into `!=` (not sure why TBH), and both sides have invariants Passes tier1 locally on Linux machine. Passes GHA on my fork. 
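The three rewrites above are sound for `==` because Java `int` addition and subtraction wrap modulo 2^32, so moving an invariant term to the other side of the comparison cannot change the outcome even when an intermediate value overflows. A minimal sketch that checks this on a handful of edge values follows; the class and method names (`ReassociateCmpSketch`, `agrees`, `checkAll`) are hypothetical illustrations and are not part of the patch or of C2.

```java
// Hypothetical illustration only: checks that each comparison from the
// proposal agrees with its reassociated form under wrapping int arithmetic.
class ReassociateCmpSketch {
    // True if all three original comparisons match their rewritten forms.
    static boolean agrees(int inv1, int inv2, int x) {
        // inv1 == (x + inv2)  =>  (inv1 - inv2) == x
        boolean addForm = (inv1 == x + inv2) == (inv1 - inv2 == x);
        // inv1 == (x - inv2)  =>  (inv1 + inv2) == x
        boolean subForm = (inv1 == x - inv2) == (inv1 + inv2 == x);
        // inv1 == (inv2 - x)  =>  (-inv1 + inv2) == x
        boolean revSubForm = (inv1 == inv2 - x) == (-inv1 + inv2 == x);
        return addForm && subForm && revSubForm;
    }

    // Exhaustively combines a few edge values, including the overflow cases.
    static boolean checkAll() {
        int[] samples = {Integer.MIN_VALUE, -1, 0, 1, 42, Integer.MAX_VALUE};
        for (int inv1 : samples) {
            for (int inv2 : samples) {
                for (int x : samples) {
                    if (!agrees(inv1, inv2, x)) {
                        return false;
                    }
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(checkAll());
    }
}
```

The same argument does not carry over unchanged to ordered comparisons such as `<` or `>=`, where an overflow in the moved term can flip the ordering.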
------------- Commit messages: - 8323220: Reassociate loop invariants involved in Cmps and Add/Subs Changes: https://git.openjdk.org/jdk/pull/17375/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=17375&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8323220 Stats: 270 lines in 3 files changed: 258 ins; 0 del; 12 mod Patch: https://git.openjdk.org/jdk/pull/17375.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17375/head:pull/17375 PR: https://git.openjdk.org/jdk/pull/17375 From kvn at openjdk.org Thu Jan 11 18:02:22 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 11 Jan 2024 18:02:22 GMT Subject: RFR: JDK-8316756: C2 EA fails with "missing memory path" when encountering unsafe_arraycopy stub call In-Reply-To: References: Message-ID: On Wed, 10 Jan 2024 13:35:17 GMT, Tobias Holenstein wrote: > Before https://github.com/openjdk/jdk/pull/5259 the graph of the following program looked like this during Escape Analysis: > > > static int test() { > MyClass obj = new MyClass(); // Non-escaping to trigger Escape Analysis > UNSAFE.copyMemory(null, SRC_BASE, null, DST_BASE, 4); > obj.x = 42; > return obj.x; > } > > With MemBarCPUOrder: > working > > Setting the `RC_NARROW_MEM` flag in `LibraryCallKit::inline_unsafe_copyMemory()` removes the `428 MergeMem` node. Without a `MergeMem` node after `432 StoreB`, EA fails in Phase 2 of `ConnectionGraph::split_unique_types(...)` when trying to push the allocation's users on the appropriate worklist - `429 CallLeafNoFP` is not an expected user of `428 StoreB`. Therefore the assert `"EA: missing memory path"` is hit. 
> Without MemBarCPUOrder and after setting `RC_NARROW_MEM`: > failing > > > ### Proposed Fix > Dropping the `RC_NARROW_MEM` flag in `LibraryCallKit::inline_unsafe_copyMemory()` causes the introduction of a `MergeMem` between `StoreB` and `CallLeafNoFP`, so the corresponding code in EA doesn't encounter a `CallLeafNoFP` anymore: > fixed > > Testing: Tier1-4 passed This should be reviewed by @iwanowww. I have too many questions about this. Based on code and test `UNSAFE.copyMemory()` copies "native" memory which should not affect anything. [#5259](https://github.com/openjdk/jdk/pull/5259) sets RC_NARROW_MEM exactly for that as I understand. Flag setting (StoreB nodes) in `JavaThread::_doing_unsafe_access` is also not affected but it is a volatile field and these stores should stay where they are. They can't go up or down. And it should be accomplished by some kind of barriers between `StoreB` and `unsafe_Arraycopy` call. But the call's memory edge should not point to `StoreB` - it is incorrect since it does not affect that field in this case. Call's memory should point to root memory in this case. Operating on fields of new `MyClass` object could be moved around and object can be eliminated since it does not escape. 
------------- PR Review: https://git.openjdk.org/jdk/pull/17347#pullrequestreview-1816177520 From sviswanathan at openjdk.org Thu Jan 11 18:12:32 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Thu, 11 Jan 2024 18:12:32 GMT Subject: [jdk22] RFR: 8321712: C2: "failed: Multiple uses of register" in C2_MacroAssembler::vminmax_fp Message-ID: 8321712: C2: "failed: Multiple uses of register" in C2_MacroAssembler::vminmax_fp ------------- Commit messages: - Backport e10d14004fa25998231ab1d2611b75aea9b5c67d Changes: https://git.openjdk.org/jdk22/pull/62/files Webrev: https://webrevs.openjdk.org/?repo=jdk22&pr=62&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8321712 Stats: 16 lines in 2 files changed: 14 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk22/pull/62.diff Fetch: git fetch https://git.openjdk.org/jdk22.git pull/62/head:pull/62 PR: https://git.openjdk.org/jdk22/pull/62 From kvn at openjdk.org Thu Jan 11 18:18:23 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 11 Jan 2024 18:18:23 GMT Subject: RFR: JDK-8316756: C2 EA fails with "missing memory path" when encountering unsafe_arraycopy stub call In-Reply-To: References: Message-ID: On Wed, 10 Jan 2024 13:35:17 GMT, Tobias Holenstein wrote: > Before https://github.com/openjdk/jdk/pull/5259 the graph of the following program looked like this during Escape Analysis: > > > static int test() { > MyClass obj = new MyClass(); // Non-escaping to trigger Escape Analysis > UNSAFE.copyMemory(null, SRC_BASE, null, DST_BASE, 4); > obj.x = 42; > return obj.x; > } > > With MemBarCPUOrder: > working > > Setting the `RC_NARROW_MEM` flag in `LibraryCallKit::inline_unsafe_copyMemory()` removes the `428 MergeMem` node. Without a `MergeMem` node after `432 StoreB`, EA fails in Phase 2 of `ConnectionGraph::split_unique_types(...)` when trying to push the allocation's users on the appropriate worklist - `429 CallLeafNoFP` is not an expected user of `428 StoreB`. 
Therefore the assert `"EA: missing memory path"` is hit. > Without MemBarCPUOrder and after setting `RC_NARROW_MEM`: > failing > > > ### Proposed Fix > Dropping the `RC_NARROW_MEM` flag in `LibraryCallKit::inline_unsafe_copyMemory()` causes the introduction of a `MergeMem` between `StoreB` and `CallLeafNoFP`, so the corresponding code in EA doesn't encounter a `CallLeafNoFP` anymore: > fixed > > Testing: Tier1-4 passed `JavaThread::_doing_unsafe_access` field is checked by runtime when we SEGV to find that it happens in `unsafe_arraycopy` code. Again, `unsafe_arraycopy` does not affect this field. ------------- PR Comment: https://git.openjdk.org/jdk/pull/17347#issuecomment-1887710723 From kvn at openjdk.org Thu Jan 11 18:22:22 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 11 Jan 2024 18:22:22 GMT Subject: RFR: JDK-8316756: C2 EA fails with "missing memory path" when encountering unsafe_arraycopy stub call In-Reply-To: References: Message-ID: On Wed, 10 Jan 2024 13:35:17 GMT, Tobias Holenstein wrote: > Before https://github.com/openjdk/jdk/pull/5259 the graph of the following program looked like this during Escape Analysis: > > > static int test() { > MyClass obj = new MyClass(); // Non-escaping to trigger Escape Analysis > UNSAFE.copyMemory(null, SRC_BASE, null, DST_BASE, 4); > obj.x = 42; > return obj.x; > } > > With MemBarCPUOrder: > working > > Setting the `RC_NARROW_MEM` flag in `LibraryCallKit::inline_unsafe_copyMemory()` removes the `428 MergeMem` node. Without a `MergeMem` node after `432 StoreB`, EA fails in Phase 2 of `ConnectionGraph::split_unique_types(...)` when trying to push the allocation's users on the appropriate worklist - `429 CallLeafNoFP` is not an expected user of `428 StoreB`. Therefore the assert `"EA: missing memory path"` is hit. 
> Without MemBarCPUOrder and after setting `RC_NARROW_MEM`: > failing > > > ### Proposed Fix > Dropping the `RC_NARROW_MEM` flag in `LibraryCallKit::inline_unsafe_copyMemory()` causes the introduction of a `MergeMem` between `StoreB` and `CallLeafNoFP`, so the corresponding code in EA doesn't encounter a `CallLeafNoFP` anymore: > fixed > > Testing: Tier1-4 passed The result of `unsafe_arraycopy` should not affect memory either. ------------- PR Comment: https://git.openjdk.org/jdk/pull/17347#issuecomment-1887716294 From duke at openjdk.org Thu Jan 11 19:17:24 2024 From: duke at openjdk.org (Joshua Cao) Date: Thu, 11 Jan 2024 19:17:24 GMT Subject: RFR: 8319850: PrintInlining should print which methods are late inlines [v2] In-Reply-To: <_0ZJL7u55Fcg1yID2yjH4DHPkrgKTKeekpYtWG1YsAI=.e9caec05-a88d-4123-832d-6699a1990e49@github.com> References: <42h7t16pyeYV2jszIztjGu0JE2ZZWnnJCiyRd2s2oLg=.fffb35a5-e208-442c-9157-ec5d3fcaa31d@github.com> <_0ZJL7u55Fcg1yID2yjH4DHPkrgKTKeekpYtWG1YsAI=.e9caec05-a88d-4123-832d-6699a1990e49@github.com> Message-ID: On Mon, 8 Jan 2024 14:55:19 GMT, Roland Westrelin wrote: > When `InlineTree::ok_to_inline()` is called, some diagnostic message is recorded for the call site. Do I understand right that with this patch, if the call is inlined late, then that message is dropped and replaced by a new "late inline.." message? If that's the case, isn't it the case that sometimes the `InlineTree::ok_to_inline()` has some useful information that's lost when late inlining happens? Yeah I think you're right. It should not matter for the string/methodhandle/vector/boxing late inlines. But we can lose information for generic late inlines, for example a hot method that could not get inlined earlier due to lack of budget. I'll look into it. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/16595#issuecomment-1887800021 From kvn at openjdk.org Thu Jan 11 19:29:27 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 11 Jan 2024 19:29:27 GMT Subject: [jdk22] RFR: 8321712: C2: "failed: Multiple uses of register" in C2_MacroAssembler::vminmax_fp In-Reply-To: References: Message-ID: On Thu, 11 Jan 2024 18:07:43 GMT, Sandhya Viswanathan wrote: > Clean backport of JDK-8321712 to JDK 22. Original [PR](https://github.com/openjdk/jdk/pull/17315) was reviewed by Vladimir Kozlov, Tobias Hartmann, Emanuel Peter, and Jatin Bhateja. Please review and approve. > > Best Regards, > Sandhya Good. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk22/pull/62#pullrequestreview-1816417137 From duke at openjdk.org Thu Jan 11 20:10:59 2024 From: duke at openjdk.org (Joshua Cao) Date: Thu, 11 Jan 2024 20:10:59 GMT Subject: RFR: 8323220: Reassociate loop invariants involved in Cmps and Add/Subs In-Reply-To: References: Message-ID: On Thu, 11 Jan 2024 17:41:53 GMT, Joshua Cao wrote: > // inv1 == (x + inv2) => ( inv1 - inv2 ) == x > // inv1 == (x - inv2) => ( inv1 + inv2 ) == x > // inv1 == (inv2 - x) => (-inv1 + inv2 ) == x > > > For example, > > > fn(inv1, inv2) > while(...) > x = foobar() > if inv1 == x + inv2 > blackhole() > > > We can transform this into > > > fn(inv1, inv2) > t = inv1 - inv2 > while(...) > x = foobar() > if t == x > blackhole() > > > Here is an example: https://github.com/openjdk/jdk/blob/b78896b9aafcb15f453eaed6e154a5461581407b/src/java.base/share/classes/java/lang/invoke/LambdaFormEditor.java#L910. LHS `1` and RHS `pos` are both loop invariant > > Passes tier1 locally on Linux machine. Passes GHA on my fork. 
test/hotspot/jtreg/compiler/loopopts/InvariantCodeMotionReassociateCmp.java line 191: > 189: @Arguments({Argument.NUMBER_42, Argument.NUMBER_42}) > 190: @IR(failOn = {IRNode.SUB_I}) > 191: public void leDontReassociate(int inv1, int inv2) { I added DontReassociate tests for `le`, `gt`, and `ge`. For `lt`, C2 generates a second `SUB_I` as part of other transformations. IR matching for ADD/SUB is pretty hard in general. They are commonly created as part of other transformations. Any suggestions on how I can test this better are appreciated. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17375#discussion_r1449334691 From kvn at openjdk.org Thu Jan 11 20:11:00 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 11 Jan 2024 20:11:00 GMT Subject: RFR: 8322996: BoxLockNode creation fails with assert(reg < CHUNK_SIZE) failed: sanity [v3] In-Reply-To: <66zey4Kt56pbcieRZ0AjBLemzwsAQcHCdDG1nlK-P1c=.14dc02d6-bef8-4239-9f39-02456756b7ea@github.com> References: <66zey4Kt56pbcieRZ0AjBLemzwsAQcHCdDG1nlK-P1c=.14dc02d6-bef8-4239-9f39-02456756b7ea@github.com> Message-ID: On Thu, 11 Jan 2024 16:50:52 GMT, Daniel Lundén wrote: >> This changeset fixes an issue where deeply nested synchronized statements triggered an assert in C2. >> >> Changes: >> - Bail out on compilation when we create a `BoxLockNode` with a slot index that cannot fit in a `RegMask`. This is similar to how we handle the case when we do not have space to represent arguments in [`opto/matcher.cpp`](https://github.com/openjdk/jdk/blob/58b01dce054c50bcb5a28aad4c1b574acaa90f6d/src/hotspot/share/opto/matcher.cpp#L314-L318) >> - Add a regression test. 
>> >> Testing: >> - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/7446998688) >> - tier1, tier2, tier3, tier4, and tier5 on windows-x64, linux-x64, linux-aarch64, macosx-x64, and macosx-aarch64 > > Daniel Lundén has updated the pull request incrementally with one additional commit since the last revision: > > Remove superfluous -TieredCompilation src/hotspot/share/opto/locknode.hpp line 66: > 64: return (int)reg < (int)(RegMask::CHUNK_SIZE - 1 - Compile::current()->sync_stack_slots()); > 65: } > 66: I think it should be in `regmask.hpp` together with other `can_represent_*` methods. Then you don't need part of the comment about those methods. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17370#discussion_r1449340301 From vlivanov at openjdk.org Thu Jan 11 22:51:18 2024 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Thu, 11 Jan 2024 22:51:18 GMT Subject: RFR: JDK-8316756: C2 EA fails with "missing memory path" when encountering unsafe_arraycopy stub call In-Reply-To: References: Message-ID: On Wed, 10 Jan 2024 13:35:17 GMT, Tobias Holenstein wrote: > Before https://github.com/openjdk/jdk/pull/5259 the graph of the following program looked like this during Escape Analysis: > > > static int test() { > MyClass obj = new MyClass(); // Non-escaping to trigger Escape Analysis > UNSAFE.copyMemory(null, SRC_BASE, null, DST_BASE, 4); > obj.x = 42; > return obj.x; > } > > With MemBarCPUOrder: > working > > Setting the `RC_NARROW_MEM` flag in `LibraryCallKit::inline_unsafe_copyMemory()` removes the `428 MergeMem` node. Without a `MergeMem` node after `432 StoreB`, EA fails in Phase 2 of `ConnectionGraph::split_unique_types(...)` when trying to push the allocation's users on the appropriate worklist - `429 CallLeafNoFP` is not an expected user of `428 StoreB`. Therefore the assert `"EA: missing memory path"` is hit. 
> Without MemBarCPUOrder and after setting `RC_NARROW_MEM`: > failing > > > ### Proposed Fix > Dropping the `RC_NARROW_MEM` flag in `LibraryCallKit::inline_unsafe_copyMemory()` causes the introduction of a `MergeMem` between `StoreB` and `CallLeafNoFP`, so the corresponding code in EA doesn't encounter a `CallLeafNoFP` anymore: > fixed > > Testing: Tier1-4 passed Yes, the proposed fix undoes some optimizations JDK-8269119 introduced: `RC_NARROW_MEM` was introduced to optimally represent memory effects of native-to-native memory copy. The whole off-heap memory state is tracked by a single raw memory slice, so it qualifies to be treated as operating on narrow memory. The IR shape as it is now looks fine. JVM models non-heap memory operations as raw accesses, but they are serialized on a single memory alias (raw memory). IMO the bug is in EA code which doesn't properly handle calls with narrow memory effects. ------------- PR Comment: https://git.openjdk.org/jdk/pull/17347#issuecomment-1888090765 From vlivanov at openjdk.org Thu Jan 11 23:34:18 2024 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Thu, 11 Jan 2024 23:34:18 GMT Subject: RFR: JDK-8316756: C2 EA fails with "missing memory path" when encountering unsafe_arraycopy stub call In-Reply-To: References: Message-ID: On Thu, 11 Jan 2024 17:59:59 GMT, Vladimir Kozlov wrote: > Flag setting (StoreB nodes) in JavaThread::_doing_unsafe_access is also not affected but it is volatile field and these stores should be staying where they are. They can't go up or down. In a private discussion with @tobiasholenstein, I proposed to move those stores into the corresponding stub. It would complicate the implementation a bit (platform-specific vs cross-platform implementation), but simplify things on IR level. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/17347#issuecomment-1888131977 From kvn at openjdk.org Thu Jan 11 23:43:18 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 11 Jan 2024 23:43:18 GMT Subject: RFR: JDK-8316756: C2 EA fails with "missing memory path" when encountering unsafe_arraycopy stub call In-Reply-To: References: Message-ID: On Thu, 11 Jan 2024 22:48:44 GMT, Vladimir Ivanov wrote: > all non-heap memory operations as raw accesses. Right. StoreB is also a RAW access. My previous comment is incorrect - StoreB can be memory for `unsafe_arraycopy` and as such it can preserve the order of execution. I agree with moving Stores into stub. C2 doesn't need to know about them. ------------- PR Comment: https://git.openjdk.org/jdk/pull/17347#issuecomment-1888138970 From fyang at openjdk.org Fri Jan 12 01:25:22 2024 From: fyang at openjdk.org (Fei Yang) Date: Fri, 12 Jan 2024 01:25:22 GMT Subject: RFR: 8323584: AArch64: Unnecessary ResourceMark in NativeCall::set_destination_mt_safe In-Reply-To: References: Message-ID: <0bmTMOxYMebuuZyS-Yxg31x_nxEESCvTCsI2twowt9w=.e7b46583-d9eb-4fa7-bc51-903dfe51e7c0@github.com> On Thu, 11 Jan 2024 12:17:21 GMT, Aleksey Shipilev wrote: > Was looking at `ICBuffer` cleaning paths that run at safepoint. In `NativeCall::set_destination_mt_safe`, there is a `ResourceMark` that does not seem to have any purpose: no code in its scope uses resource allocation. There is adjacent dead code too. Looks like a development/debugging leftover. > > Additional testing: > - [ ] Linux x86_64 AArch64 server fastdebug, `tier{1,2,3}` LGTM. ------------- Marked as reviewed by fyang (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/17372#pullrequestreview-1817120885 From thartmann at openjdk.org Fri Jan 12 08:44:19 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Fri, 12 Jan 2024 08:44:19 GMT Subject: [jdk22] RFR: 8321712: C2: "failed: Multiple uses of register" in C2_MacroAssembler::vminmax_fp In-Reply-To: References: Message-ID: <98CU5sgN8QGWIFJn5NYdo7c29T9WSstoqV17aed-sFU=.d6645440-5429-4a74-a30c-6a5b888fb648@github.com> On Thu, 11 Jan 2024 18:07:43 GMT, Sandhya Viswanathan wrote: > Clean backport of JDK-8321712 to JDK 22. Original [PR](https://github.com/openjdk/jdk/pull/17315) was reviewed by Vladimir Kozlov, Tobias Hartmann, Emanuel Peter, and Jatin Bhateja. Please review and approve. > > Best Regards, > Sandhya Looks good to me too. ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk22/pull/62#pullrequestreview-1817668195 From aph at openjdk.org Fri Jan 12 08:52:23 2024 From: aph at openjdk.org (Andrew Haley) Date: Fri, 12 Jan 2024 08:52:23 GMT Subject: RFR: 8323584: AArch64: Unnecessary ResourceMark in NativeCall::set_destination_mt_safe In-Reply-To: References: Message-ID: On Thu, 11 Jan 2024 12:17:21 GMT, Aleksey Shipilev wrote: > Was looking at `ICBuffer` cleaning paths that run at safepoint. In `NativeCall::set_destination_mt_safe`, there is a `ResourceMark` that does not seem to have any purpose: no code in its scope uses resource allocation. There is adjacent dead code too. Looks like a development/debugging leftover. > > Additional testing: > - [ ] Linux x86_64 AArch64 server fastdebug, `tier{1,2,3}` There are more of those. While they don't have much of an effect on runtime, it might be worth a cleanup pass to remove them in one go. ------------- Marked as reviewed by aph (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/17372#pullrequestreview-1817683863 From cslucas at openjdk.org Fri Jan 12 10:47:32 2024 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Fri, 12 Jan 2024 10:47:32 GMT Subject: Integrated: JDK-8323190: Segfault during deoptimization of C2-compiled code In-Reply-To: References: Message-ID: <_eqVXpIpEfN4cEN6TlHb8bgzICxvJga5i1Fz4c6AP9U=.1a779876-80d7-4ee8-a8eb-01ead5e03053@github.com> On Wed, 10 Jan 2024 01:22:37 GMT, Cesar Soares Lucas wrote: > Currently, if `ReduceAllocationMerges` reduces an allocation merge that is used as a monitor C2 will SIGFAULT in `Process_OopMap_Node` because it's missing code to handle that case. This patch fixes C2 to properly handle reduced allocation merges that are used as monitors. > > Tested with Linux x86_64 hotspot_all. This pull request has now been integrated. Changeset: ed182223 Author: Cesar Soares Lucas Committer: Tobias Hartmann URL: https://git.openjdk.org/jdk/commit/ed182223655feee5356d42a94dd74950e9595724 Stats: 92 lines in 2 files changed: 89 ins; 0 del; 3 mod 8323190: Segfault during deoptimization of C2-compiled code Reviewed-by: thartmann, chagedorn ------------- PR: https://git.openjdk.org/jdk/pull/17333 From thartmann at openjdk.org Fri Jan 12 10:52:03 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Fri, 12 Jan 2024 10:52:03 GMT Subject: [jdk22] RFR: 8323190: Segfault during deoptimization of C2-compiled code Message-ID: Hi all, This pull request contains a backport of commit [ed182223](https://github.com/openjdk/jdk/commit/ed182223655feee5356d42a94dd74950e9595724) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository. The commit being backported was authored by Cesar Soares Lucas on 12 Jan 2024 and was reviewed by Tobias Hartmann and Christian Hagedorn. Thanks! 
------------- Commit messages: - Backport ed182223655feee5356d42a94dd74950e9595724 Changes: https://git.openjdk.org/jdk22/pull/67/files Webrev: https://webrevs.openjdk.org/?repo=jdk22&pr=67&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8323190 Stats: 92 lines in 2 files changed: 89 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk22/pull/67.diff Fetch: git fetch https://git.openjdk.org/jdk22.git pull/67/head:pull/67 PR: https://git.openjdk.org/jdk22/pull/67 From rcastanedalo at openjdk.org Fri Jan 12 10:56:14 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Fri, 12 Jan 2024 10:56:14 GMT Subject: RFR: 8322692: ZGC: avoid over-unrolling due to hidden barrier size Message-ID: This changeset refines the C2 loop unrolling heuristic by including an estimation of the final size of (Generational) ZGC barriers in the loop size computation. These are not exposed in C2's intermediate representation and thus currently ignored by the heuristic, which can lead to over-unrolling. #### Testing - tier1-5, stress test, fuzzing (windows-x64, linux-x64, linux-aarch64, macosx-x64, macosx-aarch64). - tier6-9 (windows-x64, linux-x64, linux-aarch64, macosx-x64, macosx-aarch64, ZGC-specific tests only). #### Performance and code size evaluation - DaCapo, SPECjvm2008, SPECjbb2015 (linux-x64 with `-XX:+UseZGC -XX:+ZGenerational`). The changeset reduces slightly the size of the C2-generated code (around 0.5% fewer bytes per compiled bytecode for the `fop` and `luindex` DaCapo benchmarks) and has no overall significant performance effect.
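To make the over-unrolling argument concrete, here is a toy model of a size-based unrolling check. It is only a sketch under stated assumptions, not HotSpot code: the class name, method names, node counts, and the per-barrier size are all hypothetical, and the real heuristic in C2 is considerably more involved. It shows how a loop body that appears to fit a node budget after unrolling can exceed it once nodes hidden in late-expanded barriers are counted.

```java
// Toy model only: every name and number below is hypothetical.
final class UnrollBudgetSketch {

    // Body size as the heuristic would see it if each barrier contributed
    // an estimated number of extra nodes that are not visible in the IR.
    static int estimatedBodySize(int visibleNodes, int barrierCount, int nodesPerBarrier) {
        return visibleNodes + barrierCount * nodesPerBarrier;
    }

    // A simple budget check: the unrolled body must stay within the node limit.
    static boolean fitsAfterUnrolling(int bodySize, int unrollFactor, int nodeLimit) {
        return (long) bodySize * unrollFactor <= nodeLimit;
    }

    public static void main(String[] args) {
        int visibleNodes = 40;   // nodes visible in the intermediate representation
        int barriers = 4;        // barriers in the loop body
        int perBarrier = 20;     // assumed expansion size of one barrier
        int limit = 250;         // assumed node budget
        int unrollFactor = 4;

        // Ignoring barriers: 40 * 4 = 160 nodes, within the budget.
        System.out.println(fitsAfterUnrolling(visibleNodes, unrollFactor, limit));

        // Counting barriers: (40 + 4 * 20) * 4 = 480 nodes, over the budget.
        int withBarriers = estimatedBodySize(visibleNodes, barriers, perBarrier);
        System.out.println(fitsAfterUnrolling(withBarriers, unrollFactor, limit));
    }
}
```

In this model the first check accepts the unrolling and the second rejects it, which is the kind of decision change the changeset description is after.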
------------- Commit messages: - Take into account late barrier size estimation in C2 unrolling heuristics Changes: https://git.openjdk.org/jdk/pull/17367/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=17367&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8322692 Stats: 110 lines in 7 files changed: 109 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/17367.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17367/head:pull/17367 PR: https://git.openjdk.org/jdk/pull/17367 From qamai at openjdk.org Fri Jan 12 10:56:17 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Fri, 12 Jan 2024 10:56:17 GMT Subject: RFR: 8322692: ZGC: avoid over-unrolling due to hidden barrier size In-Reply-To: References: Message-ID: On Thu, 11 Jan 2024 08:47:41 GMT, Roberto Castañeda Lozano wrote: > This changeset refines the C2 loop unrolling heuristic by including an estimation of the final size of (Generational) ZGC barriers in the loop size computation. These are not exposed in C2's intermediate representation and thus currently ignored by the heuristic, which can lead to over-unrolling. > > #### Testing > > - tier1-5, stress test, fuzzing (windows-x64, linux-x64, linux-aarch64, macosx-x64, macosx-aarch64). > - tier6-9 (windows-x64, linux-x64, linux-aarch64, macosx-x64, macosx-aarch64, ZGC-specific tests only). > > #### Performance and code size evaluation > > - DaCapo, SPECjvm2008, SPECjbb2015 (linux-x64 with `-XX:+UseZGC -XX:+ZGenerational`). The changeset reduces slightly the size of the C2-generated code (around 0.5% fewer bytes per compiled bytecode for the `fop` and `luindex` DaCapo benchmarks) and has no overall significant performance effect. src/hotspot/share/gc/z/c2/zBarrierSetC2.cpp line 334: > 332: // seven more nodes (CallLeaf, control Proj, memory Proj, data Proj, Region, > 333: // memory Phi, data Phi). > 334: return uncolor_or_color_size + 12; I thought the runtime call does not lie inside the loop.
Is it necessary to take them into account, too? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17367#discussion_r1448602659 From rcastanedalo at openjdk.org Fri Jan 12 10:56:19 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Fri, 12 Jan 2024 10:56:19 GMT Subject: RFR: 8322692: ZGC: avoid over-unrolling due to hidden barrier size In-Reply-To: References: Message-ID: On Thu, 11 Jan 2024 10:06:54 GMT, Quan Anh Mai wrote: >> This changeset refines the C2 loop unrolling heuristic by including an estimation of the final size of (Generational) ZGC barriers in the loop size computation. These are not exposed in C2's intermediate representation and thus currently ignored by the heuristic, which can lead to over-unrolling. >> >> #### Testing >> >> - tier1-5, stress test, fuzzing (windows-x64, linux-x64, linux-aarch64, macosx-x64, macosx-aarch64). >> - tier6-9 (windows-x64, linux-x64, linux-aarch64, macosx-x64, macosx-aarch64, ZGC-specific tests only). >> >> #### Performance and code size evaluation >> >> - DaCapo, SPECjvm2008, SPECjbb2015 (linux-x64 with `-XX:+UseZGC -XX:+ZGenerational`). The changeset reduces slightly the size of the C2-generated code (around 0.5% fewer bytes per compiled bytecode for the `fop` and `luindex` DaCapo benchmarks) and has no overall significant performance effect. > > src/hotspot/share/gc/z/c2/zBarrierSetC2.cpp line 334: > >> 332: // seven more nodes (CallLeaf, control Proj, memory Proj, data Proj, Region, >> 333: // memory Phi, data Phi). >> 334: return uncolor_or_color_size + 12; > > I thought the runtime call does not lie inside the loop. Is it necessary to take them into account, too? Conceptually, the runtime call belongs to the loop, even if it is laid out in the cold section of the method. The current unrolling heuristic counts all basic blocks in the loop, regardless of whether they are hot or cold and how they are arranged in the final code.
This changeset does the same for consistency. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17367#discussion_r1448818759 From chagedorn at openjdk.org Fri Jan 12 11:02:38 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 12 Jan 2024 11:02:38 GMT Subject: RFR: 8323101: C2: assert(n->in(0) == nullptr) failed: divisions with zero check should already have bailed out earlier in split-if Message-ID: <7RQTgz8ZyoAIk6gpdZgDCFsqIBA_WK6IJ6OkHFLg_ts=.da8bb3a9-674c-4b2e-863f-0b2afcb37f34@github.com> The assertion added by [JDK-8299259](https://bugs.openjdk.org/browse/JDK-8299259) is wrong. I originally assumed that at this point, there are no longer any pinned `Div/Mod` nodes that we might want to split through a phi, because we bail out earlier here: https://github.com/openjdk/jdk/blob/3e19bf88d5b51fe10c183f930b99bce961a368c1/src/hotspot/share/opto/loopopts.cpp#L1137-L1140 The assumption was that a `Div/Mod` node with a control input to the zero-check `IfProj` could only have a `Phi` input with a `Region` that is further up in the graph. This is normally true. However, in `testIntDiv()`, we split an `If` with `do_split_if()` and need to empty the basic block. We split the store `iFld = sub` up. This includes the `StoreI` as well as the `AddI`, which also has the `Region` as its current `ctrl` (i.e. returned with `get_ctrl()`). The `AddI` also has the `DivI` as output which, however, is not pushed up since it's not part of the same basic block. We end up with the following graph after the completion of `do_split_if()`: ![image](https://github.com/openjdk/jdk/assets/17833009/b76293e1-a593-4319-b026-254be3c098fc) The `DivI` now has `252 Phi` as input, which merges the split `AddI` nodes. `246 Region` of the `252 Phi` is further down than the control input `83 IfTrue` of the `DivI`. This is rather unusual and thus was missed when the assert was added in JDK-8299259.
When finally processing the `DivI` node in the DFS walk of Split-If, we fail with the assertion. The fix is straightforward: turn this assert into a simple bailout. We should not split a `Div/Mod` node that is pinned (i.e. has a zero check). While working on this bug, I also tried to trigger the assert with `DivL/ModL` nodes. However, this did not work because `split_up()` does not split the `Add` node up. The reason is that we set late ctrl to early ctrl for the `DivL/ModL` node (and thus also set the same late ctrl for the `Add` node) while we do not do that for `DivI/ModI` nodes. It seems that we fail to treat `DivL/ModL` nodes as unpinned here, which would allow us to set a later ctrl: https://github.com/openjdk/jdk/blob/82a63a03c0155288e8e43b9f766c8be70be50b6a/src/hotspot/share/opto/loopnode.cpp#L6091-L6101 I've done some digging and found that `DivL/ModL` nodes were added after this switch statement. So, I assume we simply forgot to also treat them as unpinned here. It's not wrong, but I think it is just an unnecessary limitation. I filed [JDK-8323652](https://bugs.openjdk.org/browse/JDK-8323652) to fix this. Either way, I left the code for the long cases in even though they do not trigger. They should trigger once JDK-8323652 is fixed.
Thanks, Christian ------------- Commit messages: - 8323101: C2: assert(n->in(0) == nullptr) failed: divisions with zero check should already have bailed out earlier in split-if Changes: https://git.openjdk.org/jdk/pull/17394/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=17394&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8323101 Stats: 215 lines in 2 files changed: 214 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/17394.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17394/head:pull/17394 PR: https://git.openjdk.org/jdk/pull/17394 From epeter at openjdk.org Fri Jan 12 11:45:20 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 12 Jan 2024 11:45:20 GMT Subject: [jdk22] RFR: 8323190: Segfault during deoptimization of C2-compiled code In-Reply-To: References: Message-ID: On Fri, 12 Jan 2024 10:45:11 GMT, Tobias Hartmann wrote: > Hi all, > > This pull request contains a backport of commit [ed182223](https://github.com/openjdk/jdk/commit/ed182223655feee5356d42a94dd74950e9595724) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository. > > The commit being backported was authored by Cesar Soares Lucas on 12 Jan 2024 and was reviewed by Tobias Hartmann and Christian Hagedorn. > > Thanks! LGTM ------------- Marked as reviewed by epeter (Reviewer). PR Review: https://git.openjdk.org/jdk22/pull/67#pullrequestreview-1818008006 From duke at openjdk.org Fri Jan 12 11:49:19 2024 From: duke at openjdk.org (Yude Lin) Date: Fri, 12 Jan 2024 11:49:19 GMT Subject: RFR: 8323122: AArch64: Increase itable stub size estimate In-Reply-To: References: Message-ID: On Wed, 10 Jan 2024 02:03:08 GMT, Yude Lin wrote: > Since [JDK-8307352](https://bugs.openjdk.org/browse/JDK-8307352), itable stub size has grown by 20 bytes on linux-aarch64. In particular, the "slop-counted" code increases from 100->120 bytes, where the current estimate is 124 bytes. I haven't found a case where it exceeds the estimate. 
For now this size is stable across the few linux-aarch64 configurations I ran with. It doesn't vary (for example) by different klass decoding schemes. But I think the idea of the estimate is that we can never know. I propose we increase the estimate to be safe. > > Passed hotspot/jtreg/:tier1 Can I get a review on this small patch please : ) ------------- PR Comment: https://git.openjdk.org/jdk/pull/17336#issuecomment-1888950779 From thartmann at openjdk.org Fri Jan 12 11:49:22 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Fri, 12 Jan 2024 11:49:22 GMT Subject: [jdk22] RFR: 8323190: Segfault during deoptimization of C2-compiled code In-Reply-To: References: Message-ID: <7KK3ymYR5c1IVbrS7xj8QjkFiDO7MdWST5mETMckagg=.eb430ce6-5528-4905-b3ce-6bfba08e552c@github.com> On Fri, 12 Jan 2024 10:45:11 GMT, Tobias Hartmann wrote: > Hi all, > > This pull request contains a backport of commit [ed182223](https://github.com/openjdk/jdk/commit/ed182223655feee5356d42a94dd74950e9595724) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository. > > The commit being backported was authored by Cesar Soares Lucas on 12 Jan 2024 and was reviewed by Tobias Hartmann and Christian Hagedorn. > > Thanks! Thanks, Emanuel! ------------- PR Comment: https://git.openjdk.org/jdk22/pull/67#issuecomment-1888951022 From shade at openjdk.org Fri Jan 12 11:59:19 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Fri, 12 Jan 2024 11:59:19 GMT Subject: RFR: 8323584: AArch64: Unnecessary ResourceMark in NativeCall::set_destination_mt_safe In-Reply-To: References: Message-ID: On Fri, 12 Jan 2024 08:49:27 GMT, Andrew Haley wrote: > There are more of those. While they don't have much of an effect on runtime, it might be worth a cleanup pass to remove them in one go. Right. I would prefer to remove `ResourceMark`-s one by one, though, because one needs to go through all callees in the scope to actually verify the absence of resource allocations. 
I would not trust testing to find missing RMs reliably, especially when obscure uses are hiding on some branches. ------------- PR Comment: https://git.openjdk.org/jdk/pull/17372#issuecomment-1888982565 From thartmann at openjdk.org Fri Jan 12 12:45:24 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Fri, 12 Jan 2024 12:45:24 GMT Subject: [jdk22] Integrated: 8323190: Segfault during deoptimization of C2-compiled code In-Reply-To: References: Message-ID: On Fri, 12 Jan 2024 10:45:11 GMT, Tobias Hartmann wrote: > Hi all, > > This pull request contains a backport of commit [ed182223](https://github.com/openjdk/jdk/commit/ed182223655feee5356d42a94dd74950e9595724) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository. > > The commit being backported was authored by Cesar Soares Lucas on 12 Jan 2024 and was reviewed by Tobias Hartmann and Christian Hagedorn. > > Thanks! This pull request has now been integrated. Changeset: d115295d Author: Tobias Hartmann URL: https://git.openjdk.org/jdk22/commit/d115295df8ccfec8670878ab5a7dc8d8661025d9 Stats: 92 lines in 2 files changed: 89 ins; 0 del; 3 mod 8323190: Segfault during deoptimization of C2-compiled code Reviewed-by: epeter Backport-of: ed182223655feee5356d42a94dd74950e9595724 ------------- PR: https://git.openjdk.org/jdk22/pull/67 From rcastanedalo at openjdk.org Fri Jan 12 14:04:22 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Fri, 12 Jan 2024 14:04:22 GMT Subject: RFR: 8322692: ZGC: avoid over-unrolling due to hidden barrier size In-Reply-To: References: Message-ID: On Thu, 11 Jan 2024 08:47:41 GMT, Roberto Castañeda Lozano wrote: > This changeset refines the C2 loop unrolling heuristic by including an estimation of the final size of (Generational) ZGC barriers in the loop size computation. These are not exposed in C2's intermediate representation and thus currently ignored by the heuristic, which can lead to over-unrolling.
> > #### Testing > > - tier1-5, stress test, fuzzing (windows-x64, linux-x64, linux-aarch64, macosx-x64, macosx-aarch64). > - tier6-9 (windows-x64, linux-x64, linux-aarch64, macosx-x64, macosx-aarch64, ZGC-specific tests only). > > #### Performance and code size evaluation > > - DaCapo, SPECjvm2008, SPECjbb2015 (linux-x64 with `-XX:+UseZGC -XX:+ZGenerational`). The changeset reduces slightly the size of the C2-generated code (around 0.5% fewer bytes per compiled bytecode for the `fop` and `luindex` DaCapo benchmarks) and has no overall significant performance effect. Switching back to draft mode to address some offline comments from Erik Österlund. ------------- PR Comment: https://git.openjdk.org/jdk/pull/17367#issuecomment-1889288008 From dnsimon at openjdk.org Fri Jan 12 14:34:49 2024 From: dnsimon at openjdk.org (Doug Simon) Date: Fri, 12 Jan 2024 14:34:49 GMT Subject: RFR: 8323616: [JVMCI] TestInvalidJVMCIOption.java fails intermittently with NPE Message-ID: <8KAT1E7MkSLCl1JrnEBjDNvgmaCn_ryBbGVMtY0uiTQ=.e6bc008c-5338-4361-9042-20f1ba5e65cd@github.com> This PR removes an assertion from `TestInvalidJVMCIOption` that can fail intermittently due to a race between JIT initialization and runtime class initialization. The only thing the test should guarantee is that an invalid option is detected and results in a VM exit.
------------- Commit messages: - remove racy (and unnecessary) assertion in TestInvalidJVMCIOption Changes: https://git.openjdk.org/jdk/pull/17397/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=17397&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8323616 Stats: 1 line in 1 file changed: 0 ins; 1 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/17397.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17397/head:pull/17397 PR: https://git.openjdk.org/jdk/pull/17397 From thartmann at openjdk.org Fri Jan 12 14:38:25 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Fri, 12 Jan 2024 14:38:25 GMT Subject: RFR: 8323616: [JVMCI] TestInvalidJVMCIOption.java fails intermittently with NPE In-Reply-To: <8KAT1E7MkSLCl1JrnEBjDNvgmaCn_ryBbGVMtY0uiTQ=.e6bc008c-5338-4361-9042-20f1ba5e65cd@github.com> References: <8KAT1E7MkSLCl1JrnEBjDNvgmaCn_ryBbGVMtY0uiTQ=.e6bc008c-5338-4361-9042-20f1ba5e65cd@github.com> Message-ID: On Fri, 12 Jan 2024 14:25:29 GMT, Doug Simon wrote: > This PR removes an assertion from `TestInvalidJVMCIOption` that can fail intermittently due to a race between JIT initialization and runtime class initialization. > The only thing the test should guarantee is that an invalid option is detected and results in a VM exit. Looks good and trivial. ------------- Marked as reviewed by thartmann (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/17397#pullrequestreview-1818397891 From thartmann at openjdk.org Fri Jan 12 14:45:29 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Fri, 12 Jan 2024 14:45:29 GMT Subject: RFR: 8322077: Add Ideal transformation: (~a) | (~b) => ~(a & b) [v13] In-Reply-To: References: Message-ID: On Wed, 10 Jan 2024 16:57:47 GMT, Zhiqiang Zang wrote: >> Hello, >> >> (~a) | (~b) => ~(a & b) is a widely seen pattern, for example it is implemented for LLVM [here](https://github.com/llvm/llvm-project/blob/397f1ce9efb4eea1ee10fe4833f733b8c7abd878/llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp#L1617C28-L1617C28); however it is missing in current implementation of hotspot. This pull request adds this transformation and associated tests. >> >> Thanks. > > Zhiqiang Zang has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 34 commits: > > - Merge branch 'master' of https://github.com/openjdk/jdk into ornode-PNewDeMorganLawOrToAnd > - update copyright dates. > - Merge branch 'andnode-PNewDeMorganLawAndToOr' into ornode-PNewDeMorganLawOrToAnd > - Revert "move the two helper functions to member functions of the node class." > > This reverts commit 7a962d69ac687c0476e54cd004037f2ebb0800ba. > - Revert "update copyright dates." > > This reverts commit 3665de2f3d38b4fce3ba968ec6247dd2576ecc32. > - Revert "move make_not back to AddNode because it cannot compile for architecture arm, s390x, ppc64le." > > This reverts commit 4ee8b089b6f119b1692dcd2c512d249d92734697. > - Revert "adapt changes from the dependent pr." > > This reverts commit 0c8d10776fe55a28ea9f60160b4f6f57aabad5ab. > - Revert "adapt to new changes from the dependant pr." > > This reverts commit b21e242b306d9cf01e1ba84ebba47ef5b471d98e. > - adapt to new changes from the dependant pr. > - Merge branch 'andnode-PNewDeMorganLawAndToOr' into ornode-PNewDeMorganLawOrToAnd > - ... 
and 24 more: https://git.openjdk.org/jdk/compare/b86c3b7a...f908668b All tests passed, this is good to go. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16334#issuecomment-1889393797 From thartmann at openjdk.org Fri Jan 12 14:46:21 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Fri, 12 Jan 2024 14:46:21 GMT Subject: RFR: 8320237 C2: late inlining of method handle invoke causes duplicate lines in PrintInlining output [v2] In-Reply-To: References: Message-ID: On Wed, 10 Jan 2024 16:37:44 GMT, Kangcheng Xu wrote: >> This PR resolves [JDK-8320237](https://bugs.openjdk.org/browse/JDK-8320237) >> >> The original behavior produces both a failure and success message upon late inlining which is confusing. The patch removes the failure message if inlining was successful. Huge thanks to @rwestrel for reporting and working out a solution. >> >> Unit test `test/hotspot/jtreg/compiler/inlining/TestDuplicatedLateInliningOutput.java` is added and passing. > > Kangcheng Xu has updated the pull request incrementally with one additional commit since the last revision: > > fix VM crashes Thanks, I'll re-run testing. Could you please explain what the problem was? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/17147#issuecomment-1889397470 From dlunden at openjdk.org Fri Jan 12 15:09:20 2024 From: dlunden at openjdk.org (Daniel Lundén) Date: Fri, 12 Jan 2024 15:09:20 GMT Subject: RFR: 8322996: BoxLockNode creation fails with assert(reg < CHUNK_SIZE) failed: sanity [v3] In-Reply-To: References: <66zey4Kt56pbcieRZ0AjBLemzwsAQcHCdDG1nlK-P1c=.14dc02d6-bef8-4239-9f39-02456756b7ea@github.com> Message-ID: On Thu, 11 Jan 2024 20:05:26 GMT, Vladimir Kozlov wrote: >> Daniel Lundén has updated the pull request incrementally with one additional commit since the last revision: >> >> Remove superfluous -TieredCompilation > > src/hotspot/share/opto/locknode.hpp line 66: > >> 64: return (int)reg < (int)(RegMask::CHUNK_SIZE - 1 - Compile::current()->sync_stack_slots()); >> 65: } >> 66: > > I think it should be in `regmask.hpp` together with other `can_represent_*` methods. Then you don't need part of the comment about those methods. Thanks @vnkozlov. Do you know if we can directly use `can_represent` instead, and not take `sync_stack_slots()` into account? The field `_inmask` in `BoxLockNode` seems to only specify a single register (one bit in the mask). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17370#discussion_r1450578513 From ddong at openjdk.org Fri Jan 12 15:24:30 2024 From: ddong at openjdk.org (Denghui Dong) Date: Fri, 12 Jan 2024 15:24:30 GMT Subject: RFR: 8322735: C2: minor improvements of bubble sort used in SuperWord::packset_sort [v3] In-Reply-To: References: Message-ID: On Tue, 9 Jan 2024 05:26:43 GMT, Denghui Dong wrote: >> A minor improvement could be made for bubble sort in SuperWord::packset_sort to reduce the comparison count in bad cases. >> >> See https://en.wikipedia.org/wiki/Bubble_sort > > Denghui Dong has updated the pull request incrementally with one additional commit since the last revision: > > update Thanks for the review.
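The refinement mentioned in the quoted description can be illustrated with a small standalone sketch. This is an assumption about the kind of change meant, based on the linked article (remember where the last swap happened and end the next pass there); it is not the code in `SuperWord::packset_sort`, and the class and method names are made up for the example. The method returns the number of comparisons so the saving is visible.

```java
import java.util.Arrays;

// Illustrative only: a generic bubble sort, not the HotSpot implementation.
final class BubbleSortSketch {

    // Sorts the array in ascending order and returns the number of comparisons made.
    static int sort(int[] a) {
        int comparisons = 0;
        int n = a.length;
        while (n > 1) {
            int lastSwap = 0;
            for (int i = 1; i < n; i++) {
                comparisons++;
                if (a[i - 1] > a[i]) {
                    int tmp = a[i - 1];
                    a[i - 1] = a[i];
                    a[i] = tmp;
                    lastSwap = i;
                }
            }
            // Everything at or beyond the last swap position is already in
            // its final place, so the next pass can stop there. If no swap
            // happened, lastSwap is 0 and the loop terminates.
            n = lastSwap;
        }
        return comparisons;
    }

    public static void main(String[] args) {
        int[] sorted = {1, 2, 3, 4};
        int[] reversed = {4, 3, 2, 1};
        System.out.println(sort(sorted) + " comparisons for " + Arrays.toString(sorted));
        System.out.println(sort(reversed) + " comparisons for " + Arrays.toString(reversed));
    }
}
```

For an already sorted input of four elements this variant stops after a single pass of three comparisons, while a reversed input still needs all six.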
------------- PR Comment: https://git.openjdk.org/jdk/pull/17190#issuecomment-1889494658 From ddong at openjdk.org Fri Jan 12 15:24:32 2024 From: ddong at openjdk.org (Denghui Dong) Date: Fri, 12 Jan 2024 15:24:32 GMT Subject: Integrated: 8322735: C2: minor improvements of bubble sort used in SuperWord::packset_sort In-Reply-To: References: Message-ID: On Mon, 25 Dec 2023 15:34:55 GMT, Denghui Dong wrote: > A minor improvement could be made for bubble sort in SuperWord::packset_sort to reduce the comparison count in bad cases. > > See https://en.wikipedia.org/wiki/Bubble_sort This pull request has now been integrated. Changeset: c5e72450 Author: Denghui Dong URL: https://git.openjdk.org/jdk/commit/c5e72450966ad50d57a8d22e9d634bfcb319aee9 Stats: 7 lines in 1 file changed: 0 ins; 2 del; 5 mod 8322735: C2: minor improvements of bubble sort used in SuperWord::packset_sort Reviewed-by: epeter, kvn ------------- PR: https://git.openjdk.org/jdk/pull/17190 From roland at openjdk.org Fri Jan 12 15:31:23 2024 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 12 Jan 2024 15:31:23 GMT Subject: RFR: 8320237 C2: late inlining of method handle invoke causes duplicate lines in PrintInlining output [v2] In-Reply-To: References: Message-ID: On Fri, 12 Jan 2024 14:43:39 GMT, Tobias Hartmann wrote: > Could you please explain what the problem was? I did the fix. For virtual and method handle inline calls, late inlining happens because we couldn't resolve the call to a single target before and the late inlining logic goes through the inlining heuristics again to find one. As a result, a new inlining message is produced. For other type of calls, the call is known to successfully inline but was delayed due to lack of nodes. When late inlining succeeds then, it's because the graph has shrunk enough but there's no new inlining message. The other thing is that the print inlining logic is conditioned on PrintInlining and PrintIntrinsics. 
When PrintIntrinsics only is true, for method handle and virtual calls, during late inlining no new inlining message is produced. ------------- PR Comment: https://git.openjdk.org/jdk/pull/17147#issuecomment-1889510391 From tholenstein at openjdk.org Fri Jan 12 15:47:21 2024 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Fri, 12 Jan 2024 15:47:21 GMT Subject: RFR: JDK-8316756: C2 EA fails with "missing memory path" when encountering unsafe_arraycopy stub call In-Reply-To: References: Message-ID: On Thu, 11 Jan 2024 23:31:15 GMT, Vladimir Ivanov wrote: > > Flag setting (StoreB nodes) in JavaThread::_doing_unsafe_access is also not affected but it is volatile field and these stores should be staying where they are. They can't go up or down. > > In a private discussion with @tobiasholenstein, I proposed to move those stores into the corresponding stub. It would complicate the implementation a bit (platform-specific vs cross-platform implementation), but simplify things on IR level. So you think we should go for that solution instead of this fix? ------------- PR Comment: https://git.openjdk.org/jdk/pull/17347#issuecomment-1889534817 From epeter at openjdk.org Fri Jan 12 16:20:28 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 12 Jan 2024 16:20:28 GMT Subject: RFR: 8323641: Test compiler/loopopts/superword/TestAlignVectorFuzzer.java timed out Message-ID: It seems that allowing `90%` of the timeout-time was cutting it too close. Some individual tests can take more time occasionally, one even took more than `80 sec` (very rare). Why do these tests take so long? - Sometimes compilation can take quite a bit of time. `-XX:LoopUnrollLimit=250` already increases the number of nodes allowed in a loop-body, and with unrolling this increases significantly. SuperWord then has to work through all these nodes, and has a lot of quadratic and higher complexity loops. 
- The loop bodies are quite large (hand unrolled), and often lead to partial vectorization, with lots of scalar memory ops. This produces quite sub-optimal code, with a bit too many instructions in the loop body. Combined with lots of array copying, this probably takes quite a hit on caches. I will investigate SuperWord compilation time in the future, and lower the runtime complexity if necessary/possible; this is part of my autovectorization plans. I have now lowered the allowance to `40%`, which is hopefully small enough to avoid timeouts, while still allowing sufficiently many runs to get decent test coverage. ------------- Commit messages: - reduce allowance even more, and fix typos - 8323641 Changes: https://git.openjdk.org/jdk/pull/17389/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=17389&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8323641 Stats: 7 lines in 1 file changed: 2 ins; 0 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/17389.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17389/head:pull/17389 PR: https://git.openjdk.org/jdk/pull/17389 From chagedorn at openjdk.org Fri Jan 12 16:33:21 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 12 Jan 2024 16:33:21 GMT Subject: RFR: 8323641: Test compiler/loopopts/superword/TestAlignVectorFuzzer.java timed out In-Reply-To: References: Message-ID: <5OAn4KSIJ8gtoBjvYcx8M71lSHs_zzImW0MMIhX0ZOE=.6fa3b25d-943d-4068-8edc-f9e77e616f83@github.com> On Fri, 12 Jan 2024 08:22:54 GMT, Emanuel Peter wrote: > It seems that allowing `90%` of the timeout-time was cutting it too close. Some individual tests can take more time occasionally, one even took more than `80 sec` (very rare). > > Why do these tests take so long? > - Sometimes compilation can take quite a bit of time. `-XX:LoopUnrollLimit=250` already increases the number of nodes allowed in a loop-body, and with unrolling this increases significantly.
SuperWord then has to work through all these nodes, and has a lot of quadratic and higher complexity loops. > - The loop bodies are quite large (hand unrolled), and often lead to partial vectorization, with lots of scalar memory ops. This produces quite sub-optimal code, with a bit too many instructions in the loop body. Combined with lots of array copying, this probably takes quite a hit on caches. > > I will investigate SuperWord compilation time in the future, and lower the runtime complexity if necessary/possible; this is part of my autovectorization plans. > > I have now lowered the allowance to `40%`, which is hopefully small enough to avoid timeouts, while still allowing sufficiently many runs to get decent test coverage. That looks reasonable. test/hotspot/jtreg/compiler/loopopts/superword/TestAlignVectorFuzzer.java line 565: > 563: 20_000; > 564: System.out.println("Time Allowance: " + test_time_allowance_diff); > 565: long test_time_allowance = System.currentTimeMillis() + test_time_allowance_diff; Nit: You should use camelCase for local Java variables. ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/17389#pullrequestreview-1818631559 PR Review Comment: https://git.openjdk.org/jdk/pull/17389#discussion_r1450676162 From duke at openjdk.org Fri Jan 12 16:47:26 2024 From: duke at openjdk.org (Zhiqiang Zang) Date: Fri, 12 Jan 2024 16:47:26 GMT Subject: RFR: 8322077: Add Ideal transformation: (~a) | (~b) => ~(a & b) [v13] In-Reply-To: <_b5ODSBa8YhAf5i7hafehvmw40MAdi4z5yF0EicXBUE=.bb562b91-9751-4dab-a487-0e9961b1f199@github.com> References: <_b5ODSBa8YhAf5i7hafehvmw40MAdi4z5yF0EicXBUE=.bb562b91-9751-4dab-a487-0e9961b1f199@github.com> Message-ID: On Thu, 11 Jan 2024 08:23:02 GMT, Emanuel Peter wrote: >> Zhiqiang Zang has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains 34 commits: >> >> - Merge branch 'master' of https://github.com/openjdk/jdk into ornode-PNewDeMorganLawOrToAnd >> - update copyright dates. >> - Merge branch 'andnode-PNewDeMorganLawAndToOr' into ornode-PNewDeMorganLawOrToAnd >> - Revert "move the two helper functions to member functions of the node class." >> >> This reverts commit 7a962d69ac687c0476e54cd004037f2ebb0800ba. >> - Revert "update copyright dates." >> >> This reverts commit 3665de2f3d38b4fce3ba968ec6247dd2576ecc32. >> - Revert "move make_not back to AddNode because it cannot compile for architecture arm, s390x, ppc64le." >> >> This reverts commit 4ee8b089b6f119b1692dcd2c512d249d92734697. >> - Revert "adapt changes from the dependent pr." >> >> This reverts commit 0c8d10776fe55a28ea9f60160b4f6f57aabad5ab. >> - Revert "adapt to new changes from the dependant pr." >> >> This reverts commit b21e242b306d9cf01e1ba84ebba47ef5b471d98e. >> - adapt to new changes from the dependant pr. >> - Merge branch 'andnode-PNewDeMorganLawAndToOr' into ornode-PNewDeMorganLawOrToAnd >> - ... and 24 more: https://git.openjdk.org/jdk/compare/b86c3b7a...f908668b > > Testing running for commit 34... @eme64 @TobiHartmann Thanks for testing! Can you sponsor it when you get a chance thank you! ------------- PR Comment: https://git.openjdk.org/jdk/pull/16334#issuecomment-1889628035 From sviswanathan at openjdk.org Fri Jan 12 17:02:25 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Fri, 12 Jan 2024 17:02:25 GMT Subject: [jdk22] RFR: 8321712: C2: "failed: Multiple uses of register" in C2_MacroAssembler::vminmax_fp In-Reply-To: References: Message-ID: On Thu, 11 Jan 2024 19:26:57 GMT, Vladimir Kozlov wrote: >> Clean backport of JDK-8321712 to JDK 22. Original [PR](https://github.com/openjdk/jdk/pull/17315) was reviewed by Vladimir Kozlov, Tobias Hartmann, Emanuel Peter, and Jatin Bhateja. Please review and approve. >> >> Best Regards, >> Sandhya > > Good. 
Thanks a lot @vnkozlov @TobiHartmann. ------------- PR Comment: https://git.openjdk.org/jdk22/pull/62#issuecomment-1889646764 From sviswanathan at openjdk.org Fri Jan 12 17:02:27 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Fri, 12 Jan 2024 17:02:27 GMT Subject: [jdk22] Integrated: 8321712: C2: "failed: Multiple uses of register" in C2_MacroAssembler::vminmax_fp In-Reply-To: References: Message-ID: On Thu, 11 Jan 2024 18:07:43 GMT, Sandhya Viswanathan wrote: > Clean backport of JDK-8321712 to JDK 22. Original [PR](https://github.com/openjdk/jdk/pull/17315) was reviewed by Vladimir Kozlov, Tobias Hartmann, Emanuel Peter, and Jatin Bhateja. Please review and approve. > > Best Regards, > Sandhya This pull request has now been integrated. Changeset: b0920c24 Author: Sandhya Viswanathan URL: https://git.openjdk.org/jdk22/commit/b0920c24cd83d85a846a60fe2d784a48dd8c9b52 Stats: 16 lines in 2 files changed: 14 ins; 0 del; 2 mod 8321712: C2: "failed: Multiple uses of register" in C2_MacroAssembler::vminmax_fp Reviewed-by: kvn, thartmann Backport-of: e10d14004fa25998231ab1d2611b75aea9b5c67d ------------- PR: https://git.openjdk.org/jdk22/pull/62 From kvn at openjdk.org Fri Jan 12 17:24:20 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 12 Jan 2024 17:24:20 GMT Subject: RFR: JDK-8316756: C2 EA fails with "missing memory path" when encountering unsafe_arraycopy stub call In-Reply-To: References: Message-ID: On Fri, 12 Jan 2024 15:44:21 GMT, Tobias Holenstein wrote: > > > Flag setting (StoreB nodes) in JavaThread::_doing_unsafe_access is also not affected but it is volatile field and these stores should be staying where they are. They can't go up or down. > > > > > > In a private discussion with @tobiasholenstein, I proposed to move those stores into the corresponding stub. It would complicate the implementation a bit (platform-specific vs cross-platform implementation), but simplify things on IR level. 
> > So you think we should go for that solution instead of this fix? Yes. You may still need to fix EA to recognize RAW memory for `unsafe_arraycopy`. ------------- PR Comment: https://git.openjdk.org/jdk/pull/17347#issuecomment-1889681521 From kvn at openjdk.org Fri Jan 12 19:01:19 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 12 Jan 2024 19:01:19 GMT Subject: RFR: JDK-8316756: C2 EA fails with "missing memory path" when encountering unsafe_arraycopy stub call In-Reply-To: References: Message-ID: On Wed, 10 Jan 2024 13:35:17 GMT, Tobias Holenstein wrote: > Before https://github.com/openjdk/jdk/pull/5259 the graph of the following program looked like this during Escape Analysis: > > > static int test() { > MyClass obj = new MyClass(); // Non-escaping to trigger Escape Analysis > UNSAFE.copyMemory(null, SRC_BASE, null, DST_BASE, 4); > obj.x = 42; > return obj.x; > } > > With MemBarCPUOrder: > working > > Setting `RC_NARROW_MEM` flag in `LibraryCallKit::inline_unsafe_copyMemory()` removes the `428 MergeMem` node. Without a `MergeMem` node after `432 StoreB` EA failes in Phase 2 of `ConnectionGraph::split_unique_types(...)` when trying to push the allocation's users on the appropriate worklist - `429 CallLeafNoFP` is not an expected user of `428 StoreB`. Therefore the assert `"EA: missing memory path"` is hit. > Without MemBarCPUOrdera and after setting `RC_NARROW_MEM`: > failing > > > ### Proposed Fix > Dropping the `RC_NARROW_MEM` flag in `LibraryCallKit::inline_unsafe_copyMemory()` causes the introduction of a `MergeMem` between `StoreB` and `CallLeafNoFP`, so the corresponding code in EA doesn't encounter a `CallLeafNoFP` anymore: > fixed > > Testing: Tier1-4 passed I think EA expect MergeMem node before all calls as memory edge. Even if we move StoreB inside stub we will have runtime call node. 
I made the test more complicated to see how it will react to such runtime calls:

    static int[] test() {
        int[] src = new int[4];
        int[] dst = new int[4];
        MyClass obj = new MyClass();
        UNSAFE.copyMemory(src, 0, dst, 0, 4);
        obj.x = 42;
        dst[1] = obj.x;
        return dst;
    }

and it hit the same assert:

    119 Proj === 117 [[ 15 654 640 644 178 178 178 178 ]] #2 Memory: @rawptr:BotPTR, idx=Raw; !jvms: UnsafeArrayCopy::test @ bci:8 (line 25)
    640 CallLeafNoFP === 181 1 119 8 1 (62 91 30 1 ) [[ 641 643 ]] # unsafe_arraycopy void ( NotNull *+bot, NotNull *+bot, long, half ) !jvms: Unsafe::copyMemory @ bci:29 (line 806) UnsafeArrayCopy::test @ bci:26 (line 26)

------------- PR Comment: https://git.openjdk.org/jdk/pull/17347#issuecomment-1889802171

From kvn at openjdk.org Fri Jan 12 19:31:20 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 12 Jan 2024 19:31:20 GMT Subject: RFR: JDK-8316756: C2 EA fails with "missing memory path" when encountering unsafe_arraycopy stub call In-Reply-To: References: Message-ID: <4Zj0P-En4qM_wpOK7Rss-WfbgUg-kC5JEYkelVlq0XA=.33510939-d757-46e6-89f1-13e44638d1c8@github.com>

On Wed, 10 Jan 2024 13:35:17 GMT, Tobias Holenstein wrote:

> Before https://github.com/openjdk/jdk/pull/5259 the graph of the following program looked like this during Escape Analysis:
>
> static int test() {
>     MyClass obj = new MyClass(); // Non-escaping to trigger Escape Analysis
>     UNSAFE.copyMemory(null, SRC_BASE, null, DST_BASE, 4);
>     obj.x = 42;
>     return obj.x;
> }
>
> With MemBarCPUOrder:
> working
>
> Setting `RC_NARROW_MEM` flag in `LibraryCallKit::inline_unsafe_copyMemory()` removes the `428 MergeMem` node. Without a `MergeMem` node after `432 StoreB` EA fails in Phase 2 of `ConnectionGraph::split_unique_types(...)` when trying to push the allocation's users on the appropriate worklist - `429 CallLeafNoFP` is not an expected user of `428 StoreB`. Therefore the assert `"EA: missing memory path"` is hit.
> Without MemBarCPUOrdera and after setting `RC_NARROW_MEM`: > failing > > > ### Proposed Fix > Dropping the `RC_NARROW_MEM` flag in `LibraryCallKit::inline_unsafe_copyMemory()` causes the introduction of a `MergeMem` between `StoreB` and `CallLeafNoFP`, so the corresponding code in EA doesn't encounter a `CallLeafNoFP` anymore: > fixed > > Testing: Tier1-4 passed I am investigating how to handle it in EA. I understand that it is not easy for you. ------------- PR Comment: https://git.openjdk.org/jdk/pull/17347#issuecomment-1889838084 From kvn at openjdk.org Fri Jan 12 20:00:18 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 12 Jan 2024 20:00:18 GMT Subject: RFR: JDK-8316756: C2 EA fails with "missing memory path" when encountering unsafe_arraycopy stub call In-Reply-To: References: Message-ID: On Fri, 12 Jan 2024 15:44:21 GMT, Tobias Holenstein wrote: >>> Flag setting (StoreB nodes) in JavaThread::_doing_unsafe_access is also not affected but it is volatile field and these stores should be staying where they are. They can't go up or down. >> >> In a private discussion with @tobiasholenstein, I proposed to move those stores into the corresponding stub. It would complicate the implementation a bit (platform-specific vs cross-platform implementation), but simplify things on IR level. > >> > Flag setting (StoreB nodes) in JavaThread::_doing_unsafe_access is also not affected but it is volatile field and these stores should be staying where they are. They can't go up or down. >> >> In a private discussion with @tobiasholenstein, I proposed to move those stores into the corresponding stub. It would complicate the implementation a bit (platform-specific vs cross-platform implementation), but simplify things on IR level. > > So you think we should go for that solution instead of this fix? @tobiasholenstein I suggest to file separate REF to move StoreB which updates `JavaThread::_doing_unsafe_access` into stub and work on it. For this issue we need to fix EA. 
I will work on patch. ------------- PR Comment: https://git.openjdk.org/jdk/pull/17347#issuecomment-1889874468 From kvn at openjdk.org Fri Jan 12 22:13:20 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 12 Jan 2024 22:13:20 GMT Subject: RFR: JDK-8316756: C2 EA fails with "missing memory path" when encountering unsafe_arraycopy stub call In-Reply-To: References: Message-ID: On Wed, 10 Jan 2024 13:35:17 GMT, Tobias Holenstein wrote: > Before https://github.com/openjdk/jdk/pull/5259 the graph of the following program looked like this during Escape Analysis: > > > static int test() { > MyClass obj = new MyClass(); // Non-escaping to trigger Escape Analysis > UNSAFE.copyMemory(null, SRC_BASE, null, DST_BASE, 4); > obj.x = 42; > return obj.x; > } > > With MemBarCPUOrder: > working > > Setting `RC_NARROW_MEM` flag in `LibraryCallKit::inline_unsafe_copyMemory()` removes the `428 MergeMem` node. Without a `MergeMem` node after `432 StoreB` EA failes in Phase 2 of `ConnectionGraph::split_unique_types(...)` when trying to push the allocation's users on the appropriate worklist - `429 CallLeafNoFP` is not an expected user of `428 StoreB`. Therefore the assert `"EA: missing memory path"` is hit. > Without MemBarCPUOrdera and after setting `RC_NARROW_MEM`: > failing > > > ### Proposed Fix > Dropping the `RC_NARROW_MEM` flag in `LibraryCallKit::inline_unsafe_copyMemory()` causes the introduction of a `MergeMem` between `StoreB` and `CallLeafNoFP`, so the corresponding code in EA doesn't encounter a `CallLeafNoFP` anymore: > fixed > > Testing: Tier1-4 passed I added 8316756.patch for EA fix to bug report. Please, also add my test case (with local arrays and object) to your test. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/17347#issuecomment-1890048933

From kvn at openjdk.org Fri Jan 12 23:16:18 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 12 Jan 2024 23:16:18 GMT Subject: RFR: 8322996: BoxLockNode creation fails with assert(reg < CHUNK_SIZE) failed: sanity [v3] In-Reply-To: References: <66zey4Kt56pbcieRZ0AjBLemzwsAQcHCdDG1nlK-P1c=.14dc02d6-bef8-4239-9f39-02456756b7ea@github.com> Message-ID:

On Fri, 12 Jan 2024 15:06:33 GMT, Daniel Lundén wrote:

>> src/hotspot/share/opto/locknode.hpp line 66:
>>
>>> 64: return (int)reg < (int)(RegMask::CHUNK_SIZE - 1 - Compile::current()->sync_stack_slots());
>>> 65: }
>>> 66:
>>
>> I think it should be in `regmask.hpp` together with other `can_represent_*` methods. Then you don't need part of the comment about those methods.
>
> Thanks @vnkozlov. Do you know if we can directly use `can_represent` instead, and not take `sync_stack_slots()` into account? The field `_inmask` in `BoxLockNode` seems to only specify a single register (one bit in the mask).

I think your current code is correct. On x64 `sync_stack_slots` is defined as 2 (takes 2 bits in regmask) in `x86_64.ad` and as 1 in `x86_32.ad`. On most 64 bit platforms it is also 2 slots, from what I see. But we can't guarantee that some platforms will not have a bigger value. We can't use the last odd bit in regmask on 64 bit platforms - it is already taken anyway by the "infinite stack flag".

------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17370#discussion_r1451019614

From kvn at openjdk.org Fri Jan 12 23:23:18 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 12 Jan 2024 23:23:18 GMT Subject: RFR: 8323641: Test compiler/loopopts/superword/TestAlignVectorFuzzer.java timed out In-Reply-To: References: Message-ID:

On Fri, 12 Jan 2024 08:22:54 GMT, Emanuel Peter wrote:

> It seems that allowing `90%` of the timeout-time was cutting it too close.
Some individual tests can take more time occasionally, one even took more than `80 sec` (very rare). > > Why do these tests take so long? > - Sometimes compilation can take quite a bit of time. `-XX:LoopUnrollLimit=250` already increases the number of nodes allowed in a loop-body, and with unrolling this increases significantly. SuperWord then has to work through all these nodes, and has a lot of quadratic and higher complexity loops. > - The loop bodies are quite large (hand unrolled), and often lead to partial vectorization, with lots of scalar memory ops. This produces quite sub-optimal code, with a bit too many instructions in the loop body. Combined with lots of array copying, this probably takes quite a hit on caches. > > I will investigate SuperWord compilation time in the future, and lower the runtime complexity if neccessary/possible, it is part of my autovectorization plans. > > I now lowered the allowance down to `40%`, which is hopefully small enough to avoid timeout, while still allowing sufficient many run to get decent test coverage. Good. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/17389#pullrequestreview-1819317455 From kvn at openjdk.org Fri Jan 12 23:57:18 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 12 Jan 2024 23:57:18 GMT Subject: RFR: 8323101: C2: assert(n->in(0) == nullptr) failed: divisions with zero check should already have bailed out earlier in split-if In-Reply-To: <7RQTgz8ZyoAIk6gpdZgDCFsqIBA_WK6IJ6OkHFLg_ts=.da8bb3a9-674c-4b2e-863f-0b2afcb37f34@github.com> References: <7RQTgz8ZyoAIk6gpdZgDCFsqIBA_WK6IJ6OkHFLg_ts=.da8bb3a9-674c-4b2e-863f-0b2afcb37f34@github.com> Message-ID: On Fri, 12 Jan 2024 10:52:55 GMT, Christian Hagedorn wrote: > The assertion added by [JDK-8299259](https://bugs.openjdk.org/browse/JDK-8299259) is wrong. 
I've originally assumed that at this point, there are no pinned `Div/Mod` nodes anymore that we possibly want to split through a phi due to bailing out earlier here: > https://github.com/openjdk/jdk/blob/3e19bf88d5b51fe10c183f930b99bce961a368c1/src/hotspot/share/opto/loopopts.cpp#L1137-L1140 > > The assumption was that a `Div/Mod` node with a control input to the zero-check `IfProj` could only have a `Phi` input with a `Region` that is further up in the graph. This is normally true. However, in `testIntDiv()`, we split an `If` with `do_split_if()` and need to empty the basic block. We split the store `iFld = sub` up. This includes the `StoreI` as well as the `AddI` which also has the `Region` as current `ctrl` (i.e. returned with `get_ctrl()`). > > The `AddI` also has the `DivI` as output which, however, is not pushed up since it's not part of the same basic block. We end up with the following graph after the completion of `do_split_if()`: > > ![image](https://github.com/openjdk/jdk/assets/17833009/b76293e1-a593-4319-b026-254be3c098fc) > > The `DivI` now has `252 Phi` as input which merges the split `AddI` nodes. `246 Region` of the `252 Phi` is further down than the control input `83 IfTrue` of the `DivI`.This is rather unusual and thus was missed when the assert was added in JDK-8299259. When finally processing the `DivI` node in the DFS walk of Split-If, we fail with the assertion. > > The fix is straight forward to turn this assert into a simple bailout: We should not split a `Div/Mod` node that is pinned (i.e. has a zero check). > > While working on this bug, I've also tried to trigger the assert with `DivL/ModL` nodes. However, this did not work because `split_up()` does not split the `Add` node up. The reason is that we set late ctrl to early ctrl for the `DivL/ModL` node (and thus also set the same late ctrl for the `Add` node) while we do not do that for `DivI/ModI` nodes. 
It seems that we miss to treat `DivL/ModL` nodes as unpinned here which would allow us to set a later ctrl: > https://github.com/openjdk/jdk/blob/82a63a03c0155288e8e43b9f766c8be70be50b6a/src/hotspot/share/opto/loopnode.cpp#L6091-L6101 > > I've done some digging and found that `DivL/ModL` nodes were added after this switch statment. So, I assume we simply forgot to also treat them as unpinned here. It's not wrong but I think just an unnecessary limitation. I filed [JDK-8323652](https://bugs.openjdk.org/browse/JDK-... Good. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/17394#pullrequestreview-1819375218 From duke at openjdk.org Sat Jan 13 09:27:25 2024 From: duke at openjdk.org (Yuri Gaevsky) Date: Sat, 13 Jan 2024 09:27:25 GMT Subject: RFR: 8322174: RISC-V: C2 VectorizedHashCode RVV Version Message-ID: The patch adds possibility to use RVV instructions for faster vectorizedHashCode calculations on RVV v1.0.0 capable hardware. Testing: hotspot/jtreg/compiler/ under QEMU-8.1 with RVV v1.0.0. ------------- Commit messages: - 8322174: RISC-V: C2 VectorizedHashCode RVV Version Changes: https://git.openjdk.org/jdk/pull/17413/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=17413&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8322174 Stats: 422 lines in 7 files changed: 421 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/17413.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17413/head:pull/17413 PR: https://git.openjdk.org/jdk/pull/17413 From duke at openjdk.org Sat Jan 13 09:27:26 2024 From: duke at openjdk.org (Yuri Gaevsky) Date: Sat, 13 Jan 2024 09:27:26 GMT Subject: RFR: 8322174: RISC-V: C2 VectorizedHashCode RVV Version In-Reply-To: References: Message-ID: On Sat, 13 Jan 2024 09:21:37 GMT, Yuri Gaevsky wrote: > The patch adds possibility to use RVV instructions for faster vectorizedHashCode calculations on RVV v1.0.0 capable hardware. 
>
> Testing: hotspot/jtreg/compiler/ under QEMU-8.1 with RVV v1.0.0.

NB: I have no access to RVV v1.0.0 hardware, so to estimate the performance improvements I adapted the patch to the RVV v0.7.1 ISA under OpenJDK-21 and ran the JMH test org.openjdk.bench.java.lang.ArraysHashCode on a LicheePi-4A TH1520, which does support RVV v0.7.1. The results are below. Hopefully they will be similar on RVV v1.0.0 hardware.

Legend: UseVHI ==> UseVectorizedHashCodeIntrinsic

                              | -XX:-UseVHI -XX:-UseRVV| -XX:-UseVHI -XX:+UseRVV| -XX:+UseVHI -XX:-UseRVV| -XX:+UseVHI -XX:+UseRVV|
Benchmark   (size)  Mode  Cnt |      Score        Error |      Score        Error |      Score        Error |      Score        Error | Units
bytes            1  avgt   10 |     20.292 ±     0.524 |     20.693 ±     1.706 |     20.458 ±     0.718 |     20.276 ±     0.525 | ns/op
bytes           10  avgt   10 |     35.107 ±     0.180 |     35.054 ±     0.029 |     30.898 ±     0.109 |     31.033 ±     0.132 | ns/op
bytes          100  avgt   10 |    188.190 ±     4.192 |    188.805 ±     4.345 |    152.324 ±     2.205 |     97.673 ±     3.145 | ns/op
bytes         1000  avgt   10 |   1664.569 ±     1.662 |   1663.711 ±     2.229 |   1184.224 ±     0.731 |    656.340 ±     1.908 | ns/op
bytes        10000  avgt   10 |  16419.434 ±    68.995 |  16407.357 ±    43.737 |  11599.876 ±    23.574 |   6171.500 ±    16.633 | ns/op
bytes       100000  avgt   10 | 167738.927 ±  3313.255 | 166577.887 ±  1552.963 | 119475.413 ±  1358.363 |  62061.873 ±   130.268 | ns/op
chars            1  avgt   10 |     20.420 ±     1.031 |     20.294 ±     0.527 |     20.402 ±     0.992 |     21.267 ±     0.027 | ns/op
chars           10  avgt   10 |     35.800 ±     0.032 |     35.778 ±     0.049 |     31.170 ±     0.199 |     31.744 ±     0.169 | ns/op
chars          100  avgt   10 |    185.715 ±     0.674 |    184.531 ±     1.152 |    143.918 ±     1.147 |     90.613 ±     0.092 | ns/op
chars         1000  avgt   10 |   1683.711 ±    46.493 |   1668.926 ±     6.850 |   1120.730 ±     3.017 |    652.677 ±     2.026 | ns/op
chars        10000  avgt   10 |  16402.007 ±    16.654 |  16468.497 ±   136.411 |  10939.505 ±    72.647 |   6174.555 ±    28.879 | ns/op
chars       100000  avgt   10 | 164826.072 ±   381.240 | 165807.663 ±  4328.908 | 114787.826 ±  4217.557 |  61724.436 ±    45.819 | ns/op
ints             1  avgt   10 |     20.730 ±     2.375 |     20.506 ±     1.458 |     20.277 ±     0.517 |     20.169 ±     0.015 | ns/op
ints            10  avgt   10 |     36.878 ±     0.059 |     36.162 ±     1.033 |     31.338 ±     0.243 |     32.511 ±     0.165 | ns/op
ints           100  avgt   10 |    184.288 ±     0.790 |    184.939 ±     0.624 |    143.794 ±     0.708 |     80.406 ±     6.987 | ns/op
ints          1000  avgt   10 |   1669.219 ±     3.559 |   1670.992 ±    13.830 |   1118.856 ±     1.086 |    486.305 ±     4.471 | ns/op
ints         10000  avgt   10 |  16432.730 ±    62.326 |  16710.540 ±    68.028 |  11128.766 ±    57.448 |   5232.062 ±   291.835 | ns/op
ints        100000  avgt   10 | 165387.705 ±   431.814 | 165597.050 ±   278.567 | 115605.648 ±  8245.853 |  45468.032 ±  1793.979 | ns/op
multibytes       1  avgt   10 |      3.459 ±     0.020 |      3.473 ±     0.055 |      3.477 ±     0.145 |      3.480 ±     0.043 | ns/op
multibytes      10  avgt   10 |     16.983 ±     0.264 |     17.526 ±     0.375 |     12.325 ±     0.117 |     13.415 ±     0.136 | ns/op
multibytes     100  avgt   10 |    105.251 ±     0.250 |    105.032 ±     0.180 |     78.795 ±     0.260 |     53.210 ±     1.024 | ns/op
multibytes    1000  avgt   10 |    948.171 ±     5.950 |    957.757 ±    12.117 |    700.407 ±     1.928 |    440.352 ±     2.248 | ns/op
multibytes   10000  avgt   10 |   8829.949 ±    64.161 |   9007.879 ±   510.217 |   6406.776 ±    17.982 |   3430.480 ±    35.108 | ns/op
multibytes  100000  avgt   10 |  89545.793 ±  6151.064 |  88335.319 ±    51.310 |  64236.061 ±    46.572 |  33380.485 ±    56.708 | ns/op
multichars       1  avgt   10 |      3.475 ±     0.054 |      3.453 ±     0.066 |      3.492 ±     0.122 |      3.495 ±     0.047 | ns/op
multichars      10  avgt   10 |     17.719 ±     0.645 |     17.201 ±     0.152 |     12.318 ±     0.141 |     13.093 ±     0.147 | ns/op
multichars     100  avgt   10 |    106.735 ±     0.283 |    106.625 ±     0.177 |     77.695 ±     0.212 |     51.495 ±     0.166 | ns/op
multichars    1000  avgt   10 |    927.573 ±     6.839 |    932.211 ±     3.445 |    696.374 ±     1.757 |    471.226 ±     1.499 | ns/op
multichars   10000  avgt   10 |   9846.872 ±    20.840 |   9909.611 ±   188.165 |   6392.901 ±     4.849 |   3978.730 ±   180.130 | ns/op
multichars  100000  avgt   10 |  88110.303 ±    41.764 |  88892.543 ±  2534.299 |  60615.033 ±    94.002 |  33956.859 ±   199.178 | ns/op
multiints        1  avgt   10 |      3.450 ±     0.328 |      3.382 ±     0.150 |      3.345 ±     0.024 |      3.380 ±     0.040 | ns/op
multiints       10  avgt   10 |     18.265 ±     0.424 |     18.644 ±     1.433 |     12.036 ±     0.041 |     13.773 ±     0.114 | ns/op
multiints      100  avgt   10 |    107.500 ±     0.636 |    107.318 ±     0.466 |     77.971 ±     0.296 |     47.700 ±     0.408 | ns/op
multiints     1000  avgt   10 |    924.920 ±     9.106 |    937.609 ±    44.303 |    695.427 ±     2.075 |    449.475 ±     2.061 | ns/op
multiints    10000  avgt   10 |   9322.880 ±    49.589 |   9277.425 ±    91.828 |   7009.704 ±   297.983 |   6196.819 ±   367.531 | ns/op
multiints   100000  avgt   10 |  88154.281 ±   279.258 |  88272.818 ±   103.608 |  64118.963 ±  6445.702 |  55317.212 ±   916.179 | ns/op
multishorts      1  avgt   10 |      3.488 ±     0.034 |      3.531 ±     0.227 |      3.521 ±     0.051 |      3.512 ±     0.054 | ns/op
multishorts     10  avgt   10 |     17.907 ±     0.380 |     17.408 ±     0.659 |     12.252 ±     0.110 |     13.445 ±     0.102 | ns/op
multishorts    100  avgt   10 |    106.588 ±     0.188 |    107.500 ±     0.531 |     79.630 ±     0.428 |     53.886 ±     3.243 | ns/op
multishorts   1000  avgt   10 |    931.732 ±     6.891 |    923.814 ±    11.836 |    701.534 ±     1.742 |    470.312 ±     2.117 | ns/op
multishorts  10000  avgt   10 |   9663.105 ±  1017.387 |   9859.034 ±    66.672 |   6422.864 ±     7.486 |   3785.710 ±    37.656 | ns/op
multishorts 100000  avgt   10 |  88799.262 ±  2363.672 |  88015.545 ±    52.795 |  60541.966 ±   155.521 |  33888.677 ±   127.071 | ns/op
shorts           1  avgt   10 |     20.199 ±     0.083 |     20.190 ±     0.027 |     21.389 ±     0.600 |     21.250 ±     0.024 | ns/op
shorts          10  avgt   10 |     35.842 ±     0.189 |     35.806 ±     0.167 |     30.960 ±     0.186 |     31.451 ±     0.182 | ns/op
shorts         100  avgt   10 |    184.323 ±     0.488 |    185.318 ±     0.776 |    143.652 ±     1.057 |     90.657 ±     0.052 | ns/op
shorts        1000  avgt   10 |   1664.583 ±     2.016 |   1666.803 ±     3.100 |   1118.623 ±     0.661 |    652.112 ±     0.346 | ns/op
shorts       10000  avgt   10 |  16395.042 ±    39.388 |  16426.231 ±    75.461 |  10933.090 ±    16.165 |   6200.135 ±   116.218 | ns/op
shorts      100000  avgt   10 | 165037.332 ±   226.003 | 167782.156 ±  8844.288 | 114329.012 ±  4326.851 |  61693.056 ±    93.278 | ns/op

------------- PR Comment: https://git.openjdk.org/jdk/pull/17413#issuecomment-1890392431

From aph at openjdk.org Sat Jan 13 18:14:17 2024 From: aph at openjdk.org (Andrew Haley) Date: Sat, 13 Jan 2024 18:14:17 GMT Subject: RFR: 8323122: AArch64: Increase itable stub size estimate In-Reply-To: References: Message-ID:

On Wed, 10 Jan 2024 02:03:08 GMT, Yude Lin wrote:

> Since [JDK-8307352](https://bugs.openjdk.org/browse/JDK-8307352), itable stub size has grown by 20 bytes on linux-aarch64. In particular, the "slop-counted" code increases from 100->120 bytes, where the current estimate is 124 bytes. I haven't found a case where it exceeds the estimate. For now this size is stable across the few linux-aarch64 configurations I ran with. It doesn't vary (for example) by different klass decoding schemes. But I think the idea of the estimate is that we can never know. I propose we increase the estimate to be safe.
>
> Passed hotspot/jtreg/:tier1

Marked as reviewed by aph (Reviewer).

------------- PR Review: https://git.openjdk.org/jdk/pull/17336#pullrequestreview-1820079078

From duke at openjdk.org Mon Jan 15 06:17:17 2024 From: duke at openjdk.org (Yude Lin) Date: Mon, 15 Jan 2024 06:17:17 GMT Subject: RFR: 8323122: AArch64: Increase itable stub size estimate In-Reply-To: References: Message-ID:

On Sat, 13 Jan 2024 18:11:16 GMT, Andrew Haley wrote:

>> Since [JDK-8307352](https://bugs.openjdk.org/browse/JDK-8307352), itable stub size has grown by 20 bytes on linux-aarch64. In particular, the "slop-counted" code increases from 100->120 bytes, where the current estimate is 124 bytes. I haven't found a case where it exceeds the estimate.
For now this size is stable across the few linux-aarch64 configurations I ran with. It doesn't vary (for example) by different klass decoding schemes. But I think the idea of the estimate is that we can never know. I propose we increase the estimate to be safe. >> >> Passed hotspot/jtreg/:tier1 > > Marked as reviewed by aph (Reviewer). @theRealAph Do you mind sponsoring this change? Thank you. ------------- PR Comment: https://git.openjdk.org/jdk/pull/17336#issuecomment-1891365720 From duke at openjdk.org Mon Jan 15 06:50:37 2024 From: duke at openjdk.org (Zhiqiang Zang) Date: Mon, 15 Jan 2024 06:50:37 GMT Subject: Integrated: 8322077: Add Ideal transformation: (~a) | (~b) => ~(a & b) In-Reply-To: References: Message-ID: On Tue, 24 Oct 2023 05:02:56 GMT, Zhiqiang Zang wrote: > Hello, > > (~a) | (~b) => ~(a & b) is a widely seen pattern, for example it is implemented for LLVM [here](https://github.com/llvm/llvm-project/blob/397f1ce9efb4eea1ee10fe4833f733b8c7abd878/llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp#L1617C28-L1617C28); however it is missing in current implementation of hotspot. This pull request adds this transformation and associated tests. > > Thanks. This pull request has now been integrated. Changeset: 1515bd7c Author: Zhiqiang Zang Committer: Tobias Hartmann URL: https://git.openjdk.org/jdk/commit/1515bd7c9d70e3d6153fc82cd7db0502a15427aa Stats: 369 lines in 5 files changed: 369 ins; 0 del; 0 mod 8322077: Add Ideal transformation: (~a) | (~b) => ~(a & b) Reviewed-by: thartmann, epeter ------------- PR: https://git.openjdk.org/jdk/pull/16334 From epeter at openjdk.org Mon Jan 15 07:43:45 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 15 Jan 2024 07:43:45 GMT Subject: RFR: 8323641: Test compiler/loopopts/superword/TestAlignVectorFuzzer.java timed out [v2] In-Reply-To: References: Message-ID: > It seems that allowing `90%` of the timeout-time was cutting it too close. 
Some individual tests can take more time occasionally, one even took more than `80 sec` (very rare). > > Why do these tests take so long? > - Sometimes compilation can take quite a bit of time. `-XX:LoopUnrollLimit=250` already increases the number of nodes allowed in a loop-body, and with unrolling this increases significantly. SuperWord then has to work through all these nodes, and has a lot of quadratic and higher complexity loops. > - The loop bodies are quite large (hand unrolled), and often lead to partial vectorization, with lots of scalar memory ops. This produces quite sub-optimal code, with a bit too many instructions in the loop body. Combined with lots of array copying, this probably takes quite a hit on caches. > > I will investigate SuperWord compilation time in the future, and lower the runtime complexity if neccessary/possible, it is part of my autovectorization plans. > > I now lowered the allowance down to `40%`, which is hopefully small enough to avoid timeout, while still allowing sufficient many run to get decent test coverage. 
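
For readers skimming the thread, the time-allowance idea described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code in TestAlignVectorFuzzer.java: the class name `TimeBudgetSketch`, both method names, and the 120 s timeout in `main` are hypothetical. The deadline is only checked between iterations, so a single slow iteration can still overrun it, which is why a smaller allowance leaves more headroom.

```java
public class TimeBudgetSketch {
    // Hypothetical helper: the share of the overall timeout that the test loop may use.
    static long allowanceMillis(long timeoutMillis, int percent) {
        return timeoutMillis * percent / 100;
    }

    // Runs iterations until the deadline has passed or maxIterations is reached.
    // The deadline is only checked between iterations, so one slow iteration
    // (for example a long compilation) can still overrun it.
    static int runUntilDeadline(Runnable iteration, long allowanceMillis, int maxIterations) {
        long deadline = System.currentTimeMillis() + allowanceMillis;
        int done = 0;
        while (done < maxIterations && System.currentTimeMillis() < deadline) {
            iteration.run();
            done++;
        }
        return done;
    }

    public static void main(String[] args) {
        long timeoutMillis = 120_000; // assumed overall timeout, for illustration only
        long allowance = allowanceMillis(timeoutMillis, 40);
        System.out.println("Time Allowance: " + allowance);
        int runs = runUntilDeadline(() -> { /* one randomized test would go here */ }, allowance, 1_000);
        System.out.println("Completed runs: " + runs);
    }
}
```

With a 90% allowance, a single iteration taking 80 s could push the total past the timeout; with 40% the same outlier still fits.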
Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: camelCase for local variable, for Christian ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17389/files - new: https://git.openjdk.org/jdk/pull/17389/files/27ba573c..e1a0deb8 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17389&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17389&range=00-01 Stats: 15 lines in 1 file changed: 0 ins; 0 del; 15 mod Patch: https://git.openjdk.org/jdk/pull/17389.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17389/head:pull/17389 PR: https://git.openjdk.org/jdk/pull/17389 From chagedorn at openjdk.org Mon Jan 15 07:48:20 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 15 Jan 2024 07:48:20 GMT Subject: RFR: 8323101: C2: assert(n->in(0) == nullptr) failed: divisions with zero check should already have bailed out earlier in split-if In-Reply-To: <7RQTgz8ZyoAIk6gpdZgDCFsqIBA_WK6IJ6OkHFLg_ts=.da8bb3a9-674c-4b2e-863f-0b2afcb37f34@github.com> References: <7RQTgz8ZyoAIk6gpdZgDCFsqIBA_WK6IJ6OkHFLg_ts=.da8bb3a9-674c-4b2e-863f-0b2afcb37f34@github.com> Message-ID: On Fri, 12 Jan 2024 10:52:55 GMT, Christian Hagedorn wrote: > The assertion added by [JDK-8299259](https://bugs.openjdk.org/browse/JDK-8299259) is wrong. I've originally assumed that at this point, there are no pinned `Div/Mod` nodes anymore that we possibly want to split through a phi due to bailing out earlier here: > https://github.com/openjdk/jdk/blob/3e19bf88d5b51fe10c183f930b99bce961a368c1/src/hotspot/share/opto/loopopts.cpp#L1137-L1140 > > The assumption was that a `Div/Mod` node with a control input to the zero-check `IfProj` could only have a `Phi` input with a `Region` that is further up in the graph. This is normally true. However, in `testIntDiv()`, we split an `If` with `do_split_if()` and need to empty the basic block. We split the store `iFld = sub` up. 
This includes the `StoreI` as well as the `AddI` which also has the `Region` as current `ctrl` (i.e. returned with `get_ctrl()`). > > The `AddI` also has the `DivI` as output which, however, is not pushed up since it's not part of the same basic block. We end up with the following graph after the completion of `do_split_if()`: > > ![image](https://github.com/openjdk/jdk/assets/17833009/b76293e1-a593-4319-b026-254be3c098fc) > > The `DivI` now has `252 Phi` as input which merges the split `AddI` nodes. `246 Region` of the `252 Phi` is further down than the control input `83 IfTrue` of the `DivI`.This is rather unusual and thus was missed when the assert was added in JDK-8299259. When finally processing the `DivI` node in the DFS walk of Split-If, we fail with the assertion. > > The fix is straight forward to turn this assert into a simple bailout: We should not split a `Div/Mod` node that is pinned (i.e. has a zero check). > > While working on this bug, I've also tried to trigger the assert with `DivL/ModL` nodes. However, this did not work because `split_up()` does not split the `Add` node up. The reason is that we set late ctrl to early ctrl for the `DivL/ModL` node (and thus also set the same late ctrl for the `Add` node) while we do not do that for `DivI/ModI` nodes. It seems that we miss to treat `DivL/ModL` nodes as unpinned here which would allow us to set a later ctrl: > https://github.com/openjdk/jdk/blob/82a63a03c0155288e8e43b9f766c8be70be50b6a/src/hotspot/share/opto/loopnode.cpp#L6091-L6101 > > I've done some digging and found that `DivL/ModL` nodes were added after this switch statment. So, I assume we simply forgot to also treat them as unpinned here. It's not wrong but I think just an unnecessary limitation. I filed [JDK-8323652](https://bugs.openjdk.org/browse/JDK-... Thanks Vladimir for your review! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/17394#issuecomment-1891490208 From chagedorn at openjdk.org Mon Jan 15 07:48:20 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 15 Jan 2024 07:48:20 GMT Subject: RFR: 8323641: Test compiler/loopopts/superword/TestAlignVectorFuzzer.java timed out [v2] In-Reply-To: References: Message-ID: <9y9UeYlQm73fi4vs-_-sFAl4IvwDdVQQn_o6BQ6CHnM=.e303a50e-d10e-42a8-ad78-3a62e49c0177@github.com> On Mon, 15 Jan 2024 07:43:45 GMT, Emanuel Peter wrote: >> It seems that allowing `90%` of the timeout-time was cutting it too close. Some individual tests can take more time occasionally, one even took more than `80 sec` (very rare). >> >> Why do these tests take so long? >> - Sometimes compilation can take quite a bit of time. `-XX:LoopUnrollLimit=250` already increases the number of nodes allowed in a loop-body, and with unrolling this increases significantly. SuperWord then has to work through all these nodes, and has a lot of quadratic and higher complexity loops. >> - The loop bodies are quite large (hand unrolled), and often lead to partial vectorization, with lots of scalar memory ops. This produces quite sub-optimal code, with a bit too many instructions in the loop body. Combined with lots of array copying, this probably takes quite a hit on caches. >> >> I will investigate SuperWord compilation time in the future, and lower the runtime complexity if neccessary/possible, it is part of my autovectorization plans. >> >> I now lowered the allowance down to `40%`, which is hopefully small enough to avoid timeout, while still allowing sufficient many run to get decent test coverage. > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > camelCase for local variable, for Christian Marked as reviewed by chagedorn (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/17389#pullrequestreview-1821118498 From aturbanov at openjdk.org Mon Jan 15 07:52:21 2024 From: aturbanov at openjdk.org (Andrey Turbanov) Date: Mon, 15 Jan 2024 07:52:21 GMT Subject: RFR: 8322768: Optimize non-subword vector compress and expand APIs for AVX2 target. [v5] In-Reply-To: References: <_udOCEVG86x9V_WvYqFTaYnvmXdiZ7LzqxzR-D_ygYs=.db3ed37e-a8fa-413b-83c6-a785aba072ff@github.com> Message-ID: On Tue, 9 Jan 2024 16:48:56 GMT, Jatin Bhateja wrote: >> Hi, >> >> Patch optimizes non-subword vector compress and expand APIs for x86 AVX2 only targets. >> Upcoming E-core Xeons (Sierra Forest) and Hybrid CPUs only support AVX2 instruction set. >> These are very frequently used APIs in columnar database filter operation. >> >> Implementation uses a lookup table to record permute indices. Table index is computed using >> mask argument of compress/expand operation. >> >> Following are the performance number of JMH micro included with the patch. 
>> >> >> System : Intel(R) Xeon(R) Platinum 8480+ (Sapphire Rapids) >> >> Baseline: >> Benchmark (size) Mode Cnt Score Error Units >> ColumnFilterBenchmark.filterDoubleColumn 1024 thrpt 2 142.767 ops/ms >> ColumnFilterBenchmark.filterDoubleColumn 2047 thrpt 2 71.436 ops/ms >> ColumnFilterBenchmark.filterDoubleColumn 4096 thrpt 2 35.992 ops/ms >> ColumnFilterBenchmark.filterFloatColumn 1024 thrpt 2 182.151 ops/ms >> ColumnFilterBenchmark.filterFloatColumn 2047 thrpt 2 91.096 ops/ms >> ColumnFilterBenchmark.filterFloatColumn 4096 thrpt 2 44.757 ops/ms >> ColumnFilterBenchmark.filterIntColumn 1024 thrpt 2 184.099 ops/ms >> ColumnFilterBenchmark.filterIntColumn 2047 thrpt 2 91.981 ops/ms >> ColumnFilterBenchmark.filterIntColumn 4096 thrpt 2 45.170 ops/ms >> ColumnFilterBenchmark.filterLongColumn 1024 thrpt 2 148.017 ops/ms >> ColumnFilterBenchmark.filterLongColumn 2047 thrpt 2 73.516 ops/ms >> ColumnFilterBenchmark.filterLongColumn 4096 thrpt 2 36.844 ops/ms >> >> Withopt: >> Benchmark (size) Mode Cnt Score Error Units >> ColumnFilterBenchmark.filterDoubleColumn 1024 thrpt 2 2051.707 ops/ms >> ColumnFilterBenchmark.filterDoubleColumn 2047 thrpt 2 914.072 ops/ms >> ColumnFilterBenchmark.filterDoubleColumn 4096 thrpt 2 489.898 ops/ms >> ColumnFilterBenchmark.filterFloatColumn 1024 thrpt 2 5324.195 ops/ms >> ColumnFilterBenchmark.filterFloatColumn 2047 thrpt 2 2587.229 ops/ms >> ColumnFilterBenchmark.filterFloatColumn 4096 thrpt 2 1278.665 ops/ms >> ColumnFilterBenchmark.filterIntColumn 1024 thrpt 2 4149.384 ops/ms >> ColumnFilterBenchmark.filterIntColumn 2047 thrpt ... > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Using emulated variable blend E-Core optimized instruction. 
test/micro/org/openjdk/bench/jdk/incubator/vector/ColumnFilterBenchmark.java line 37: > 35: @Fork(jvmArgsPrepend = {"--add-modules=jdk.incubator.vector", "-XX:UseAVX=2"}) > 36: public class ColumnFilterBenchmark { > 37: @Param({"1024","2047", "4096"}) Suggestion: @Param({"1024", "2047", "4096"}) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17261#discussion_r1452021322 From rcastanedalo at openjdk.org Mon Jan 15 09:01:14 2024 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Mon, 15 Jan 2024 09:01:14 GMT Subject: RFR: 8322692: ZGC: avoid over-unrolling due to hidden barrier size [v2] In-Reply-To: References: Message-ID: > This changeset refines the C2 loop unrolling heuristic by including an estimation of the final size of (Generational) ZGC barriers in the loop size computation. These are not exposed in C2's intermediate representation and thus currently ignored by the heuristic, which can lead to over-unrolling. > > #### Testing > > - tier1-5, stress test, fuzzing (windows-x64, linux-x64, linux-aarch64, macosx-x64, macosx-aarch64). > - tier6-9 (windows-x64, linux-x64, linux-aarch64, macosx-x64, macosx-aarch64, ZGC-specific tests only). > > #### Performance and code size evaluation > > - DaCapo, SPECjvm2008, SPECjbb2015 (linux-x64 with `-XX:+UseZGC -XX:+ZGenerational`). The changeset reduces slightly the size of the C2-generated code (around 0.3% fewer bytes per compiled bytecode for the DaCapo `fop` benchmark) and speeds up SPECjvm2008's `Serial` by around 4%. 
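A minimal sketch of the idea described above, assuming a simplified size model: the loop body size seen by the unrolling heuristic is the IR node count plus a fixed per-access estimate for barriers that are only expanded later. The class, method names, and all numbers below are hypothetical illustrations, not C2's actual policy code.

```java
// Hypothetical model of an unrolling size check that accounts for
// barrier code that is not visible as IR nodes.
final class UnrollSizeSketch {
    // Estimated final body size: visible IR nodes plus an assumed
    // fixed cost for every memory access that will receive a barrier.
    static int estimatedBodySize(int irNodes, int barrieredAccesses, int barrierSizePerAccess) {
        return irNodes + barrieredAccesses * barrierSizePerAccess;
    }

    // Unrolling by 'unrollFactor' is accepted only if the unrolled body
    // stays within 'sizeLimit' (an assumed stand-in for a limit flag).
    static boolean fitsAfterUnrolling(int bodySize, int unrollFactor, int sizeLimit) {
        return (long) bodySize * unrollFactor <= sizeLimit;
    }
}
```

With these assumed numbers, a body of 40 nodes passes a 4x unroll check against a limit of 250, while the same body with 6 barriered accesses at 10 units each (estimate 100) does not, which is the over-unrolling the heuristic change is meant to avoid.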
Roberto Castañeda Lozano has updated the pull request incrementally with two additional commits since the last revision: - Update copyright years - Exclude size of slow path from estimation ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17367/files - new: https://git.openjdk.org/jdk/pull/17367/files/6390878f..f90046b1 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17367&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17367&range=00-01 Stats: 9 lines in 3 files changed: 0 ins; 0 del; 9 mod Patch: https://git.openjdk.org/jdk/pull/17367.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17367/head:pull/17367 PR: https://git.openjdk.org/jdk/pull/17367 From rcastanedalo at openjdk.org Mon Jan 15 09:01:15 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Mon, 15 Jan 2024 09:01:15 GMT Subject: RFR: 8322692: ZGC: avoid over-unrolling due to hidden barrier size In-Reply-To: References: Message-ID: On Thu, 11 Jan 2024 08:47:41 GMT, Roberto Castañeda Lozano wrote: > This changeset refines the C2 loop unrolling heuristic by including an estimation of the final size of (Generational) ZGC barriers in the loop size computation. These are not exposed in C2's intermediate representation and thus currently ignored by the heuristic, which can lead to over-unrolling. > > #### Testing > > - tier1-5, stress test, fuzzing (windows-x64, linux-x64, linux-aarch64, macosx-x64, macosx-aarch64). > - tier6-9 (windows-x64, linux-x64, linux-aarch64, macosx-x64, macosx-aarch64, ZGC-specific tests only). > > #### Performance and code size evaluation > > - DaCapo, SPECjvm2008, SPECjbb2015 (linux-x64 with `-XX:+UseZGC -XX:+ZGenerational`). The changeset reduces slightly the size of the C2-generated code (around 0.3% fewer bytes per compiled bytecode for the DaCapo `fop` benchmark) and speeds up SPECjvm2008's `Serial` by around 4%.
The latest changes exclude the barrier slow path from the loop size estimation, as suggested by @fisk (offline) and @merykitty. Compared to the original changeset, this makes loop unrolling for ZGC more aggressive at the expense of code size, which is deemed acceptable in the typical scenarios in which ZGC is used. Compared to mainline, the code size improvement is now reduced to a mere 0.3% for DaCapo `fop` only, but in return SPECjvm2008 `Serial` is sped up by 4%. Please review. ------------- PR Comment: https://git.openjdk.org/jdk/pull/17367#issuecomment-1891614933 From epeter at openjdk.org Mon Jan 15 09:13:15 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 15 Jan 2024 09:13:15 GMT Subject: RFR: 8322768: Optimize non-subword vector compress and expand APIs for AVX2 target. [v5] In-Reply-To: References: <_udOCEVG86x9V_WvYqFTaYnvmXdiZ7LzqxzR-D_ygYs=.db3ed37e-a8fa-413b-83c6-a785aba072ff@github.com> Message-ID: On Tue, 9 Jan 2024 16:48:56 GMT, Jatin Bhateja wrote: >> Hi, >> >> Patch optimizes non-subword vector compress and expand APIs for x86 AVX2 only targets. >> Upcoming E-core Xeons (Sierra Forest) and Hybrid CPUs only support AVX2 instruction set. >> These are very frequently used APIs in columnar database filter operation. >> >> Implementation uses a lookup table to record permute indices. Table index is computed using >> mask argument of compress/expand operation. >> >> Following are the performance number of JMH micro included with the patch. 
>> >> >> System : Intel(R) Xeon(R) Platinum 8480+ (Sapphire Rapids) >> >> Baseline: >> Benchmark (size) Mode Cnt Score Error Units >> ColumnFilterBenchmark.filterDoubleColumn 1024 thrpt 2 142.767 ops/ms >> ColumnFilterBenchmark.filterDoubleColumn 2047 thrpt 2 71.436 ops/ms >> ColumnFilterBenchmark.filterDoubleColumn 4096 thrpt 2 35.992 ops/ms >> ColumnFilterBenchmark.filterFloatColumn 1024 thrpt 2 182.151 ops/ms >> ColumnFilterBenchmark.filterFloatColumn 2047 thrpt 2 91.096 ops/ms >> ColumnFilterBenchmark.filterFloatColumn 4096 thrpt 2 44.757 ops/ms >> ColumnFilterBenchmark.filterIntColumn 1024 thrpt 2 184.099 ops/ms >> ColumnFilterBenchmark.filterIntColumn 2047 thrpt 2 91.981 ops/ms >> ColumnFilterBenchmark.filterIntColumn 4096 thrpt 2 45.170 ops/ms >> ColumnFilterBenchmark.filterLongColumn 1024 thrpt 2 148.017 ops/ms >> ColumnFilterBenchmark.filterLongColumn 2047 thrpt 2 73.516 ops/ms >> ColumnFilterBenchmark.filterLongColumn 4096 thrpt 2 36.844 ops/ms >> >> Withopt: >> Benchmark (size) Mode Cnt Score Error Units >> ColumnFilterBenchmark.filterDoubleColumn 1024 thrpt 2 2051.707 ops/ms >> ColumnFilterBenchmark.filterDoubleColumn 2047 thrpt 2 914.072 ops/ms >> ColumnFilterBenchmark.filterDoubleColumn 4096 thrpt 2 489.898 ops/ms >> ColumnFilterBenchmark.filterFloatColumn 1024 thrpt 2 5324.195 ops/ms >> ColumnFilterBenchmark.filterFloatColumn 2047 thrpt 2 2587.229 ops/ms >> ColumnFilterBenchmark.filterFloatColumn 4096 thrpt 2 1278.665 ops/ms >> ColumnFilterBenchmark.filterIntColumn 1024 thrpt 2 4149.384 ops/ms >> ColumnFilterBenchmark.filterIntColumn 2047 thrpt ... > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Using emulated variable blend E-Core optimized instruction. 
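The lookup-table approach in the quoted description can be illustrated with a scalar Java model, under stated assumptions: four 64-bit lanes per vector, one table row of permute indices per 4-bit mask value, and `-1` as a marker for lanes that are zeroed. The class and method names are hypothetical; the actual patch emits x86 instructions rather than running Java code like this.

```java
import java.util.Arrays;

// Scalar model of a mask-indexed permute table for "compress".
final class CompressTableSketch {
    static final int LANES = 4; // assumed: 4 long lanes in a 256-bit vector

    // Row m lists, in order, the source lanes selected by mask m;
    // unused trailing entries stay -1 (meaning "write zero").
    static final int[][] PERMUTE_TABLE = buildTable();

    static int[][] buildTable() {
        int[][] table = new int[1 << LANES][LANES];
        for (int mask = 0; mask < table.length; mask++) {
            Arrays.fill(table[mask], -1);
            int out = 0;
            for (int lane = 0; lane < LANES; lane++) {
                if ((mask & (1 << lane)) != 0) {
                    table[mask][out++] = lane;
                }
            }
        }
        return table;
    }

    // Byte offset of a row if every row stores 4 longs (4 * 8 = 32 bytes),
    // i.e. the mask value shifted left by 5.
    static long rowByteOffset(int mask) {
        return ((long) mask) << 5;
    }

    // Moves the selected lanes to the low end of the result, zero-filling the rest.
    static long[] compress(long[] src, int mask) {
        long[] dst = new long[LANES];
        int[] row = PERMUTE_TABLE[mask];
        for (int i = 0; i < LANES; i++) {
            dst[i] = row[i] < 0 ? 0L : src[row[i]];
        }
        return dst;
    }
}
```

For example, `compress(new long[]{10, 20, 30, 40}, 0b1010)` selects lanes 1 and 3 and yields `{20, 40, 0, 0}`. In this model the table has 16 rows, so the whole table would occupy 512 bytes for the 64-bit case.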
src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 5309: > 5307: assert(bt == T_LONG || bt == T_DOUBLE, ""); > 5308: vmovmskpd(rtmp, mask, vec_enc); > 5309: shlq(rtmp, 5); // for 64 bit rows (4 longs) Suggestion: shlq(rtmp, 5); // for 32 bit rows (4 longs) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17261#discussion_r1452098849 From dlunden at openjdk.org Mon Jan 15 09:35:20 2024 From: dlunden at openjdk.org (Daniel =?UTF-8?B?THVuZMOpbg==?=) Date: Mon, 15 Jan 2024 09:35:20 GMT Subject: RFR: 8322996: BoxLockNode creation fails with assert(reg < CHUNK_SIZE) failed: sanity [v3] In-Reply-To: References: <66zey4Kt56pbcieRZ0AjBLemzwsAQcHCdDG1nlK-P1c=.14dc02d6-bef8-4239-9f39-02456756b7ea@github.com> Message-ID: On Fri, 12 Jan 2024 23:13:56 GMT, Vladimir Kozlov wrote: >> Thanks @vnkozlov. Do you know if we can directly use `can_represent` instead, and not take `sync_stack_slots()` into account? The field `_inmask` in `BoxLockNode` seems to only specify a single register (one bit in the mask). > > I think your current code is correct. > > On x64 `sync_stack_slots` defined as 2 (takes 2 bits in regmask) in `x86_64.ad` and as 1 in `x86_32.ad`. On most 64 bit platforms it is also 2 slots, from what I see. But we can't guarantee that some platforms will not have bigger value. We can't use last odd bit on 64 bit platform in regmask - it is taking anyway already by "infinite stack flag". Yes, that is my intuition as well. Therefore, I'm left wondering if the [construction of `_inmask`](https://github.com/dlunde/jdk/blob/9ab6e561780aee0f2cc2f06cd40ec487d60fe39c/src/hotspot/share/opto/locknode.cpp#L51) in the `BoxLockNode` constructor is incorrect, as it always just sets a single bit in the mask (no matter the value of `sync_stack_slots()`). Should we perhaps change it to instead set the range [ reg, reg + sync_stack_slots() ) in `_inmask`? 
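To make the question concrete, here is a small Java model of the two alternatives, using `java.util.BitSet` as a stand-in for a register mask. The names `singleSlot`, `slotRange`, and `fits`, and the `chunkSize` parameter, are hypothetical and only illustrate the half-open range [ reg, reg + slots ); this is not HotSpot's `RegMask` API.

```java
import java.util.BitSet;

// Toy model: one bit per stack slot in a mask.
final class SlotMaskSketch {
    // Mask with only the first slot set (what a single-bit mask expresses).
    static BitSet singleSlot(int reg) {
        BitSet mask = new BitSet();
        mask.set(reg);
        return mask;
    }

    // Mask covering the half-open range [reg, reg + slots).
    static BitSet slotRange(int reg, int slots) {
        BitSet mask = new BitSet();
        mask.set(reg, reg + slots); // BitSet.set(from, to) excludes 'to'
        return mask;
    }

    // True if the whole range lies inside a mask of 'chunkSize' bits.
    static boolean fits(int reg, int slots, int chunkSize) {
        return reg >= 0 && reg + slots <= chunkSize;
    }
}
```

With two slots per lock, `slotRange(6, 2)` sets bits 6 and 7, and a range starting at the last bit of an assumed 64-bit mask does not fit (`fits(63, 2, 64)` is false) even though a single-bit check on slot 63 alone would pass.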
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17370#discussion_r1452123954 From tholenstein at openjdk.org Mon Jan 15 10:18:53 2024 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Mon, 15 Jan 2024 10:18:53 GMT Subject: RFR: JDK-8316756: C2 EA fails with "missing memory path" when encountering unsafe_arraycopy stub call [v2] In-Reply-To: References: Message-ID: > Before https://github.com/openjdk/jdk/pull/5259 the graph of the following program looked like this during Escape Analysis: > > > static int test() { > MyClass obj = new MyClass(); // Non-escaping to trigger Escape Analysis > UNSAFE.copyMemory(null, SRC_BASE, null, DST_BASE, 4); > obj.x = 42; > return obj.x; > } > > With MemBarCPUOrder: > working > > Setting the `RC_NARROW_MEM` flag in `LibraryCallKit::inline_unsafe_copyMemory()` removes the `428 MergeMem` node. Without a `MergeMem` node after `432 StoreB`, EA fails in Phase 2 of `ConnectionGraph::split_unique_types(...)` when trying to push the allocation's users on the appropriate worklist: `429 CallLeafNoFP` is not an expected user of `428 StoreB`. Therefore the assert `"EA: missing memory path"` is hit.
> Without MemBarCPUOrder and after setting `RC_NARROW_MEM`: > failing > > > ### Proposed Fix > Dropping the `RC_NARROW_MEM` flag in `LibraryCallKit::inline_unsafe_copyMemory()` causes the introduction of a `MergeMem` between `StoreB` and `CallLeafNoFP`, so the corresponding code in EA doesn't encounter a `CallLeafNoFP` anymore: > fixed > > Testing: Tier1-4 passed Tobias Holenstein has updated the pull request incrementally with two additional commits since the last revision: - vladimir EA patch - undo fix ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17347/files - new: https://git.openjdk.org/jdk/pull/17347/files/88b4e827..8fafb163 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17347&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17347&range=00-01 Stats: 47 lines in 2 files changed: 44 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/17347.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17347/head:pull/17347 PR: https://git.openjdk.org/jdk/pull/17347 From epeter at openjdk.org Mon Jan 15 10:43:27 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 15 Jan 2024 10:43:27 GMT Subject: RFR: 8323577: C2 SuperWord: remove AlignVector restrictions on IR tests added in JDK-8305055 In-Reply-To: References: Message-ID: On Thu, 11 Jan 2024 13:14:59 GMT, Christian Hagedorn wrote: >> These IR rules were restricted in [JDK-8305055](https://bugs.openjdk.org/browse/JDK-8305055), [PR for comparison](https://git.openjdk.org/jdk/pull/13236). >> >> This had to be done because those cases were no longer vectorized with `AlignVector` after my bugfix [JDK-8298935](https://bugs.openjdk.org/browse/JDK-8298935), they were "collateral damage". Before this bugfix, we would vectorize, even though the alignment constraints had rejected some memops, but then they were reintroduced during pair extension. This re-introduction led to this bug for other reasons. I proposed to restore vectorization (i.e.
fix the "collateral damage") by improving alignment-constraints ([JDK-8303827](https://bugs.openjdk.org/browse/JDK-8303827)). >> >> Since I already had to completely rework the alignment constraints because of a bug, I relaxed the constraints: >> [JDK-8310190](https://bugs.openjdk.org/browse/JDK-8310190) >> >> Now I can remove the restrictions on those rules. > > Looks good! @chhagedorn @robcasloz thanks for the reviews! ------------- PR Comment: https://git.openjdk.org/jdk/pull/17369#issuecomment-1891843377 From epeter at openjdk.org Mon Jan 15 10:43:29 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 15 Jan 2024 10:43:29 GMT Subject: Integrated: 8323577: C2 SuperWord: remove AlignVector restrictions on IR tests added in JDK-8305055 In-Reply-To: References: Message-ID: On Thu, 11 Jan 2024 10:16:14 GMT, Emanuel Peter wrote: > These IR rules were restricted in [JDK-8305055](https://bugs.openjdk.org/browse/JDK-8305055), [PR for comparison](https://git.openjdk.org/jdk/pull/13236). > > This had to be done because those cases were no longer vectorized with `AlignVector` after my bugfix [JDK-8298935](https://bugs.openjdk.org/browse/JDK-8298935), they were "collateral damage". Before this bugfix, we would vectorize, even though the alignment constraints had rejected some memops, but then they were reintroduced during pair extension. This re-introduction led to this bug for other reasons. I proposed to restore vectorization (i.e. fix the "collateral damage") by improving alignment-constraints ([JDK-8303827](https://bugs.openjdk.org/browse/JDK-8303827)). > > Since I already had to completely rework the alignment constraints because of a bug, I relaxed the constraints: > [JDK-8310190](https://bugs.openjdk.org/browse/JDK-8310190) > > Now I can remove the restrictions on those rules. This pull request has now been integrated.
Changeset: 45c65e6b Author: Emanuel Peter URL: https://git.openjdk.org/jdk/commit/45c65e6b1ac06aa06757393f1752661252e6f827 Stats: 13 lines in 2 files changed: 0 ins; 12 del; 1 mod 8323577: C2 SuperWord: remove AlignVector restrictions on IR tests added in JDK-8305055 Reviewed-by: rcastanedalo, chagedorn ------------- PR: https://git.openjdk.org/jdk/pull/17369 From epeter at openjdk.org Mon Jan 15 10:47:28 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 15 Jan 2024 10:47:28 GMT Subject: RFR: 8323641: Test compiler/loopopts/superword/TestAlignVectorFuzzer.java timed out [v2] In-Reply-To: <9y9UeYlQm73fi4vs-_-sFAl4IvwDdVQQn_o6BQ6CHnM=.e303a50e-d10e-42a8-ad78-3a62e49c0177@github.com> References: <9y9UeYlQm73fi4vs-_-sFAl4IvwDdVQQn_o6BQ6CHnM=.e303a50e-d10e-42a8-ad78-3a62e49c0177@github.com> Message-ID: On Mon, 15 Jan 2024 07:45:37 GMT, Christian Hagedorn wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> camelCase for local variable, for Christian > > Marked as reviewed by chagedorn (Reviewer). @chhagedorn @vnkozlov thanks for the reviews! ------------- PR Comment: https://git.openjdk.org/jdk/pull/17389#issuecomment-1891850165 From epeter at openjdk.org Mon Jan 15 10:47:30 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 15 Jan 2024 10:47:30 GMT Subject: Integrated: 8323641: Test compiler/loopopts/superword/TestAlignVectorFuzzer.java timed out In-Reply-To: References: Message-ID: On Fri, 12 Jan 2024 08:22:54 GMT, Emanuel Peter wrote: > It seems that allowing `90%` of the timeout-time was cutting it too close. Some individual tests can take more time occasionally, one even took more than `80 sec` (very rare). > > Why do these tests take so long? > - Sometimes compilation can take quite a bit of time. `-XX:LoopUnrollLimit=250` already increases the number of nodes allowed in a loop-body, and with unrolling this increases significantly. 
SuperWord then has to work through all these nodes, and has a lot of quadratic and higher complexity loops. > - The loop bodies are quite large (hand unrolled), and often lead to partial vectorization, with lots of scalar memory ops. This produces quite sub-optimal code, with somewhat too many instructions in the loop body. Combined with lots of array copying, this probably takes quite a hit on caches. > > I will investigate SuperWord compilation time in the future, and lower the runtime complexity if necessary/possible; it is part of my autovectorization plans. > > I now lowered the allowance down to `40%`, which is hopefully small enough to avoid timeouts, while still allowing sufficiently many runs to get decent test coverage. This pull request has now been integrated. Changeset: cd0fe377 Author: Emanuel Peter URL: https://git.openjdk.org/jdk/commit/cd0fe377417be65dbf1338d8b47da8817985c7d8 Stats: 18 lines in 1 file changed: 2 ins; 0 del; 16 mod 8323641: Test compiler/loopopts/superword/TestAlignVectorFuzzer.java timed out Reviewed-by: chagedorn, kvn ------------- PR: https://git.openjdk.org/jdk/pull/17389 From thartmann at openjdk.org Mon Jan 15 11:55:19 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 15 Jan 2024 11:55:19 GMT Subject: RFR: 8323101: C2: assert(n->in(0) == nullptr) failed: divisions with zero check should already have bailed out earlier in split-if In-Reply-To: <7RQTgz8ZyoAIk6gpdZgDCFsqIBA_WK6IJ6OkHFLg_ts=.da8bb3a9-674c-4b2e-863f-0b2afcb37f34@github.com> References: <7RQTgz8ZyoAIk6gpdZgDCFsqIBA_WK6IJ6OkHFLg_ts=.da8bb3a9-674c-4b2e-863f-0b2afcb37f34@github.com> Message-ID: On Fri, 12 Jan 2024 10:52:55 GMT, Christian Hagedorn wrote: > The assertion added by [JDK-8299259](https://bugs.openjdk.org/browse/JDK-8299259) is wrong.
I've originally assumed that at this point, there are no pinned `Div/Mod` nodes anymore that we possibly want to split through a phi due to bailing out earlier here: > https://github.com/openjdk/jdk/blob/3e19bf88d5b51fe10c183f930b99bce961a368c1/src/hotspot/share/opto/loopopts.cpp#L1137-L1140 > > The assumption was that a `Div/Mod` node with a control input to the zero-check `IfProj` could only have a `Phi` input with a `Region` that is further up in the graph. This is normally true. However, in `testIntDiv()`, we split an `If` with `do_split_if()` and need to empty the basic block. We split the store `iFld = sub` up. This includes the `StoreI` as well as the `AddI` which also has the `Region` as current `ctrl` (i.e. returned with `get_ctrl()`). > > The `AddI` also has the `DivI` as output which, however, is not pushed up since it's not part of the same basic block. We end up with the following graph after the completion of `do_split_if()`: > > ![image](https://github.com/openjdk/jdk/assets/17833009/b76293e1-a593-4319-b026-254be3c098fc) > > The `DivI` now has `252 Phi` as input which merges the split `AddI` nodes. `246 Region` of the `252 Phi` is further down than the control input `83 IfTrue` of the `DivI`.This is rather unusual and thus was missed when the assert was added in JDK-8299259. When finally processing the `DivI` node in the DFS walk of Split-If, we fail with the assertion. > > The fix is straight forward to turn this assert into a simple bailout: We should not split a `Div/Mod` node that is pinned (i.e. has a zero check). > > While working on this bug, I've also tried to trigger the assert with `DivL/ModL` nodes. However, this did not work because `split_up()` does not split the `Add` node up. The reason is that we set late ctrl to early ctrl for the `DivL/ModL` node (and thus also set the same late ctrl for the `Add` node) while we do not do that for `DivI/ModI` nodes. 
It seems that we miss to treat `DivL/ModL` nodes as unpinned here which would allow us to set a later ctrl: > https://github.com/openjdk/jdk/blob/82a63a03c0155288e8e43b9f766c8be70be50b6a/src/hotspot/share/opto/loopnode.cpp#L6091-L6101 > > I've done some digging and found that `DivL/ModL` nodes were added after this switch statment. So, I assume we simply forgot to also treat them as unpinned here. It's not wrong but I think just an unnecessary limitation. I filed [JDK-8323652](https://bugs.openjdk.org/browse/JDK-... Looks good to me. ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/17394#pullrequestreview-1821534844 From tholenstein at openjdk.org Mon Jan 15 11:58:57 2024 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Mon, 15 Jan 2024 11:58:57 GMT Subject: RFR: JDK-8316756: C2 EA fails with "missing memory path" when encountering unsafe_arraycopy stub call [v3] In-Reply-To: References: Message-ID: > Before https://github.com/openjdk/jdk/pull/5259 the graph of the following program looked like this during Escape Analysis: > > > static int test() { > MyClass obj = new MyClass(); // Non-escaping to trigger Escape Analysis > UNSAFE.copyMemory(null, SRC_BASE, null, DST_BASE, 4); > obj.x = 42; > return obj.x; > } > > With MemBarCPUOrder: > working > > Setting `RC_NARROW_MEM` flag in `LibraryCallKit::inline_unsafe_copyMemory()` removes the `428 MergeMem` node. Without a `MergeMem` node after `432 StoreB` EA failes in Phase 2 of `ConnectionGraph::split_unique_types(...)` when trying to push the allocation's users on the appropriate worklist - `429 CallLeafNoFP` is not an expected user of `428 StoreB`. Therefore the assert `"EA: missing memory path"` is hit. 
> Without MemBarCPUOrder and after setting `RC_NARROW_MEM`: > failing > > > ### Proposed Fix > Dropping the `RC_NARROW_MEM` flag in `LibraryCallKit::inline_unsafe_copyMemory()` causes the introduction of a `MergeMem` between `StoreB` and `CallLeafNoFP`, so the corresponding code in EA doesn't encounter a `CallLeafNoFP` anymore: > fixed > > Testing: Tier1-4 passed Tobias Holenstein has updated the pull request incrementally with one additional commit since the last revision: added testcase ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17347/files - new: https://git.openjdk.org/jdk/pull/17347/files/8fafb163..46adf3c3 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17347&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17347&range=01-02 Stats: 12 lines in 1 file changed: 11 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/17347.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17347/head:pull/17347 PR: https://git.openjdk.org/jdk/pull/17347 From shade at openjdk.org Mon Jan 15 12:05:32 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 15 Jan 2024 12:05:32 GMT Subject: RFR: 8323584: AArch64: Unnecessary ResourceMark in NativeCall::set_destination_mt_safe In-Reply-To: References: Message-ID: On Thu, 11 Jan 2024 12:17:21 GMT, Aleksey Shipilev wrote: > Was looking at `ICBuffer` cleaning paths that run at safepoint. In `NativeCall::set_destination_mt_safe`, there is a `ResourceMark` that does not seem to have any purpose: no code in its scope uses resource allocation. There is adjacent dead code too. Looks like a development/debugging leftover. > > Additional testing: > - [x] Linux x86_64 AArch64 server fastdebug, `tier{1,2,3}` GHAs are finally unbroken, so I am integrating. Thanks!
------------- PR Comment: https://git.openjdk.org/jdk/pull/17372#issuecomment-1892037063 From shade at openjdk.org Mon Jan 15 12:05:32 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 15 Jan 2024 12:05:32 GMT Subject: Integrated: 8323584: AArch64: Unnecessary ResourceMark in NativeCall::set_destination_mt_safe In-Reply-To: References: Message-ID: <6ad0cq3ynz0AER3sPkDfNPpCv6ihV2opoHhdS4CjL4U=.a2324876-bd50-4f59-b16e-7df876fa44eb@github.com> On Thu, 11 Jan 2024 12:17:21 GMT, Aleksey Shipilev wrote: > Was looking at `ICBuffer` cleaning paths that run at safepoint. In `NativeCall::set_destination_mt_safe`, there is a `ResourceMark` that does not seem to have any purpose: no code in its scope uses resource allocation. There is adjacent dead code too. Looks like a development/debugging leftover. > > Additional testing: > - [x] Linux x86_64 AArch64 server fastdebug, `tier{1,2,3}` This pull request has now been integrated. Changeset: 34f85ee9 Author: Aleksey Shipilev URL: https://git.openjdk.org/jdk/commit/34f85ee94e8b45bcebbf8ba52a38c92a7185b54a Stats: 3 lines in 1 file changed: 0 ins; 3 del; 0 mod 8323584: AArch64: Unnecessary ResourceMark in NativeCall::set_destination_mt_safe Reviewed-by: tholenstein, fyang, aph ------------- PR: https://git.openjdk.org/jdk/pull/17372 From chagedorn at openjdk.org Mon Jan 15 12:15:20 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 15 Jan 2024 12:15:20 GMT Subject: RFR: 8323101: C2: assert(n->in(0) == nullptr) failed: divisions with zero check should already have bailed out earlier in split-if In-Reply-To: <7RQTgz8ZyoAIk6gpdZgDCFsqIBA_WK6IJ6OkHFLg_ts=.da8bb3a9-674c-4b2e-863f-0b2afcb37f34@github.com> References: <7RQTgz8ZyoAIk6gpdZgDCFsqIBA_WK6IJ6OkHFLg_ts=.da8bb3a9-674c-4b2e-863f-0b2afcb37f34@github.com> Message-ID: <-4OsvT1XAkR-2dFkAT7oiaQwdntUYJ2HLx6LukGGovM=.757551b8-4a15-4488-bd66-ac3b8ec35973@github.com> On Fri, 12 Jan 2024 10:52:55 GMT, Christian Hagedorn wrote: > The assertion added by 
[JDK-8299259](https://bugs.openjdk.org/browse/JDK-8299259) is wrong. I've originally assumed that at this point, there are no pinned `Div/Mod` nodes anymore that we possibly want to split through a phi due to bailing out earlier here: > https://github.com/openjdk/jdk/blob/3e19bf88d5b51fe10c183f930b99bce961a368c1/src/hotspot/share/opto/loopopts.cpp#L1137-L1140 > > The assumption was that a `Div/Mod` node with a control input to the zero-check `IfProj` could only have a `Phi` input with a `Region` that is further up in the graph. This is normally true. However, in `testIntDiv()`, we split an `If` with `do_split_if()` and need to empty the basic block. We split the store `iFld = sub` up. This includes the `StoreI` as well as the `AddI` which also has the `Region` as current `ctrl` (i.e. returned with `get_ctrl()`). > > The `AddI` also has the `DivI` as output which, however, is not pushed up since it's not part of the same basic block. We end up with the following graph after the completion of `do_split_if()`: > > ![image](https://github.com/openjdk/jdk/assets/17833009/b76293e1-a593-4319-b026-254be3c098fc) > > The `DivI` now has `252 Phi` as input which merges the split `AddI` nodes. `246 Region` of the `252 Phi` is further down than the control input `83 IfTrue` of the `DivI`.This is rather unusual and thus was missed when the assert was added in JDK-8299259. When finally processing the `DivI` node in the DFS walk of Split-If, we fail with the assertion. > > The fix is straight forward to turn this assert into a simple bailout: We should not split a `Div/Mod` node that is pinned (i.e. has a zero check). > > While working on this bug, I've also tried to trigger the assert with `DivL/ModL` nodes. However, this did not work because `split_up()` does not split the `Add` node up. The reason is that we set late ctrl to early ctrl for the `DivL/ModL` node (and thus also set the same late ctrl for the `Add` node) while we do not do that for `DivI/ModI` nodes. 
It seems that we miss to treat `DivL/ModL` nodes as unpinned here which would allow us to set a later ctrl: > https://github.com/openjdk/jdk/blob/82a63a03c0155288e8e43b9f766c8be70be50b6a/src/hotspot/share/opto/loopnode.cpp#L6091-L6101 > > I've done some digging and found that `DivL/ModL` nodes were added after this switch statment. So, I assume we simply forgot to also treat them as unpinned here. It's not wrong but I think just an unnecessary limitation. I filed [JDK-8323652](https://bugs.openjdk.org/browse/JDK-... Thanks Tobias for your review! ------------- PR Comment: https://git.openjdk.org/jdk/pull/17394#issuecomment-1892053925 From chagedorn at openjdk.org Mon Jan 15 12:19:27 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 15 Jan 2024 12:19:27 GMT Subject: Integrated: 8323101: C2: assert(n->in(0) == nullptr) failed: divisions with zero check should already have bailed out earlier in split-if In-Reply-To: <7RQTgz8ZyoAIk6gpdZgDCFsqIBA_WK6IJ6OkHFLg_ts=.da8bb3a9-674c-4b2e-863f-0b2afcb37f34@github.com> References: <7RQTgz8ZyoAIk6gpdZgDCFsqIBA_WK6IJ6OkHFLg_ts=.da8bb3a9-674c-4b2e-863f-0b2afcb37f34@github.com> Message-ID: On Fri, 12 Jan 2024 10:52:55 GMT, Christian Hagedorn wrote: > The assertion added by [JDK-8299259](https://bugs.openjdk.org/browse/JDK-8299259) is wrong. I've originally assumed that at this point, there are no pinned `Div/Mod` nodes anymore that we possibly want to split through a phi due to bailing out earlier here: > https://github.com/openjdk/jdk/blob/3e19bf88d5b51fe10c183f930b99bce961a368c1/src/hotspot/share/opto/loopopts.cpp#L1137-L1140 > > The assumption was that a `Div/Mod` node with a control input to the zero-check `IfProj` could only have a `Phi` input with a `Region` that is further up in the graph. This is normally true. However, in `testIntDiv()`, we split an `If` with `do_split_if()` and need to empty the basic block. We split the store `iFld = sub` up. 
This includes the `StoreI` as well as the `AddI` which also has the `Region` as current `ctrl` (i.e. returned with `get_ctrl()`). > > The `AddI` also has the `DivI` as output which, however, is not pushed up since it's not part of the same basic block. We end up with the following graph after the completion of `do_split_if()`: > > ![image](https://github.com/openjdk/jdk/assets/17833009/b76293e1-a593-4319-b026-254be3c098fc) > > The `DivI` now has `252 Phi` as input which merges the split `AddI` nodes. `246 Region` of the `252 Phi` is further down than the control input `83 IfTrue` of the `DivI`.This is rather unusual and thus was missed when the assert was added in JDK-8299259. When finally processing the `DivI` node in the DFS walk of Split-If, we fail with the assertion. > > The fix is straight forward to turn this assert into a simple bailout: We should not split a `Div/Mod` node that is pinned (i.e. has a zero check). > > While working on this bug, I've also tried to trigger the assert with `DivL/ModL` nodes. However, this did not work because `split_up()` does not split the `Add` node up. The reason is that we set late ctrl to early ctrl for the `DivL/ModL` node (and thus also set the same late ctrl for the `Add` node) while we do not do that for `DivI/ModI` nodes. It seems that we miss to treat `DivL/ModL` nodes as unpinned here which would allow us to set a later ctrl: > https://github.com/openjdk/jdk/blob/82a63a03c0155288e8e43b9f766c8be70be50b6a/src/hotspot/share/opto/loopnode.cpp#L6091-L6101 > > I've done some digging and found that `DivL/ModL` nodes were added after this switch statment. So, I assume we simply forgot to also treat them as unpinned here. It's not wrong but I think just an unnecessary limitation. I filed [JDK-8323652](https://bugs.openjdk.org/browse/JDK-... This pull request has now been integrated. 
Changeset: 7e0a4ed6 Author: Christian Hagedorn URL: https://git.openjdk.org/jdk/commit/7e0a4ed6292586772c23292dbdd67ed1db5c12f7 Stats: 215 lines in 2 files changed: 214 ins; 0 del; 1 mod 8323101: C2: assert(n->in(0) == nullptr) failed: divisions with zero check should already have bailed out earlier in split-if Reviewed-by: kvn, thartmann ------------- PR: https://git.openjdk.org/jdk/pull/17394 From chagedorn at openjdk.org Mon Jan 15 12:41:38 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 15 Jan 2024 12:41:38 GMT Subject: [jdk22] RFR: 8323101: C2: assert(n->in(0) == nullptr) failed: divisions with zero check should already have bailed out earlier in split-if Message-ID: Hi all, This pull request contains a backport of commit [7e0a4ed6](https://github.com/openjdk/jdk/commit/7e0a4ed6292586772c23292dbdd67ed1db5c12f7) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository. The commit being backported was authored by Christian Hagedorn on 15 Jan 2024 and was reviewed by Vladimir Kozlov and Tobias Hartmann. Thanks! 
------------- Commit messages: - Backport 7e0a4ed6292586772c23292dbdd67ed1db5c12f7 Changes: https://git.openjdk.org/jdk22/pull/76/files Webrev: https://webrevs.openjdk.org/?repo=jdk22&pr=76&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8323101 Stats: 215 lines in 2 files changed: 214 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk22/pull/76.diff Fetch: git fetch https://git.openjdk.org/jdk22.git pull/76/head:pull/76 PR: https://git.openjdk.org/jdk22/pull/76 From shade at openjdk.org Mon Jan 15 12:47:20 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 15 Jan 2024 12:47:20 GMT Subject: RFR: 8323122: AArch64: Increase itable stub size estimate In-Reply-To: References: Message-ID: <8OFy0pwaz3FzYTpOBkiGBC8OfpiXc-xNBgtALoPDcwg=.93064678-382e-444a-87cf-2625b418ce96@github.com> On Wed, 10 Jan 2024 02:03:08 GMT, Yude Lin wrote: > Since [JDK-8307352](https://bugs.openjdk.org/browse/JDK-8307352), itable stub size has grown by 20 bytes on linux-aarch64. In particular, the "slop-counted" code increases from 100->120 bytes, where the current estimate is 124 bytes. I haven't found a case where it exceeds the estimate. For now this size is stable across the few linux-aarch64 configurations I ran with. It doesn't vary (for example) by different klass decoding schemes. But I think the idea of the estimate is that we can never know. I propose we increase the estimate to be safe. > > Passed hotspot/jtreg/:tier1 I think the patch is fine, but @eastig should also take a look. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/17336#issuecomment-1892110677 From thartmann at openjdk.org Mon Jan 15 13:25:22 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 15 Jan 2024 13:25:22 GMT Subject: [jdk22] RFR: 8323101: C2: assert(n->in(0) == nullptr) failed: divisions with zero check should already have bailed out earlier in split-if In-Reply-To: References: Message-ID: On Mon, 15 Jan 2024 12:34:56 GMT, Christian Hagedorn wrote: > Hi all, > > This pull request contains a backport of commit [7e0a4ed6](https://github.com/openjdk/jdk/commit/7e0a4ed6292586772c23292dbdd67ed1db5c12f7) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository. > > The commit being backported was authored by Christian Hagedorn on 15 Jan 2024 and was reviewed by Vladimir Kozlov and Tobias Hartmann. > > Thanks! Looks good. ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk22/pull/76#pullrequestreview-1821680214 From aturbanov at openjdk.org Mon Jan 15 13:30:23 2024 From: aturbanov at openjdk.org (Andrey Turbanov) Date: Mon, 15 Jan 2024 13:30:23 GMT Subject: RFR: 8320649: C2: Optimize scoped values [v4] In-Reply-To: References: Message-ID: <-SHK5KpqFJ7_6UezPIKgCGADJC58fcvt9gFn2jHMHNY=.dad87280-3d86-46fb-88d9-ef667657d107@github.com> On Thu, 4 Jan 2024 08:12:22 GMT, Roland Westrelin wrote: >> This change implements C2 optimizations for calls to >> ScopedValue.get(). Indeed, in: >> >> >> v1 = scopedValue.get(); >> ... >> v2 = scopedValue.get(); >> >> >> `v2` can be replaced by `v1` and the second call to `get()` can be >> optimized out. That's true whatever is between the 2 calls unless a >> new mapping for `scopedValue` is created in between (when that happens >> no optimizations is performed for the method being compiled). Hoisting >> a `get()` call out of loop for a loop invariant `scopedValue` should >> also be legal in most cases. 
>> >> `ScopedValue.get()` is implemented in Java code as a 2 step process. A >> cache is attached to the current thread object. If the `ScopedValue` >> object is in the cache then the result from `get()` is read from >> there. Otherwise a slow call is performed that also inserts the >> mapping in the cache. The cache itself is lazily allocated. One >> `ScopedValue` can be hashed to 2 different indexes in the cache. On a >> cache probe, both indexes are checked. As a consequence, the process >> of probing the cache is a multi step process (check if the cache is >> present, check first index, check second index if first index >> failed). If the cache is populated early on, then when the method that >> calls `ScopedValue.get()` is compiled, profile reports the slow path >> as never taken and only the read from the cache is compiled. >> >> To perform the optimizations, I added 3 new node types to C2: >> >> - the pair >> ScopedValueGetHitsInCacheNode/ScopedValueGetLoadFromCacheNode for >> the cache probe >> >> - a cfg node ScopedValueGetResultNode to help locate the result of the >> `get()` call in the IR graph. >> >> In pseudo code, once the nodes are inserted, the code of a `get()` is: >> >> >> hits_in_the_cache = ScopedValueGetHitsInCache(scopedValue) >> if (hits_in_the_cache) { >> res = ScopedValueGetLoadFromCache(hits_in_the_cache); >> } else { >> res = ..; //slow call possibly inlined. Subgraph can be arbitrarily complex >> } >> res = ScopedValueGetResult(res) >> >> >> In the snippet: >> >> >> v1 = scopedValue.get(); >> ... >> v2 = scopedValue.get(); >> >> >> Replacing `v2` by `v1` is then done by starting from the >> `ScopedValueGetResult` node for the second `get()` and looking for a >> dominating `ScopedValueGetResult` for the same `ScopedValue` >> object. When one is found, it is used as a replacement. Eliminating >> the second `get()` call is achieved by making >> `ScopedValueGetHitsInCache` always successful if there's a dominating >> `Scoped...
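For readers following the quoted description, the two-step lookup (lazily allocated cache, two candidate indexes, slow call on a miss that also fills the cache) can be modelled roughly as below. This is only a sketch under stated assumptions: `TwoProbeCacheSketch`, `SLOTS`, `slowCalls` and the hashing scheme are hypothetical names and choices made for illustration, and the cache is per-instance here rather than attached to the current thread. It is not the JDK's actual `ScopedValue` implementation.

```java
import java.util.function.Function;

// Hypothetical model of a lazily allocated cache in which each key can
// live at one of two indexes. Not the real ScopedValue cache.
final class TwoProbeCacheSketch {
    private static final int SLOTS = 16;   // power of two (assumption)
    private Object[] cache;                // [key, value] pairs, allocated on first miss
    int slowCalls;                         // counts how often the slow path ran

    Object get(Object key, Function<Object, Object> slowPath) {
        int h = key.hashCode();
        int first = h & (SLOTS - 1);             // first candidate index
        int second = (h >>> 4) & (SLOTS - 1);    // second candidate index
        Object[] c = cache;
        if (c != null) {                                           // step 1: is the cache present?
            if (c[first * 2] == key) return c[first * 2 + 1];      // step 2: probe first index
            if (c[second * 2] == key) return c[second * 2 + 1];    // step 3: probe second index
        }
        // Slow path: compute the value and insert the mapping into the cache.
        slowCalls++;
        Object value = slowPath.apply(key);
        if (c == null) {
            c = cache = new Object[SLOTS * 2];
        }
        c[first * 2] = key;
        c[first * 2 + 1] = value;
        return value;
    }

    public static void main(String[] args) {
        TwoProbeCacheSketch cache = new TwoProbeCacheSketch();
        Object key = new Object();
        Object v1 = cache.get(key, k -> "bound value");
        Object v2 = cache.get(key, k -> "never computed");
        System.out.println(v1 == v2);          // true: second call hits the cache
        System.out.println(cache.slowCalls);   // 1
    }
}
```

In this model the second `get()` for the same key never reaches the slow path, which mirrors the situation in which the profile reports the slow path as never taken.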
> > Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: > > merge fix test/hotspot/jtreg/compiler/c2/irTests/TestScopedValue.java line 59: > 57: List tests = List.of("testFastPath1", "testFastPath2", "testFastPath3", "testFastPath5", > 58: "testFastPath6", "testFastPath7", "testFastPath8", "testFastPath9", "testFastPath10", > 59: "testFastPath11", "testFastPath12", "testFastPath13","testFastPath14", Suggestion: "testFastPath11", "testFastPath12", "testFastPath13", "testFastPath14", ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1452377677 From epeter at openjdk.org Mon Jan 15 13:47:50 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 15 Jan 2024 13:47:50 GMT Subject: RFR: 8320175: [BACKOUT] 8316533: C2 compilation fails with assert(verify(phase)) failed: missing Value() optimization Message-ID: This reverts commit b5c863b772603b3fbf159d2bd3f6d1caffaff16a. [JDK-8316533](https://bugs.openjdk.org/browse/JDK-8316533). The patch fixed a bug in verification, but created a regression which will take more time to investigate and fix. Hence the backout for now. Testing: running... 
------------- Commit messages: - 8320175 Changes: https://git.openjdk.org/jdk/pull/17425/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=17425&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8320175 Stats: 76 lines in 2 files changed: 0 ins; 76 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/17425.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17425/head:pull/17425 PR: https://git.openjdk.org/jdk/pull/17425 From chagedorn at openjdk.org Mon Jan 15 14:03:22 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 15 Jan 2024 14:03:22 GMT Subject: [jdk22] RFR: 8323101: C2: assert(n->in(0) == nullptr) failed: divisions with zero check should already have bailed out earlier in split-if In-Reply-To: References: Message-ID: On Mon, 15 Jan 2024 12:34:56 GMT, Christian Hagedorn wrote: > Hi all, > > This pull request contains a backport of commit [7e0a4ed6](https://github.com/openjdk/jdk/commit/7e0a4ed6292586772c23292dbdd67ed1db5c12f7) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository. > > The commit being backported was authored by Christian Hagedorn on 15 Jan 2024 and was reviewed by Vladimir Kozlov and Tobias Hartmann. > > Thanks! Thanks Tobias for your review! ------------- PR Comment: https://git.openjdk.org/jdk22/pull/76#issuecomment-1892230846 From thartmann at openjdk.org Mon Jan 15 14:05:18 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 15 Jan 2024 14:05:18 GMT Subject: RFR: 8320175: [BACKOUT] 8316533: C2 compilation fails with assert(verify(phase)) failed: missing Value() optimization In-Reply-To: References: Message-ID: On Mon, 15 Jan 2024 13:39:29 GMT, Emanuel Peter wrote: > This reverts commit b5c863b772603b3fbf159d2bd3f6d1caffaff16a. [JDK-8316533](https://bugs.openjdk.org/browse/JDK-8316533). > > The patch fixed a bug in verification, but created a regression which will take more time to investigate and fix. Hence the backout for now. > > Testing: running... Looks good and trivial. 
------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/17425#pullrequestreview-1821748733 From chagedorn at openjdk.org Mon Jan 15 14:38:24 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 15 Jan 2024 14:38:24 GMT Subject: RFR: 8320175: [BACKOUT] 8316533: C2 compilation fails with assert(verify(phase)) failed: missing Value() optimization In-Reply-To: References: Message-ID: On Mon, 15 Jan 2024 13:39:29 GMT, Emanuel Peter wrote: > This reverts commit b5c863b772603b3fbf159d2bd3f6d1caffaff16a. [JDK-8316533](https://bugs.openjdk.org/browse/JDK-8316533). > > The patch fixed a bug in verification, but created a regression which will take more time to investigate and fix. Hence the backout for now. > > Testing: running... Looks good to me, too. ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/17425#pullrequestreview-1821808702 From epeter at openjdk.org Mon Jan 15 14:42:30 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 15 Jan 2024 14:42:30 GMT Subject: RFR: 8318650: Optimized subword gather for x86 targets. [v10] In-Reply-To: References: Message-ID: On Mon, 1 Jan 2024 14:36:06 GMT, Jatin Bhateja wrote: >> Hi All, >> >> This patch optimizes sub-word gather operation for x86 targets with AVX2 and AVX512 features. >> >> Following is the summary of changes:- >> >> 1) Intrinsify sub-word gather using hybrid algorithm which initially partially unrolls scalar loop to accumulates values from gather indices into a quadword(64bit) slice followed by vector permutation to place the slice into appropriate vector lanes, it prevents code bloating and generates compact JIT sequence. This coupled with savings from expansive array allocation in existing java implementation translates into significant performance of 1.5-10x gains with included micro on Intel Atom family CPUs and with JVM option UseAVX=2. 
>> >> ![image](https://github.com/openjdk/jdk/assets/59989778/e25ba4ad-6a61-42fa-9566-452f741a9c6d) >> >> >> 2) For AVX512 targets algorithm uses integral gather instructions to load values from normalized indices which are multiple of integer size, followed by shuffling and packing exact sub-word values from integral lanes. >> >> 3) Patch was also compared against modified java fallback implementation by replacing temporary array allocation with zero initialized vector and a scalar loops which inserts gathered values into vector. But, vector insert operation in higher vector lanes is a three step process which first extracts the upper vector 128 bit lane, updates it with gather subword value and then inserts the lane back to its original position. This makes inserts into higher order lanes costly w.r.t to proposed solution. In addition generated JIT code for modified fallback implementation was very bulky. This may impact in-lining decisions into caller contexts. >> >> Kindly review and share your feedback. >> >> Best Regards, >> Jatin > > Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 12 commits: > > - Accelerating masked sub-word gathers for AVX2 targets, this gives additional 1.5-4x speedups over existing implementation. > - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8318650 > - Removing JDK-8321648 related changes. > - Refined AVX3 implementation with integral gather. > - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8318650 > - Fix incorrect comment > - Review comments resolutions. > - Review comments resolutions. > - Review comments resolutions. > - Restricting masked sub-word gather to AVX512 target to align with integral gather support. > - ... and 2 more: https://git.openjdk.org/jdk/compare/518ec971...de47076e Just had a quick look at this. Is there any support for gather with different indices for each element in the vector? 
src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 1627: > 1625: vpsrlvd(dst, dst, xtmp, vlen_enc); > 1626: // Pack double word vector into byte vector. > 1627: vpackI2X(T_BYTE, dst, ones, xtmp, vlen_enc); I would prefer if there was less code duplication here. I think there are just a few values which you could set to variables, and then apply for both versions. src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 1634: > 1632: Register offset, XMMRegister offset_vec, XMMRegister idx_vec, > 1633: XMMRegister xtmp1, XMMRegister xtmp2, XMMRegister xtmp3, KRegister mask, > 1634: KRegister gmask, int vlen_enc, int vlen) { Would you mind giving a quick summary of what the input registers are and what exactly this method does? Why do we need to call `vgather_subword_avx3` so many times (`lane_count_subwords`)? src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 1757: > 1755: for (int i = 0; i < 4; i++) { > 1756: movl(rtmp, Address(idx_base, i * 4)); > 1757: pinsrw(dst, Address(base, rtmp, Address::times_2), i); Do I understand this right that you are basically doing this? `dst[i*4 .. i*4 + 3] = load_8bytes(base + (idx_base + i * 4) * 2)` But this does not look like a gather, rather like 4 adjacent loads that pack the data together into a single 8*4 byte vector. If so, maybe you should either leave a comment, or even rename the method. src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 1776: > 1774: for (int i = 0; i < 4; i++) { > 1775: movl(rtmp, Address(idx_base, i * 4)); > 1776: addl(rtmp, offset); Can the `offset` not be added to `idx_base` before the loop? src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 1900: > 1898: vgather8b(elem_ty, xtmp3, base, idx_base, rtmp, vlen_enc); > 1899: } else { > 1900: LP64_ONLY(vgather8b_masked(elem_ty, xtmp3, base, idx_base, mask, midx, rtmp, vlen_enc)); What happens if not `LP64_ONLY`? ------------- Changes requested by epeter (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/16354#pullrequestreview-1821723578 PR Review Comment: https://git.openjdk.org/jdk/pull/16354#discussion_r1452399791 PR Review Comment: https://git.openjdk.org/jdk/pull/16354#discussion_r1452425355 PR Review Comment: https://git.openjdk.org/jdk/pull/16354#discussion_r1452440206 PR Review Comment: https://git.openjdk.org/jdk/pull/16354#discussion_r1452441071 PR Review Comment: https://git.openjdk.org/jdk/pull/16354#discussion_r1452443784 From epeter at openjdk.org Mon Jan 15 14:42:32 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 15 Jan 2024 14:42:32 GMT Subject: RFR: 8318650: Optimized subword gather for x86 targets. [v10] In-Reply-To: References: Message-ID: On Mon, 15 Jan 2024 14:25:28 GMT, Emanuel Peter wrote: >> Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 12 commits: >> >> - Accelerating masked sub-word gathers for AVX2 targets, this gives additional 1.5-4x speedups over existing implementation. >> - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8318650 >> - Removing JDK-8321648 related changes. >> - Refined AVX3 implementation with integral gather. >> - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8318650 >> - Fix incorrect comment >> - Review comments resolutions. >> - Review comments resolutions. >> - Review comments resolutions. >> - Restricting masked sub-word gather to AVX512 target to align with integral gather support. >> - ... and 2 more: https://git.openjdk.org/jdk/compare/518ec971...de47076e > > src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 1776: > >> 1774: for (int i = 0; i < 4; i++) { >> 1775: movl(rtmp, Address(idx_base, i * 4)); >> 1776: addl(rtmp, offset); > > Can the `offset` not be added to `idx_base` before the loop? Or would that require too many registers? 
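To make the hoisting question concrete, here is a rough scalar model in Java of the quoted loop. It assumes that `idx_base` addresses an `int` index array, that `offset` is added to each loaded index value, and that the loop then inserts a 16-bit element as in the unmasked variant quoted at line 1757. `OffsetHoistSketch` and its method names are hypothetical. Under these assumptions the offset applies to the loaded index, not to the address of the index array, so the hoisted form folds it into the source base instead.

```java
import java.util.Arrays;

// Hypothetical scalar model of the quoted loop; not the HotSpot code.
final class OffsetHoistSketch {
    // Per-iteration form: the offset is added to every loaded index.
    static short[] gatherPerIteration(short[] src, int[] idx, int offset) {
        short[] dst = new short[idx.length];
        for (int i = 0; i < idx.length; i++) {
            int index = idx[i];     // models movl(rtmp, Address(idx_base, i * 4))
            index += offset;        // models addl(rtmp, offset)
            dst[i] = src[index];    // models pinsrw(dst, Address(base, rtmp, times_2), i)
        }
        return dst;
    }

    // Hoisted form: the offset is folded into the source base once.
    // In machine code the remaining add would be absorbed by the addressing mode.
    static short[] gatherHoisted(short[] src, int[] idx, int offset) {
        int shiftedBase = offset;   // models base + offset * 2, computed before the loop
        short[] dst = new short[idx.length];
        for (int i = 0; i < idx.length; i++) {
            dst[i] = src[shiftedBase + idx[i]];
        }
        return dst;
    }

    public static void main(String[] args) {
        short[] src = {100, 101, 102, 103, 104, 105, 106, 107};
        int[] idx = {0, 2, 1, 3};
        System.out.println(Arrays.toString(gatherPerIteration(src, idx, 4))); // [104, 106, 105, 107]
        System.out.println(Arrays.toString(gatherHoisted(src, idx, 4)));      // [104, 106, 105, 107]
    }
}
```

Whether the hoisted form is cheaper in the real stub depends on whether a spare register is available for the shifted base, which this sketch does not model.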
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16354#discussion_r1452453827 From dlunden at openjdk.org Mon Jan 15 15:20:40 2024 From: dlunden at openjdk.org (Daniel Lundén) Date: Mon, 15 Jan 2024 15:20:40 GMT Subject: RFR: 8291809: Convert compiler/c2/cr7200264/TestSSE2IntVect.java to IR verification test Message-ID: This changeset translates the tests in `compiler/c2/cr7200264/` to use the IR verification framework. The proposed translation, to the extent possible, attempts to preserve the semantics of the original tests. We may also want to refactor the tests to better make use of the various features of the IR verification framework. The proposed translated tests take approximately twice as long to run compared to the original tests. Testing: - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/7528802846) - Ran the new translated tests within all tier1 through tier10 contexts on windows-x64, linux-x64, linux-aarch64, macosx-x64, and macosx-aarch64. - Tested that manually adding `-XX:LoopUnrollLimit=0` to the test framework flags caused the translated tests to fail. Note: it is, however, no longer possible to break the tests by passing `-XX:LoopUnrollLimit=0` on the command line.
------------- Commit messages: - Remove TestDriver - Readd verification - Finalize changes - Use static initialization block - Experiments - Naive translation complete - First attempt Changes: https://git.openjdk.org/jdk/pull/17428/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=17428&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8291809 Stats: 872 lines in 4 files changed: 169 ins; 240 del; 463 mod Patch: https://git.openjdk.org/jdk/pull/17428.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17428/head:pull/17428 PR: https://git.openjdk.org/jdk/pull/17428 From eastigeevich at openjdk.org Mon Jan 15 15:55:22 2024 From: eastigeevich at openjdk.org (Evgeny Astigeevich) Date: Mon, 15 Jan 2024 15:55:22 GMT Subject: RFR: 8323122: AArch64: Increase itable stub size estimate In-Reply-To: References: Message-ID: On Wed, 10 Jan 2024 02:03:08 GMT, Yude Lin wrote: > Since [JDK-8307352](https://bugs.openjdk.org/browse/JDK-8307352), itable stub size has grown by 20 bytes on linux-aarch64. In particular, the "slop-counted" code increases from 100->120 bytes, where the current estimate is 124 bytes. I haven't found a case where it exceeds the estimate. For now this size is stable across the few linux-aarch64 configurations I ran with. It doesn't vary (for example) by different klass decoding schemes. But I think the idea of the estimate is that we can never know. I propose we increase the estimate to be safe. > > Passed hotspot/jtreg/:tier1 lgtm ------------- Marked as reviewed by eastigeevich (Committer). 
PR Review: https://git.openjdk.org/jdk/pull/17336#pullrequestreview-1821943502 From tholenstein at openjdk.org Mon Jan 15 15:55:22 2024 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Mon, 15 Jan 2024 15:55:22 GMT Subject: RFR: JDK-8316756: C2 EA fails with "missing memory path" when encountering unsafe_arraycopy stub call [v3] In-Reply-To: References: Message-ID: On Fri, 12 Jan 2024 17:21:35 GMT, Vladimir Kozlov wrote: > > > > Flag setting (StoreB nodes) in JavaThread::_doing_unsafe_access is also not affected but it is volatile field and these stores should be staying where they are. They can't go up or down. > > > > > > > > > In a private discussion with @tobiasholenstein, I proposed to move those stores into the corresponding stub. It would complicate the implementation a bit (platform-specific vs cross-platform implementation), but simplify things on IR level. > > > > > > So you think we should go for that solution instead of this fix? > > Yes. You may still need to fix EA to recognize RAW memory for `unsafe_arraycopy`. I applied your patch for EA and added the test. Thanks! tier1-4 pass ------------- PR Comment: https://git.openjdk.org/jdk/pull/17347#issuecomment-1892420763 From eastigeevich at openjdk.org Mon Jan 15 16:12:21 2024 From: eastigeevich at openjdk.org (Evgeny Astigeevich) Date: Mon, 15 Jan 2024 16:12:21 GMT Subject: RFR: 8323122: AArch64: Increase itable stub size estimate In-Reply-To: References: Message-ID: On Mon, 15 Jan 2024 06:14:39 GMT, Yude Lin wrote: >> Marked as reviewed by aph (Reviewer). > > @theRealAph Do you mind sponsoring this change? Thank you. @linade This is not the first time estimates are updated. JDK-8207343 was implemented to automate vtable/itable stub size calculation. I think it needs improvements. I created JDK-8323741. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/17336#issuecomment-1892447408 From jvernee at openjdk.org Mon Jan 15 17:17:33 2024 From: jvernee at openjdk.org (Jorn Vernee) Date: Mon, 15 Jan 2024 17:17:33 GMT Subject: RFR: 8323651: compiler/c2/irTests/TestPrunedExHandler.java fails with -XX:+DeoptimizeALot Message-ID: <4zl1kION6ggeCNof1I8FBIqNKjpfxlUl23iT4_9XJPs=.8e9194a8-4da2-4771-94b6-1c94e6146cbc@github.com> This test can not work with `-XX:+DeoptimizeALot`. Parts of it depend on a particular sequence of compilation and deoptimization, so if DeoptimizeALot deoptimizes things prematurely, the test can fail. This PR adds `@requires vm.opt.DeoptimizeALot != true` to the test so that it is skipped when `-XX:+DeoptimizeALot` is used. ------------- Commit messages: - disable TestPrunedExHandler when running with -XX:+DeoptimizeALot Changes: https://git.openjdk.org/jdk/pull/17432/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=17432&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8323651 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/17432.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17432/head:pull/17432 PR: https://git.openjdk.org/jdk/pull/17432 From alanb at openjdk.org Mon Jan 15 17:29:19 2024 From: alanb at openjdk.org (Alan Bateman) Date: Mon, 15 Jan 2024 17:29:19 GMT Subject: RFR: 8323651: compiler/c2/irTests/TestPrunedExHandler.java fails with -XX:+DeoptimizeALot In-Reply-To: <4zl1kION6ggeCNof1I8FBIqNKjpfxlUl23iT4_9XJPs=.8e9194a8-4da2-4771-94b6-1c94e6146cbc@github.com> References: <4zl1kION6ggeCNof1I8FBIqNKjpfxlUl23iT4_9XJPs=.8e9194a8-4da2-4771-94b6-1c94e6146cbc@github.com> Message-ID: <4BnQnkFssKN2uiH6QTXLRpTu0akZjbfQOVenE-Ok3tk=.e8935fb8-740c-4714-8630-37bc6e606a6a@github.com> On Mon, 15 Jan 2024 16:58:33 GMT, Jorn Vernee wrote: > This test can not work with `-XX:+DeoptimizeALot`. 
Parts of it depend on a particular sequence of compilation and deoptimization, so if DeoptimizeALot deoptimizes things prematurely, the test can fail. > > This PR adds `@requires vm.opt.DeoptimizeALot != true` to the test so that it is skipped when `-XX:+DeoptimizeALot` is used. Thanks, this test runs in the loom with DeoptimizeALot so we've had to exclude it. I assume you'll bump the copyright date before integrating. ------------- Marked as reviewed by alanb (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/17432#pullrequestreview-1822073572 From jvernee at openjdk.org Mon Jan 15 17:54:33 2024 From: jvernee at openjdk.org (Jorn Vernee) Date: Mon, 15 Jan 2024 17:54:33 GMT Subject: RFR: 8323651: compiler/c2/irTests/TestPrunedExHandler.java fails with -XX:+DeoptimizeALot [v2] In-Reply-To: <4zl1kION6ggeCNof1I8FBIqNKjpfxlUl23iT4_9XJPs=.8e9194a8-4da2-4771-94b6-1c94e6146cbc@github.com> References: <4zl1kION6ggeCNof1I8FBIqNKjpfxlUl23iT4_9XJPs=.8e9194a8-4da2-4771-94b6-1c94e6146cbc@github.com> Message-ID: > This test can not work with `-XX:+DeoptimizeALot`. Parts of it depend on a particular sequence of compilation and deoptimization, so if DeoptimizeALot deoptimizes things prematurely, the test can fail. > > This PR adds `@requires vm.opt.DeoptimizeALot != true` to the test so that it is skipped when `-XX:+DeoptimizeALot` is used. 
Jorn Vernee has updated the pull request incrementally with one additional commit since the last revision: Bump copyright year ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17432/files - new: https://git.openjdk.org/jdk/pull/17432/files/956a4f78..7210d696 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17432&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17432&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/17432.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17432/head:pull/17432 PR: https://git.openjdk.org/jdk/pull/17432 From kvn at openjdk.org Mon Jan 15 19:19:19 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 15 Jan 2024 19:19:19 GMT Subject: RFR: 8323651: compiler/c2/irTests/TestPrunedExHandler.java fails with -XX:+DeoptimizeALot [v2] In-Reply-To: References: <4zl1kION6ggeCNof1I8FBIqNKjpfxlUl23iT4_9XJPs=.8e9194a8-4da2-4771-94b6-1c94e6146cbc@github.com> Message-ID: On Mon, 15 Jan 2024 17:54:33 GMT, Jorn Vernee wrote: >> This test can not work with `-XX:+DeoptimizeALot`. Parts of it depend on a particular sequence of compilation and deoptimization, so if DeoptimizeALot deoptimizes things prematurely, the test can fail. >> >> This PR adds `@requires vm.opt.DeoptimizeALot != true` to the test so that it is skipped when `-XX:+DeoptimizeALot` is used. > > Jorn Vernee has updated the pull request incrementally with one additional commit since the last revision: > > Bump copyright year Looks good. ------------- Marked as reviewed by kvn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/17432#pullrequestreview-1822170365 From kvn at openjdk.org Mon Jan 15 19:27:19 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 15 Jan 2024 19:27:19 GMT Subject: RFR: JDK-8316756: C2 EA fails with "missing memory path" when encountering unsafe_arraycopy stub call [v3] In-Reply-To: References: Message-ID: <6_AwJfpSEdDqzEPpq2ns9JeC9cPmIjURIcbxAijlE4Y=.8ea0ba42-e1da-42d1-9b34-d9a89d149a02@github.com> On Mon, 15 Jan 2024 11:58:57 GMT, Tobias Holenstein wrote: >> Before https://github.com/openjdk/jdk/pull/5259 the graph of the following program looked like this during Escape Analysis: >> >> >> static int test() { >> MyClass obj = new MyClass(); // Non-escaping to trigger Escape Analysis >> UNSAFE.copyMemory(null, SRC_BASE, null, DST_BASE, 4); >> obj.x = 42; >> return obj.x; >> } >> >> With MemBarCPUOrder: >> [image: working] >> >> Setting `RC_NARROW_MEM` flag in `LibraryCallKit::inline_unsafe_copyMemory()` removes the `428 MergeMem` node. Without a `MergeMem` node after `432 StoreB` EA fails in Phase 2 of `ConnectionGraph::split_unique_types(...)` when trying to push the allocation's users on the appropriate worklist - `429 CallLeafNoFP` is not an expected user of `428 StoreB`. Therefore the assert `"EA: missing memory path"` is hit. >> Without MemBarCPUOrder and after setting `RC_NARROW_MEM`: >> [image: failing] >> >> >> ### Proposed Fix >> Dropping the `RC_NARROW_MEM` flag in `LibraryCallKit::inline_unsafe_copyMemory()` causes the introduction of a `MergeMem` between `StoreB` and `CallLeafNoFP`, so the corresponding code in EA doesn't encounter a `CallLeafNoFP` anymore: >> [image: fixed] >> >> Testing: Tier1-4 passed > > Tobias Holenstein has updated the pull request incrementally with one additional commit since the last revision: > > added testcase Good. Did you file RFE for StoreB move? ------------- Marked as reviewed by kvn (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/17347#pullrequestreview-1822175931 PR Comment: https://git.openjdk.org/jdk/pull/17347#issuecomment-1892675308 From gcao at openjdk.org Tue Jan 16 02:26:43 2024 From: gcao at openjdk.org (Gui Cao) Date: Tue, 16 Jan 2024 02:26:43 GMT Subject: RFR: 8323694: RISC-V: Unnecessary ResourceMark in NativeCall::set_destination_mt_safe Message-ID: Hi, We noticed that RISC-V bears a similar issue as: https://bugs.openjdk.org/browse/JDK-8323584. In `NativeCall::set_destination_mt_safe`, there is a `ResourceMark` that does not seem to have any purpose: no code in its scope uses resource allocation. ### Testing: - [x] Run tier1 tests on qemu 8.1.0 with UseRVV (fastdebug) - [x] Run tier1-3 tests on qemu 8.1.0 with UseRVV (release) ------------- Commit messages: - 8323694: RISC-V: Unnecessary ResourceMark in NativeCall::set_destination_mt_safe Changes: https://git.openjdk.org/jdk/pull/17436/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=17436&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8323694 Stats: 2 lines in 1 file changed: 0 ins; 2 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/17436.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17436/head:pull/17436 PR: https://git.openjdk.org/jdk/pull/17436 From fyang at openjdk.org Tue Jan 16 02:32:19 2024 From: fyang at openjdk.org (Fei Yang) Date: Tue, 16 Jan 2024 02:32:19 GMT Subject: RFR: 8323694: RISC-V: Unnecessary ResourceMark in NativeCall::set_destination_mt_safe In-Reply-To: References: Message-ID: On Tue, 16 Jan 2024 02:21:50 GMT, Gui Cao wrote: > Hi, We noticed that RISC-V bears a similar issue as: https://bugs.openjdk.org/browse/JDK-8323584. > In `NativeCall::set_destination_mt_safe`, there is a `ResourceMark` that does not seem to have any purpose: no code in its scope uses resource allocation. 
> > ### Testing: > > - [x] Run tier1 tests on qemu 8.1.0 with UseRVV (fastdebug) > - [x] Run tier1-3 tests on qemu 8.1.0 with UseRVV (release) Marked as reviewed by fyang (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/17436#pullrequestreview-1822437193 From duke at openjdk.org Tue Jan 16 03:42:21 2024 From: duke at openjdk.org (Yude Lin) Date: Tue, 16 Jan 2024 03:42:21 GMT Subject: RFR: 8323122: AArch64: Increase itable stub size estimate In-Reply-To: References: Message-ID: On Mon, 15 Jan 2024 16:09:46 GMT, Evgeny Astigeevich wrote: >> @theRealAph Do you mind sponsoring this change? Thank you. > > @linade This is not the first time estimates are updated. JDK-8207343 was implemented to automate vtable/itable stub size calculation. I think it needs improvements. I created JDK-8323741. @eastig Thank you for reviewing. It seems to me that the size calculation can not be easily automated. But the overflow prevention might be, by somehow relocating the stubs (not sure if it's common or possible in hotspot). ------------- PR Comment: https://git.openjdk.org/jdk/pull/17336#issuecomment-1893023368 From duke at openjdk.org Tue Jan 16 05:18:26 2024 From: duke at openjdk.org (Yude Lin) Date: Tue, 16 Jan 2024 05:18:26 GMT Subject: Integrated: 8323122: AArch64: Increase itable stub size estimate In-Reply-To: References: Message-ID: <3nhNTLKKZ0fbP26cOV9aMJsEi-6BffHQA_PAmXSjrRM=.eb9d7801-e8ed-4442-bf6e-c8dde0ab22c0@github.com> On Wed, 10 Jan 2024 02:03:08 GMT, Yude Lin wrote: > Since [JDK-8307352](https://bugs.openjdk.org/browse/JDK-8307352), itable stub size has grown by 20 bytes on linux-aarch64. In particular, the "slop-counted" code increases from 100->120 bytes, where the current estimate is 124 bytes. I haven't found a case where it exceeds the estimate. For now this size is stable across the few linux-aarch64 configurations I ran with. It doesn't vary (for example) by different klass decoding schemes. 
But I think the idea of the estimate is that we can never know. I propose we increase the estimate to be safe. > > Passed hotspot/jtreg/:tier1 This pull request has now been integrated. Changeset: 36f4b34f Author: Yude Lin Committer: Denghui Dong URL: https://git.openjdk.org/jdk/commit/36f4b34f1953af736706ec67192204727808bc6c Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod 8323122: AArch64: Increase itable stub size estimate Reviewed-by: aph, eastigeevich ------------- PR: https://git.openjdk.org/jdk/pull/17336 From jbhateja at openjdk.org Tue Jan 16 06:11:24 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 16 Jan 2024 06:11:24 GMT Subject: RFR: 8318650: Optimized subword gather for x86 targets. [v10] In-Reply-To: References: Message-ID: On Mon, 15 Jan 2024 13:49:06 GMT, Emanuel Peter wrote: >> Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 12 commits: >> >> - Accelerating masked sub-word gathers for AVX2 targets, this gives additional 1.5-4x speedups over existing implementation. >> - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8318650 >> - Removing JDK-8321648 related changes. >> - Refined AVX3 implementation with integral gather. >> - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8318650 >> - Fix incorrect comment >> - Review comments resolutions. >> - Review comments resolutions. >> - Review comments resolutions. >> - Restricting masked sub-word gather to AVX512 target to align with integral gather support. >> - ... and 2 more: https://git.openjdk.org/jdk/compare/518ec971...de47076e > > src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 1627: > >> 1625: vpsrlvd(dst, dst, xtmp, vlen_enc); >> 1626: // Pack double word vector into byte vector. >> 1627: vpackI2X(T_BYTE, dst, ones, xtmp, vlen_enc); > > I would prefer if there was less code duplication here. 
I think there are just a few values which you could set to variables, and then apply for both versions. Meaty part of the algorithm accept different operands, line #1593, #1599 and #1601, keep two flows for SHORT and BYTE separate will be better maintainable. > src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 1634: > >> 1632: Register offset, XMMRegister offset_vec, XMMRegister idx_vec, >> 1633: XMMRegister xtmp1, XMMRegister xtmp2, XMMRegister xtmp3, KRegister mask, >> 1634: KRegister gmask, int vlen_enc, int vlen) { > > Would you mind giving a quick summary of what the input registers are and what exactly this method does? > Why do we need to call `vgather_subword_avx3` so many times (`lane_count_subwords`)? Method gathers sub-words from gather indices using integral gather instructions, because of the lane size mismatch b/w int and sub-words algorithm makes multiple calls to vgather_subword_avx3. > src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 1757: > >> 1755: for (int i = 0; i < 4; i++) { >> 1756: movl(rtmp, Address(idx_base, i * 4)); >> 1757: pinsrw(dst, Address(base, rtmp, Address::times_2), i); > > Do I understand this right that you are basically doing this? > `dst[i*4 .. i*4 + 3] = load_8bytes(base + (idx_base + i * 4) * 2)` > But this does not look like a gather, rather like 4 adjacent loads that pack the data together into a single 8*4 byte vector. > > Why can this not be done by a simple `32bit` load? Loop scans over integral index array and pick the work from computed address, indexes could be non-contiguous. 
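For readers following the exchange above, here is a minimal scalar sketch of what the quoted 4-iteration loop models for short elements, assuming the intent is `dst[i] = src[indices[i]]`. The class and method names are hypothetical and are not taken from the patch; the point is only that a gather coincides with a single contiguous load when the indices happen to be consecutive, and differs otherwise.

```java
import java.util.Arrays;

// Hypothetical scalar model, not HotSpot code: one 4-lane slice of a short gather.
public final class SubwordGatherSlice {

    // Lane i receives src[indices[i]]; the indices need not be adjacent.
    static short[] gatherSlice(short[] src, int[] indices) {
        short[] dst = new short[4];
        for (int i = 0; i < 4; i++) {
            dst[i] = src[indices[i]];
        }
        return dst;
    }

    // What one contiguous 64-bit load of four shorts starting at src[start] would yield.
    static short[] contiguousLoad(short[] src, int start) {
        return Arrays.copyOfRange(src, start, start + 4);
    }

    public static void main(String[] args) {
        short[] src = {10, 11, 12, 13, 14, 15, 16, 17};
        // Non-contiguous indices: no single load can produce this slice.
        System.out.println(Arrays.toString(gatherSlice(src, new int[] {7, 0, 5, 2}))); // [17, 10, 15, 12]
        // Consecutive indices: the gather degenerates to a plain load.
        System.out.println(Arrays.toString(gatherSlice(src, new int[] {0, 1, 2, 3}))); // [10, 11, 12, 13]
        System.out.println(Arrays.toString(contiguousLoad(src, 0)));                   // [10, 11, 12, 13]
    }
}
```

The JIT sequence under review performs the same per-lane lookup with `movl` and `pinsrw`; this sketch says nothing about how the lanes are later permuted into the destination vector.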
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16354#discussion_r1452964120 PR Review Comment: https://git.openjdk.org/jdk/pull/16354#discussion_r1452964077 PR Review Comment: https://git.openjdk.org/jdk/pull/16354#discussion_r1452964030 From jbhateja at openjdk.org Tue Jan 16 06:11:27 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 16 Jan 2024 06:11:27 GMT Subject: RFR: 8318650: Optimized subword gather for x86 targets. [v10] In-Reply-To: References: Message-ID: <5G47VHfwKVS0dm89ZHHKyyvA-LV5sqTCal0E52Ocof8=.97c7f971-e20c-41af-b0b4-49aff274351d@github.com> On Mon, 15 Jan 2024 14:36:38 GMT, Emanuel Peter wrote: >> src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 1776: >> >>> 1774: for (int i = 0; i < 4; i++) { >>> 1775: movl(rtmp, Address(idx_base, i * 4)); >>> 1776: addl(rtmp, offset); >> >> Can the `offset` not be added to `idx_base` before the loop? > > Or would that require too many registers? > Can the `offset` not be added to `idx_base` before the loop? Offset needs to be added to each index element, please refer to API specification for details. https://docs.oracle.com/en/java/javase/21/docs/api/jdk.incubator.vector/jdk/incubator/vector/ShortVector.html#fromArray(jdk.incubator.vector.VectorSpecies,short[],int,int[],int) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16354#discussion_r1452964002 From jbhateja at openjdk.org Tue Jan 16 06:16:22 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 16 Jan 2024 06:16:22 GMT Subject: RFR: 8322768: Optimize non-subword vector compress and expand APIs for AVX2 target. 
[v5] In-Reply-To: References: <_udOCEVG86x9V_WvYqFTaYnvmXdiZ7LzqxzR-D_ygYs=.db3ed37e-a8fa-413b-83c6-a785aba072ff@github.com> Message-ID: <4-XrsvK-2HpBV3neMmQQ5Q1A4FDOAnmyFtCkKKZcf2A=.32df7d9e-e399-4715-a6b5-f3f2e9c77150@github.com> On Mon, 15 Jan 2024 09:10:38 GMT, Emanuel Peter wrote: >> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: >> >> Using emulated variable blend E-Core optimized instruction. > > src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 5309: > >> 5307: assert(bt == T_LONG || bt == T_DOUBLE, ""); >> 5308: vmovmskpd(rtmp, mask, vec_enc); >> 5309: shlq(rtmp, 5); // for 64 bit rows (4 longs) > > Suggestion: > > shlq(rtmp, 5); // for 32 bit rows (4 longs) Each long/double permute lane holds 64 bit value. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17261#discussion_r1452967063 From jbhateja at openjdk.org Tue Jan 16 06:20:24 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 16 Jan 2024 06:20:24 GMT Subject: RFR: 8318650: Optimized subword gather for x86 targets. [v10] In-Reply-To: References: Message-ID: On Mon, 15 Jan 2024 14:27:43 GMT, Emanuel Peter wrote: >> Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 12 commits: >> >> - Accelerating masked sub-word gathers for AVX2 targets, this gives additional 1.5-4x speedups over existing implementation. >> - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8318650 >> - Removing JDK-8321648 related changes. >> - Refined AVX3 implementation with integral gather. >> - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8318650 >> - Fix incorrect comment >> - Review comments resolutions. >> - Review comments resolutions. >> - Review comments resolutions. >> - Restricting masked sub-word gather to AVX512 target to align with integral gather support. >> - ... 
and 2 more: https://git.openjdk.org/jdk/compare/518ec971...de47076e > > src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 1900: > >> 1898: vgather8b(elem_ty, xtmp3, base, idx_base, rtmp, vlen_enc); >> 1899: } else { >> 1900: LP64_ONLY(vgather8b_masked(elem_ty, xtmp3, base, idx_base, mask, midx, rtmp, vlen_enc)); > > What happens if if not `LP64_ONLY`? 32bit skip over check is part of match_rule_supported_vector, https://github.com/openjdk/jdk/pull/16354/files#diff-d6a3624f0f0af65a98a47378a5c146eed5016ca09b4de1acd0a3acc823242e82R1921 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16354#discussion_r1452969364 From rrich at openjdk.org Tue Jan 16 07:05:30 2024 From: rrich at openjdk.org (Richard Reingruber) Date: Tue, 16 Jan 2024 07:05:30 GMT Subject: RFR: 8290965: PPC64: Implement post-call NOPs [v5] In-Reply-To: References: Message-ID: > #### Implementation of post call nops (PCNs) on ppc64. > > Depends on https://github.com/openjdk/jdk/pull/17150 > > About post call nops: > > - instruction(s) at return addresses of compiled java calls > - emitted iff vm continuations are enabled to support virtual threads > - encode data that can be be used to find the corresponding CodeBlob and oop map faster > - mt-safe patchable to trigger deoptimization > > Background: > > - Frames in continuation StackChunks are not visited if their compiled method is made not entrant (in contrast to frames on stack). > Instead all PCNs of the compiled method are patched to trigger deoptimization when control returns to such frames. > - With vm continuations, stacks are walked and inspected more frequently. This requires lookup of metadata like frame size and oop maps. As an optimization the offset of the CodeBlob to the PCN and the oop map slot are encoded as data in the PCN. > > Post call nops on ppc64 > > - 1 instruction, i.e. 
4 bytes (either CMPI or CMPLI[1]) > x86_64: 1 instruction, 8 bytes > aarch64: 3 instruction, 12 bytes > [1] 3.1.10 Fixed Point Compare Instructions in Power ISA 3.1B > https://openpowerfoundation.org/specifications/isa/ > > - 26 bits data payload > x86_64: 32 bits; aarch64: 32 bits > - 9 bits dedicated to oop map slot. With 8 bits there where cases with SPECjvm2008 where the slot could not be encoded (on ppc64 and x86_64). > x86_64: 8 bits; aarch64: 8 bits > - 17 bits dedicated to cb offset. Effectively 19 bits due to instruction alignment. > x86_64: 24 bits; aarch64: 24 bits > - Also used when reconstructing the back chain after thawing continuation frames (see `Thaw::patch_caller_links`) > > - Refactored frame constructors to make use of fast CodeBlob lookup based on PCNs. > The fast lookup may only be used if the pc is known to be in the code cache because `CodeCache::find_blob_fast` can yield wrong results if it finds instructions outside the code cache that look just like PCNs. Callers of the frame class constructors need to pass `frame::kind::native` in that case to avoid errors. Other platforms don't make this explicit which is a problem in my eyes. Picking the wrong constructor can cause errors when porting and in future development. > > - Currently only the PCNs in nmethods are initialized. Therefore we don't even try to make a fast lookup based on PCNs if we know the CodeBlob is, e.g., a RuntimeStub. To achieve this we call the frame constructor passing `frame::kind::code_blob`. > > #### Statistics > > > | SpecJVM2008... Richard Reingruber has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains six commits: - Merge branch 'master' - Review Martin - Merge branch 'master' - Fix comment Co-authored-by: Andrew Haley - 8290965: PPC64: Implement post-call NOPs - 8322294: Cleanup NativePostCallNop ------------- Changes: https://git.openjdk.org/jdk/pull/17171/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=17171&range=04 Stats: 132 lines in 13 files changed: 96 ins; 0 del; 36 mod Patch: https://git.openjdk.org/jdk/pull/17171.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17171/head:pull/17171 PR: https://git.openjdk.org/jdk/pull/17171 From rrich at openjdk.org Tue Jan 16 07:05:32 2024 From: rrich at openjdk.org (Richard Reingruber) Date: Tue, 16 Jan 2024 07:05:32 GMT Subject: RFR: 8290965: PPC64: Implement post-call NOPs [v4] In-Reply-To: References: Message-ID: On Thu, 11 Jan 2024 08:57:52 GMT, Richard Reingruber wrote: >> #### Implementation of post call nops (PCNs) on ppc64. >> >> Depends on https://github.com/openjdk/jdk/pull/17150 >> >> About post call nops: >> >> - instruction(s) at return addresses of compiled java calls >> - emitted iff vm continuations are enabled to support virtual threads >> - encode data that can be be used to find the corresponding CodeBlob and oop map faster >> - mt-safe patchable to trigger deoptimization >> >> Background: >> >> - Frames in continuation StackChunks are not visited if their compiled method is made not entrant (in contrast to frames on stack). >> Instead all PCNs of the compiled method are patched to trigger deoptimization when control returns to such frames. >> - With vm continuations, stacks are walked and inspected more frequently. This requires lookup of metadata like frame size and oop maps. As an optimization the offset of the CodeBlob to the PCN and the oop map slot are encoded as data in the PCN. >> >> Post call nops on ppc64 >> >> - 1 instruction, i.e. 
4 bytes (either CMPI or CMPLI[1]) >> x86_64: 1 instruction, 8 bytes >> aarch64: 3 instruction, 12 bytes >> [1] 3.1.10 Fixed Point Compare Instructions in Power ISA 3.1B >> https://openpowerfoundation.org/specifications/isa/ >> >> - 26 bits data payload >> x86_64: 32 bits; aarch64: 32 bits >> - 9 bits dedicated to oop map slot. With 8 bits there where cases with SPECjvm2008 where the slot could not be encoded (on ppc64 and x86_64). >> x86_64: 8 bits; aarch64: 8 bits >> - 17 bits dedicated to cb offset. Effectively 19 bits due to instruction alignment. >> x86_64: 24 bits; aarch64: 24 bits >> - Also used when reconstructing the back chain after thawing continuation frames (see `Thaw::patch_caller_links`) >> >> - Refactored frame constructors to make use of fast CodeBlob lookup based on PCNs. >> The fast lookup may only be used if the pc is known to be in the code cache because `CodeCache::find_blob_fast` can yield wrong results if it finds instructions outside the code cache that look just like PCNs. Callers of the frame class constructors need to pass `frame::kind::native` in that case to avoid errors. Other platforms don't make this explicit which is a problem in my eyes. Picking the wrong constructor can cause errors when porting and in future development. >> >> - Currently only the PCNs in nmethods are initialized. Therefore we don't even try to make a fast lookup based on PCNs if we know the CodeBlob is, e.g., a RuntimeStub. To achieve this we call the frame cons... > > Richard Reingruber has updated the pull request incrementally with one additional commit since the last revision: > > Review Martin I intend to ship this ppc only pr tomorrow if the tests pass after merging master. I don't expect another review. 
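The payload split quoted in the description (26 bits in total, 9 for the oop map slot and 17 for the CodeBlob offset, "effectively 19 bits due to instruction alignment") can be illustrated with a small bit-packing sketch. The field order, the names, and the use of -1 as a "does not fit" marker are assumptions made for illustration only; the actual ppc64 encoding inside the CMPI/CMPLI instruction is not shown in this thread.

```java
// Hypothetical model of a 26-bit post-call-nop payload; the layout is assumed, not HotSpot's.
public final class PostCallNopPayload {
    static final int SLOT_BITS = 9;    // oop map slot
    static final int OFFSET_BITS = 17; // CodeBlob offset, stored in 4-byte units
    static final int ALIGN_SHIFT = 2;  // instructions are 4-byte aligned

    // Returns the packed payload, or -1 if either field cannot be encoded.
    static int encode(int oopMapSlot, int cbOffsetBytes) {
        if (oopMapSlot < 0 || oopMapSlot >= (1 << SLOT_BITS)) {
            return -1;
        }
        if (cbOffsetBytes < 0 || (cbOffsetBytes & ((1 << ALIGN_SHIFT) - 1)) != 0) {
            return -1;
        }
        int words = cbOffsetBytes >>> ALIGN_SHIFT;
        if (words >= (1 << OFFSET_BITS)) {
            return -1;
        }
        return (words << SLOT_BITS) | oopMapSlot;
    }

    static int decodeSlot(int payload) {
        return payload & ((1 << SLOT_BITS) - 1);
    }

    static int decodeCbOffset(int payload) {
        return (payload >>> SLOT_BITS) << ALIGN_SHIFT;
    }

    public static void main(String[] args) {
        int payload = encode(300, 0x7FFFC); // largest aligned offset that fits in 19 effective bits
        System.out.println(decodeSlot(payload) + " " + Integer.toHexString(decodeCbOffset(payload)));
        System.out.println(encode(512, 4));     // slot needs 10 bits: not encodable
        System.out.println(encode(0, 0x80000)); // offset needs 20 effective bits: not encodable
    }
}
```

Storing the offset in units of four bytes is what turns 17 stored bits into 19 effective bits. When a value does not fit, a real implementation would presumably fall back to the slow lookup, which the sketch only signals through the -1 return.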
------------- PR Comment: https://git.openjdk.org/jdk/pull/17171#issuecomment-1893171295 From epeter at openjdk.org Tue Jan 16 07:11:25 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 16 Jan 2024 07:11:25 GMT Subject: RFR: 8322768: Optimize non-subword vector compress and expand APIs for AVX2 target. [v5] In-Reply-To: <4-XrsvK-2HpBV3neMmQQ5Q1A4FDOAnmyFtCkKKZcf2A=.32df7d9e-e399-4715-a6b5-f3f2e9c77150@github.com> References: <_udOCEVG86x9V_WvYqFTaYnvmXdiZ7LzqxzR-D_ygYs=.db3ed37e-a8fa-413b-83c6-a785aba072ff@github.com> <4-XrsvK-2HpBV3neMmQQ5Q1A4FDOAnmyFtCkKKZcf2A=.32df7d9e-e399-4715-a6b5-f3f2e9c77150@github.com> Message-ID: On Tue, 16 Jan 2024 06:13:43 GMT, Jatin Bhateja wrote: >> src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 5309: >> >>> 5307: assert(bt == T_LONG || bt == T_DOUBLE, ""); >>> 5308: vmovmskpd(rtmp, mask, vec_enc); >>> 5309: shlq(rtmp, 5); // for 64 bit rows (4 longs) >> >> Suggestion: >> >> shlq(rtmp, 5); // for 32 bit rows (4 longs) > > Each long/double permute lane holds 64 bit value. @jatin-bhateja so why do you shift by 5? I thought 4 longs are 32 bit? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17261#discussion_r1453003935 From epeter at openjdk.org Tue Jan 16 07:13:24 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 16 Jan 2024 07:13:24 GMT Subject: RFR: 8318650: Optimized subword gather for x86 targets. [v10] In-Reply-To: References: Message-ID: On Tue, 16 Jan 2024 06:17:43 GMT, Jatin Bhateja wrote: >> src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 1900: >> >>> 1898: vgather8b(elem_ty, xtmp3, base, idx_base, rtmp, vlen_enc); >>> 1899: } else { >>> 1900: LP64_ONLY(vgather8b_masked(elem_ty, xtmp3, base, idx_base, mask, midx, rtmp, vlen_enc)); >> >> What happens if if not `LP64_ONLY`? 
> > 32bit skip over check is part of match_rule_supported_vector, https://github.com/openjdk/jdk/pull/16354/files#diff-d6a3624f0f0af65a98a47378a5c146eed5016ca09b4de1acd0a3acc823242e82R1921 An assert might still be nice. Or just call the method anyway but then hit an assert there? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16354#discussion_r1453005169 From epeter at openjdk.org Tue Jan 16 07:24:21 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 16 Jan 2024 07:24:21 GMT Subject: RFR: 8318650: Optimized subword gather for x86 targets. [v10] In-Reply-To: References: Message-ID: On Tue, 16 Jan 2024 06:08:31 GMT, Jatin Bhateja wrote: >> src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 1757: >> >>> 1755: for (int i = 0; i < 4; i++) { >>> 1756: movl(rtmp, Address(idx_base, i * 4)); >>> 1757: pinsrw(dst, Address(base, rtmp, Address::times_2), i); >> >> Do I understand this right that you are basically doing this? >> `dst[i*4 .. i*4 + 3] = load_8bytes(base + (idx_base + i * 4) * 2)` >> But this does not look like a gather, rather like 4 adjacent loads that pack the data together into a single 8*4 byte vector. >> >> Why can this not be done by a simple `32bit` load? > > Loop scans over integral index array and pick the work from computed address, indexes could be non-contiguous. Maybe you could have comment lines that state this, similar like in the documentation? `dst[i] = load(base + 2 * load(idx_base + i * 4))` Or maybe: `dst[i] = base[idx_base[i * 4] * 2]` ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16354#discussion_r1453013821 From epeter at openjdk.org Tue Jan 16 07:28:22 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 16 Jan 2024 07:28:22 GMT Subject: RFR: 8318650: Optimized subword gather for x86 targets. 
[v10] In-Reply-To: References: Message-ID: On Tue, 16 Jan 2024 06:08:40 GMT, Jatin Bhateja wrote: >> src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 1627: >> >>> 1625: vpsrlvd(dst, dst, xtmp, vlen_enc); >>> 1626: // Pack double word vector into byte vector. >>> 1627: vpackI2X(T_BYTE, dst, ones, xtmp, vlen_enc); >> >> I would prefer if there was less code duplication here. I think there are just a few values which you could set to variables, and then apply for both versions. > > Meaty part of the algorithm accept different operands, line #1593, #1599 and #1601, keep two flows for SHORT and BYTE separate will be better maintainable. All I see is replacing a 3 with a 4, etc. It just makes this very long to review, and spot the differences. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16354#discussion_r1453017659 From epeter at openjdk.org Tue Jan 16 07:34:24 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 16 Jan 2024 07:34:24 GMT Subject: RFR: 8318650: Optimized subword gather for x86 targets. [v10] In-Reply-To: References: Message-ID: On Tue, 16 Jan 2024 06:08:35 GMT, Jatin Bhateja wrote: >> src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 1634: >> >>> 1632: Register offset, XMMRegister offset_vec, XMMRegister idx_vec, >>> 1633: XMMRegister xtmp1, XMMRegister xtmp2, XMMRegister xtmp3, KRegister mask, >>> 1634: KRegister gmask, int vlen_enc, int vlen) { >> >> Would you mind giving a quick summary of what the input registers are and what exactly this method does? >> Why do we need to call `vgather_subword_avx3` so many times (`lane_count_subwords`)? > > Method gathers sub-words from gather indices using integral gather instructions, because of the lane size mismatch b/w int and sub-words algorithm makes multiple calls to vgather_subword_avx3. As a reviewer, I feel like I have to reverse engineer this now. I would really appreciate if there was a proper comment at the beginning, that tells me what is happening here. 
Maybe use some equation at the beginning of what we want to achieve in the abstract, then explain why that does not work directly, and why you have to break it down into a loop, and then state the equation again in the loop form. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16354#discussion_r1453020617 From epeter at openjdk.org Tue Jan 16 07:34:26 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 16 Jan 2024 07:34:26 GMT Subject: RFR: 8318650: Optimized subword gather for x86 targets. [v10] In-Reply-To: <5G47VHfwKVS0dm89ZHHKyyvA-LV5sqTCal0E52Ocof8=.97c7f971-e20c-41af-b0b4-49aff274351d@github.com> References: <5G47VHfwKVS0dm89ZHHKyyvA-LV5sqTCal0E52Ocof8=.97c7f971-e20c-41af-b0b4-49aff274351d@github.com> Message-ID: On Tue, 16 Jan 2024 06:08:28 GMT, Jatin Bhateja wrote: >> Or would that require too many registers? > >> Can the `offset` not be added to `idx_base` before the loop? > > Offset needs to be added to each index element, please refer to API specification for details. > https://docs.oracle.com/en/java/javase/21/docs/api/jdk.incubator.vector/jdk/incubator/vector/ShortVector.html#fromArray(jdk.incubator.vector.VectorSpecies,short[],int,int[],int) Ah great, thanks for the link. Can you put such equations in the code, using the register names?
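As a possible starting point for the requested equation, the index mapping documented for `ShortVector.fromArray(species, a, offset, indexMap, mapOffset)` is: lane N is loaded from `a[offset + indexMap[mapOffset + N]]`. The scalar sketch below (hypothetical class and method names, plain arrays instead of registers) contrasts that mapping with the variant asked about earlier, where the offset is folded into the position within the index array; the two generally read different elements.

```java
import java.util.Arrays;

// Hypothetical scalar model of the gather index mapping; not taken from the patch.
public final class GatherOffsetModel {

    // Documented mapping: lane n = a[offset + indexMap[mapOffset + n]].
    static short[] gather(short[] a, int offset, int[] indexMap, int mapOffset, int lanes) {
        short[] out = new short[lanes];
        for (int n = 0; n < lanes; n++) {
            out[n] = a[offset + indexMap[mapOffset + n]];
        }
        return out;
    }

    // Different computation, for contrast only: offset moves the read position
    // inside indexMap instead of being added to each index value.
    static short[] gatherOffsetFoldedIntoIndexBase(short[] a, int offset, int[] indexMap,
                                                   int mapOffset, int lanes) {
        short[] out = new short[lanes];
        for (int n = 0; n < lanes; n++) {
            out[n] = a[indexMap[mapOffset + offset + n]];
        }
        return out;
    }

    public static void main(String[] args) {
        short[] a = {0, 10, 20, 30, 40, 50, 60, 70};
        int[] indexMap = {4, 0, 2, 1, 3, 5};
        System.out.println(Arrays.toString(gather(a, 2, indexMap, 0, 4)));                          // [60, 20, 40, 30]
        System.out.println(Arrays.toString(gatherOffsetFoldedIntoIndexBase(a, 2, indexMap, 0, 4))); // [20, 10, 30, 50]
    }
}
```

In the terms used in the review, `offset` has to be added to each value loaded through `idx_base`, which is why the loop under review executes `addl(rtmp, offset)` per element rather than adjusting `idx_base` once.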
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16354#discussion_r1453021973 From dnsimon at openjdk.org Tue Jan 16 08:24:32 2024 From: dnsimon at openjdk.org (Doug Simon) Date: Tue, 16 Jan 2024 08:24:32 GMT Subject: RFR: 8323616: [JVMCI] TestInvalidJVMCIOption.java fails intermittently with NPE [v2] In-Reply-To: <8KAT1E7MkSLCl1JrnEBjDNvgmaCn_ryBbGVMtY0uiTQ=.e6bc008c-5338-4361-9042-20f1ba5e65cd@github.com> References: <8KAT1E7MkSLCl1JrnEBjDNvgmaCn_ryBbGVMtY0uiTQ=.e6bc008c-5338-4361-9042-20f1ba5e65cd@github.com> Message-ID: > This PR removes an assertion from `TestInvalidJVMCIOption` that can fail intermittently due to a race between JIT initialization and runtime class initialization. > The only thing the test should guarantee is that an invalid option is detected and results in a VM exit. Doug Simon has updated the pull request incrementally with two additional commits since the last revision: - do not call System.exit from libjvmci before module system is initialized - Revert "remove racy (and unnecessary) assertion in TestInvalidJVMCIOption" This reverts commit 5de1dcea6fc8926b6e17b6f12ef17f527fa8a007. 
------------- Changes: - all: https://git.openjdk.org/jdk/pull/17397/files - new: https://git.openjdk.org/jdk/pull/17397/files/5de1dcea..0b183260 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17397&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17397&range=00-01 Stats: 8 lines in 2 files changed: 8 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/17397.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17397/head:pull/17397 PR: https://git.openjdk.org/jdk/pull/17397 From dnsimon at openjdk.org Tue Jan 16 08:36:37 2024 From: dnsimon at openjdk.org (Doug Simon) Date: Tue, 16 Jan 2024 08:36:37 GMT Subject: RFR: 8323616: [JVMCI] TestInvalidJVMCIOption.java fails intermittently with NPE [v3] In-Reply-To: <8KAT1E7MkSLCl1JrnEBjDNvgmaCn_ryBbGVMtY0uiTQ=.e6bc008c-5338-4361-9042-20f1ba5e65cd@github.com> References: <8KAT1E7MkSLCl1JrnEBjDNvgmaCn_ryBbGVMtY0uiTQ=.e6bc008c-5338-4361-9042-20f1ba5e65cd@github.com> Message-ID: > This PR removes an assertion from `TestInvalidJVMCIOption` that can fail intermittently due to a race between JIT initialization and runtime class initialization. > The only thing the test should guarantee is that an invalid option is detected and results in a VM exit. 
Doug Simon has updated the pull request incrementally with one additional commit since the last revision: use vm_exit_during_initialization instead of vm_exit ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17397/files - new: https://git.openjdk.org/jdk/pull/17397/files/0b183260..91be4c2d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17397&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17397&range=01-02 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/17397.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17397/head:pull/17397 PR: https://git.openjdk.org/jdk/pull/17397 From dnsimon at openjdk.org Tue Jan 16 08:41:21 2024 From: dnsimon at openjdk.org (Doug Simon) Date: Tue, 16 Jan 2024 08:41:21 GMT Subject: RFR: 8323616: [JVMCI] TestInvalidJVMCIOption.java fails intermittently with NPE [v3] In-Reply-To: References: <8KAT1E7MkSLCl1JrnEBjDNvgmaCn_ryBbGVMtY0uiTQ=.e6bc008c-5338-4361-9042-20f1ba5e65cd@github.com> Message-ID: On Fri, 12 Jan 2024 14:35:31 GMT, Tobias Hartmann wrote: >> Doug Simon has updated the pull request incrementally with one additional commit since the last revision: >> >> use vm_exit_during_initialization instead of vm_exit > > Looks good and trivial. @TobiHartmann I've changed the PR based on further discussion in https://bugs.openjdk.org/browse/JDK-8323616 - please re-review. @tkrodriguez can you also please review this PR. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/17397#issuecomment-1893284551 From rcastanedalo at openjdk.org Tue Jan 16 08:57:19 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Tue, 16 Jan 2024 08:57:19 GMT Subject: RFR: 8291809: Convert compiler/c2/cr7200264/TestSSE2IntVect.java to IR verification test In-Reply-To: References: Message-ID: On Mon, 15 Jan 2024 15:13:27 GMT, Daniel Lundén wrote: > This changeset translates the tests in `compiler/c2/cr7200264/` to use the IR verification framework. > > The proposed translation, to the extent possible, attempts to preserve the semantics of the original tests. We may also want to refactor the tests to better make use of the various features of the IR verification framework. The proposed translated tests take approximately twice as long to run compared to the original tests. > > Testing: > - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/7528802846) > - Ran the new translated tests within all tier1 through tier10 contexts on windows-x64, linux-x64, linux-aarch64, macosx-x64, and macosx-aarch64. > - Tested that manually adding `-XX:LoopUnrollLimit=0` to the test framework flags caused the translated tests to fail. Note: it is, however, no longer possible to break the tests by passing `-XX:LoopUnrollLimit=0` on the command line. Thanks for working on this, Daniel! These tests would be more idiomatic, precise, and possibly even faster if the IR verification was applied to each vectorization method (`test_sum`, `test_addc`, etc.) separately, instead of doing it as a bulk check over the entire `TestIntVect::testInner()`. This can be achieved by using `applyIfCPUFeature` annotations in the IR checks, similarly to e.g. `test/hotspot/jtreg/compiler/loopopts/superword/RedTest_int.java`. I recognize this limitation is pre-existing, but this issue seems a good place to address it. ------------- Changes requested by rcastanedalo (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/17428#pullrequestreview-1822807360 From dlunden at openjdk.org Tue Jan 16 09:56:21 2024 From: dlunden at openjdk.org (Daniel Lundén) Date: Tue, 16 Jan 2024 09:56:21 GMT Subject: RFR: 8291809: Convert compiler/c2/cr7200264/TestSSE2IntVect.java to IR verification test In-Reply-To: References: Message-ID: On Tue, 16 Jan 2024 08:54:10 GMT, Roberto Castañeda Lozano wrote: > Thanks for working on this, Daniel! > > These tests would be more idiomatic, precise, and possibly even faster if the IR verification was applied to each vectorization method (`test_sum`, `test_addc`, etc.) separately, instead of doing it as a bulk check over the entire `TestIntVect::testInner()`. This can be achieved by using `applyIfCPUFeature` annotations in the IR checks, similarly to e.g. `test/hotspot/jtreg/compiler/loopopts/superword/RedTest_int.java`. I recognize this limitation is pre-existing, but this issue seems a good place to address it. Thanks Roberto, it sounds reasonable to diverge from the original test and make it modular instead of sticking with the bulk test. I'll propose a new updated version. I'll also have a look at which jtreg `@requires` I can replace with IR framework annotations. ------------- PR Comment: https://git.openjdk.org/jdk/pull/17428#issuecomment-1893407395 From eosterlund at openjdk.org Tue Jan 16 10:07:22 2024 From: eosterlund at openjdk.org (Erik Österlund) Date: Tue, 16 Jan 2024 10:07:22 GMT Subject: RFR: 8322692: ZGC: avoid over-unrolling due to hidden barrier size [v2] In-Reply-To: References: Message-ID: On Mon, 15 Jan 2024 09:01:14 GMT, Roberto Castañeda Lozano wrote: >> This changeset refines the C2 loop unrolling heuristic by including an estimation of the final size of (Generational) ZGC barriers in the loop size computation. These are not exposed in C2's intermediate representation and thus currently ignored by the heuristic, which can lead to over-unrolling.
>> >> #### Testing >> >> - tier1-5, stress test, fuzzing (windows-x64, linux-x64, linux-aarch64, macosx-x64, macosx-aarch64). >> - tier6-9 (windows-x64, linux-x64, linux-aarch64, macosx-x64, macosx-aarch64, ZGC-specific tests only). >> >> #### Performance and code size evaluation >> >> - DaCapo, SPECjvm2008, SPECjbb2015 (linux-x64 with `-XX:+UseZGC -XX:+ZGenerational`). The changeset reduces slightly the size of the C2-generated code (around 0.3% fewer bytes per compiled bytecode for the DaCapo `fop` benchmark) and speeds up SPECjvm2008's `Serial` by around 4%. > > Roberto Castañeda Lozano has updated the pull request incrementally with two additional commits since the last revision: > > - Update copyright years > - Exclude size of slow path from estimation Looks good. Thanks for fixing! ------------- Marked as reviewed by eosterlund (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/17367#pullrequestreview-1822961423 From epeter at openjdk.org Tue Jan 16 10:21:30 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 16 Jan 2024 10:21:30 GMT Subject: RFR: 8320175: [BACKOUT] 8316533: C2 compilation fails with assert(verify(phase)) failed: missing Value() optimization In-Reply-To: References: Message-ID: On Mon, 15 Jan 2024 14:35:35 GMT, Christian Hagedorn wrote: >> This reverts commit b5c863b772603b3fbf159d2bd3f6d1caffaff16a. [JDK-8316533](https://bugs.openjdk.org/browse/JDK-8316533). >> >> The patch fixed a bug in verification, but created a regression which will take more time to investigate and fix. Hence the backout for now. >> >> Testing: passed > > Looks good to me, too.
Thanks for the reviews @chhagedorn @TobiHartmann ------------- PR Comment: https://git.openjdk.org/jdk/pull/17425#issuecomment-1893440205 From epeter at openjdk.org Tue Jan 16 10:21:32 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 16 Jan 2024 10:21:32 GMT Subject: Integrated: 8320175: [BACKOUT] 8316533: C2 compilation fails with assert(verify(phase)) failed: missing Value() optimization In-Reply-To: References: Message-ID: On Mon, 15 Jan 2024 13:39:29 GMT, Emanuel Peter wrote: > This reverts commit b5c863b772603b3fbf159d2bd3f6d1caffaff16a. [JDK-8316533](https://bugs.openjdk.org/browse/JDK-8316533). > > The patch fixed a bug in verification, but created a regression which will take more time to investigate and fix. Hence the backout for now. > > Testing: passed This pull request has now been integrated. Changeset: e01f6da1 Author: Emanuel Peter URL: https://git.openjdk.org/jdk/commit/e01f6da1b8e7de19f90c7cb21b3cd1ff2ab29cb7 Stats: 76 lines in 2 files changed: 0 ins; 76 del; 0 mod 8320175: [BACKOUT] 8316533: C2 compilation fails with assert(verify(phase)) failed: missing Value() optimization Reviewed-by: thartmann, chagedorn ------------- PR: https://git.openjdk.org/jdk/pull/17425 From rcastanedalo at openjdk.org Tue Jan 16 10:19:22 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Tue, 16 Jan 2024 10:19:22 GMT Subject: RFR: 8322692: ZGC: avoid over-unrolling due to hidden barrier size [v2] In-Reply-To: References: Message-ID: On Tue, 16 Jan 2024 10:04:18 GMT, Erik Österlund wrote: > Looks good. Thanks for fixing! Thanks for reviewing, Erik!
------------- PR Comment: https://git.openjdk.org/jdk/pull/17367#issuecomment-1893444846 From qamai at openjdk.org Tue Jan 16 10:28:24 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Tue, 16 Jan 2024 10:28:24 GMT Subject: RFR: 8322692: ZGC: avoid over-unrolling due to hidden barrier size [v2] In-Reply-To: References: Message-ID: On Mon, 15 Jan 2024 09:01:14 GMT, Roberto Castañeda Lozano wrote: >> This changeset refines the C2 loop unrolling heuristic by including an estimation of the final size of (Generational) ZGC barriers in the loop size computation. These are not exposed in C2's intermediate representation and thus currently ignored by the heuristic, which can lead to over-unrolling. >> >> #### Testing >> >> - tier1-5, stress test, fuzzing (windows-x64, linux-x64, linux-aarch64, macosx-x64, macosx-aarch64). >> - tier6-9 (windows-x64, linux-x64, linux-aarch64, macosx-x64, macosx-aarch64, ZGC-specific tests only). >> >> #### Performance and code size evaluation >> >> - DaCapo, SPECjvm2008, SPECjbb2015 (linux-x64 with `-XX:+UseZGC -XX:+ZGenerational`). The changeset reduces slightly the size of the C2-generated code (around 0.3% fewer bytes per compiled bytecode for the DaCapo `fop` benchmark) and speeds up SPECjvm2008's `Serial` by around 4%. > > Roberto Castañeda Lozano has updated the pull request incrementally with two additional commits since the last revision: > > - Update copyright years > - Exclude size of slow path from estimation src/hotspot/share/opto/loopTransform.cpp line 1003: > 1001: // Also count ModL, DivL, MulL, and other nodes that expand mightly > 1002: for (uint k = 0; k < _body.size(); k++) { > 1003: Node* n = _body.at(k); A lot of functions here are used to do the same thing (that is to estimate the size of a node), I think we should consolidate them, and use a specified value such as number of machine instructions instead. Maybe that could be done later? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17367#discussion_r1453225329 From thartmann at openjdk.org Tue Jan 16 10:30:20 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 16 Jan 2024 10:30:20 GMT Subject: RFR: JDK-8316756: C2 EA fails with "missing memory path" when encountering unsafe_arraycopy stub call [v3] In-Reply-To: References: Message-ID: On Mon, 15 Jan 2024 11:58:57 GMT, Tobias Holenstein wrote: >> Before https://github.com/openjdk/jdk/pull/5259 the graph of the following program looked like this during Escape Analysis: >> >> >> static int test() { >> MyClass obj = new MyClass(); // Non-escaping to trigger Escape Analysis >> UNSAFE.copyMemory(null, SRC_BASE, null, DST_BASE, 4); >> obj.x = 42; >> return obj.x; >> } >> >> With MemBarCPUOrder: >> working >> >> Setting `RC_NARROW_MEM` flag in `LibraryCallKit::inline_unsafe_copyMemory()` removes the `428 MergeMem` node. Without a `MergeMem` node after `432 StoreB` EA fails in Phase 2 of `ConnectionGraph::split_unique_types(...)` when trying to push the allocation's users on the appropriate worklist - `429 CallLeafNoFP` is not an expected user of `428 StoreB`. Therefore the assert `"EA: missing memory path"` is hit. >> Without MemBarCPUOrder and after setting `RC_NARROW_MEM`: >> failing >> >> >> ### Proposed Fix >> Dropping the `RC_NARROW_MEM` flag in `LibraryCallKit::inline_unsafe_copyMemory()` causes the introduction of a `MergeMem` between `StoreB` and `CallLeafNoFP`, so the corresponding code in EA doesn't encounter a `CallLeafNoFP` anymore: >> fixed >> >> Testing: Tier1-4 passed > > Tobias Holenstein has updated the pull request incrementally with one additional commit since the last revision: > > added testcase Looks good. Please run hs-comp-stress and hs-precheckin-comp as well before integration. ------------- Marked as reviewed by thartmann (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/17347#pullrequestreview-1823006901 From epeter at openjdk.org Tue Jan 16 10:31:34 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 16 Jan 2024 10:31:34 GMT Subject: [jdk22] RFR: 8320175: [BACKOUT] 8316533: C2 compilation fails with assert(verify(phase)) failed: missing Value() optimization Message-ID: Hi all, This pull request contains a backport of commit [e01f6da1](https://github.com/openjdk/jdk/commit/e01f6da1b8e7de19f90c7cb21b3cd1ff2ab29cb7) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository. The commit being backported was authored by Emanuel Peter on 16 Jan 2024 and was reviewed by Tobias Hartmann and Christian Hagedorn. Thanks! ------------- Commit messages: - Backport e01f6da1b8e7de19f90c7cb21b3cd1ff2ab29cb7 Changes: https://git.openjdk.org/jdk22/pull/78/files Webrev: https://webrevs.openjdk.org/?repo=jdk22&pr=78&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8320175 Stats: 76 lines in 2 files changed: 0 ins; 76 del; 0 mod Patch: https://git.openjdk.org/jdk22/pull/78.diff Fetch: git fetch https://git.openjdk.org/jdk22.git pull/78/head:pull/78 PR: https://git.openjdk.org/jdk22/pull/78 From thartmann at openjdk.org Tue Jan 16 10:31:35 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 16 Jan 2024 10:31:35 GMT Subject: [jdk22] RFR: 8320175: [BACKOUT] 8316533: C2 compilation fails with assert(verify(phase)) failed: missing Value() optimization In-Reply-To: References: Message-ID: On Tue, 16 Jan 2024 10:20:13 GMT, Emanuel Peter wrote: > Hi all, > > This pull request contains a backport of commit [e01f6da1](https://github.com/openjdk/jdk/commit/e01f6da1b8e7de19f90c7cb21b3cd1ff2ab29cb7) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository. > > The commit being backported was authored by Emanuel Peter on 16 Jan 2024 and was reviewed by Tobias Hartmann and Christian Hagedorn. > > Thanks! Looks good. ------------- Marked as reviewed by thartmann (Reviewer). 
PR Review: https://git.openjdk.org/jdk22/pull/78#pullrequestreview-1823004298 From chagedorn at openjdk.org Tue Jan 16 10:37:24 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 16 Jan 2024 10:37:24 GMT Subject: [jdk22] Integrated: 8323101: C2: assert(n->in(0) == nullptr) failed: divisions with zero check should already have bailed out earlier in split-if In-Reply-To: References: Message-ID: On Mon, 15 Jan 2024 12:34:56 GMT, Christian Hagedorn wrote: > Hi all, > > This pull request contains a backport of commit [7e0a4ed6](https://github.com/openjdk/jdk/commit/7e0a4ed6292586772c23292dbdd67ed1db5c12f7) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository. > > The commit being backported was authored by Christian Hagedorn on 15 Jan 2024 and was reviewed by Vladimir Kozlov and Tobias Hartmann. > > Thanks! This pull request has now been integrated. Changeset: 92575050 Author: Christian Hagedorn URL: https://git.openjdk.org/jdk22/commit/92575050590ebd28f9c4d2e90371bdc08bbc5940 Stats: 215 lines in 2 files changed: 214 ins; 0 del; 1 mod 8323101: C2: assert(n->in(0) == nullptr) failed: divisions with zero check should already have bailed out earlier in split-if Reviewed-by: thartmann Backport-of: 7e0a4ed6292586772c23292dbdd67ed1db5c12f7 ------------- PR: https://git.openjdk.org/jdk22/pull/76 From rcastanedalo at openjdk.org Tue Jan 16 10:41:23 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Tue, 16 Jan 2024 10:41:23 GMT Subject: RFR: 8322692: ZGC: avoid over-unrolling due to hidden barrier size [v2] In-Reply-To: References: Message-ID: On Tue, 16 Jan 2024 10:25:18 GMT, Quan Anh Mai wrote: >> Roberto Castañeda Lozano has updated the pull request incrementally with two additional commits since the last revision: >> >> - Update copyright years >> - Exclude size of slow path from estimation > > src/hotspot/share/opto/loopTransform.cpp line 1003: > >> 1001: // Also count ModL, DivL, MulL, and other nodes that 
expand mightly >> 1002: for (uint k = 0; k < _body.size(); k++) { >> 1003: Node* n = _body.at(k); > > A lot of functions here are used to do the same thing (that is to estimate the size of a node), I think we should consolidate them, and use a specified value such as number of machine instructions instead. Maybe that could be done later? Yes, that sounds like a good idea for a later enhancement, please feel free to file a RFE. Note that the body size unit currently used by the heuristic is Ideal nodes (as opposed to machine instructions). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17367#discussion_r1453242353 From thartmann at openjdk.org Tue Jan 16 10:50:29 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 16 Jan 2024 10:50:29 GMT Subject: RFR: 8323616: [JVMCI] TestInvalidJVMCIOption.java fails intermittently with NPE [v3] In-Reply-To: References: <8KAT1E7MkSLCl1JrnEBjDNvgmaCn_ryBbGVMtY0uiTQ=.e6bc008c-5338-4361-9042-20f1ba5e65cd@github.com> Message-ID: On Tue, 16 Jan 2024 08:36:37 GMT, Doug Simon wrote: >> This PR changes callSystemExit to call `vm_exit_during_initialization()` instead of `System.exit` if the module system has not been initialized. This avoids an NPE in the `System.exit` code path where it is assumed that the `Class.module` field is non-null for `java.lang.Shutdown`. > > Doug Simon has updated the pull request incrementally with one additional commit since the last revision: > > use vm_exit_during_initialization instead of vm_exit Looks reasonable. ------------- Marked as reviewed by thartmann (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/17397#pullrequestreview-1823045374 From yyang at openjdk.org Tue Jan 16 11:14:36 2024 From: yyang at openjdk.org (Yi Yang) Date: Tue, 16 Jan 2024 11:14:36 GMT Subject: RFR: 8323795: jcmd Compiler.codecache counts total sizes of used/free Message-ID: CodeHeap 'non-profiled nmethods': size=118592Kb used=29Kb max_used=29Kb free=118562Kb bounds [0x00007fbe84622000, 0x00007fbe84892000, 0x00007fbe8b9f2000] CodeHeap 'profiled nmethods': size=118588Kb used=80Kb max_used=80Kb free=118507Kb bounds [0x00007fbe7c9f2000, 0x00007fbe7cc62000, 0x00007fbe83dc1000] CodeHeap 'non-nmethods': size=8580Kb used=1258Kb max_used=1834Kb free=7321Kb bounds [0x00007fbe83dc1000, 0x00007fbe84031000, 0x00007fbe84622000] total_blobs=474 nmethods=87 adapters=293 compilation: enabled stopped_count=0, restarted_count=0 full_count=0 It would be better to also report the accumulated totals of size/used/max_used/free across all code heaps, for example: CodeHeap 'non-profiled nmethods': size=118592Kb used=29Kb max_used=29Kb free=118562Kb bounds [0x00007fbe84622000, 0x00007fbe84892000, 0x00007fbe8b9f2000] CodeHeap 'profiled nmethods': size=118588Kb used=80Kb max_used=80Kb free=118507Kb bounds [0x00007fbe7c9f2000, 0x00007fbe7cc62000, 0x00007fbe83dc1000] CodeHeap 'non-nmethods': size=8580Kb used=1258Kb max_used=1834Kb free=7321Kb bounds [0x00007fbe83dc1000, 0x00007fbe84031000, 0x00007fbe84622000] total_blobs=474 nmethods=87 adapters=293 compilation: enabled stopped_count=0, restarted_count=0 full_count=0 Total CodeHeap: size=245760Kb, used=1367Kb, max used=1943Kb, free=244390Kb ------------- Commit messages: - 8323795: jcmd Compiler.codecache counts total sizes of used/free Changes: https://git.openjdk.org/jdk/pull/17445/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=17445&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8323795 Stats: 18 lines in 1 file changed: 16 ins; 1 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/17445.diff Fetch: git fetch https://git.openjdk.org/jdk.git 
pull/17445/head:pull/17445 PR: https://git.openjdk.org/jdk/pull/17445 From chagedorn at openjdk.org Tue Jan 16 11:40:23 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 16 Jan 2024 11:40:23 GMT Subject: RFR: JDK-8316756: C2 EA fails with "missing memory path" when encountering unsafe_arraycopy stub call [v3] In-Reply-To: References: Message-ID: <2_AtOPRPpm2wINLTviyCnA3E7DVj3bT5-ErEuQmc660=.ed1b019a-28b1-44a1-ae26-ec31ed5ce13c@github.com> On Mon, 15 Jan 2024 11:58:57 GMT, Tobias Holenstein wrote: >> Before https://github.com/openjdk/jdk/pull/5259 the graph of the following program looked like this during Escape Analysis: >> >> >> static int test() { >> MyClass obj = new MyClass(); // Non-escaping to trigger Escape Analysis >> UNSAFE.copyMemory(null, SRC_BASE, null, DST_BASE, 4); >> obj.x = 42; >> return obj.x; >> } >> >> With MemBarCPUOrder: >> working >> >> Setting `RC_NARROW_MEM` flag in `LibraryCallKit::inline_unsafe_copyMemory()` removes the `428 MergeMem` node. Without a `MergeMem` node after `432 StoreB` EA fails in Phase 2 of `ConnectionGraph::split_unique_types(...)` when trying to push the allocation's users on the appropriate worklist - `429 CallLeafNoFP` is not an expected user of `428 StoreB`. Therefore the assert `"EA: missing memory path"` is hit. 
>> Without MemBarCPUOrder and after setting `RC_NARROW_MEM`: >> failing >> >> >> ### Proposed Fix >> Dropping the `RC_NARROW_MEM` flag in `LibraryCallKit::inline_unsafe_copyMemory()` causes the introduction of a `MergeMem` between `StoreB` and `CallLeafNoFP`, so the corresponding code in EA doesn't encounter a `CallLeafNoFP` anymore: >> fixed >> >> Testing: Tier1-4 passed > > Tobias Holenstein has updated the pull request incrementally with one additional commit since the last revision: > > added testcase src/hotspot/share/opto/escape.cpp line 4010: > 4008: } > 4009: } else if (n->is_CallLeaf()) { > 4010: // Runtime calls with narrow memory input (no MergeMem node) Could we somehow assert here that we have a call with an intended narrow memory input? Directly asserting that this is an unsafe arraycopy might be too specific. But maybe we can add the following sanity check? n->as_CallLeaf()->adr_type()->is_rawptr() ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17347#discussion_r1453306060 From jbhateja at openjdk.org Tue Jan 16 12:12:34 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 16 Jan 2024 12:12:34 GMT Subject: RFR: JDK-8320448 Accelerate IndexOf using AVX2 [v7] In-Reply-To: References: Message-ID: On Thu, 11 Jan 2024 23:06:32 GMT, Scott Gibbons wrote: >> Re-write the IndexOf code without the use of the pcmpestri instruction, only using AVX2 instructions. This change accelerates String.IndexOf on average 1.3x for AVX2. 
The benchmark numbers: >> >> >> Benchmark Score Latest >> StringIndexOf.advancedWithMediumSub 343.573 317.934 0.925375393x >> StringIndexOf.advancedWithShortSub1 1039.081 1053.96 1.014319384x >> StringIndexOf.advancedWithShortSub2 55.828 110.541 1.980027943x >> StringIndexOf.constantPattern 9.361 11.906 1.271872663x >> StringIndexOf.searchCharLongSuccess 4.216 4.218 1.000474383x >> StringIndexOf.searchCharMediumSuccess 3.133 3.216 1.02649218x >> StringIndexOf.searchCharShortSuccess 3.76 3.761 1.000265957x >> StringIndexOf.success 9.186 9.713 1.057369911x >> StringIndexOf.successBig 14.341 46.343 3.231504079x >> StringIndexOfChar.latin1_AVX2_String 6220.918 12154.52 1.953814533x >> StringIndexOfChar.latin1_AVX2_char 5503.556 5540.044 1.006629895x >> StringIndexOfChar.latin1_SSE4_String 6978.854 6818.689 0.977049957x >> StringIndexOfChar.latin1_SSE4_char 5657.499 5474.624 0.967675646x >> StringIndexOfChar.latin1_Short_String 7132.541 6863.359 0.962260014x >> StringIndexOfChar.latin1_Short_char 16013.389 16162.437 1.009307711x >> StringIndexOfChar.latin1_mixed_String 7386.123 14771.622 1.999915517x >> StringIndexOfChar.latin1_mixed_char 9901.671 9782.245 0.987938803 > > Scott Gibbons has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 22 commits: > > - Merge branch 'openjdk:master' into indexof > - Merge branch 'openjdk:master' into indexof > - Addressing review comments. > - Fix for JDK-8321599 > - Support UU IndexOf > - Only use optimization when EnableX86ECoreOpts is true > - Fix whitespace > - Merge branch 'openjdk:master' into indexof > - Comments; added exhaustive-ish test > - Subtracting 0x10 twice. > - ... and 12 more: https://git.openjdk.org/jdk/compare/8e12053e...3e58d0c2 src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 197: > 195: __ bind(L_small_string); > 196: __ cmpq(r15, 0x20); > 197: __ ja(L_small_string2); ja should be replaced by jg. 
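
The comment above turns on the difference between unsigned condition codes (`ja`, `jb`) and signed ones (`jg`, `jl`). As a minimal sketch in Java, with hypothetical class and method names that do not appear in the patch, the two interpretations agree for non-negative values and diverge only when the compared value is negative. Whether a negative length can actually reach the stub is not established in this thread; the sketch only illustrates the distinction.

```java
// Hypothetical illustration; none of these names come from the patch under review.
final class SignedVsUnsignedCompare {
    // Mirrors the signed "greater than" condition (what jg tests after a cmp).
    static boolean signedGreater(int value, int threshold) {
        return value > threshold;
    }

    // Mirrors the unsigned "above" condition (what ja tests after a cmp).
    static boolean unsignedAbove(int value, int threshold) {
        return Integer.compareUnsigned(value, threshold) > 0;
    }

    public static void main(String[] args) {
        int threshold = 0x20;
        for (int len : new int[] {0x10, 0x21, -1}) {
            System.out.println("len=" + len
                    + " signedGreater=" + signedGreater(len, threshold)
                    + " unsignedAbove=" + unsignedAbove(len, threshold));
        }
    }
}
```

For `len = -1` the unsigned view treats the value as a very large number, so the "above" test succeeds while the signed "greater" test does not.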
src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 1526: > 1524: __ movq(rdx, r8); > 1525: __ movq(rcx, r9); > 1526: #endif Can we spill them into XMM registers to save costly stack operations? src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 1545: > 1543: // return 0; > 1544: // } > 1545: __ movq(r12, rcx); Kindly use meaningful variable and label names. It will ease the review process and maintenance. src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 1551: > 1549: __ movq(r15, rsi); > 1550: __ movq(r11, rdi); > 1551: __ cmpq(rsi, 0x20); All comparisons are against 32-bit int values, so cmpq -> cmpl may save emitting the REX prefix (no need to set REX.W). src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 1552: > 1550: __ movq(r11, rdi); > 1551: __ cmpq(rsi, 0x20); > 1552: __ jb(L_small_string); All the comparisons against needle length are signed integer comparisons, so jb should be replaced by jl. src/hotspot/share/opto/library_call.cpp line 1206: > 1204: > 1205: Node* result = nullptr; > 1206: bool do_intrinsic = Name change suggestion: do_intrinsic -> call_opt_stub src/hotspot/share/opto/library_call.cpp line 1229: > 1227: } else { > 1228: result = make_indexOf_node(src_start, src_count, tgt_start, tgt_count, > 1229: result_rgn, result_phi, ae); The existing routines emit IR to handle the following special cases: tgt_cnt > src_cnt returns -1; tgt_cnt == 0 returns 0. Should we not preserve those checks before calling the stub? As of now these checks are part of the stub, and doing them in JIT code would save the call overhead. src/hotspot/share/opto/runtime.cpp line 1347: > 1345: fields[argp++] = TypeInt::INT; // needle length > 1346: fields[argp++] = TypePtr::NOTNULL; // haystack array > 1347: fields[argp++] = TypeInt::INT; // haystack length Do we need to swap the comments? The first two arguments correspond to value (haystack) as per the Java-side intrinsic signature. 
https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/lang/StringLatin1.java#L348 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1453304911 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1453332647 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1453333045 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1453333555 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1453333878 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1453338427 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1453338718 PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1453329079 From dnsimon at openjdk.org Tue Jan 16 12:28:31 2024 From: dnsimon at openjdk.org (Doug Simon) Date: Tue, 16 Jan 2024 12:28:31 GMT Subject: RFR: 8323616: [JVMCI] TestInvalidJVMCIOption.java fails intermittently with NPE [v4] In-Reply-To: <8KAT1E7MkSLCl1JrnEBjDNvgmaCn_ryBbGVMtY0uiTQ=.e6bc008c-5338-4361-9042-20f1ba5e65cd@github.com> References: <8KAT1E7MkSLCl1JrnEBjDNvgmaCn_ryBbGVMtY0uiTQ=.e6bc008c-5338-4361-9042-20f1ba5e65cd@github.com> Message-ID: > This PR changes callSystemExit to call `vm_exit_during_initialization()` instead of `System.exit` if the module system has not been initialized. This avoids an NPE in the `System.exit` code path where it is assumed that the `Class.module` field is non-null for `java.lang.Shutdown`. 
Doug Simon has updated the pull request incrementally with one additional commit since the last revision: use substring instead of equality test for expected error message ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17397/files - new: https://git.openjdk.org/jdk/pull/17397/files/91be4c2d..a9967cc4 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17397&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17397&range=02-03 Stats: 6 lines in 1 file changed: 4 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/17397.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17397/head:pull/17397 PR: https://git.openjdk.org/jdk/pull/17397 From chagedorn at openjdk.org Tue Jan 16 12:35:21 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 16 Jan 2024 12:35:21 GMT Subject: RFR: JDK-8316756: C2 EA fails with "missing memory path" when encountering unsafe_arraycopy stub call [v3] In-Reply-To: References: Message-ID: <9TBHM9ao0Tyjc972h-X9OhyvO3Aj6CxYYukDmdSg4ps=.ee848b99-533d-42d3-a82c-74ef323b44bb@github.com> On Mon, 15 Jan 2024 11:58:57 GMT, Tobias Holenstein wrote: >> Before https://github.com/openjdk/jdk/pull/5259 the graph of the following program looked like this during Escape Analysis: >> >> >> static int test() { >> MyClass obj = new MyClass(); // Non-escaping to trigger Escape Analysis >> UNSAFE.copyMemory(null, SRC_BASE, null, DST_BASE, 4); >> obj.x = 42; >> return obj.x; >> } >> >> With MemBarCPUOrder: >> working >> >> Setting `RC_NARROW_MEM` flag in `LibraryCallKit::inline_unsafe_copyMemory()` removes the `428 MergeMem` node. Without a `MergeMem` node after `432 StoreB` EA fails in Phase 2 of `ConnectionGraph::split_unique_types(...)` when trying to push the allocation's users on the appropriate worklist - `429 CallLeafNoFP` is not an expected user of `428 StoreB`. Therefore the assert `"EA: missing memory path"` is hit. 
>> Without MemBarCPUOrder and after setting `RC_NARROW_MEM`: >> failing >> >> >> ### Proposed Fix >> Dropping the `RC_NARROW_MEM` flag in `LibraryCallKit::inline_unsafe_copyMemory()` causes the introduction of a `MergeMem` between `StoreB` and `CallLeafNoFP`, so the corresponding code in EA doesn't encounter a `CallLeafNoFP` anymore: >> fixed >> >> Testing: Tier1-4 passed > > Tobias Holenstein has updated the pull request incrementally with one additional commit since the last revision: > > added testcase The fix looks good to me, too. ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/17347#pullrequestreview-1823230276 From chagedorn at openjdk.org Tue Jan 16 12:35:23 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 16 Jan 2024 12:35:23 GMT Subject: RFR: JDK-8316756: C2 EA fails with "missing memory path" when encountering unsafe_arraycopy stub call [v3] In-Reply-To: <2_AtOPRPpm2wINLTviyCnA3E7DVj3bT5-ErEuQmc660=.ed1b019a-28b1-44a1-ae26-ec31ed5ce13c@github.com> References: <2_AtOPRPpm2wINLTviyCnA3E7DVj3bT5-ErEuQmc660=.ed1b019a-28b1-44a1-ae26-ec31ed5ce13c@github.com> Message-ID: On Tue, 16 Jan 2024 11:37:19 GMT, Christian Hagedorn wrote: >> Tobias Holenstein has updated the pull request incrementally with one additional commit since the last revision: >> >> added testcase > > src/hotspot/share/opto/escape.cpp line 4010: > >> 4008: } >> 4009: } else if (n->is_CallLeaf()) { >> 4010: // Runtime calls with narrow memory input (no MergeMem node) > > Could we somehow assert here that we have a call with an intended narrow memory input? Directly asserting that this is an unsafe arraycopy might be too specific. But maybe we can add the following sanity check? > > n->as_CallLeaf()->adr_type()->is_rawptr() Okay, it does not always need to be raw memory. But maybe we still want to assert that we have an unsafe arraycopy in this case? 
If we ever have more valid cases, the assert could easily be adjusted to allow them. But given how close we are to RDP 2, I suggest to go with this general fix and follow up with an RFE to add that assert if you all agree with that. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17347#discussion_r1453362468 From jvernee at openjdk.org Tue Jan 16 13:28:30 2024 From: jvernee at openjdk.org (Jorn Vernee) Date: Tue, 16 Jan 2024 13:28:30 GMT Subject: Integrated: 8323651: compiler/c2/irTests/TestPrunedExHandler.java fails with -XX:+DeoptimizeALot In-Reply-To: <4zl1kION6ggeCNof1I8FBIqNKjpfxlUl23iT4_9XJPs=.8e9194a8-4da2-4771-94b6-1c94e6146cbc@github.com> References: <4zl1kION6ggeCNof1I8FBIqNKjpfxlUl23iT4_9XJPs=.8e9194a8-4da2-4771-94b6-1c94e6146cbc@github.com> Message-ID: <70cbu9KEULf_o0pC1y8nvOGjN4WoNYxu6VAT0uyq1Vs=.20b7000c-710f-4b69-956b-553619b8e326@github.com> On Mon, 15 Jan 2024 16:58:33 GMT, Jorn Vernee wrote: > This test can not work with `-XX:+DeoptimizeALot`. Parts of it depend on a particular sequence of compilation and deoptimization, so if DeoptimizeALot deoptimizes things prematurely, the test can fail. > > This PR adds `@requires vm.opt.DeoptimizeALot != true` to the test so that it is skipped when `-XX:+DeoptimizeALot` is used. This pull request has now been integrated. 
Changeset: 2fd775f6 Author: Jorn Vernee URL: https://git.openjdk.org/jdk/commit/2fd775f69c8eb4d0bd1163e8b5d2615db105352b Stats: 2 lines in 1 file changed: 1 ins; 0 del; 1 mod 8323651: compiler/c2/irTests/TestPrunedExHandler.java fails with -XX:+DeoptimizeALot Reviewed-by: alanb, kvn ------------- PR: https://git.openjdk.org/jdk/pull/17432 From jbhateja at openjdk.org Tue Jan 16 13:29:24 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 16 Jan 2024 13:29:24 GMT Subject: RFR: JDK-8320448 Accelerate IndexOf using AVX2 [v7] In-Reply-To: References: Message-ID: <0XxCusssrDiiKzXBfdsY1XHkv9T6mJwJe7dwFz5Uy-I=.3325e496-5bf1-4a79-8969-e28e018b77db@github.com> On Thu, 11 Jan 2024 23:06:32 GMT, Scott Gibbons wrote: >> Re-write the IndexOf code without the use of the pcmpestri instruction, only using AVX2 instructions. This change accelerates String.IndexOf on average 1.3x for AVX2. The benchmark numbers: >> >> >> Benchmark Score Latest >> StringIndexOf.advancedWithMediumSub 343.573 317.934 0.925375393x >> StringIndexOf.advancedWithShortSub1 1039.081 1053.96 1.014319384x >> StringIndexOf.advancedWithShortSub2 55.828 110.541 1.980027943x >> StringIndexOf.constantPattern 9.361 11.906 1.271872663x >> StringIndexOf.searchCharLongSuccess 4.216 4.218 1.000474383x >> StringIndexOf.searchCharMediumSuccess 3.133 3.216 1.02649218x >> StringIndexOf.searchCharShortSuccess 3.76 3.761 1.000265957x >> StringIndexOf.success 9.186 9.713 1.057369911x >> StringIndexOf.successBig 14.341 46.343 3.231504079x >> StringIndexOfChar.latin1_AVX2_String 6220.918 12154.52 1.953814533x >> StringIndexOfChar.latin1_AVX2_char 5503.556 5540.044 1.006629895x >> StringIndexOfChar.latin1_SSE4_String 6978.854 6818.689 0.977049957x >> StringIndexOfChar.latin1_SSE4_char 5657.499 5474.624 0.967675646x >> StringIndexOfChar.latin1_Short_String 7132.541 6863.359 0.962260014x >> StringIndexOfChar.latin1_Short_char 16013.389 16162.437 1.009307711x >> StringIndexOfChar.latin1_mixed_String 7386.123 14771.622 
1.999915517x >> StringIndexOfChar.latin1_mixed_char 9901.671 9782.245 0.987938803 > > Scott Gibbons has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 22 commits: > > - Merge branch 'openjdk:master' into indexof > - Merge branch 'openjdk:master' into indexof > - Addressing review comments. > - Fix for JDK-8321599 > - Support UU IndexOf > - Only use optimization when EnableX86ECoreOpts is true > - Fix whitespace > - Merge branch 'openjdk:master' into indexof > - Comments; added exhaustive-ish test > - Subtracting 0x10 twice. > - ... and 12 more: https://git.openjdk.org/jdk/compare/8e12053e...3e58d0c2 src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 417: > 415: __ cmpl(Address(rbx, r15, Address::times_1, -0x14), rax); > 416: __ jne(L_top_loop_1); > 417: __ jmp(L_0x406019); For cases which are multiple of 4 bytes we can use VMASKMOVPS (conditional moves) and VPTEST. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1453425855 From jbhateja at openjdk.org Tue Jan 16 13:32:25 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 16 Jan 2024 13:32:25 GMT Subject: RFR: JDK-8320448 Accelerate IndexOf using AVX2 [v7] In-Reply-To: References: Message-ID: On Thu, 11 Jan 2024 23:06:32 GMT, Scott Gibbons wrote: >> Re-write the IndexOf code without the use of the pcmpestri instruction, only using AVX2 instructions. This change accelerates String.IndexOf on average 1.3x for AVX2. 
The benchmark numbers: >> >> >> Benchmark Score Latest >> StringIndexOf.advancedWithMediumSub 343.573 317.934 0.925375393x >> StringIndexOf.advancedWithShortSub1 1039.081 1053.96 1.014319384x >> StringIndexOf.advancedWithShortSub2 55.828 110.541 1.980027943x >> StringIndexOf.constantPattern 9.361 11.906 1.271872663x >> StringIndexOf.searchCharLongSuccess 4.216 4.218 1.000474383x >> StringIndexOf.searchCharMediumSuccess 3.133 3.216 1.02649218x >> StringIndexOf.searchCharShortSuccess 3.76 3.761 1.000265957x >> StringIndexOf.success 9.186 9.713 1.057369911x >> StringIndexOf.successBig 14.341 46.343 3.231504079x >> StringIndexOfChar.latin1_AVX2_String 6220.918 12154.52 1.953814533x >> StringIndexOfChar.latin1_AVX2_char 5503.556 5540.044 1.006629895x >> StringIndexOfChar.latin1_SSE4_String 6978.854 6818.689 0.977049957x >> StringIndexOfChar.latin1_SSE4_char 5657.499 5474.624 0.967675646x >> StringIndexOfChar.latin1_Short_String 7132.541 6863.359 0.962260014x >> StringIndexOfChar.latin1_Short_char 16013.389 16162.437 1.009307711x >> StringIndexOfChar.latin1_mixed_String 7386.123 14771.622 1.999915517x >> StringIndexOfChar.latin1_mixed_char 9901.671 9782.245 0.987938803 > > Scott Gibbons has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 22 commits: > > - Merge branch 'openjdk:master' into indexof > - Merge branch 'openjdk:master' into indexof > - Addressing review comments. > - Fix for JDK-8321599 > - Support UU IndexOf > - Only use optimization when EnableX86ECoreOpts is true > - Fix whitespace > - Merge branch 'openjdk:master' into indexof > - Comments; added exhaustive-ish test > - Subtracting 0x10 twice. > - ... and 12 more: https://git.openjdk.org/jdk/compare/8e12053e...3e58d0c2 src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 470: > 468: __ jne(L_top_loop_1); > 469: __ jmp(L_0x406019); > 470: For 16 bytes we can directly use [V]PTEST instruction to save multiple loads and compares. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1453429803 From tholenstein at openjdk.org Tue Jan 16 14:02:23 2024 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Tue, 16 Jan 2024 14:02:23 GMT Subject: RFR: JDK-8316756: C2 EA fails with "missing memory path" when encountering unsafe_arraycopy stub call [v3] In-Reply-To: References: <2_AtOPRPpm2wINLTviyCnA3E7DVj3bT5-ErEuQmc660=.ed1b019a-28b1-44a1-ae26-ec31ed5ce13c@github.com> Message-ID: <9FJSR28ZY492lEVQZZ4kfAWby4J8lOyeTAUCHxJwRKM=.2221d572-200e-4391-ad4b-d32c06ffc5d6@github.com> On Tue, 16 Jan 2024 12:29:57 GMT, Christian Hagedorn wrote: >> src/hotspot/share/opto/escape.cpp line 4010: >> >>> 4008: } >>> 4009: } else if (n->is_CallLeaf()) { >>> 4010: // Runtime calls with narrow memory input (no MergeMem node) >> >> Could we somehow assert here that we have a call with an intended narrow memory input? Directly asserting that this is an unsafe arraycopy might be too specific. But maybe we can add the following sanity check? >> >> n->as_CallLeaf()->adr_type()->is_rawptr() > > Okay, it does not always need to be raw memory. But maybe we still want to assert that we have an unsafe arraycopy in this case? If we ever have more valid cases, the assert could easily be adjusted to allow them. > > But given how close we are to RDP 2, I suggest to go with this general fix and follow up with an RFE to add that assert if you all agree with that. > Could we somehow assert here that we have a call with an intended narrow memory input? Directly asserting that this is an unsafe arraycopy might be too specific. But maybe we can add the following sanity check? > > ``` > n->as_CallLeaf()->adr_type()->is_rawptr() > ``` @vnkozlov what do you think? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17347#discussion_r1453467079 From jvernee at openjdk.org Tue Jan 16 14:26:43 2024 From: jvernee at openjdk.org (Jorn Vernee) Date: Tue, 16 Jan 2024 14:26:43 GMT Subject: [jdk22] RFR: 8323651: compiler/c2/irTests/TestPrunedExHandler.java fails with -XX:+DeoptimizeALot Message-ID: Hi all, This pull request contains a backport of commit [2fd775f6](https://github.com/openjdk/jdk/commit/2fd775f69c8eb4d0bd1163e8b5d2615db105352b) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository. The commit being backported was authored by Jorn Vernee on 16 Jan 2024 and was reviewed by Alan Bateman and Vladimir Kozlov. This is a P4 test-only change, and we are currently in Ramp Down Phase 1. The release process allows P1-P5 test-only changes during RDP1: https://openjdk.org/jeps/3#Quick-reference Thanks! ------------- Commit messages: - Backport 2fd775f69c8eb4d0bd1163e8b5d2615db105352b Changes: https://git.openjdk.org/jdk22/pull/82/files Webrev: https://webrevs.openjdk.org/?repo=jdk22&pr=82&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8323651 Stats: 2 lines in 1 file changed: 1 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk22/pull/82.diff Fetch: git fetch https://git.openjdk.org/jdk22.git pull/82/head:pull/82 PR: https://git.openjdk.org/jdk22/pull/82 From thartmann at openjdk.org Tue Jan 16 15:04:20 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 16 Jan 2024 15:04:20 GMT Subject: [jdk22] RFR: 8323651: compiler/c2/irTests/TestPrunedExHandler.java fails with -XX:+DeoptimizeALot In-Reply-To: References: Message-ID: On Tue, 16 Jan 2024 14:20:18 GMT, Jorn Vernee wrote: > Hi all, > > This pull request contains a backport of commit [2fd775f6](https://github.com/openjdk/jdk/commit/2fd775f69c8eb4d0bd1163e8b5d2615db105352b) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository. 
> > The commit being backported was authored by Jorn Vernee on 16 Jan 2024 and was reviewed by Alan Bateman and Vladimir Kozlov. > > This is a P4 test-only change, and we are currently in Ramp Down Phase 1. The release process allows P1-P5 test-only changes during RDP1: https://openjdk.org/jeps/3#Quick-reference > > Thanks! Looks good. ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk22/pull/82#pullrequestreview-1823692213 From epeter at openjdk.org Tue Jan 16 15:13:32 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 16 Jan 2024 15:13:32 GMT Subject: RFR: 8318446: C2: implement StoreNode::Ideal_merge_stores Message-ID: This is a feature requested by @RogerRiggs and @cl4es. **Idea** Merging multiple consecutive small stores (e.g. 8 byte stores) into larger stores (e.g. one long store) can lead to a speedup. Recently, @cl4es and @RogerRiggs had to review a few PRs where people would try to get speedups by using Unsafe (e.g. `Unsafe.putLongUnaligned`), or ByteArrayLittleEndian (e.g. `ByteArrayLittleEndian.setLong`). They have asked if we can do such an optimization in C2, rather than in the Java library code, or even user code. This patch here supports a few simple use-cases, like these: Merge consecutive array stores, with constants. We can combine the separate constants into a larger constant: https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/test/hotspot/jtreg/compiler/c2/TestMergeStores.java#L383-L395 Merge consecutive array stores, with a variable that was split (using shifts). We can essentially undo the splitting (i.e.
shifting and truncation), and directly store the variable: https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/test/hotspot/jtreg/compiler/c2/TestMergeStores.java#L444-L456 The idea is that this would allow the introduction of a very simple API, without any "heavy" dependencies (Unsafe or ByteArrayLittleEndian): https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/test/hotspot/jtreg/compiler/c2/TestMergeStores.java#L327-L338 https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/test/hotspot/jtreg/compiler/c2/TestMergeStores.java#L467-L472 **Details** This draft currently implements the optimization in an additional special IGVN phase: https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/src/hotspot/share/opto/compile.cpp#L2479-L2485 We first collect all `StoreB|C|I`, and put them in the IGVN worklist (see `Compile::gather_nodes_for_merge_stores`). During IGVN, we call `StoreNode::Ideal_merge_stores` at the end (after all other optimizations) of `StoreNode::Ideal`. We essentially try to establish a chain of mergeable stores: https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/src/hotspot/share/opto/memnode.cpp#L2802-L2806 Mergeable stores must have the same Opcode (implies they have the same element type and hence size). Further, mergeable stores must have the same control (or be separated by only a RangeCheck). Further, they must either both store constants, or adjacent segments of a larger value (i.e. a larger value right-shifted by a constant offset, see `is_con_RShift`). Further, we must be able to prove that the stores reference adjacent memory (i.e. the address is shifted by the element size). For two mergeable stores (one `use`, one `def`), the def-store should not have any other use than the use-store, so that we only merge stores that are in the same basic block.
With the only exceptions of merging through RangeChecks (which can have MergeMem nodes on the memory path, and hence such MergeMem are allowed as secondary uses of the def-node). I made this optimization a new phase, and placed it after loop-opts for these reasons: - I do not want it to interfere with loop-opts, in particular with the autovectorizer (SuperWord). - I don't want it to interfere with any other memory optimizations, this should just improve things if nothing else worked. - Checking if two memory addresses are adjacent is much simpler after loop-opts, when some of the `CastII` nodes have disappeared, and the address expression becomes much simpler (in particular, the constants from the integer index can only sink through the CastI2L after loop-opts). We could do adjacency checking with a more complicated algorithm, such as `VPointer` in the current autovectorizer. **Performance** TODO **Testing** Tier 1-6 + stress-testing. Performance testing: no significant difference. ------------- Commit messages: - fix flag initialization issue - merge manually - Merge branch 'master' into JDK-8318446 - make sure the unsafe accesses are always recognized - swap number and type - add short and int examples to bench - rename array - add LE-API to bench - improved the benchmark - fix bad if problem - ... 
and 20 more: https://git.openjdk.org/jdk/compare/7e0a4ed6...adca9e22 Changes: https://git.openjdk.org/jdk/pull/16245/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=16245&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8318446 Stats: 2007 lines in 11 files changed: 2006 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/16245.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16245/head:pull/16245 PR: https://git.openjdk.org/jdk/pull/16245 From qamai at openjdk.org Tue Jan 16 15:13:33 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Tue, 16 Jan 2024 15:13:33 GMT Subject: RFR: 8318446: C2: implement StoreNode::Ideal_merge_stores In-Reply-To: References: Message-ID: On Wed, 18 Oct 2023 13:04:35 GMT, Emanuel Peter wrote: > This is a feature requiested by @RogerRiggs and @cl4es . > > **Idea** > > Merging multiple consecutive small stores (e.g. 8 byte stores) into larger stores (e.g. one long store) can lead to speedup. > Recently, @cl4es and @RogerRiggs had to review a few PR's where people would try to get > speedups by using > Unsafe (e.g. `Unsafe.putLongUnaligned`), or > ByteArrayLittleEndian (e.g. `ByteArrayLittleEndian.setLong`). > They have asked if we can do such an optimization in C2, rather than in the Java library code, or even user code. > > This patch here supports a few simple use-cases, like these: > > Merge consecutive array stores, with constants. We can combine the separate constants into a larger constant: > https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/test/hotspot/jtreg/compiler/c2/TestMergeStores.java#L383-L395 > > Merge consecutive array stores, with a variable that was split (using shifts). We can essentially undo the > splitting (i.e. 
shifting and truncation), and directly store the variable: > https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/test/hotspot/jtreg/compiler/c2/TestMergeStores.java#L444-L456 > > The idea is that this would allow the introduction of a very simple API, without any "heavy" dependencies (Unsafe or ByteArrayLittleEndian): > > https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/test/hotspot/jtreg/compiler/c2/TestMergeStores.java#L327-L338 > https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/test/hotspot/jtreg/compiler/c2/TestMergeStores.java#L467-L472 > > **Details** > > This draft currently implements the optimization in an additional special IGVN phase: > https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/src/hotspot/share/opto/compile.cpp#L2479-L2485 > > We first collect all `StoreB|C|I`, and put them in the IGVN worklist (see `Compile::gather_nodes_for_merge_stores`). > During IGVN, we call `StoreNode::Ideal_merge_stores` at the end (after all other optimizations) of `StoreNode::Ideal`. > We essentially try to establish a chain of mergable stores: > > https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/src/hotspot/share/opto/memnode.cpp#L2802-L2806 > > Mergable stores must have the same Opcode (implies they have the same element type and hence size). > Further, mergable stores must have the same control (or be separated by only a RangeCheck). > Further, they must either both store constants, or adjacent segments of a larger value... I imagine it would be beneficial if we could merge stores to fields and stores from loads, which are common in object constructions. Thanks. 
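
For readers following the thread, the "variable that was split (using shifts)" case from the quoted proposal can be sketched in plain Java as below. This is only an illustration under stated assumptions: the class and method names are hypothetical and are not taken from `TestMergeStores.java`, and the sketch only shows that the eight byte stores write the same bytes as a single little-endian 8-byte store. Whether C2 actually merges them depends on the patch and the platform.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

// Hypothetical illustration, not code from the PR.
final class MergeStoresSketch {
    // Eight consecutive byte stores, each writing an adjacent segment of one
    // long value (the value right-shifted by a constant, then truncated).
    static void storeLongBytewise(byte[] a, int offset, long v) {
        a[offset + 0] = (byte) (v >> 0);
        a[offset + 1] = (byte) (v >> 8);
        a[offset + 2] = (byte) (v >> 16);
        a[offset + 3] = (byte) (v >> 24);
        a[offset + 4] = (byte) (v >> 32);
        a[offset + 5] = (byte) (v >> 40);
        a[offset + 6] = (byte) (v >> 48);
        a[offset + 7] = (byte) (v >> 56);
    }

    // Reference: one 8-byte store in little-endian byte order.
    static void storeLongAtOnce(byte[] a, int offset, long v) {
        ByteBuffer.wrap(a).order(ByteOrder.LITTLE_ENDIAN).putLong(offset, v);
    }

    // True if both variants leave the array in the same state.
    static boolean sameResult(long v) {
        byte[] bytewise = new byte[16];
        byte[] atOnce = new byte[16];
        storeLongBytewise(bytewise, 4, v);
        storeLongAtOnce(atOnce, 4, v);
        return Arrays.equals(bytewise, atOnce);
    }

    public static void main(String[] args) {
        System.out.println(sameResult(0x0102030405060708L));
        System.out.println(sameResult(-1L));
    }
}
```

The bytewise form needs neither `Unsafe` nor `ByteArrayLittleEndian`, which is the kind of simple API the proposal has in mind.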
------------- PR Comment: https://git.openjdk.org/jdk/pull/16245#issuecomment-1778435561 From epeter at openjdk.org Tue Jan 16 15:13:33 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 16 Jan 2024 15:13:33 GMT Subject: RFR: 8318446: C2: implement StoreNode::Ideal_merge_stores In-Reply-To: References: Message-ID: On Wed, 25 Oct 2023 03:11:12 GMT, Quan Anh Mai wrote: >> This is a feature requiested by @RogerRiggs and @cl4es . >> >> **Idea** >> >> Merging multiple consecutive small stores (e.g. 8 byte stores) into larger stores (e.g. one long store) can lead to speedup. >> Recently, @cl4es and @RogerRiggs had to review a few PR's where people would try to get >> speedups by using >> Unsafe (e.g. `Unsafe.putLongUnaligned`), or >> ByteArrayLittleEndian (e.g. `ByteArrayLittleEndian.setLong`). >> They have asked if we can do such an optimization in C2, rather than in the Java library code, or even user code. >> >> This patch here supports a few simple use-cases, like these: >> >> Merge consecutive array stores, with constants. We can combine the separate constants into a larger constant: >> https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/test/hotspot/jtreg/compiler/c2/TestMergeStores.java#L383-L395 >> >> Merge consecutive array stores, with a variable that was split (using shifts). We can essentially undo the >> splitting (i.e. 
shifting and truncation), and directly store the variable: >> https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/test/hotspot/jtreg/compiler/c2/TestMergeStores.java#L444-L456 >> >> The idea is that this would allow the introduction of a very simple API, without any "heavy" dependencies (Unsafe or ByteArrayLittleEndian): >> >> https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/test/hotspot/jtreg/compiler/c2/TestMergeStores.java#L327-L338 >> https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/test/hotspot/jtreg/compiler/c2/TestMergeStores.java#L467-L472 >> >> **Details** >> >> This draft currently implements the optimization in an additional special IGVN phase: >> https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/src/hotspot/share/opto/compile.cpp#L2479-L2485 >> >> We first collect all `StoreB|C|I`, and put them in the IGVN worklist (see `Compile::gather_nodes_for_merge_stores`). >> During IGVN, we call `StoreNode::Ideal_merge_stores` at the end (after all other optimizations) of `StoreNode::Ideal`. >> We essentially try to establish a chain of mergable stores: >> >> https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/src/hotspot/share/opto/memnode.cpp#L2802-L2806 >> >> Mergable stores must have the same Opcode (implies they have the same element type and hence size). >> Further, mergable stores must have the same control (or be separated by only a RangeCheck). >> Further,... > > I imagine it would be beneficial if we could merge stores to fields and stores from loads, which are common in object constructions. > > Thanks. @merykitty do you have examples for both? Maybe stores to fields already works. Merging loads and stores may be out of scope. That sounds a little much like SLP. We can still try to do that in a future RFE. We could even try to use (masked) vector instructions. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/16245#issuecomment-1778600064 From qamai at openjdk.org Tue Jan 16 15:13:34 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Tue, 16 Jan 2024 15:13:34 GMT Subject: RFR: 8318446: C2: implement StoreNode::Ideal_merge_stores In-Reply-To: References: Message-ID: On Wed, 25 Oct 2023 06:31:05 GMT, Emanuel Peter wrote: >> I imagine it would be beneficial if we could merge stores to fields and stores from loads, which are common in object constructions. >> >> Thanks. > > @merykitty do you have examples for both? Maybe stores to fields already works. Merging loads and stores may be out of scope. That sounds a little much like SLP. We can still try to do that in a future RFE. We could even try to use (masked) vector instructions. @eme64 I have tried your patch, it seems that there are some limitations: - The stores are not merged if the order is not right (e.g `a[2] = 2; a[1] = 1;`) - The stores are not merged if they are floating point constants. - The stores are not merged if they are consecutive fields in an object. E.g: class Point { int x; int y; } p.x = 1; p.y = 2; // Cannot merge into mov [p.x], 0x200000001 Regarding the final point, fields may be of different types with different sizes and there may be padding between them. This means that for load-store sequence merges, I think SLP cannot handle these cases. Thanks. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16245#issuecomment-1779472249 From epeter at openjdk.org Tue Jan 16 15:13:35 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 16 Jan 2024 15:13:35 GMT Subject: RFR: 8318446: C2: implement StoreNode::Ideal_merge_stores In-Reply-To: References: Message-ID: <_EOrNbIYbl3WazGH0hgAGGfArkWG_gVCyfJR6jD1gdA=.8fa6982c-f3b6-4372-94bd-77c3f2738a4a@github.com> On Wed, 25 Oct 2023 14:59:07 GMT, Quan Anh Mai wrote: >> @merykitty do you have examples for both? Maybe stores to fields already works. 
Merging loads and stores may be out of scope. That sounds a little much like SLP. We can still try to do that in a future RFE. We could even try to use (masked) vector instructions. > > @eme64 I have tried your patch, it seems that there are some limitations: > > - The stores are not merged if the order is not right (e.g `a[2] = 2; a[1] = 1;`) > - The stores are not merged if they are floating point constants. > - The stores are not merged if they are consecutive fields in an object. E.g: > > > class Point { > int x; int y; > } > > p.x = 1; > p.y = 2; // Cannot merge into mov [p.x], 0x200000001 > > > Regarding the final point, fields may be of different types with different sizes and there may be padding between them. This means that for load-store sequence merges, I think SLP cannot handle these cases. > > Thanks. @merykitty I just looked at this project again today. About the limitations: Yes, this is deliberately limited for now. We could make it much more smart, and create a sort of straight-line code SLP algorithm that could even allow for different element sizes and padding in between (using masked loads / stores). Maybe that would be worth attempting. For now, this is just to satisfy the limited requirements of library folks who do not want to see everybody using Unsafe to merge stores. About fields stores: I see that different fields apparently are not in a chain, but rather independent: static void test3(Point p) { p.x = 1; p.y = 2; } 40 StoreI === 28 7 39 21 [[ 16 ]] @Test$Point+12 *, name=x, idx=4; Memory: @Test$Point+12 *, name=x, idx=4; !jvms: Test::test3 @ bci:2 (line 36) 44 StoreI === 28 7 43 41 [[ 16 ]] @Test$Point+16 *, name=y, idx=5; Memory: @Test$Point+16 *, name=y, idx=5; !jvms: Test::test3 @ bci:7 (line 37) I should be able to allow for that quite easily, they can either be in a chain, or have the same memory state as input. 
@merykitty @cl4es @RogerRiggs @vnkozlov I wonder if you think that the approach of this PR is good, and if you have any suggestions about it? - Is a separate phase ok? - Is this PR in a sweet-spot that reaches the goals of the library-folks, but is not too complex? - Would you prefer a more general solution, like a straight-line SLP algorithm, that can merge (even vectorize) any load / store sequences, even merge accesses with different element sizes and with gaps/padding? ------------- PR Comment: https://git.openjdk.org/jdk/pull/16245#issuecomment-1893927494 PR Comment: https://git.openjdk.org/jdk/pull/16245#issuecomment-1893940205 From tholenstein at openjdk.org Tue Jan 16 15:37:20 2024 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Tue, 16 Jan 2024 15:37:20 GMT Subject: RFR: JDK-8316756: C2 EA fails with "missing memory path" when encountering unsafe_arraycopy stub call [v3] In-Reply-To: <6_AwJfpSEdDqzEPpq2ns9JeC9cPmIjURIcbxAijlE4Y=.8ea0ba42-e1da-42d1-9b34-d9a89d149a02@github.com> References: <6_AwJfpSEdDqzEPpq2ns9JeC9cPmIjURIcbxAijlE4Y=.8ea0ba42-e1da-42d1-9b34-d9a89d149a02@github.com> Message-ID: On Mon, 15 Jan 2024 19:24:23 GMT, Vladimir Kozlov wrote: > Did you file RFE for StoreB move? Yes, I filed https://bugs.openjdk.org/browse/JDK-8323813 ------------- PR Comment: https://git.openjdk.org/jdk/pull/17347#issuecomment-1893990445 From jvernee at openjdk.org Tue Jan 16 15:54:20 2024 From: jvernee at openjdk.org (Jorn Vernee) Date: Tue, 16 Jan 2024 15:54:20 GMT Subject: [jdk22] Integrated: 8323651: compiler/c2/irTests/TestPrunedExHandler.java fails with -XX:+DeoptimizeALot In-Reply-To: References: Message-ID: On Tue, 16 Jan 2024 14:20:18 GMT, Jorn Vernee wrote: > Hi all, > > This pull request contains a backport of commit [2fd775f6](https://github.com/openjdk/jdk/commit/2fd775f69c8eb4d0bd1163e8b5d2615db105352b) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository. 
> > The commit being backported was authored by Jorn Vernee on 16 Jan 2024 and was reviewed by Alan Bateman and Vladimir Kozlov. > > This is a P4 test-only change, and we are currently in Ramp Down Phase 1. The release process allows P1-P5 test-only changes during RDP1: https://openjdk.org/jeps/3#Quick-reference > > Thanks! This pull request has now been integrated. Changeset: 247a4360 Author: Jorn Vernee URL: https://git.openjdk.org/jdk22/commit/247a4360f6a4960821c1d10d92d44286e2681cdc Stats: 2 lines in 1 file changed: 1 ins; 0 del; 1 mod 8323651: compiler/c2/irTests/TestPrunedExHandler.java fails with -XX:+DeoptimizeALot Reviewed-by: thartmann Backport-of: 2fd775f69c8eb4d0bd1163e8b5d2615db105352b ------------- PR: https://git.openjdk.org/jdk22/pull/82 From never at openjdk.org Tue Jan 16 15:57:21 2024 From: never at openjdk.org (Tom Rodriguez) Date: Tue, 16 Jan 2024 15:57:21 GMT Subject: RFR: 8323616: [JVMCI] TestInvalidJVMCIOption.java fails intermittently with NPE [v4] In-Reply-To: References: <8KAT1E7MkSLCl1JrnEBjDNvgmaCn_ryBbGVMtY0uiTQ=.e6bc008c-5338-4361-9042-20f1ba5e65cd@github.com> Message-ID: On Tue, 16 Jan 2024 12:28:31 GMT, Doug Simon wrote: >> This PR changes callSystemExit to call `vm_exit_during_initialization()` instead of `System.exit` if the module system has not been initialized. This avoids an NPE in the `System.exit` code path where it is assumed that the `Class.module` field is non-null for `java.lang.Shutdown`. > > Doug Simon has updated the pull request incrementally with one additional commit since the last revision: > > use substring instead of equality test for expected error message Looks good. ------------- Marked as reviewed by never (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/17397#pullrequestreview-1823889605 From thartmann at openjdk.org Tue Jan 16 16:24:22 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 16 Jan 2024 16:24:22 GMT Subject: RFR: 8320237 C2: late inlining of method handle invoke causes duplicate lines in PrintInlining output [v2] In-Reply-To: References: Message-ID: On Wed, 10 Jan 2024 16:37:44 GMT, Kangcheng Xu wrote: >> This PR resolves [JDK-8320237](https://bugs.openjdk.org/browse/JDK-8320237) >> >> The original behavior produces both a failure and success message upon late inlining which is confusing. The patch removes the failure message if inlining was successful. Huge thanks to @rwestrel for reporting and working out a solution. >> >> Unit test `test/hotspot/jtreg/compiler/inlining/TestDuplicatedLateInliningOutput.java` is added and passing. > > Kangcheng Xu has updated the pull request incrementally with one additional commit since the last revision: > > fix VM crashes Thanks for the explanation. 
`compiler/inlining/TestDuplicatedLateInliningOutput.java` still failed once with `-XX:+UseZGC -XX:+ZGenerational`: java.lang.Exception: No inlining found at compiler.inlining.TestDuplicatedLateInliningOutput.lambda$test$1(TestDuplicatedLateInliningOutput.java:77) at java.base/java.util.OptionalInt.orElseThrow(OptionalInt.java:273) at compiler.inlining.TestDuplicatedLateInliningOutput.test(TestDuplicatedLateInliningOutput.java:77) at compiler.inlining.TestDuplicatedLateInliningOutput.main(TestDuplicatedLateInliningOutput.java:46) at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103) at java.base/java.lang.reflect.Method.invoke(Method.java:580) at com.sun.javatest.regtest.agent.MainActionHelper$AgentVMRunnable.run(MainActionHelper.java:333) at java.base/java.lang.Thread.run(Thread.java:1575) ------------- PR Comment: https://git.openjdk.org/jdk/pull/17147#issuecomment-1894076148 From kvn at openjdk.org Tue Jan 16 17:05:21 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 16 Jan 2024 17:05:21 GMT Subject: RFR: JDK-8316756: C2 EA fails with "missing memory path" when encountering unsafe_arraycopy stub call [v3] In-Reply-To: <9FJSR28ZY492lEVQZZ4kfAWby4J8lOyeTAUCHxJwRKM=.2221d572-200e-4391-ad4b-d32c06ffc5d6@github.com> References: <2_AtOPRPpm2wINLTviyCnA3E7DVj3bT5-ErEuQmc660=.ed1b019a-28b1-44a1-ae26-ec31ed5ce13c@github.com> <9FJSR28ZY492lEVQZZ4kfAWby4J8lOyeTAUCHxJwRKM=.2221d572-200e-4391-ad4b-d32c06ffc5d6@github.com> Message-ID: On Tue, 16 Jan 2024 13:59:29 GMT, Tobias Holenstein wrote: >> Okay, it does not always need to be raw memory. But maybe we still want to assert that we have an unsafe arraycopy in this case? If we ever have more valid cases, the assert could easily be adjusted to allow them. >> >> But given how close we are to RDP 2, I suggest to go with this general fix and follow up with an RFE to add that assert if you all agree with that. 
> >> Could we somehow assert here that we have a call with an intended narrow memory input? Directly asserting that this is an unsafe arraycopy might be too specific. But maybe we can add the following sanity check? >> >> ``` >> n->as_CallLeaf()->adr_type()->is_rawptr() >> ``` > > @vnkozlov what do you think? If it is not narrow memory, you will get a `MergeMem` node as the call's memory input, which we put on `mergemem_worklist`; neither it nor its users are processed in this part of the code. We have `assert(mergemem_worklist.contains(m->as_MergeMem()))` here instead. You can add `assert(!n->in(TypeFunc::Memory)->is_MergeMem(), "only narrow memory expected here");` if you want. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17347#discussion_r1453718235 From epeter at openjdk.org Tue Jan 16 17:21:19 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 16 Jan 2024 17:21:19 GMT Subject: RFR: 8323220: Reassociate loop invariants involved in Cmps and Add/Subs In-Reply-To: References: Message-ID: On Thu, 11 Jan 2024 19:59:39 GMT, Joshua Cao wrote: >> // inv1 == (x + inv2) => ( inv1 - inv2 ) == x >> // inv1 == (x - inv2) => ( inv1 + inv2 ) == x >> // inv1 == (inv2 - x) => (-inv1 + inv2 ) == x >> >> >> For example, >> >> >> fn(inv1, inv2) >> while(...) >> x = foobar() >> if inv1 == x + inv2 >> blackhole() >> >> >> We can transform this into >> >> >> fn(inv1, inv2) >> t = inv1 - inv2 >> while(...) >> x = foobar() >> if t == x >> blackhole() >> >> >> Here is an example: https://github.com/openjdk/jdk/blob/b78896b9aafcb15f453eaed6e154a5461581407b/src/java.base/share/classes/java/lang/invoke/LambdaFormEditor.java#L910. LHS `1` and RHS `pos` are both loop invariant >> >> Passes tier1 locally on Linux machine. Passes GHA on my fork.
> > test/hotspot/jtreg/compiler/loopopts/InvariantCodeMotionReassociateCmp.java line 191: > >> 189: @Arguments({Argument.NUMBER_42, Argument.NUMBER_42}) >> 190: @IR(failOn = {IRNode.SUB_I}) >> 191: public void leDontReassociate(int inv1, int inv2) { > > I added DontReassociate tests for `le`, `gt`, and `ge`. For `lt`, C2 generates a second `SUB_I` as part of other transformations. > > IR matching for ADD/SUB is pretty hard in general. They commonly are created as part of other transformations. Any suggestions on how I can test this better is appreciated. You could always use a simple regex with the line number, I guess. But that is a bit nasty too. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17375#discussion_r1453737924 From epeter at openjdk.org Tue Jan 16 17:44:27 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 16 Jan 2024 17:44:27 GMT Subject: RFR: 8323220: Reassociate loop invariants involved in Cmps and Add/Subs In-Reply-To: References: Message-ID: On Thu, 11 Jan 2024 17:41:53 GMT, Joshua Cao wrote: > // inv1 == (x + inv2) => ( inv1 - inv2 ) == x > // inv1 == (x - inv2) => ( inv1 + inv2 ) == x > // inv1 == (inv2 - x) => (-inv1 + inv2 ) == x > > > For example, > > > fn(inv1, inv2) > while(...) > x = foobar() > if inv1 == x + inv2 > blackhole() > > > We can transform this into > > > fn(inv1, inv2) > t = inv1 - inv2 > while(...) > x = foobar() > if t == x > blackhole() > > > Here is an example: https://github.com/openjdk/jdk/blob/b78896b9aafcb15f453eaed6e154a5461581407b/src/java.base/share/classes/java/lang/invoke/LambdaFormEditor.java#L910. LHS `1` and RHS `pos` are both loop invariant > > Passes tier1 locally on Linux machine. Passes GHA on my fork. Nice idea, thanks for the work! Can you also create a regression test that has edge-case values and random values, to check that the result is correct? This would also experimentally rule out overflow issues.
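
A minimal sketch of what such an edge-case and random-value check could look like, written as plain Java rather than with the jtreg IR framework. The class and method names are hypothetical and not part of the patch. It relies only on Java's wrapping `int` arithmetic: the three equality rewrites from the PR description hold for all inputs, while the same rewrite applied to `<` can change the result when `x + inv2` overflows.

```java
import java.util.Random;

// Hypothetical illustration, not code from the PR.
final class ReassociateCmpCheck {
    // inv1 == (x + inv2)  =>  (inv1 - inv2) == x
    static boolean eqAdd(int inv1, int x, int inv2)        { return inv1 == x + inv2; }
    static boolean eqAddReassoc(int inv1, int x, int inv2) { return inv1 - inv2 == x; }

    // inv1 == (x - inv2)  =>  (inv1 + inv2) == x
    static boolean eqSub(int inv1, int x, int inv2)        { return inv1 == x - inv2; }
    static boolean eqSubReassoc(int inv1, int x, int inv2) { return inv1 + inv2 == x; }

    // inv1 == (inv2 - x)  =>  (-inv1 + inv2) == x
    static boolean eqRevSub(int inv1, int x, int inv2)        { return inv1 == inv2 - x; }
    static boolean eqRevSubReassoc(int inv1, int x, int inv2) { return -inv1 + inv2 == x; }

    // The same rewrite for '<' is NOT valid in general because of overflow.
    static boolean ltAdd(int inv1, int x, int inv2)        { return inv1 < x + inv2; }
    static boolean ltAddReassoc(int inv1, int x, int inv2) { return inv1 - inv2 < x; }

    static boolean equalityFormsAgree(int inv1, int x, int inv2) {
        return eqAdd(inv1, x, inv2) == eqAddReassoc(inv1, x, inv2)
            && eqSub(inv1, x, inv2) == eqSubReassoc(inv1, x, inv2)
            && eqRevSub(inv1, x, inv2) == eqRevSubReassoc(inv1, x, inv2);
    }

    // Checks all combinations of edge-case values, then random triples.
    static boolean checkAll(long seed, int randomRounds) {
        int[] edge = {Integer.MIN_VALUE, Integer.MIN_VALUE + 1, -1, 0, 1,
                      Integer.MAX_VALUE - 1, Integer.MAX_VALUE};
        for (int inv1 : edge) {
            for (int x : edge) {
                for (int inv2 : edge) {
                    if (!equalityFormsAgree(inv1, x, inv2)) return false;
                }
            }
        }
        Random r = new Random(seed);
        for (int i = 0; i < randomRounds; i++) {
            if (!equalityFormsAgree(r.nextInt(), r.nextInt(), r.nextInt())) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println("equality forms agree: " + checkAll(42L, 100_000));
        // Overflow counterexample for '<': x + inv2 wraps to Integer.MIN_VALUE.
        System.out.println("lt original:     " + ltAdd(0, Integer.MAX_VALUE, 1));
        System.out.println("lt reassociated: " + ltAddReassoc(0, Integer.MAX_VALUE, 1));
    }
}
```

In a real regression test the comparisons would sit inside a loop with loop-invariant `inv1` and `inv2`, so that C2 compiles and reassociates them, and the compiled result would be compared against an interpreted or reference run.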
src/hotspot/share/opto/loopTransform.cpp line 276: > 274: } > 275: for (DUIterator i = n->outs(); n->has_out(i); i++) { > 276: BoolNode *boolOut = n->out(i)->isa_Bool(); Suggestion: BoolNode* boolOut = n->out(i)->isa_Bool(); src/hotspot/share/opto/loopTransform.cpp line 277: > 275: for (DUIterator i = n->outs(); n->has_out(i); i++) { > 276: BoolNode *boolOut = n->out(i)->isa_Bool(); > 277: if (!boolOut || !(boolOut->_test._test == BoolTest::eq || Suggestion: if (boolOut == nullptr || !(boolOut->_test._test == BoolTest::eq || src/hotspot/share/opto/loopTransform.cpp line 333: > 331: // inv1 == (inv2 - x) => (-inv1 + inv2 ) == x > 332: // > 333: Node* IdealLoopTree::reassociate_add_sub_cmp(Node* n1, int inv1_idx, int inv2_idx, PhaseIdealLoop *phase) { Suggestion: Node* IdealLoopTree::reassociate_add_sub_cmp(Node* n1, int inv1_idx, int inv2_idx, PhaseIdealLoop* phase) { src/hotspot/share/opto/loopTransform.cpp line 345: > 343: bool neg_inv1 = (n1->is_Sub() && !n1->is_Cmp() && inv1_idx == 2) || > 344: (n1->is_Cmp() && inv2_idx == 1 && n2->is_Sub()); > 345: if (n1->is_Sub() && !n1->is_Cmp() && inv1_idx == 1) { Would you mind adding some comments for this logic? ------------- Changes requested by epeter (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/17375#pullrequestreview-1824310859 PR Review Comment: https://git.openjdk.org/jdk/pull/17375#discussion_r1453743203 PR Review Comment: https://git.openjdk.org/jdk/pull/17375#discussion_r1453744396 PR Review Comment: https://git.openjdk.org/jdk/pull/17375#discussion_r1453751200 PR Review Comment: https://git.openjdk.org/jdk/pull/17375#discussion_r1453761329 From duke at openjdk.org Tue Jan 16 17:47:21 2024 From: duke at openjdk.org (Joshua Cao) Date: Tue, 16 Jan 2024 17:47:21 GMT Subject: RFR: 8323220: Reassociate loop invariants involved in Cmps and Add/Subs In-Reply-To: References: Message-ID: On Tue, 16 Jan 2024 17:18:39 GMT, Emanuel Peter wrote: >> test/hotspot/jtreg/compiler/loopopts/InvariantCodeMotionReassociateCmp.java line 191: >> >>> 189: @Arguments({Argument.NUMBER_42, Argument.NUMBER_42}) >>> 190: @IR(failOn = {IRNode.SUB_I}) >>> 191: public void leDontReassociate(int inv1, int inv2) { >> >> I added DontReassociate tests for `le`, `gt`, and `ge`. For `lt`, C2 generates a second `SUB_I` as part of other transformations. >> >> IR matching for ADD/SUB is pretty hard in general. They commonly are created as part of other transformations. Any suggestions on how I can test this better is appreciated. > > You could always use a simple regex with the linenumber I guess. But that is a bit nasty too. Yeah, it would work for this patch. But people working on future unrelated changes may have to change the line number. Seems like more pain than it's worth.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17375#discussion_r1453766140 From chagedorn at openjdk.org Tue Jan 16 17:53:24 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 16 Jan 2024 17:53:24 GMT Subject: RFR: JDK-8316756: C2 EA fails with "missing memory path" when encountering unsafe_arraycopy stub call [v3] In-Reply-To: References: <2_AtOPRPpm2wINLTviyCnA3E7DVj3bT5-ErEuQmc660=.ed1b019a-28b1-44a1-ae26-ec31ed5ce13c@github.com> <9FJSR28ZY492lEVQZZ4kfAWby4J8lOyeTAUCHxJwRKM=.2221d572-200e-4391-ad4b-d32c06ffc5d6@github.com> Message-ID: On Tue, 16 Jan 2024 17:02:55 GMT, Vladimir Kozlov wrote: >>> Could we somehow assert here that we have a call with an intended narrow memory input? Directly asserting that this is an unsafe arraycopy might be too specific. But maybe we can add the following sanity check? >>> >>> ``` >>> n->as_CallLeaf()->adr_type()->is_rawptr() >>> ``` >> >> @vnkozlov what do you think? > > If it is not Narrow memory you will get `MergeMem` node as Call's memory input which we put on`mergemem_worklist` and not processing it or its users in this part of code. We have `assert(mergemem_worklist.contains(m->as_MergeMem())` instead here. > > You can add `assert(!n->in(TypeFunc::Memory)->as_MergeMem(), "only narrow memory expected here");` if you want. Right, we could add this assert as well for expecting a narrow memory input in general. What are your thoughts about explicitly asserting for an unsafe arraycopy when visiting this call? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17347#discussion_r1453772091 From kvn at openjdk.org Tue Jan 16 17:53:21 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 16 Jan 2024 17:53:21 GMT Subject: RFR: 8323795: jcmd Compiler.codecache counts total sizes of used/free In-Reply-To: References: Message-ID: On Tue, 16 Jan 2024 11:08:47 GMT, Yi Yang wrote: > CodeHeap 'non-profiled nmethods': size=118592Kb used=29Kb max_used=29Kb free=118562Kb > bounds [0x00007fbe84622000, 0x00007fbe84892000, 0x00007fbe8b9f2000] > CodeHeap 'profiled nmethods': size=118588Kb used=80Kb max_used=80Kb free=118507Kb > bounds [0x00007fbe7c9f2000, 0x00007fbe7cc62000, 0x00007fbe83dc1000] > CodeHeap 'non-nmethods': size=8580Kb used=1258Kb max_used=1834Kb free=7321Kb > bounds [0x00007fbe83dc1000, 0x00007fbe84031000, 0x00007fbe84622000] > total_blobs=474 nmethods=87 adapters=293 > compilation: enabled > stopped_count=0, restarted_count=0 > full_count=0 > > > It's better to accumulates total size of used/free/size, for example > > CodeHeap 'non-profiled nmethods': size=118592Kb used=29Kb max_used=29Kb free=118562Kb > bounds [0x00007fbe84622000, 0x00007fbe84892000, 0x00007fbe8b9f2000] > CodeHeap 'profiled nmethods': size=118588Kb used=80Kb max_used=80Kb free=118507Kb > bounds [0x00007fbe7c9f2000, 0x00007fbe7cc62000, 0x00007fbe83dc1000] > CodeHeap 'non-nmethods': size=8580Kb used=1258Kb max_used=1834Kb free=7321Kb > bounds [0x00007fbe83dc1000, 0x00007fbe84031000, 0x00007fbe84622000] > total_blobs=474 nmethods=87 adapters=293 > compilation: enabled > stopped_count=0, restarted_count=0 > full_count=0 > Total CodeHeap: > size=245760Kb, used=1367Kb, max used=1943Kb, free=244390Kb Can you do next layout of output?: CodeHeap 'non-nmethods': size=8580Kb used=1258Kb max_used=1834Kb free=7321Kb bounds [0x00007fbe83dc1000, 0x00007fbe84031000, 0x00007fbe84622000] Total CodeHeap: size=245760Kb, used=1367Kb, max used=1943Kb, free=244390Kb total_blobs=474 nmethods=87 adapters=293, 
full_count=0 Compilation: enabled, stopped_count=0, restarted_count=0 ------------- PR Comment: https://git.openjdk.org/jdk/pull/17445#issuecomment-1894225955 From epeter at openjdk.org Tue Jan 16 17:54:19 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 16 Jan 2024 17:54:19 GMT Subject: RFR: 8323220: Reassociate loop invariants involved in Cmps and Add/Subs In-Reply-To: References: Message-ID: <-5JZDSqyvX6C2dOKIogkE4BKSD594q1RGX3POS4HnTQ=.4b4d01ed-de2d-4ea8-abc3-32e4ee53d5f2@github.com> On Tue, 16 Jan 2024 17:44:45 GMT, Joshua Cao wrote: >> You could always use a simple regex with the linenumber I guess. But that is a bit nasty too. > > Yeah, it would work for this patch. But people working on future unrelated changes may have to change the line number. Seems more pain than its worth. Another suboptimal idea: you wrap the add / sub in a method, and then ensure that this method is inlined. It might still keep the annotation of being part of that inner method, and you could use regex to check for it. Or maybe we could also have some sort of relative line offset mechanism in the IR framework, that allows you to specify that you want something that is let's say 7 lines down from the IR rule. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17375#discussion_r1453773799 From duke at openjdk.org Tue Jan 16 17:54:28 2024 From: duke at openjdk.org (Joshua Cao) Date: Tue, 16 Jan 2024 17:54:28 GMT Subject: RFR: 8323820: [MacOS] build failure: non-void function does not return a value Message-ID: I can't reproduce the issue since I don't have the right build setup. But hopefully this change is trivial enough to fix the error. 
------------- Commit messages: - 8323820: [MacOS] build failure: non-void function does not return a value Changes: https://git.openjdk.org/jdk/pull/17449/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=17449&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8323820 Stats: 2 lines in 1 file changed: 1 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/17449.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17449/head:pull/17449 PR: https://git.openjdk.org/jdk/pull/17449 From kvn at openjdk.org Tue Jan 16 17:55:23 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 16 Jan 2024 17:55:23 GMT Subject: RFR: 8322692: ZGC: avoid over-unrolling due to hidden barrier size [v2] In-Reply-To: References: Message-ID: On Mon, 15 Jan 2024 09:01:14 GMT, Roberto Castañeda Lozano wrote: >> This changeset refines the C2 loop unrolling heuristic by including an estimation of the final size of (Generational) ZGC barriers in the loop size computation. These are not exposed in C2's intermediate representation and are thus currently ignored by the heuristic, which can lead to over-unrolling. >> >> #### Testing >> >> - tier1-5, stress test, fuzzing (windows-x64, linux-x64, linux-aarch64, macosx-x64, macosx-aarch64). >> - tier6-9 (windows-x64, linux-x64, linux-aarch64, macosx-x64, macosx-aarch64, ZGC-specific tests only). >> >> #### Performance and code size evaluation >> >> - DaCapo, SPECjvm2008, SPECjbb2015 (linux-x64 with `-XX:+UseZGC -XX:+ZGenerational`). The changeset slightly reduces the size of the C2-generated code (around 0.3% fewer bytes per compiled bytecode for the DaCapo `fop` benchmark) and speeds up SPECjvm2008's `Serial` by around 4%. > > Roberto Castañeda Lozano has updated the pull request incrementally with two additional commits since the last revision: > > - Update copyright years > - Exclude size of slow path from estimation Looks good. ------------- Marked as reviewed by kvn (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/17367#pullrequestreview-1824440011 From kvn at openjdk.org Tue Jan 16 17:57:20 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 16 Jan 2024 17:57:20 GMT Subject: RFR: 8322743: assert(held_monitor_count() == jni_monitor_count()) failed In-Reply-To: References: Message-ID: On Wed, 10 Jan 2024 23:42:21 GMT, Dean Long wrote: >> Corner case with a local (not escaped) object used for synchronization. C2 Escape Analysis thinks that it can eliminate locks for it. In most cases that is true, but not in this case. >> >> >> for (int i = 0; i < 2; ++i) { >> Object o = new Object(); >> synchronized (o) { // monitorenter >> // Trigger OSR compilation >> for (int j = 0; j < 100_000; ++j) { >> >> The test has a nested loop which triggers OSR compilation. The locked object comes from the Interpreter into the compiled OSR code. During parsing C2 creates another non-escaped object and correctly merges both together (with a Phi node) so that the non-escaped object is not scalar replaceable. Because it does not globally escape, EA still removes locks for it and, as a result, also for the merged locked object from the Interpreter, which is the bug. >> >> The fix is to check that the synchronized block does not have any associated escaped objects when EA decides if locks can be eliminated. >> >> Added regression test prepared by @TobiHartmann. Tested tier1-5, xcomp and stress. >> Performance testing shows no difference. > > Nevermind, object fields from the interpreter could have any value, so my idea doesn't work. @dean-long, @iwanowww do you have other questions? Can I get reviewed status ;^) ?
------------- PR Comment: https://git.openjdk.org/jdk/pull/17331#issuecomment-1894232084 From epeter at openjdk.org Tue Jan 16 18:01:23 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 16 Jan 2024 18:01:23 GMT Subject: RFR: 8322694: C1: Handle Constant and IfOp in NullCheckEliminator [v2] In-Reply-To: References: <1vvyuwLRjlWItKwCyighjCSM5SNbO4CSEE59hQtCU24=.b4783e52-328a-4ce3-8c92-7b736cea7546@github.com> Message-ID: On Wed, 3 Jan 2024 13:37:21 GMT, Denghui Dong wrote: >> This patch adds support for Constant and IfOp in NullCheckEliminator to eliminate more null checks. >> >> testing: tier1-4 in progress > > Denghui Dong has updated the pull request incrementally with one additional commit since the last revision: > > update Could there be a regression test for this enhancement? ------------- PR Comment: https://git.openjdk.org/jdk/pull/17191#issuecomment-1894238306 From kvn at openjdk.org Tue Jan 16 18:09:20 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 16 Jan 2024 18:09:20 GMT Subject: RFR: JDK-8316756: C2 EA fails with "missing memory path" when encountering unsafe_arraycopy stub call [v3] In-Reply-To: References: <2_AtOPRPpm2wINLTviyCnA3E7DVj3bT5-ErEuQmc660=.ed1b019a-28b1-44a1-ae26-ec31ed5ce13c@github.com> <9FJSR28ZY492lEVQZZ4kfAWby4J8lOyeTAUCHxJwRKM=.2221d572-200e-4391-ad4b-d32c06ffc5d6@github.com> Message-ID: On Tue, 16 Jan 2024 17:50:10 GMT, Christian Hagedorn wrote: >> If it is not narrow memory you will get a `MergeMem` node as the Call's memory input, which we put on `mergemem_worklist` without processing it or its users in this part of the code. We have `assert(mergemem_worklist.contains(m->as_MergeMem())` instead here. >> >> You can add `assert(!n->in(TypeFunc::Memory)->as_MergeMem(), "only narrow memory expected here");` if you want. > > Right, we could add this assert as well for expecting a narrow memory input in general. What are your thoughts about explicitly asserting for an unsafe arraycopy when visiting this call? No.
There is another case (DTrace runtime call) with narrow memory: [parseHelper.cpp#L54](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/parseHelper.cpp#L54) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17347#discussion_r1453792760 From shade at openjdk.org Tue Jan 16 18:10:23 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 16 Jan 2024 18:10:23 GMT Subject: RFR: 8323820: [MacOS] build failure: non-void function does not return a value In-Reply-To: References: Message-ID: On Tue, 16 Jan 2024 17:48:49 GMT, Joshua Cao wrote: > I can't reproduce the issue since I don't have the right build setup. But hopefully this change is trivial enough to fix the error. Using older clang, maybe? These methods should have been marked `noreturn`, but there is a special block for older clang: https://github.com/openjdk/jdk/blob/1007618f6f97fad0f66e4074b50521bdd853629e/src/hotspot/share/utilities/attributeNoreturn.hpp#L36-L48 src/hotspot/share/opto/castnode.cpp line 470: > 468: return new CastPPNode(c, in, type, dependency, types); > 469: } > 470: ShouldNotReachHere(); Keep the `fatal` with the message, and just add `return`? ------------- PR Review: https://git.openjdk.org/jdk/pull/17449#pullrequestreview-1824492468 PR Review Comment: https://git.openjdk.org/jdk/pull/17449#discussion_r1453793559 From epeter at openjdk.org Tue Jan 16 18:10:27 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 16 Jan 2024 18:10:27 GMT Subject: RFR: 8319822: Use a linear-time algorithm for assert_different_registers() [v6] In-Reply-To: References: Message-ID: On Mon, 27 Nov 2023 12:57:31 GMT, Andrew Haley wrote: >> At the present time, `assert_different_registers()` uses an O(N**2) algorithm in assert_different_registers(). We can utilize RegSet to do it in O(N) time. This would be a useful optimization for all builds with assertions enabled. >> >> In addition, it would be useful to be able to static_assert different registers. 
>> >> Also, I've taken the opportunity to expand the maximum size of a RegSet to 64 on 64-bit platforms. >> >> I also fixed a bug: sometimes `noreg` is passed to `assert_different_registers()`, but it may only be passed once or a spurious assertion is triggered. > > Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: > > Review feedback. Just a code-style review. Question: could there be some sort of regression test for this, with different examples and edge cases? src/hotspot/share/asm/register.hpp line 258: > 256: template > 257: inline constexpr bool different_registers(AbstractRegSet allocated_regs, R first_register) { > 258: return ! allocated_regs.contains(first_register); Suggestion: return !allocated_regs.contains(first_register); src/hotspot/share/asm/register.hpp line 264: > 262: inline constexpr bool different_registers(AbstractRegSet allocated_regs, R first_register, Rx... more_registers) { > 263: if (allocated_regs.contains(first_register)) > 264: return false; Use curly scope brackets ;) src/hotspot/share/asm/register.hpp line 286: > 284: inline void assert_different_registers(R first_register, Rx... more_registers) { > 285: #ifdef ASSERT > 286: if (! different_registers(first_register, more_registers...)) { Suggestion: if (!different_registers(first_register, more_registers...)) { src/hotspot/share/asm/register.hpp line 291: > 289: for (size_t i = 0; i < ARRAY_SIZE(regs) - 1; ++i) { > 290: for (size_t j = i + 1; j < ARRAY_SIZE(regs); ++j) { > 291: assert(! regs[i]->is_valid() || regs[i] != regs[j], Suggestion: assert(!regs[i]->is_valid() || regs[i] != regs[j], ------------- Changes requested by epeter (Reviewer). 
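For illustration, the linear-time check discussed in this review can be sketched outside of HotSpot. This is only a minimal sketch under stated assumptions: `Reg`, `noreg`, `all_different` and the 64-bit mask are hypothetical stand-ins, not the `Register` / `AbstractRegSet` types from the patch. It records each register as one bit, so N registers need N bit tests, and it skips invalid registers so that `noreg` may appear more than once.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a machine register: encoding in [0, 63],
// or -1 for "no register" (illustrative only, not HotSpot's Register).
struct Reg {
  int encoding;
  constexpr bool is_valid() const { return encoding >= 0; }
};

constexpr Reg noreg{-1};

// Base case: no registers left to check.
constexpr bool all_different(std::uint64_t) { return true; }

// Recursive case: one bit test per register, so N registers cost O(N)
// instead of the O(N^2) pairwise comparison.
template <typename... Rest>
constexpr bool all_different(std::uint64_t seen, Reg first, Rest... rest) {
  if (!first.is_valid()) {
    // Invalid registers may repeat; skip them without recording a bit.
    return all_different(seen, rest...);
  }
  const std::uint64_t bit = std::uint64_t{1} << first.encoding;
  if ((seen & bit) != 0) {
    return false;  // Bit already set: duplicate register.
  }
  return all_different(seen | bit, rest...);
}

// Entry point starting from an empty set of seen registers.
template <typename... Regs>
constexpr bool different_registers(Regs... regs) {
  return all_different(std::uint64_t{0}, regs...);
}

// Runtime counterpart in the style of an assert wrapper.
template <typename... Regs>
inline void assert_different_registers(Regs... regs) {
  assert(different_registers(regs...) && "registers must be different");
}

// Because the check is constexpr, it can also be used in static_assert.
static_assert(different_registers(Reg{0}, Reg{1}, Reg{5}), "distinct");
static_assert(!different_registers(Reg{3}, Reg{7}, Reg{3}), "duplicate");
static_assert(different_registers(noreg, Reg{2}, noreg), "noreg may repeat");
```

The sketch needs C++14 or later for the multi-statement `constexpr` functions. A regression test along the lines asked for above could start from the three `static_assert` cases: all distinct, one duplicate, and repeated `noreg`.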
PR Review: https://git.openjdk.org/jdk/pull/16617#pullrequestreview-1824483630 PR Review Comment: https://git.openjdk.org/jdk/pull/16617#discussion_r1453789805 PR Review Comment: https://git.openjdk.org/jdk/pull/16617#discussion_r1453790310 PR Review Comment: https://git.openjdk.org/jdk/pull/16617#discussion_r1453791120 PR Review Comment: https://git.openjdk.org/jdk/pull/16617#discussion_r1453790965 From epeter at openjdk.org Tue Jan 16 18:38:25 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 16 Jan 2024 18:38:25 GMT Subject: RFR: 8320649: C2: Optimize scoped values [v4] In-Reply-To: References: Message-ID: On Thu, 4 Jan 2024 08:12:22 GMT, Roland Westrelin wrote: >> This change implements C2 optimizations for calls to >> ScopedValue.get(). Indeed, in: >> >> >> v1 = scopedValue.get(); >> ... >> v2 = scopedValue.get(); >> >> >> `v2` can be replaced by `v1` and the second call to `get()` can be >> optimized out. That's true whatever is between the 2 calls unless a >> new mapping for `scopedValue` is created in between (when that happens >> no optimizations is performed for the method being compiled). Hoisting >> a `get()` call out of loop for a loop invariant `scopedValue` should >> also be legal in most cases. >> >> `ScopedValue.get()` is implemented in java code as a 2 step process. A >> cache is attached to the current thread object. If the `ScopedValue` >> object is in the cache then the result from `get()` is read from >> there. Otherwise a slow call is performed that also inserts the >> mapping in the cache. The cache itself is lazily allocated. One >> `ScopedValue` can be hashed to 2 different indexes in the cache. On a >> cache probe, both indexes are checked. As a consequence, the process >> of probing the cache is a multi step process (check if the cache is >> present, check first index, check second index if first index >> failed). 
If the cache is populated early on, then when the method that >> calls `ScopedValue.get()` is compiled, profile reports the slow path >> as never taken and only the read from the cache is compiled. >> >> To perform the optimizations, I added 3 new node types to C2: >> >> - the pair >> ScopedValueGetHitsInCacheNode/ScopedValueGetLoadFromCacheNode for >> the cache probe >> >> - a cfg node ScopedValueGetResultNode to help locate the result of the >> `get()` call in the IR graph. >> >> In pseudo code, once the nodes are inserted, the code of a `get()` is: >> >> >> hits_in_the_cache = ScopedValueGetHitsInCache(scopedValue) >> if (hits_in_the_cache) { >> res = ScopedValueGetLoadFromCache(hits_in_the_cache); >> } else { >> res = ..; //slow call possibly inlined. Subgraph can be arbitray complex >> } >> res = ScopedValueGetResult(res) >> >> >> In the snippet: >> >> >> v1 = scopedValue.get(); >> ... >> v2 = scopedValue.get(); >> >> >> Replacing `v2` by `v1` is then done by starting from the >> `ScopedValueGetResult` node for the second `get()` and looking for a >> dominating `ScopedValueGetResult` for the same `ScopedValue` >> object. When one is found, it is used as a replacement. Eliminating >> the second `get()` call is achieved by making >> `ScopedValueGetHitsInCache` always successful if there's a dominating >> `Scoped... > > Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: > > merge fix I just studied the background and hope to look into this in the next days. Personal wishlist: can you add a case where this optimization enables vectorization? Or do your optimizations happen too late for that? test/hotspot/jtreg/compiler/c2/irTests/TestScopedValue.java line 169: > 167: // MyLong long2 = (MyLong)scopedValue.get(); > 168: // return long1.getValue() + long2.getValue(); > 169: // } Are you still working on this? 
------------- PR Review: https://git.openjdk.org/jdk/pull/16966#pullrequestreview-1824562688 PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1453828788 From lucy at openjdk.org Tue Jan 16 18:41:23 2024 From: lucy at openjdk.org (Lutz Schmidt) Date: Tue, 16 Jan 2024 18:41:23 GMT Subject: RFR: 8323820: [MacOS] build failure: non-void function does not return a value In-Reply-To: References: Message-ID: On Tue, 16 Jan 2024 18:07:39 GMT, Aleksey Shipilev wrote: >> I can't reproduce the issue since I don't have the right build setup. But hopefully this change is trivial enough to fix the error. > > src/hotspot/share/opto/castnode.cpp line 470: > >> 468: return new CastPPNode(c, in, type, dependency, types); >> 469: } >> 470: ShouldNotReachHere(); > > Keep the `fatal` with the message, and just add `return`? `return nullptr;` does not create an issue with Xcode 15. Could have been an issue because if Xcode15 recognizes `fatal()` as a `noreturn` function, it could report the `return` as not reachable. `fatal()` vs. `ShouldNotReachHere()` Which one creates more helpful debug information? I'm fine with either one. I will review once decided. Xcode 12 <-> clang 12 Xcode 13 <-> clang 13 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17449#discussion_r1453843957 From duke at openjdk.org Tue Jan 16 18:47:33 2024 From: duke at openjdk.org (Joshua Cao) Date: Tue, 16 Jan 2024 18:47:33 GMT Subject: RFR: 8323820: [MacOS] build failure: non-void function does not return a value [v2] In-Reply-To: References: Message-ID: > I can't reproduce the issue since I don't have the right build setup. But hopefully this change is trivial enough to fix the error. 
Joshua Cao has updated the pull request incrementally with one additional commit since the last revision: ShouldNotReachHere() -> fatal() ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17449/files - new: https://git.openjdk.org/jdk/pull/17449/files/f72aa7b7..95e25b26 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17449&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17449&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/17449.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17449/head:pull/17449 PR: https://git.openjdk.org/jdk/pull/17449 From duke at openjdk.org Tue Jan 16 18:47:35 2024 From: duke at openjdk.org (Joshua Cao) Date: Tue, 16 Jan 2024 18:47:35 GMT Subject: RFR: 8323820: [MacOS] build failure: non-void function does not return a value [v2] In-Reply-To: References: Message-ID: On Tue, 16 Jan 2024 18:37:02 GMT, Lutz Schmidt wrote: >> src/hotspot/share/opto/castnode.cpp line 470: >> >>> 468: return new CastPPNode(c, in, type, dependency, types); >>> 469: } >>> 470: ShouldNotReachHere(); >> >> Keep the `fatal` with the message, and just add `return`? > > `return nullptr;` > does not create an issue with Xcode 15. Could have been an issue because if Xcode15 recognizes `fatal()` as a `noreturn` function, it could report the `return` as not reachable. > > `fatal()` vs. `ShouldNotReachHere()` > Which one creates more helpful debug information? > I'm fine with either one. I will review once decided. > > Xcode 12 <-> clang 12 > Xcode 13 <-> clang 13 Sure, changed it back to `fatal()`. Don't have strong preferences, and not sure what is best practices. 
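To illustrate the build failure discussed above, here is a minimal sketch. It assumes a hypothetical `report_fatal()` that never returns but is not declared `[[noreturn]]`, standing in for `fatal()` on a toolchain where the attribute is not applied; `Kind`, `cast_name()` and the returned strings are likewise made up for the example. A compiler that cannot see that the call never returns may report that the non-void function does not return a value, and the trailing `return nullptr;` avoids that.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>

// Hypothetical error reporter. It never returns, but it is deliberately
// NOT declared [[noreturn]], modelling a toolchain where the attribute
// is unavailable or disabled.
inline void report_fatal(const char* msg) {
  std::fprintf(stderr, "fatal: %s\n", msg);
  std::abort();
}

// Illustrative set of cases; not taken from castnode.cpp.
enum class Kind { Int, Long, Pointer };

// Without the trailing return, a compiler that cannot prove that
// report_fatal() never returns may diagnose "non-void function does not
// return a value" for this function.
inline const char* cast_name(Kind k) {
  if (k == Kind::Int)     return "CastII";
  if (k == Kind::Long)    return "CastLL";
  if (k == Kind::Pointer) return "CastPP";
  report_fatal("unexpected kind");
  return nullptr;  // Unreachable; present only to satisfy the compiler.
}

// Small self-check exercising every handled kind.
inline void check_cast_names() {
  assert(cast_name(Kind::Int) != nullptr);
  assert(cast_name(Kind::Long) != nullptr);
  assert(cast_name(Kind::Pointer) != nullptr);
}
```

Keeping the message-carrying fatal call and adding the `return` after it preserves the diagnostic text while staying independent of whether the compiler honours the no-return annotation.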
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17449#discussion_r1453853918 From mli at openjdk.org Tue Jan 16 18:48:33 2024 From: mli at openjdk.org (Hamlin Li) Date: Tue, 16 Jan 2024 18:48:33 GMT Subject: RFR: JDK-8318228: RISC-V: C2 ConvF2HF Message-ID: Hi, Can you review the patch to add ConvHF2F intrinsic to JDK for riscv? Thanks! ## Test ### Functionality #### hotspot tests (running in progress...) test/hotspot/jtreg/compiler/intrinsics/ test/hotspot/jtreg/compiler/c2/irTests #### jdk tests test/jdk/java/lang/Float/Binary16Conversion*.java ### Performance tested on licheepi. #### with UseZfh enabled Benchmark (size) Mode Cnt Score Error Units Fp16ConversionBenchmark.floatToFloat16 2048 avgt 2 4170.549 ns/op Fp16ConversionBenchmark.floatToFloat16Memory 2048 avgt 2 21.492 ns/op #### with UseZfh disabled (i.e. disable the intrinsic) Benchmark (size) Mode Cnt Score Error Units Fp16ConversionBenchmark.floatToFloat16 2048 avgt 2 25036.647 ns/op Fp16ConversionBenchmark.floatToFloat16Memory 2048 avgt 2 27.326 ns/op ------------- Commit messages: - clean code - Initial commit: float to float16 Changes: https://git.openjdk.org/jdk/pull/17450/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=17450&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8318228 Stats: 57 lines in 3 files changed: 57 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/17450.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17450/head:pull/17450 PR: https://git.openjdk.org/jdk/pull/17450 From dnsimon at openjdk.org Tue Jan 16 19:30:48 2024 From: dnsimon at openjdk.org (Doug Simon) Date: Tue, 16 Jan 2024 19:30:48 GMT Subject: RFR: 8323616: [JVMCI] TestInvalidJVMCIOption.java fails intermittently with NPE [v4] In-Reply-To: References: <8KAT1E7MkSLCl1JrnEBjDNvgmaCn_ryBbGVMtY0uiTQ=.e6bc008c-5338-4361-9042-20f1ba5e65cd@github.com> Message-ID: On Tue, 16 Jan 2024 12:28:31 GMT, Doug Simon wrote: >> This PR changes callSystemExit to call 
`vm_exit_during_initialization()` instead of `System.exit` if the module system has not been initialized. This avoids an NPE in the `System.exit` code path where it is assumed that the `Class.module` field is non-null for `java.lang.Shutdown`. > > Doug Simon has updated the pull request incrementally with one additional commit since the last revision: > > use substring instead of equality test for expected error message Thanks for the reviews! ------------- PR Comment: https://git.openjdk.org/jdk/pull/17397#issuecomment-1894379007 From dnsimon at openjdk.org Tue Jan 16 19:35:13 2024 From: dnsimon at openjdk.org (Doug Simon) Date: Tue, 16 Jan 2024 19:35:13 GMT Subject: Integrated: 8323616: [JVMCI] TestInvalidJVMCIOption.java fails intermittently with NPE In-Reply-To: <8KAT1E7MkSLCl1JrnEBjDNvgmaCn_ryBbGVMtY0uiTQ=.e6bc008c-5338-4361-9042-20f1ba5e65cd@github.com> References: <8KAT1E7MkSLCl1JrnEBjDNvgmaCn_ryBbGVMtY0uiTQ=.e6bc008c-5338-4361-9042-20f1ba5e65cd@github.com> Message-ID: On Fri, 12 Jan 2024 14:25:29 GMT, Doug Simon wrote: > This PR changes callSystemExit to call `vm_exit_during_initialization()` instead of `System.exit` if the module system has not been initialized. This avoids an NPE in the `System.exit` code path where it is assumed that the `Class.module` field is non-null for `java.lang.Shutdown`. This pull request has now been integrated. 
Changeset: 19c9388c Author: Doug Simon URL: https://git.openjdk.org/jdk/commit/19c9388c2001b7b3d21624e2dd4ab4fdd8821e2f Stats: 13 lines in 2 files changed: 11 ins; 0 del; 2 mod 8323616: [JVMCI] TestInvalidJVMCIOption.java fails intermittently with NPE Reviewed-by: thartmann, never ------------- PR: https://git.openjdk.org/jdk/pull/17397 From shade at openjdk.org Tue Jan 16 19:53:34 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 16 Jan 2024 19:53:34 GMT Subject: RFR: 8323820: [MacOS] build failure: non-void function does not return a value [v2] In-Reply-To: References: Message-ID: <9B1_CD5VFOIHOCWLpQMM8iOvASWxdwsd7tmnPWjBgOM=.0983d429-ecfc-4e65-97b8-f2f62eb4fc66@github.com> On Tue, 16 Jan 2024 18:47:33 GMT, Joshua Cao wrote: >> I can't reproduce the issue since I don't have the right build setup. But hopefully this change is trivial enough to fix the error. > > Joshua Cao has updated the pull request incrementally with one additional commit since the last revision: > > ShouldNotReachHere() -> fatal() Marked as reviewed by shade (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/17449#pullrequestreview-1824820821 From lucy at openjdk.org Tue Jan 16 21:20:51 2024 From: lucy at openjdk.org (Lutz Schmidt) Date: Tue, 16 Jan 2024 21:20:51 GMT Subject: RFR: 8323820: [MacOS] build failure: non-void function does not return a value [v2] In-Reply-To: References: Message-ID: <28NxgieBGpFSIqj4R0cZN7gUOTElrDe6lAb4KNBYiNw=.4ac094a4-3973-4302-b222-de5804fc7b47@github.com> On Tue, 16 Jan 2024 18:47:33 GMT, Joshua Cao wrote: >> I can't reproduce the issue since I don't have the right build setup. But hopefully this change is trivial enough to fix the error. > > Joshua Cao has updated the pull request incrementally with one additional commit since the last revision: > > ShouldNotReachHere() -> fatal() Looks good now. ------------- Marked as reviewed by lucy (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/17449#pullrequestreview-1825103551 From duke at openjdk.org Tue Jan 16 22:03:59 2024 From: duke at openjdk.org (Joshua Cao) Date: Tue, 16 Jan 2024 22:03:59 GMT Subject: Integrated: 8323820: [MacOS] build failure: non-void function does not return a value In-Reply-To: References: Message-ID: On Tue, 16 Jan 2024 17:48:49 GMT, Joshua Cao wrote: > I can't reproduce the issue since I don't have the right build setup. But hopefully this change is trivial enough to fix the error. This pull request has now been integrated. Changeset: b058063c Author: Joshua Cao Committer: Lutz Schmidt URL: https://git.openjdk.org/jdk/commit/b058063c40154ea008278077e2e6298ed6765426 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod 8323820: [MacOS] build failure: non-void function does not return a value Reviewed-by: shade, lucy ------------- PR: https://git.openjdk.org/jdk/pull/17449 From kvn at openjdk.org Tue Jan 16 22:31:30 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 16 Jan 2024 22:31:30 GMT Subject: RFR: JDK-8320448 Accelerate IndexOf using AVX2 [v7] In-Reply-To: References: Message-ID: On Thu, 11 Jan 2024 23:06:32 GMT, Scott Gibbons wrote: >> Re-write the IndexOf code without the use of the pcmpestri instruction, only using AVX2 instructions. This change accelerates String.IndexOf on average 1.3x for AVX2. 
The benchmark numbers:
>>
>> Benchmark                                Score        Latest       Latest/Score
>> StringIndexOf.advancedWithMediumSub      343.573      317.934      0.925375393x
>> StringIndexOf.advancedWithShortSub1      1039.081     1053.96      1.014319384x
>> StringIndexOf.advancedWithShortSub2      55.828       110.541      1.980027943x
>> StringIndexOf.constantPattern            9.361        11.906       1.271872663x
>> StringIndexOf.searchCharLongSuccess      4.216        4.218        1.000474383x
>> StringIndexOf.searchCharMediumSuccess    3.133        3.216        1.02649218x
>> StringIndexOf.searchCharShortSuccess     3.76         3.761        1.000265957x
>> StringIndexOf.success                    9.186        9.713        1.057369911x
>> StringIndexOf.successBig                 14.341       46.343       3.231504079x
>> StringIndexOfChar.latin1_AVX2_String     6220.918     12154.52     1.953814533x
>> StringIndexOfChar.latin1_AVX2_char       5503.556     5540.044     1.006629895x
>> StringIndexOfChar.latin1_SSE4_String     6978.854     6818.689     0.977049957x
>> StringIndexOfChar.latin1_SSE4_char       5657.499     5474.624     0.967675646x
>> StringIndexOfChar.latin1_Short_String    7132.541     6863.359     0.962260014x
>> StringIndexOfChar.latin1_Short_char      16013.389    16162.437    1.009307711x
>> StringIndexOfChar.latin1_mixed_String    7386.123     14771.622    1.999915517x
>> StringIndexOfChar.latin1_mixed_char      9901.671     9782.245     0.987938803
>
> Scott Gibbons has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 22 commits:
>
> - Merge branch 'openjdk:master' into indexof
> - Merge branch 'openjdk:master' into indexof
> - Addressing review comments.
> - Fix for JDK-8321599
> - Support UU IndexOf
> - Only use optimization when EnableX86ECoreOpts is true
> - Fix whitespace
> - Merge branch 'openjdk:master' into indexof
> - Comments; added exhaustive-ish test
> - Subtracting 0x10 twice.
> - ...
and 12 more: https://git.openjdk.org/jdk/compare/8e12053e...3e58d0c2 src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 4111: > 4109: if ((UseAVX == 2) && EnableX86ECoreOpts && VM_Version::supports_avx2()) { > 4110: StubRoutines::_string_indexof = generate_string_indexof(); > 4111: } What is the motivation for this extensive new code only for AVX2? 30% is nice (for some cases) but it is enabled only for AVX2 and not for AVX512, which all modern x86 CPUs have, so the code will not be used. Or is it a typo? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1454139710 From kvn at openjdk.org Tue Jan 16 23:04:51 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 16 Jan 2024 23:04:51 GMT Subject: RFR: 8318446: C2: implement StoreNode::Ideal_merge_stores In-Reply-To: <_EOrNbIYbl3WazGH0hgAGGfArkWG_gVCyfJR6jD1gdA=.8fa6982c-f3b6-4372-94bd-77c3f2738a4a@github.com> References: <_EOrNbIYbl3WazGH0hgAGGfArkWG_gVCyfJR6jD1gdA=.8fa6982c-f3b6-4372-94bd-77c3f2738a4a@github.com> Message-ID: On Tue, 16 Jan 2024 15:09:27 GMT, Emanuel Peter wrote: >> @eme64 I have tried your patch, it seems that there are some limitations: >> >> - The stores are not merged if the order is not right (e.g. `a[2] = 2; a[1] = 1;`) >> - The stores are not merged if they are floating point constants. >> - The stores are not merged if they are consecutive fields in an object. E.g.: >> >> >> class Point { >> int x; int y; >> } >> >> p.x = 1; >> p.y = 2; // Cannot merge into mov [p.x], 0x200000001 >> >> >> Regarding the final point, fields may be of different types with different sizes and there may be padding between them. This means that for load-store sequence merges, I think SLP cannot handle these cases. >> >> Thanks. > > @merykitty @cl4es @RogerRiggs @vnkozlov I wonder if you think that the approach of this PR is good, and if you have any suggestions about it? > > - Is a separate phase ok?
> - Is this PR in a sweet-spot that reaches the goals of the library-folks, but is not too complex? > - Would you prefer a more general solution, like a straight-line SLP algorithm, that can merge (even vectorize) any load / store sequences, even merge accesses with different element sizes and with gaps/padding? @eme64 I would suggest changing the subject of the RFE and this PR to something like: "C2: optimize stores into primitive arrays by combining values into larger store" It will correctly describe the scope of the changes. In the future we may have a separate RFE for object fields - I don't think we should do it in this RFE. For the performance results it would be nice to have only one table with an additional column with the % difference. It is hard to see the difference now. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16245#issuecomment-1894659376 From kvn at openjdk.org Tue Jan 16 23:10:52 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 16 Jan 2024 23:10:52 GMT Subject: RFR: 8318446: C2: implement StoreNode::Ideal_merge_stores In-Reply-To: References: Message-ID: On Wed, 18 Oct 2023 13:04:35 GMT, Emanuel Peter wrote: > This is a feature requested by @RogerRiggs and @cl4es . > > **Idea** > > Merging multiple consecutive small stores (e.g. 8 byte stores) into larger stores (e.g. one long store) can lead to speedup. Recently, @cl4es and @RogerRiggs had to review a few PRs where people would try to get speedups by using Unsafe (e.g. `Unsafe.putLongUnaligned`), or ByteArrayLittleEndian (e.g. `ByteArrayLittleEndian.setLong`). They have asked if we can do such an optimization in C2, rather than in the Java library code, or even user code. > > This patch here supports a few simple use-cases, like these: > > Merge consecutive array stores, with constants.
We can combine the separate constants into a larger constant: > https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/test/hotspot/jtreg/compiler/c2/TestMergeStores.java#L383-L395 > > Merge consecutive array stores, with a variable that was split (using shifts). We can essentially undo the splitting (i.e. shifting and truncation), and directly store the variable: > https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/test/hotspot/jtreg/compiler/c2/TestMergeStores.java#L444-L456 > > The idea is that this would allow the introduction of a very simple API, without any "heavy" dependencies (Unsafe or ByteArrayLittleEndian): > > https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/test/hotspot/jtreg/compiler/c2/TestMergeStores.java#L327-L338 > https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/test/hotspot/jtreg/compiler/c2/TestMergeStores.java#L467-L472 > > **Details** > > This draft currently implements the optimization in an additional special IGVN phase: > https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/src/hotspot/share/opto/compile.cpp#L2479-L2485 > > We first collect all `StoreB|C|I`, and put them in the IGVN worklist (see `Compile::gather_nodes_for_merge_stores`). During IGVN, we call `StoreNode::Ideal_merge_stores` at the end (after all other optimizations) of `StoreNode::Ideal`. We essentially try to establish a chain of mergable stores: > > https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/src/hotspot/share/opto/memnode.cpp#L2802-L2806 > > Mergable stores must have the same Opcode (implies they have the same element type and hence size). Further, mergable stores must have the same control (or be separated by only a RangeCheck). Further, they must either both store constants, or adjacent segments of a larger value ... About changes. May be you can use something similar to ClearArrayNode. 
Collect all stores into one node and corresponding Mach (machine) nodes will implement it using available instructions instead of C2 decide the size of combined store. One drawback for these changes I see that you may use a lot more registers to keep all values. For constants you need to keep in mind the order of memory (little or big endian). ------------- PR Comment: https://git.openjdk.org/jdk/pull/16245#issuecomment-1894665744 From kvn at openjdk.org Tue Jan 16 23:16:00 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 16 Jan 2024 23:16:00 GMT Subject: RFR: 8323820: [MacOS] build failure: non-void function does not return a value [v2] In-Reply-To: References: Message-ID: <6QxyDPca19Pgb2NJ2R1WfigWnCyI-kecxdJ9YuK5OYw=.a1a92317-1519-46bf-9d81-1f94fe361bd4@github.com> On Tue, 16 Jan 2024 18:47:33 GMT, Joshua Cao wrote: >> I can't reproduce the issue since I don't have the right build setup. But hopefully this change is trivial enough to fix the error. > > Joshua Cao has updated the pull request incrementally with one additional commit since the last revision: > > ShouldNotReachHere() -> fatal() Nice! I missed to review this PR but I fully support it. ------------- PR Comment: https://git.openjdk.org/jdk/pull/17449#issuecomment-1894669964 From sgibbons at openjdk.org Tue Jan 16 23:53:53 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Tue, 16 Jan 2024 23:53:53 GMT Subject: RFR: JDK-8320448 Accelerate IndexOf using AVX2 [v7] In-Reply-To: References: Message-ID: On Tue, 16 Jan 2024 22:27:52 GMT, Vladimir Kozlov wrote: >> Scott Gibbons has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 22 commits: >> >> - Merge branch 'openjdk:master' into indexof >> - Merge branch 'openjdk:master' into indexof >> - Addressing review comments. 
>> - Fix for JDK-8321599
>> - Support UU IndexOf
>> - Only use optimization when EnableX86ECoreOpts is true
>> - Fix whitespace
>> - Merge branch 'openjdk:master' into indexof
>> - Comments; added exhaustive-ish test
>> - Subtracting 0x10 twice.
>> - ... and 12 more: https://git.openjdk.org/jdk/compare/8e12053e...3e58d0c2
>
> src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 4111:
>
>> 4109: if ((UseAVX == 2) && EnableX86ECoreOpts && VM_Version::supports_avx2()) {
>> 4110: StubRoutines::_string_indexof = generate_string_indexof();
>> 4111: }
>
> What motivation for this extensive new code only for avx2? 30% is nice (for some cases) but it is enabled only for AVX2 and not for avx512 which all modern x86 CPUs have so the code will not be used.
>
> Or it is typo?

This is acceleration for AVX2, replacing the pcmpestri instruction which is microcoded on E-cores and causes significant performance impact. I am working on a pared-down implementation and should update this PR in a couple of days.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1454217437

From kvn at openjdk.org Wed Jan 17 00:12:49 2024
From: kvn at openjdk.org (Vladimir Kozlov)
Date: Wed, 17 Jan 2024 00:12:49 GMT
Subject: RFR: 8322996: BoxLockNode creation fails with assert(reg < CHUNK_SIZE) failed: sanity [v3]
In-Reply-To:
References: <66zey4Kt56pbcieRZ0AjBLemzwsAQcHCdDG1nlK-P1c=.14dc02d6-bef8-4239-9f39-02456756b7ea@github.com>
Message-ID:

On Mon, 15 Jan 2024 09:32:15 GMT, Daniel Lundén wrote:

>> I think your current code is correct.
>>
>> On x64 `sync_stack_slots` defined as 2 (takes 2 bits in regmask) in `x86_64.ad` and as 1 in `x86_32.ad`. On most 64 bit platforms it is also 2 slots, from what I see. But we can't guarantee that some platforms will not have bigger value. We can't use last odd bit on 64 bit platform in regmask - it is taking anyway already by "infinite stack flag".
>
> Yes, that is my intuition as well.
Therefore, I'm left wondering if the [construction of `_inmask`](https://github.com/dlunde/jdk/blob/9ab6e561780aee0f2cc2f06cd40ec487d60fe39c/src/hotspot/share/opto/locknode.cpp#L51) in the `BoxLockNode` constructor is incorrect, as it always just sets a single bit in the mask (no matter the value of `sync_stack_slots()`). Should we perhaps change it to instead set the range [ reg, reg + sync_stack_slots() ) in `_inmask`? It is fine. Monitors use fixed stack slots which are not available to Register allocator: [compile.hpp#L315](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/compile.hpp#L315). Fixed stack slots reserved based on number monitors and etc. We don't do regmask operations on them. `BoxLockNode::reg(box)` is only accessed in format output in `JVMState::format()` and when we generate debug info in `PhaseOutput::Process_OopMap_Node()`. In general, in 64-bit VM long, double, oop/pointers values take 2 slots but their value is stored only in first slot (since slot's size is 64 bit). Don't ask why. It is convention in VM. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17370#discussion_r1454236214 From kvn at openjdk.org Wed Jan 17 00:15:52 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 17 Jan 2024 00:15:52 GMT Subject: RFR: JDK-8320448 Accelerate IndexOf using AVX2 [v7] In-Reply-To: References: Message-ID: On Tue, 16 Jan 2024 23:51:15 GMT, Scott Gibbons wrote: >> src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 4111: >> >>> 4109: if ((UseAVX == 2) && EnableX86ECoreOpts && VM_Version::supports_avx2()) { >>> 4110: StubRoutines::_string_indexof = generate_string_indexof(); >>> 4111: } >> >> What motivation for this extensive new code only for avx2? 30% is nice (for some cases) but it is enabled only for AVX2 and not for avx512 which all modern x86 CPUs have so the code will not be used. >> >> Or it is typo? 
> > This is acceleration for AVX2, replacing the pcmpestri instruction which is microcoded on E-cores and causes significant performance impact. I am working on a pared-down implementation and should update this PR in a couple of days. Thank you for explanation. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1454238988 From yyang at openjdk.org Wed Jan 17 03:02:27 2024 From: yyang at openjdk.org (Yi Yang) Date: Wed, 17 Jan 2024 03:02:27 GMT Subject: RFR: 8323795: jcmd Compiler.codecache counts total sizes of used/free [v2] In-Reply-To: References: Message-ID: > CodeHeap 'non-profiled nmethods': size=118592Kb used=29Kb max_used=29Kb free=118562Kb > bounds [0x00007fbe84622000, 0x00007fbe84892000, 0x00007fbe8b9f2000] > CodeHeap 'profiled nmethods': size=118588Kb used=80Kb max_used=80Kb free=118507Kb > bounds [0x00007fbe7c9f2000, 0x00007fbe7cc62000, 0x00007fbe83dc1000] > CodeHeap 'non-nmethods': size=8580Kb used=1258Kb max_used=1834Kb free=7321Kb > bounds [0x00007fbe83dc1000, 0x00007fbe84031000, 0x00007fbe84622000] > total_blobs=474 nmethods=87 adapters=293 > compilation: enabled > stopped_count=0, restarted_count=0 > full_count=0 > > > It's better to accumulates total size of used/free/size, for example > > -SegmentedCodeCache > CodeCache: size=245760Kb used=1366Kb max_used=1943Kb free=244393Kb > bounds [0x00007fdcc89f2000, 0x00007fdcc8c62000, 0x00007fdcd79f2000] > total_blobs=474, nmethods=87, adapters=293 > stopped_count=0, restarted_count=0, full_count=0 > compilation=enabled > > > > +SegmentedCodeCache > CodeHeap 'non-profiled nmethods': size=118592Kb used=29Kb max_used=29Kb free=118562Kb > bounds [0x00007f89c8622000, 0x00007f89c8892000, 0x00007f89cf9f2000] > CodeHeap 'profiled nmethods': size=118588Kb used=80Kb max_used=80Kb free=118507Kb > bounds [0x00007f89c09f2000, 0x00007f89c0c62000, 0x00007f89c7dc1000] > CodeHeap 'non-nmethods': size=8580Kb used=1258Kb max_used=1834Kb free=7321Kb > bounds [0x00007f89c7dc1000, 
0x00007f89c8031000, 0x00007f89c8622000] > CodeCache: size=245760Kb, used=1367Kb, max_used=1943Kb, free=244390Kb > total_blobs=474, nmethods=87, adapters=293 > stopped_count=0, restarted_count=0, full_count=0 > compilation=enabled Yi Yang has updated the pull request incrementally with one additional commit since the last revision: new output && fix test ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17445/files - new: https://git.openjdk.org/jdk/pull/17445/files/772be154..a9939a85 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17445&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17445&range=00-01 Stats: 54 lines in 2 files changed: 27 ins; 8 del; 19 mod Patch: https://git.openjdk.org/jdk/pull/17445.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17445/head:pull/17445 PR: https://git.openjdk.org/jdk/pull/17445 From thartmann at openjdk.org Wed Jan 17 06:47:50 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 17 Jan 2024 06:47:50 GMT Subject: RFR: JDK-8316756: C2 EA fails with "missing memory path" when encountering unsafe_arraycopy stub call [v3] In-Reply-To: References: <2_AtOPRPpm2wINLTviyCnA3E7DVj3bT5-ErEuQmc660=.ed1b019a-28b1-44a1-ae26-ec31ed5ce13c@github.com> <9FJSR28ZY492lEVQZZ4kfAWby4J8lOyeTAUCHxJwRKM=.2221d572-200e-4391-ad4b-d32c06ffc5d6@github.com> Message-ID: On Tue, 16 Jan 2024 18:07:00 GMT, Vladimir Kozlov wrote: >> Right, we could add this assert as well for expecting a narrow memory input in general. What are your thoughts about explicitly asserting for an unsafe arraycopy when visiting this call? > > No. There is another case (DTrace runtime call) with narrow memory: [parseHelper.cpp#L54](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/parseHelper.cpp#L54) Since we need to integrate this P3 until tomorrow (Thursday), I'd suggest to integrate as-is and add the idea of adding an assert to the follow-up RFE. 
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/17347#discussion_r1454730476

From rrich at openjdk.org Wed Jan 17 06:53:00 2024
From: rrich at openjdk.org (Richard Reingruber)
Date: Wed, 17 Jan 2024 06:53:00 GMT
Subject: Integrated: 8290965: PPC64: Implement post-call NOPs
In-Reply-To:
References:
Message-ID:

On Wed, 20 Dec 2023 19:56:28 GMT, Richard Reingruber wrote:

> #### Implementation of post call nops (PCNs) on ppc64.
>
> Depends on https://github.com/openjdk/jdk/pull/17150
>
> About post call nops:
>
> - instruction(s) at return addresses of compiled java calls
> - emitted iff vm continuations are enabled to support virtual threads
> - encode data that can be used to find the corresponding CodeBlob and oop map faster
> - mt-safe patchable to trigger deoptimization
>
> Background:
>
> - Frames in continuation StackChunks are not visited if their compiled method is made not entrant (in contrast to frames on stack).
>   Instead all PCNs of the compiled method are patched to trigger deoptimization when control returns to such frames.
> - With vm continuations, stacks are walked and inspected more frequently. This requires lookup of metadata like frame size and oop maps. As an optimization the offset of the CodeBlob to the PCN and the oop map slot are encoded as data in the PCN.
>
> Post call nops on ppc64
>
> - 1 instruction, i.e. 4 bytes (either CMPI or CMPLI[1])
>   x86_64: 1 instruction, 8 bytes
>   aarch64: 3 instructions, 12 bytes
>   [1] 3.1.10 Fixed Point Compare Instructions in Power ISA 3.1B
>   https://openpowerfoundation.org/specifications/isa/
>
> - 26 bits data payload
>   x86_64: 32 bits; aarch64: 32 bits
> - 9 bits dedicated to oop map slot. With 8 bits there were cases with SPECjvm2008 where the slot could not be encoded (on ppc64 and x86_64).
>   x86_64: 8 bits; aarch64: 8 bits
> - 17 bits dedicated to cb offset. Effectively 19 bits due to instruction alignment.
> x86_64: 24 bits; aarch64: 24 bits > - Also used when reconstructing the back chain after thawing continuation frames (see `Thaw::patch_caller_links`) > > - Refactored frame constructors to make use of fast CodeBlob lookup based on PCNs. > The fast lookup may only be used if the pc is known to be in the code cache because `CodeCache::find_blob_fast` can yield wrong results if it finds instructions outside the code cache that look just like PCNs. Callers of the frame class constructors need to pass `frame::kind::native` in that case to avoid errors. Other platforms don't make this explicit which is a problem in my eyes. Picking the wrong constructor can cause errors when porting and in future development. > > - Currently only the PCNs in nmethods are initialized. Therefore we don't even try to make a fast lookup based on PCNs if we know the CodeBlob is, e.g., a RuntimeStub. To achieve this we call the frame constructor passing `frame::kind::code_blob`. > > #### Statistics > > > | SpecJVM2008... This pull request has now been integrated. 
Changeset: de97c0eb Author: Richard Reingruber URL: https://git.openjdk.org/jdk/commit/de97c0eb4bbeece0dfab3065c260c7f5434060a7 Stats: 132 lines in 13 files changed: 96 ins; 0 del; 36 mod 8290965: PPC64: Implement post-call NOPs Reviewed-by: mdoerr ------------- PR: https://git.openjdk.org/jdk/pull/17171 From thartmann at openjdk.org Wed Jan 17 06:58:51 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 17 Jan 2024 06:58:51 GMT Subject: RFR: 8323795: jcmd Compiler.codecache counts total sizes of used/free [v2] In-Reply-To: References: Message-ID: On Wed, 17 Jan 2024 03:02:27 GMT, Yi Yang wrote: >> CodeHeap 'non-profiled nmethods': size=118592Kb used=29Kb max_used=29Kb free=118562Kb >> bounds [0x00007fbe84622000, 0x00007fbe84892000, 0x00007fbe8b9f2000] >> CodeHeap 'profiled nmethods': size=118588Kb used=80Kb max_used=80Kb free=118507Kb >> bounds [0x00007fbe7c9f2000, 0x00007fbe7cc62000, 0x00007fbe83dc1000] >> CodeHeap 'non-nmethods': size=8580Kb used=1258Kb max_used=1834Kb free=7321Kb >> bounds [0x00007fbe83dc1000, 0x00007fbe84031000, 0x00007fbe84622000] >> total_blobs=474 nmethods=87 adapters=293 >> compilation: enabled >> stopped_count=0, restarted_count=0 >> full_count=0 >> >> >> It's better to accumulates total size of used/free/size, for example >> >> -SegmentedCodeCache >> CodeCache: size=245760Kb used=1366Kb max_used=1943Kb free=244393Kb >> bounds [0x00007fdcc89f2000, 0x00007fdcc8c62000, 0x00007fdcd79f2000] >> total_blobs=474, nmethods=87, adapters=293 >> stopped_count=0, restarted_count=0, full_count=0 >> compilation=enabled >> >> >> >> +SegmentedCodeCache >> CodeHeap 'non-profiled nmethods': size=118592Kb used=29Kb max_used=29Kb free=118562Kb >> bounds [0x00007f89c8622000, 0x00007f89c8892000, 0x00007f89cf9f2000] >> CodeHeap 'profiled nmethods': size=118588Kb used=80Kb max_used=80Kb free=118507Kb >> bounds [0x00007f89c09f2000, 0x00007f89c0c62000, 0x00007f89c7dc1000] >> CodeHeap 'non-nmethods': size=8580Kb used=1258Kb max_used=1834Kb free=7321Kb 
>> bounds [0x00007f89c7dc1000, 0x00007f89c8031000, 0x00007f89c8622000] >> CodeCache: size=245760Kb, used=1367Kb, max_used=1943Kb, free=244390Kb >> total_blobs=474, nmethods=87, adapters=293 >> stopped_count=0, restarted_count=0, full_count=0 >> compilation=enabled > > Yi Yang has updated the pull request incrementally with one additional commit since the last revision: > > new output && fix test The title is confusing. Should it be something like "jcmd Compiler.codecache should print total size of code cache"? src/hotspot/share/code/codeCache.cpp line 1802: > 1800: "enabled" : Arguments::mode() == Arguments::_int ? > 1801: "disabled (interpreter mode)" : > 1802: "disabled (not enough contiguous free space left)"); Why did you change the order of the `compilation=` and the `stopped_count=` output? ------------- PR Comment: https://git.openjdk.org/jdk/pull/17445#issuecomment-1895153379 PR Review Comment: https://git.openjdk.org/jdk/pull/17445#discussion_r1454744516 From chagedorn at openjdk.org Wed Jan 17 07:14:51 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 17 Jan 2024 07:14:51 GMT Subject: RFR: JDK-8316756: C2 EA fails with "missing memory path" when encountering unsafe_arraycopy stub call [v3] In-Reply-To: References: <2_AtOPRPpm2wINLTviyCnA3E7DVj3bT5-ErEuQmc660=.ed1b019a-28b1-44a1-ae26-ec31ed5ce13c@github.com> <9FJSR28ZY492lEVQZZ4kfAWby4J8lOyeTAUCHxJwRKM=.2221d572-200e-4391-ad4b-d32c06ffc5d6@github.com> Message-ID: <5NE4rGgHDQFus2JJ8wQLo0WynBA2avmXnOB3Hq3Iu4s=.1a699e77-bb99-4117-8ba5-dbfde085e0db@github.com> On Wed, 17 Jan 2024 06:45:21 GMT, Tobias Hartmann wrote: >> No. There is another case (DTrace runtime call) with narrow memory: [parseHelper.cpp#L54](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/parseHelper.cpp#L54) > > Since we need to integrate this P3 until tomorrow (Thursday), I'd suggest to integrate as-is and add the idea of adding an assert to the follow-up RFE. > No. 
There is another case (DTrace runtime call) with narrow memory: [parseHelper.cpp#L54](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/parseHelper.cpp#L54) That's right. We would need to assert this case as well. But as Tobias suggested, let's move on without any additional assertions for now. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17347#discussion_r1454776654 From thartmann at openjdk.org Wed Jan 17 07:22:52 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 17 Jan 2024 07:22:52 GMT Subject: RFR: 8322743: assert(held_monitor_count() == jni_monitor_count()) failed In-Reply-To: References: Message-ID: On Tue, 9 Jan 2024 22:57:27 GMT, Vladimir Kozlov wrote: > Corner case with a local (not escaped) object used for synchronization. C2 Escape Analysis thinks that it can eliminate locks for it. In most cases it is true but not in this case. > > > for (int i = 0; i < 2; ++i) { > Object o = new Object(); > synchronized (o) { // monitorenter > // Trigger OSR compilation > for (int j = 0; j < 100_000; ++j) { > > The test has nested loop which trigger OSR compilation. The locked object comes from Interpreter into compiled OSR code. During parsing C2 creates an other non escaped object and correctly merge both together (with Phi node) so that non escaped object is not scalar replaceable. Because it does not globally escapes EA still removes locks for it and, as result, also for merged locked object from Interpreter which is the bug. > > The fix is to check that synchronized block does not have any associated escaped objects when EA decides if locks can be eliminated. > > Added regression test prepared by @TobiHartmann. Tested tier1-5, xcomp and stress. > Performance testing show no difference. test/hotspot/jtreg/compiler/escapeAnalysis/TestLocksInOSR.java line 1: > 1: /* Copyright (c) 2024, Oracle and/or its affiliates. All rights reserved. Suggestion: /* * Copyright (c) 2024, Oracle and/or its affiliates. 
All rights reserved.

test/hotspot/jtreg/compiler/escapeAnalysis/TestLocksInOSR.java line 26:

> 24: * @test
> 25: * @bug 8322743
> 26: * @summary EA incorrectly marks locks for elimiation for escaped object which comes from Interpreter in OSR compilation.

Suggestion:

 * @summary EA incorrectly marks locks for elimination for escaped object which comes from Interpreter in OSR compilation.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/17331#discussion_r1454787771
PR Review Comment: https://git.openjdk.org/jdk/pull/17331#discussion_r1454785515

From thartmann at openjdk.org Wed Jan 17 07:26:49 2024
From: thartmann at openjdk.org (Tobias Hartmann)
Date: Wed, 17 Jan 2024 07:26:49 GMT
Subject: RFR: 8322694: C1: Handle Constant and IfOp in NullCheckEliminator [v2]
In-Reply-To:
References: <1vvyuwLRjlWItKwCyighjCSM5SNbO4CSEE59hQtCU24=.b4783e52-328a-4ce3-8c92-7b736cea7546@github.com>
Message-ID:

On Tue, 16 Jan 2024 17:58:45 GMT, Emanuel Peter wrote:

> Could there be a regression test for this enhancement?

The IR framework only supports C2, so a regression test would need to manually check the `-XX:+PrintIR` output here. I don't think that's worth it for C1.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/17191#issuecomment-1895225821

From rcastanedalo at openjdk.org Wed Jan 17 07:49:52 2024
From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano)
Date: Wed, 17 Jan 2024 07:49:52 GMT
Subject: RFR: 8322692: ZGC: avoid over-unrolling due to hidden barrier size [v2]
In-Reply-To:
References:
Message-ID:

On Tue, 16 Jan 2024 10:16:19 GMT, Roberto Castañeda Lozano wrote:

> Looks good.

Thanks, Vladimir!
------------- PR Comment: https://git.openjdk.org/jdk/pull/17367#issuecomment-1895266751 From chagedorn at openjdk.org Wed Jan 17 07:50:04 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 17 Jan 2024 07:50:04 GMT Subject: RFR: 8323154: C2: assert(cmp != nullptr && cmp->Opcode() == Op_Cmp(bt)) failed: no exit test Message-ID: <-3LO2nXaqGEyNCjjPjYnE9su-CUFR_FZZ85wp-aU6J0=.28a03271-f85c-4394-bbc2-000fe8d84a2a@github.com> This bug is very similar to [JDK-8314191](https://bugs.openjdk.org/browse/JDK-8314191) but with long counted loops instead of int counted loops and a with a different manifestation. The original problem was that [JDK-8276162](https://bugs.openjdk.org/browse/JDK-8276162) added transformations for `CmpI` nodes to use `CmpU` nodes instead. The transformations were also applied for `CmpI` nodes of counted loop exit checks which messed pattern matching up. [JDK-8314191](https://bugs.openjdk.org/browse/JDK-8314191) and the follow-up fix [JDK-8316719](https://bugs.openjdk.org/browse/JDK-8316719) fixed this but only for `CountedLoopNodeEndNodes`. The newly added `is_cloop_condition()` method only checks for `is_CountedLoopEnd()` (int counted loops) instead of `is_BaseCountedLoopEnd()` (also includes long counted loop). This patch fixes this. I've had a closer look at other uses of `is_CountedLoop*` and found that https://github.com/openjdk/jdk/blob/1007618f6f97fad0f66e4074b50521bdd853629e/src/hotspot/share/opto/subnode.cpp#L126-L139 and https://github.com/openjdk/jdk/blob/1007618f6f97fad0f66e4074b50521bdd853629e/src/hotspot/share/opto/subnode.cpp#L155-L163 should probably also use the `BaseCountedLoop*` versions. These methods are used in `ok_to_convert()` which is used in several places. They try to prevent transformations involving the iv and the increment node of a counted loop to save registers. However, these transformations are still applied before a loop is transformed to a counted loop. 
This raises the question whether these bailouts should be extended to also work before loop opts. Since this code has been around for such a long time, it would also be interesting to see, if it's still beneficial to block these optimizations in general. If so, it might be good if we could add some IR tests to prove that. There are more places where we try to prevent such transformations but again only if we already have a counted loop. For example: https://github.com/openjdk/jdk/blob/1007618f6f97fad0f66e4074b50521bdd853629e/src/hotspot/share/opto/addnode.cpp#L176-L191 or https://github.com/openjdk/jdk/blob/1007618f6f97fad0f66e4074b50521bdd853629e/src/hotspot/share/opto/addnode.cpp#L193-L209 It might be a good idea to revisit all of these bailouts in general and check if it's still beneficial to have them around and if they should be extended to also work before loop opts. I suggest to do this investigation together with fixing `CountedLoop*` -> `BaseCountedLoop*` in the methods used in `ok_to_convert()` in a separate RFE. 
Thanks,
Christian

-------------

Commit messages:
 - 8323154: C2: assert(cmp != nullptr && cmp->Opcode() == Op_Cmp(bt)) failed: no exit test

Changes: https://git.openjdk.org/jdk/pull/17459/files
Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=17459&range=00
Issue: https://bugs.openjdk.org/browse/JDK-8323154
Stats: 50 lines in 2 files changed: 49 ins; 0 del; 1 mod
Patch: https://git.openjdk.org/jdk/pull/17459.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/17459/head:pull/17459

PR: https://git.openjdk.org/jdk/pull/17459

From rcastanedalo at openjdk.org Wed Jan 17 07:53:11 2024
From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano)
Date: Wed, 17 Jan 2024 07:53:11 GMT
Subject: Integrated: 8322692: ZGC: avoid over-unrolling due to hidden barrier size
In-Reply-To:
References:
Message-ID:

On Thu, 11 Jan 2024 08:47:41 GMT, Roberto Castañeda Lozano wrote:

> This changeset refines the C2 loop unrolling heuristic by including an estimation of the final size of (Generational) ZGC barriers in the loop size computation. These are not exposed in C2's intermediate representation and thus currently ignored by the heuristic, which can lead to over-unrolling.
>
> #### Testing
>
> - tier1-5, stress test, fuzzing (windows-x64, linux-x64, linux-aarch64, macosx-x64, macosx-aarch64).
> - tier6-9 (windows-x64, linux-x64, linux-aarch64, macosx-x64, macosx-aarch64, ZGC-specific tests only).
>
> #### Performance and code size evaluation
>
> - DaCapo, SPECjvm2008, SPECjbb2015 (linux-x64 with `-XX:+UseZGC -XX:+ZGenerational`). The changeset slightly reduces the size of the C2-generated code (around 0.3% fewer bytes per compiled bytecode for the DaCapo `fop` benchmark) and speeds up SPECjvm2008's `Serial` by around 4%.

This pull request has now been integrated.
Changeset: bf666bc0
Author: Roberto Castañeda Lozano
URL: https://git.openjdk.org/jdk/commit/bf666bc0c7ead0c5520f21f8e8cfac15323f5b50
Stats: 112 lines in 7 files changed: 109 ins; 0 del; 3 mod

8322692: ZGC: avoid over-unrolling due to hidden barrier size

Reviewed-by: eosterlund, kvn

-------------

PR: https://git.openjdk.org/jdk/pull/17367

From kbarrett at openjdk.org Wed Jan 17 07:54:59 2024
From: kbarrett at openjdk.org (Kim Barrett)
Date: Wed, 17 Jan 2024 07:54:59 GMT
Subject: RFR: 8319822: Use a linear-time algorithm for assert_different_registers() [v6]
In-Reply-To:
References:
Message-ID:

On Mon, 27 Nov 2023 12:57:31 GMT, Andrew Haley wrote:

>> At the present time, `assert_different_registers()` uses an O(N**2) algorithm in assert_different_registers(). We can utilize RegSet to do it in O(N) time. This would be a useful optimization for all builds with assertions enabled.
>>
>> In addition, it would be useful to be able to static_assert different registers.
>>
>> Also, I've taken the opportunity to expand the maximum size of a RegSet to 64 on 64-bit platforms.
>>
>> I also fixed a bug: sometimes `noreg` is passed to `assert_different_registers()`, but it may only be passed once or a spurious assertion is triggered.
>
> Andrew Haley has updated the pull request incrementally with one additional commit since the last revision:
>
> Review feedback.

Changes requested by kbarrett (Reviewer).

src/hotspot/cpu/aarch64/register_aarch64.hpp line 73:

> 71:
> 72: constexpr bool operator==(const Register r) const { return _encoding == r._encoding; }
> 73: constexpr bool operator!=(const Register r) const { return _encoding != r._encoding; }

This seems unrelated to the rest of this change. It also seems like something that should be done for all of the register_ variants.

src/hotspot/cpu/x86/register_x86.hpp line 395:

> 393: inline Register AbstractRegSet::first() {
> 394: size_t first = _bitset & -_bitset;
> 395: return first ?
as_Register(exact_log2(first)) : noreg; pre-existing: violation of the "Avoid implicit conversions to bool" rule from the style guide. Similarly for the XMMRegister case. src/hotspot/share/asm/register.hpp line 163: > 161: } > 162: > 163: constexpr uint size() const { return population_count(_bitset); } population_count is currently not constexpr. I'm surprised this doesn't lead to warnings for guaranteed non-constexpr body of a constexpr function. I'm pretty sure I've seen such warnings from some compiler. Note that I think there's no reason for it to not be constexpr, other than preceding our use of C++11/14 and not yet updated accordingly. https://bugs.openjdk.org/browse/JDK-8323952 src/hotspot/share/asm/register.hpp line 257: > 255: > 256: template > 257: inline constexpr bool different_registers(AbstractRegSet allocated_regs, R first_register) { "inline" is redundant with "constexpr". src/hotspot/share/asm/register.hpp line 257: > 255: > 256: template > 257: inline constexpr bool different_registers(AbstractRegSet allocated_regs, R first_register) { different_registers is only used by debug-only code in assert_different_registers. Shouldn't all the overloads for different_registers be within an `#ifdef ASSERT` block? src/hotspot/share/asm/register.hpp line 273: > 271: } > 272: > 273: template Rx is unused and not needed. Similarly for 3-R overload. src/hotspot/share/asm/register.hpp line 281: > 279: inline constexpr bool different_registers(R reg1, R reg2, R reg3) { > 280: return reg1 != reg2 && reg2 != reg3 && reg1 != reg3; > 281: } The 2-R and 3-R overloads are just an optimization (probably) of the variadic-R overload for small numbers of arguments. Given this is debug-only code, is it really worth the additional source code? 
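
For readers following this review, the linear-time idea from the PR description (record each register as one bit of a set and report a duplicate when a bit is already present) can be sketched as below. This is only an illustration under stated assumptions: `Reg`, `kNoReg` and `all_different` are hypothetical names, not the `RegSet`/`different_registers` code under review, and encodings are assumed to lie in [0, 63]. Skipping `kNoReg` mirrors the `noreg` behaviour described in the PR text.

```cpp
#include <cassert>
#include <cstdint>
#include <initializer_list>

// Hypothetical stand-in for a HotSpot register: only an encoding in [0, 63],
// or a negative value meaning "no register". Not the real Register class.
struct Reg {
  int encoding;
};

constexpr Reg kNoReg{-1};

// Returns true if no real register occurs twice in `regs`.
// Each register sets one bit in a 64-bit mask; finding the bit already set
// means a duplicate, so the whole check is a single O(N) pass.
// kNoReg is skipped, so it may appear any number of times.
inline bool all_different(std::initializer_list<Reg> regs) {
  std::uint64_t seen = 0;
  for (const Reg& r : regs) {
    if (r.encoding < 0) {
      continue;  // "no register" never conflicts with anything
    }
    assert(r.encoding < 64 && "sketch assumes at most 64 registers");
    const std::uint64_t bit = std::uint64_t{1} << r.encoding;
    if ((seen & bit) != 0) {
      return false;  // this register was already recorded
    }
    seen |= bit;
  }
  return true;
}
```

A call such as `all_different({Reg{0}, Reg{1}, kNoReg, kNoReg})` passes, while repeating any real register makes the check fail.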
------------- PR Review: https://git.openjdk.org/jdk/pull/16617#pullrequestreview-1826587710 PR Review Comment: https://git.openjdk.org/jdk/pull/16617#discussion_r1454772535 PR Review Comment: https://git.openjdk.org/jdk/pull/16617#discussion_r1454736036 PR Review Comment: https://git.openjdk.org/jdk/pull/16617#discussion_r1454856106 PR Review Comment: https://git.openjdk.org/jdk/pull/16617#discussion_r1454760813 PR Review Comment: https://git.openjdk.org/jdk/pull/16617#discussion_r1454812257 PR Review Comment: https://git.openjdk.org/jdk/pull/16617#discussion_r1454796895 PR Review Comment: https://git.openjdk.org/jdk/pull/16617#discussion_r1454808568 From epeter at openjdk.org Wed Jan 17 07:59:49 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 17 Jan 2024 07:59:49 GMT Subject: [jdk22] RFR: 8320175: [BACKOUT] 8316533: C2 compilation fails with assert(verify(phase)) failed: missing Value() optimization In-Reply-To: References: Message-ID: On Tue, 16 Jan 2024 10:26:13 GMT, Tobias Hartmann wrote: >> Hi all, >> >> This pull request contains a backport of commit [e01f6da1](https://github.com/openjdk/jdk/commit/e01f6da1b8e7de19f90c7cb21b3cd1ff2ab29cb7) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository. >> >> The commit being backported was authored by Emanuel Peter on 16 Jan 2024 and was reviewed by Tobias Hartmann and Christian Hagedorn. >> >> Thanks! > > Looks good. Thanks @TobiHartmann for the review! 
------------- PR Comment: https://git.openjdk.org/jdk22/pull/78#issuecomment-1895279372 From epeter at openjdk.org Wed Jan 17 08:02:52 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 17 Jan 2024 08:02:52 GMT Subject: [jdk22] Integrated: 8320175: [BACKOUT] 8316533: C2 compilation fails with assert(verify(phase)) failed: missing Value() optimization In-Reply-To: References: Message-ID: On Tue, 16 Jan 2024 10:20:13 GMT, Emanuel Peter wrote: > Hi all, > > This pull request contains a backport of commit [e01f6da1](https://github.com/openjdk/jdk/commit/e01f6da1b8e7de19f90c7cb21b3cd1ff2ab29cb7) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository. > > The commit being backported was authored by Emanuel Peter on 16 Jan 2024 and was reviewed by Tobias Hartmann and Christian Hagedorn. > > Thanks! This pull request has now been integrated. Changeset: eb2c4b0b Author: Emanuel Peter URL: https://git.openjdk.org/jdk22/commit/eb2c4b0b838930f1a2bf4d040bd13da5adde6ec3 Stats: 76 lines in 2 files changed: 0 ins; 76 del; 0 mod 8320175: [BACKOUT] 8316533: C2 compilation fails with assert(verify(phase)) failed: missing Value() optimization Reviewed-by: thartmann Backport-of: e01f6da1b8e7de19f90c7cb21b3cd1ff2ab29cb7 ------------- PR: https://git.openjdk.org/jdk22/pull/78 From fyang at openjdk.org Wed Jan 17 08:06:56 2024 From: fyang at openjdk.org (Fei Yang) Date: Wed, 17 Jan 2024 08:06:56 GMT Subject: RFR: 8322174: RISC-V: C2 VectorizedHashCode RVV Version In-Reply-To: References: Message-ID: On Sat, 13 Jan 2024 09:21:37 GMT, Yuri Gaevsky wrote: > The patch adds possibility to use RVV instructions for faster vectorizedHashCode calculations on RVV v1.0.0 capable hardware. > > Testing: hotspot/jtreg/compiler/ under QEMU-8.1 with RVV v1.0.0. Some initial comments from a brief look. 
src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 1603: > 1601: la(pows31, ExternalAddress(adr_pows31)); > 1602: mv(t1, num_8b_elems_in_vec); > 1603: vsetvli(t0, t1, Assembler::e32, Assembler::m4); I wonder if the scalar code for handling `WIDE_TAIL` could be eliminated with RVV's design for stripmining approach [1]? Looks like the current code doesn't make use of this approach as new vl returned by `vsetvli` is not checked and used. [1] https://github.com/riscv/riscv-v-spec/blob/master/v-spec.adoc#sec-vector-config One of the common approaches to handling a large number of elements is "stripmining" where each iteration of a loop handles some number of elements, and the iterations continue until all elements have been processed. The RISC-V vector specification provides direct, portable support for this approach. The application specifies the total number of elements to be processed (the application vector length or AVL) as a candidate value for vl, and the hardware responds via a general-purpose register with the (frequently smaller) number of elements that the hardware will handle per iteration (stored in vl), based on the microarchitectural implementation and the vtype setting. A straightforward loop structure, shown in [Example of stripmining and changes to SEW] (https://github.com/riscv/riscv-v-spec/blob/master/v-spec.adoc#example-stripmine-sew), depicts the ease with which the code keeps track of the remaining number of elements and the amount per iteration handled by hardware. src/hotspot/cpu/riscv/riscv_v.ad line 2681: > 2679: iRegLNoSp tmp4, iRegLNoSp tmp5, iRegLNoSp tmp6, rFlagsReg cr) > 2680: %{ > 2681: predicate(UseRVV && (MaxVectorSize >= 16)); Similar here: `MaxVectorSize >= 16` condition is already checked and ensured on JVM startup. 
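
The stripmining scheme quoted from the RVV spec in the first comment above can be illustrated with a scalar model. The sketch below is not the PR's assembler code: `set_vl` is a hypothetical stand-in for `vsetvli` with a fixed `vlmax`, and the per-chunk update is one plausible way to fold `vl` elements into a 31-based polynomial hash. It is meant to show why a separate scalar tail loop becomes unnecessary when every iteration uses the `vl` that was actually granted.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model of vsetvli: given the remaining element count (AVL),
// return how many elements this iteration processes. Real hardware derives
// the limit from VLEN, SEW and LMUL; a fixed vlmax is assumed here.
inline std::size_t set_vl(std::size_t avl, std::size_t vlmax) {
  return std::min(avl, vlmax);
}

// Reference: h = 31 * h + a[k] for each element, with 32-bit wraparound.
inline std::uint32_t hash_scalar(const std::vector<std::int32_t>& a,
                                 std::uint32_t initial) {
  std::uint32_t h = initial;
  for (std::int32_t v : a) {
    h = 31u * h + static_cast<std::uint32_t>(v);
  }
  return h;
}

// Stripmined variant: each iteration handles vl elements at once using
//   h' = h * 31^vl + sum_j a[i + j] * 31^(vl - 1 - j).
// The last iteration simply gets a smaller vl, so there is no tail loop.
inline std::uint32_t hash_stripmined(const std::vector<std::int32_t>& a,
                                     std::uint32_t initial,
                                     std::size_t vlmax) {
  std::uint32_t h = initial;
  const std::size_t n = a.size();
  std::size_t i = 0;
  while (i < n) {
    const std::size_t vl = set_vl(n - i, vlmax);
    std::uint32_t chunk = 0;
    std::uint32_t pow = 1;  // becomes 31^vl after the inner loop
    for (std::size_t j = vl; j-- > 0;) {
      chunk += static_cast<std::uint32_t>(a[i + j]) * pow;
      pow *= 31u;
    }
    h = h * pow + chunk;
    i += vl;
  }
  return h;
}
```

With any positive `vlmax`, `hash_stripmined` is expected to agree with `hash_scalar`, including when the array length is not a multiple of `vlmax`.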
src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 5266: > 5264: } > 5265: > 5266: if (UseVectorizedHashCodeIntrinsic && UseRVV && (MaxVectorSize >= 16)) { I think the `MaxVectorSize >= 16` condition is already checked and ensured on JVM startup when the RVV extension is available. ------------- PR Review: https://git.openjdk.org/jdk/pull/17413#pullrequestreview-1826634240 PR Review Comment: https://git.openjdk.org/jdk/pull/17413#discussion_r1454866513 PR Review Comment: https://git.openjdk.org/jdk/pull/17413#discussion_r1454805091 PR Review Comment: https://git.openjdk.org/jdk/pull/17413#discussion_r1454799065 From tholenstein at openjdk.org Wed Jan 17 08:10:06 2024 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Wed, 17 Jan 2024 08:10:06 GMT Subject: Integrated: JDK-8316756: C2 EA fails with "missing memory path" when encountering unsafe_arraycopy stub call In-Reply-To: References: Message-ID: On Wed, 10 Jan 2024 13:35:17 GMT, Tobias Holenstein wrote: > Before https://github.com/openjdk/jdk/pull/5259 the graph of the following program looked like this during Escape Analysis: > > > static int test() { > MyClass obj = new MyClass(); // Non-escaping to trigger Escape Analysis > UNSAFE.copyMemory(null, SRC_BASE, null, DST_BASE, 4); > obj.x = 42; > return obj.x; > } > > With MemBarCPUOrder: > working > > Setting the `RC_NARROW_MEM` flag in `LibraryCallKit::inline_unsafe_copyMemory()` removes the `428 MergeMem` node. Without a `MergeMem` node after `432 StoreB` EA fails in Phase 2 of `ConnectionGraph::split_unique_types(...)` when trying to push the allocation's users on the appropriate worklist - `429 CallLeafNoFP` is not an expected user of `428 StoreB`. Therefore the assert `"EA: missing memory path"` is hit. 
> Without MemBarCPUOrder and after setting `RC_NARROW_MEM`: > failing > > > ### Proposed Fix > Dropping the `RC_NARROW_MEM` flag in `LibraryCallKit::inline_unsafe_copyMemory()` causes the introduction of a `MergeMem` between `StoreB` and `CallLeafNoFP`, so the corresponding code in EA doesn't encounter a `CallLeafNoFP` anymore: > fixed > > Testing: Tier1-4 passed This pull request has now been integrated. Changeset: b8917214 Author: Tobias Holenstein URL: https://git.openjdk.org/jdk/commit/b89172149d6a900d11630a95be7278870421b435 Stats: 86 lines in 2 files changed: 85 ins; 0 del; 1 mod 8316756: C2 EA fails with "missing memory path" when encountering unsafe_arraycopy stub call Co-authored-by: Vladimir Kozlov Reviewed-by: kvn, thartmann, chagedorn ------------- PR: https://git.openjdk.org/jdk/pull/17347 From tholenstein at openjdk.org Wed Jan 17 08:10:04 2024 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Wed, 17 Jan 2024 08:10:04 GMT Subject: RFR: JDK-8316756: C2 EA fails with "missing memory path" when encountering unsafe_arraycopy stub call [v3] In-Reply-To: <9TBHM9ao0Tyjc972h-X9OhyvO3Aj6CxYYukDmdSg4ps=.ee848b99-533d-42d3-a82c-74ef323b44bb@github.com> References: <9TBHM9ao0Tyjc972h-X9OhyvO3Aj6CxYYukDmdSg4ps=.ee848b99-533d-42d3-a82c-74ef323b44bb@github.com> Message-ID: On Tue, 16 Jan 2024 12:32:46 GMT, Christian Hagedorn wrote: >> Tobias Holenstein has updated the pull request incrementally with one additional commit since the last revision: >> >> added testcase > > The fix looks good to me, too. Thanks @chhagedorn, @vnkozlov and @TobiHartmann for the reviews! And thanks @iwanowww for the discussion! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/17347#issuecomment-1895290081 From dlunden at openjdk.org Wed Jan 17 08:38:49 2024 From: dlunden at openjdk.org (Daniel =?UTF-8?B?THVuZMOpbg==?=) Date: Wed, 17 Jan 2024 08:38:49 GMT Subject: RFR: 8322996: BoxLockNode creation fails with assert(reg < CHUNK_SIZE) failed: sanity [v3] In-Reply-To: References: <66zey4Kt56pbcieRZ0AjBLemzwsAQcHCdDG1nlK-P1c=.14dc02d6-bef8-4239-9f39-02456756b7ea@github.com> Message-ID: On Wed, 17 Jan 2024 00:10:10 GMT, Vladimir Kozlov wrote: >> Yes, that is my intuition as well. Therefore, I'm left wondering if the [construction of `_inmask`](https://github.com/dlunde/jdk/blob/9ab6e561780aee0f2cc2f06cd40ec487d60fe39c/src/hotspot/share/opto/locknode.cpp#L51) in the `BoxLockNode` constructor is incorrect, as it always just sets a single bit in the mask (no matter the value of `sync_stack_slots()`). Should we perhaps change it to instead set the range [ reg, reg + sync_stack_slots() ) in `_inmask`? > > It is fine. Monitors use fixed stack slots which are not available to Register allocator: [compile.hpp#L315](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/compile.hpp#L315). Fixed stack slots reserved based on number monitors and etc. We don't do regmask operations on them. > > `BoxLockNode::reg(box)` is only accessed in format output in `JVMState::format()` and when we generate debug info in `PhaseOutput::Process_OopMap_Node()`. > > In general, in 64-bit VM long, double, oop/pointers values take 2 slots but their value is stored only in first slot (since slot's size is 64 bit). Don't ask why. It is convention in VM. Thanks for the clarification. I'll go ahead with the proposed solution with `can_represent_sync` then (but I'll move it to `regmask.hpp`). 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17370#discussion_r1454948748 From thartmann at openjdk.org Wed Jan 17 08:40:37 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 17 Jan 2024 08:40:37 GMT Subject: [jdk22] RFR: 8316756: C2 EA fails with "missing memory path" when encountering unsafe_arraycopy stub call Message-ID: Hi all, This pull request contains a backport of commit [b8917214](https://github.com/openjdk/jdk/commit/b89172149d6a900d11630a95be7278870421b435) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository. The commit being backported was authored by Tobias Holenstein on 17 Jan 2024 and was reviewed by Vladimir Kozlov, Tobias Hartmann and Christian Hagedorn. Thanks! ------------- Commit messages: - Backport b89172149d6a900d11630a95be7278870421b435 Changes: https://git.openjdk.org/jdk22/pull/86/files Webrev: https://webrevs.openjdk.org/?repo=jdk22&pr=86&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8316756 Stats: 86 lines in 2 files changed: 85 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk22/pull/86.diff Fetch: git fetch https://git.openjdk.org/jdk22.git pull/86/head:pull/86 PR: https://git.openjdk.org/jdk22/pull/86 From roland at openjdk.org Wed Jan 17 08:42:49 2024 From: roland at openjdk.org (Roland Westrelin) Date: Wed, 17 Jan 2024 08:42:49 GMT Subject: RFR: 8323154: C2: assert(cmp != nullptr && cmp->Opcode() == Op_Cmp(bt)) failed: no exit test In-Reply-To: <-3LO2nXaqGEyNCjjPjYnE9su-CUFR_FZZ85wp-aU6J0=.28a03271-f85c-4394-bbc2-000fe8d84a2a@github.com> References: <-3LO2nXaqGEyNCjjPjYnE9su-CUFR_FZZ85wp-aU6J0=.28a03271-f85c-4394-bbc2-000fe8d84a2a@github.com> Message-ID: On Wed, 17 Jan 2024 07:44:28 GMT, Christian Hagedorn wrote: > This bug is very similar to [JDK-8314191](https://bugs.openjdk.org/browse/JDK-8314191) but with long counted loops instead of int counted loops and a with a different manifestation. 
> > The original problem was that [JDK-8276162](https://bugs.openjdk.org/browse/JDK-8276162) added transformations for `CmpI` nodes to use `CmpU` nodes instead. The transformations were also applied for `CmpI` nodes of counted loop exit checks which messed pattern matching up. [JDK-8314191](https://bugs.openjdk.org/browse/JDK-8314191) and the follow-up fix [JDK-8316719](https://bugs.openjdk.org/browse/JDK-8316719) fixed this but only for `CountedLoopNodeEndNodes`. The newly added `is_cloop_condition()` method only checks for `is_CountedLoopEnd()` (int counted loops) instead of `is_BaseCountedLoopEnd()` (also includes long counted loop). This patch fixes this. > > I've had a closer look at other uses of `is_CountedLoop*` and found that > https://github.com/openjdk/jdk/blob/1007618f6f97fad0f66e4074b50521bdd853629e/src/hotspot/share/opto/subnode.cpp#L126-L139 > and > https://github.com/openjdk/jdk/blob/1007618f6f97fad0f66e4074b50521bdd853629e/src/hotspot/share/opto/subnode.cpp#L155-L163 > should probably also use the `BaseCountedLoop*` versions. These methods are used in `ok_to_convert()` which is used in several places. They try to prevent transformations involving the iv and the increment node of a counted loop to save registers. However, these transformations are still applied before a loop is transformed to a counted loop. This raises the question whether these bailouts should be extended to also work before loop opts. Since this code has been around for such a long time, it would also be interesting to see, if it's still beneficial to block these optimizations in general. If so, it might be good if we could add some IR tests to prove that. > > There are more places where we try to prevent such transformations but again only if we already have a counted loop. 
For example: > https://github.com/openjdk/jdk/blob/1007618f6f97fad0f66e4074b50521bdd853629e/src/hotspot/share/opto/addnode.cpp#L176-L191 > or > https://github.com/openjdk/jdk/blob/1007618f6f97fad0f66e4074b50521bdd853629e/src/hotspot/share/opto/addnode.cpp#L193-L209 > > It might be a good idea to revisit all of these bailouts in general and check if it's still beneficial to have them around and if they should be extended to also work before loop opts. > > I suggest to do this investigation together with fixing `CountedLoop*` -> `BaseCountedLoop*` in... Looks good to me. ------------- Marked as reviewed by roland (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/17459#pullrequestreview-1826762325 From chagedorn at openjdk.org Wed Jan 17 08:45:50 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 17 Jan 2024 08:45:50 GMT Subject: RFR: 8323154: C2: assert(cmp != nullptr && cmp->Opcode() == Op_Cmp(bt)) failed: no exit test In-Reply-To: <-3LO2nXaqGEyNCjjPjYnE9su-CUFR_FZZ85wp-aU6J0=.28a03271-f85c-4394-bbc2-000fe8d84a2a@github.com> References: <-3LO2nXaqGEyNCjjPjYnE9su-CUFR_FZZ85wp-aU6J0=.28a03271-f85c-4394-bbc2-000fe8d84a2a@github.com> Message-ID: On Wed, 17 Jan 2024 07:44:28 GMT, Christian Hagedorn wrote: > This bug is very similar to [JDK-8314191](https://bugs.openjdk.org/browse/JDK-8314191) but with long counted loops instead of int counted loops and a with a different manifestation. > > The original problem was that [JDK-8276162](https://bugs.openjdk.org/browse/JDK-8276162) added transformations for `CmpI` nodes to use `CmpU` nodes instead. The transformations were also applied for `CmpI` nodes of counted loop exit checks which messed pattern matching up. [JDK-8314191](https://bugs.openjdk.org/browse/JDK-8314191) and the follow-up fix [JDK-8316719](https://bugs.openjdk.org/browse/JDK-8316719) fixed this but only for `CountedLoopNodeEndNodes`. 
The newly added `is_cloop_condition()` method only checks for `is_CountedLoopEnd()` (int counted loops) instead of `is_BaseCountedLoopEnd()` (also includes long counted loop). This patch fixes this. > > I've had a closer look at other uses of `is_CountedLoop*` and found that > https://github.com/openjdk/jdk/blob/1007618f6f97fad0f66e4074b50521bdd853629e/src/hotspot/share/opto/subnode.cpp#L126-L139 > and > https://github.com/openjdk/jdk/blob/1007618f6f97fad0f66e4074b50521bdd853629e/src/hotspot/share/opto/subnode.cpp#L155-L163 > should probably also use the `BaseCountedLoop*` versions. These methods are used in `ok_to_convert()` which is used in several places. They try to prevent transformations involving the iv and the increment node of a counted loop to save registers. However, these transformations are still applied before a loop is transformed to a counted loop. This raises the question whether these bailouts should be extended to also work before loop opts. Since this code has been around for such a long time, it would also be interesting to see, if it's still beneficial to block these optimizations in general. If so, it might be good if we could add some IR tests to prove that. > > There are more places where we try to prevent such transformations but again only if we already have a counted loop. For example: > https://github.com/openjdk/jdk/blob/1007618f6f97fad0f66e4074b50521bdd853629e/src/hotspot/share/opto/addnode.cpp#L176-L191 > or > https://github.com/openjdk/jdk/blob/1007618f6f97fad0f66e4074b50521bdd853629e/src/hotspot/share/opto/addnode.cpp#L193-L209 > > It might be a good idea to revisit all of these bailouts in general and check if it's still beneficial to have them around and if they should be extended to also work before loop opts. > > I suggest to do this investigation together with fixing `CountedLoop*` -> `BaseCountedLoop*` in... Thanks Roland for your review! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/17459#issuecomment-1895342033 From rehn at openjdk.org Wed Jan 17 08:45:55 2024 From: rehn at openjdk.org (Robbin Ehn) Date: Wed, 17 Jan 2024 08:45:55 GMT Subject: RFR: 8323694: RISC-V: Unnecessary ResourceMark in NativeCall::set_destination_mt_safe In-Reply-To: References: Message-ID: On Tue, 16 Jan 2024 02:21:50 GMT, Gui Cao wrote: > Hi, We noticed that RISC-V bears a similar issue as: https://bugs.openjdk.org/browse/JDK-8323584. > In `NativeCall::set_destination_mt_safe`, there is a `ResourceMark` that does not seem to have any purpose: no code in its scope uses resource allocation. > > ### Testing: > > - [x] Run tier1 tests on qemu 8.1.0 with UseRVV (fastdebug) > - [x] Run tier1-3 tests on qemu 8.1.0 with UseRVV (release) Thanks! ------------- Marked as reviewed by rehn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/17436#pullrequestreview-1826766862 From thartmann at openjdk.org Wed Jan 17 08:50:50 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 17 Jan 2024 08:50:50 GMT Subject: RFR: 8323154: C2: assert(cmp != nullptr && cmp->Opcode() == Op_Cmp(bt)) failed: no exit test In-Reply-To: <-3LO2nXaqGEyNCjjPjYnE9su-CUFR_FZZ85wp-aU6J0=.28a03271-f85c-4394-bbc2-000fe8d84a2a@github.com> References: <-3LO2nXaqGEyNCjjPjYnE9su-CUFR_FZZ85wp-aU6J0=.28a03271-f85c-4394-bbc2-000fe8d84a2a@github.com> Message-ID: On Wed, 17 Jan 2024 07:44:28 GMT, Christian Hagedorn wrote: > This bug is very similar to [JDK-8314191](https://bugs.openjdk.org/browse/JDK-8314191) but with long counted loops instead of int counted loops and a with a different manifestation. > > The original problem was that [JDK-8276162](https://bugs.openjdk.org/browse/JDK-8276162) added transformations for `CmpI` nodes to use `CmpU` nodes instead. The transformations were also applied for `CmpI` nodes of counted loop exit checks which messed pattern matching up. 
[JDK-8314191](https://bugs.openjdk.org/browse/JDK-8314191) and the follow-up fix [JDK-8316719](https://bugs.openjdk.org/browse/JDK-8316719) fixed this but only for `CountedLoopNodeEndNodes`. The newly added `is_cloop_condition()` method only checks for `is_CountedLoopEnd()` (int counted loops) instead of `is_BaseCountedLoopEnd()` (also includes long counted loop). This patch fixes this. > > I've had a closer look at other uses of `is_CountedLoop*` and found that > https://github.com/openjdk/jdk/blob/1007618f6f97fad0f66e4074b50521bdd853629e/src/hotspot/share/opto/subnode.cpp#L126-L139 > and > https://github.com/openjdk/jdk/blob/1007618f6f97fad0f66e4074b50521bdd853629e/src/hotspot/share/opto/subnode.cpp#L155-L163 > should probably also use the `BaseCountedLoop*` versions. These methods are used in `ok_to_convert()` which is used in several places. They try to prevent transformations involving the iv and the increment node of a counted loop to save registers. However, these transformations are still applied before a loop is transformed to a counted loop. This raises the question whether these bailouts should be extended to also work before loop opts. Since this code has been around for such a long time, it would also be interesting to see, if it's still beneficial to block these optimizations in general. If so, it might be good if we could add some IR tests to prove that. > > There are more places where we try to prevent such transformations but again only if we already have a counted loop. For example: > https://github.com/openjdk/jdk/blob/1007618f6f97fad0f66e4074b50521bdd853629e/src/hotspot/share/opto/addnode.cpp#L176-L191 > or > https://github.com/openjdk/jdk/blob/1007618f6f97fad0f66e4074b50521bdd853629e/src/hotspot/share/opto/addnode.cpp#L193-L209 > > It might be a good idea to revisit all of these bailouts in general and check if it's still beneficial to have them around and if they should be extended to also work before loop opts. 
> > I suggest to do this investigation together with fixing `CountedLoop*` -> `BaseCountedLoop*` in... Looks good to me too. > I suggest to do this investigation together with fixing CountedLoop* -> BaseCountedLoop* in the methods used in ok_to_convert() in a separate RFE. Makes sense. ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/17459#pullrequestreview-1826774130 From qamai at openjdk.org Wed Jan 17 08:50:51 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Wed, 17 Jan 2024 08:50:51 GMT Subject: RFR: 8323154: C2: assert(cmp != nullptr && cmp->Opcode() == Op_Cmp(bt)) failed: no exit test In-Reply-To: <-3LO2nXaqGEyNCjjPjYnE9su-CUFR_FZZ85wp-aU6J0=.28a03271-f85c-4394-bbc2-000fe8d84a2a@github.com> References: <-3LO2nXaqGEyNCjjPjYnE9su-CUFR_FZZ85wp-aU6J0=.28a03271-f85c-4394-bbc2-000fe8d84a2a@github.com> Message-ID: On Wed, 17 Jan 2024 07:44:28 GMT, Christian Hagedorn wrote: > This bug is very similar to [JDK-8314191](https://bugs.openjdk.org/browse/JDK-8314191) but with long counted loops instead of int counted loops and a with a different manifestation. > > The original problem was that [JDK-8276162](https://bugs.openjdk.org/browse/JDK-8276162) added transformations for `CmpI` nodes to use `CmpU` nodes instead. The transformations were also applied for `CmpI` nodes of counted loop exit checks which messed pattern matching up. [JDK-8314191](https://bugs.openjdk.org/browse/JDK-8314191) and the follow-up fix [JDK-8316719](https://bugs.openjdk.org/browse/JDK-8316719) fixed this but only for `CountedLoopNodeEndNodes`. The newly added `is_cloop_condition()` method only checks for `is_CountedLoopEnd()` (int counted loops) instead of `is_BaseCountedLoopEnd()` (also includes long counted loop). This patch fixes this. 
> > I've had a closer look at other uses of `is_CountedLoop*` and found that > https://github.com/openjdk/jdk/blob/1007618f6f97fad0f66e4074b50521bdd853629e/src/hotspot/share/opto/subnode.cpp#L126-L139 > and > https://github.com/openjdk/jdk/blob/1007618f6f97fad0f66e4074b50521bdd853629e/src/hotspot/share/opto/subnode.cpp#L155-L163 > should probably also use the `BaseCountedLoop*` versions. These methods are used in `ok_to_convert()` which is used in several places. They try to prevent transformations involving the iv and the increment node of a counted loop to save registers. However, these transformations are still applied before a loop is transformed to a counted loop. This raises the question whether these bailouts should be extended to also work before loop opts. Since this code has been around for such a long time, it would also be interesting to see, if it's still beneficial to block these optimizations in general. If so, it might be good if we could add some IR tests to prove that. > > There are more places where we try to prevent such transformations but again only if we already have a counted loop. For example: > https://github.com/openjdk/jdk/blob/1007618f6f97fad0f66e4074b50521bdd853629e/src/hotspot/share/opto/addnode.cpp#L176-L191 > or > https://github.com/openjdk/jdk/blob/1007618f6f97fad0f66e4074b50521bdd853629e/src/hotspot/share/opto/addnode.cpp#L193-L209 > > It might be a good idea to revisit all of these bailouts in general and check if it's still beneficial to have them around and if they should be extended to also work before loop opts. > > I suggest to do this investigation together with fixing `CountedLoop*` -> `BaseCountedLoop*` in... Thanks a lot for fixing this. ------------- Marked as reviewed by qamai (Committer). 
PR Review: https://git.openjdk.org/jdk/pull/17459#pullrequestreview-1826775195 From chagedorn at openjdk.org Wed Jan 17 08:57:49 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 17 Jan 2024 08:57:49 GMT Subject: RFR: 8323154: C2: assert(cmp != nullptr && cmp->Opcode() == Op_Cmp(bt)) failed: no exit test In-Reply-To: <-3LO2nXaqGEyNCjjPjYnE9su-CUFR_FZZ85wp-aU6J0=.28a03271-f85c-4394-bbc2-000fe8d84a2a@github.com> References: <-3LO2nXaqGEyNCjjPjYnE9su-CUFR_FZZ85wp-aU6J0=.28a03271-f85c-4394-bbc2-000fe8d84a2a@github.com> Message-ID: On Wed, 17 Jan 2024 07:44:28 GMT, Christian Hagedorn wrote: > This bug is very similar to [JDK-8314191](https://bugs.openjdk.org/browse/JDK-8314191) but with long counted loops instead of int counted loops and a with a different manifestation. > > The original problem was that [JDK-8276162](https://bugs.openjdk.org/browse/JDK-8276162) added transformations for `CmpI` nodes to use `CmpU` nodes instead. The transformations were also applied for `CmpI` nodes of counted loop exit checks which messed pattern matching up. [JDK-8314191](https://bugs.openjdk.org/browse/JDK-8314191) and the follow-up fix [JDK-8316719](https://bugs.openjdk.org/browse/JDK-8316719) fixed this but only for `CountedLoopNodeEndNodes`. The newly added `is_cloop_condition()` method only checks for `is_CountedLoopEnd()` (int counted loops) instead of `is_BaseCountedLoopEnd()` (also includes long counted loop). This patch fixes this. > > I've had a closer look at other uses of `is_CountedLoop*` and found that > https://github.com/openjdk/jdk/blob/1007618f6f97fad0f66e4074b50521bdd853629e/src/hotspot/share/opto/subnode.cpp#L126-L139 > and > https://github.com/openjdk/jdk/blob/1007618f6f97fad0f66e4074b50521bdd853629e/src/hotspot/share/opto/subnode.cpp#L155-L163 > should probably also use the `BaseCountedLoop*` versions. These methods are used in `ok_to_convert()` which is used in several places. 
They try to prevent transformations involving the iv and the increment node of a counted loop to save registers. However, these transformations are still applied before a loop is transformed to a counted loop. This raises the question whether these bailouts should be extended to also work before loop opts. Since this code has been around for such a long time, it would also be interesting to see, if it's still beneficial to block these optimizations in general. If so, it might be good if we could add some IR tests to prove that. > > There are more places where we try to prevent such transformations but again only if we already have a counted loop. For example: > https://github.com/openjdk/jdk/blob/1007618f6f97fad0f66e4074b50521bdd853629e/src/hotspot/share/opto/addnode.cpp#L176-L191 > or > https://github.com/openjdk/jdk/blob/1007618f6f97fad0f66e4074b50521bdd853629e/src/hotspot/share/opto/addnode.cpp#L193-L209 > > It might be a good idea to revisit all of these bailouts in general and check if it's still beneficial to have them around and if they should be extended to also work before loop opts. > > I suggest to do this investigation together with fixing `CountedLoop*` -> `BaseCountedLoop*` in... Thanks Tobias and Quan for your reviews! I've filed [JDK-8323968](https://bugs.openjdk.org/browse/JDK-8323968) to follow up on the mentioned bailouts. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/17459#issuecomment-1895361874 From tholenstein at openjdk.org Wed Jan 17 09:03:53 2024 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Wed, 17 Jan 2024 09:03:53 GMT Subject: [jdk22] RFR: 8316756: C2 EA fails with "missing memory path" when encountering unsafe_arraycopy stub call In-Reply-To: References: Message-ID: On Wed, 17 Jan 2024 08:33:53 GMT, Tobias Hartmann wrote: > Hi all, > > This pull request contains a backport of commit [b8917214](https://github.com/openjdk/jdk/commit/b89172149d6a900d11630a95be7278870421b435) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository. > > The commit being backported was authored by Tobias Holenstein on 17 Jan 2024 and was reviewed by Vladimir Kozlov, Tobias Hartmann and Christian Hagedorn. > > Thanks! looks good ------------- Marked as reviewed by tholenstein (Reviewer). PR Review: https://git.openjdk.org/jdk22/pull/86#pullrequestreview-1826800135 From thartmann at openjdk.org Wed Jan 17 09:10:50 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 17 Jan 2024 09:10:50 GMT Subject: [jdk22] RFR: 8316756: C2 EA fails with "missing memory path" when encountering unsafe_arraycopy stub call In-Reply-To: References: Message-ID: <9MiOQdj224yQ8Dofa9wtkBNOB1ru48T8g-bE5q3jlU0=.bf407c61-7615-4e51-9dbc-bea0a1125773@github.com> On Wed, 17 Jan 2024 08:33:53 GMT, Tobias Hartmann wrote: > Hi all, > > This pull request contains a backport of commit [b8917214](https://github.com/openjdk/jdk/commit/b89172149d6a900d11630a95be7278870421b435) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository. > > The commit being backported was authored by Tobias Holenstein on 17 Jan 2024 and was reviewed by Vladimir Kozlov, Tobias Hartmann and Christian Hagedorn. > > Thanks! Thanks for the review, Toby. 
------------- PR Comment: https://git.openjdk.org/jdk22/pull/86#issuecomment-1895383592 From dlunden at openjdk.org Wed Jan 17 10:03:06 2024 From: dlunden at openjdk.org (Daniel =?UTF-8?B?THVuZMOpbg==?=) Date: Wed, 17 Jan 2024 10:03:06 GMT Subject: RFR: 8291809: Convert compiler/c2/cr7200264/TestSSE2IntVect.java to IR verification test [v2] In-Reply-To: References: Message-ID: > This changeset translates the tests in `compiler/c2/cr7200264/` to use the IR verification framework. > > The proposed translation, to the extent possible, attempts to preserve the semantics of the original test. A major difference is that the IR checks are now local (for every `test_*` method) instead of global. The execution time of the new test is comparable to the old test. > > Testing: > - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/7553846710) > - Ran the new translated tests within all tier1 through tier10 contexts on windows-x64, linux-x64, linux-aarch64, macosx-x64, and macosx-aarch64. > - Tested that manually adding `-XX:LoopUnrollLimit=0` to the test framework flags caused the translated tests to fail. Note: it is, however, no longer possible to break the test by passing `-XX:LoopUnrollLimit=0` on the command line. 
Daniel Lundén has updated the pull request incrementally with one additional commit since the last revision: Refactor test to use multiple @Test ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17428/files - new: https://git.openjdk.org/jdk/pull/17428/files/61edc32e..d1b4aa5e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17428&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17428&range=00-01 Stats: 641 lines in 3 files changed: 181 ins; 131 del; 329 mod Patch: https://git.openjdk.org/jdk/pull/17428.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17428/head:pull/17428 PR: https://git.openjdk.org/jdk/pull/17428 From dlunden at openjdk.org Wed Jan 17 10:05:51 2024 From: dlunden at openjdk.org (Daniel =?UTF-8?B?THVuZMOpbg==?=) Date: Wed, 17 Jan 2024 10:05:51 GMT Subject: RFR: 8291809: Convert compiler/c2/cr7200264/TestSSE2IntVect.java to IR verification test [v2] In-Reply-To: References: Message-ID: On Tue, 16 Jan 2024 08:54:10 GMT, Roberto Castañeda Lozano wrote: >> Daniel Lundén has updated the pull request incrementally with one additional commit since the last revision: >> >> Refactor test to use multiple @Test > > Thanks for working on this, Daniel! > > These tests would be more idiomatic, precise, and possibly even faster if the IR verification was applied to each vectorization method (`test_sum`, `test_addc`, etc.) separately, instead of doing it as a bulk check over the entire `TestIntVect::testInner()`. This can be achieved by using `applyIfCPUFeature` annotations in the IR checks, similarly to e.g. `test/hotspot/jtreg/compiler/loopopts/superword/RedTest_int.java`. I recognize this limitation is pre-existing, but this issue seems a good place to address it. I have now pushed a revised version (and updated the PR description). Please have a look when you have some spare time @robcasloz. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/17428#issuecomment-1895476271 From epeter at openjdk.org Wed Jan 17 11:24:10 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 17 Jan 2024 11:24:10 GMT Subject: RFR: 8320649: C2: Optimize scoped values [v4] In-Reply-To: References: Message-ID: On Thu, 4 Jan 2024 08:12:22 GMT, Roland Westrelin wrote: >> This change implements C2 optimizations for calls to >> ScopedValue.get(). Indeed, in: >> >> >> v1 = scopedValue.get(); >> ... >> v2 = scopedValue.get(); >> >> >> `v2` can be replaced by `v1` and the second call to `get()` can be >> optimized out. That's true whatever is between the 2 calls unless a >> new mapping for `scopedValue` is created in between (when that happens >> no optimization is performed for the method being compiled). Hoisting >> a `get()` call out of a loop for a loop invariant `scopedValue` should >> also be legal in most cases. >> >> `ScopedValue.get()` is implemented in Java code as a 2-step process. A >> cache is attached to the current thread object. If the `ScopedValue` >> object is in the cache then the result from `get()` is read from >> there. Otherwise a slow call is performed that also inserts the >> mapping in the cache. The cache itself is lazily allocated. One >> `ScopedValue` can be hashed to 2 different indexes in the cache. On a >> cache probe, both indexes are checked. As a consequence, the process >> of probing the cache is a multi-step process (check if the cache is >> present, check first index, check second index if first index >> failed). If the cache is populated early on, then when the method that >> calls `ScopedValue.get()` is compiled, profile reports the slow path >> as never taken and only the read from the cache is compiled. 
>> >> To perform the optimizations, I added 3 new node types to C2: >> >> - the pair >> ScopedValueGetHitsInCacheNode/ScopedValueGetLoadFromCacheNode for >> the cache probe >> >> - a cfg node ScopedValueGetResultNode to help locate the result of the >> `get()` call in the IR graph. >> >> In pseudo code, once the nodes are inserted, the code of a `get()` is: >> >> >> hits_in_the_cache = ScopedValueGetHitsInCache(scopedValue) >> if (hits_in_the_cache) { >> res = ScopedValueGetLoadFromCache(hits_in_the_cache); >> } else { >> res = ..; //slow call possibly inlined. Subgraph can be arbitrarily complex >> } >> res = ScopedValueGetResult(res) >> >> >> In the snippet: >> >> >> v1 = scopedValue.get(); >> ... >> v2 = scopedValue.get(); >> >> >> Replacing `v2` by `v1` is then done by starting from the >> `ScopedValueGetResult` node for the second `get()` and looking for a >> dominating `ScopedValueGetResult` for the same `ScopedValue` >> object. When one is found, it is used as a replacement. Eliminating >> the second `get()` call is achieved by making >> `ScopedValueGetHitsInCache` always successful if there's a dominating >> `Scoped... > > Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: > > merge fix Nice work @rwestrel! I'm sending out a first batch of comments, more coming later. src/hotspot/share/gc/shared/c2/barrierSetC2.cpp line 109: > 107: Node* ctl = opt_access.ctl(); > 108: assert(opt_access.mem()->is_MergeMem(), ""); > 109: MergeMemNode* mm = opt_access.mem()->as_MergeMem(); Does `as_MergeMem` not assert `is_MergeMem`? src/hotspot/share/opto/callGenerator.cpp line 817: > 815: > 816: class LateInlineScopedValueCallGenerator : public LateInlineCallGenerator { > 817: Node* _sv; I would prefer a longer name. `_scoped_value`? Is this the pointer to the ScopedValue object? An oop? `_scoped_value_oop`? A fuller name would help me understand the pattern-matching code more easily. 
src/hotspot/share/opto/callGenerator.cpp line 851: > 849: } > 850: > 851: virtual void process_result(GraphKit& kit) { It would be really nice if you refactored this huge method (6+ pages of code) into smaller units. It would for example make separation of pattern-matching and transformation easier to see. src/hotspot/share/opto/callGenerator.cpp line 854: > 852: if (!_process_result) { > 853: return; > 854: } Since it is not set in the constructor, and we seem to need `_sv` here, could we add this? `assert(_sv != nullptr, "must have set scoped value to be pattern matched")` src/hotspot/share/opto/callGenerator.cpp line 877: > 875: wq.push(kit.control()); > 876: for (uint i = 0; i < wq.size(); ++i) { > 877: Node* c = wq.at(i); Could we assert that these are all CFG nodes? And give it a more expressive name? src/hotspot/share/opto/callGenerator.cpp line 888: > 886: } else { > 887: if (c->Opcode() == Op_If) { > 888: Node* bol = c->in(1); Suggestion: BoolNode* bol = c->in(1)->as_Bool(); Then you can drop the assert below src/hotspot/share/opto/callGenerator.cpp line 897: > 895: in1->in(0)->as_CallJava()->method()->intrinsic_id() == vmIntrinsics::_scopedValueCache) { > 896: assert(in2->bottom_type() == TypePtr::NULL_PTR, ""); > 897: assert(get_cache_iff == nullptr, ""); Suggestion: assert(get_cache_iff == nullptr, "should only find one get_cache_if"); src/hotspot/share/opto/callGenerator.cpp line 902: > 900: scoped_value_cache = in1->in(0)->as_Call(); > 901: } else { > 902: assert(scoped_value_cache == in1->in(0), ""); Suggestion: assert(scoped_value_cache == in1->in(0), "should only find one scoped_value_cache"); src/hotspot/share/opto/callGenerator.cpp line 915: > 913: assert(scoped_value_cache == in1->in(0), ""); > 914: } > 915: continue; Code duplication. Could easily be extracted as a helper method. 
src/hotspot/share/opto/callGenerator.cpp line 923: > 921: assert(in2 = _sv, ""); > 922: in = in1; > 923: } Might be less verbose: assert(in1 == _sv || in2 == _sv, "one of the comparison values must be the scoped value (oop?)"); // pick the other: Node* in = (in1 == _sv) ? in2 : in1; Could we have a more expressive name for `in`? src/hotspot/share/opto/callGenerator.cpp line 931: > 929: assert(in->Opcode() == Op_LoadP || in->Opcode() == Op_LoadN, ""); > 930: assert(C->get_alias_index(in->adr_type()) == C->get_alias_index(TypeAryPtr::OOPS), ""); > 931: Node* addp1 = in->in(MemNode::Address); Suggestion: AddPNode* addp1 = in->in(MemNode::Address)->as_AddP(); And drop the assert below. src/hotspot/share/opto/callGenerator.cpp line 940: > 938: } else { > 939: assert(scoped_value_cache == addp1->in(AddPNode::Base)->uncast()->in(0), ""); > 940: } Looks like a reference to: `ProjNode* sv_cache_proj = addp1->in(AddPNode::Base)->uncast()->as_Proj()` Would get rid of half of your assert, and also make the code easier to read because of reuse. And you could further simplify: assert(scoped_value_cache == nullptr || scoped_value_cache == sv_cache_proj->in(0), "only one cache allowed"); scoped_value_cache = sv_cache_proj->in(0); src/hotspot/share/opto/callGenerator.cpp line 949: > 947: int header = arrayOopDesc::base_offset_in_bytes(bt); > 948: assert(const_offset >= header, ""); > 949: const_offset -= header; More comments about the pattern would be nice, you lost me here. src/hotspot/share/opto/callGenerator.cpp line 951: > 949: const_offset -= header; > 950: > 951: Node* index = kit.gvn().intcon(const_offset >> shift); Index for what? Please pick a more descriptive name. src/hotspot/share/opto/callGenerator.cpp line 955: > 953: assert(!addp2->in(AddPNode::Address)->is_AddP() && > 954: addp2->in(AddPNode::Base) == addp1->in(AddPNode::Base), > 955: ""); An explanation in the string of the assert would be nice too. 
src/hotspot/share/opto/callGenerator.cpp line 964: > 962: if (offset2->Opcode() == Op_CastII && offset2->in(0)->is_Proj() && > 963: offset2->in(0)->in(0) == get_cache_iff) { > 964: ShouldNotReachHere(); Why? src/hotspot/share/opto/callGenerator.cpp line 981: > 979: } > 980: } else if (c->is_RangeCheck()) { > 981: // Kill the range checks as they are known to always succeed Wow, that looks a bit scary. How do we know we do not accidentally kill an unrelated RangeCheck? And what is the argument for why they always succeed? src/hotspot/share/opto/callGenerator.cpp line 988: > 986: assert(slow_call == nullptr, ""); > 987: slow_call = c->as_CallStaticJava(); > 988: assert(slow_call->method()->intrinsic_id() == vmIntrinsics::_SVslowGet, ""); I would move the assert to before the assignment, but that is a matter of taste. src/hotspot/share/opto/callGenerator.cpp line 990: > 988: assert(slow_call->method()->intrinsic_id() == vmIntrinsics::_SVslowGet, ""); > 989: } else { > 990: assert(c->is_Proj() || c->is_Catch(), ""); Suggestion: assert(c->is_Proj() || c->is_Catch(), "unexpected node in pattern matching"); src/hotspot/share/opto/callGenerator.cpp line 996: > 994: } > 995: // get_first_iff/get_second_iff contain the first/second check we ran into during the graph traversal but they may > 996: // not be the first/second one in execution order. Perform another traversal to figure out which is first. Why can this not be done in the first traversal, and why does this (down) traversal do the right thing? Can we assert that `c` is always CFG? Please mention in a comment that we are only traversing the same CFG nodes from the first traversal. src/hotspot/share/opto/callGenerator.cpp line 998: > 996: // not be the first/second one in execution order. Perform another traversal to figure out which is first. > 997: if (get_second_iff != nullptr) { > 998: Node_Stack stack(0); No visited set. Can this trigger an exponential explosion with if/region diamonds? 
src/hotspot/share/opto/callGenerator.cpp line 1031: > 1029: CallStaticJavaNode* get_first_iff_unc = get_first_iff_failure->is_uncommon_trap_proj(Deoptimization::Reason_none); > 1030: if (get_first_iff_unc != nullptr) { > 1031: // first cache check never hits, keep only the second. I'm struggling to understand: We still have an unc-trap for the first. So we never failed so far, right? So we always found it in the cache, or am I wrong? We are not removing this unc-trap though, right? src/hotspot/share/opto/callGenerator.cpp line 1050: > 1048: // Now move right above the scopedValueCache() call > 1049: Node* mem = scoped_value_cache->in(TypeFunc::Memory); > 1050: Node* c = scoped_value_cache->in(TypeFunc::Control); Suggestion: Node* ctrl = scoped_value_cache->in(TypeFunc::Control); src/hotspot/share/opto/callGenerator.cpp line 1193: > 1191: // continue: > 1192: // > 1193: // slow_call: Makes it look like continue is a fall-through to slow_call, that is not what you want, right? src/hotspot/share/opto/callGenerator.cpp line 1220: > 1218: // goto continue; > 1219: // > 1220: // the transformed graph includes 2 copies of the cache probing logic. One represented by the Suggestion: // The transformed graph includes 2 copies of the cache probing logic. One represented by the src/hotspot/share/opto/callGenerator.cpp line 1225: > 1223: // that some paths may end with an uncommon trap and if one traps, we want the trap to be recorded for the right bci. > 1224: // When the ScopedValueGetHitsInCache/ScopedValueGetLoadFromCache pair is expanded, split if finds the duplicate > 1225: // logic and cleans it up. I would prefer the comment section at the beginning of the method. Otherwise I may start reading down linearly, reverse-engineer the code, and only discover this afterwards... ------------- Changes requested by epeter (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/16966#pullrequestreview-1826905074 PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1455120226 PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1455185080 PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1455153771 PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1455190286 PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1455257893 PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1455157115 PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1455169129 PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1455170848 PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1455175485 PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1455200497 PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1455206181 PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1455225844 PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1455230819 PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1455245344 PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1455233801 PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1455241722 PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1455251712 PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1455253875 PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1455255529 PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1455274616 PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1455279448 PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1455292221 PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1455296497 PR Review Comment: 
https://git.openjdk.org/jdk/pull/16966#discussion_r1455143110 PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1455147489 PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1455150085 From epeter at openjdk.org Wed Jan 17 11:24:11 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 17 Jan 2024 11:24:11 GMT Subject: RFR: 8320649: C2: Optimize scoped values [v4] In-Reply-To: References: Message-ID: On Wed, 17 Jan 2024 10:13:22 GMT, Emanuel Peter wrote: >> Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: >> >> merge fix > > src/hotspot/share/opto/callGenerator.cpp line 888: > >> 886: } else { >> 887: if (c->Opcode() == Op_If) { >> 888: Node* bol = c->in(1); > > Suggestion: > > BoolNode* bol = c->in(1)->as_Bool(); > > Then you can drop the assert below Or even: `Node* cmp = c->in(1)->as_Bool()->in(1);` > src/hotspot/share/opto/callGenerator.cpp line 902: > >> 900: scoped_value_cache = in1->in(0)->as_Call(); >> 901: } else { >> 902: assert(scoped_value_cache == in1->in(0), ""); > > Suggestion: > > assert(scoped_value_cache == in1->in(0), "should only find one scoped_value_cache"); Even nicer might be to also have an assert, and then just assign: assert(scoped_value_cache == nullptr || scoped_value_cache == in1->in(0), "should only find one scoped_value_cache"); scoped_value_cache = in1->in(0)->as_Call(); > src/hotspot/share/opto/callGenerator.cpp line 923: > >> 921: assert(in2 = _sv, ""); >> 922: in = in1; >> 923: } > > Might be less verbose: > > assert(in1 == _sv || in2 == _sv, "one of the comparison values must be the scoped value (oop?)"); > // pick the other: > Node* in = (in1 == _sv) ? in2 : in1; > > Could we have a more expressive name for `in`? Also a general comment about what kind of matching we are doing here would be nice. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1455161369 PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1455173243 PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1455211335 From thartmann at openjdk.org Wed Jan 17 12:23:58 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 17 Jan 2024 12:23:58 GMT Subject: [jdk22] Integrated: 8316756: C2 EA fails with "missing memory path" when encountering unsafe_arraycopy stub call In-Reply-To: References: Message-ID: <8WE3lWDBm3uv-kad4GUsA4dA_jtxihpAgk6wOLz4uIw=.7da39424-2d78-43da-a87a-ec0ca125d54f@github.com> On Wed, 17 Jan 2024 08:33:53 GMT, Tobias Hartmann wrote: > Hi all, > > This pull request contains a backport of commit [b8917214](https://github.com/openjdk/jdk/commit/b89172149d6a900d11630a95be7278870421b435) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository. > > The commit being backported was authored by Tobias Holenstein on 17 Jan 2024 and was reviewed by Vladimir Kozlov, Tobias Hartmann and Christian Hagedorn. > > Thanks! This pull request has now been integrated. 
Changeset: 78150ca9 Author: Tobias Hartmann URL: https://git.openjdk.org/jdk22/commit/78150ca9df8af70f07e08d593097819dfea389fa Stats: 86 lines in 2 files changed: 85 ins; 0 del; 1 mod 8316756: C2 EA fails with "missing memory path" when encountering unsafe_arraycopy stub call Reviewed-by: tholenstein Backport-of: b89172149d6a900d11630a95be7278870421b435 ------------- PR: https://git.openjdk.org/jdk22/pull/86 From aph-open at littlepinkcloud.com Wed Jan 17 13:19:49 2024 From: aph-open at littlepinkcloud.com (Andrew Haley) Date: Wed, 17 Jan 2024 13:19:49 +0000 Subject: discuss about release barrier for final fields initialization In-Reply-To: <822b0f93-ae9e-4196-8d20-1b87286b91d5.kuaiwei.kw@alibaba-inc.com> References: <5e41f867-b31b-4ded-b737-8d0c869c8895.kuaiwei.kw@alibaba-inc.com> <20965b9a-ec64-4d81-866e-b1a0a94ed1e0@littlepinkcloud.com> <822b0f93-ae9e-4196-8d20-1b87286b91d5.kuaiwei.kw@alibaba-inc.com> Message-ID: <502b509b-bf31-4eee-8468-3f2362d69da8@littlepinkcloud.com> On 1/11/24 11:58, Kuai Wei wrote: > Thanks for reply. I checked the previous discussion and not clear about the root cause. > > If you can provide more detail about the optimize, like what load or load dependency will be elided, so we may check chance to detect or prevent. We think you're probably right. However, C2 does a lot of reorganization, so it's hard to say that C2 can never predict what might be stored by static field initialization in one thread. If you're benchmarking this, can you try dmb st; dmb ld without fusing them together, thus avoiding a storeload? This would help us understand the performance benefit. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. 
https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From chagedorn at openjdk.org Wed Jan 17 14:11:53 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 17 Jan 2024 14:11:53 GMT Subject: RFR: 8291809: Convert compiler/c2/cr7200264/TestSSE2IntVect.java to IR verification test [v2] In-Reply-To: References: Message-ID: On Wed, 17 Jan 2024 10:03:06 GMT, Daniel Lundén wrote: >> This changeset translates the tests in `compiler/c2/cr7200264/` to use the IR verification framework. >> >> The proposed translation, to the extent possible, attempts to preserve the semantics of the original test. A major difference is that the IR checks are now local (for every `test_*` method) instead of global. The execution time of the new test is comparable to the old test. >> >> Testing: >> - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/7553846710) >> - Ran the new translated tests within all tier1 through tier10 contexts on windows-x64, linux-x64, linux-aarch64, macosx-x64, and macosx-aarch64. >> - Tested that manually adding `-XX:LoopUnrollLimit=0` to the test framework flags caused the translated tests to fail. Note: it is, however, no longer possible to break the test by passing `-XX:LoopUnrollLimit=0` on the command line. > > Daniel Lundén has updated the pull request incrementally with one additional commit since the last revision: > > Refactor test to use multiple @Test test/hotspot/jtreg/compiler/c2/cr7200264/TestIntVect.java line 64: > 62: public void run() { > 63: > 64: System.out.println("Testing Integer vectors"); Not sure if it's worth keeping these printing statements. But it does not hurt to leave them in either. 
test/hotspot/jtreg/compiler/c2/cr7200264/TestIntVect.java line 155: > 153: test_addc(a0, a1); > 154: for (int i=0; i 155: errn += verify("test_addc: ", i, a0[i], (int)((int)(ADD_INIT+i)+VALUE)); I suggest to either directly use: Asserts.assertEQ(a0[i], (int)((int)(ADD_INIT+i)+VALUE), "test_addc failed at a0[" + i + "]"); Or change `verify()` such that it uses `Asserts.assertEQ()` (just an example and could also be adjusted): static int verify(String text, int i, int elem, int val) { Asserts.assertEQ(elem, val, text + " failed at a0[" + i + "]"). } test/hotspot/jtreg/compiler/c2/cr7200264/TestIntVect.java line 510: > 508: } > 509: > 510: void test_divc(int[] a0, int[] a1) { What about these tests without IR verification? Are they expected to be not vectorized? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17428#discussion_r1455657566 PR Review Comment: https://git.openjdk.org/jdk/pull/17428#discussion_r1455622323 PR Review Comment: https://git.openjdk.org/jdk/pull/17428#discussion_r1455653160 From redestad at openjdk.org Wed Jan 17 14:31:53 2024 From: redestad at openjdk.org (Claes Redestad) Date: Wed, 17 Jan 2024 14:31:53 GMT Subject: RFR: 8318446: C2: implement StoreNode::Ideal_merge_stores In-Reply-To: References: Message-ID: <5b2J1w0TrFH-Gw5D7pkW7678s_JeJmdM7-PtDOuBrhA=.ec71341b-1edf-4d5c-80c0-06957243561a@github.com> On Wed, 18 Oct 2023 13:04:35 GMT, Emanuel Peter wrote: > This is a feature requiested by @RogerRiggs and @cl4es . > > **Idea** > > Merging multiple consecutive small stores (e.g. 8 byte stores) into larger stores (e.g. one long store) can lead to speedup. Recently, @cl4es and @RogerRiggs had to review a few PR's where people would try to get speedups by using Unsafe (e.g. `Unsafe.putLongUnaligned`), or ByteArrayLittleEndian (e.g. `ByteArrayLittleEndian.setLong`). They have asked if we can do such an optimization in C2, rather than in the Java library code, or even user code. 
> > This patch here supports a few simple use-cases, like these: > > Merge consecutive array stores, with constants. We can combine the separate constants into a larger constant: > https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/test/hotspot/jtreg/compiler/c2/TestMergeStores.java#L383-L395 > > Merge consecutive array stores, with a variable that was split (using shifts). We can essentially undo the splitting (i.e. shifting and truncation), and directly store the variable: > https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/test/hotspot/jtreg/compiler/c2/TestMergeStores.java#L444-L456 > > The idea is that this would allow the introduction of a very simple API, without any "heavy" dependencies (Unsafe or ByteArrayLittleEndian): > > https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/test/hotspot/jtreg/compiler/c2/TestMergeStores.java#L327-L338 > https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/test/hotspot/jtreg/compiler/c2/TestMergeStores.java#L467-L472 > > **Details** > > This draft currently implements the optimization in an additional special IGVN phase: > https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/src/hotspot/share/opto/compile.cpp#L2479-L2485 > > We first collect all `StoreB|C|I`, and put them in the IGVN worklist (see `Compile::gather_nodes_for_merge_stores`). During IGVN, we call `StoreNode::Ideal_merge_stores` at the end (after all other optimizations) of `StoreNode::Ideal`. We essentially try to establish a chain of mergable stores: > > https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/src/hotspot/share/opto/memnode.cpp#L2802-L2806 > > Mergable stores must have the same Opcode (implies they have the same element type and hence size). Further, mergable stores must have the same control (or be separated by only a RangeCheck). 
Further, they must either both store constants, or adjacent segments of a larger value ... While not a formal review I think this looks great! > Is a separate phase ok? It might be good to check that compilation times don't increase excessively on various benchmarks. @robcasloz has done some compilation time analysis recently and might be able to give you some pointers on how to replicate. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16245#issuecomment-1895927485 From epeter at openjdk.org Wed Jan 17 14:44:00 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 17 Jan 2024 14:44:00 GMT Subject: RFR: 8320649: C2: Optimize scoped values [v4] In-Reply-To: References: Message-ID: On Thu, 4 Jan 2024 08:12:22 GMT, Roland Westrelin wrote: >> This change implements C2 optimizations for calls to >> ScopedValue.get(). Indeed, in: >> >> >> v1 = scopedValue.get(); >> ... >> v2 = scopedValue.get(); >> >> >> `v2` can be replaced by `v1` and the second call to `get()` can be >> optimized out. That's true whatever is between the 2 calls unless a >> new mapping for `scopedValue` is created in between (when that happens >> no optimizations is performed for the method being compiled). Hoisting >> a `get()` call out of loop for a loop invariant `scopedValue` should >> also be legal in most cases. >> >> `ScopedValue.get()` is implemented in java code as a 2 step process. A >> cache is attached to the current thread object. If the `ScopedValue` >> object is in the cache then the result from `get()` is read from >> there. Otherwise a slow call is performed that also inserts the >> mapping in the cache. The cache itself is lazily allocated. One >> `ScopedValue` can be hashed to 2 different indexes in the cache. On a >> cache probe, both indexes are checked. As a consequence, the process >> of probing the cache is a multi step process (check if the cache is >> present, check first index, check second index if first index >> failed). 
If the cache is populated early on, then when the method that >> calls `ScopedValue.get()` is compiled, profile reports the slow path >> as never taken and only the read from the cache is compiled. >> >> To perform the optimizations, I added 3 new node types to C2: >> >> - the pair >> ScopedValueGetHitsInCacheNode/ScopedValueGetLoadFromCacheNode for >> the cache probe >> >> - a cfg node ScopedValueGetResultNode to help locate the result of the >> `get()` call in the IR graph. >> >> In pseudo code, once the nodes are inserted, the code of a `get()` is: >> >> >> hits_in_the_cache = ScopedValueGetHitsInCache(scopedValue) >> if (hits_in_the_cache) { >> res = ScopedValueGetLoadFromCache(hits_in_the_cache); >> } else { >> res = ..; //slow call possibly inlined. Subgraph can be arbitray complex >> } >> res = ScopedValueGetResult(res) >> >> >> In the snippet: >> >> >> v1 = scopedValue.get(); >> ... >> v2 = scopedValue.get(); >> >> >> Replacing `v2` by `v1` is then done by starting from the >> `ScopedValueGetResult` node for the second `get()` and looking for a >> dominating `ScopedValueGetResult` for the same `ScopedValue` >> object. When one is found, it is used as a replacement. Eliminating >> the second `get()` call is achieved by making >> `ScopedValueGetHitsInCache` always successful if there's a dominating >> `Scoped... > > Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: > > merge fix Second batch of comments. src/hotspot/share/classfile/vmIntrinsics.hpp line 301: > 299: do_name( slowGet_name, "slowGet") \ > 300: do_intrinsic(_SVCacheInvalidate, java_lang_ScopedValue_Cache, invalidate_name, int_void_signature, F_S) \ > 301: do_name( invalidate_name, "invalidate") \ What prevents us from writing out `_scopedValueGet` etc, just like the other names here? 
src/hotspot/share/opto/callGenerator.cpp line 1051: > 1049: Node* mem = scoped_value_cache->in(TypeFunc::Memory); > 1050: Node* c = scoped_value_cache->in(TypeFunc::Control); > 1051: Node* io = scoped_value_cache->in(TypeFunc::I_O); I would use `input_mem`, `input_ctrl`, `input_io`. Then the replacements below would read more intuitively. src/hotspot/share/opto/callGenerator.cpp line 1058: > 1056: > 1057: // remove the scopedValueCache() call > 1058: CallProjections scoped_value_cache_projs = CallProjections(); Suggestion: CallProjections scoped_value_cache_projs; Is the assignment really necessary, or style-wise preferable? I see you use it without the assignment elsewhere. src/hotspot/share/opto/callGenerator.cpp line 1089: > 1087: second_index == nullptr ? C->top() : second_index); > 1088: > 1089: // It will later be expanded back to all the checks so record profile data Should we also copy the node info (e.g. line number etc)? src/hotspot/share/opto/callGenerator.cpp line 1115: > 1113: } else { > 1114: sv_hits_in_cache->set_profile_data(2, 0, 0); > 1115: } Another case of code duplication. Why not write a method that extracts `cnt, prob` for an `iff`? Or maybe that already exists? src/hotspot/share/opto/callGenerator.cpp line 1120: > 1118: > 1119: // And compute the probability of a miss in the cache > 1120: float prob; Suggestion: float cache_miss_prob; src/hotspot/share/opto/callGenerator.cpp line 1122: > 1120: float prob; > 1121: // get_cache_prob: probability that cache array is not null > 1122: // get_first_prob: probability of a miss `get_first_prob` sounds like the probability of "getting it", so not a miss. Which is it? 
src/hotspot/share/opto/callGenerator.cpp line 1127: > 1125: prob = PROB_UNKNOWN; > 1126: } else { > 1127: prob = (1 - get_cache_prob) + get_cache_prob * (get_first_prob + (1 - get_first_prob) * get_second_prob); Suggestion: cache_miss_prob = (1 - get_cache_prob) + // cache array is null get_cache_prob * ( // cache array not null, and: get_first_prob + // first has cache miss (1 - get_first_prob) * get_second_prob // first hits, but second misses ); src/hotspot/share/opto/callGenerator.cpp line 1139: > 1137: > 1138: // Merge the paths that produce the result (in case there's a slow path) > 1139: Node* r = new RegionNode(3); Suggestion: Node* region_fast_slow = new RegionNode(3); I think we can afford a slightly more expressive name. src/hotspot/share/opto/callGenerator.cpp line 1151: > 1149: phi_cache_value->init_req(1, C->top()); > 1150: phi_mem->init_req(1, C->top()); > 1151: phi_io->init_req(1, C->top()); Why not just put the slow path on input `2`, and make the size of the RegionNode depend on whether there is a `slow_call`? Then you can avoid these top inputs, right? src/hotspot/share/opto/callGenerator.cpp line 1155: > 1153: CallProjections slow_projs; > 1154: slow_call->extract_projections(&slow_projs, false); > 1155: Node* fallthrough = slow_projs.fallthrough_catchproj->clone(); Why does that have to be cloned? src/hotspot/share/opto/callGenerator.cpp line 1166: > 1164: > 1165: // ScopedValueGetLoadFromCache is a single that represents the result of a hit in the cache > 1166: Node* cache_value = kit.gvn().transform(new ScopedValueGetLoadFromCacheNode(C, in_cache, sv_hits_in_cache)); Suggestion: Node* sv_load_from_cache = kit.gvn().transform(new ScopedValueGetLoadFromCacheNode(C, in_cache, sv_hits_in_cache)); For consistency with `sv_hits_in_cache` and the node class name. 
src/hotspot/share/opto/cfgnode.hpp line 737: > 735: ProjNode* result_out() { > 736: return proj_out_or_null(Result); > 737: } Either verify that we have not null, or else rename to `result_out_or_null`. src/hotspot/share/opto/compile.cpp line 465: > 463: remove_useless_late_inlines( &_boxing_late_inlines, useful); > 464: remove_useless_late_inlines(&_vector_reboxing_late_inlines, useful); > 465: remove_useless_late_inlines( &_scoped_value_late_inlines, useful); Suggestion: remove_useless_late_inlines( &_scoped_value_late_inlines, useful); src/hotspot/share/opto/compile.cpp line 2034: > 2032: > 2033: void Compile::inline_scoped_value_calls(PhaseIterGVN& igvn) { > 2034: if (_scoped_value_late_inlines.length() > 0) { Rather than indenting everything, I would just check `_scoped_value_late_inlines.is_empty()` and return. ------------- Changes requested by epeter (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/16966#pullrequestreview-1827249142 PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1455708715 PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1455535612 PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1455507963 PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1455600271 PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1455631389 PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1455639533 PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1455649353 PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1455651376 PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1455669565 PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1455678033 PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1455683828 PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1455692682 PR Review Comment: 
https://git.openjdk.org/jdk/pull/16966#discussion_r1455721814 PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1455724546 PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1455742818 From epeter at openjdk.org Wed Jan 17 14:44:00 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 17 Jan 2024 14:44:00 GMT Subject: RFR: 8320649: C2: Optimize scoped values [v4] In-Reply-To: References: Message-ID: On Wed, 17 Jan 2024 14:05:32 GMT, Emanuel Peter wrote: >> Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: >> >> merge fix > > src/hotspot/share/opto/callGenerator.cpp line 1127: > >> 1125: prob = PROB_UNKNOWN; >> 1126: } else { >> 1127: prob = (1 - get_cache_prob) + get_cache_prob * (get_first_prob + (1 - get_first_prob) * get_second_prob); > > Suggestion: > > cache_miss_prob = (1 - get_cache_prob) + // cache array is null > get_cache_prob * ( // cache array not null, and: > get_first_prob + // first has cache miss > (1 - get_first_prob) * get_second_prob // first hits, but second misses > ); idk if this helps, maybe the alignment would have to be improved too. But maybe just slightly better naming would also do the trick? 
> src/hotspot/share/opto/compile.cpp line 465: > >> 463: remove_useless_late_inlines( &_boxing_late_inlines, useful); >> 464: remove_useless_late_inlines(&_vector_reboxing_late_inlines, useful); >> 465: remove_useless_late_inlines( &_scoped_value_late_inlines, useful); > > Suggestion: > > remove_useless_late_inlines( &_scoped_value_late_inlines, useful); Make it align with the code above ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1455653387 PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1455725257 From roland at openjdk.org Wed Jan 17 14:46:52 2024 From: roland at openjdk.org (Roland Westrelin) Date: Wed, 17 Jan 2024 14:46:52 GMT Subject: RFR: 8318446: C2: implement StoreNode::Ideal_merge_stores In-Reply-To: References: Message-ID: On Wed, 18 Oct 2023 13:04:35 GMT, Emanuel Peter wrote: > This is a feature requiested by @RogerRiggs and @cl4es . > > **Idea** > > Merging multiple consecutive small stores (e.g. 8 byte stores) into larger stores (e.g. one long store) can lead to speedup. Recently, @cl4es and @RogerRiggs had to review a few PR's where people would try to get speedups by using Unsafe (e.g. `Unsafe.putLongUnaligned`), or ByteArrayLittleEndian (e.g. `ByteArrayLittleEndian.setLong`). They have asked if we can do such an optimization in C2, rather than in the Java library code, or even user code. > > This patch here supports a few simple use-cases, like these: > > Merge consecutive array stores, with constants. We can combine the separate constants into a larger constant: > https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/test/hotspot/jtreg/compiler/c2/TestMergeStores.java#L383-L395 > > Merge consecutive array stores, with a variable that was split (using shifts). We can essentially undo the splitting (i.e. 
shifting and truncation), and directly store the variable: > https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/test/hotspot/jtreg/compiler/c2/TestMergeStores.java#L444-L456 > > The idea is that this would allow the introduction of a very simple API, without any "heavy" dependencies (Unsafe or ByteArrayLittleEndian): > > https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/test/hotspot/jtreg/compiler/c2/TestMergeStores.java#L327-L338 > https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/test/hotspot/jtreg/compiler/c2/TestMergeStores.java#L467-L472 > > **Details** > > This draft currently implements the optimization in an additional special IGVN phase: > https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/src/hotspot/share/opto/compile.cpp#L2479-L2485 > > We first collect all `StoreB|C|I`, and put them in the IGVN worklist (see `Compile::gather_nodes_for_merge_stores`). During IGVN, we call `StoreNode::Ideal_merge_stores` at the end (after all other optimizations) of `StoreNode::Ideal`. We essentially try to establish a chain of mergable stores: > > https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/src/hotspot/share/opto/memnode.cpp#L2802-L2806 > > Mergable stores must have the same Opcode (implies they have the same element type and hence size). Further, mergable stores must have the same control (or be separated by only a RangeCheck). Further, they must either both store constants, or adjacent segments of a larger value ... So what happens to the range checks in this transformation? 
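For readers following the thread, the "variable that was split (using shifts)" case from the PR description can be sketched in plain Java. This is not the test code behind the links above: `storeBytewise` and `storeMerged` are hypothetical names, and little-endian byte order is assumed. The sketch only shows that eight byte stores at increasing offsets write the same bytes as a single long store; whether C2 actually merges them depends on the patch and the platform.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

// Illustrative sketch only: hypothetical helper names, little-endian order assumed.
final class MergeStoresSketch {
    // Eight adjacent byte stores at increasing offsets, each taking one byte of v.
    static void storeBytewise(byte[] a, int offset, long v) {
        a[offset + 0] = (byte) (v >> 0);
        a[offset + 1] = (byte) (v >> 8);
        a[offset + 2] = (byte) (v >> 16);
        a[offset + 3] = (byte) (v >> 24);
        a[offset + 4] = (byte) (v >> 32);
        a[offset + 5] = (byte) (v >> 40);
        a[offset + 6] = (byte) (v >> 48);
        a[offset + 7] = (byte) (v >> 56);
    }

    // The same effect expressed as one 8-byte little-endian store.
    static void storeMerged(byte[] a, int offset, long v) {
        ByteBuffer.wrap(a).order(ByteOrder.LITTLE_ENDIAN).putLong(offset, v);
    }

    public static void main(String[] args) {
        byte[] x = new byte[16];
        byte[] y = new byte[16];
        storeBytewise(x, 3, 0x0102030405060708L);
        storeMerged(y, 3, 0x0102030405060708L);
        System.out.println(Arrays.equals(x, y));
    }
}
```

Each array access in `storeBytewise` carries its own bounds check at the Java level, which is why the placement of the range checks relative to the merged store matters.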
------------- PR Comment: https://git.openjdk.org/jdk/pull/16245#issuecomment-1895957362 From epeter at openjdk.org Wed Jan 17 14:50:52 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 17 Jan 2024 14:50:52 GMT Subject: RFR: 8318446: C2: implement StoreNode::Ideal_merge_stores In-Reply-To: References: Message-ID: On Wed, 17 Jan 2024 14:44:26 GMT, Roland Westrelin wrote: > So what happens to the range checks in this transformation? @rwestrel good question: I start at the "latest" store, and look up the memory graph, also bypassing RangeChecks. When I decide to merge the stores, I place them at the place of the "latest" store, so after the RangeChecks. Now you might wonder: what happens if we were to actually fail a RangeCheck? Answer: I only replace the "latest" store. All earlier ones might still survive igvn if they have other uses, such as in a uncommon-trap. But they will probably sink out into the uncommon-trap path, and away from the main path, in which we should hopefully only have the merged store. Does that answer your question? ------------- PR Comment: https://git.openjdk.org/jdk/pull/16245#issuecomment-1895965350 From roland at openjdk.org Wed Jan 17 14:54:56 2024 From: roland at openjdk.org (Roland Westrelin) Date: Wed, 17 Jan 2024 14:54:56 GMT Subject: RFR: 8318446: C2: implement StoreNode::Ideal_merge_stores In-Reply-To: References: Message-ID: On Wed, 17 Jan 2024 14:44:26 GMT, Roland Westrelin wrote: >> This is a feature requiested by @RogerRiggs and @cl4es . >> >> **Idea** >> >> Merging multiple consecutive small stores (e.g. 8 byte stores) into larger stores (e.g. one long store) can lead to speedup. Recently, @cl4es and @RogerRiggs had to review a few PR's where people would try to get speedups by using Unsafe (e.g. `Unsafe.putLongUnaligned`), or ByteArrayLittleEndian (e.g. `ByteArrayLittleEndian.setLong`). They have asked if we can do such an optimization in C2, rather than in the Java library code, or even user code. 
>> >> This patch here supports a few simple use-cases, like these: >> >> Merge consecutive array stores, with constants. We can combine the separate constants into a larger constant: >> https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/test/hotspot/jtreg/compiler/c2/TestMergeStores.java#L383-L395 >> >> Merge consecutive array stores, with a variable that was split (using shifts). We can essentially undo the splitting (i.e. shifting and truncation), and directly store the variable: >> https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/test/hotspot/jtreg/compiler/c2/TestMergeStores.java#L444-L456 >> >> The idea is that this would allow the introduction of a very simple API, without any "heavy" dependencies (Unsafe or ByteArrayLittleEndian): >> >> https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/test/hotspot/jtreg/compiler/c2/TestMergeStores.java#L327-L338 >> https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/test/hotspot/jtreg/compiler/c2/TestMergeStores.java#L467-L472 >> >> **Details** >> >> This draft currently implements the optimization in an additional special IGVN phase: >> https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/src/hotspot/share/opto/compile.cpp#L2479-L2485 >> >> We first collect all `StoreB|C|I`, and put them in the IGVN worklist (see `Compile::gather_nodes_for_merge_stores`). During IGVN, we call `StoreNode::Ideal_merge_stores` at the end (after all other optimizations) of `StoreNode::Ideal`. We essentially try to establish a chain of mergable stores: >> >> https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/src/hotspot/share/opto/memnode.cpp#L2802-L2806 >> >> Mergable stores must have the same Opcode (implies they have the same element type and hence size). Further, mergable stores must have the same control (or be separated by only a RangeCheck). Further, they must either bot... 
> > So what happens to the range checks in this transformation? > @rwestrel good question: I start at the "latest" store, and look up the memory graph, also bypassing RangeChecks. When I decide to merge the stores, I place them at the place of the "latest" store, so after the RangeChecks. Now you might wonder: what happens if we were to actually fail a RangeCheck? Answer: I only replace the "latest" store. All earlier ones might still survive igvn if they have other uses, such as in a uncommon-trap. But they will probably sink out into the uncommon-trap path, and away from the main path, in which we should hopefully only have the merged store. Wouldn't the first store in a chain of 3 stores have a use at the 2nd and 3rd range checks and so wouldn't sink? ------------- PR Comment: https://git.openjdk.org/jdk/pull/16245#issuecomment-1895973468 From dlunden at openjdk.org Wed Jan 17 15:02:53 2024 From: dlunden at openjdk.org (Daniel Lundén) Date: Wed, 17 Jan 2024 15:02:53 GMT Subject: RFR: 8291809: Convert compiler/c2/cr7200264/TestSSE2IntVect.java to IR verification test [v2] In-Reply-To: References: Message-ID: On Wed, 17 Jan 2024 14:06:13 GMT, Christian Hagedorn wrote: >> Daniel Lundén has updated the pull request incrementally with one additional commit since the last revision: >> >> Refactor test to use multiple @Test > > test/hotspot/jtreg/compiler/c2/cr7200264/TestIntVect.java line 510: > >> 508: } >> 509: >> 510: void test_divc(int[] a0, int[] a1) { > > What about these tests without IR verification? Are they expected to be not vectorized? That's a good question. I based the current `@IR` verifications on the vector nodes checked in the original test, and the `test_*` functions that do not currently have `@IR` annotations are those that do not result in any of the vector operations part of the original test. I additionally checked a few of these functions in IGV, and none of them resulted in any vector operations.
I guess we can remove them altogether, unless they are actually supposed to generate vector operations. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17428#discussion_r1455790054 From dlunden at openjdk.org Wed Jan 17 15:10:55 2024 From: dlunden at openjdk.org (Daniel Lundén) Date: Wed, 17 Jan 2024 15:10:55 GMT Subject: RFR: 8291809: Convert compiler/c2/cr7200264/TestSSE2IntVect.java to IR verification test [v2] In-Reply-To: References: Message-ID: <1ys6VKnk5FRHBGmMwK10_yUNjZuqcvIO7Tk6iU8hboQ=.5d0f54e9-8263-4636-aba6-b528482a4c5a@github.com> On Wed, 17 Jan 2024 14:08:03 GMT, Christian Hagedorn wrote: >> Daniel Lundén has updated the pull request incrementally with one additional commit since the last revision: >> >> Refactor test to use multiple @Test > > test/hotspot/jtreg/compiler/c2/cr7200264/TestIntVect.java line 64: > >> 62: public void run() { >> 63: >> 64: System.out.println("Testing Integer vectors"); > > Not sure if it's worth to keep these printing statements. But does not hurt either to leave them in. Same comment here as above, is updating the testing code itself within the scope of this issue? > test/hotspot/jtreg/compiler/c2/cr7200264/TestIntVect.java line 155: > >> 153: test_addc(a0, a1); >> 154: for (int i=0; i> 155: errn += verify("test_addc: ", i, a0[i], (int)((int)(ADD_INIT+i)+VALUE)); > > I suggest to either directly use: > > Asserts.assertEQ(a0[i], (int)((int)(ADD_INIT+i)+VALUE), "test_addc failed at a0[" + i + "]"); > > Or change `verify()` such that it uses `Asserts.assertEQ()` (just an example and could also be adjusted): > > static int verify(String text, int i, int elem, int val) { > Asserts.assertEQ(elem, val, text + " failed at a0[" + i + "]"). > } When translating the test, I focused only on translating the old ad-hoc IR tests to the IR verification framework. Do we want to extend the scope of this issue to also update the testing code?
Mainly, the reason I'm asking is that copies of `TestIntVect.java` appears in many places (e.g., `test/compiler/6340864/TestIntVect.java`), and the testing pattern in particular appears in many places (grep for, e.g., `errn += verify(`). If we update the pattern here, we should probably also update it everywhere else? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17428#discussion_r1455809950 PR Review Comment: https://git.openjdk.org/jdk/pull/17428#discussion_r1455806619 From epeter at openjdk.org Wed Jan 17 15:12:54 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 17 Jan 2024 15:12:54 GMT Subject: RFR: 8318446: C2: implement StoreNode::Ideal_merge_stores In-Reply-To: References: Message-ID: <10wcc-pgfz9ddsLYg1wkqG7EdXiXDd1vZIdZqwhBkns=.d032c755-eb79-4a76-9175-3e847d5bb1f7@github.com> On Wed, 17 Jan 2024 14:52:15 GMT, Roland Westrelin wrote: > Wouldn't the first store in a chain of 3 stores have a use at the 2nd and 3rd range checks and so wouldn't sink? Sure, if there are actually more than 2 RangeChecks. But if the stores are all adjacent, then I would hope that the RangeChecks get smeared, and so we only have 2 left, right? The first before the stores, and the second before the second store. And so only the first store should have other uses. Do you agree? ------------- PR Comment: https://git.openjdk.org/jdk/pull/16245#issuecomment-1896011657 From roland at openjdk.org Wed Jan 17 15:17:53 2024 From: roland at openjdk.org (Roland Westrelin) Date: Wed, 17 Jan 2024 15:17:53 GMT Subject: RFR: 8318446: C2: implement StoreNode::Ideal_merge_stores In-Reply-To: <10wcc-pgfz9ddsLYg1wkqG7EdXiXDd1vZIdZqwhBkns=.d032c755-eb79-4a76-9175-3e847d5bb1f7@github.com> References: <10wcc-pgfz9ddsLYg1wkqG7EdXiXDd1vZIdZqwhBkns=.d032c755-eb79-4a76-9175-3e847d5bb1f7@github.com> Message-ID: On Wed, 17 Jan 2024 15:10:33 GMT, Emanuel Peter wrote: > Sure, if there are actually more than 2 RangeChecks. 
But if the stores are all adjacent, then I would hope that the RangeChecks get smeared, and so we only have 2 left, right? The first before the stores, and the second before the second store. And so only the first store should have other uses. Do you agree? Ok, but that assumes the sequence of offsets is increasing. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16245#issuecomment-1896021311 From epeter at openjdk.org Wed Jan 17 15:46:04 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 17 Jan 2024 15:46:04 GMT Subject: RFR: 8320649: C2: Optimize scoped values [v4] In-Reply-To: References: Message-ID: On Thu, 4 Jan 2024 08:12:22 GMT, Roland Westrelin wrote: >> This change implements C2 optimizations for calls to >> ScopedValue.get(). Indeed, in: >> >> >> v1 = scopedValue.get(); >> ... >> v2 = scopedValue.get(); >> >> >> `v2` can be replaced by `v1` and the second call to `get()` can be >> optimized out. That's true whatever is between the 2 calls unless a >> new mapping for `scopedValue` is created in between (when that happens >> no optimizations is performed for the method being compiled). Hoisting >> a `get()` call out of loop for a loop invariant `scopedValue` should >> also be legal in most cases. >> >> `ScopedValue.get()` is implemented in java code as a 2 step process. A >> cache is attached to the current thread object. If the `ScopedValue` >> object is in the cache then the result from `get()` is read from >> there. Otherwise a slow call is performed that also inserts the >> mapping in the cache. The cache itself is lazily allocated. One >> `ScopedValue` can be hashed to 2 different indexes in the cache. On a >> cache probe, both indexes are checked. As a consequence, the process >> of probing the cache is a multi step process (check if the cache is >> present, check first index, check second index if first index >> failed). 
If the cache is populated early on, then when the method that >> calls `ScopedValue.get()` is compiled, profile reports the slow path >> as never taken and only the read from the cache is compiled. >> >> To perform the optimizations, I added 3 new node types to C2: >> >> - the pair >> ScopedValueGetHitsInCacheNode/ScopedValueGetLoadFromCacheNode for >> the cache probe >> >> - a cfg node ScopedValueGetResultNode to help locate the result of the >> `get()` call in the IR graph. >> >> In pseudo code, once the nodes are inserted, the code of a `get()` is: >> >> >> hits_in_the_cache = ScopedValueGetHitsInCache(scopedValue) >> if (hits_in_the_cache) { >> res = ScopedValueGetLoadFromCache(hits_in_the_cache); >> } else { >> res = ..; //slow call possibly inlined. Subgraph can be arbitray complex >> } >> res = ScopedValueGetResult(res) >> >> >> In the snippet: >> >> >> v1 = scopedValue.get(); >> ... >> v2 = scopedValue.get(); >> >> >> Replacing `v2` by `v1` is then done by starting from the >> `ScopedValueGetResult` node for the second `get()` and looking for a >> dominating `ScopedValueGetResult` for the same `ScopedValue` >> object. When one is found, it is used as a replacement. Eliminating >> the second `get()` call is achieved by making >> `ScopedValueGetHitsInCache` always successful if there's a dominating >> `Scoped... > > Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: > > merge fix Third comment batch. src/hotspot/share/opto/compile.cpp line 2053: > 2051: C->set_has_scoped_value_get_nodes(true); > 2052: CallNode* call = cg->call_node(); > 2053: CallProjections projs; Suggestion: CallProjections call_projs; src/hotspot/share/opto/compile.cpp line 2057: > 2055: Node* sv = call->in(TypeFunc::Parms); > 2056: Node* control_out = projs.fallthrough_catchproj; > 2057: Node* res = projs.resproj; can we have longer and more descriptive names for `sv` and `res` please ? ? 
src/hotspot/share/opto/compile.cpp line 2066: > 2064: res = res->clone(); > 2065: gvn->set_type_bottom(res); > 2066: gvn->record_for_igvn(res); Why do you clone these? (maybe add a comment) src/hotspot/share/opto/compile.cpp line 3936: > 3934: case Op_ScopedValueGetHitsInCache: > 3935: case Op_ScopedValueGetLoadFromCache: { > 3936: ShouldNotReachHere(); Why? Add a comment! src/hotspot/share/opto/intrinsicnode.cpp line 376: > 374: Node* hits_in_cache = in(1); > 375: assert(hits_in_cache->Opcode() == Op_ScopedValueGetHitsInCache, ""); > 376: return ((ScopedValueGetHitsInCacheNode*)hits_in_cache)->scoped_value(); Why not add the necessary bits to the class so you can use `as_ScopedValueGetLoadFromCache()`? src/hotspot/share/opto/intrinsicnode.cpp line 388: > 386: assert(in(0)->in(0)->in(1)->is_Bool(), ""); > 387: assert(in(0)->in(0)->in(1)->in(1)->Opcode() == Op_ScopedValueGetHitsInCache, ""); > 388: assert(in(0)->in(0)->in(1)->in(1) == in(1), ""); Why not use your beautiful enum for addressing the inputs? src/hotspot/share/opto/loopPredicate.cpp line 1561: > 1559: > 1560: bool PhaseIdealLoop::is_uncommon_trap_if_pattern(IfProjNode* proj) { > 1561: if (proj->is_uncommon_trap_if_pattern()) { Is having two methods with an identical name but subtly different semantics not quite confusing, and maybe going to lead to some subtle bugs later on? src/hotspot/share/opto/multnode.cpp line 240: > 238: } > 239: > 240: bool ProjNode::is_multi_uncommon_trap_proj() { Add a comment, define when this is true/false. Why is it correct to return false if the `path_limit` is reached, etc. src/hotspot/share/opto/multnode.cpp line 265: > 263: } > 264: } > 265: } else if (n->Opcode() != Op_Halt) { So a path without a call and only a Halt node is also an uncommon trap?
src/hotspot/share/opto/multnode.hpp line 104: > 102: CallStaticJavaNode* is_uncommon_trap_if_pattern(Deoptimization::DeoptReason reason = Deoptimization::Reason_none) const; > 103: // Return true if this projection doesn't end with an uncommon trap but, even though several cfg paths are branching out > 104: // from here, they all end with an uncommon trap This comment is a bit confusing. `Return true if this projection doesn't end with an uncommon trap` Sounds like if you find no uncommon trap you return true. Suggestion: Check if all cfg paths lead to some (possibly multiple different) uncommon trap or Halt node. Traverse Region, If, IfProj nodes. Control question: a Halt node is also an uncommon trap in your definition then? src/hotspot/share/opto/node.cpp line 988: > 986: return res; > 987: } > 988: Code duplication warning ? Not sure what is the best solution though. src/hotspot/share/opto/type.cpp line 617: > 615: TypeInstKlassPtr::OBJECT_OR_NULL = TypeInstKlassPtr::make(TypePtr::BotPTR, current->env()->Object_klass(), 0); > 616: > 617: const Type **fgetfromcache =(const Type**)shared_type_arena->AmallocWords(3*sizeof(Type*)); Suggestion: const Type** fgetfromcache =(const Type**)shared_type_arena->AmallocWords(3*sizeof(Type*)); src/hotspot/share/opto/type.cpp line 622: > 620: fgetfromcache[2] = TypeAryPtr::OOPS; > 621: TypeTuple::make(3, fgetfromcache); > 622: const Type **fsvgetresult =(const Type**)shared_type_arena->AmallocWords(2*sizeof(Type*)); Suggestion: const Type** fsvgetresult =(const Type**)shared_type_arena->AmallocWords(2*sizeof(Type*)); src/hotspot/share/opto/type.hpp line 749: > 747: static const TypeTuple *INT_CC_PAIR; > 748: static const TypeTuple *LONG_CC_PAIR; > 749: static const TypeTuple *SV_GET_RESULT; Suggestion: static const TypeTuple* SV_GET_RESULT; ------------- Changes requested by epeter (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/16966#pullrequestreview-1827491007 PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1455776137 PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1455783062 PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1455785694 PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1455794637 PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1455808790 PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1455822604 PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1455841256 PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1455854635 PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1455869419 PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1455867815 PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1455879865 PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1455889614 PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1455890036 PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1455891259 From epeter at openjdk.org Wed Jan 17 15:46:56 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 17 Jan 2024 15:46:56 GMT Subject: RFR: 8318446: C2: implement StoreNode::Ideal_merge_stores In-Reply-To: References: <10wcc-pgfz9ddsLYg1wkqG7EdXiXDd1vZIdZqwhBkns=.d032c755-eb79-4a76-9175-3e847d5bb1f7@github.com> Message-ID: On Wed, 17 Jan 2024 15:15:18 GMT, Roland Westrelin wrote: > Ok, but that assumes the sequence of offsets is increasing. And I do only merge them if they are increasing. It is a limitation, but not a terrible one I'd say. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/16245#issuecomment-1896081297 From kxu at openjdk.org Wed Jan 17 18:41:52 2024 From: kxu at openjdk.org (Kangcheng Xu) Date: Wed, 17 Jan 2024 18:41:52 GMT Subject: RFR: 8320237 C2: late inlining of method handle invoke causes duplicate lines in PrintInlining output [v2] In-Reply-To: References: Message-ID: <6oQW4SWxx34egyxx9qd3EU8WUkYbSPh28lPdJ6x0c_A=.c8a051ed-6848-4e9f-b450-5232b5ba4742@github.com> On Tue, 16 Jan 2024 16:21:18 GMT, Tobias Hartmann wrote: >> Kangcheng Xu has updated the pull request incrementally with one additional commit since the last revision: >> >> fix VM crashes > > Thanks for the explanation. `compiler/inlining/TestDuplicatedLateInliningOutput.java` still failed once with `-XX:+UseZGC -XX:+ZGenerational`: > > java.lang.Exception: No inlining found > at compiler.inlining.TestDuplicatedLateInliningOutput.lambda$test$1(TestDuplicatedLateInliningOutput.java:77) > at java.base/java.util.OptionalInt.orElseThrow(OptionalInt.java:273) > at compiler.inlining.TestDuplicatedLateInliningOutput.test(TestDuplicatedLateInliningOutput.java:77) > at compiler.inlining.TestDuplicatedLateInliningOutput.main(TestDuplicatedLateInliningOutput.java:46) > at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103) > at java.base/java.lang.reflect.Method.invoke(Method.java:580) > at com.sun.javatest.regtest.agent.MainActionHelper$AgentVMRunnable.run(MainActionHelper.java:333) > at java.base/java.lang.Thread.run(Thread.java:1575) Hi @TobiHartmann, Thank you for testing, but I've been unsuccessful in reproducing this error so far. Could you please elaborate on which config and the platform you were testing against? Also, am I correct to assume you were passing zgc options to the test harness (e.g., something like `make ... JTREG="VM_OPTIONS=-XX:+UseZGC -XX:+ZGenerational"`), not the actual subprocesses outputing those inlining log? 
(I did test running subprocesses with zgc options with no failures.) Either way, please advise how you were specifying zgc options. Sorry if I'm missing something obvious here. I'm new to JDK development. :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/17147#issuecomment-1896424463 From duke at openjdk.org Wed Jan 17 19:30:51 2024 From: duke at openjdk.org (Joshua Cao) Date: Wed, 17 Jan 2024 19:30:51 GMT Subject: RFR: 8323220: Reassociate loop invariants involved in Cmps and Add/Subs In-Reply-To: <-5JZDSqyvX6C2dOKIogkE4BKSD594q1RGX3POS4HnTQ=.4b4d01ed-de2d-4ea8-abc3-32e4ee53d5f2@github.com> References: <-5JZDSqyvX6C2dOKIogkE4BKSD594q1RGX3POS4HnTQ=.4b4d01ed-de2d-4ea8-abc3-32e4ee53d5f2@github.com> Message-ID: On Tue, 16 Jan 2024 17:51:58 GMT, Emanuel Peter wrote: >> Yeah, it would work for this patch. But people working on future unrelated changes may have to change the line number. Seems more pain than it's worth. > > Another suboptimal idea: you wrap the add / sub in a method, and then ensure that this method is inlined. It might still keep the annotation of being part of that inner method, and you could use regex to check for it. > > Or maybe we could also have some sort of relative line offset mechanism in the IR framework, that allows you to specify that you want something that is let's say 7 lines down from the IR rule. Some sort of pattern matcher could work. It would be nice to be able to match something like `a ADD_I b CMP_LT c`. In Java this could look something like @IR(counts = {IRNode.CMP_LT[IRNode.ANY, IRNode.SUB_I, IRNode.ANY], "1"}) The arguments in the `[]` are the inputs. `IRNode.ANY` matches any node. (The zeroth node is ANY because it's the region node). Anyway, I think a `lt` test is not super-required for the coverage for this PR. The current machinery does not provide a convenient way to test it. I'd prefer to avoid something hacky. I think this work can be done separately.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17375#discussion_r1456368613 From duke at openjdk.org Wed Jan 17 19:42:23 2024 From: duke at openjdk.org (Joshua Cao) Date: Wed, 17 Jan 2024 19:42:23 GMT Subject: RFR: 8323220: Reassociate loop invariants involved in Cmps and Add/Subs [v2] In-Reply-To: References: Message-ID: > // inv1 == (x + inv2) => ( inv1 - inv2 ) == x > // inv1 == (x - inv2) => ( inv1 + inv2 ) == x > // inv1 == (inv2 - x) => (-inv1 + inv2 ) == x > > > For example, > > > fn(inv1, inv2) > while(...) > x = foobar() > if inv1 == x + inv2 > blackhole() > > > We can transform this into > > > fn(inv1, inv2) > t = inv1 - inv2 > while(...) > x = foobar() > if t == x > blackhole() > > > Here is an example: https://github.com/openjdk/jdk/blob/b78896b9aafcb15f453eaed6e154a5461581407b/src/java.base/share/classes/java/lang/invoke/LambdaFormEditor.java#L910. LHS `1` and RHS `pos` are both loop invariant > > Passes tier1 locally on Linux machine. Passes GHA on my fork. 
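The first rewrite rule in the description can be checked at the source level with a small Java sketch. It is illustrative only: the class and method names are hypothetical, `xs` stands in for the values produced by `foobar()`, and the hoisting is done by hand here rather than by C2. For `==` on `int`, the two forms agree even when the addition or subtraction wraps around.

```java
// Illustrative sketch only: hypothetical names, hand-hoisted invariant.
final class ReassociateCmpSketch {
    // Original shape: the loop body computes x + inv2 on every iteration.
    static int countOriginal(int inv1, int inv2, int[] xs) {
        int hits = 0;
        for (int x : xs) {
            if (inv1 == x + inv2) {
                hits++;
            }
        }
        return hits;
    }

    // Reassociated shape: inv1 - inv2 is loop invariant and computed once.
    static int countReassociated(int inv1, int inv2, int[] xs) {
        int t = inv1 - inv2;
        int hits = 0;
        for (int x : xs) {
            if (t == x) {
                hits++;
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        int[] xs = {Integer.MAX_VALUE, 0, 5, -7, Integer.MIN_VALUE};
        // Integer.MIN_VALUE and 1 are chosen so that the arithmetic overflows.
        System.out.println(countOriginal(Integer.MIN_VALUE, 1, xs));
        System.out.println(countReassociated(Integer.MIN_VALUE, 1, xs));
    }
}
```

Both calls in `main` count the same elements, because `int` addition and subtraction are exact modulo 2^32 and equality is preserved under that arithmetic.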
Joshua Cao has updated the pull request incrementally with one additional commit since the last revision: Formatting and comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17375/files - new: https://git.openjdk.org/jdk/pull/17375/files/adcb6432..dda874eb Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17375&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17375&range=00-01 Stats: 8 lines in 1 file changed: 3 ins; 0 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/17375.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17375/head:pull/17375 PR: https://git.openjdk.org/jdk/pull/17375 From duke at openjdk.org Wed Jan 17 19:42:23 2024 From: duke at openjdk.org (Joshua Cao) Date: Wed, 17 Jan 2024 19:42:23 GMT Subject: RFR: 8323220: Reassociate loop invariants involved in Cmps and Add/Subs [v2] In-Reply-To: References: Message-ID: On Tue, 16 Jan 2024 17:40:08 GMT, Emanuel Peter wrote: >> Joshua Cao has updated the pull request incrementally with one additional commit since the last revision: >> >> Formatting and comments > > src/hotspot/share/opto/loopTransform.cpp line 345: > >> 343: bool neg_inv1 = (n1->is_Sub() && !n1->is_Cmp() && inv1_idx == 2) || >> 344: (n1->is_Cmp() && inv2_idx == 1 && n2->is_Sub()); >> 345: if (n1->is_Sub() && !n1->is_Cmp() && inv1_idx == 1) { > > Would you mind adding some comments for this logic? I added a little comment block. Not sure how useful it is. I agree the code is hard to follow. Before all the changes for `Cmp`, I was able to make sense of it by following the comments at the top of the function. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17375#discussion_r1456381211 From kvn at openjdk.org Wed Jan 17 20:20:05 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 17 Jan 2024 20:20:05 GMT Subject: RFR: 8322743: assert(held_monitor_count() == jni_monitor_count()) failed [v2] In-Reply-To: References: Message-ID: > Corner case with a local (not escaped) object used for synchronization. C2 Escape Analysis thinks that it can eliminate locks for it. In most cases it is true but not in this case. > > > for (int i = 0; i < 2; ++i) { > Object o = new Object(); > synchronized (o) { // monitorenter > // Trigger OSR compilation > for (int j = 0; j < 100_000; ++j) { > > The test has nested loop which trigger OSR compilation. The locked object comes from Interpreter into compiled OSR code. During parsing C2 creates an other non escaped object and correctly merge both together (with Phi node) so that non escaped object is not scalar replaceable. Because it does not globally escapes EA still removes locks for it and, as result, also for merged locked object from Interpreter which is the bug. > > The fix is to check that synchronized block does not have any associated escaped objects when EA decides if locks can be eliminated. > > Added regression test prepared by @TobiHartmann. Tested tier1-5, xcomp and stress. > Performance testing show no difference. 
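For orientation, the shape of code described above can be written as a small standalone Java program. This is a hedged sketch, not the regression test attached to the PR: the class name, the method name and the loop body (a simple counter) are assumptions added to complete the truncated snippet, and running it does not by itself guarantee that an OSR compilation or the original assert is triggered.

```java
// Illustrative sketch only: not the PR's regression test; loop body is an assumption.
final class LockedLocalObjectSketch {
    static int run() {
        int count = 0;
        for (int i = 0; i < 2; ++i) {
            Object o = new Object();      // local object that never escapes
            synchronized (o) {            // monitorenter on the local object
                // Long-running inner loop, the kind that may trigger OSR compilation.
                for (int j = 0; j < 100_000; ++j) {
                    count++;
                }
            }                             // monitorexit
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

The point of interest is that the monitor is entered in the interpreter before the inner loop gets hot, so the compiled OSR code receives an already locked object.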
Vladimir Kozlov has updated the pull request incrementally with one additional commit since the last revision: Address review comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17331/files - new: https://git.openjdk.org/jdk/pull/17331/files/f6031e2f..e5248c28 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17331&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17331&range=00-01 Stats: 3 lines in 1 file changed: 1 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/17331.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17331/head:pull/17331 PR: https://git.openjdk.org/jdk/pull/17331 From kvn at openjdk.org Wed Jan 17 20:20:05 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 17 Jan 2024 20:20:05 GMT Subject: RFR: 8322743: assert(held_monitor_count() == jni_monitor_count()) failed In-Reply-To: References: Message-ID: On Tue, 9 Jan 2024 22:57:27 GMT, Vladimir Kozlov wrote: > Corner case with a local (not escaped) object used for synchronization. C2 Escape Analysis thinks that it can eliminate locks for it. In most cases it is true but not in this case. > > > for (int i = 0; i < 2; ++i) { > Object o = new Object(); > synchronized (o) { // monitorenter > // Trigger OSR compilation > for (int j = 0; j < 100_000; ++j) { > > The test has nested loop which trigger OSR compilation. The locked object comes from Interpreter into compiled OSR code. During parsing C2 creates an other non escaped object and correctly merge both together (with Phi node) so that non escaped object is not scalar replaceable. Because it does not globally escapes EA still removes locks for it and, as result, also for merged locked object from Interpreter which is the bug. > > The fix is to check that synchronized block does not have any associated escaped objects when EA decides if locks can be eliminated. > > Added regression test prepared by @TobiHartmann. Tested tier1-5, xcomp and stress. > Performance testing show no difference. 
Thank you, @TobiHartmann, for review. I addressed your comments. ------------- PR Comment: https://git.openjdk.org/jdk/pull/17331#issuecomment-1896623699 From cslucas at openjdk.org Wed Jan 17 22:28:09 2024 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Wed, 17 Jan 2024 22:28:09 GMT Subject: RFR: JDK-8322572: AllocationMergesTests.java fails with "IRViolationException: There were one or multiple IR rule failures." Message-ID: Please review this PR to fix a test in `AllocationMergesTests.java`. The test was failing intermittently because a random value used as a parameter was causing some randomization in the shape of the method's IR graph, and consequently the optimization under test wasn't happening. Tested this locally on Mac, Win and Linux x86_64. ------------- Commit messages: - Fix TestLoadAfterLoopAlias in AllocationMergesTests Changes: https://git.openjdk.org/jdk/pull/17469/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=17469&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8322572 Stats: 10 lines in 1 file changed: 0 ins; 1 del; 9 mod Patch: https://git.openjdk.org/jdk/pull/17469.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17469/head:pull/17469 PR: https://git.openjdk.org/jdk/pull/17469 From kvn at openjdk.org Wed Jan 17 22:53:50 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 17 Jan 2024 22:53:50 GMT Subject: RFR: JDK-8322572: AllocationMergesTests.java fails with "IRViolationException: There were one or multiple IR rule failures." In-Reply-To: References: Message-ID: On Wed, 17 Jan 2024 21:41:49 GMT, Cesar Soares Lucas wrote: > Please review this PR to fix a test in `AllocationMergesTests.java`. The test was failing intermittently because a random value used as a parameter was causing some randomization in the shape of the method's IR graph, and consequently the optimization under test wasn't happening. > > Tested this locally on Mac, Win and Linux x86_64. Add second (2024) year to Copyright line.
test/hotspot/jtreg/compiler/c2/irTests/scalarReplacement/AllocationMergesTests.java line 124: > 122: Asserts.assertEQ(testMergedAccessAfterCallNoWrite_Interp(cond1, x, y), testMergedAccessAfterCallNoWrite_C2(cond1, x, y)); > 123: Asserts.assertEQ(testCmpMergeWithNull_Second_Interp(cond1, x, y), testCmpMergeWithNull_Second_C2(cond1, x, y)); > 124: Asserts.assertEQ(testObjectIdentity_Interp(cond1, 42, y), testObjectIdentity_C2(cond1, 42, y)); Wrong spacing. ------------- PR Review: https://git.openjdk.org/jdk/pull/17469#pullrequestreview-1828346047 PR Review Comment: https://git.openjdk.org/jdk/pull/17469#discussion_r1456582204 From vlivanov at openjdk.org Thu Jan 18 00:34:22 2024 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Thu, 18 Jan 2024 00:34:22 GMT Subject: RFR: 8322743: assert(held_monitor_count() == jni_monitor_count()) failed In-Reply-To: References: Message-ID: <6NibmJkI_k04YJFWALBLGNLlSILRVm1sJtfQx16SF78=.ef57cce8-02ce-435e-a9f5-cf0eb3a9bdc5@github.com> On Wed, 17 Jan 2024 20:16:49 GMT, Vladimir Kozlov wrote: >> Corner case with a local (not escaped) object used for synchronization. C2 Escape Analysis thinks that it can eliminate locks for it. In most cases it is true but not in this case. >> >> >> for (int i = 0; i < 2; ++i) { >> Object o = new Object(); >> synchronized (o) { // monitorenter >> // Trigger OSR compilation >> for (int j = 0; j < 100_000; ++j) { >> >> The test has nested loop which trigger OSR compilation. The locked object comes from Interpreter into compiled OSR code. During parsing C2 creates an other non escaped object and correctly merge both together (with Phi node) so that non escaped object is not scalar replaceable. Because it does not globally escapes EA still removes locks for it and, as result, also for merged locked object from Interpreter which is the bug. >> >> The fix is to check that synchronized block does not have any associated escaped objects when EA decides if locks can be eliminated. 
>> >> Added regression test prepared by @TobiHartmann. Tested tier1-5, xcomp and stress. >> Performance testing show no difference. > > Thank you, @TobiHartmann, for review. I addressed your comments. @vnkozlov sorry, I still have a hard time reasoning about the correctness of the proposed fix. It's not clear to me what "synchronized block does not have any associated escaped objects" means in practice and how it relates to the original problem. When does the situation with a single `BoxLock` shared between multiple `AbstractLock`s but distinct `obj_node()` inputs occur? Does it only happen for matched `Lock`/`Unlock` node pairs? ------------- PR Comment: https://git.openjdk.org/jdk/pull/17331#issuecomment-1897554128 From john.r.rose at oracle.com Thu Jan 18 00:34:46 2024 From: john.r.rose at oracle.com (John Rose) Date: Wed, 17 Jan 2024 16:34:46 -0800 Subject: RFR: 8293170: Improve encoding of the debuginfo nmethod section [v18] In-Reply-To: References: Message-ID: <8A50E600-BE1D-4F77-ADE4-0EFE9A2A88E2@oracle.com> Thanks for your patience, and for the useful conversations at JVMLS. I've been thinking for a while, "what is the right way to suppress zeroes in Hotspot's compressed metadata?" Also, "what are good ways to encourage zeroes, if you know you can get rid of them in the end?" Over the break I implemented a zero-suppression scheme that integrates well with UNSIGNED5, and hunted around for use cases. I ended up with this, FTR. I'm not proposing it seriously yet, but I think it has some benefits. https://github.com/openjdk/jdk/pull/17474 On 22 Dec 2022, at 12:24, John R Rose wrote: > On Thu, 15 Dec 2022 13:51:45 GMT, Boris Ulasevich wrote: > >>> The nmethod "scopes data" section is 10% of the size of nmethod. Now the data is compressed using the Pack200 algorithm, which is good for encoding small integers (LineNumberTable, etc). Using the fact that half of the data in the partition contains zeros, I reduce its size by another 30%.
>>> >>> Testing: jtreg hotspot&jdk, Renaissance benchmarks >> >> Boris Ulasevich has updated the pull request incrementally with one additional commit since the last revision: >> >> cleanup, rename and some testing > > P.S. One reason I know about the Capn Proto packing is as a candidate for fast streaming (de)compression of heap snapshots. We don't have that feature today, but may in the future for CDS and/or Leyden, and all of my arguments about using off-the-shelf techniques will apply there as well. > > ------------- > > PR: https://git.openjdk.org/jdk/pull/10025 From dcubed at openjdk.org Thu Jan 18 01:15:21 2024 From: dcubed at openjdk.org (Daniel D. Daugherty) Date: Thu, 18 Jan 2024 01:15:21 GMT Subject: RFR: 8324074: increase timeout for jvmci test TestResolvedJavaMethod.java Message-ID: A trivial fix to increase the default timeout for compiler/jvmci/jdk.vm.ci.runtime.test/src/jdk/vm/ci/runtime/test/TestResolvedJavaMethod.java to 240 seconds. ------------- Commit messages: - remove files.list.TestResolvedJavaMethod_timeout. - TestResolvedJavaMethod_timeout.patch.jdk22 Changes: https://git.openjdk.org/jdk/pull/17477/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=17477&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8324074 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/17477.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17477/head:pull/17477 PR: https://git.openjdk.org/jdk/pull/17477 From gcao at openjdk.org Thu Jan 18 02:31:20 2024 From: gcao at openjdk.org (Gui Cao) Date: Thu, 18 Jan 2024 02:31:20 GMT Subject: RFR: 8323694: RISC-V: Unnecessary ResourceMark in NativeCall::set_destination_mt_safe In-Reply-To: References: Message-ID: On Tue, 16 Jan 2024 02:29:42 GMT, Fei Yang wrote: >> Hi, We noticed that RISC-V bears a similar issue as: https://bugs.openjdk.org/browse/JDK-8323584. 
>> In `NativeCall::set_destination_mt_safe`, there is a `ResourceMark` that does not seem to have any purpose: no code in its scope uses resource allocation. >> >> ### Testing: >> >> - [x] Run tier1 tests on qemu 8.1.0 with UseRVV (fastdebug) >> - [x] Run tier1-3 tests on qemu 8.1.0 with UseRVV (release) > > Marked as reviewed by fyang (Reviewer). @RealFYang @robehn : Thanks for the review. ------------- PR Comment: https://git.openjdk.org/jdk/pull/17436#issuecomment-1897667723 From gcao at openjdk.org Thu Jan 18 02:31:22 2024 From: gcao at openjdk.org (Gui Cao) Date: Thu, 18 Jan 2024 02:31:22 GMT Subject: Integrated: 8323694: RISC-V: Unnecessary ResourceMark in NativeCall::set_destination_mt_safe In-Reply-To: References: Message-ID: On Tue, 16 Jan 2024 02:21:50 GMT, Gui Cao wrote: > Hi, We noticed that RISC-V bears a similar issue as: https://bugs.openjdk.org/browse/JDK-8323584. > In `NativeCall::set_destination_mt_safe`, there is a `ResourceMark` that does not seem to have any purpose: no code in its scope uses resource allocation. > > ### Testing: > > - [x] Run tier1 tests on qemu 8.1.0 with UseRVV (fastdebug) > - [x] Run tier1-3 tests on qemu 8.1.0 with UseRVV (release) This pull request has now been integrated. Changeset: ff8cc268 Author: Gui Cao Committer: Fei Yang URL: https://git.openjdk.org/jdk/commit/ff8cc268fdaaf85299c94088a226b73e7eaf6bdb Stats: 2 lines in 1 file changed: 0 ins; 2 del; 0 mod 8323694: RISC-V: Unnecessary ResourceMark in NativeCall::set_destination_mt_safe Reviewed-by: fyang, rehn ------------- PR: https://git.openjdk.org/jdk/pull/17436 From stuefe at openjdk.org Thu Jan 18 06:17:14 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 18 Jan 2024 06:17:14 GMT Subject: RFR: 8324074: increase timeout for jvmci test TestResolvedJavaMethod.java In-Reply-To: References: Message-ID: On Thu, 18 Jan 2024 01:09:33 GMT, Daniel D. 
Daugherty wrote: > A trivial fix to increase the default timeout for > compiler/jvmci/jdk.vm.ci.runtime.test/src/jdk/vm/ci/runtime/test/TestResolvedJavaMethod.java > to 240 seconds. > > See the bug report for the gory timeout details. okay and indeed trivial. ------------- Marked as reviewed by stuefe (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/17477#pullrequestreview-1828918513 From thartmann at openjdk.org Thu Jan 18 06:17:15 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 18 Jan 2024 06:17:15 GMT Subject: RFR: 8324074: increase timeout for jvmci test TestResolvedJavaMethod.java In-Reply-To: References: Message-ID: On Thu, 18 Jan 2024 01:09:33 GMT, Daniel D. Daugherty wrote: > A trivial fix to increase the default timeout for > compiler/jvmci/jdk.vm.ci.runtime.test/src/jdk/vm/ci/runtime/test/TestResolvedJavaMethod.java > to 240 seconds. > > See the bug report for the gory timeout details. Looks good and trivial. ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/17477#pullrequestreview-1828920062 From yyang at openjdk.org Thu Jan 18 06:27:14 2024 From: yyang at openjdk.org (Yi Yang) Date: Thu, 18 Jan 2024 06:27:14 GMT Subject: RFR: 8323795: jcmd Compiler.codecache should print total size of code cache [v2] In-Reply-To: References: Message-ID: On Wed, 17 Jan 2024 06:56:26 GMT, Tobias Hartmann wrote: >> Yi Yang has updated the pull request incrementally with one additional commit since the last revision: >> >> new output && fix test > > The title is confusing. Should it be something like "jcmd Compiler.codecache should print total size of code cache"? @TobiHartmann @vnkozlov In the first commit, the output is confusing (CodeCache and Total CodeHeap) if SegmentedCodeCache is turned off, i.e.
CodeCache: size=118592Kb used=29Kb max_used=29Kb free=118562Kb bounds [0x00007fbe84622000, 0x00007fbe84892000, 0x00007fbe8b9f2000] Total CodeHeap: size=118592Kb used=29Kb max_used=29Kb free=118562Kb total_blobs=474 nmethods=87 adapters=293 compilation: enabled stopped_count=0, restarted_count=0 full_count=0 So I made a new change. With -SegmentedCodeCache: CodeCache: size=245760Kb used=1366Kb max_used=1943Kb free=244393Kb bounds [0x00007fdcc89f2000, 0x00007fdcc8c62000, 0x00007fdcd79f2000] total_blobs=474, nmethods=87, adapters=293 stopped_count=0, restarted_count=0, full_count=0 compilation=enabled With +SegmentedCodeCache: CodeHeap 'non-profiled nmethods': size=118592Kb used=29Kb max_used=29Kb free=118562Kb bounds [0x00007f89c8622000, 0x00007f89c8892000, 0x00007f89cf9f2000] CodeHeap 'profiled nmethods': size=118588Kb used=80Kb max_used=80Kb free=118507Kb bounds [0x00007f89c09f2000, 0x00007f89c0c62000, 0x00007f89c7dc1000] CodeHeap 'non-nmethods': size=8580Kb used=1258Kb max_used=1834Kb free=7321Kb bounds [0x00007f89c7dc1000, 0x00007f89c8031000, 0x00007f89c8622000] CodeCache: size=245760Kb, used=1367Kb, max_used=1943Kb, free=244390Kb total_blobs=474, nmethods=87, adapters=293 stopped_count=0, restarted_count=0, full_count=0 compilation=enabled ------------- PR Comment: https://git.openjdk.org/jdk/pull/17445#issuecomment-1897875983 From chagedorn at openjdk.org Thu Jan 18 07:08:18 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 18 Jan 2024 07:08:18 GMT Subject: RFR: 8291809: Convert compiler/c2/cr7200264/TestSSE2IntVect.java to IR verification test [v2] In-Reply-To: <1ys6VKnk5FRHBGmMwK10_yUNjZuqcvIO7Tk6iU8hboQ=.5d0f54e9-8263-4636-aba6-b528482a4c5a@github.com> References: <1ys6VKnk5FRHBGmMwK10_yUNjZuqcvIO7Tk6iU8hboQ=.5d0f54e9-8263-4636-aba6-b528482a4c5a@github.com> Message-ID: On Wed, 17 Jan 2024 15:06:55 GMT, Daniel Lundén wrote: >> test/hotspot/jtreg/compiler/c2/cr7200264/TestIntVect.java line 155: >> >>> 153: test_addc(a0, a1); >>> 154: for (int i=0; i>>
155: errn += verify("test_addc: ", i, a0[i], (int)((int)(ADD_INIT+i)+VALUE)); >> >> I suggest to either directly use: >> >> Asserts.assertEQ(a0[i], (int)((int)(ADD_INIT+i)+VALUE), "test_addc failed at a0[" + i + "]"); >> >> Or change `verify()` such that it uses `Asserts.assertEQ()` (just an example and could also be adjusted): >> >> static int verify(String text, int i, int elem, int val) { >> Asserts.assertEQ(elem, val, text + " failed at a0[" + i + "]"); >> } > > When translating the test, I focused only on translating the old ad-hoc IR tests to the IR verification framework. Do we want to extend the scope of this issue to also update the testing code? Mainly, the reason I'm asking is that copies of `TestIntVect.java` appears in many places (e.g., `test/compiler/6340864/TestIntVect.java`), and the testing pattern in particular appears in many places (grep for, e.g., `errn += verify(`). If we update the pattern here, we should probably also update it everywhere else? Good point, I guess it's okay then to leave this code as it is. It just looked odd to do correctness testing the way it does. But as you say, if you change that you probably also need to revisit other tests. That's probably not worth it. The main goal should be to introduce IR matching with the IR framework. So, I'm fine with not touching that code or the print statements. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17428#discussion_r1457015405 From chagedorn at openjdk.org Thu Jan 18 07:12:16 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 18 Jan 2024 07:12:16 GMT Subject: RFR: 8291809: Convert compiler/c2/cr7200264/TestSSE2IntVect.java to IR verification test [v2] In-Reply-To: References: Message-ID: On Wed, 17 Jan 2024 15:00:04 GMT, Daniel Lundén wrote: >> test/hotspot/jtreg/compiler/c2/cr7200264/TestIntVect.java line 510: >> >>> 508: } >>> 509: >>> 510: void test_divc(int[] a0, int[] a1) { >> >> What about these tests without IR verification?
Are they expected to be not vectorized? > > That's a good question. I based the current `@IR` verifications on the vector nodes checked in the original test, and the `test_*` functions that do not currently have `@IR` annotations are those that do not result in any of the vector operations part of the original test. > > I additionally checked a few of these functions in IGV, and none of them resulted in any vector operations. I guess we can remove them altogether, unless they are actually supposed to generate vector operations. Thanks for checking it. Maybe @eme64 can also double check if these methods are expected to fail vectorization or if there are some missing optimization opportunities. If the methods cannot be vectorized and we do not want to follow up on them, then I guess both is fine: leaving them in or removing them. Since it's an existing test it might be better though to keep the code as it previously was. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17428#discussion_r1457017411 From fyang at openjdk.org Thu Jan 18 07:13:15 2024 From: fyang at openjdk.org (Fei Yang) Date: Thu, 18 Jan 2024 07:13:15 GMT Subject: RFR: JDK-8318228: RISC-V: C2 ConvF2HF In-Reply-To: References: Message-ID: On Tue, 16 Jan 2024 18:43:03 GMT, Hamlin Li wrote: > Hi, > Can you review the patch to add ConvF2HF intrinsic to JDK for riscv? > Thanks! > > ## Test > ### Functionality > #### hotspot tests > test/hotspot/jtreg/compiler/intrinsics/ > test/hotspot/jtreg/compiler/c2/irTests > > #### jdk tests > test/jdk/java/lang/Float/Binary16Conversion*.java > > ### Performance > tested on licheepi. > > #### with UseZfh enabled > > Benchmark (size) Mode Cnt Score Error Units > Fp16ConversionBenchmark.floatToFloat16 2048 avgt 2 4170.549 ns/op > Fp16ConversionBenchmark.floatToFloat16Memory 2048 avgt 2 21.492 ns/op > > > #### with UseZfh disabled > (i.e. 
disable the intrinsic) > > Benchmark (size) Mode Cnt Score Error Units > Fp16ConversionBenchmark.floatToFloat16 2048 avgt 2 25036.647 ns/op > Fp16ConversionBenchmark.floatToFloat16Memory 2048 avgt 2 27.326 ns/op Hi, Thanks for this change. I have a small question. src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 1842: > 1840: > 1841: // preserve the payloads of non-canonical NaNs. > 1842: __ srai(dst, dst, 13); I see the lowest 13 bits of the payload for `src` is simply discarded here. But these bits are also used for calculating the new significand bits for float16 [1]. So this doesn't seem OK to me. Did I miss anything? [1] https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/lang/Float.java#L1112-L1113 ------------- PR Review: https://git.openjdk.org/jdk/pull/17450#pullrequestreview-1828999418 PR Review Comment: https://git.openjdk.org/jdk/pull/17450#discussion_r1457016454 From epeter at openjdk.org Thu Jan 18 08:29:13 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 18 Jan 2024 08:29:13 GMT Subject: RFR: 8323220: Reassociate loop invariants involved in Cmps and Add/Subs [v2] In-Reply-To: References: Message-ID: On Wed, 17 Jan 2024 19:42:23 GMT, Joshua Cao wrote: >> // inv1 == (x + inv2) => ( inv1 - inv2 ) == x >> // inv1 == (x - inv2) => ( inv1 + inv2 ) == x >> // inv1 == (inv2 - x) => (-inv1 + inv2 ) == x >> >> >> For example, >> >> >> fn(inv1, inv2) >> while(...) >> x = foobar() >> if inv1 == x + inv2 >> blackhole() >> >> >> We can transform this into >> >> >> fn(inv1, inv2) >> t = inv1 - inv2 >> while(...) >> x = foobar() >> if t == x >> blackhole() >> >> >> Here is an example: https://github.com/openjdk/jdk/blob/b78896b9aafcb15f453eaed6e154a5461581407b/src/java.base/share/classes/java/lang/invoke/LambdaFormEditor.java#L910. LHS `1` and RHS `pos` are both loop invariant >> >> Passes tier1 locally on Linux machine. Passes GHA on my fork. 
> > Joshua Cao has updated the pull request incrementally with one additional commit since the last revision: > > Formatting and comments Thanks for the update, looks better already! I'm still waiting for the test with random/edge-case values, and then I can submit this for testing :) ------------- Changes requested by epeter (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/17375#pullrequestreview-1829107633 From epeter at openjdk.org Thu Jan 18 08:29:17 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 18 Jan 2024 08:29:17 GMT Subject: RFR: 8323220: Reassociate loop invariants involved in Cmps and Add/Subs [v2] In-Reply-To: References: Message-ID: On Wed, 17 Jan 2024 19:39:02 GMT, Joshua Cao wrote: >> src/hotspot/share/opto/loopTransform.cpp line 345: >> >>> 343: bool neg_inv1 = (n1->is_Sub() && !n1->is_Cmp() && inv1_idx == 2) || >>> 344: (n1->is_Cmp() && inv2_idx == 1 && n2->is_Sub()); >>> 345: if (n1->is_Sub() && !n1->is_Cmp() && inv1_idx == 1) { >> >> Would you mind adding some comments for this logic? > > I added a little comment block. Not sure how useful it is. I agree the code is hard to follow. Before all the changes for `Cmp`, I was able to make sense of it by following the comments at the top of the function. Hmm, how about using these variables? 
bool n1_is_add = n1->is_Add(); bool n1_is_sub = n1->is_Sub() && !n1->is_Cmp(); bool n1_is_cmp = n1->is_Cmp(); Then you can just comment this: `// Determine whether x, inv1, or inv2 should be negative in the transformed expression` ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17375#discussion_r1457082435 From epeter at openjdk.org Thu Jan 18 08:29:17 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 18 Jan 2024 08:29:17 GMT Subject: RFR: 8323220: Reassociate loop invariants involved in Cmps and Add/Subs [v2] In-Reply-To: References: <-5JZDSqyvX6C2dOKIogkE4BKSD594q1RGX3POS4HnTQ=.4b4d01ed-de2d-4ea8-abc3-32e4ee53d5f2@github.com> Message-ID: On Wed, 17 Jan 2024 19:28:36 GMT, Joshua Cao wrote: >> Another suboptimal idea: you wrap the add / sub in a method, and then ensure that this method is inlined. It might still keep the annotation of being part of that inner method, and you could use regex to check for it. >> >> Or maybe we could also have some sort of relative line offset mechanism in the IR framework, that allows you to specify that you want something that is let's say 7 lines down from the IR rule. > > Some sort of pattern matcher could work. It would be able nice to match something like `a ADD_I b CMP_LT c`. In java this could look something like > > > @IR(counts = {IRNode.CMP_LT[IRNode.ANY, IRNode.SUB_I, IRNode.ANY], "1"} > > > The arguments in the `[]` are the inputs. `IRNode.ANY` matches any node. (The zero'th node is ANY because its the region node). > > Anyway, I think a `lt` test is not super-required for the coverage for this PR. The current machinery does not provide a convenient way to test it. I'd prefer to avoid something hacky. I think this work can be done separately. I agree with you there, don't do anything hacky here. But yes, I've also been wondering what kind of improvements to the IR framework would help us to do these sorts of graph-matching verifications. 
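For readers following the thread, the equality case of the reassociation under review (`inv1 == (x + inv2)` becoming `(inv1 - inv2) == x`) can be sanity-checked in plain Java. This is only a minimal sketch, not the C2 transformation itself: the class and method names below are made up for illustration, and it relies on nothing more than Java's wrapping `int` arithmetic.

```java
// Hypothetical illustration; these names do not exist in the JDK or in the PR.
final class ReassociateCmpSketch {

    // Original shape: the loop-invariant inv2 is added to x on every iteration.
    static int countMatchesOriginal(int[] xs, int inv1, int inv2) {
        int hits = 0;
        for (int x : xs) {
            if (inv1 == x + inv2) {
                hits++;
            }
        }
        return hits;
    }

    // Reassociated shape: the invariant part is computed once, outside the loop.
    static int countMatchesReassociated(int[] xs, int inv1, int inv2) {
        int t = inv1 - inv2; // loop invariant, hoisted by hand here
        int hits = 0;
        for (int x : xs) {
            if (t == x) {
                hits++;
            }
        }
        return hits;
    }
}
```

Because `int` addition and subtraction wrap modulo 2^32, the two equality tests agree even when `x + inv2` or `inv1 - inv2` overflows. The same argument does not carry over unchanged to ordered comparisons such as `<`, which is one reason edge-case values are worth including in a test.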
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17375#discussion_r1457085232 From tholenstein at openjdk.org Thu Jan 18 09:05:20 2024 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Thu, 18 Jan 2024 09:05:20 GMT Subject: RFR: JDK-8210858: AArch64: remove Math.log intrinsic Message-ID: <7Zgt8VOxl48Niocueb70XwxmYIAIaEZUlxKGvPP6BnU=.e354f71c-d6c6-40b9-a93e-98ef7ba8009d@github.com> [JDK-8215133](https://bugs.openjdk.org/browse/JDK-8215133) disabled vmIntrinsics::_dlog. Remove it now ### Why remove That Java specification says: "The computed result must be within 1 ulp of the exact result. Results must be semi-monotonic... whenever the mathematical function is non-decreasing, so is the floating-point approximation, likewise, whenever the mathematical function is non-increasing, so is the floating-point approximation" There is no proof of the monotonicity of this intrinsics at the moment. ------------- Commit messages: - remove unused intrinsic logic on C1 - JDK-8210858: AArch64: remove Math.log intrinsic Changes: https://git.openjdk.org/jdk/pull/17480/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=17480&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8210858 Stats: 15 lines in 2 files changed: 0 ins; 11 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/17480.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17480/head:pull/17480 PR: https://git.openjdk.org/jdk/pull/17480 From ngasson at openjdk.org Thu Jan 18 09:29:20 2024 From: ngasson at openjdk.org (Nick Gasson) Date: Thu, 18 Jan 2024 09:29:20 GMT Subject: RFR: JDK-8210858: AArch64: remove Math.log intrinsic In-Reply-To: <7Zgt8VOxl48Niocueb70XwxmYIAIaEZUlxKGvPP6BnU=.e354f71c-d6c6-40b9-a93e-98ef7ba8009d@github.com> References: <7Zgt8VOxl48Niocueb70XwxmYIAIaEZUlxKGvPP6BnU=.e354f71c-d6c6-40b9-a93e-98ef7ba8009d@github.com> Message-ID: <2oiPyet4xf0IemXgcS0w_Fkk6AJ91dofrGcQBhxmsZU=.fd0af5b5-8d83-4dc3-96f7-b8608fd9bd87@github.com> On Thu, 18 Jan 2024 08:58:20 GMT, Tobias 
Holenstein wrote: > [JDK-8215133](https://bugs.openjdk.org/browse/JDK-8215133) disabled vmIntrinsics::_dlog. Remove it now > > ### Why remove > > That Java specification says: > > "The computed result must be within 1 ulp of the exact result. Results must be semi-monotonic... whenever the mathematical function is non-decreasing, so is the floating-point approximation, likewise, whenever the mathematical function is non-increasing, so is the floating-point approximation" > > There is no proof of the monotonicity of this intrinsics at the moment. Should we also remove `MacroAssembler::fast_log()` and `generate_dlog()` as they are unused now? ------------- PR Comment: https://git.openjdk.org/jdk/pull/17480#issuecomment-1898103477 From roland at openjdk.org Thu Jan 18 09:32:17 2024 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 18 Jan 2024 09:32:17 GMT Subject: RFR: 8320237 C2: late inlining of method handle invoke causes duplicate lines in PrintInlining output [v2] In-Reply-To: References: Message-ID: On Wed, 10 Jan 2024 16:37:44 GMT, Kangcheng Xu wrote: >> This PR resolves [JDK-8320237](https://bugs.openjdk.org/browse/JDK-8320237) >> >> The original behavior produces both a failure and success message upon late inlining which is confusing. The patch removes the failure message if inlining was successful. Huge thanks to @rwestrel for reporting and working out a solution. >> >> Unit test `test/hotspot/jtreg/compiler/inlining/TestDuplicatedLateInliningOutput.java` is added and passing. > > Kangcheng Xu has updated the pull request incrementally with one additional commit since the last revision: > > fix VM crashes I think the subprocess needs to be run with `-XX:-BackgroundCompilation` otherwise there's a chance it completes before the compilation finishes and the print inlining output is produced. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/17147#issuecomment-1898109885 From aph-open at littlepinkcloud.com Thu Jan 18 09:34:26 2024 From: aph-open at littlepinkcloud.com (Andrew Haley) Date: Thu, 18 Jan 2024 09:34:26 +0000 Subject: RFR: JDK-8210858: AArch64: remove Math.log intrinsic In-Reply-To: <2oiPyet4xf0IemXgcS0w_Fkk6AJ91dofrGcQBhxmsZU=.fd0af5b5-8d83-4dc3-96f7-b8608fd9bd87@github.com> References: <7Zgt8VOxl48Niocueb70XwxmYIAIaEZUlxKGvPP6BnU=.e354f71c-d6c6-40b9-a93e-98ef7ba8009d@github.com> <2oiPyet4xf0IemXgcS0w_Fkk6AJ91dofrGcQBhxmsZU=.fd0af5b5-8d83-4dc3-96f7-b8608fd9bd87@github.com> Message-ID: <86c4f9e6-fadc-4699-92e6-c292c5bbe412@littlepinkcloud.com> On 1/18/24 09:29, Nick Gasson wrote: > Should we also remove `MacroAssembler::fast_log()` and `generate_dlog()` as they are unused now? Yes, we shouldn't have code with no prospect of being used. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From amitkumar at openjdk.org Thu Jan 18 09:54:19 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Thu, 18 Jan 2024 09:54:19 GMT Subject: RFR: 8322649: Improve class initialization barrier in TemplateTable::_new for S390 Message-ID: s390 Port implementation for https://github.com/openjdk/jdk/pull/17006, Testing: Build: fastdebug + release Test: Tier1 {fastdebug} ------------- Commit messages: - s390 port Changes: https://git.openjdk.org/jdk/pull/17481/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=17481&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8322649 Stats: 12 lines in 1 file changed: 0 ins; 5 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/17481.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17481/head:pull/17481 PR: https://git.openjdk.org/jdk/pull/17481 From dlunden at openjdk.org Thu Jan 18 10:07:15 2024 From: dlunden at openjdk.org (Daniel Lundén) Date: Thu, 18 Jan 2024 10:07:15 GMT Subject: 
RFR: 8291809: Convert compiler/c2/cr7200264/TestSSE2IntVect.java to IR verification test [v2] In-Reply-To: References: <1ys6VKnk5FRHBGmMwK10_yUNjZuqcvIO7Tk6iU8hboQ=.5d0f54e9-8263-4636-aba6-b528482a4c5a@github.com> Message-ID: On Thu, 18 Jan 2024 07:06:00 GMT, Christian Hagedorn wrote: >> When translating the test, I focused only on translating the old ad-hoc IR tests to the IR verification framework. Do we want to extend the scope of this issue to also update the testing code? Mainly, the reason I'm asking is that copies of `TestIntVect.java` appears in many places (e.g., `test/compiler/6340864/TestIntVect.java`), and the testing pattern in particular appears in many places (grep for, e.g., `errn += verify(`). If we update the pattern here, we should probably also update it everywhere else? > > Good point, I guess it's okay then to leave this code as it is. It just looked odd to do correctness testing the way it does. But as you say, if you change that you probably also need to revisit other tests. That's probably not worth it. The main goal should be to introduce IR matching with the IR framework. So, I'm fine with not touching that code or the print statements. Would it perhaps be useful to create a separate RFE for your cleanup suggestions? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17428#discussion_r1457208944 From wzhuo at openjdk.org Thu Jan 18 10:09:23 2024 From: wzhuo at openjdk.org (Wang Zhuo) Date: Thu, 18 Jan 2024 10:09:23 GMT Subject: RFR: 8324123: aarch64: fix prfm literal encoding in assembler Message-ID: The current prfm literal-mode encoding in the aarch64 assembler is not correct. The prfm (literal) instruction requires bits 31 and 30 to be 0b11, while the current assembler encodes these two bits as 0b01, which yields an ldr instruction, not prfm.
For example, adding the following code in stubGenerator __ prfm(Address(__ pc())) produces an ldr instruction like ldr x0, 0x0000ffff83f8539c but it should be a prfm instruction like prfm pldl1keep, 0x0000ffff8ff8539c The bug is in ld_st2: in literal mode, bits 31 and 30 are set to (size & 0b01), while for prfm instructions bits 31 and 30 must be 0b11. void ld_st2(Register Rt, const Address &adr, int size, int op, int V = 0) { starti; f(V, 26); // general reg? zrf(Rt, 0); // Encoding for literal loads is done here (rather than pushed // down into Address::encode) because the encoding of this // instruction is too different from all of the other forms to // make it worth sharing. if (adr.getMode() == Address::literal) { assert(size == 0b10 || size == 0b11, "bad operand size in ldr"); assert(op == 0b01, "literal form can only be used with loads"); f(size & 0b01, 31, 30), f(0b011, 29, 27), f(0b00, 25, 24); int64_t offset = (adr.target() - pc()) >> 2; sf(offset, 23, 5); code_section()->relocate(pc(), adr.rspec()); return; } f(size, 31, 30); f(op, 23, 22); // str adr.encode(&current_insn); } ------------- Commit messages: - 8324123: aarch64: fix prfm literal encoding in assembler Changes: https://git.openjdk.org/jdk/pull/17482/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=17482&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8324123 Stats: 9 lines in 1 file changed: 8 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/17482.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17482/head:pull/17482 PR: https://git.openjdk.org/jdk/pull/17482 From tholenstein at openjdk.org Thu Jan 18 10:28:23 2024 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Thu, 18 Jan 2024 10:28:23 GMT Subject: RFR: JDK-8321984: IGV: Upgrade to Netbeans Platform 20 [v2] In-Reply-To: References: Message-ID: > Upgraded IGV and dependencies to the newest Netbeans Platform 20 which was released on December 2023.
> > Tested that IGV still behaves as expected after the upgrade. Tobias Holenstein has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains one additional commit since the last revision: JDK-8321984: IGV: Upgrade to Netbeans Platform 20 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17106/files - new: https://git.openjdk.org/jdk/pull/17106/files/feb25a24..b64adb13 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17106&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17106&range=00-01 Stats: 41910 lines in 1294 files changed: 25389 ins; 10902 del; 5619 mod Patch: https://git.openjdk.org/jdk/pull/17106.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17106/head:pull/17106 PR: https://git.openjdk.org/jdk/pull/17106 From tholenstein at openjdk.org Thu Jan 18 10:34:26 2024 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Thu, 18 Jan 2024 10:34:26 GMT Subject: RFR: JDK-8321984: IGV: Upgrade to Netbeans Platform 20 [v3] In-Reply-To: References: Message-ID: > Upgraded IGV and dependencies to the newest Netbeans Platform 20 which was released on December 2023. > > Tested that IGV still behaves as expected after the upgrade. 
Tobias Holenstein has updated the pull request incrementally with two additional commits since the last revision: - remove requirement for nashorn - make IGV build work with mainline JDK version ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17106/files - new: https://git.openjdk.org/jdk/pull/17106/files/b64adb13..f986934d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17106&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17106&range=01-02 Stats: 146 lines in 2 files changed: 0 ins; 143 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/17106.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17106/head:pull/17106 PR: https://git.openjdk.org/jdk/pull/17106 From epeter at openjdk.org Thu Jan 18 10:45:32 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 18 Jan 2024 10:45:32 GMT Subject: RFR: 8320649: C2: Optimize scoped values [v4] In-Reply-To: References: Message-ID: On Thu, 4 Jan 2024 08:12:22 GMT, Roland Westrelin wrote: >> This change implements C2 optimizations for calls to >> ScopedValue.get(). Indeed, in: >> >> >> v1 = scopedValue.get(); >> ... >> v2 = scopedValue.get(); >> >> >> `v2` can be replaced by `v1` and the second call to `get()` can be >> optimized out. That's true whatever is between the 2 calls unless a >> new mapping for `scopedValue` is created in between (when that happens >> no optimizations is performed for the method being compiled). Hoisting >> a `get()` call out of loop for a loop invariant `scopedValue` should >> also be legal in most cases. >> >> `ScopedValue.get()` is implemented in java code as a 2 step process. A >> cache is attached to the current thread object. If the `ScopedValue` >> object is in the cache then the result from `get()` is read from >> there. Otherwise a slow call is performed that also inserts the >> mapping in the cache. The cache itself is lazily allocated. One >> `ScopedValue` can be hashed to 2 different indexes in the cache. 
On a >> cache probe, both indexes are checked. As a consequence, the process >> of probing the cache is a multi step process (check if the cache is >> present, check first index, check second index if first index >> failed). If the cache is populated early on, then when the method that >> calls `ScopedValue.get()` is compiled, profile reports the slow path >> as never taken and only the read from the cache is compiled. >> >> To perform the optimizations, I added 3 new node types to C2: >> >> - the pair >> ScopedValueGetHitsInCacheNode/ScopedValueGetLoadFromCacheNode for >> the cache probe >> >> - a cfg node ScopedValueGetResultNode to help locate the result of the >> `get()` call in the IR graph. >> >> In pseudo code, once the nodes are inserted, the code of a `get()` is: >> >> >> hits_in_the_cache = ScopedValueGetHitsInCache(scopedValue) >> if (hits_in_the_cache) { >> res = ScopedValueGetLoadFromCache(hits_in_the_cache); >> } else { >> res = ..; //slow call possibly inlined. Subgraph can be arbitray complex >> } >> res = ScopedValueGetResult(res) >> >> >> In the snippet: >> >> >> v1 = scopedValue.get(); >> ... >> v2 = scopedValue.get(); >> >> >> Replacing `v2` by `v1` is then done by starting from the >> `ScopedValueGetResult` node for the second `get()` and looking for a >> dominating `ScopedValueGetResult` for the same `ScopedValue` >> object. When one is found, it is used as a replacement. Eliminating >> the second `get()` call is achieved by making >> `ScopedValueGetHitsInCache` always successful if there's a dominating >> `Scoped... > > Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: > > merge fix Path 1 of today. 
src/hotspot/share/opto/callGenerator.cpp line 872: > 870: Node* second_index = nullptr; // index in the cache for second hash > 871: CallStaticJavaNode* slow_call = nullptr; // slowGet() call if any > 872: { This setup really looks like it should be a class, maybe called `ScopedValueGetPatternMatcher`? All your variables here could be fields, and the scope below a method, or even split into multiple methods. src/hotspot/share/opto/loopPredicate.cpp line 1572: > 1570: > 1571: > 1572: bool PhaseIdealLoop::loop_predication_for_scoped_value_get(IdealLoopTree* loop, IfProjNode* if_success_proj, Add a short comment above, that we are trying to hoist the `If` for a `ScopedValueGetHitsInCache` out of the loop, if possible. You have one on the first line, but I think generally we place them at the top, right? src/hotspot/share/opto/loopPredicate.cpp line 1575: > 1573: ParsePredicateSuccessProj* parse_predicate_proj, > 1574: Invariance &invar, Deoptimization::DeoptReason reason, > 1575: IfNode* iff, IfProjNode*&new_predicate_proj) { Suggestion: IfNode* iff, IfProjNode* &new_predicate_proj) { src/hotspot/share/opto/loopPredicate.cpp line 1579: > 1577: BoolNode* bol = iff->in(1)->as_Bool(); > 1578: if (bol->in(1)->Opcode() == Op_ScopedValueGetHitsInCache && invar.is_invariant(((ScopedValueGetHitsInCacheNode*)bol->in(1))->scoped_value()) && > 1579: invar.is_invariant(((ScopedValueGetHitsInCacheNode*)bol->in(1))->index1()) && invar.is_invariant(((ScopedValueGetHitsInCacheNode*)bol->in(1))->index2())) { Please refactor this if: - if we don't take it, you return false, so make it a bailout. This allows you to already bailout if `bol->in(1)->Opcode() == Op_ScopedValueGetHitsInCache` fails. - After that check, you can already have this line: `ScopedValueGetHitsInCacheNode* hits_in_the_cache = (ScopedValueGetHitsInCacheNode*) bol->in(1);`, and simplify the other 3 conditions of the if, make it more readable. 
- But make them bailouts as well, instead of indenting the rest of the function. src/hotspot/share/opto/loopPredicate.cpp line 1648: > 1646: tty->print("Predicate invariant if: %d ", new_predicate_iff->_idx); > 1647: loop->dump_head(); > 1648: } else if (TraceLoopOpts) { Why not have them as separate ifs? What if someone enables both, will they not miss a line? src/hotspot/share/opto/loopnode.cpp line 4717: > 4715: assert(!_igvn.delay_transform(), ""); > 4716: _igvn.set_delay_transform(true); > 4717: for (uint i = _scoped_value_get_nodes.size(); i > 0; i--) { Suggestion: for (uint i = _scoped_value_get_nodes.size()-1; i >= 0; i--) { src/hotspot/share/opto/loopnode.cpp line 4718: > 4716: _igvn.set_delay_transform(true); > 4717: for (uint i = _scoped_value_get_nodes.size(); i > 0; i--) { > 4718: Node* n = _scoped_value_get_nodes.at(i - 1); Suggestion: Node* n = _scoped_value_get_nodes.at(i); src/hotspot/share/opto/loopnode.cpp line 4720: > 4718: Node* n = _scoped_value_get_nodes.at(i - 1); > 4719: if (n->Opcode() == Op_ScopedValueGetResult) { > 4720: // Remove the ScopedValueGetResult entirely Suggestion: // Remove the ScopedValueGetResult and (its projections) entirely src/hotspot/share/opto/loopnode.cpp line 4722: > 4720: // Remove the ScopedValueGetResult entirely > 4721: ScopedValueGetResultNode* get_result = (ScopedValueGetResultNode*) n; > 4722: Node* result_out = get_result->result_out(); Suggestion: ProjNode* result_out_proj = get_result->result_out(); knowing it is the projection helps understand that this is not a use of the result, but just the projection, which in turn has the uses below it. 
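The bailout refactoring requested above for the long `if` in `loop_predication_for_scoped_value_get` can be sketched in isolation. This is only an illustration under stated assumptions: `CacheProbe`, `Condition`, and both function names are hypothetical stand-ins and not HotSpot types, and the boolean fields merely play the role of the `Opcode()` check and the three `invar.is_invariant(...)` calls.

```cpp
// Hypothetical stand-ins for illustration only; none of these are HotSpot types.
struct CacheProbe {
  bool scoped_value_invariant;  // plays the role of is_invariant(scoped_value())
  bool index1_invariant;        // plays the role of is_invariant(index1())
  bool index2_invariant;        // plays the role of is_invariant(index2())
};

struct Condition {
  bool is_cache_probe;  // plays the role of the Opcode() comparison
  CacheProbe probe;
};

// Nested shape: one long condition, with the real work indented inside it.
bool can_hoist_nested(const Condition& cond) {
  if (cond.is_cache_probe && cond.probe.scoped_value_invariant &&
      cond.probe.index1_invariant && cond.probe.index2_invariant) {
    // ... the rest of the function would be indented here ...
    return true;
  }
  return false;
}

// Bailout shape: every failed check returns early, so the probe is looked up
// once and the rest of the function stays at the top indentation level.
bool can_hoist_with_bailouts(const Condition& cond) {
  if (!cond.is_cache_probe) {
    return false;
  }
  const CacheProbe& probe = cond.probe;
  if (!probe.scoped_value_invariant) {
    return false;
  }
  if (!probe.index1_invariant) {
    return false;
  }
  if (!probe.index2_invariant) {
    return false;
  }
  // ... the rest of the function continues here without extra nesting ...
  return true;
}
```

Both functions return the same result for every combination of inputs; only the layout differs.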
src/hotspot/share/opto/loopnode.cpp line 4725: > 4723: Node* result_in = get_result->in(ScopedValueGetResultNode::GetResult); > 4724: if (result_out != nullptr) { > 4725: _igvn.replace_node(result_out, result_in); Suggestion: _igvn.replace_node(result_out_proj, result_in); Otherwise it has me wondering why you can replace the use here, if I don't know it is a projection. src/hotspot/share/opto/loopnode.cpp line 4731: > 4729: lazy_replace(get_result->control_out(), get_result->in(ScopedValueGetResultNode::Control)); > 4730: progress = true; > 4731: remove_scoped_value_get_at(i-1); Suggestion: remove_scoped_value_get_at(i); src/hotspot/share/opto/loopnode.cpp line 4734: > 4732: } > 4733: } > 4734: while (_scoped_value_get_nodes.size() > 0) { Add a comment about why we need a separate loop here. src/hotspot/share/opto/loopnode.cpp line 4737: > 4735: Node* n = _scoped_value_get_nodes.pop(); > 4736: assert (n->Opcode() == Op_ScopedValueGetHitsInCache, ""); > 4737: ScopedValueGetHitsInCacheNode* get_from_cache = (ScopedValueGetHitsInCacheNode*) n; It would be nice to have some variable consistency. Elsewhere you use `sv_hits_in_cache`. The name here suggest that this node actually "gets" something from the cache, but it only checks if we have a hit, and another node does the "getting", right? src/hotspot/share/opto/loopnode.cpp line 4745: > 4743: } > 4744: > 4745: void PhaseIdealLoop::expand_get_from_sv_cache(ScopedValueGetHitsInCacheNode* get_from_cache) { Suggestion: void PhaseIdealLoop::expand_sv_get_hits_in_cache_and_load_from_cache(ScopedValueGetHitsInCacheNode* get_from_cache) { src/hotspot/share/opto/loopnode.cpp line 4752: > 4750: assert(u->is_Bool() || u->Opcode() == Op_ScopedValueGetLoadFromCache, ""); > 4751: } > 4752: #endif Why not make this an enhanced version of `ScopedValueGetHitsInCacheNode::verify`? 
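A caveat on the reverse-loop suggestions for `loopnode.cpp` lines 4717 to 4731 above, assuming `uint` is an unsigned type as the quoted declaration suggests: with an unsigned index, `i >= 0` is always true and `size() - 1` wraps around for an empty list, so the `i = size() - 1; i >= 0` form would likely not terminate as intended, whereas the original `i > 0` with `at(i - 1)` does. A minimal standalone sketch, with hypothetical function names and a `std::vector` standing in for the node list:

```cpp
#include <cstddef>
#include <limits>
#include <vector>

// Start index of the "i = size - 1; i >= 0; i--" shape when the index is unsigned.
// For an empty list the subtraction wraps to the largest unsigned value.
unsigned naive_start_index(unsigned size) {
  return size - 1;
}

// Reverse traversal that keeps the index unsigned: i runs from size() down to 1
// and the element is read at i - 1, so i never has to go below zero.
std::vector<int> visit_in_reverse(const std::vector<int>& nodes) {
  std::vector<int> order;
  for (std::size_t i = nodes.size(); i > 0; i--) {
    order.push_back(nodes[i - 1]);
  }
  return order;
}
```

If the direct `at(i)` indexing is preferred for readability, a signed index type would be one way to make the `i >= 0` condition meaningful.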
src/hotspot/share/opto/loopnode.cpp line 4758: > 4756: ProjNode* success = iff->proj_out(1); > 4757: ProjNode* failure = iff->proj_out(0); > 4758: Suggestion: src/hotspot/share/opto/loopnode.cpp line 4760: > 4758: > 4759: > 4760: ScopedValueGetLoadFromCacheNode* load_from_cache = (ScopedValueGetLoadFromCacheNode*)success->find_unique_out_with(Op_ScopedValueGetLoadFromCache); Would it not be nice if `find_unique_out_with` already casted the type? src/hotspot/share/opto/loopnode.cpp line 4767: > 4765: Node* second_index = get_from_cache->index2(); > 4766: > 4767: if (first_index == C->top() && second_index == C->top()) { could this not be done during igvn? src/hotspot/share/opto/loopnode.cpp line 4772: > 4770: _igvn.replace_input_of(iff, 1, zero); > 4771: _igvn.replace_node(get_from_cache, C->top()); > 4772: Suggestion: src/hotspot/share/opto/loopnode.cpp line 4776: > 4774: } > 4775: > 4776: Node* load_of_cache = get_from_cache->in(1); Suggestion: Node* cache_adr = get_from_cache->in(1); src/hotspot/share/opto/loopnode.cpp line 5027: > 5025: } > 5026: > 5027: void PhaseIdealLoop::remove_scoped_value_get_at(uint i) { Should we not have this functionality at the level of `Node_List`? `GrowableArray` also has an order-preserving and a non-preserving method (delete/remove). src/hotspot/share/opto/node.cpp line 977: > 975: } > 976: > 977: Node* Node::find_unique_out_with(int opcode) const { Random idea: Would it not be nice if this method automatically casted the node to that node-class? Suggestions: - using templates: give the class name and the opcode. A bit annoying to use - using macros: give it the node-type name: i.e. `Add` for `AddNode`. The macro then uses the template, filling in `AddNode` and `Op_Add`. What do you think? src/hotspot/share/opto/subnode.hpp line 300: > 298: }; > 299: > 300: // Does a ScopedValue.get() hits in the cache? 
I finally reconstructed what this node is, and please add a comment saying something like this: This node returns true iff this gets us a cache hit (cache reference not null, and at least one of the indices leads to a hit). It is essentially a Cmp, comparing the `cache_adr` (you name it scoped_value_cache) with a nullptr. But it also gets 2 indices to that cache (ints), which will either score a hit or miss. src/hotspot/share/opto/subnode.hpp line 340: > 338: > 339: Node* mem() const { > 340: return in(Memory); Why not verify that this is a `MemNode`? ------------- Changes requested by epeter (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/16966#pullrequestreview-1829122181 PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1457091960 PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1457097392 PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1457106474 PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1457105820 PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1457116768 PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1457126784 PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1457126959 PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1457133264 PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1457134974 PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1457136244 PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1457127201 PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1457137992 PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1457168449 PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1457219598 PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1457189722 PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1457196100 PR Review
Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1457204400 PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1457255333 PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1457226374 PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1457244894 PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1457163283 PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1457203842 PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1457213443 PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1457251863 From epeter at openjdk.org Thu Jan 18 10:45:34 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 18 Jan 2024 10:45:34 GMT Subject: RFR: 8320649: C2: Optimize scoped values [v4] In-Reply-To: References: Message-ID: On Thu, 18 Jan 2024 08:48:37 GMT, Emanuel Peter wrote: >> src/hotspot/share/opto/loopPredicate.cpp line 1572: >> >>> 1570: >>> 1571: >>> 1572: bool PhaseIdealLoop::loop_predication_for_scoped_value_get(IdealLoopTree* loop, IfProjNode* if_success_proj, >> >> Add a short comment above, that we are trying to hoist the `If` for a `ScopedValueGetHitsInCache` out of the loop, if possible. >> You have one on the first line, but I think generally we place them at the top, right? > > Well, now looking at the method... maybe a longer comment with a picture or pseudocode would be helpful. > It would greatly help me in reviewing the code - otherwise I basically have to draw the picture on a piece of paper myself before understanding it ;) I like it the most when there is ascii art that uses the variable names in the code below. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1457118466 From epeter at openjdk.org Thu Jan 18 10:45:34 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 18 Jan 2024 10:45:34 GMT Subject: RFR: 8320649: C2: Optimize scoped values [v4] In-Reply-To: References: Message-ID: On Thu, 18 Jan 2024 08:36:58 GMT, Emanuel Peter wrote: >> Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: >> >> merge fix > > src/hotspot/share/opto/loopPredicate.cpp line 1572: > >> 1570: >> 1571: >> 1572: bool PhaseIdealLoop::loop_predication_for_scoped_value_get(IdealLoopTree* loop, IfProjNode* if_success_proj, > > Add a short comment above, that we are trying to hoist the `If` for a `ScopedValueGetHitsInCache` out of the loop, if possible. > You have one on the first line, but I think generally we place them at the top, right? Well, now looking at the method... maybe a longer comment with a picture or pseudocode would be helpful. It would greatly help me in reviewing the code - otherwise I basically have to draw the picture on a piece of paper myself before understanding it ;) > src/hotspot/share/opto/loopPredicate.cpp line 1579: > >> 1577: BoolNode* bol = iff->in(1)->as_Bool(); >> 1578: if (bol->in(1)->Opcode() == Op_ScopedValueGetHitsInCache && invar.is_invariant(((ScopedValueGetHitsInCacheNode*)bol->in(1))->scoped_value()) && >> 1579: invar.is_invariant(((ScopedValueGetHitsInCacheNode*)bol->in(1))->index1()) && invar.is_invariant(((ScopedValueGetHitsInCacheNode*)bol->in(1))->index2())) { > > Please refactor this if: > - if we don't take it, you return false, so make it a bailout. This allows you to already bailout if `bol->in(1)->Opcode() == Op_ScopedValueGetHitsInCache` fails. 
> - After that check, you can already have this line: `ScopedValueGetHitsInCacheNode* hits_in_the_cache = (ScopedValueGetHitsInCacheNode*) bol->in(1);`, and simplify the other 3 conditions of the if, make it more readable. > - But make them bailouts as well, instead of indenting the rest of the function. I don't really know how `is_invar` works. But why is a use-node not automatically variant, if a def-node is variant. Or stated in other terms: why not just check `invar.is_invariant(hits_in_the_cache)`? > src/hotspot/share/opto/loopnode.cpp line 4737: > >> 4735: Node* n = _scoped_value_get_nodes.pop(); >> 4736: assert (n->Opcode() == Op_ScopedValueGetHitsInCache, ""); >> 4737: ScopedValueGetHitsInCacheNode* get_from_cache = (ScopedValueGetHitsInCacheNode*) n; > > It would be nice to have some variable consistency. Elsewhere you use `sv_hits_in_cache`. > The name here suggest that this node actually "gets" something from the cache, but it only checks if we have a hit, and another node does the "getting", right? These are the different names you currently use for `ScopedValueGetHitsInCacheNode`: sv_hits_in_cache hits_in_cache hits_in_the_cache get_from_cache get_from_sv_cache get_from_sv_cache_dom And for `ScopedValueGetResultNode`: sv_get_result get_result sv_get_result_dom And for `ScopedValueGetLoadFromCacheNode`: get_from_cache load_from_cache load_from_cache_dom The names even overlap, e.g. `get_from_cache`. It would be nice if there was just one name per class, that would enhance the clarity of the code. > src/hotspot/share/opto/loopnode.cpp line 4745: > >> 4743: } >> 4744: >> 4745: void PhaseIdealLoop::expand_get_from_sv_cache(ScopedValueGetHitsInCacheNode* get_from_cache) { > > Suggestion: > > void PhaseIdealLoop::expand_sv_get_hits_in_cache_and_load_from_cache(ScopedValueGetHitsInCacheNode* get_from_cache) { And again, a picture / pseudocode of the transformation would help immensely. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1457111178 PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1457109943 PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1457179839 PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1457253594 From epeter at openjdk.org Thu Jan 18 10:45:34 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 18 Jan 2024 10:45:34 GMT Subject: RFR: 8320649: C2: Optimize scoped values [v4] In-Reply-To: References: Message-ID: On Thu, 18 Jan 2024 08:54:42 GMT, Emanuel Peter wrote: >> Well, now looking at the method... maybe a longer comment with a picture or pseudocode would be helpful. >> It would greatly help me in reviewing the code - otherwise I basically have to draw the picture on a piece of paper myself before understanding it ;) > > I like it the most when there is ascii art that uses the variable names in the code below. I'll read the code in a later review. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1457137040 From wei.kuai at gmail.com Thu Jan 18 10:49:19 2024 From: wei.kuai at gmail.com (Wei Kuai) Date: Thu, 18 Jan 2024 18:49:19 +0800 Subject: discuss about release barrier for final fields initialization In-Reply-To: <502b509b-bf31-4eee-8468-3f2362d69da8@littlepinkcloud.com> References: <5e41f867-b31b-4ded-b737-8d0c869c8895.kuaiwei.kw@alibaba-inc.com> <20965b9a-ec64-4d81-866e-b1a0a94ed1e0@littlepinkcloud.com> <822b0f93-ae9e-4196-8d20-1b87286b91d5.kuaiwei.kw@alibaba-inc.com> <502b509b-bf31-4eee-8468-3f2362d69da8@littlepinkcloud.com> Message-ID: Hi Andrew, I tested "dmb.ishst; dmb.ishld" for release barrier. The test case is jmh of allocation with final fields. 
https://gist.github.com/kuaiwei/f71fba40df29991c93325a8600e34c13

In N1
dmb.ish            : 1168.059 ops/s
dmb.ishst+dmb.ishld: 1321.783 ops/s
dmb.ishst          : 1511.267 ops/s

In N2
dmb.ish            : 3672.087 ops/s
dmb.ishst+dmb.ishld: 4840.322 ops/s
dmb.ishst          : 6005.430 ops/s

The "dmb.ishst+dmb.ishld" sequence gains 13% on N1 and 32% on N2 over "dmb.ish". It looks like a better replacement for "dmb.ish".

Thanks,
Kuai Wei

On Wed, Jan 17, 2024 at 10:49 PM Andrew Haley wrote:

> On 1/11/24 11:58, Kuai Wei wrote:
> > Thanks for reply. I checked the previous discussion and not clear about the root cause.
> >
> > If you can provide more detail about the optimize, like what load or load dependency will be elided, so we may check chance to detect or prevent.
>
> We think you're probably right. However, C2 does a lot of reorganization,
> so it's hard to say that C2 can never predict what might be stored by
> static field initialization in one thread.
>
> If you're benchmarking this, can you try dmb st; dmb ld without fusing
> them together, thus avoiding a storeload? This would help us understand
> the performance benefit.
>
> --
> Andrew Haley (he/him)
> Java Platform Lead Engineer
> Red Hat UK Ltd.
> https://keybase.io/andrewhaley
> EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671

From tholenstein at openjdk.org Thu Jan 18 11:50:29 2024 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Thu, 18 Jan 2024 11:50:29 GMT Subject: RFR: JDK-8321984: IGV: Upgrade to Netbeans Platform 20 [v4] In-Reply-To: References: Message-ID: > Upgraded IGV and dependencies to the newest Netbeans Platform 20 which was released on December 2023. > > Tested that IGV still behaves as expected after the upgrade. Tobias Holenstein has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR.
The pull request contains two new commits since the last revision: - make IGV build work with mainline JDK version - remove requirement for nashorn ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17106/files - new: https://git.openjdk.org/jdk/pull/17106/files/f986934d..35080de7 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17106&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17106&range=02-03 Stats: 20 lines in 2 files changed: 5 ins; 5 del; 10 mod Patch: https://git.openjdk.org/jdk/pull/17106.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17106/head:pull/17106 PR: https://git.openjdk.org/jdk/pull/17106 From tholenstein at openjdk.org Thu Jan 18 11:59:13 2024 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Thu, 18 Jan 2024 11:59:13 GMT Subject: RFR: JDK-8321984: IGV: Upgrade to Netbeans Platform 20 [v4] In-Reply-To: References: Message-ID: On Thu, 14 Dec 2023 12:56:22 GMT, Roberto Castañeda Lozano wrote: > Thanks for doing this, Tobias. If we are raising the minimum JDK version to 17, it would make sense to make the dependency on `nashorn-core` unconditional (in `src/utils/IdealGraphVisualizer/Filter/pom.xml`). done. Thanks for the input! ------------- PR Comment: https://git.openjdk.org/jdk/pull/17106#issuecomment-1898341258 From tholenstein at openjdk.org Thu Jan 18 12:02:12 2024 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Thu, 18 Jan 2024 12:02:12 GMT Subject: RFR: JDK-8321984: IGV: Upgrade to Netbeans Platform 20 [v4] In-Reply-To: References: Message-ID: On Thu, 18 Jan 2024 11:50:29 GMT, Tobias Holenstein wrote: >> Upgraded IGV and dependencies to the newest Netbeans Platform 20 which was released on December 2023. >> >> Tested that IGV still behaves as expected after the upgrade. > > Tobias Holenstein has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR.
The pull request contains two new commits since the last revision: > > - make IGV build work with mainline JDK version > - remove requirement for nashorn You can now also build IGV with `JAVA_HOME=path_to_mainline_jdk mvn clean install` by using a mainline JDK build ------------- PR Comment: https://git.openjdk.org/jdk/pull/17106#issuecomment-1898344458 From epeter at openjdk.org Thu Jan 18 12:15:28 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 18 Jan 2024 12:15:28 GMT Subject: RFR: 8291809: Convert compiler/c2/cr7200264/TestSSE2IntVect.java to IR verification test [v2] In-Reply-To: References: Message-ID: On Wed, 17 Jan 2024 10:03:06 GMT, Daniel Lundén wrote: >> This changeset translates the tests in `compiler/c2/cr7200264/` to use the IR verification framework. >> >> The proposed translation, to the extent possible, attempts to preserve the semantics of the original test. A major difference is that the IR checks are now local (for every `test_*` method) instead of global. The execution time of the new test is comparable to the old test. >> >> Testing: >> - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/7553846710) >> - Ran the new translated tests within all tier1 through tier10 contexts on windows-x64, linux-x64, linux-aarch64, macosx-x64, and macosx-aarch64. >> - Tested that manually adding `-XX:LoopUnrollLimit=0` to the test framework flags caused the translated tests to fail. Note: it is, however, no longer possible to break the test by passing `-XX:LoopUnrollLimit=0` on the command line. > > Daniel Lundén has updated the pull request incrementally with one additional commit since the last revision: > > Refactor test to use multiple @Test A few first comments, still looking into the shift cases... Ah, a general comment: we also care about `aarch64`, i.e. `asimd`. Not just about `sse2` ;) Would it be reasonable to add `asimd` as well?
test/hotspot/jtreg/compiler/c2/cr7200264/TestIntVect.java line 37: > 35: > 36: /* > 37: * Based on test/compiler/6340864/TestIntVect.java without performance tests. Suggestion: * Based on test/hotspot/jtreg/compiler/c2/cr6340864/TestIntVect.java without performance tests. New path, must have been moved at some point. test/hotspot/jtreg/compiler/c2/cr7200264/TestIntVect.java line 50: > 48: public static void main(String[] args) { > 49: TestFramework.runWithFlags("-XX:+IgnoreUnrecognizedVMOptions", > 50: "-XX:StressLongCountedLoop=0"); What is this for? Maybe add a comment. test/hotspot/jtreg/compiler/c2/cr7200264/TestIntVect.java line 63: > 61: mode = RunMode.STANDALONE) > 62: public void run() { > 63: Suggestion: test/hotspot/jtreg/compiler/c2/cr7200264/TestIntVect.java line 406: > 404: if (errn > 0) { > 405: System.err.println("FAILED: " + errn + " errors"); > 406: System.exit(97); Why not just throw an exception, with that error message? test/hotspot/jtreg/compiler/c2/cr7200264/TestIntVect.java line 412: > 410: } > 411: > 412: int test_sum(int[] a1) { Suggestion: // Not vectorized: simple addition not profitable, see JDK-8307516. int test_sum(int[] a1) { test/hotspot/jtreg/compiler/c2/cr7200264/TestIntVect.java line 510: > 508: } > 509: > 510: void test_divc(int[] a0, int[] a1) { Suggestion: // Not vectorized: no vector div. Might vectorize after JDK-8282365 (transform div to mul/add/shift). void test_divc(int[] a0, int[] a1) { test/hotspot/jtreg/compiler/c2/cr7200264/TestIntVect.java line 516: > 514: } > 515: > 516: void test_divc_n(int[] a0, int[] a1) { Suggestion: // Not vectorized: no vector div. Might vectorize after JDK-8282365 (transform div to mul/add/shift). void test_divc_n(int[] a0, int[] a1) { test/hotspot/jtreg/compiler/c2/cr7200264/TestIntVect.java line 522: > 520: } > 521: > 522: void test_divv(int[] a0, int[] a1, int b) { Suggestion: // Not vectorized: no vector div.
void test_divv(int[] a0, int[] a1, int b) { test/hotspot/jtreg/compiler/c2/cr7200264/TestIntVect.java line 528: > 526: } > 527: > 528: void test_diva(int[] a0, int[] a1, int[] a2) { Suggestion: // Not vectorized: no vector div. void test_diva(int[] a0, int[] a1, int[] a2) { test/hotspot/jtreg/compiler/c2/cr7200264/TestIntVect.java line 643: > 641: a0[i] = (int)(a1[i]<<(-SHIFT)); > 642: } > 643: } Not sure why these don't vectorize. Need to investigate. test/hotspot/jtreg/compiler/c2/cr7200264/TestIntVect.java line 682: > 680: a0[i] = (int)(a1[i]>>>(-SHIFT)); > 681: } > 682: } Same here, not sure, but shift by 32bit is not great. test/hotspot/jtreg/compiler/c2/cr7200264/TestIntVect.java line 721: > 719: a0[i] = (int)(a1[i]>>(-SHIFT)); > 720: } > 721: } Yet another shift case. test/hotspot/jtreg/compiler/c2/cr7200264/TestIntVect.java line 732: > 730: } > 731: > 732: void test_pack2(long[] p2, int[] a1) { Interesting pattern! But that would require optimizations we currently do not have. Maybe we can do that in the future, with heavy upgrades to the autovectorizer, or other optimizations that merge loads / stores. ------------- Changes requested by epeter (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/17428#pullrequestreview-1829487943 PR Comment: https://git.openjdk.org/jdk/pull/17428#issuecomment-1898364629 PR Review Comment: https://git.openjdk.org/jdk/pull/17428#discussion_r1457321852 PR Review Comment: https://git.openjdk.org/jdk/pull/17428#discussion_r1457325501 PR Review Comment: https://git.openjdk.org/jdk/pull/17428#discussion_r1457326179 PR Review Comment: https://git.openjdk.org/jdk/pull/17428#discussion_r1457329945 PR Review Comment: https://git.openjdk.org/jdk/pull/17428#discussion_r1457331858 PR Review Comment: https://git.openjdk.org/jdk/pull/17428#discussion_r1457335029 PR Review Comment: https://git.openjdk.org/jdk/pull/17428#discussion_r1457336037 PR Review Comment: https://git.openjdk.org/jdk/pull/17428#discussion_r1457336434 PR Review Comment: https://git.openjdk.org/jdk/pull/17428#discussion_r1457336611 PR Review Comment: https://git.openjdk.org/jdk/pull/17428#discussion_r1457338711 PR Review Comment: https://git.openjdk.org/jdk/pull/17428#discussion_r1457340003 PR Review Comment: https://git.openjdk.org/jdk/pull/17428#discussion_r1457340606 PR Review Comment: https://git.openjdk.org/jdk/pull/17428#discussion_r1457344577 From epeter at openjdk.org Thu Jan 18 12:15:29 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 18 Jan 2024 12:15:29 GMT Subject: RFR: 8291809: Convert compiler/c2/cr7200264/TestSSE2IntVect.java to IR verification test [v2] In-Reply-To: References: Message-ID: On Thu, 18 Jan 2024 11:56:32 GMT, Emanuel Peter wrote: >> Daniel Lundén has updated the pull request incrementally with one additional commit since the last revision: >> >> Refactor test to use multiple @Test > > test/hotspot/jtreg/compiler/c2/cr7200264/TestIntVect.java line 643: > >> 641: a0[i] = (int)(a1[i]<<(-SHIFT)); >> 642: } >> 643: } > > Not sure why these don't vectorize. Need to investigate. They are shifted by 32 bit, so maybe that creates something odd?
> test/hotspot/jtreg/compiler/c2/cr7200264/TestIntVect.java line 732: > >> 730: } >> 731: >> 732: void test_pack2(long[] p2, int[] a1) { > > Interesting pattern! But that would require optimizations we currently do not have. > Maybe we can do that in the future, with heavy upgrades to the autovectorizer, > or other optimizations that merge loads / stores. This applies to all pack/swap cases here. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17428#discussion_r1457339577 PR Review Comment: https://git.openjdk.org/jdk/pull/17428#discussion_r1457345036 From epeter at openjdk.org Thu Jan 18 12:15:29 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 18 Jan 2024 12:15:29 GMT Subject: RFR: 8291809: Convert compiler/c2/cr7200264/TestSSE2IntVect.java to IR verification test [v2] In-Reply-To: References: Message-ID: <1mAl4_Dta7eNeyzajQdwYz5SoJnTAFMis6GtC_IxlrQ=.b072536d-9238-4b39-8e30-ff8b6c5afb44@github.com> On Thu, 18 Jan 2024 11:57:18 GMT, Emanuel Peter wrote: >> test/hotspot/jtreg/compiler/c2/cr7200264/TestIntVect.java line 643: >> >>> 641: a0[i] = (int)(a1[i]<<(-SHIFT)); >>> 642: } >>> 643: } >> >> Not sure why these don't vectorize. Need to investigate. > > They are shifted by 32 bit, so maybe that creates something odd? I see this in the old code: `@summary 7192963 changes disabled shift vectors` ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17428#discussion_r1457346541 From epeter at openjdk.org Thu Jan 18 12:27:15 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 18 Jan 2024 12:27:15 GMT Subject: RFR: 8291809: Convert compiler/c2/cr7200264/TestSSE2IntVect.java to IR verification test [v2] In-Reply-To: References: Message-ID: On Wed, 17 Jan 2024 10:03:06 GMT, Daniel Lundén wrote: >> This changeset translates the tests in `compiler/c2/cr7200264/` to use the IR verification framework. >> >> The proposed translation, to the extent possible, attempts to preserve the semantics of the original test.
A major difference is that the IR checks are now local (for every `test_*` method) instead of global. The execution time of the new test is comparable to the old test. >> >> Testing: >> - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/7553846710) >> - Ran the new translated tests within all tier1 through tier10 contexts on windows-x64, linux-x64, linux-aarch64, macosx-x64, and macosx-aarch64. >> - Tested that manually adding `-XX:LoopUnrollLimit=0` to the test framework flags caused the translated tests to fail. Note: it is, however, no longer possible to break the test by passing `-XX:LoopUnrollLimit=0` on the command line. > > Daniel Lundén has updated the pull request incrementally with one additional commit since the last revision: > > Refactor test to use multiple @Test Well, I think at least some of the `shift` examples should also vectorize: `./java -XX:CompileCommand=compileonly,Test::test1 -XX:+TraceSuperWord -XX:+TraceLoopOpts -XX:+TraceNewVectors -XX:UseAVX=2 Test.java` I am not sure whether this holds for all SSE and AVX levels, but it does for all the levels I quickly checked with the UseSSE and UseAVX flags.

    TraceNewVectors [SuperWord]: 832 LoadVector === 347 766 740 [[ 738 734 731 727 619 616 518 136 ]] @int[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=5; mismatched #vectory[8]:{int} !orig=[739],[620],[519],[135] !jvms: Test::test2 @ bci:12 (line 21)
    TraceNewVectors [SuperWord]: 836 LShiftVI === _ 832 835 [[ 736 733 730 725 618 615 516 157 ]] #vectory[8]:{int} !orig=[738],[619],[518],[136] !jvms: Test::test2 @ bci:14 (line 21)
    TraceNewVectors [SuperWord]: 837 StoreVector === 763 766 737 836 [[ 341 766 160 339 ]] @int[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=5; mismatched Memory: @int[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=5; !orig=[736],[618],[516],[157],535 !jvms: Test::test2 @ bci:15 (line 21)

Test.java:

    public class Test {
        static int RANGE = 10_000;

        public static void main(String[] args) {
            int[] a = new int[RANGE];
            int[] b = new int[RANGE];
            for (int i = 0; i < 10_000; i++) {
                test1(a, b);
                test2(a, b, i % 200 - 100);
            }
        }

        static void test1(int[] a, int[] b) {
            for (int i = 0; i < a.length; i++) {
                a[i] = (int)(b[i] << 32);
            }
        }

        static void test2(int[] a, int[] b, int s) {
            for (int i = 0; i < a.length; i++) {
                a[i] = (int)(b[i] << s);
            }
        }
    }

I also found this test in `test/hotspot/jtreg/compiler/vectorization/runner/BasicIntOpTest.java`:

    @Test
    @IR(applyIfCPUFeatureOr = {"asimd", "true", "sse2", "true"},
        counts = {IRNode.LSHIFT_VI, ">0"})
    public int[] vectorShiftLeft() {
        int[] res = new int[SIZE];
        for (int i = 0; i < SIZE; i++) {
            res[i] = a[i] << 3;
        }
        return res;
    }

Plus, I see `test.addExpectedVectorization("LShiftVI", 5);` in `test/hotspot/jtreg/compiler/c2/cr7200264/TestSSE2IntVect.java`, which you now deleted.

@dlunde would you mind investigating a bit more if you can add some IR rules for all (or at least a few) of the shift examples?
If you think they really do not vectorize, can you paste me a Test.java with commandline how you run it?
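
One language detail that may matter for the `<< 32` and `<<(-SHIFT)` cases discussed above: for an `int` left operand, Java uses only the low five bits of the shift distance, so a shift by 32 leaves the value unchanged. Whether this explains what C2 does with those loops is not established in this thread; the following is only a minimal sketch of the language semantics, and the class and method names are hypothetical.

```java
// Illustrative sketch; ShiftDistanceDemo and shl are hypothetical names,
// not part of the JDK tests discussed in this thread.
final class ShiftDistanceDemo {
    // For an int left operand, only the low 5 bits of the distance are used.
    static int shl(int value, int distance) {
        return value << distance;
    }

    public static void main(String[] args) {
        int x = 0x1234;
        System.out.println(shl(x, 32) == x);          // 32 & 31 == 0
        System.out.println(shl(x, -32) == x);         // -32 & 31 == 0
        System.out.println(shl(x, -1) == shl(x, 31)); // -1 & 31 == 31
        System.out.println(shl(x, 35) == shl(x, 3));  // 35 & 31 == 3
    }
}
```

Under this reading, `test1` in the Test.java above stores `b[i]` unchanged, which is worth keeping in mind when deciding which vector node an IR rule should expect.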
------------- PR Comment: https://git.openjdk.org/jdk/pull/17428#issuecomment-1898383407 From epeter at openjdk.org Thu Jan 18 12:31:21 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 18 Jan 2024 12:31:21 GMT Subject: RFR: 8291809: Convert compiler/c2/cr7200264/TestSSE2IntVect.java to IR verification test [v2] In-Reply-To: References: <1ys6VKnk5FRHBGmMwK10_yUNjZuqcvIO7Tk6iU8hboQ=.5d0f54e9-8263-4636-aba6-b528482a4c5a@github.com> Message-ID: On Thu, 18 Jan 2024 10:04:29 GMT, Daniel Lund?n wrote: >> Good point, I guess it's okay then to leave this code as as it is. It just looked odd to do correctness testing the way it does. But as you say, if you change that you probably also need to revisit other tests. That's probably not worth it. The main goal should be to introduce IR matching with the IR framework. So, I'm fine with not touching that code or the print statements. > > Would it perhaps be useful to create a separate RFE for your cleanup suggestions? What I think might eventually be much more helpful, is to add result verification into the IR framework, and not have to hand-code it everywhere. That would be worth investing some time in, because it would simplify future testing, and make sure that people generally have to do result verification (it gets forgotten too easily). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17428#discussion_r1457372919 From dlunden at openjdk.org Thu Jan 18 12:37:25 2024 From: dlunden at openjdk.org (Daniel =?UTF-8?B?THVuZMOpbg==?=) Date: Thu, 18 Jan 2024 12:37:25 GMT Subject: RFR: 8291809: Convert compiler/c2/cr7200264/TestSSE2IntVect.java to IR verification test [v3] In-Reply-To: References: Message-ID: > This changeset translates the tests in `compiler/c2/cr7200264/` to use the IR verification framework. > > The proposed translation, to the extent possible, attempts to preserve the semantics of the original test. 
A major difference is that the IR checks are now local (for every `test_*` method) instead of global. The execution time of the new test is comparable to the old test. > > Testing: > - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/7553846710) > - Ran the new translated tests within all tier1 through tier10 contexts on windows-x64, linux-x64, linux-aarch64, macosx-x64, and macosx-aarch64. > - Tested that manually adding `-XX:LoopUnrollLimit=0` to the test framework flags caused the translated tests to fail. Note: it is, however, no longer possible to break the test by passing `-XX:LoopUnrollLimit=0` on the command line. Daniel Lund?n has updated the pull request incrementally with one additional commit since the last revision: Apply suggestions from code review Co-authored-by: Emanuel Peter ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17428/files - new: https://git.openjdk.org/jdk/pull/17428/files/d1b4aa5e..87715718 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17428&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17428&range=01-02 Stats: 7 lines in 1 file changed: 5 ins; 1 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/17428.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17428/head:pull/17428 PR: https://git.openjdk.org/jdk/pull/17428 From dlunden at openjdk.org Thu Jan 18 12:37:27 2024 From: dlunden at openjdk.org (Daniel =?UTF-8?B?THVuZMOpbg==?=) Date: Thu, 18 Jan 2024 12:37:27 GMT Subject: RFR: 8291809: Convert compiler/c2/cr7200264/TestSSE2IntVect.java to IR verification test [v2] In-Reply-To: References: Message-ID: On Thu, 18 Jan 2024 11:43:43 GMT, Emanuel Peter wrote: >> Daniel Lund?n has updated the pull request incrementally with one additional commit since the last revision: >> >> Refactor test to use multiple @Test > > test/hotspot/jtreg/compiler/c2/cr7200264/TestIntVect.java line 50: > >> 48: public static void main(String[] args) { >> 49: 
TestFramework.runWithFlags("-XX:+IgnoreUnrecognizedVMOptions", >> 50: "-XX:StressLongCountedLoop=0"); > > What is this for? Maybe add a comment. Yes, thanks. The previous comment that fell out is `// make sure int loops do not get converted to long`. I'll readd it. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17428#discussion_r1457379678 From epeter at openjdk.org Thu Jan 18 12:42:15 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 18 Jan 2024 12:42:15 GMT Subject: RFR: 8291809: Convert compiler/c2/cr7200264/TestSSE2IntVect.java to IR verification test [v2] In-Reply-To: References: Message-ID: On Thu, 18 Jan 2024 12:34:56 GMT, Daniel Lund?n wrote: >> test/hotspot/jtreg/compiler/c2/cr7200264/TestIntVect.java line 50: >> >>> 48: public static void main(String[] args) { >>> 49: TestFramework.runWithFlags("-XX:+IgnoreUnrecognizedVMOptions", >>> 50: "-XX:StressLongCountedLoop=0"); >> >> What is this for? Maybe add a comment. > > Yes, thanks. The previous comment that fell out is `// make sure int loops do not get converted to long`. I'll readd it. And is that really necessary? I don't see the flag in the IR Framework whitelist: `test/hotspot/jtreg/compiler/lib/ir_framework/TestFramework.java` That would mean that if you do set the flag, then we would already disable the IR rules, but we could still run the tests, checking for correctness. TLDR: I believe we should just be able to remove this flag completely. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17428#discussion_r1457384364 From dlunden at openjdk.org Thu Jan 18 12:42:17 2024 From: dlunden at openjdk.org (Daniel =?UTF-8?B?THVuZMOpbg==?=) Date: Thu, 18 Jan 2024 12:42:17 GMT Subject: RFR: 8291809: Convert compiler/c2/cr7200264/TestSSE2IntVect.java to IR verification test [v2] In-Reply-To: References: Message-ID: On Thu, 18 Jan 2024 11:48:02 GMT, Emanuel Peter wrote: >> Daniel Lund?n has updated the pull request incrementally with one additional commit since the last revision: >> >> Refactor test to use multiple @Test > > test/hotspot/jtreg/compiler/c2/cr7200264/TestIntVect.java line 406: > >> 404: if (errn > 0) { >> 405: System.err.println("FAILED: " + errn + " errors"); >> 406: System.exit(97); > > Why not just throw an exception, with that error message? Same comment here as for Christian's earlier comment: I do not touch the testing/verification code as part of this changeset (I only translate the previous ad-hoc IR checks to the IR framework). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17428#discussion_r1457382161 From dlunden at openjdk.org Thu Jan 18 12:52:15 2024 From: dlunden at openjdk.org (Daniel =?UTF-8?B?THVuZMOpbg==?=) Date: Thu, 18 Jan 2024 12:52:15 GMT Subject: RFR: 8291809: Convert compiler/c2/cr7200264/TestSSE2IntVect.java to IR verification test [v2] In-Reply-To: References: Message-ID: On Thu, 18 Jan 2024 12:24:42 GMT, Emanuel Peter wrote: >> Daniel Lund?n has updated the pull request incrementally with one additional commit since the last revision: >> >> Refactor test to use multiple @Test > > Well, I think at least some of the `shift` examples should also vectorize: > `./java -XX:CompileCommand=compileonly,Test::test1 -XX:+TraceSuperWord -XX:+TraceLoopOpts -XX:+TraceNewVectors -XX:UseAVX=2 Test.java` > > Not sure if for all SSE and AVX levels, but all that I quickly checked with the UseSSE and USEAVX flags. 
> > > TraceNewVectors [SuperWord]: 832 LoadVector === 347 766 740 [[ 738 734 731 727 619 616 518 136 ]] @int[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=5; mismatched #vectory[8]:{int} !orig=[739],[620],[519],[135] !jvms: Test::test2 @ bci:12 (line 21) > TraceNewVectors [SuperWord]: 836 LShiftVI === _ 832 835 [[ 736 733 730 725 618 615 516 157 ]] #vectory[8]:{int} !orig=[738],[619],[518],[136] !jvms: Test::test2 @ bci:14 (line 21) > TraceNewVectors [SuperWord]: 837 StoreVector === 763 766 737 836 [[ 341 766 160 339 ]] @int[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=5; mismatched Memory: @int[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=5; !orig=[736],[618],[516],[157],535 !jvms: Test::test2 @ bci:15 (line 21) > > > Test.java: > > public class Test { > static int RANGE = 10_000; > > public static void main(String[] args) { > int[] a = new int[RANGE]; > int[] b = new int[RANGE]; > for (int i = 0; i < 10_000; i++) { > test1(a, b); > test2(a, b, i % 200 - 100); > } > } > > static void test1(int[] a, int[] b) { > for (int i = 0; i < a.length; i++) { > a[i] = (int)(b[i] << 32); > } > } > > static void test2(int[] a, int[] b, int s) { > for (int i = 0; i < a.length; i++) { > a[i] = (int)(b[i] << s); > } > } > } > > > I also found this test in `test/hotspot/jtreg/compiler/vectorization/runner/BasicIntOpTest.java`: > > @Test > @IR(applyIfCPUFeatureOr = {"asimd", "true", "sse2", "true"}, > counts = {IRNode.LSHIFT_VI, ">0"}) > public int[] vectorShiftLeft() { > int[] res = new int[SIZE]; > for (int i = 0; i < SIZE; i++) { > res[i] = a[i] << 3; > } > return res; > } > > > Plus, I see `test.addExpectedVectorization("LShiftVI", 5);` in `test/hotspot/jtreg/compiler/c2/cr7200264/TestSSE2IntVect.java`, which you now deleted. > > @dlunde would you mind investigating a bit more if you can add some IR rules for all (or at least a few) of the shift examples? 
> If you think they really do not vectorize, can you paste me a Test.java with comm... Thanks for your investigation efforts @eme64! > Ah, a general comment: we also care about aarch64, i,e, asimd. Not just about sse2 ;) > Would it be reasonable to add asimd as well? The original tests only specified sse as requirements, but I think it sounds reasonable to add asimd. > Plus, I see test.addExpectedVectorization("LShiftVI", 5); in test/hotspot/jtreg/compiler/c2/cr7200264/TestSSE2IntVect.java, which you now deleted. > > @dlunde would you mind investigating a bit more if you can add some IR rules for all (or at least a few) of the shift examples? > If you think they really do not vectorize, can you paste me a Test.java with commandline how you run it? Thanks for spotting this, I'll investigate a bit more. I simply ran the tests locally (`make test TEST="hotspot/jtreg/compiler/c2/cr7200264/TestIntVect.java"`) with unreasonable values for the `@IR` counts (e.g., `@IR(counts = { IRNode.LSHIFT_VI, ">= 10000" })`. In the test output, it then gave an error indicating that the actual number was 0. I'll check if I messed up somewhere. ------------- PR Comment: https://git.openjdk.org/jdk/pull/17428#issuecomment-1898419886 From dlunden at openjdk.org Thu Jan 18 13:09:14 2024 From: dlunden at openjdk.org (Daniel =?UTF-8?B?THVuZMOpbg==?=) Date: Thu, 18 Jan 2024 13:09:14 GMT Subject: RFR: 8291809: Convert compiler/c2/cr7200264/TestSSE2IntVect.java to IR verification test [v2] In-Reply-To: References: Message-ID: <9mdyA3D4IhjJeXo2xR6L1oc6p76d8jDJsgyeg41YhiA=.7a43845a-8faa-402b-a4a9-f335a3809df7@github.com> On Thu, 18 Jan 2024 12:39:26 GMT, Emanuel Peter wrote: >> Yes, thanks. The previous comment that fell out is `// make sure int loops do not get converted to long`. I'll readd it. > > And is that really necessary? 
> I don't see the flag in the IR Framework whitelist: > `test/hotspot/jtreg/compiler/lib/ir_framework/TestFramework.java` > > That would mean that if you do set the flag, then we would already disable the IR rules, but we could still run the tests, checking for correctness. > > TLDR: I believe we should just be able to remove this flag completely. Isn't that whitelist only for flags passed down through JTREG? For example, when I add `"-XX:LoopUnrollLimit=0"` (not in the whitelist) to the `runWithFlags` arguments, all the IR tests fail. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17428#discussion_r1457415142 From dcubed at openjdk.org Thu Jan 18 13:10:27 2024 From: dcubed at openjdk.org (Daniel D. Daugherty) Date: Thu, 18 Jan 2024 13:10:27 GMT Subject: RFR: 8324074: increase timeout for jvmci test TestResolvedJavaMethod.java [v2] In-Reply-To: References: Message-ID: > A trivial fix to increase the default timeout for > compiler/jvmci/jdk.vm.ci.runtime.test/src/jdk/vm/ci/runtime/test/TestResolvedJavaMethod.java > to 240 seconds. > > See the bug report for the gory timeout details. Daniel D. Daugherty has updated the pull request incrementally with one additional commit since the last revision: Update copyright year. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17477/files - new: https://git.openjdk.org/jdk/pull/17477/files/a718e3da..d27f2141 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17477&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17477&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/17477.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17477/head:pull/17477 PR: https://git.openjdk.org/jdk/pull/17477 From dcubed at openjdk.org Thu Jan 18 13:19:26 2024 From: dcubed at openjdk.org (Daniel D. 
Daugherty) Date: Thu, 18 Jan 2024 13:19:26 GMT Subject: RFR: 8324074: increase timeout for jvmci test TestResolvedJavaMethod.java [v2] In-Reply-To: References: Message-ID: On Thu, 18 Jan 2024 06:12:27 GMT, Thomas Stuefe wrote: >> Daniel D. Daugherty has updated the pull request incrementally with one additional commit since the last revision: >> >> Update copyright year. > > okay and indeed trivial. @tstuefe and @TobiHartmann - Thanks for the review. Copyright year updated. ------------- PR Comment: https://git.openjdk.org/jdk/pull/17477#issuecomment-1898458935 From dcubed at openjdk.org Thu Jan 18 13:19:28 2024 From: dcubed at openjdk.org (Daniel D. Daugherty) Date: Thu, 18 Jan 2024 13:19:28 GMT Subject: Integrated: 8324074: increase timeout for jvmci test TestResolvedJavaMethod.java In-Reply-To: References: Message-ID: <3sHvR57zVyDClRby85PBPme5AAtfMQvwgOeo9-A-EZk=.e730191f-c526-4aa0-8fcb-b3328334301d@github.com> On Thu, 18 Jan 2024 01:09:33 GMT, Daniel D. Daugherty wrote: > A trivial fix to increase the default timeout for > compiler/jvmci/jdk.vm.ci.runtime.test/src/jdk/vm/ci/runtime/test/TestResolvedJavaMethod.java > to 240 seconds. > > See the bug report for the gory timeout details. This pull request has now been integrated. Changeset: aeb304b2 Author: Daniel D. 
Daugherty URL: https://git.openjdk.org/jdk/commit/aeb304b29eaaba2b7a8fef85ee46cbfca27dbfbe Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod 8324074: increase timeout for jvmci test TestResolvedJavaMethod.java Reviewed-by: stuefe, thartmann ------------- PR: https://git.openjdk.org/jdk/pull/17477 From redestad at openjdk.org Thu Jan 18 13:21:14 2024 From: redestad at openjdk.org (Claes Redestad) Date: Thu, 18 Jan 2024 13:21:14 GMT Subject: RFR: 8318446: C2: implement StoreNode::Ideal_merge_stores In-Reply-To: References: <10wcc-pgfz9ddsLYg1wkqG7EdXiXDd1vZIdZqwhBkns=.d032c755-eb79-4a76-9175-3e847d5bb1f7@github.com> Message-ID: On Wed, 17 Jan 2024 15:44:38 GMT, Emanuel Peter wrote: > And I do only merge them if they are increasing. It is a limitation, but not a terrible one I'd say. Would this mean big-endian variants would not be candidates for this optimization? Since offsets would locally decrease for subsequent store? Adding BE microbenchmarks would be good regardless. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16245#issuecomment-1898465718 From chagedorn at openjdk.org Thu Jan 18 13:26:12 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 18 Jan 2024 13:26:12 GMT Subject: RFR: 8291809: Convert compiler/c2/cr7200264/TestSSE2IntVect.java to IR verification test [v2] In-Reply-To: References: Message-ID: On Thu, 18 Jan 2024 12:37:24 GMT, Daniel Lund?n wrote: >> test/hotspot/jtreg/compiler/c2/cr7200264/TestIntVect.java line 406: >> >>> 404: if (errn > 0) { >>> 405: System.err.println("FAILED: " + errn + " errors"); >>> 406: System.exit(97); >> >> Why not just throw an exception, with that error message? > > Same comment here as for Christian's earlier comment: I do not touch the testing/verification code as part of this changeset (I only translate the previous ad-hoc IR checks to the IR framework). I generally agree with that but `System.exit(97)` seems quite odd. Maybe you want to change that to throw an exception instead. 
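
A minimal sketch of the alternative suggested here, assuming the error count is available as an `int` as in the quoted snippet. The class and method names are hypothetical and this is not the code in `TestIntVect.java`.

```java
// Illustrative sketch; ErrorReporting and failIfErrors are hypothetical names.
final class ErrorReporting {
    // Throws instead of calling System.exit(97), so the failure surfaces
    // as an ordinary exception with the same message.
    static void failIfErrors(int errn) {
        if (errn > 0) {
            throw new RuntimeException("FAILED: " + errn + " errors");
        }
    }

    public static void main(String[] args) {
        failIfErrors(0); // no errors: returns normally
        try {
            failIfErrors(2);
        } catch (RuntimeException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

An exception keeps the message and a stack trace together, whereas an exit code has to be interpreted by whoever launched the VM.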
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17428#discussion_r1457434059 From chagedorn at openjdk.org Thu Jan 18 13:26:15 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 18 Jan 2024 13:26:15 GMT Subject: RFR: 8291809: Convert compiler/c2/cr7200264/TestSSE2IntVect.java to IR verification test [v2] In-Reply-To: References: Message-ID: On Thu, 18 Jan 2024 11:49:49 GMT, Emanuel Peter wrote: >> Daniel Lundén has updated the pull request incrementally with one additional commit since the last revision: >> >> Refactor test to use multiple @Test > > test/hotspot/jtreg/compiler/c2/cr7200264/TestIntVect.java line 412: > >> 410: } >> 411: >> 412: int test_sum(int[] a1) { > > Suggestion: > > // Not vectorized: simple addition not profitable, see JDK-8307516. > int test_sum(int[] a1) { Is it simply not profitable or not possible (since we also do some correctness/feasibility checking in `SuperWord::profitable()`)? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17428#discussion_r1457432885 From epeter at openjdk.org Thu Jan 18 13:35:17 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 18 Jan 2024 13:35:17 GMT Subject: RFR: 8318446: C2: implement StoreNode::Ideal_merge_stores In-Reply-To: References: Message-ID: On Wed, 18 Oct 2023 13:04:35 GMT, Emanuel Peter wrote: > This is a feature requested by @RogerRiggs and @cl4es. > > **Idea** > > Merging multiple consecutive small stores (e.g. 8 byte stores) into larger stores (e.g. one long store) can lead to a speedup. Recently, @cl4es and @RogerRiggs had to review a few PRs where people would try to get speedups by using Unsafe (e.g. `Unsafe.putLongUnaligned`), or ByteArrayLittleEndian (e.g. `ByteArrayLittleEndian.setLong`). They have asked if we can do such an optimization in C2, rather than in the Java library code, or even user code. > > This patch here supports a few simple use-cases, like these: > > Merge consecutive array stores, with constants.
We can combine the separate constants into a larger constant: > https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/test/hotspot/jtreg/compiler/c2/TestMergeStores.java#L383-L395 > > Merge consecutive array stores, with a variable that was split (using shifts). We can essentially undo the splitting (i.e. shifting and truncation), and directly store the variable: > https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/test/hotspot/jtreg/compiler/c2/TestMergeStores.java#L444-L456 > > The idea is that this would allow the introduction of a very simple API, without any "heavy" dependencies (Unsafe or ByteArrayLittleEndian): > > https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/test/hotspot/jtreg/compiler/c2/TestMergeStores.java#L327-L338 > https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/test/hotspot/jtreg/compiler/c2/TestMergeStores.java#L467-L472 > > **Details** > > This draft currently implements the optimization in an additional special IGVN phase: > https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/src/hotspot/share/opto/compile.cpp#L2479-L2485 > > We first collect all `StoreB|C|I`, and put them in the IGVN worklist (see `Compile::gather_nodes_for_merge_stores`). During IGVN, we call `StoreNode::Ideal_merge_stores` at the end (after all other optimizations) of `StoreNode::Ideal`. We essentially try to establish a chain of mergable stores: > > https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/src/hotspot/share/opto/memnode.cpp#L2802-L2806 > > Mergable stores must have the same Opcode (implies they have the same element type and hence size). Further, mergable stores must have the same control (or be separated by only a RangeCheck). Further, they must either both store constants, or adjacent segments of a larger value ... So far, I just ignored the big-endian world completely. 
No testing machine I have access to is big-endian (except maybe SPARC if I were to dig that up). So I guess that has to be addressed. Does anybody have a way I can test for big-endian machines? > Would this mean big-endian variants would not be candidates for this optimization? @cl4es would you really write this with decreasing indices in Java code? Or would you write it with increasing indices, but then flip around the bytes, probably shifting values differently? I am considering enabling this optimization only on little-endian machines; that would simplify my testing and still apply basically everywhere we care about. Correct me if I should care about big-endian, please. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16245#issuecomment-1898488016 From epeter at openjdk.org Thu Jan 18 13:36:16 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 18 Jan 2024 13:36:16 GMT Subject: RFR: 8291809: Convert compiler/c2/cr7200264/TestSSE2IntVect.java to IR verification test [v2] In-Reply-To: <9mdyA3D4IhjJeXo2xR6L1oc6p76d8jDJsgyeg41YhiA=.7a43845a-8faa-402b-a4a9-f335a3809df7@github.com> References: <9mdyA3D4IhjJeXo2xR6L1oc6p76d8jDJsgyeg41YhiA=.7a43845a-8faa-402b-a4a9-f335a3809df7@github.com> Message-ID: On Thu, 18 Jan 2024 13:06:53 GMT, Daniel Lundén wrote: >> And is that really necessary? >> I don't see the flag in the IR Framework whitelist: >> `test/hotspot/jtreg/compiler/lib/ir_framework/TestFramework.java` >> >> That would mean that if you do set the flag, then we would already disable the IR rules, but we could still run the tests, checking for correctness. >> >> TLDR: I believe we should just be able to remove this flag completely. > > Isn't that whitelist only for flags passed down through JTREG? For example, when I add `"-XX:LoopUnrollLimit=0"` (not in the whitelist) to the `runWithFlags` arguments, all the IR tests fail.
Ok then, let me explain IR Framework whitelists, and `runWithFlags`: Generally, we want a test to be run from the outside with as many flag combinations as possible, since some bugs only trigger with strange combinations. On the other hand, we need some way to say under what conditions an IR rule should be checked, because many flags might have side-effects on the IR, and disable all sorts of optimizations. We chose the whitelist approach: those flags are expected generally not to change the IR, or change it in a way that could also happen on other machines, and therefore must be allowed to simulate those machines (e.g. UseAVX). If you have a test that is not ok with any combination of the whitelisted flags, then the test must further restrict the IR rule with `applyIf` statements. Sometimes, we want to make sure that IR rules are ok with flag settings that are not allowed by the whitelist, for example `OptimizeFill` (it has an effect, but maybe an effect we want to check for in a specific test). Or maybe we want to overwrite some flag setting for specific reasons (put up some node limit, etc). In those cases, it can make sense to use `runWithFlags`. In the case of this test, I don't see why you would want to set `-XX:LoopUnrollLimit=0` via `runWithFlags`. Of course this makes all tests fail, because it messes up unrolling, and without unrolling you have no SuperWord, and without SuperWord you have no vectorization. But if anybody were to set this flag from the outside (via JTREG), then it would disable the IR rules implicitly (because the flag is not on the whitelist), and the jtreg test would pass as a whole. Does that make sense, and help? 
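
The whitelist behaviour described above can be pictured with a toy model. This is not the IR framework's code: `IrRuleGate`, `flagName`, and `shouldVerifyIr` are hypothetical names, and the two-entry whitelist is an assumption for illustration (only `UseAVX` is named in this thread; the real list lives in `TestFramework.java`).

```java
import java.util.Set;

// Toy model of the whitelist decision; all names here are hypothetical.
final class IrRuleGate {
    // Assumed excerpt for illustration, not the framework's actual list.
    private static final Set<String> WHITELIST = Set.of("UseAVX", "UseSSE");

    // Reduces "-XX:UseAVX=2" or "-XX:+SomeFlag" to the bare flag name.
    static String flagName(String flag) {
        String s = flag.startsWith("-XX:") ? flag.substring(4) : flag;
        if (s.startsWith("+") || s.startsWith("-")) {
            s = s.substring(1);
        }
        int eq = s.indexOf('=');
        return eq >= 0 ? s.substring(0, eq) : s;
    }

    // IR rules are checked only if every externally passed flag is whitelisted;
    // otherwise the tests still run, but without IR matching.
    static boolean shouldVerifyIr(String... externalFlags) {
        for (String flag : externalFlags) {
            if (!WHITELIST.contains(flagName(flag))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(shouldVerifyIr("-XX:UseAVX=2"));
        System.out.println(shouldVerifyIr("-XX:UseAVX=2", "-XX:LoopUnrollLimit=0"));
    }
}
```

In this model, flags passed via `runWithFlags` would bypass the gate entirely, which matches the observation that adding `-XX:LoopUnrollLimit=0` there makes the IR rules fail rather than be skipped.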
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17428#discussion_r1457440109 From chagedorn at openjdk.org Thu Jan 18 13:36:17 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 18 Jan 2024 13:36:17 GMT Subject: RFR: 8291809: Convert compiler/c2/cr7200264/TestSSE2IntVect.java to IR verification test [v2] In-Reply-To: References: <1ys6VKnk5FRHBGmMwK10_yUNjZuqcvIO7Tk6iU8hboQ=.5d0f54e9-8263-4636-aba6-b528482a4c5a@github.com> Message-ID: On Thu, 18 Jan 2024 12:28:29 GMT, Emanuel Peter wrote: >> Would it perhaps be useful to create a separate RFE for your cleanup suggestions? > > What I think might eventually be much more helpful, is to add result verification into the IR framework, and not have to hand-code it everywhere. That would be worth investing some time in, because it would simplify future testing, and make sure that people generally have to do result verification (it gets forgotten too easily). > Would it perhaps be useful to create a separate RFE for your cleanup suggestions? Not sure if it's worth it. But generally, cleaning up tests is always a good thing. > What I think might eventually be much more helpful, is to add result verification into the IR framework, and not have to hand-code it everywhere. That would be worth investing some time in, because it would simplify future testing, and make sure that people generally have to do result verification (it gets forgotten too easily). That's a good point. We should provide some support for that at some point or even enforce it. 
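
As a rough idea of what shared result verification could look like, here is a minimal sketch of a helper that compares a computed array with a reference array and reports the first differing index. `ResultCheck` and `verify` are hypothetical names; nothing like this is claimed to exist in the IR framework.

```java
import java.util.Arrays;

// Illustrative sketch; ResultCheck and verify are hypothetical names.
final class ResultCheck {
    // Throws if actual differs from expected, naming the first mismatching index.
    static void verify(String name, int[] expected, int[] actual) {
        int idx = Arrays.mismatch(expected, actual); // -1 when the arrays are equal
        if (idx >= 0) {
            throw new RuntimeException(name + ": mismatch at index " + idx);
        }
    }

    public static void main(String[] args) {
        int[] reference = {1, 2, 3};
        verify("same", reference, new int[] {1, 2, 3});
        try {
            verify("different", reference, new int[] {1, 9, 3});
        } catch (RuntimeException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The reference array would typically come from a run that is known not to be compiled, but how that would be arranged inside the framework is left open here.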
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17428#discussion_r1457447107 From chagedorn at openjdk.org Thu Jan 18 13:36:22 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 18 Jan 2024 13:36:22 GMT Subject: RFR: 8291809: Convert compiler/c2/cr7200264/TestSSE2IntVect.java to IR verification test [v2] In-Reply-To: References: Message-ID: On Thu, 18 Jan 2024 11:52:59 GMT, Emanuel Peter wrote: >> Daniel Lund?n has updated the pull request incrementally with one additional commit since the last revision: >> >> Refactor test to use multiple @Test > > test/hotspot/jtreg/compiler/c2/cr7200264/TestIntVect.java line 510: > >> 508: } >> 509: >> 510: void test_divc(int[] a0, int[] a1) { > > Suggestion: > > // Not vectorized: no vector div. Might vectorize after JDK-8282365 (transform div to mul/add/shift). > void test_divc(int[] a0, int[] a1) { I was curious about that and it actually does: TraceNewVectors [SuperWord]: 744 LoadVector === 380 693 677 [[ 675 671 669 664 660 658 558 554 552 145 150 154 ]] @int[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=5; mismatched #vectorx[4]:{int} !orig=[676],[559],[134] !jvms: Test::test_divc @ bci:12 (line 36) TraceNewVectors [SuperWord]: 746 RShiftVI === _ 744 745 [[ 668 657 551 155 ]] #vectorx[4]:{int} !orig=[669],[552],[154] !jvms: Test::test_divc @ bci:15 (line 36) TraceNewVectors [SuperWord]: 747 VectorCastI2X === _ 744 [[ 674 663 557 146 ]] #vectory[4]:{long} !orig=[675],[558],[145] !jvms: Test::test_divc @ bci:15 (line 36) TraceNewVectors [SuperWord]: 748 Replicate === _ 144 [[ ]] #vectory[4]:{long} TraceNewVectors [SuperWord]: 749 MulVL === _ 747 748 [[ 673 662 556 148 ]] #vectory[4]:{long} !orig=[674],[557],[146] !jvms: Test::test_divc @ bci:15 (line 36) TraceNewVectors [SuperWord]: 751 RShiftVL === _ 749 750 [[ 672 661 555 149 ]] #vectory[4]:{long} !orig=[673],[556],[148] !jvms: Test::test_divc @ bci:15 (line 36) TraceNewVectors [SuperWord]: 752 VectorCastL2X 
=== _ 751 [[ 671 660 554 150 ]] #vectorx[4]:{int} !orig=[672],[555],[149] !jvms: Test::test_divc @ bci:15 (line 36) TraceNewVectors [SuperWord]: 753 AddVI === _ 752 744 [[ 670 659 553 152 ]] #vectorx[4]:{int} !orig=[671],[554],[150] !jvms: Test::test_divc @ bci:15 (line 36) TraceNewVectors [SuperWord]: 755 RShiftVI === _ 753 754 [[ 668 657 551 155 ]] #vectorx[4]:{int} !orig=[670],[553],[152] !jvms: Test::test_divc @ bci:15 (line 36) TraceNewVectors [SuperWord]: 756 SubVI === _ 755 746 [[ 666 656 549 176 ]] #vectorx[4]:{int} !orig=[668],[551],[155] !jvms: Test::test_divc @ bci:15 (line 36) TraceNewVectors [SuperWord]: 757 StoreVector === 687 693 667 756 [[ 374 693 372 179 ]] @int[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=5; mismatched Memory: @int[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=5; !orig=[666],[549],[176],575 !jvms: Test::test_divc @ bci:16 (line 36) I have not checked any other methods but it might indeed be possible to vectorize some them. I think it's a good idea to check all methods and add a comment with a short explanation why it's not possible or if there are plans to support vectorization in the future. All these tests look like a good collection of (seemingly good) vectorization opportunities. Thanks @eme64 for your help with that. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17428#discussion_r1457441230 From dlunden at openjdk.org Thu Jan 18 14:08:12 2024 From: dlunden at openjdk.org (Daniel =?UTF-8?B?THVuZMOpbg==?=) Date: Thu, 18 Jan 2024 14:08:12 GMT Subject: RFR: 8291809: Convert compiler/c2/cr7200264/TestSSE2IntVect.java to IR verification test [v2] In-Reply-To: References: <9mdyA3D4IhjJeXo2xR6L1oc6p76d8jDJsgyeg41YhiA=.7a43845a-8faa-402b-a4a9-f335a3809df7@github.com> Message-ID: On Thu, 18 Jan 2024 13:27:40 GMT, Emanuel Peter wrote: >> Isn't that whitelist only for flags passed down through JTREG? 
For example, when I add `"-XX:LoopUnrollLimit=0"` (not in the whitelist) to the `runWithFlags` arguments, all the IR tests fail. > > Ok then, let me explain IR Framework whitelists, and `runWithFlags`: > Generally, we want a test to be run from the outside with as many flag combinations as possible, since some bugs only trigger with strange combinations. > On the other hand, we need some way to say under what conditions an IR rule should be checked, because many flags might have side-effects on the IR, and disable all sorts of optimizations. > We chose the whitelist approach: those flags are expected generally not to change the IR, or change it in a way that could also happen on other machines, and therefore must be allowed to simulate those machines (e.g. UseAVX). > If you have a test that is not ok with any combination of the whitelisted flags, then the test must further restrict the IR rule with `applyIf` statements. > > Sometimes, we want to make sure that IR rules are ok with flag settings that are not allowed by the whitelist, for example `OptimizeFill` (it has an effect, but maybe an effect we want to check for in a specific test). Or maybe we want to overwrite some flag setting for specific reasons (put up some node limit, etc). In those cases, it can make sense to use `runWithFlags`. > > In the case of this test, I don't see why you would want to set `-XX:LoopUnrollLimit=0` via `runWithFlags`. Of course this makes all tests fail, because it messes up unrolling, and without unrolling you have no SuperWord, and without SuperWord you have no vectorization. > But if anybody were to set this flag from the outside (via JTREG), then it would disable the IR rules implicitly (because the flag is not on the whitelist), and the jtreg test would pass as a whole. > > Does that make sense, and help? Yes, thanks, that corresponds to my understanding of the framework. 
Sorry if I was unclear, we of course do not want to include `-XX:LoopUnrollLimit=0`, that was just an example to illustrate the difference between whitelisting and `runWithFlags` (and what originally spawned this issue: https://bugs.openjdk.org/browse/JDK-8291809). The question is if we want to include `-XX:StressLongCountedLoop=0` in `runWithFlags`, as that flag was part of the original test. I'm fine with removing it if it no longer makes sense in the IR framework. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17428#discussion_r1457489284 From tholenstein at openjdk.org Thu Jan 18 14:44:25 2024 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Thu, 18 Jan 2024 14:44:25 GMT Subject: RFR: JDK-8210858: AArch64: remove Math.log intrinsic [v2] In-Reply-To: <7Zgt8VOxl48Niocueb70XwxmYIAIaEZUlxKGvPP6BnU=.e354f71c-d6c6-40b9-a93e-98ef7ba8009d@github.com> References: <7Zgt8VOxl48Niocueb70XwxmYIAIaEZUlxKGvPP6BnU=.e354f71c-d6c6-40b9-a93e-98ef7ba8009d@github.com> Message-ID: > [JDK-8215133](https://bugs.openjdk.org/browse/JDK-8215133) disabled vmIntrinsics::_dlog. Remove it now. > > ### Why remove > > The Java specification says: > > "The computed result must be within 1 ulp of the exact result. Results must be semi-monotonic... whenever the mathematical function is non-decreasing, so is the floating-point approximation, likewise, whenever the mathematical function is non-increasing, so is the floating-point approximation" > > There is no proof of the monotonicity of this intrinsic at the moment.
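The semi-monotonicity requirement quoted above can be probed empirically. The following is a minimal sketch and not part of the PR: `MonotonicityProbe` and `countViolations` are hypothetical names, and a scan like this can only turn up counterexamples; it can never prove that an implementation is semi-monotonic.

```java
import java.util.function.DoubleUnaryOperator;

public class MonotonicityProbe {
    /**
     * Walks `steps` adjacent doubles upward from `start` and counts how often
     * f decreases between two neighbouring inputs. For a mathematically
     * non-decreasing function such as log, a semi-monotonic implementation
     * yields 0 on any scanned range.
     */
    public static int countViolations(DoubleUnaryOperator f, double start, int steps) {
        int violations = 0;
        double x = start;
        double fx = f.applyAsDouble(x);
        for (int i = 0; i < steps; i++) {
            double next = Math.nextUp(x);          // the adjacent representable double
            double fnext = f.applyAsDouble(next);
            if (fnext < fx) {
                violations++;                      // result went down although input went up
            }
            x = next;
            fx = fnext;
        }
        return violations;
    }

    public static void main(String[] args) {
        // Scans one small range only; the range and step count are arbitrary choices.
        System.out.println("Math.log violations: "
                + countViolations(Math::log, 1.0, 1_000_000));
    }
}
```

A scan of this kind covers only a tiny fraction of the double range, which is why the absence of a proof matters for the intrinsic.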
Tobias Holenstein has updated the pull request incrementally with one additional commit since the last revision: remove MacroAssembler::fast_log() and generate_dlog() ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17480/files - new: https://git.openjdk.org/jdk/pull/17480/files/a0240c4d..f1eaee30 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17480&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17480&range=00-01 Stats: 384 lines in 3 files changed: 0 ins; 384 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/17480.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17480/head:pull/17480 PR: https://git.openjdk.org/jdk/pull/17480 From tholenstein at openjdk.org Thu Jan 18 14:44:27 2024 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Thu, 18 Jan 2024 14:44:27 GMT Subject: RFR: JDK-8210858: AArch64: remove Math.log intrinsic In-Reply-To: <2oiPyet4xf0IemXgcS0w_Fkk6AJ91dofrGcQBhxmsZU=.fd0af5b5-8d83-4dc3-96f7-b8608fd9bd87@github.com> References: <7Zgt8VOxl48Niocueb70XwxmYIAIaEZUlxKGvPP6BnU=.e354f71c-d6c6-40b9-a93e-98ef7ba8009d@github.com> <2oiPyet4xf0IemXgcS0w_Fkk6AJ91dofrGcQBhxmsZU=.fd0af5b5-8d83-4dc3-96f7-b8608fd9bd87@github.com> Message-ID: On Thu, 18 Jan 2024 09:26:59 GMT, Nick Gasson wrote: > Should we also remove `MacroAssembler::fast_log()` and `generate_dlog()` as they are unused now? you are right. 
I removed them as well ------------- PR Comment: https://git.openjdk.org/jdk/pull/17480#issuecomment-1898609648 From epeter at openjdk.org Thu Jan 18 14:55:15 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 18 Jan 2024 14:55:15 GMT Subject: RFR: 8291809: Convert compiler/c2/cr7200264/TestSSE2IntVect.java to IR verification test [v2] In-Reply-To: References: <9mdyA3D4IhjJeXo2xR6L1oc6p76d8jDJsgyeg41YhiA=.7a43845a-8faa-402b-a4a9-f335a3809df7@github.com> Message-ID: On Thu, 18 Jan 2024 14:05:09 GMT, Daniel Lundén wrote: >> Ok then, let me explain IR Framework whitelists, and `runWithFlags`: >> Generally, we want a test to be run from the outside with as many flag combinations as possible, since some bugs only trigger with strange combinations. >> On the other hand, we need some way to say under what conditions an IR rule should be checked, because many flags might have side-effects on the IR, and disable all sorts of optimizations. >> We chose the whitelist approach: those flags are expected generally not to change the IR, or change it in a way that could also happen on other machines, and therefore must be allowed to simulate those machines (e.g. UseAVX). >> If you have a test that is not ok with any combination of the whitelisted flags, then the test must further restrict the IR rule with `applyIf` statements. >> >> Sometimes, we want to make sure that IR rules are ok with flag settings that are not allowed by the whitelist, for example `OptimizeFill` (it has an effect, but maybe an effect we want to check for in a specific test). Or maybe we want to overwrite some flag setting for specific reasons (put up some node limit, etc). In those cases, it can make sense to use `runWithFlags`. >> >> In the case of this test, I don't see why you would want to set `-XX:LoopUnrollLimit=0` via `runWithFlags`.
Of course this makes all tests fail, because it messes up unrolling, and without unrolling you have no SuperWord, and without SuperWord you have no vectorization. >> But if anybody were to set this flag from the outside (via JTREG), then it would disable the IR rules implicitly (because the flag is not on the whitelist), and the jtreg test would pass as a whole. >> >> Does that make sense, and help? > > Yes, thanks, that corresponds to my understanding of the framework. Sorry if I was unclear, we of course do not want to include `-XX:LoopUnrollLimit=0`, that was just an example to illustrate the difference between whitelisting and `runWithFlags` (and what originally spawned this issue: https://bugs.openjdk.org/browse/JDK-8291809). > > The question is if we want to include `-XX:StressLongCountedLoop=0` in `runWithFlags`, as that flag was part of the original test. I'm fine with removing it if it no longer makes sense in the IR framework. Ok, great :) I assume that `-XX:StressLongCountedLoop=0` was there before, because if it was not, then one could make the test fail with JTREG, by for example passing `-XX:StressLongCountedLoop=1000`, right? But of course you would have to guard against all sorts of flags like this. For example `-Xint`. I just ran it with that flag, and it failed! :rofl: TLDR: this is an old mechanism, and a bad one. So please remove the flag ;) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17428#discussion_r1457554234 From redestad at openjdk.org Thu Jan 18 15:26:23 2024 From: redestad at openjdk.org (Claes Redestad) Date: Thu, 18 Jan 2024 15:26:23 GMT Subject: RFR: 8318446: C2: implement StoreNode::Ideal_merge_stores In-Reply-To: References: Message-ID: On Wed, 18 Oct 2023 13:04:35 GMT, Emanuel Peter wrote: > This is a feature requested by @RogerRiggs and @cl4es. > > **Idea** > > Merging multiple consecutive small stores (e.g. 8 byte stores) into larger stores (e.g. one long store) can lead to a speedup.
Recently, @cl4es and @RogerRiggs had to review a few PR's where people would try to get speedups by using Unsafe (e.g. `Unsafe.putLongUnaligned`), or ByteArrayLittleEndian (e.g. `ByteArrayLittleEndian.setLong`). They have asked if we can do such an optimization in C2, rather than in the Java library code, or even user code. > > This patch here supports a few simple use-cases, like these: > > Merge consecutive array stores, with constants. We can combine the separate constants into a larger constant: > https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/test/hotspot/jtreg/compiler/c2/TestMergeStores.java#L383-L395 > > Merge consecutive array stores, with a variable that was split (using shifts). We can essentially undo the splitting (i.e. shifting and truncation), and directly store the variable: > https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/test/hotspot/jtreg/compiler/c2/TestMergeStores.java#L444-L456 > > The idea is that this would allow the introduction of a very simple API, without any "heavy" dependencies (Unsafe or ByteArrayLittleEndian): > > https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/test/hotspot/jtreg/compiler/c2/TestMergeStores.java#L327-L338 > https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/test/hotspot/jtreg/compiler/c2/TestMergeStores.java#L467-L472 > > **Details** > > This draft currently implements the optimization in an additional special IGVN phase: > https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/src/hotspot/share/opto/compile.cpp#L2479-L2485 > > We first collect all `StoreB|C|I`, and put them in the IGVN worklist (see `Compile::gather_nodes_for_merge_stores`). During IGVN, we call `StoreNode::Ideal_merge_stores` at the end (after all other optimizations) of `StoreNode::Ideal`. 
We essentially try to establish a chain of mergable stores: > > https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/src/hotspot/share/opto/memnode.cpp#L2802-L2806 > > Mergable stores must have the same Opcode (implies they have the same element type and hence size). Further, mergable stores must have the same control (or be separated by only a RangeCheck). Further, they must either both store constants, or adjacent segments of a larger value ... I probably wouldn't write with decreasing indices, no, and if it'd be a lot of complexity I can see that it's not worth it. And while I'm not suggesting you need to care for big-endian _hardware_ it would be good to have microbenchmarks that explicitly use and write in big-endian. I guess these do get a similar speed-up with your patch, but some verification would be great. Big-endian is quite common in networking protocols after all. If it helps testing and verification I'd say only enabling this optimization on little-endian HW is fine if you can't find hardware or delegate to others to verify correctness and that there are equivalent speed-ups. Someone maintaining a big-endian platform should be able to test and verify as a follow-up, which might be better to not block progress here. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16245#issuecomment-1898688697 From dcubed at openjdk.org Thu Jan 18 15:29:27 2024 From: dcubed at openjdk.org (Daniel D. Daugherty) Date: Thu, 18 Jan 2024 15:29:27 GMT Subject: [jdk22] RFR: 8324074: increase timeout for jvmci test TestResolvedJavaMethod.java Message-ID: A trivial fix to increase the default timeout for compiler/jvmci/jdk.vm.ci.runtime.test/src/jdk/vm/ci/runtime/test/TestResolvedJavaMethod.java to 240 seconds. See the bug report for the gory timeout details. 
------------- Commit messages: - Backport aeb304b29eaaba2b7a8fef85ee46cbfca27dbfbe Changes: https://git.openjdk.org/jdk22/pull/91/files Webrev: https://webrevs.openjdk.org/?repo=jdk22&pr=91&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8324074 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk22/pull/91.diff Fetch: git fetch https://git.openjdk.org/jdk22.git pull/91/head:pull/91 PR: https://git.openjdk.org/jdk22/pull/91 From rgiulietti at openjdk.org Thu Jan 18 15:35:39 2024 From: rgiulietti at openjdk.org (Raffaello Giulietti) Date: Thu, 18 Jan 2024 15:35:39 GMT Subject: RFR: 8282365: Consolidate and improve division by constant idealizations [v44] In-Reply-To: References: Message-ID: On Tue, 9 Jan 2024 17:27:38 GMT, Quan Anh Mai wrote: >> This patch implements idealisation for unsigned divisions to change a division by a constant into a series of multiplication and shift. I also change the idealisation of `DivI` to get a more efficient series when the magic constant overflows an int32. >> >> In general, the idea behind a signed division transformation is that for a positive constant `d`, we would need to find constants `c` and `m` so that: >> >> floor(x / d) = floor(x * c / 2**m) for 0 < x < 2**(N - 1) (1) >> ceil(x / d) = floor(x * c / 2**m) + 1 for -2**(N - 1) <= x < 0 (2) >> >> The implementation in the original book takes into consideration that the machine may not be able to perform the full multiplication `x * c`, so the constant overflow and we need to add back the dividend as in `DivLNode::Ideal` cases. However, for int32 division, `x * c` cannot overflow an int64. As a result, it is always feasible to just calculate the product and shift the result. >> >> For unsigned multiplication, the situation is somewhat trickier because the condition needs to be twice as strong (the condition (1) and (2) above are mostly the same). 
As a result, the magic constant `c` calculated with the method presented in Hacker's Delight by Henry S. Warren, Jr. may overflow a uintN. For int division, we can depend on the theorem devised by Arch D. Robison in N-Bit Unsigned Division Via N-Bit Multiply-Add, which states that there exists either: >> >> c1 in uint32 and m1, such that floor(x / d) = floor(x * c1 / 2**m1) for 0 < x < 2**32 (3) >> c2 in uint32 and m2, such that floor(x / d) = floor((x + 1) * c2 / 2**m2) for 0 < x < 2**32 (4) >> >> which means that either `x * c1` never overflows a uint64 or `(x + 1) * c2` never overflows a uint64, so we can perform a full multiplication. >> >> For longs, there is no way to do a full multiplication so we do some basic transformations to achieve a computable formula. I have written the details as comments in the overflow case. >> >> More tests are added to cover the possible patterns. >> >> Please take a look and review. Thank you very much. > > Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: > > update include order and license year src/hotspot/share/opto/divconstants.cpp line 122: > 120: // c * d - m is the intersection of (0, m / v_neg] and (0, m / v_pos). Which is (0, m / v_pos) > 121: // if v_pos >= v_neg and (0, m / v_neg] otherwise. > 122: // The analysis seems correct.
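For readers following the signed int32 case (conditions (1) and (2) in the quoted description), here is a minimal sketch of the multiply-and-shift idea. It is not the code from `divconstants.cpp`: the class and method names are hypothetical, and the constants are picked with a simple textbook-style choice, `m = 31 + ceil(log2(d))` and `c = floor(2**m / d) + 1`, assumed here only for divisors `2 <= d <= Integer.MAX_VALUE`. With that choice `c` stays below `2**32`, so `x * c` fits in an int64, as the description notes.

```java
public class DivByConstSketch {
    /** Shift amount m = 31 + ceil(log2(d)), assuming a divisor d >= 2. */
    public static int shiftFor(int d) {
        int log2Ceil = 32 - Integer.numberOfLeadingZeros(d - 1);
        return 31 + log2Ceil;
    }

    /** Magic constant c = floor(2**m / d) + 1, assuming a divisor d >= 2. */
    public static long magicFor(int d) {
        return (1L << shiftFor(d)) / d + 1;
    }

    /** Computes x / d with Java's truncating semantics, using multiply and shift only. */
    public static int divide(int x, int d) {
        long product = (long) x * magicFor(d);        // full 64-bit product, no overflow
        int floor = (int) (product >> shiftFor(d));   // arithmetic shift rounds toward -infinity
        return x < 0 ? floor + 1 : floor;             // condition (2): add 1 for negative x
    }

    public static void main(String[] args) {
        int d = 7;
        for (int x : new int[] {100, -100, Integer.MAX_VALUE, Integer.MIN_VALUE}) {
            System.out.println(x + " / " + d + " = " + divide(x, d)
                    + " (expected " + (x / d) + ")");
        }
    }
}
```

In a compiler the constant and shift would be computed once at compile time; they are recomputed per call here only to keep the sketch short.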
src/hotspot/share/opto/divconstants.cpp line 176: > 174: qv = qv * 2; > 175: rv = new_rv; > 176: } One could perhaps avoid overflows in computing `rc` and `rv`, and simplify the corresponding tests, like so (not sure if this improves anything in practical terms, though): Suggestion: if (d - rc < rc) { // 2 * rc > d c_ovf = c > min_signed; c += c - 1; rc -= d - rc; // rc = 2 * rc - d } else { // 2 * rc <= d c_ovf = c >= min_signed; c += c; rc += rc; // rc = 2 * rc } if (rv >= v - rv) { // 2 * rv >= v qv_ovf = qv >= min_signed; qv += qv + 1; rv -= v - rv; // rv = 2 * rv - v } else { // 2 * rv < v qv_ovf = qv >= min_signed; qv += qv; rv += rv; // rv = 2 * rv } ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/9947#discussion_r1457605608 PR Review Comment: https://git.openjdk.org/jdk/pull/9947#discussion_r1457610041 From ayang at openjdk.org Thu Jan 18 15:43:24 2024 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Thu, 18 Jan 2024 15:43:24 GMT Subject: [jdk22] RFR: 8324074: increase timeout for jvmci test TestResolvedJavaMethod.java In-Reply-To: References: Message-ID: On Thu, 18 Jan 2024 15:21:40 GMT, Daniel D. Daugherty wrote: > A trivial fix to increase the default timeout for > compiler/jvmci/jdk.vm.ci.runtime.test/src/jdk/vm/ci/runtime/test/TestResolvedJavaMethod.java > to 240 seconds. > > See the bug report for the gory timeout details. Marked as reviewed by ayang (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk22/pull/91#pullrequestreview-1829973463 From thartmann at openjdk.org Thu Jan 18 15:43:25 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 18 Jan 2024 15:43:25 GMT Subject: [jdk22] RFR: 8324074: increase timeout for jvmci test TestResolvedJavaMethod.java In-Reply-To: References: Message-ID: On Thu, 18 Jan 2024 15:21:40 GMT, Daniel D. 
Daugherty wrote: > A trivial fix to increase the default timeout for > compiler/jvmci/jdk.vm.ci.runtime.test/src/jdk/vm/ci/runtime/test/TestResolvedJavaMethod.java > to 240 seconds. > > See the bug report for the gory timeout details. Marked as reviewed by thartmann (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk22/pull/91#pullrequestreview-1829975209 From dcubed at openjdk.org Thu Jan 18 15:43:26 2024 From: dcubed at openjdk.org (Daniel D. Daugherty) Date: Thu, 18 Jan 2024 15:43:26 GMT Subject: [jdk22] RFR: 8324074: increase timeout for jvmci test TestResolvedJavaMethod.java In-Reply-To: References: Message-ID: On Thu, 18 Jan 2024 15:36:18 GMT, Albert Mingkun Yang wrote: >> A trivial fix to increase the default timeout for >> compiler/jvmci/jdk.vm.ci.runtime.test/src/jdk/vm/ci/runtime/test/TestResolvedJavaMethod.java >> to 240 seconds. >> >> See the bug report for the gory timeout details. > > Marked as reviewed by ayang (Reviewer). @albertnetymk and @TobiHartmann - Thanks for the fast reviews! ------------- PR Comment: https://git.openjdk.org/jdk22/pull/91#issuecomment-1898718834 From dcubed at openjdk.org Thu Jan 18 15:43:26 2024 From: dcubed at openjdk.org (Daniel D. Daugherty) Date: Thu, 18 Jan 2024 15:43:26 GMT Subject: [jdk22] Integrated: 8324074: increase timeout for jvmci test TestResolvedJavaMethod.java In-Reply-To: References: Message-ID: On Thu, 18 Jan 2024 15:21:40 GMT, Daniel D. Daugherty wrote: > A trivial fix to increase the default timeout for > compiler/jvmci/jdk.vm.ci.runtime.test/src/jdk/vm/ci/runtime/test/TestResolvedJavaMethod.java > to 240 seconds. > > See the bug report for the gory timeout details. This pull request has now been integrated. Changeset: 73c77d96 Author: Daniel D. 
Daugherty URL: https://git.openjdk.org/jdk22/commit/73c77d962482e7a632a562414f6e5beeeb74572c Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod 8324074: increase timeout for jvmci test TestResolvedJavaMethod.java Reviewed-by: ayang, thartmann Backport-of: aeb304b29eaaba2b7a8fef85ee46cbfca27dbfbe ------------- PR: https://git.openjdk.org/jdk22/pull/91 From dlunden at openjdk.org Thu Jan 18 15:52:15 2024 From: dlunden at openjdk.org (Daniel Lundén) Date: Thu, 18 Jan 2024 15:52:15 GMT Subject: RFR: 8291809: Convert compiler/c2/cr7200264/TestSSE2IntVect.java to IR verification test [v2] In-Reply-To: References: <9mdyA3D4IhjJeXo2xR6L1oc6p76d8jDJsgyeg41YhiA=.7a43845a-8faa-402b-a4a9-f335a3809df7@github.com> Message-ID: On Thu, 18 Jan 2024 14:51:41 GMT, Emanuel Peter wrote: >> Yes, thanks, that corresponds to my understanding of the framework. Sorry if I was unclear, we of course do not want to include `-XX:LoopUnrollLimit=0`, that was just an example to illustrate the difference between whitelisting and `runWithFlags` (and what originally spawned this issue: https://bugs.openjdk.org/browse/JDK-8291809). >> >> The question is if we want to include `-XX:StressLongCountedLoop=0` in `runWithFlags`, as that flag was part of the original test. I'm fine with removing it if it no longer makes sense in the IR framework. > > Ok, great :) > > I assume that `-XX:StressLongCountedLoop=0` was there before, because if it was not, then one could make the test fail with JTREG, by for example passing `-XX:StressLongCountedLoop=1000`, right? > > But of course you would have to guard against all sorts of flags like this. For example `-Xint`. I just ran it with that flag, and it failed! :rofl: > > TLDR: this is an old mechanism, and a bad one. So please remove the flag ;) OK, I'll remove it then!
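The whitelist behaviour discussed in this thread can be modelled roughly as follows. This is a toy model of the description given in the thread, not the IR framework's actual code: `IrRuleGateSketch`, `flagName` and `shouldCheckIrRules` are hypothetical names, and the one-entry whitelist is a stand-in (the thread names `UseAVX` as a whitelisted flag; the real list is longer). Flags passed through `runWithFlags` are deliberately not consulted, which matches the observation that `-XX:LoopUnrollLimit=0` given that way still has its IR rules checked.

```java
import java.util.List;
import java.util.Set;

public class IrRuleGateSketch {
    // Hypothetical stand-in for the framework's whitelist of externally allowed flags.
    static final Set<String> WHITELIST = Set.of("UseAVX");

    /** Extracts the flag name from e.g. "-XX:UseAVX=2" or "-XX:+OptimizeFill". */
    public static String flagName(String flag) {
        String s = flag.startsWith("-XX:") ? flag.substring(4) : flag;
        if (s.startsWith("+") || s.startsWith("-")) {
            s = s.substring(1);                 // drop the boolean on/off prefix
        }
        int eq = s.indexOf('=');
        return eq >= 0 ? s.substring(0, eq) : s;
    }

    /**
     * Models the rule from the thread: IR rules are checked only if every flag
     * supplied from the outside (e.g. via JTREG) is on the whitelist.
     */
    public static boolean shouldCheckIrRules(List<String> externalFlags) {
        return externalFlags.stream()
                .map(IrRuleGateSketch::flagName)
                .allMatch(WHITELIST::contains);
    }

    public static void main(String[] args) {
        System.out.println(shouldCheckIrRules(List.of("-XX:UseAVX=2")));
        System.out.println(shouldCheckIrRules(List.of("-XX:LoopUnrollLimit=0")));
    }
}
```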
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17428#discussion_r1457644493 From dlunden at openjdk.org Thu Jan 18 15:52:17 2024 From: dlunden at openjdk.org (Daniel Lundén) Date: Thu, 18 Jan 2024 15:52:17 GMT Subject: RFR: 8291809: Convert compiler/c2/cr7200264/TestSSE2IntVect.java to IR verification test [v2] In-Reply-To: References: Message-ID: On Thu, 18 Jan 2024 13:23:19 GMT, Christian Hagedorn wrote: >> Same comment here as for Christian's earlier comment: I do not touch the testing/verification code as part of this changeset (I only translate the previous ad-hoc IR checks to the IR framework). > > I generally agree with that but `System.exit(97)` seems quite odd. Maybe you want to change that to throw an exception instead. Fair enough, I'll change it to throw an exception. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17428#discussion_r1457645049 From epeter at openjdk.org Thu Jan 18 16:49:14 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 18 Jan 2024 16:49:14 GMT Subject: RFR: 8322743: assert(held_monitor_count() == jni_monitor_count()) failed [v2] In-Reply-To: References: Message-ID: On Wed, 17 Jan 2024 20:20:05 GMT, Vladimir Kozlov wrote: >> Corner case with a local (not escaped) object used for synchronization. C2 Escape Analysis thinks that it can eliminate locks for it. In most cases it is true but not in this case. >> >> >> for (int i = 0; i < 2; ++i) { >> Object o = new Object(); >> synchronized (o) { // monitorenter >> // Trigger OSR compilation >> for (int j = 0; j < 100_000; ++j) { >> >> The test has a nested loop which triggers OSR compilation. The locked object comes from the Interpreter into compiled OSR code. During parsing C2 creates another non escaped object and correctly merges both together (with Phi node) so that the non escaped object is not scalar replaceable.
Because it does not globally escape, EA still removes locks for it and, as a result, also for the merged locked object from the Interpreter, which is the bug. >> >> The fix is to check that the synchronized block does not have any associated escaped objects when EA decides if locks can be eliminated. >> >> Added regression test prepared by @TobiHartmann. Tested tier1-5, xcomp and stress. >> Performance testing shows no difference. > > Vladimir Kozlov has updated the pull request incrementally with one additional commit since the last revision: > > Address review comments I'll need to study EA from the ground up to really review this. So for now just a couple of questions: > During parsing C2 creates another non escaped object and correctly merges both together (with Phi node) Why do we create this non-escaped object during parsing? Is that the one that comes in through the OSR slot? What exactly makes this case different from another case where we just have a lock come in through the OSR slot, but no Phi that merges them with an object allocation inside the OSR body? src/hotspot/share/opto/escape.cpp line 2888: > 2886: * > 2887: * Return true if lock/unlock can be eliminated. > 2888: */ Suggestion: // The lock/unlock is unnecessary if we are locking a non-escaped object, // unless synchronized block (defined by BoxLock node) has other escaped objects // (for example, locked object come from Interpreter in OSR compilation). // // Return true if lock/unlock can be eliminated. This would be the first use in this file of a multi-line comment :man_shrugging: ------------- Changes requested by epeter (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/17331#pullrequestreview-1830123844 PR Review Comment: https://git.openjdk.org/jdk/pull/17331#discussion_r1457709717 From jbhateja at openjdk.org Thu Jan 18 17:10:33 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 18 Jan 2024 17:10:33 GMT Subject: RFR: 8322768: Optimize non-subword vector compress and expand APIs for AVX2 target. [v6] In-Reply-To: <_udOCEVG86x9V_WvYqFTaYnvmXdiZ7LzqxzR-D_ygYs=.db3ed37e-a8fa-413b-83c6-a785aba072ff@github.com> References: <_udOCEVG86x9V_WvYqFTaYnvmXdiZ7LzqxzR-D_ygYs=.db3ed37e-a8fa-413b-83c6-a785aba072ff@github.com> Message-ID: > Hi, > > Patch optimizes non-subword vector compress and expand APIs for x86 AVX2 only targets. > Upcoming E-core Xeons (Sierra Forest) and Hybrid CPUs only support AVX2 instruction set. > These are very frequently used APIs in columnar database filter operation. > > Implementation uses a lookup table to record permute indices. Table index is computed using > mask argument of compress/expand operation. > > Following are the performance number of JMH micro included with the patch. 
> > > System : Intel(R) Xeon(R) Platinum 8480+ (Sapphire Rapids) > > Baseline: > Benchmark (size) Mode Cnt Score Error Units > ColumnFilterBenchmark.filterDoubleColumn 1024 thrpt 2 142.767 ops/ms > ColumnFilterBenchmark.filterDoubleColumn 2047 thrpt 2 71.436 ops/ms > ColumnFilterBenchmark.filterDoubleColumn 4096 thrpt 2 35.992 ops/ms > ColumnFilterBenchmark.filterFloatColumn 1024 thrpt 2 182.151 ops/ms > ColumnFilterBenchmark.filterFloatColumn 2047 thrpt 2 91.096 ops/ms > ColumnFilterBenchmark.filterFloatColumn 4096 thrpt 2 44.757 ops/ms > ColumnFilterBenchmark.filterIntColumn 1024 thrpt 2 184.099 ops/ms > ColumnFilterBenchmark.filterIntColumn 2047 thrpt 2 91.981 ops/ms > ColumnFilterBenchmark.filterIntColumn 4096 thrpt 2 45.170 ops/ms > ColumnFilterBenchmark.filterLongColumn 1024 thrpt 2 148.017 ops/ms > ColumnFilterBenchmark.filterLongColumn 2047 thrpt 2 73.516 ops/ms > ColumnFilterBenchmark.filterLongColumn 4096 thrpt 2 36.844 ops/ms > > Withopt: > Benchmark (size) Mode Cnt Score Error Units > ColumnFilterBenchmark.filterDoubleColumn 1024 thrpt 2 2051.707 ops/ms > ColumnFilterBenchmark.filterDoubleColumn 2047 thrpt 2 914.072 ops/ms > ColumnFilterBenchmark.filterDoubleColumn 4096 thrpt 2 489.898 ops/ms > ColumnFilterBenchmark.filterFloatColumn 1024 thrpt 2 5324.195 ops/ms > ColumnFilterBenchmark.filterFloatColumn 2047 thrpt 2 2587.229 ops/ms > ColumnFilterBenchmark.filterFloatColumn 4096 thrpt 2 1278.665 ops/ms > ColumnFilterBenchmark.filterIntColumn 1024 thrpt 2 4149.384 ops/ms > ColumnFilterBenchmark.filterIntColumn 2047 thrpt 2 1791.170 ops/ms > ColumnFilterBenchmark.filterIntColumn 4096... 
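The lookup-table approach described in the quoted summary can be illustrated with a scalar model. This is only a sketch of the idea, not the stub code generated by the patch: `CompressLookupSketch` and its members are hypothetical names, the table layout (one row of permute indices per mask value, with `-1` marking an unused lane) is an assumption made for illustration, and only the four-lane long case is modelled. With four 64-bit entries a row occupies 32 bytes, so a row's byte offset is the mask shifted left by 5.

```java
public class CompressLookupSketch {
    static final int LANES = 4;                  // four 64-bit lanes in a 256-bit vector
    static final int[][] TABLE = buildTable();   // one row of permute indices per mask

    /** Builds, for every mask, the source lane of each result lane (-1 = no source). */
    static int[][] buildTable() {
        int[][] table = new int[1 << LANES][LANES];
        for (int mask = 0; mask < table.length; mask++) {
            int k = 0;
            for (int lane = 0; lane < LANES; lane++) {
                if ((mask & (1 << lane)) != 0) {
                    table[mask][k++] = lane;     // k-th selected lane moves to position k
                }
            }
            while (k < LANES) {
                table[mask][k++] = -1;           // remaining result lanes are zeroed
            }
        }
        return table;
    }

    /** Scalar model of compress: selected lanes are packed into the low lanes. */
    public static long[] compress(long[] src, int mask) {
        int[] row = TABLE[mask];                 // the table index is the mask itself
        long[] dst = new long[LANES];
        for (int k = 0; k < LANES; k++) {
            dst[k] = row[k] >= 0 ? src[row[k]] : 0L;
        }
        return dst;
    }

    /** Byte offset of a row when each of the four entries is 8 bytes wide. */
    public static int rowByteOffset(int mask) {
        return mask << 5;                        // 4 lanes * 8 bytes = 32 bytes per row
    }

    public static void main(String[] args) {
        long[] out = compress(new long[] {10, 20, 30, 40}, 0b1010);
        System.out.println(java.util.Arrays.toString(out));
    }
}
```

An expand operation would use the inverse mapping: each set mask bit, in lane order, receives the next consecutive source element.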
Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Space fixup ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17261/files - new: https://git.openjdk.org/jdk/pull/17261/files/c3f1c50e..3ed6b8bf Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17261&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17261&range=04-05 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/17261.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17261/head:pull/17261 PR: https://git.openjdk.org/jdk/pull/17261 From jbhateja at openjdk.org Thu Jan 18 17:10:35 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 18 Jan 2024 17:10:35 GMT Subject: RFR: 8322768: Optimize non-subword vector compress and expand APIs for AVX2 target. [v5] In-Reply-To: References: <_udOCEVG86x9V_WvYqFTaYnvmXdiZ7LzqxzR-D_ygYs=.db3ed37e-a8fa-413b-83c6-a785aba072ff@github.com> <4-XrsvK-2HpBV3neMmQQ5Q1A4FDOAnmyFtCkKKZcf2A=.32df7d9e-e399-4715-a6b5-f3f2e9c77150@github.com> Message-ID: On Tue, 16 Jan 2024 07:08:57 GMT, Emanuel Peter wrote: >> Each long/double permute lane holds 64 bit value. > > @jatin-bhateja so why do you shift by 5? I thought 4 longs are 32 bit? For long/double each permute row is 32 byte in size, so a shift by 5 to compute row address. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17261#discussion_r1457747672 From cslucas at openjdk.org Thu Jan 18 18:31:26 2024 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Thu, 18 Jan 2024 18:31:26 GMT Subject: RFR: JDK-8322572: AllocationMergesTests.java fails with "IRViolationException: There were one or multiple IR rule failures." [v2] In-Reply-To: References: Message-ID: > Please review this PR to fix a test in `AllocationMergesTest.java`. 
The test was failing intermittently because of a random value used as parameter was causing some randomization in the shape method's IR graph and consequently the optimization tested wasn't happening. > > Tested this locally on Mac, Win and Linux x86_64. Cesar Soares Lucas has updated the pull request incrementally with one additional commit since the last revision: fix spacing. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17469/files - new: https://git.openjdk.org/jdk/pull/17469/files/bfbb757c..868d988d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17469&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17469&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/17469.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17469/head:pull/17469 PR: https://git.openjdk.org/jdk/pull/17469 From cslucas at openjdk.org Thu Jan 18 18:57:27 2024 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Thu, 18 Jan 2024 18:57:27 GMT Subject: RFR: JDK-8322572: AllocationMergesTests.java fails with "IRViolationException: There were one or multiple IR rule failures." [v3] In-Reply-To: References: Message-ID: > Please review this PR to fix a test in `AllocationMergesTest.java`. The test was failing intermittently because of a random value used as parameter was causing some randomization in the shape method's IR graph and consequently the optimization tested wasn't happening. > > Tested this locally on Mac, Win and Linux x86_64. Cesar Soares Lucas has updated the pull request incrementally with one additional commit since the last revision: Fix copyright header date. 
------------- Changes: - all: https://git.openjdk.org/jdk/pull/17469/files - new: https://git.openjdk.org/jdk/pull/17469/files/868d988d..a5cd36be Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17469&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17469&range=01-02 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/17469.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17469/head:pull/17469 PR: https://git.openjdk.org/jdk/pull/17469 From kvn at openjdk.org Thu Jan 18 19:08:12 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 18 Jan 2024 19:08:12 GMT Subject: RFR: 8323795: jcmd Compiler.codecache should print total size of code cache [v2] In-Reply-To: References: Message-ID: <1bgn01MbhJtpBAsy7DGYqmQkLvHD5mQfgNFH0GJoYBQ=.5f2e5686-a65e-4c8a-b498-d8ead2c6e7c0@github.com> On Wed, 17 Jan 2024 03:02:27 GMT, Yi Yang wrote: >> CodeHeap 'non-profiled nmethods': size=118592Kb used=29Kb max_used=29Kb free=118562Kb >> bounds [0x00007fbe84622000, 0x00007fbe84892000, 0x00007fbe8b9f2000] >> CodeHeap 'profiled nmethods': size=118588Kb used=80Kb max_used=80Kb free=118507Kb >> bounds [0x00007fbe7c9f2000, 0x00007fbe7cc62000, 0x00007fbe83dc1000] >> CodeHeap 'non-nmethods': size=8580Kb used=1258Kb max_used=1834Kb free=7321Kb >> bounds [0x00007fbe83dc1000, 0x00007fbe84031000, 0x00007fbe84622000] >> total_blobs=474 nmethods=87 adapters=293 >> compilation: enabled >> stopped_count=0, restarted_count=0 >> full_count=0 >> >> >> It's better to accumulates total size of used/free/size, for example >> >> -SegmentedCodeCache >> CodeCache: size=245760Kb used=1366Kb max_used=1943Kb free=244393Kb >> bounds [0x00007fdcc89f2000, 0x00007fdcc8c62000, 0x00007fdcd79f2000] >> total_blobs=474, nmethods=87, adapters=293 >> stopped_count=0, restarted_count=0, full_count=0 >> compilation=enabled >> >> >> >> +SegmentedCodeCache >> CodeHeap 'non-profiled nmethods': size=118592Kb used=29Kb max_used=29Kb free=118562Kb >> bounds 
[0x00007f89c8622000, 0x00007f89c8892000, 0x00007f89cf9f2000] >> CodeHeap 'profiled nmethods': size=118588Kb used=80Kb max_used=80Kb free=118507Kb >> bounds [0x00007f89c09f2000, 0x00007f89c0c62000, 0x00007f89c7dc1000] >> CodeHeap 'non-nmethods': size=8580Kb used=1258Kb max_used=1834Kb free=7321Kb >> bounds [0x00007f89c7dc1000, 0x00007f89c8031000, 0x00007f89c8622000] >> CodeCache: size=245760Kb, used=1367Kb, max_used=1943Kb, free=244390Kb >> total_blobs=474, nmethods=87, adapters=293 >> stopped_count=0, restarted_count=0, full_count=0 >> compilation=enabled > > Yi Yang has updated the pull request incrementally with one additional commit since the last revision: > > new output && fix test `stopped_count` and `restarted_count` are related to compilation. That is why I suggested to print them on the same line: Compilation: enabled, stopped_count=0, restarted_count=0 `full_count` can be put on the line with numbers for methods and others. I am fine with last refactoring you did for different `SegmentedCodeCache` flag values. ------------- PR Comment: https://git.openjdk.org/jdk/pull/17445#issuecomment-1899051168 From kvn at openjdk.org Thu Jan 18 19:09:13 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 18 Jan 2024 19:09:13 GMT Subject: RFR: JDK-8322572: AllocationMergesTests.java fails with "IRViolationException: There were one or multiple IR rule failures." [v3] In-Reply-To: References: Message-ID: On Thu, 18 Jan 2024 18:57:27 GMT, Cesar Soares Lucas wrote: >> Please review this PR to fix a test in `AllocationMergesTest.java`. The test was failing intermittently because of a random value used as parameter was causing some randomization in the shape method's IR graph and consequently the optimization tested wasn't happening. >> >> Tested this locally on Mac, Win and Linux x86_64. > > Cesar Soares Lucas has updated the pull request incrementally with one additional commit since the last revision: > > Fix copyright header date. Good. 
-------------

Marked as reviewed by kvn (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/17469#pullrequestreview-1830385677

From ddong at openjdk.org Fri Jan 19 06:19:30 2024
From: ddong at openjdk.org (Denghui Dong)
Date: Fri, 19 Jan 2024 06:19:30 GMT
Subject: RFR: 8322694: C1: Handle Constant and IfOp in NullCheckEliminator [v2]
In-Reply-To:
References: <1vvyuwLRjlWItKwCyighjCSM5SNbO4CSEE59hQtCU24=.b4783e52-328a-4ce3-8c92-7b736cea7546@github.com>
Message-ID: <7HJhVuUEGFQ6jVQ8g4Q-I1Q7BJ_Usnu4d9wU82vXQ_Q=.f807ce13-fc1b-49bc-bfeb-48844af507b2@github.com>

On Wed, 3 Jan 2024 13:37:21 GMT, Denghui Dong wrote:

>> This patch adds support for Constant and IfOp in NullCheckEliminator to eliminate more null checks.
>>
>> testing: tier 1-4 no extra test failure
>
> Denghui Dong has updated the pull request incrementally with one additional commit since the last revision:
>
>   update

Hi,

I wrote a simple test to verify the IR.

My test:

    import java.util.Random;

    class Test {
        public static void main(String... args) throws Exception {
            Random r = new Random();
            for (;;) {
                getHash(r.nextInt());
            }
        }

        public static int getHash(int i) {
            String text = i % 2 == 1 ? "Hello" : "World";
            int result = text.hashCode();
            return result;
        }
    }

Run with `-XX:-Inline -XX:+PrintIR -XX:TieredStopAtLevel=1 -XX:CompileCommand=compileonly,Test::getHash Test` (fastdebug)

IR before code generation with this patch:

    IR before code generation
    B4 [0, 0] -> B5
    empty stack
    inlining depth 0
    __bci__use__tid____instr____________________________________
    . 0 0 20 std entry B5

    B5 (S) [0, 0] -> B0 dom B4 pred: B4
    empty stack
    inlining depth 0
    __bci__use__tid____instr____________________________________
    . 0 0 19 goto B0

    B0 (SV) [0, 21] dom B5 pred: B5
    empty stack
    inlining depth 0
    __bci__use__tid____instr____________________________________
      1 1 i5 2
    . 2 1 i6 i4 % i5
      3 1 i7 1
      4 1 a21
      4 1 a22
    . 4 3 a23 i6 != i7 ? a21 : a22
    . 16 0 a14 null_check(a23) (eliminated)
    . 16 2 i15 a23.invokespecial() java/lang/String.hashCode()I
    . 21 0 i16 ireturn i15

We can see that null_check is eliminated.

IR before code generation without this patch:

    IR before code generation
    B4 [0, 0] -> B5
    empty stack
    inlining depth 0
    __bci__use__tid____instr____________________________________
    . 0 0 20 std entry B5

    B5 (S) [0, 0] -> B0 dom B4 pred: B4
    empty stack
    inlining depth 0
    __bci__use__tid____instr____________________________________
    . 0 0 19 goto B0

    B0 (SV) [0, 21] dom B5 pred: B5
    empty stack
    inlining depth 0
    __bci__use__tid____instr____________________________________
      1 1 i5 2
    . 2 1 i6 i4 % i5
      3 1 i7 1
      4 1 a21
      4 1 a22
    . 4 3 a23 i6 != i7 ? a21 : a22
    . 16 0 a14 null_check(a23)
    . 16 2 i15 a23.invokespecial() java/lang/String.hashCode()I
    . 21 0 i16 ireturn i15

I have tested this patch with tier 1-4. There are some test failures caused by my environment.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/17191#issuecomment-1899826148
PR Comment: https://git.openjdk.org/jdk/pull/17191#issuecomment-1899829359

From rcastanedalo at openjdk.org Fri Jan 19 07:35:28 2024
From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano)
Date: Fri, 19 Jan 2024 07:35:28 GMT
Subject: RFR: JDK-8321984: IGV: Upgrade to Netbeans Platform 20 [v4]
In-Reply-To:
References:
Message-ID:

On Thu, 18 Jan 2024 11:50:29 GMT, Tobias Holenstein wrote:

>> Upgraded IGV and dependencies to the newest Netbeans Platform 20 which was released in December 2023.
>>
>> Tested that IGV still behaves as expected after the upgrade.
>
> Tobias Holenstein has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains two new commits since the last revision:
>
>  - make IGV build work with mainline JDK version
>  - remove requirement for nashorn

Thanks for addressing my earlier comment, Toby! Please update the copyright headers.
src/utils/IdealGraphVisualizer/pom.xml line 80: > 78: > 79: > 80: 17 I suggest being a bit more conservative (safe) here and specifying an upper bound for the JDK version, since we cannot guarantee that all IGV dependencies will be compatible with any future JDK release. I suggest sticking to the newest JDK supported by the NetBeans Platform (for NetBeans 20 that would be JDK 21, see https://netbeans.apache.org/front/main/download/nb20/). ------------- Changes requested by rcastanedalo (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/17106#pullrequestreview-1831764474 PR Review Comment: https://git.openjdk.org/jdk/pull/17106#discussion_r1458498114 From epeter at openjdk.org Fri Jan 19 07:46:28 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 19 Jan 2024 07:46:28 GMT Subject: RFR: 8322768: Optimize non-subword vector compress and expand APIs for AVX2 target. [v5] In-Reply-To: References: <_udOCEVG86x9V_WvYqFTaYnvmXdiZ7LzqxzR-D_ygYs=.db3ed37e-a8fa-413b-83c6-a785aba072ff@github.com> <4-XrsvK-2HpBV3neMmQQ5Q1A4FDOAnmyFtCkKKZcf2A=.32df7d9e-e399-4715-a6b5-f3f2e9c77150@github.com> Message-ID: On Thu, 18 Jan 2024 17:06:55 GMT, Jatin Bhateja wrote: >> @jatin-bhateja so why do you shift by 5? I thought 4 longs are 32 bit? > > For long/double each permute row is 32 byte in size, so a shift by 5 to compute row address. Ah right. Maybe we could say `32byte = 4 long = 4 * 64bit`. Because "64bit row" sounds like the whole row is only 64 bit long. It is actually the cells that are 64bits, not the rows! 
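
The row-size arithmetic discussed above can be sketched as follows. This is only an illustration under stated assumptions: the class, constant and method names are hypothetical and are not taken from the patch, and it assumes a 4-lane long/double vector whose permute row is selected by a 4-bit lane mask. It shows why a row of 4 cells of 64 bits each (32 bytes) leads to a shift by 5 when turning the mask into a byte offset.

```java
// Hypothetical sketch: names are illustrative and are not taken from the patch.
public class PermuteRowOffset {
    // Assumption: 4 long/double lanes per vector, so the lane mask has 4 bits.
    static final int LANES = 4;
    // One row holds one 64-bit cell per lane: 4 * 8 bytes = 32 bytes.
    static final int ROW_BYTES = LANES * Long.BYTES;
    // log2(32) = 5, which is the shift amount asked about in the review.
    static final int ROW_SHIFT = Integer.numberOfTrailingZeros(ROW_BYTES);

    // Byte offset of the row selected by the given lane mask.
    static int rowOffset(int mask) {
        return (mask & ((1 << LANES) - 1)) << ROW_SHIFT;
    }

    public static void main(String[] args) {
        for (int mask = 0; mask < (1 << LANES); mask++) {
            System.out.println("mask=" + mask + " -> byte offset " + rowOffset(mask));
        }
    }
}
```

Deriving the shift from the row size, rather than hard-coding 5, keeps the "32byte = 4 long = 4 * 64bit" relationship visible in one place.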
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/17261#discussion_r1458509886

From aph at openjdk.org Fri Jan 19 08:58:27 2024
From: aph at openjdk.org (Andrew Haley)
Date: Fri, 19 Jan 2024 08:58:27 GMT
Subject: RFR: 8324123: aarch64: fix prfm literal encoding in assembler
In-Reply-To:
References:
Message-ID: <4ax4z-xLJNI-LAyGEKct-lDDNMi_ZN2QV3ILYTjVeDM=.d61f5d47-10fd-4a94-aed5-e88bfe8553d3@github.com>

On Thu, 18 Jan 2024 10:02:59 GMT, Wang Zhuo wrote:

> Current prfm literal mode encoding in aarch64 assembler is not correct.
> The prfm_literal instruction requires bits 31 and 30 to be 0b11, while the current assembler encodes the two bits as 0b01, which is a ldr instruction, not prfm.
> For example, if adding the following code in stubGenerator
> __ prfm(Address(__ pc()))
> we get a ldr instruction like
> ldr x0, 0x0000ffff83f8539c
> but it should be a prfm instruction like
> prfm pldl1keep, 0x0000ffff8ff8539c
>
> The bug is caused in ld_st2, literal mode: bits 31 and 30 are set to (size & 0b01), while for prfm instructions bits 31 and 30 must be 0b11.
> void ld_st2(Register Rt, const Address &adr, int size, int op, int V = 0) {
>   starti;
>
>   f(V, 26); // general reg?
>   zrf(Rt, 0);
>
>   // Encoding for literal loads is done here (rather than pushed
>   // down into Address::encode) because the encoding of this
>   // instruction is too different from all of the other forms to
>   // make it worth sharing.
>   if (adr.getMode() == Address::literal) {
>     assert(size == 0b10 || size == 0b11, "bad operand size in ldr");
>     assert(op == 0b01, "literal form can only be used with loads");
>     f(**size & 0b01, 31, 30**), f(0b011, 29, 27), f(0b00, 25, 24);
>     int64_t offset = (adr.target() - pc()) >> 2;
>     sf(offset, 23, 5);
>     code_section()->relocate(pc(), adr.rspec());
>     return;
>   }
>
>   f(size, 31, 30);
>   f(op, 23, 22); // str
>   adr.encode(&current_insn);
> }

src/hotspot/cpu/aarch64/assembler_aarch64.hpp line 1585:

> 1583:       int64_t offset = (adr.target() - pc()) >> 2; \
> 1584:       sf(offset, 23, 5); \
> 1585:     } else { \

This looks reasonable, but we don't need it to be inline. See the examples of `adr` and `_adrp`.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/17482#discussion_r1458602894

From aph-open at littlepinkcloud.com Fri Jan 19 09:08:26 2024
From: aph-open at littlepinkcloud.com (Andrew Haley)
Date: Fri, 19 Jan 2024 09:08:26 +0000
Subject: discuss about release barrier for final fields initialization
In-Reply-To:
References: <5e41f867-b31b-4ded-b737-8d0c869c8895.kuaiwei.kw@alibaba-inc.com> <20965b9a-ec64-4d81-866e-b1a0a94ed1e0@littlepinkcloud.com> <822b0f93-ae9e-4196-8d20-1b87286b91d5.kuaiwei.kw@alibaba-inc.com> <502b509b-bf31-4eee-8468-3f2362d69da8@littlepinkcloud.com>
Message-ID:

On 1/18/24 10:49, Wei Kuai wrote:
> I tested "dmb.ishst; dmb.ishld" for release barrier. The test case is jmh of allocation with final fields. https://gist.github.com/kuaiwei/f71fba40df29991c93325a8600e34c13
> In N1
>   dmb.ish            : 1168.059 ops/s
>   dmb.ishst+dmb.ishld: 1321.783 ops/s
>   dmb.ishst          : 1511.267 ops/s
> In N2
>   dmb.ish            : 3672.087 ops/s
>   dmb.ishst+dmb.ishld: 4840.322 ops/s
>   dmb.ishst          : 6005.430 ops/s
>
> The "dmb.ishst+dmb.ishld" can gain 13% and 32% on N1 and N2. It looks like a better replacement for "dmb.ish"

Let's do that, then.
If you'd like to submit a patch that does this, please make sure to include the benchmark. We should also stop merging 'dmb ld' and 'dmb st' into 'dmb sy'. We still want to merge duplicated `dmb`s, but the side effect of strengthening is not good. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From ngasson at openjdk.org Fri Jan 19 09:45:28 2024 From: ngasson at openjdk.org (Nick Gasson) Date: Fri, 19 Jan 2024 09:45:28 GMT Subject: RFR: JDK-8210858: AArch64: remove Math.log intrinsic [v2] In-Reply-To: References: <7Zgt8VOxl48Niocueb70XwxmYIAIaEZUlxKGvPP6BnU=.e354f71c-d6c6-40b9-a93e-98ef7ba8009d@github.com> Message-ID: On Thu, 18 Jan 2024 14:44:25 GMT, Tobias Holenstein wrote: >> [JDK-8215133](https://bugs.openjdk.org/browse/JDK-8215133) disabled vmIntrinsics::_dlog. Remove it now >> >> ### Why remove >> >> That Java specification says: >> >> "The computed result must be within 1 ulp of the exact result. Results must be semi-monotonic... whenever the mathematical function is non-decreasing, so is the floating-point approximation, likewise, whenever the mathematical function is non-increasing, so is the floating-point approximation" >> >> There is no proof of the monotonicity of this intrinsics at the moment. > > Tobias Holenstein has updated the pull request incrementally with one additional commit since the last revision: > > remove MacroAssembler::fast_log() and generate_dlog() Marked as reviewed by ngasson (Reviewer). 
-------------

PR Review: https://git.openjdk.org/jdk/pull/17480#pullrequestreview-1832006444

From wei.kuai at gmail.com Fri Jan 19 10:05:45 2024
From: wei.kuai at gmail.com (Wei Kuai)
Date: Fri, 19 Jan 2024 18:05:45 +0800
Subject: discuss about release barrier for final fields initialization
In-Reply-To:
References: <5e41f867-b31b-4ded-b737-8d0c869c8895.kuaiwei.kw@alibaba-inc.com> <20965b9a-ec64-4d81-866e-b1a0a94ed1e0@littlepinkcloud.com> <822b0f93-ae9e-4196-8d20-1b87286b91d5.kuaiwei.kw@alibaba-inc.com> <502b509b-bf31-4eee-8468-3f2362d69da8@littlepinkcloud.com>
Message-ID:

I've submitted a JBS issue for it. https://bugs.openjdk.org/browse/JDK-8324186

Thanks,
Kuai Wei

On Fri, Jan 19, 2024 at 5:08 PM Andrew Haley wrote:

> On 1/18/24 10:49, Wei Kuai wrote:
> > I tested "dmb.ishst; dmb.ishld" for release barrier. The test case is jmh of allocation with final fields.
> > https://gist.github.com/kuaiwei/f71fba40df29991c93325a8600e34c13
> > In N1
> >   dmb.ish            : 1168.059 ops/s
> >   dmb.ishst+dmb.ishld: 1321.783 ops/s
> >   dmb.ishst          : 1511.267 ops/s
> > In N2
> >   dmb.ish            : 3672.087 ops/s
> >   dmb.ishst+dmb.ishld: 4840.322 ops/s
> >   dmb.ishst          : 6005.430 ops/s
> >
> > The "dmb.ishst+dmb.ishld" can gain 13% and 32% on N1 and N2. It looks a better replacement for "dmb.ish"
>
> Let's do that, then.
>
> If you'd like to submit a patch that does this, please make sure to include the benchmark.
>
> We should also stop merging 'dmb ld' and 'dmb st' into 'dmb sy'. We still want to merge duplicated `dmb`s, but the side effect of strengthening is not good.
>
> --
> Andrew Haley (he/him)
> Java Platform Lead Engineer
> Red Hat UK Ltd.
> https://keybase.io/andrewhaley
> EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671

From epeter at openjdk.org Fri Jan 19 13:18:30 2024
From: epeter at openjdk.org (Emanuel Peter)
Date: Fri, 19 Jan 2024 13:18:30 GMT
Subject: RFR: 8318446: C2: optimize stores into primitive arrays by combining values into larger store
In-Reply-To:
References:
Message-ID:

On Wed, 18 Oct 2023 13:04:35 GMT, Emanuel Peter wrote:

> This is a feature requested by @RogerRiggs and @cl4es .
>
> **Idea**
>
> Merging multiple consecutive small stores (e.g. 8 byte stores) into larger stores (e.g. one long store) can lead to speedup. Recently, @cl4es and @RogerRiggs had to review a few PRs where people would try to get speedups by using Unsafe (e.g. `Unsafe.putLongUnaligned`), or ByteArrayLittleEndian (e.g. `ByteArrayLittleEndian.setLong`). They have asked if we can do such an optimization in C2, rather than in the Java library code, or even user code.
>
> This patch here supports a few simple use-cases, like these:
>
> Merge consecutive array stores, with constants. We can combine the separate constants into a larger constant:
> https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/test/hotspot/jtreg/compiler/c2/TestMergeStores.java#L383-L395
>
> Merge consecutive array stores, with a variable that was split (using shifts). We can essentially undo the splitting (i.e.
shifting and truncation), and directly store the variable: > https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/test/hotspot/jtreg/compiler/c2/TestMergeStores.java#L444-L456 > > The idea is that this would allow the introduction of a very simple API, without any "heavy" dependencies (Unsafe or ByteArrayLittleEndian): > > https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/test/hotspot/jtreg/compiler/c2/TestMergeStores.java#L327-L338 > https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/test/hotspot/jtreg/compiler/c2/TestMergeStores.java#L467-L472 > > **Details** > > This draft currently implements the optimization in an additional special IGVN phase: > https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/src/hotspot/share/opto/compile.cpp#L2479-L2485 > > We first collect all `StoreB|C|I`, and put them in the IGVN worklist (see `Compile::gather_nodes_for_merge_stores`). During IGVN, we call `StoreNode::Ideal_merge_stores` at the end (after all other optimizations) of `StoreNode::Ideal`. We essentially try to establish a chain of mergable stores: > > https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/src/hotspot/share/opto/memnode.cpp#L2802-L2806 > > Mergable stores must have the same Opcode (implies they have the same element type and hence size). Further, mergable stores must have the same control (or be separated by only a RangeCheck). Further, they must either both store constants, or adjacent segments of a larger value ... src/hotspot/share/opto/compile.cpp line 2486: > 2484: C->set_merge_stores_phase(false); > 2485: } > 2486: @vnkozlov @rwestrel 1. Should I guard the optimization by a flag, maybe `MergeStores`? 2. Should I make a fresh pass over the whole graph like in `gather_nodes_for_merge_stores`, or rather have a list that collects the store nodes during igvn, and that I can just readily pick up here. 
Just like these lists: https://github.com/openjdk/jdk/pull/16966/files#diff-f076857d7da81f56709da3de1511b1105727032186cde4d02c678667761f46eaR445-R451 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16245#discussion_r1458989585 From tholenstein at openjdk.org Fri Jan 19 13:23:51 2024 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Fri, 19 Jan 2024 13:23:51 GMT Subject: RFR: JDK-8321984: IGV: Upgrade to Netbeans Platform 20 [v5] In-Reply-To: References: Message-ID: > Upgraded IGV and dependencies to the newest Netbeans Platform 20 which was released on December 2023. > > Tested that IGV still behaves as expected after the upgrade. Tobias Holenstein has updated the pull request incrementally with four additional commits since the last revision: - Update pom.xml copyright year 2024 - Update pom.xml copyright year 2024 - Update pom.xml copyright year - Update pom.xml copyright year ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17106/files - new: https://git.openjdk.org/jdk/pull/17106/files/35080de7..218ab753 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17106&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17106&range=03-04 Stats: 2 lines in 2 files changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/17106.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17106/head:pull/17106 PR: https://git.openjdk.org/jdk/pull/17106 From roland at openjdk.org Fri Jan 19 14:12:30 2024 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 19 Jan 2024 14:12:30 GMT Subject: RFR: 8318446: C2: optimize stores into primitive arrays by combining values into larger store In-Reply-To: References: Message-ID: On Wed, 18 Oct 2023 13:04:35 GMT, Emanuel Peter wrote: > This is a feature requiested by @RogerRiggs and @cl4es . > > **Idea** > > Merging multiple consecutive small stores (e.g. 8 byte stores) into larger stores (e.g. one long store) can lead to speedup. 
Recently, @cl4es and @RogerRiggs had to review a few PR's where people would try to get speedups by using Unsafe (e.g. `Unsafe.putLongUnaligned`), or ByteArrayLittleEndian (e.g. `ByteArrayLittleEndian.setLong`). They have asked if we can do such an optimization in C2, rather than in the Java library code, or even user code. > > This patch here supports a few simple use-cases, like these: > > Merge consecutive array stores, with constants. We can combine the separate constants into a larger constant: > https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/test/hotspot/jtreg/compiler/c2/TestMergeStores.java#L383-L395 > > Merge consecutive array stores, with a variable that was split (using shifts). We can essentially undo the splitting (i.e. shifting and truncation), and directly store the variable: > https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/test/hotspot/jtreg/compiler/c2/TestMergeStores.java#L444-L456 > > The idea is that this would allow the introduction of a very simple API, without any "heavy" dependencies (Unsafe or ByteArrayLittleEndian): > > https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/test/hotspot/jtreg/compiler/c2/TestMergeStores.java#L327-L338 > https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/test/hotspot/jtreg/compiler/c2/TestMergeStores.java#L467-L472 > > **Details** > > This draft currently implements the optimization in an additional special IGVN phase: > https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/src/hotspot/share/opto/compile.cpp#L2479-L2485 > > We first collect all `StoreB|C|I`, and put them in the IGVN worklist (see `Compile::gather_nodes_for_merge_stores`). During IGVN, we call `StoreNode::Ideal_merge_stores` at the end (after all other optimizations) of `StoreNode::Ideal`. 
We essentially try to establish a chain of mergable stores: > > https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/src/hotspot/share/opto/memnode.cpp#L2802-L2806 > > Mergable stores must have the same Opcode (implies they have the same element type and hence size). Further, mergable stores must have the same control (or be separated by only a RangeCheck). Further, they must either both store constants, or adjacent segments of a larger value ... Why not provide new internal API points and intrinsics? The benefits would be: - less complexity on the c2 side (and less bugs) - much easier for someone writing java code to check that the optimization triggers (check the PrintInlining output that the intrinsic shows up vs check the final assembly code) - clear contract between the java libraries and the VM as to what optimizes under what conditions If I was the user for this I would be worried, that: - it's hard for me to check it's doing what I expect - even if it does initially, changes to the java code (maybe by other people less familiar with this transformation) could break the optimization. If there's a call to some specific API, at least people changing the code know special attention is necessary and that as long as the new API points are used, the optimization is guaranteed. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16245#issuecomment-1900489523 From mli at openjdk.org Fri Jan 19 14:35:28 2024 From: mli at openjdk.org (Hamlin Li) Date: Fri, 19 Jan 2024 14:35:28 GMT Subject: RFR: JDK-8318228: RISC-V: C2 ConvF2HF In-Reply-To: References: Message-ID: On Thu, 18 Jan 2024 07:07:30 GMT, Fei Yang wrote: >> Hi, >> Can you review the patch to add ConvF2HF intrinsic to JDK for riscv? >> Thanks! 
>> >> ## Test >> ### Functionality >> #### hotspot tests >> test/hotspot/jtreg/compiler/intrinsics/ >> test/hotspot/jtreg/compiler/c2/irTests >> >> #### jdk tests >> test/jdk/java/lang/Float/Binary16Conversion*.java >> >> ### Performance >> tested on licheepi. >> >> #### with UseZfh enabled >> >> Benchmark (size) Mode Cnt Score Error Units >> Fp16ConversionBenchmark.floatToFloat16 2048 avgt 2 4170.549 ns/op >> Fp16ConversionBenchmark.floatToFloat16Memory 2048 avgt 2 21.492 ns/op >> >> >> #### with UseZfh disabled >> (i.e. disable the intrinsic) >> >> Benchmark (size) Mode Cnt Score Error Units >> Fp16ConversionBenchmark.floatToFloat16 2048 avgt 2 25036.647 ns/op >> Fp16ConversionBenchmark.floatToFloat16Memory 2048 avgt 2 27.326 ns/op > > src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 1842: > >> 1840: >> 1841: // preserve the payloads of non-canonical NaNs. >> 1842: __ srai(dst, dst, 13); > > I see the lowest 13 bits of the payload for `src` is simply discarded here. But these bits are also used for calculating the new significand bits for float16 [1]. So this doesn't seem OK to me. Did I miss anything? 
> > [1] https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/lang/Float.java#L1112-L1113 It's discarded intentionally, just like in HF2F it's [padded with zero in lower 13 bits](https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp#L1800) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17450#discussion_r1459107338 From epeter at openjdk.org Fri Jan 19 14:45:30 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 19 Jan 2024 14:45:30 GMT Subject: RFR: 8322743: assert(held_monitor_count() == jni_monitor_count()) failed [v2] In-Reply-To: References: Message-ID: <_3yNZAQhVIqNWHV1BmvYVOgJmUBfhA2VPMaqk95Qlcs=.a3927f92-285d-469e-b364-12d90fe52243@github.com> On Wed, 17 Jan 2024 20:20:05 GMT, Vladimir Kozlov wrote: >> Corner case with a local (not escaped) object used for synchronization. C2 Escape Analysis thinks that it can eliminate locks for it. In most cases it is true but not in this case. >> >> >> for (int i = 0; i < 2; ++i) { >> Object o = new Object(); >> synchronized (o) { // monitorenter >> // Trigger OSR compilation >> for (int j = 0; j < 100_000; ++j) { >> >> The test has nested loop which trigger OSR compilation. The locked object comes from Interpreter into compiled OSR code. During parsing C2 creates an other non escaped object and correctly merge both together (with Phi node) so that non escaped object is not scalar replaceable. Because it does not globally escapes EA still removes locks for it and, as result, also for merged locked object from Interpreter which is the bug. >> >> The fix is to check that synchronized block does not have any associated escaped objects when EA decides if locks can be eliminated. >> >> Added regression test prepared by @TobiHartmann. Tested tier1-5, xcomp and stress. >> Performance testing show no difference. 
>
> Vladimir Kozlov has updated the pull request incrementally with one additional commit since the last revision:
>
>   Address review comments

Ok, so I did some rudimentary study of EA. And now this PR makes much more sense ;)

Let me summarize my understanding of the issue:

An object gets allocated in interpreter, and we lock on it in the interpreter. OSR is triggered, the object is passed in as OSR parameter, we hold the lock. The OSR control flow now looks like this:

    StartOSR:
      LoadP -> load the object created in interpreter
        we have not_global_escape(LoadP) == false
        so this is correctly marked as escaping
      now the osr path injects into the middle of the loop

    Loop:
      Phi -> merge interpreter obj and that from this compiled code
        we have not_global_escape(Phi) == false
      ...
      Unlock(Phi)
      ...
      check some condition, maybe return
      ...
      obj = CheckCastPP( Allocate(i.e. new Object()) )
        we have not_global_escape(obj) == true
        this is correct, the object will never escape
      Lock(obj)
      ...
      goto Loop

So if I understand this correctly, the marking in/with the ConnectionGraph is correct:
- The object passed in through OSR is marked as escaping.
- The object created locally is marked as non-escaping.
- The loop-phi that merges the two must therefore also be possibly escaping.

The question is then with the condition of Lock removal: Can we remove the lock, just because its object is marked as non-escaping? At first glance: obviously, because nobody else could ever have the object, and so nobody can ever lock/unlock it.

In the example, if we look at the Unlock node, we cannot remove it (at least at first): its object is possibly escaping, because the Phi is not marked non-escaping. But we can remove the Lock, since its object is non-escaping. This is where the trouble starts.
I think it is exactly for this reason, that @vnkozlov thinks one cannot just look at the object of the individual Lock/Unlock node, but one has to look at all Lock/Unlock nodes of a BoxLock, and see if all objects are non-escaping. @vnkozlov please correct me if I got something wrong ;) src/hotspot/share/opto/callnode.cpp line 2004: > 2002: // > 2003: ConnectionGraph *cgr = phase->C->congraph(); > 2004: if (cgr != nullptr && cgr->can_eliminate_lock(this)) { I guess if you make this change, then you probably would also want to rename `NonEscObj` and `set_non_esc_obj` and `is_non_esc_obj`, right? Now it is not just about being non-escaped, but the more complex semantics of `can_eliminate_lock`. ------------- Changes requested by epeter (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/17331#pullrequestreview-1832633918 PR Review Comment: https://git.openjdk.org/jdk/pull/17331#discussion_r1459021353 From epeter at openjdk.org Fri Jan 19 14:58:29 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 19 Jan 2024 14:58:29 GMT Subject: RFR: 8318446: C2: optimize stores into primitive arrays by combining values into larger store In-Reply-To: References: Message-ID: On Fri, 19 Jan 2024 14:09:28 GMT, Roland Westrelin wrote: >> This is a feature requiested by @RogerRiggs and @cl4es . >> >> **Idea** >> >> Merging multiple consecutive small stores (e.g. 8 byte stores) into larger stores (e.g. one long store) can lead to speedup. Recently, @cl4es and @RogerRiggs had to review a few PR's where people would try to get speedups by using Unsafe (e.g. `Unsafe.putLongUnaligned`), or ByteArrayLittleEndian (e.g. `ByteArrayLittleEndian.setLong`). They have asked if we can do such an optimization in C2, rather than in the Java library code, or even user code. >> >> This patch here supports a few simple use-cases, like these: >> >> Merge consecutive array stores, with constants. 
We can combine the separate constants into a larger constant: >> https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/test/hotspot/jtreg/compiler/c2/TestMergeStores.java#L383-L395 >> >> Merge consecutive array stores, with a variable that was split (using shifts). We can essentially undo the splitting (i.e. shifting and truncation), and directly store the variable: >> https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/test/hotspot/jtreg/compiler/c2/TestMergeStores.java#L444-L456 >> >> The idea is that this would allow the introduction of a very simple API, without any "heavy" dependencies (Unsafe or ByteArrayLittleEndian): >> >> https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/test/hotspot/jtreg/compiler/c2/TestMergeStores.java#L327-L338 >> https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/test/hotspot/jtreg/compiler/c2/TestMergeStores.java#L467-L472 >> >> **Details** >> >> This draft currently implements the optimization in an additional special IGVN phase: >> https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/src/hotspot/share/opto/compile.cpp#L2479-L2485 >> >> We first collect all `StoreB|C|I`, and put them in the IGVN worklist (see `Compile::gather_nodes_for_merge_stores`). During IGVN, we call `StoreNode::Ideal_merge_stores` at the end (after all other optimizations) of `StoreNode::Ideal`. We essentially try to establish a chain of mergable stores: >> >> https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/src/hotspot/share/opto/memnode.cpp#L2802-L2806 >> >> Mergable stores must have the same Opcode (implies they have the same element type and hence size). Further, mergable stores must have the same control (or be separated by only a RangeCheck). Further, they must either bot... > > Why not provide new internal API points and intrinsics? 
The benefits would be: > - less complexity on the c2 side (and less bugs) > - much easier for someone writing java code to check that the optimization triggers (check the PrintInlining output that the intrinsic shows up vs check the final assembly code) > - clear contract between the java libraries and the VM as to what optimizes under what conditions > > If I was the user for this I would be worried, that: > - it's hard for me to check it's doing what I expect > - even if it does initially, changes to the java code (maybe by other people less familiar with this transformation) could break the optimization. If there's a call to some specific API, at least people changing the code know special attention is necessary and that as long as the new API points are used, the optimization is guaranteed. @rwestrel Well, that sounds like a good idea too. Especially for the better guarantees. I guess the benefit of this optimization here is that it optimizes lots of existing code, which use adjacent array stores, for example with constants. @cl4es @RogerRiggs what do you think, would you prefer a new internal API points and intrinsics? ------------- PR Comment: https://git.openjdk.org/jdk/pull/16245#issuecomment-1900569327 From epeter at openjdk.org Fri Jan 19 15:14:29 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 19 Jan 2024 15:14:29 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v5] In-Reply-To: References: <6SOOtHaZrEN8UIKKLHBJniz_nUvZjRvV0TLofW7Xjxk=.debaf12a-b7c8-4b04-b217-3b8cd9b3c6f5@github.com> Message-ID: On Thu, 11 Jan 2024 03:20:29 GMT, Quan Anh Mai wrote: >> Quan Anh Mai has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 27 commits: >> >> - Merge branch 'master' into improvevalue >> - Merge branch 'master' into improvevalue >> - improve add/sub implementation >> - Merge branch 'master' into improvevalue >> - typo >> - whitespace >> - fix tests for x86_32 >> - fix widen of ConvI2L >> - problem lists >> - format >> - ... and 17 more: https://git.openjdk.org/jdk/compare/f0169341...843ad076 > > May someone give their opinion on this PR, please? Thanks a lot. @merykitty I've been seeing this PR for a while. I can't promise high priority here, but this looks interesting. It is also a scarily big changeset ? A first remark: I would love to see more tests. For example: have some expressions that have certain bits random, and others masked on or off. Then you can have all sorts of ifs with bit checks, that lead to some special if-else code patterns. With an IR rule you can then see if this if gets folded, by checking if the true-block and/or false-block are still present. What do you think? ------------- PR Comment: https://git.openjdk.org/jdk/pull/15440#issuecomment-1900599235 From mli at openjdk.org Fri Jan 19 15:17:26 2024 From: mli at openjdk.org (Hamlin Li) Date: Fri, 19 Jan 2024 15:17:26 GMT Subject: RFR: JDK-8318228: RISC-V: C2 ConvF2HF In-Reply-To: References: Message-ID: On Fri, 19 Jan 2024 14:31:37 GMT, Hamlin Li wrote: >> src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 1842: >> >>> 1840: >>> 1841: // preserve the payloads of non-canonical NaNs. >>> 1842: __ srai(dst, dst, 13); >> >> I see the lowest 13 bits of the payload for `src` is simply discarded here. But these bits are also used for calculating the new significand bits for float16 [1]. So this doesn't seem OK to me. Did I miss anything? 
>>
>> [1] https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/lang/Float.java#L1112-L1113
>
> It's discarded intentionally, just like in HF2F it's [padded with zero in lower 13 bits](https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp#L1800)

Right now, I'm not sure. I have the patch below:

diff --git a/src/java.base/share/classes/java/lang/Float.java b/src/java.base/share/classes/java/lang/Float.java
index 7508c22d7f4..f96e23b568e 100644
--- a/src/java.base/share/classes/java/lang/Float.java
+++ b/src/java.base/share/classes/java/lang/Float.java
@@ -1108,9 +1108,7 @@ public static short floatToFloat16(float f) {
             // Preserve high order bit of float NaN in the
             // binary16 result NaN (tenth bit); OR in remaining
             // bits into lower 9 bits of binary 16 significand.
-                    | (doppel & 0x007f_e000) >> 13 // 10 bits
-                    | (doppel & 0x0000_1ff0) >> 4  // 9 bits
-                    | (doppel & 0x0000_000f));     // 4 bits
+                    | (doppel & 0x007f_e000) >> 13); // 10 bits
         }
 
         float abs_f = Math.abs(f);

And test/jdk/java/lang/Float/Binary16ConversionNaN.java and Binary16Conversion.java both passed.

Either the tests (both library and hotspot) and the intrinsics (not sure if intrinsics on other platforms need improvement) need to be improved, or the code in the library needs to be simplified. (To be frank, I don't think NaN needs such a complicated spec/design, but it depends on the spec).

I just filed a library bug to discuss it, [JDK-8324212](https://bugs.openjdk.org/browse/JDK-8324212)

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/17450#discussion_r1459181882

From redestad at openjdk.org  Fri Jan 19 15:23:29 2024
From: redestad at openjdk.org (Claes Redestad)
Date: Fri, 19 Jan 2024 15:23:29 GMT
Subject: RFR: 8318446: C2: optimize stores into primitive arrays by combining values into larger store
In-Reply-To:
References:
Message-ID:

On Wed, 18 Oct 2023 13:04:35 GMT, Emanuel Peter wrote:

> This is a feature requested by @RogerRiggs and @cl4es.
> > **Idea** > > Merging multiple consecutive small stores (e.g. 8 byte stores) into larger stores (e.g. one long store) can lead to speedup. Recently, @cl4es and @RogerRiggs had to review a few PR's where people would try to get speedups by using Unsafe (e.g. `Unsafe.putLongUnaligned`), or ByteArrayLittleEndian (e.g. `ByteArrayLittleEndian.setLong`). They have asked if we can do such an optimization in C2, rather than in the Java library code, or even user code. > > This patch here supports a few simple use-cases, like these: > > Merge consecutive array stores, with constants. We can combine the separate constants into a larger constant: > https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/test/hotspot/jtreg/compiler/c2/TestMergeStores.java#L383-L395 > > Merge consecutive array stores, with a variable that was split (using shifts). We can essentially undo the splitting (i.e. shifting and truncation), and directly store the variable: > https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/test/hotspot/jtreg/compiler/c2/TestMergeStores.java#L444-L456 > > The idea is that this would allow the introduction of a very simple API, without any "heavy" dependencies (Unsafe or ByteArrayLittleEndian): > > https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/test/hotspot/jtreg/compiler/c2/TestMergeStores.java#L327-L338 > https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/test/hotspot/jtreg/compiler/c2/TestMergeStores.java#L467-L472 > > **Details** > > This draft currently implements the optimization in an additional special IGVN phase: > https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/src/hotspot/share/opto/compile.cpp#L2479-L2485 > > We first collect all `StoreB|C|I`, and put them in the IGVN worklist (see `Compile::gather_nodes_for_merge_stores`). 
During IGVN, we call `StoreNode::Ideal_merge_stores` at the end (after all other optimizations) of `StoreNode::Ideal`. We essentially try to establish a chain of mergable stores: > > https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/src/hotspot/share/opto/memnode.cpp#L2802-L2806 > > Mergable stores must have the same Opcode (implies they have the same element type and hence size). Further, mergable stores must have the same control (or be separated by only a RangeCheck). Further, they must either both store constants, or adjacent segments of a larger value ... There already are internal APIs and `VarHandles` to enable similar optimizations, see e.g. `jdk.internal.util.ByteArray/-LittleEndian::putInt/-Long`. The very point of this RFE was to opportunistically enable similar optimizations more automatically in idiomatic java code without the need to bring out the big guns. Of course such automatic transformations will have some level of fragility and you might accidentally disable the optimization in a variety of ways (since C2 needs to draw the line somewhere) - but that's the case for many other heuristic and opportunistic optimizations. Should we not optimize counted loops or do loop unrolling because it's easy to add something that makes C2 bail out on you? Having this optimization in C2 also allows us to avoid dependencies on `VarHandles` in bootstrap sensitive code and still enable the optimization. It might also have a benefit on startup/warmup characteristics. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16245#issuecomment-1900612869 From ddong at openjdk.org Fri Jan 19 15:30:35 2024 From: ddong at openjdk.org (Denghui Dong) Date: Fri, 19 Jan 2024 15:30:35 GMT Subject: RFR: 8324213: C1: There is no need for Canonicalizer to handle IfOp Message-ID: IfOp will not be created when building the graph, so there is no need for Canonicalizer to handle IfOp. 
------------- Commit messages: - 8324213: C1: There is no need for Canonicalizer to handle IfOp Changes: https://git.openjdk.org/jdk/pull/17499/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=17499&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8324213 Stats: 3 lines in 1 file changed: 0 ins; 3 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/17499.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17499/head:pull/17499 PR: https://git.openjdk.org/jdk/pull/17499 From chagedorn at openjdk.org Fri Jan 19 15:51:37 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 19 Jan 2024 15:51:37 GMT Subject: Integrated: 8323154: C2: assert(cmp != nullptr && cmp->Opcode() == Op_Cmp(bt)) failed: no exit test In-Reply-To: <-3LO2nXaqGEyNCjjPjYnE9su-CUFR_FZZ85wp-aU6J0=.28a03271-f85c-4394-bbc2-000fe8d84a2a@github.com> References: <-3LO2nXaqGEyNCjjPjYnE9su-CUFR_FZZ85wp-aU6J0=.28a03271-f85c-4394-bbc2-000fe8d84a2a@github.com> Message-ID: On Wed, 17 Jan 2024 07:44:28 GMT, Christian Hagedorn wrote: > This bug is very similar to [JDK-8314191](https://bugs.openjdk.org/browse/JDK-8314191) but with long counted loops instead of int counted loops and a with a different manifestation. > > The original problem was that [JDK-8276162](https://bugs.openjdk.org/browse/JDK-8276162) added transformations for `CmpI` nodes to use `CmpU` nodes instead. The transformations were also applied for `CmpI` nodes of counted loop exit checks which messed pattern matching up. [JDK-8314191](https://bugs.openjdk.org/browse/JDK-8314191) and the follow-up fix [JDK-8316719](https://bugs.openjdk.org/browse/JDK-8316719) fixed this but only for `CountedLoopNodeEndNodes`. The newly added `is_cloop_condition()` method only checks for `is_CountedLoopEnd()` (int counted loops) instead of `is_BaseCountedLoopEnd()` (also includes long counted loop). This patch fixes this. 
> > I've had a closer look at other uses of `is_CountedLoop*` and found that > https://github.com/openjdk/jdk/blob/1007618f6f97fad0f66e4074b50521bdd853629e/src/hotspot/share/opto/subnode.cpp#L126-L139 > and > https://github.com/openjdk/jdk/blob/1007618f6f97fad0f66e4074b50521bdd853629e/src/hotspot/share/opto/subnode.cpp#L155-L163 > should probably also use the `BaseCountedLoop*` versions. These methods are used in `ok_to_convert()` which is used in several places. They try to prevent transformations involving the iv and the increment node of a counted loop to save registers. However, these transformations are still applied before a loop is transformed to a counted loop. This raises the question whether these bailouts should be extended to also work before loop opts. Since this code has been around for such a long time, it would also be interesting to see, if it's still beneficial to block these optimizations in general. If so, it might be good if we could add some IR tests to prove that. > > There are more places where we try to prevent such transformations but again only if we already have a counted loop. For example: > https://github.com/openjdk/jdk/blob/1007618f6f97fad0f66e4074b50521bdd853629e/src/hotspot/share/opto/addnode.cpp#L176-L191 > or > https://github.com/openjdk/jdk/blob/1007618f6f97fad0f66e4074b50521bdd853629e/src/hotspot/share/opto/addnode.cpp#L193-L209 > > It might be a good idea to revisit all of these bailouts in general and check if it's still beneficial to have them around and if they should be extended to also work before loop opts. > > I suggest to do this investigation together with fixing `CountedLoop*` -> `BaseCountedLoop*` in... This pull request has now been integrated. 
Changeset: 6997bfc6 Author: Christian Hagedorn URL: https://git.openjdk.org/jdk/commit/6997bfc68def7f80fbf6a7486a4b9f61225fc471 Stats: 50 lines in 2 files changed: 49 ins; 0 del; 1 mod 8323154: C2: assert(cmp != nullptr && cmp->Opcode() == Op_Cmp(bt)) failed: no exit test Reviewed-by: roland, thartmann, qamai ------------- PR: https://git.openjdk.org/jdk/pull/17459 From qamai at openjdk.org Fri Jan 19 16:49:32 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Fri, 19 Jan 2024 16:49:32 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v5] In-Reply-To: References: <6SOOtHaZrEN8UIKKLHBJniz_nUvZjRvV0TLofW7Xjxk=.debaf12a-b7c8-4b04-b217-3b8cd9b3c6f5@github.com> Message-ID: <8bONeZQLoFJ06YxHsq0H-4MsyQZNfAnjBUM811ri9bU=.9b7583f4-6111-4e09-a067-1f193790d780@github.com> On Fri, 19 Jan 2024 15:11:24 GMT, Emanuel Peter wrote: >> May someone give their opinion on this PR, please? Thanks a lot. > > @merykitty I've been seeing this PR for a while. I can't promise high priority here, but this looks interesting. It is also a scarily big changeset ? > > A first remark: I would love to see more tests. For example: have some expressions that have certain bits random, and others masked on or off. Then you can have all sorts of ifs with bit checks, that lead to some special if-else code patterns. With an IR rule you can then see if this if gets folded, by checking if the true-block and/or false-block are still present. > > What do you think? @eme64 Thanks a lot for looking into this. Do you have any potential patterns in mind? I can come up with random patterns but they seem arbitrary and are most likely to be covered by other IR tests already. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/15440#issuecomment-1900749282 From rriggs at openjdk.org Fri Jan 19 16:52:29 2024 From: rriggs at openjdk.org (Roger Riggs) Date: Fri, 19 Jan 2024 16:52:29 GMT Subject: RFR: 8318446: C2: optimize stores into primitive arrays by combining values into larger store In-Reply-To: References: Message-ID: On Thu, 18 Jan 2024 15:23:15 GMT, Claes Redestad wrote: > I probably wouldn't write with decreasing indices, no, and if it'd be a lot of complexity I can see that it's not worth it. > > And while I'm not suggesting you need to care for big-endian _hardware_ it would be good to have microbenchmarks that explicitly use and write in big-endian. I guess these do get a similar speed-up with your patch, but some verification would be great. Big-endian is quite common in networking protocols after all. > > If it helps testing and verification I'd say only enabling this optimization on little-endian HW is fine if you can't find hardware or delegate to others to verify correctness and that there are equivalent speed-ups. Someone maintaining a big-endian platform should be able to test and verify as a follow-up, which might be better to not block progress here. There is an existing use case for descending indices though they are being done via method handles to handle latin1 and UTF16 charset, so not byte array accesses except inside the method handle. The String format optimizations assemble components right to left, it benefits conversions from binary to printable forms. The implementations of java.util.FormatConcatItem are in java.util.FormatItem. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16245#issuecomment-1900753444 From jbhateja at openjdk.org Fri Jan 19 19:03:31 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 19 Jan 2024 19:03:31 GMT Subject: RFR: 8322768: Optimize non-subword vector compress and expand APIs for AVX2 target. 
[v7]
In-Reply-To: <_udOCEVG86x9V_WvYqFTaYnvmXdiZ7LzqxzR-D_ygYs=.db3ed37e-a8fa-413b-83c6-a785aba072ff@github.com>
References: <_udOCEVG86x9V_WvYqFTaYnvmXdiZ7LzqxzR-D_ygYs=.db3ed37e-a8fa-413b-83c6-a785aba072ff@github.com>
Message-ID:

> Hi,
>
> Patch optimizes non-subword vector compress and expand APIs for x86 AVX2 only targets.
> Upcoming E-core Xeons (Sierra Forest) and Hybrid CPUs only support AVX2 instruction set.
> These are very frequently used APIs in columnar database filter operation.
>
> Implementation uses a lookup table to record permute indices. Table index is computed using
> mask argument of compress/expand operation.
>
> Following are the performance number of JMH micro included with the patch.
>
>
> System : Intel(R) Xeon(R) Platinum 8480+ (Sapphire Rapids)
>
> Baseline:
> Benchmark                                 (size)   Mode  Cnt     Score   Error   Units
> ColumnFilterBenchmark.filterDoubleColumn    1024  thrpt    2   142.767          ops/ms
> ColumnFilterBenchmark.filterDoubleColumn    2047  thrpt    2    71.436          ops/ms
> ColumnFilterBenchmark.filterDoubleColumn    4096  thrpt    2    35.992          ops/ms
> ColumnFilterBenchmark.filterFloatColumn     1024  thrpt    2   182.151          ops/ms
> ColumnFilterBenchmark.filterFloatColumn     2047  thrpt    2    91.096          ops/ms
> ColumnFilterBenchmark.filterFloatColumn     4096  thrpt    2    44.757          ops/ms
> ColumnFilterBenchmark.filterIntColumn       1024  thrpt    2   184.099          ops/ms
> ColumnFilterBenchmark.filterIntColumn       2047  thrpt    2    91.981          ops/ms
> ColumnFilterBenchmark.filterIntColumn       4096  thrpt    2    45.170          ops/ms
> ColumnFilterBenchmark.filterLongColumn      1024  thrpt    2   148.017          ops/ms
> ColumnFilterBenchmark.filterLongColumn      2047  thrpt    2    73.516          ops/ms
> ColumnFilterBenchmark.filterLongColumn      4096  thrpt    2    36.844          ops/ms
>
> Withopt:
> Benchmark                                 (size)   Mode  Cnt     Score   Error   Units
> ColumnFilterBenchmark.filterDoubleColumn    1024  thrpt    2  2051.707          ops/ms
> ColumnFilterBenchmark.filterDoubleColumn    2047  thrpt    2   914.072          ops/ms
> ColumnFilterBenchmark.filterDoubleColumn    4096  thrpt    2   489.898          ops/ms
> ColumnFilterBenchmark.filterFloatColumn     1024  thrpt    2  5324.195          ops/ms
> ColumnFilterBenchmark.filterFloatColumn     2047  thrpt    2  2587.229          ops/ms
> ColumnFilterBenchmark.filterFloatColumn     4096  thrpt    2  1278.665          ops/ms
> ColumnFilterBenchmark.filterIntColumn       1024  thrpt    2  4149.384          ops/ms
> ColumnFilterBenchmark.filterIntColumn       2047  thrpt    2  1791.170          ops/ms
> ColumnFilterBenchmark.filterIntColumn 4096...

Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:

  Modified code comment for clarity.

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/17261/files
  - new: https://git.openjdk.org/jdk/pull/17261/files/3ed6b8bf..b2190fc7

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=17261&range=06
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17261&range=05-06

  Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod
  Patch: https://git.openjdk.org/jdk/pull/17261.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/17261/head:pull/17261

PR: https://git.openjdk.org/jdk/pull/17261

From jbhateja at openjdk.org  Fri Jan 19 19:03:32 2024
From: jbhateja at openjdk.org (Jatin Bhateja)
Date: Fri, 19 Jan 2024 19:03:32 GMT
Subject: RFR: 8322768: Optimize non-subword vector compress and expand APIs for AVX2 target. [v5]
In-Reply-To:
References: <_udOCEVG86x9V_WvYqFTaYnvmXdiZ7LzqxzR-D_ygYs=.db3ed37e-a8fa-413b-83c6-a785aba072ff@github.com> <4-XrsvK-2HpBV3neMmQQ5Q1A4FDOAnmyFtCkKKZcf2A=.32df7d9e-e399-4715-a6b5-f3f2e9c77150@github.com>
Message-ID:

On Fri, 19 Jan 2024 07:43:18 GMT, Emanuel Peter wrote:

>> For long/double each permute row is 32 byte in size, so a shift by 5 to compute row address.
>
> Ah right. Maybe we could say `32byte = 4 long = 4 * 64bit`.
> Because "64bit row" sounds like the whole row is only 64 bit long. It is actually the cells that are 64bits, not the rows!
DONE ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17261#discussion_r1459568064 From duke at openjdk.org Fri Jan 19 19:10:26 2024 From: duke at openjdk.org (Joshua Cao) Date: Fri, 19 Jan 2024 19:10:26 GMT Subject: RFR: 8323220: Reassociate loop invariants involved in Cmps and Add/Subs [v2] In-Reply-To: References: <-5JZDSqyvX6C2dOKIogkE4BKSD594q1RGX3POS4HnTQ=.4b4d01ed-de2d-4ea8-abc3-32e4ee53d5f2@github.com> Message-ID: On Thu, 18 Jan 2024 08:25:46 GMT, Emanuel Peter wrote: >> Some sort of pattern matcher could work. It would be able nice to match something like `a ADD_I b CMP_LT c`. In java this could look something like >> >> >> @IR(counts = {IRNode.CMP_LT[IRNode.ANY, IRNode.SUB_I, IRNode.ANY], "1"} >> >> >> The arguments in the `[]` are the inputs. `IRNode.ANY` matches any node. (The zero'th node is ANY because its the region node). >> >> Anyway, I think a `lt` test is not super-required for the coverage for this PR. The current machinery does not provide a convenient way to test it. I'd prefer to avoid something hacky. I think this work can be done separately. > > I agree with you there, don't do anything hacky here. > But yes, I've also been wondering what kind of improvements to the IR framework would help us to do these sorts of graph-matching verifications. Created https://bugs.openjdk.org/browse/JDK-8324226 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17375#discussion_r1459579634 From qamai at openjdk.org Fri Jan 19 20:06:05 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Fri, 19 Jan 2024 20:06:05 GMT Subject: RFR: 8282365: Consolidate and improve division by constant idealizations [v45] In-Reply-To: References: Message-ID: > This patch implements idealisation for unsigned divisions to change a division by a constant into a series of multiplication and shift. I also change the idealisation of `DivI` to get a more efficient series when the magic constant overflows an int32. 
> > In general, the idea behind a signed division transformation is that for a positive constant `d`, we would need to find constants `c` and `m` so that: > > floor(x / d) = floor(x * c / 2**m) for 0 < x < 2**(N - 1) (1) > ceil(x / d) = floor(x * c / 2**m) + 1 for -2**(N - 1) <= x < 0 (2) > > The implementation in the original book takes into consideration that the machine may not be able to perform the full multiplication `x * c`, so the constant overflow and we need to add back the dividend as in `DivLNode::Ideal` cases. However, for int32 division, `x * c` cannot overflow an int64. As a result, it is always feasible to just calculate the product and shift the result. > > For unsigned multiplication, the situation is somewhat trickier because the condition needs to be twice as strong (the condition (1) and (2) above are mostly the same). This results in the magic constant `c` calculated based on the method presented in Hacker's Delight by Henry S. Warren, Jr. may overflow an uintN. For int division, we can depend on the theorem devised by Arch D. Robison in N-Bit Unsigned Division Via N-Bit Multiply-Add, which states that there exists either: > > c1 in uint32 and m1, such that floor(x / d) = floor(x * c1 / 2**m1) for 0 < x < 2**32 (3) > c2 in uint32 and m2, such that floor(x / d) = floor((x + 1) * c2 / 2**m2) for 0 < x < 2**32 (4) > > which means that either `x * c1` never overflows an uint64 or `(x + 1) * c2` never overflows an uint64. And we can perform a full multiplication. > > For longs, there is no way to do a full multiplication so we do some basic transformations to achieve a computable formula. The details I have written as comments in the overflow case. > > More tests are added to cover the possible patterns. > > Please take a look and have some reviews. Thank you very much. 
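To make conditions (1) and (2) above concrete, here is a minimal Java sketch for N = 32 and d = 7. The constants c = ceil(2**34 / 7) = 2454267027 and m = 34 are worked out by hand for this one divisor, and the class and method names are illustrative only. None of this is taken from the patch, which derives c and m for arbitrary divisors inside C2.

```java
public class DivBy7Sketch {
    // Assumed constants for d = 7: c = ceil(2^34 / 7) and m = 34.
    // c does not fit in a signed int32, but x * c always fits in an int64.
    static final long C = 2454267027L;
    static final int M = 34;

    // Computes x / 7 with Java semantics (truncation toward zero) without a division.
    static int divBy7(int x) {
        long product = (long) x * C;   // full 64-bit product, cannot overflow
        int q = (int) (product >> M);  // arithmetic shift gives floor(x * c / 2^m)
        return x < 0 ? q + 1 : q;      // condition (2): add 1 for negative dividends
    }

    public static void main(String[] args) {
        int[] samples = {0, 1, 6, 7, 100, -1, -7, -100, Integer.MAX_VALUE, Integer.MIN_VALUE};
        for (int x : samples) {
            if (divBy7(x) != x / 7) {
                throw new AssertionError("mismatch for " + x);
            }
        }
        System.out.println("all samples match x / 7");
    }
}
```

Because |x| <= 2**31 and c < 2**32, the product stays below 2**63 in magnitude, which is the "x * c cannot overflow an int64" observation from the description.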
Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision:

  suggestion

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/9947/files
  - new: https://git.openjdk.org/jdk/pull/9947/files/db80bd4a..6634cd46

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=9947&range=44
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=9947&range=43-44

  Stats: 17 lines in 1 file changed: 2 ins; 3 del; 12 mod
  Patch: https://git.openjdk.org/jdk/pull/9947.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/9947/head:pull/9947

PR: https://git.openjdk.org/jdk/pull/9947

From qamai at openjdk.org  Fri Jan 19 20:06:05 2024
From: qamai at openjdk.org (Quan Anh Mai)
Date: Fri, 19 Jan 2024 20:06:05 GMT
Subject: RFR: 8282365: Consolidate and improve division by constant idealizations [v44]
In-Reply-To:
References:
Message-ID:

On Thu, 18 Jan 2024 15:29:41 GMT, Raffaello Giulietti wrote:

>> Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   update include order and license year
>
> src/hotspot/share/opto/divconstants.cpp line 176:
>
>> 174:     qv = qv * 2;
>> 175:     rv = new_rv;
>> 176:   }
>
> One could perhaps avoid overflows in computing `rc` and `rv`, and simplify the corresponding tests, like so (not sure if this improves anything in practical terms, though):
> Suggestion:
>
>     if (d - rc < rc) { // 2 * rc > d
>       c_ovf = c > min_signed;
>       c += c - 1;
>       rc -= d - rc; // rc = 2 * rc - d
>     } else { // 2 * rc <= d
>       c_ovf = c >= min_signed;
>       c += c;
>       rc += rc; // rc = 2 * rc
>     }
>
>     if (rv >= v - rv) { // 2 * rv >= v
>       qv_ovf = qv >= min_signed;
>       qv += qv + 1;
>       rv -= v - rv; // rv = 2 * rv - v
>     } else { // 2 * rv < v
>       qv_ovf = qv >= min_signed;
>       qv += qv;
>       rv += rv; // rv = 2 * rv
>     }

That's a good idea, I have changed the formulae as your suggestion. Thanks a lot for your effort, this is really dense in mathematical transformations, I really appreciate it.
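The suggestion quoted above relies on the identity 2 * r - d = r - (d - r), which never forms the intermediate 2 * r. A stand-alone Java sketch of just that step, for signed longs with 0 <= r < d, might look as follows. The helper name `doubleMod` is hypothetical and the code is not taken from divconstants.cpp; it only illustrates why the rewritten comparisons cannot overflow.

```java
public class DoubleModSketch {
    // Returns (2 * r) mod d for 0 <= r < d without ever computing 2 * r
    // in the branch where that product could overflow a long.
    static long doubleMod(long r, long d) {
        if (r >= d - r) {        // equivalent to 2 * r >= d, but overflow-free
            return r - (d - r);  // 2 * r - d, which lies in [0, d)
        } else {                 // 2 * r < d
            return r + r;        // safe: the sum is below d
        }
    }

    public static void main(String[] args) {
        System.out.println(doubleMod(3, 10));
        System.out.println(doubleMod(7, 10));
        long d = Long.MAX_VALUE;
        // 2 * (d - 1) would overflow, yet the result d - 2 is still computed exactly.
        System.out.println(doubleMod(d - 1, d) == d - 2);
    }
}
```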
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/9947#discussion_r1459674018 From qamai at openjdk.org Fri Jan 19 20:10:27 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Fri, 19 Jan 2024 20:10:27 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v5] In-Reply-To: References: <6SOOtHaZrEN8UIKKLHBJniz_nUvZjRvV0TLofW7Xjxk=.debaf12a-b7c8-4b04-b217-3b8cd9b3c6f5@github.com> Message-ID: On Fri, 19 Jan 2024 15:11:24 GMT, Emanuel Peter wrote: >> May someone give their opinion on this PR, please? Thanks a lot. > > @merykitty I've been seeing this PR for a while. I can't promise high priority here, but this looks interesting. It is also a scarily big changeset ? > > A first remark: I would love to see more tests. For example: have some expressions that have certain bits random, and others masked on or off. Then you can have all sorts of ifs with bit checks, that lead to some special if-else code patterns. With an IR rule you can then see if this if gets folded, by checking if the true-block and/or false-block are still present. > > What do you think? @eme64 > It is also a scarily big changeset ? Yes, it definitely is. If this is deemed a good idea then I would probably split it into patches which deal with different nodes and submit them separately. What do you think? ------------- PR Comment: https://git.openjdk.org/jdk/pull/15440#issuecomment-1901037716 From vkempik at openjdk.org Fri Jan 19 21:34:27 2024 From: vkempik at openjdk.org (Vladimir Kempik) Date: Fri, 19 Jan 2024 21:34:27 GMT Subject: RFR: JDK-8318228: RISC-V: C2 ConvF2HF In-Reply-To: References: Message-ID: On Tue, 16 Jan 2024 18:43:03 GMT, Hamlin Li wrote: > Hi, > Can you review the patch to add ConvF2HF intrinsic to JDK for riscv? > Thanks! 
> > ## Test > ### Functionality > #### hotspot tests > test/hotspot/jtreg/compiler/intrinsics/ > test/hotspot/jtreg/compiler/c2/irTests > > #### jdk tests > test/jdk/java/lang/Float/Binary16Conversion*.java > > ### Performance > tested on licheepi. > > #### with UseZfh enabled > > Benchmark (size) Mode Cnt Score Error Units > Fp16ConversionBenchmark.floatToFloat16 2048 avgt 2 4170.549 ns/op > Fp16ConversionBenchmark.floatToFloat16Memory 2048 avgt 2 21.492 ns/op > > > #### with UseZfh disabled > (i.e. disable the intrinsic) > > Benchmark (size) Mode Cnt Score Error Units > Fp16ConversionBenchmark.floatToFloat16 2048 avgt 2 25036.647 ns/op > Fp16ConversionBenchmark.floatToFloat16Memory 2048 avgt 2 27.326 ns/op src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 1863: > 1861: > 1862: // check whether it's a NaN. > 1863: fclass_s(t0, src); As showed roundD intrinsic PR, ( https://github.com/openjdk/jdk/pull/16382/files#diff-7a5c3ed05b6f3f06ed1c59f5fc2a14ec566a6a5bd1d09606115767daa99115bdR4252 ) the feq_s(t0, src, src) + beqz(t0, label) seems to be a faster check for NaN, could you check the jmh numbers with feq_s ? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17450#discussion_r1459807779 From dlong at openjdk.org Fri Jan 19 21:41:32 2024 From: dlong at openjdk.org (Dean Long) Date: Fri, 19 Jan 2024 21:41:32 GMT Subject: RFR: 8324213: C1: There is no need for Canonicalizer to handle IfOp In-Reply-To: References: Message-ID: On Fri, 19 Jan 2024 15:24:12 GMT, Denghui Dong wrote: > IfOp will not be created when building the graph, so there is no need for Canonicalizer to handle IfOp. src/hotspot/share/c1/c1_Canonicalizer.cpp line 472: > 470: > 471: void Canonicalizer::do_IfOp(IfOp* x) { > 472: } If you are saying this method can never be called, then shouldn't we make it report an error instead of being empty? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17499#discussion_r1459820958 From dlong at openjdk.org Fri Jan 19 21:56:33 2024 From: dlong at openjdk.org (Dean Long) Date: Fri, 19 Jan 2024 21:56:33 GMT Subject: RFR: 8318446: C2: optimize stores into primitive arrays by combining values into larger store In-Reply-To: References: Message-ID: On Wed, 18 Oct 2023 13:04:35 GMT, Emanuel Peter wrote: > This is a feature requiested by @RogerRiggs and @cl4es . > > **Idea** > > Merging multiple consecutive small stores (e.g. 8 byte stores) into larger stores (e.g. one long store) can lead to speedup. Recently, @cl4es and @RogerRiggs had to review a few PR's where people would try to get speedups by using Unsafe (e.g. `Unsafe.putLongUnaligned`), or ByteArrayLittleEndian (e.g. `ByteArrayLittleEndian.setLong`). They have asked if we can do such an optimization in C2, rather than in the Java library code, or even user code. > > This patch here supports a few simple use-cases, like these: > > Merge consecutive array stores, with constants. We can combine the separate constants into a larger constant: > https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/test/hotspot/jtreg/compiler/c2/TestMergeStores.java#L383-L395 > > Merge consecutive array stores, with a variable that was split (using shifts). We can essentially undo the splitting (i.e. 
shifting and truncation), and directly store the variable: > https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/test/hotspot/jtreg/compiler/c2/TestMergeStores.java#L444-L456 > > The idea is that this would allow the introduction of a very simple API, without any "heavy" dependencies (Unsafe or ByteArrayLittleEndian): > > https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/test/hotspot/jtreg/compiler/c2/TestMergeStores.java#L327-L338 > https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/test/hotspot/jtreg/compiler/c2/TestMergeStores.java#L467-L472 > > **Details** > > This draft currently implements the optimization in an additional special IGVN phase: > https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/src/hotspot/share/opto/compile.cpp#L2479-L2485 > > We first collect all `StoreB|C|I`, and put them in the IGVN worklist (see `Compile::gather_nodes_for_merge_stores`). During IGVN, we call `StoreNode::Ideal_merge_stores` at the end (after all other optimizations) of `StoreNode::Ideal`. We essentially try to establish a chain of mergable stores: > > https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/src/hotspot/share/opto/memnode.cpp#L2802-L2806 > > Mergable stores must have the same Opcode (implies they have the same element type and hence size). Further, mergable stores must have the same control (or be separated by only a RangeCheck). Further, they must either both store constants, or adjacent segments of a larger value ... I took a quick look and I don't see where the code checks that all the candidate stores are using MemNode::unordered and that there aren't memory barriers in between. 
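For readers following the thread, the "variable that was split (using shifts)" case from the quoted description looks roughly like the first method below in plain Java, and the second method is the single 8-byte little-endian store that has the same effect on the array contents. This is only an illustration with made-up names, written with `java.nio.ByteBuffer` rather than the internal `ByteArrayLittleEndian`. It says nothing about memory ordering or barriers, which is the concern raised above.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

public class MergeStoresSketch {
    // Eight adjacent byte stores of one long, split by shifts (little-endian layout).
    static void storeBytewise(byte[] a, int off, long v) {
        a[off + 0] = (byte) (v >> 0);
        a[off + 1] = (byte) (v >> 8);
        a[off + 2] = (byte) (v >> 16);
        a[off + 3] = (byte) (v >> 24);
        a[off + 4] = (byte) (v >> 32);
        a[off + 5] = (byte) (v >> 40);
        a[off + 6] = (byte) (v >> 48);
        a[off + 7] = (byte) (v >> 56);
    }

    // One 8-byte store that leaves the array in the same state.
    static void storeMerged(byte[] a, int off, long v) {
        ByteBuffer.wrap(a).order(ByteOrder.LITTLE_ENDIAN).putLong(off, v);
    }

    public static void main(String[] args) {
        byte[] x = new byte[16];
        byte[] y = new byte[16];
        storeBytewise(x, 4, 0x0102030405060708L);
        storeMerged(y, 4, 0x0102030405060708L);
        System.out.println(Arrays.equals(x, y));
    }
}
```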
-------------

PR Comment: https://git.openjdk.org/jdk/pull/16245#issuecomment-1901168627

From kvn at openjdk.org  Fri Jan 19 23:03:33 2024
From: kvn at openjdk.org (Vladimir Kozlov)
Date: Fri, 19 Jan 2024 23:03:33 GMT
Subject: RFR: 8322743: assert(held_monitor_count() == jni_monitor_count()) failed [v2]
In-Reply-To: <_3yNZAQhVIqNWHV1BmvYVOgJmUBfhA2VPMaqk95Qlcs=.a3927f92-285d-469e-b364-12d90fe52243@github.com>
References: <_3yNZAQhVIqNWHV1BmvYVOgJmUBfhA2VPMaqk95Qlcs=.a3927f92-285d-469e-b364-12d90fe52243@github.com>
Message-ID: <85E_84NDTN7zx22pwVS-OTR0otbaclsW1nkBZnos8rw=.ee3d1779-4ac0-42fc-9556-ac2582d196e3@github.com>

On Fri, 19 Jan 2024 14:42:22 GMT, Emanuel Peter wrote:

>> Vladimir Kozlov has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   Address review comments
>
> Ok, so I did some rudimentary study of EA. And now this PR makes much more sense ;)
>
> Let me summarize my understanding of the issue:
> An object gets allocated in interpreter, and we lock on it in the interpreter.
> OSR is triggered, the object is passed in as OSR parameter, we hold the lock.
> The OSR control flow now looks like this:
>
>
> StartOSR:
>   LoadP -> load the object created in interpreter
>     we have not_global_escape(LoadP) == false
>     so this is correctly marked as escaping
>   now the osr path injects into the middle of the loop
>
> Loop:
>   Phi -> merge interpreter obj and that from this compiled code
>     we have not_global_escape(Phi) == false
>   ...
>   Unlock(Phi)
>   ...
>   check some condition, maybe return
>   ...
>   obj = CheckCastPP( Allocate(i.e. new Object()) )
>     we have not_global_escape(obj) == true
>     this is correct, the object will never escape
>   Lock(obj)
>   ...
>   goto Loop
>
>
> So if I understand this correctly, the marking in/with the ConnectionGraph is correct:
> - The object passed in through OSR is marked as escaping.
> - The object created locally is marked as non-escaping.
> - The loop-phi that merges the two must therefore also be possibly escaping.
>
> The question is then with the condition of Lock removal:
> Can we remove the lock, just because its object is marked as non-escaping?
> At first glance: obviously, because nobody else could ever have the object, and so nobody can ever lock/unlock it.
>
> In the example, if we look at the Unlock node, we cannot remove it (at least at first):
> its object is possibly escaping, because the Phi is not marked non-escaping.
> But we can remove the Lock, since its object is non-escaping.
> This is where the trouble starts.
>
> I think it is exactly for this reason, that @vnkozlov thinks one cannot just look at the object of the individual Lock/Unlock node, but one has to look at all Lock/Unlock nodes of a BoxLock, and see if all objects are non-escaping.
>
> @vnkozlov please correct me if I got something wrong ;)
>
> I was trying to see what the meaning of the BoxLockNode is, but I did not find any useful documentation. Can you help me out here? Your patch assumes that all "relevant" Lock/Unlock nodes share the same BoxLockNode. Why is that the case?

Thank you, @eme64, for the review and for "diving" into the issue to understand it.

Your conclusion is correct.

First, when a non-escaping object is merged by a Phi node with an escaping one, we only mark such an object "Not Scalar Replaceable" `NSR`:

JavaObject(5) NoEscape(NoEscape) NSR [ [ 155 160 215 213 101 99 ]] 143 Allocate

We can **not** eliminate it but we can still do some optimizations for it, like CMP nodes optimization and Locks elimination.
Unfortunately in this case it shares the `Unlock` node with an escaping object, so we can't eliminate the `Unlock` and the related `Lock`.
It is a bug that we eliminated the `Unlock` based only on the knowledge that the `Lock` can be eliminated.

There is a "balanced monitors" rule: on any code path the number of executed Locks and Unlocks for a locked object should match. Even when an object is "local" and no other threads can see it as in this case.
You either eliminate all, keep all or prove that you can eliminate some but keep them balance (as we do for `Lock Coarsening`). This bug breaks this rule. ------------- PR Comment: https://git.openjdk.org/jdk/pull/17331#issuecomment-1901265695 From ddong at openjdk.org Fri Jan 19 23:37:36 2024 From: ddong at openjdk.org (Denghui Dong) Date: Fri, 19 Jan 2024 23:37:36 GMT Subject: RFR: 8324213: C1: There is no need for Canonicalizer to handle IfOp [v2] In-Reply-To: References: Message-ID: <5PJm9P2Eyap_zvmmVIGpbuyMAY9M4BMJUOouWKndftc=.f1924da6-a1e2-453c-81c2-0dbb81935b9f@github.com> > IfOp will not be created when building the graph, so there is no need for Canonicalizer to handle IfOp. Denghui Dong has updated the pull request incrementally with one additional commit since the last revision: update ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17499/files - new: https://git.openjdk.org/jdk/pull/17499/files/8a4b6415..e156cd0c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17499&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17499&range=00-01 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/17499.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17499/head:pull/17499 PR: https://git.openjdk.org/jdk/pull/17499 From ddong at openjdk.org Fri Jan 19 23:37:37 2024 From: ddong at openjdk.org (Denghui Dong) Date: Fri, 19 Jan 2024 23:37:37 GMT Subject: RFR: 8324213: C1: There is no need for Canonicalizer to handle IfOp [v2] In-Reply-To: References: Message-ID: On Fri, 19 Jan 2024 21:39:08 GMT, Dean Long wrote: >> Denghui Dong has updated the pull request incrementally with one additional commit since the last revision: >> >> update > > src/hotspot/share/c1/c1_Canonicalizer.cpp line 472: > >> 470: >> 471: void Canonicalizer::do_IfOp(IfOp* x) { >> 472: } > > If you are saying this method can never be called, then shouldn't we make it report an error instead of being empty? Make sense. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17499#discussion_r1460002161 From kvn at openjdk.org Sat Jan 20 00:00:27 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Sat, 20 Jan 2024 00:00:27 GMT Subject: RFR: 8322743: assert(held_monitor_count() == jni_monitor_count()) failed [v2] In-Reply-To: References: Message-ID: <0d8QR7MPET_IWyIKtjhwVh6CpUZpGZBcMXMJ3nfZW4Y=.81929a7e-3880-4e09-9e03-8072bc73973d@github.com> On Wed, 17 Jan 2024 20:20:05 GMT, Vladimir Kozlov wrote: >> Corner case with a local (not escaped) object used for synchronization. C2 Escape Analysis thinks that it can eliminate locks for it. In most cases it is true but not in this case. >> >> >> for (int i = 0; i < 2; ++i) { >> Object o = new Object(); >> synchronized (o) { // monitorenter >> // Trigger OSR compilation >> for (int j = 0; j < 100_000; ++j) { >> >> The test has nested loop which trigger OSR compilation. The locked object comes from Interpreter into compiled OSR code. During parsing C2 creates an other non escaped object and correctly merge both together (with Phi node) so that non escaped object is not scalar replaceable. Because it does not globally escapes EA still removes locks for it and, as result, also for merged locked object from Interpreter which is the bug. >> >> The fix is to check that synchronized block does not have any associated escaped objects when EA decides if locks can be eliminated. >> >> Added regression test prepared by @TobiHartmann. Tested tier1-5, xcomp and stress. >> Performance testing show no difference. > > Vladimir Kozlov has updated the pull request incrementally with one additional commit since the last revision: > > Address review comments As I commented in bug reported, technically we can split Unlock node through Phi to separate them and try to eliminate ones related not-escaped new object. But it will not help in this case. There are 2 modes in C2 how we handle locks/unlocks. 
Before [JDK-7125896](https://bugs.openjdk.org/browse/JDK-7125896) we used `BoxLockNode` only to indicate the stack slot where we store the object's header (MarkWord) for heavy monitors [HotSpot/Synchronization](https://wiki.openjdk.org/display/HotSpot/Synchronization). In that mode the same `BoxLockNode` can be used by non-interfering synchronization regions even for different objects: synchronized (obj1) {} synchronized (obj2) {} The only thing that matters is the stack slot it points to [locknode.cpp#L51](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/locknode.cpp#L51) EA supports this mode: C2 looks at the locks/unlocks which reference only one object and creates a new separate `BoxLockNode` (synchronization region) for them when it eliminates locks [macro.cpp#L1974](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/macro.cpp#L1974) [JDK-7125896](https://bugs.openjdk.org/browse/JDK-7125896) and subsequent fixes introduced a new mode to simplify handling of locks and to make it "easy" to implement elimination of some nested locks which lock the same object. This is the default mode (`EliminateNestedLocks` == true) since JDK 8 (and 7u4). In this mode we don't merge `BoxLockNode` nodes - each synchronization region will have a separate `BoxLockNode` - one per locked object. This assumes that we will see only one object if we trace all `Lock/Unlock` nodes which reference one `BoxLockNode`. Another assumption is that if we have a merge point during parsing (for example, diamond shape code inside a synchronized region) we can use the `BoxLockNode` for the same stack slot from the already processed path: [parse1.cpp#L1800](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/parse1.cpp#L1800). It was an additional fix [JDK-7128355](https://bugs.openjdk.org/browse/JDK-7128355) after the nested locks elimination implementation.
Based on that (all `Lock/Unlock` nodes which reference one `BoxLockNode` locks only one and the same object) in this mode it was assumed that we can eliminate all locks and unlocks if we find at least one which we can eliminate in one synchronized region (one `BoxLockNode`) [macro.cpp#L1946](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/macro.cpp#L1946) OSR compilation in this bug case breaks these assumptions. During parsing we merged synchronized region (one `BoxLockNode`) with different locked object (from Interpreter). As result the assumption that we can eliminate all locks/unlocks for one region based only on one lock is incorrect. It may be possible do something when we parse merge point but I think it is hard. What if this merge point is not at the start but somewhere later? For me it was much easier to catch such case early during escape analysis where information about all objects is available. ------------- PR Comment: https://git.openjdk.org/jdk/pull/17331#issuecomment-1901348194 From kvn at openjdk.org Sat Jan 20 00:04:28 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Sat, 20 Jan 2024 00:04:28 GMT Subject: RFR: 8322743: assert(held_monitor_count() == jni_monitor_count()) failed [v2] In-Reply-To: <_3yNZAQhVIqNWHV1BmvYVOgJmUBfhA2VPMaqk95Qlcs=.a3927f92-285d-469e-b364-12d90fe52243@github.com> References: <_3yNZAQhVIqNWHV1BmvYVOgJmUBfhA2VPMaqk95Qlcs=.a3927f92-285d-469e-b364-12d90fe52243@github.com> Message-ID: On Fri, 19 Jan 2024 13:38:30 GMT, Emanuel Peter wrote: >> Vladimir Kozlov has updated the pull request incrementally with one additional commit since the last revision: >> >> Address review comments > > src/hotspot/share/opto/callnode.cpp line 2004: > >> 2002: // >> 2003: ConnectionGraph *cgr = phase->C->congraph(); >> 2004: if (cgr != nullptr && cgr->can_eliminate_lock(this)) { > > I guess if you make this change, then you probably would also want to rename `NonEscObj` and `set_non_esc_obj` and `is_non_esc_obj`, right? 
Now it is not just about being non-escaped, but the more complex semantics of `can_eliminate_lock`. Right. > src/hotspot/share/opto/escape.cpp line 2888: > >> 2886: * >> 2887: * Return true if lock/unlock can be eliminated. >> 2888: */ > > Suggestion: > > // The lock/unlock is unnecessary if we are locking a non-escaped object, > // unless synchronized block (defined by BoxLock node) has other escaped objects > // (for example, locked object come from Interpreter in OSR compilation). > // > // Return true if lock/unlock can be eliminated. > > This would be the first use in this file of multi-line comment :man_shrugging: Okay. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17331#discussion_r1460029304 PR Review Comment: https://git.openjdk.org/jdk/pull/17331#discussion_r1460029509 From sviswanathan at openjdk.org Sat Jan 20 00:31:28 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Sat, 20 Jan 2024 00:31:28 GMT Subject: RFR: 8322768: Optimize non-subword vector compress and expand APIs for AVX2 target. [v7] In-Reply-To: References: <_udOCEVG86x9V_WvYqFTaYnvmXdiZ7LzqxzR-D_ygYs=.db3ed37e-a8fa-413b-83c6-a785aba072ff@github.com> Message-ID: On Fri, 19 Jan 2024 19:03:31 GMT, Jatin Bhateja wrote: >> Hi, >> >> Patch optimizes non-subword vector compress and expand APIs for x86 AVX2 only targets. >> Upcoming E-core Xeons (Sierra Forest) and Hybrid CPUs only support AVX2 instruction set. >> These are very frequently used APIs in columnar database filter operation. >> >> Implementation uses a lookup table to record permute indices. Table index is computed using >> mask argument of compress/expand operation. >> >> Following are the performance number of JMH micro included with the patch. 
>> >> >> System : Intel(R) Xeon(R) Platinum 8480+ (Sapphire Rapids) >> >> Baseline: >> Benchmark (size) Mode Cnt Score Error Units >> ColumnFilterBenchmark.filterDoubleColumn 1024 thrpt 2 142.767 ops/ms >> ColumnFilterBenchmark.filterDoubleColumn 2047 thrpt 2 71.436 ops/ms >> ColumnFilterBenchmark.filterDoubleColumn 4096 thrpt 2 35.992 ops/ms >> ColumnFilterBenchmark.filterFloatColumn 1024 thrpt 2 182.151 ops/ms >> ColumnFilterBenchmark.filterFloatColumn 2047 thrpt 2 91.096 ops/ms >> ColumnFilterBenchmark.filterFloatColumn 4096 thrpt 2 44.757 ops/ms >> ColumnFilterBenchmark.filterIntColumn 1024 thrpt 2 184.099 ops/ms >> ColumnFilterBenchmark.filterIntColumn 2047 thrpt 2 91.981 ops/ms >> ColumnFilterBenchmark.filterIntColumn 4096 thrpt 2 45.170 ops/ms >> ColumnFilterBenchmark.filterLongColumn 1024 thrpt 2 148.017 ops/ms >> ColumnFilterBenchmark.filterLongColumn 2047 thrpt 2 73.516 ops/ms >> ColumnFilterBenchmark.filterLongColumn 4096 thrpt 2 36.844 ops/ms >> >> Withopt: >> Benchmark (size) Mode Cnt Score Error Units >> ColumnFilterBenchmark.filterDoubleColumn 1024 thrpt 2 2051.707 ops/ms >> ColumnFilterBenchmark.filterDoubleColumn 2047 thrpt 2 914.072 ops/ms >> ColumnFilterBenchmark.filterDoubleColumn 4096 thrpt 2 489.898 ops/ms >> ColumnFilterBenchmark.filterFloatColumn 1024 thrpt 2 5324.195 ops/ms >> ColumnFilterBenchmark.filterFloatColumn 2047 thrpt 2 2587.229 ops/ms >> ColumnFilterBenchmark.filterFloatColumn 4096 thrpt 2 1278.665 ops/ms >> ColumnFilterBenchmark.filterIntColumn 1024 thrpt 2 4149.384 ops/ms >> ColumnFilterBenchmark.filterIntColumn 2047 thrpt ... > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Modified code comment for clarity. src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 5305: > 5303: // value, this can potentially be used as a blending mask after > 5304: // compressing/expanding the source vector lanes. 
> 5305: vblendvps(dst, dst, xtmp, permv, vec_enc, false, xtmp1); If I am not wrong, the last argument in vblendps can be same as permv. That way we won't need xtmp1. i.e. the vblendps call can be modified as follows: vblendvps(dst, dst, xtmp, permv, vec_enc, false, permv); ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17261#discussion_r1460080650 From duke at openjdk.org Sat Jan 20 00:36:41 2024 From: duke at openjdk.org (Joshua Cao) Date: Sat, 20 Jan 2024 00:36:41 GMT Subject: RFR: 8323220: Reassociate loop invariants involved in Cmps and Add/Subs [v3] In-Reply-To: References: Message-ID: > // inv1 == (x + inv2) => ( inv1 - inv2 ) == x > // inv1 == (x - inv2) => ( inv1 + inv2 ) == x > // inv1 == (inv2 - x) => (-inv1 + inv2 ) == x > > > For example, > > > fn(inv1, inv2) > while(...) > x = foobar() > if inv1 == x + inv2 > blackhole() > > > We can transform this into > > > fn(inv1, inv2) > t = inv1 - inv2 > while(...) > x = foobar() > if t == x > blackhole() > > > Here is an example: https://github.com/openjdk/jdk/blob/b78896b9aafcb15f453eaed6e154a5461581407b/src/java.base/share/classes/java/lang/invoke/LambdaFormEditor.java#L910. LHS `1` and RHS `pos` are both loop invariant > > Passes tier1 locally on Linux machine. Passes GHA on my fork. Joshua Cao has updated the pull request incrementally with one additional commit since the last revision: Assert for n2. Variables for n1/n2 opcode. More concise comments. 
Overflow/random tests ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17375/files - new: https://git.openjdk.org/jdk/pull/17375/files/dda874eb..cb6d24b4 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17375&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17375&range=01-02 Stats: 36 lines in 2 files changed: 13 ins; 2 del; 21 mod Patch: https://git.openjdk.org/jdk/pull/17375.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17375/head:pull/17375 PR: https://git.openjdk.org/jdk/pull/17375 From duke at openjdk.org Sat Jan 20 00:36:42 2024 From: duke at openjdk.org (Joshua Cao) Date: Sat, 20 Jan 2024 00:36:42 GMT Subject: RFR: 8323220: Reassociate loop invariants involved in Cmps and Add/Subs [v2] In-Reply-To: References: Message-ID: On Thu, 18 Jan 2024 08:26:44 GMT, Emanuel Peter wrote: > Thanks for the update, looks better already! I'm still waiting for the test with random/edge-case values, and then I can submit this for testing :) I converted two of the "dont-associate" tests to edge case values, and a couple of the positive tests to have random arguments. Changing other tests end up with the issue where there are more add/sub nodes that make it hard to match IR. ------------- PR Comment: https://git.openjdk.org/jdk/pull/17375#issuecomment-1901434717 From sviswanathan at openjdk.org Sat Jan 20 00:38:28 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Sat, 20 Jan 2024 00:38:28 GMT Subject: RFR: 8322768: Optimize non-subword vector compress and expand APIs for AVX2 target. [v7] In-Reply-To: References: <_udOCEVG86x9V_WvYqFTaYnvmXdiZ7LzqxzR-D_ygYs=.db3ed37e-a8fa-413b-83c6-a785aba072ff@github.com> Message-ID: On Fri, 19 Jan 2024 19:03:31 GMT, Jatin Bhateja wrote: >> Hi, >> >> Patch optimizes non-subword vector compress and expand APIs for x86 AVX2 only targets. >> Upcoming E-core Xeons (Sierra Forest) and Hybrid CPUs only support AVX2 instruction set. 
>> These are very frequently used APIs in columnar database filter operation. >> >> Implementation uses a lookup table to record permute indices. Table index is computed using >> mask argument of compress/expand operation. >> >> Following are the performance number of JMH micro included with the patch. >> >> >> System : Intel(R) Xeon(R) Platinum 8480+ (Sapphire Rapids) >> >> Baseline: >> Benchmark (size) Mode Cnt Score Error Units >> ColumnFilterBenchmark.filterDoubleColumn 1024 thrpt 2 142.767 ops/ms >> ColumnFilterBenchmark.filterDoubleColumn 2047 thrpt 2 71.436 ops/ms >> ColumnFilterBenchmark.filterDoubleColumn 4096 thrpt 2 35.992 ops/ms >> ColumnFilterBenchmark.filterFloatColumn 1024 thrpt 2 182.151 ops/ms >> ColumnFilterBenchmark.filterFloatColumn 2047 thrpt 2 91.096 ops/ms >> ColumnFilterBenchmark.filterFloatColumn 4096 thrpt 2 44.757 ops/ms >> ColumnFilterBenchmark.filterIntColumn 1024 thrpt 2 184.099 ops/ms >> ColumnFilterBenchmark.filterIntColumn 2047 thrpt 2 91.981 ops/ms >> ColumnFilterBenchmark.filterIntColumn 4096 thrpt 2 45.170 ops/ms >> ColumnFilterBenchmark.filterLongColumn 1024 thrpt 2 148.017 ops/ms >> ColumnFilterBenchmark.filterLongColumn 2047 thrpt 2 73.516 ops/ms >> ColumnFilterBenchmark.filterLongColumn 4096 thrpt 2 36.844 ops/ms >> >> Withopt: >> Benchmark (size) Mode Cnt Score Error Units >> ColumnFilterBenchmark.filterDoubleColumn 1024 thrpt 2 2051.707 ops/ms >> ColumnFilterBenchmark.filterDoubleColumn 2047 thrpt 2 914.072 ops/ms >> ColumnFilterBenchmark.filterDoubleColumn 4096 thrpt 2 489.898 ops/ms >> ColumnFilterBenchmark.filterFloatColumn 1024 thrpt 2 5324.195 ops/ms >> ColumnFilterBenchmark.filterFloatColumn 2047 thrpt 2 2587.229 ops/ms >> ColumnFilterBenchmark.filterFloatColumn 4096 thrpt 2 1278.665 ops/ms >> ColumnFilterBenchmark.filterIntColumn 1024 thrpt 2 4149.384 ops/ms >> ColumnFilterBenchmark.filterIntColumn 2047 thrpt ... 
> > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Modified code comment for clarity. src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 5331: > 5329: // value, this can potentially be used as a blending mask after > 5330: // compressing/expanding the source vector lanes. > 5331: vblendvps(dst, dst, permv, xtmp, vec_enc, false, xtmp1); Here the last argument in vblendps can be same as xtmp. That way we won't need xtmp1. i.e. the vblendps call can be modified as follows: vblendvps(dst, dst, permv, xtmp, vec_enc, false, xtmp); ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17261#discussion_r1460083068 From sviswanathan at openjdk.org Sat Jan 20 01:18:29 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Sat, 20 Jan 2024 01:18:29 GMT Subject: RFR: 8322768: Optimize non-subword vector compress and expand APIs for AVX2 target. [v7] In-Reply-To: References: <_udOCEVG86x9V_WvYqFTaYnvmXdiZ7LzqxzR-D_ygYs=.db3ed37e-a8fa-413b-83c6-a785aba072ff@github.com> Message-ID: On Fri, 19 Jan 2024 19:03:31 GMT, Jatin Bhateja wrote: >> Hi, >> >> Patch optimizes non-subword vector compress and expand APIs for x86 AVX2 only targets. >> Upcoming E-core Xeons (Sierra Forest) and Hybrid CPUs only support AVX2 instruction set. >> These are very frequently used APIs in columnar database filter operation. >> >> Implementation uses a lookup table to record permute indices. Table index is computed using >> mask argument of compress/expand operation. >> >> Following are the performance number of JMH micro included with the patch. 
>> >> >> System : Intel(R) Xeon(R) Platinum 8480+ (Sapphire Rapids) >> >> Baseline: >> Benchmark (size) Mode Cnt Score Error Units >> ColumnFilterBenchmark.filterDoubleColumn 1024 thrpt 2 142.767 ops/ms >> ColumnFilterBenchmark.filterDoubleColumn 2047 thrpt 2 71.436 ops/ms >> ColumnFilterBenchmark.filterDoubleColumn 4096 thrpt 2 35.992 ops/ms >> ColumnFilterBenchmark.filterFloatColumn 1024 thrpt 2 182.151 ops/ms >> ColumnFilterBenchmark.filterFloatColumn 2047 thrpt 2 91.096 ops/ms >> ColumnFilterBenchmark.filterFloatColumn 4096 thrpt 2 44.757 ops/ms >> ColumnFilterBenchmark.filterIntColumn 1024 thrpt 2 184.099 ops/ms >> ColumnFilterBenchmark.filterIntColumn 2047 thrpt 2 91.981 ops/ms >> ColumnFilterBenchmark.filterIntColumn 4096 thrpt 2 45.170 ops/ms >> ColumnFilterBenchmark.filterLongColumn 1024 thrpt 2 148.017 ops/ms >> ColumnFilterBenchmark.filterLongColumn 2047 thrpt 2 73.516 ops/ms >> ColumnFilterBenchmark.filterLongColumn 4096 thrpt 2 36.844 ops/ms >> >> Withopt: >> Benchmark (size) Mode Cnt Score Error Units >> ColumnFilterBenchmark.filterDoubleColumn 1024 thrpt 2 2051.707 ops/ms >> ColumnFilterBenchmark.filterDoubleColumn 2047 thrpt 2 914.072 ops/ms >> ColumnFilterBenchmark.filterDoubleColumn 4096 thrpt 2 489.898 ops/ms >> ColumnFilterBenchmark.filterFloatColumn 1024 thrpt 2 5324.195 ops/ms >> ColumnFilterBenchmark.filterFloatColumn 2047 thrpt 2 2587.229 ops/ms >> ColumnFilterBenchmark.filterFloatColumn 4096 thrpt 2 1278.665 ops/ms >> ColumnFilterBenchmark.filterIntColumn 1024 thrpt 2 4149.384 ops/ms >> ColumnFilterBenchmark.filterIntColumn 2047 thrpt ... > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Modified code comment for clarity. 
src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 985: > 983: for (int j = 0; j < 4; j++) { > 984: if (mask & (1 << j)) { > 985: __ emit_data64(j, relocInfo::none); This could be something like __ emit_data(2*j, relocInfo::none); __ emit_data(2*j+1, relocInfo::none) to have the double word masks in the table to begin with. Then we don't need the extra instructions in vector_compress_expand_avx2() to generate double word permute masks from long masks. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17261#discussion_r1460113427 From dlong at openjdk.org Sat Jan 20 01:24:27 2024 From: dlong at openjdk.org (Dean Long) Date: Sat, 20 Jan 2024 01:24:27 GMT Subject: RFR: 8322743: assert(held_monitor_count() == jni_monitor_count()) failed [v2] In-Reply-To: References: Message-ID: On Wed, 17 Jan 2024 20:20:05 GMT, Vladimir Kozlov wrote: >> Corner case with a local (not escaped) object used for synchronization. C2 Escape Analysis thinks that it can eliminate locks for it. In most cases it is true but not in this case. >> >> >> for (int i = 0; i < 2; ++i) { >> Object o = new Object(); >> synchronized (o) { // monitorenter >> // Trigger OSR compilation >> for (int j = 0; j < 100_000; ++j) { >> >> The test has nested loop which trigger OSR compilation. The locked object comes from Interpreter into compiled OSR code. During parsing C2 creates an other non escaped object and correctly merge both together (with Phi node) so that non escaped object is not scalar replaceable. Because it does not globally escapes EA still removes locks for it and, as result, also for merged locked object from Interpreter which is the bug. >> >> The fix is to check that synchronized block does not have any associated escaped objects when EA decides if locks can be eliminated. >> >> Added regression test prepared by @TobiHartmann. Tested tier1-5, xcomp and stress. >> Performance testing show no difference. 
> > Vladimir Kozlov has updated the pull request incrementally with one additional commit since the last revision: > > Address review comments > I was thinking about splitting unlock(obj) through Phi node to keep separate unlock for object coming from Interpreter > It may be possible do something when we parse merge point but I think it is hard. What if this merge point is not at the start but somewhere later? I think it is true that OSR nmethods only have the OSR entry point. There is no normal entry point. So if we did a special kind of loop unrolling, so that the OSR entry came first, we would end up with something like this, assuming OSR entry happens on the first iteration with i == 0. The merge point/phi goes away completely, I believe. i = 0; // Trigger OSR compilation [ OSR entry ] [...] [monitorexit on interpreter object, with no preceding monitorenter!] i = 1; Object o = new Object(); // Never escapes synchronized (o) { // This monitorenter can be eliminated for (int j = 0; j < 100_000; ++j) { In general we don't know which iteration will trigger OSR. So unrolled code would look like: int i = OSR_start; [...] for (i = OSR_start + 1; i < 2; ++i) { I wouldn't be surprised if generating the Unlock without the Lock breaks some assumptions elsewhere. I'm not suggesting something like this for this PR -- just thinking it seems possible conceptually. ------------- PR Comment: https://git.openjdk.org/jdk/pull/17331#issuecomment-1901555475 From kvn at openjdk.org Sat Jan 20 01:57:28 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Sat, 20 Jan 2024 01:57:28 GMT Subject: RFR: 8322743: assert(held_monitor_count() == jni_monitor_count()) failed [v2] In-Reply-To: References: Message-ID: On Wed, 17 Jan 2024 20:20:05 GMT, Vladimir Kozlov wrote: >> Corner case with a local (not escaped) object used for synchronization. C2 Escape Analysis thinks that it can eliminate locks for it. In most cases it is true but not in this case.
>> >> >> for (int i = 0; i < 2; ++i) { >> Object o = new Object(); >> synchronized (o) { // monitorenter >> // Trigger OSR compilation >> for (int j = 0; j < 100_000; ++j) { >> >> The test has nested loop which trigger OSR compilation. The locked object comes from Interpreter into compiled OSR code. During parsing C2 creates an other non escaped object and correctly merge both together (with Phi node) so that non escaped object is not scalar replaceable. Because it does not globally escapes EA still removes locks for it and, as result, also for merged locked object from Interpreter which is the bug. >> >> The fix is to check that synchronized block does not have any associated escaped objects when EA decides if locks can be eliminated. >> >> Added regression test prepared by @TobiHartmann. Tested tier1-5, xcomp and stress. >> Performance testing show no difference. > > Vladimir Kozlov has updated the pull request incrementally with one additional commit since the last revision: > > Address review comments I agree, this is very interesting suggestion (for separate RFE) which may allow us to avoid inverted (and irreducible) loops and not just current locking issue. ------------- PR Comment: https://git.openjdk.org/jdk/pull/17331#issuecomment-1901583508 From jrose at openjdk.org Sat Jan 20 02:23:28 2024 From: jrose at openjdk.org (John R Rose) Date: Sat, 20 Jan 2024 02:23:28 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v5] In-Reply-To: <6SOOtHaZrEN8UIKKLHBJniz_nUvZjRvV0TLofW7Xjxk=.debaf12a-b7c8-4b04-b217-3b8cd9b3c6f5@github.com> References: <6SOOtHaZrEN8UIKKLHBJniz_nUvZjRvV0TLofW7Xjxk=.debaf12a-b7c8-4b04-b217-3b8cd9b3c6f5@github.com> Message-ID: On Wed, 10 Jan 2024 18:08:39 GMT, Quan Anh Mai wrote: >> Hi, >> >> This patch adds unsigned bounds and known bits constraints to `TypeInt` and `TypeLong`. This opens more transformation opportunities in an elegant manner as well as helps avoid some ad-hoc rules in Hotspot. 
The new constraints are applied to identity and value calls of the common nodes (Add, Sub, L/R/URShift, And, Or, Xor, bit counting, Cmp, Bool, ConvI2L/L2I), the detailed ideas for each node will be presented below. >> >> In general, a `TypeInt/Long` represents a set of values `x` that satisfies: `x s>= lo && x s<= hi && x u>= ulo && x u<= uhi && (x & zeros) == 0 && (~x & ones) == 0`. These constraints are not independent, e.g. an int that lies in [0, 3] in signed domain must also lie in [0, 3] in unsigned domain and have all bits but the last 2 being unset. As a result, we must normalize the constraints (tighten the constraints so that they are optimal) before constructing a `TypeInt/Long` instance. >> >> Please kindly review, thanks very much. >> >> Testing >> >> - [x] GHA >> - [x] Linux x64, tier 1-4 > > Quan Anh Mai has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 27 commits: > > - Merge branch 'master' into improvevalue > - Merge branch 'master' into improvevalue > - improve add/sub implementation > - Merge branch 'master' into improvevalue > - typo > - whitespace > - fix tests for x86_32 > - fix widen of ConvI2L > - problem lists > - format > - ... and 17 more: https://git.openjdk.org/jdk/compare/f0169341...843ad076 As you might expect (from an examination of [JDK-8001436](https://bugs.openjdk.org/browse/JDK-8001436)) I'm very excited to see this work going forward. A very long time ago I worked out tight, cheap estimates for bitwise effects of arithmetic operations, and arithmetic effects for bitwise operators. (BTW, did you know that "x^y = x+y-2*(x&y)"? That's the sort of identity we are working with here.) The overall point is that, if you know both arithmetic and bitwise ranges for operands, you can infer tight arithmetic and bitwise ranges for all common operators. I see you've gone farther than I did, adding unsigned ranges as well, and more ops (popc). Excellent, excellent.
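
As a quick sanity check of the identity mentioned above, here is a minimal Java sketch (not part of the PR; the class and method names are made up for illustration). It relies only on the fact that Java `int` arithmetic wraps modulo 2^32, so the identity can be tested at the extreme values as well as ordinary ones.

```java
// Hypothetical helper, for illustration only; nothing here comes from the PR.
final class XorIdentityCheck {
    // True when x ^ y == x + y - 2 * (x & y) under wrapping int arithmetic.
    static boolean holds(int x, int y) {
        return (x ^ y) == x + y - 2 * (x & y);
    }

    public static void main(String[] args) {
        // Edge-case operands where signed overflow occurs in the sum or product.
        int[] edges = {0, 1, -1, Integer.MIN_VALUE, Integer.MAX_VALUE};
        boolean all = true;
        for (int x : edges) {
            for (int y : edges) {
                all &= holds(x, y);
            }
        }
        System.out.println("identity holds on all edge pairs: " + all);
    }
}
```

The reason it works is that `x + y` counts every bit position set in both operands twice (as a carry), while `x ^ y` drops those positions, so subtracting `2 * (x & y)` removes exactly the carried part.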
I have a request, though, and it is the same request as with your work on division strength reduction. The inference rules need to be factored out and separately g-tested. I see equations like this in the middle of the C2 code in this PR: U known = (min ^~ max) & (i1->_zeros | i1->_ones) & (i2->_zeros | i2->_ones); It is hard to demonstrate that this is correct. It cannot be unit-tested separately when it appears like this in the midst of IR transforms. We have had many subtle bugs before from "math inclusions" like this in the middle of the IR optimizer. I believe that formula is PROBABLY correct, because I deeply respect your mathematical ability, but that is not good enough. If someone accidentally changes it (or needs to change it for some maintenance reason), we might not have a math savant handy to re-verify it. We need an executable test to give us confidence. In addition, I personally dislike doing pencil-and-paper proofs of code snippets I have to extract manually from wherever it is to do their work. I would much prefer to see the relevant algorithms factored out in their own source files (if they are subtle, as these are). I like to reason from clean formulas, not from formulas that I had to tear away from their duty stations in the optimizer. I think we need a separate file (header and source) for this set of algorithms. Something like rangeInference.[ch]pp. The API points would take one or two inputs, each represented as a full bundle of range data (hi/lo/uh/ul/zs/os). They would also take an operator name (not one API point per operator, please, but maybe one for unaries and one for binaries). And pass 64-bit ints plus length indicators, rather than doing it more than once for different primitive sizes. For naming the operators I suggest either opto node names (Mul not MulL), or (a non-opto-centric choice) the operator names from the Panama Vector API (JEP 460 at last sighting).
The API point would symbolically execute the named op over the given symbolic (ranged) arguments, returning a new set of ranged (by C++ references I assume, or maybe a "little struct" for range tuples). The switch-over-op at the heart of it would be carefully written for clarity, to prove (once and for all) that we know what we are talking about. The gtest would work like my BitTypes demo program, exercising all of the operations through a systematic set of concrete input values and (then) ranges containing those values. We would make sure that (a) the concrete result never "escapes" the predicted range (for any of a series of representative containing ranges). And also, when possible, (b) that the predicted range is "tight" in some good way, probably that the concrete result meets each of its inclusive bounds (for each range data point), for at least one set of concrete inputs. I think this kind of testing work is necessary to ensure that our system can be understood and maintained by more than a few people on the planet. And (although I came up with some of the formulas) I'd personally prefer it that way as well. It would be nice to know that if someone bumps an operator or drops a parenthesis, a unit test will catch the bug, rather than either a PhD student re-analyzing the code, or (more likely) a bug that shows up long after system regression tests. The benefits of these optimizations are, basically, that we can push around many more inferences about bitwise operations and unsigned comparisons, beyond what the original C2 design envisioned for inferences about (signed) arithmetic operations. This in turn will give us CC_NE (a nice thing, finally). Moreover, it will help us infer the low-order bits of loop indexes, something useful for proving alignment (in Panama at least). If applied to the lanes of vector types, it will give us a much more robust set of information about what's going on in vectorized code. So I'm all for it! But, with unit tests, please?
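
To make the shape of such a test concrete, here is a small Java sketch of the (a) soundness and (b) tightness checks described above. It is an illustration only: the real thing would presumably be a C++ gtest against the factored-out HotSpot code, and every name here (`KnownBitsSketch`, `KnownBits`, `Op`, `infer`, `check`) is hypothetical. It models only the known-bits part (`zeros`/`ones`) for three bitwise operators at a 4-bit width, so that all inputs can be enumerated exhaustively.

```java
import java.util.function.IntBinaryOperator;

// Hypothetical sketch; not the PR's code and not a HotSpot API.
final class KnownBitsSketch {
    static final int WIDTH = 4;
    static final int MASK = (1 << WIDTH) - 1;

    // Operators are named, so one entry point serves all binary ops.
    enum Op implements IntBinaryOperator {
        AND { public int applyAsInt(int a, int b) { return a & b; } },
        OR  { public int applyAsInt(int a, int b) { return a | b; } },
        XOR { public int applyAsInt(int a, int b) { return a ^ b; } };
    }

    // zeros: bits known to be 0; ones: bits known to be 1 (disjoint masks).
    static final class KnownBits {
        final int zeros, ones;
        KnownBits(int zeros, int ones) { this.zeros = zeros; this.ones = ones; }
        boolean contains(int x) { return (x & zeros) == 0 && (~x & ones) == 0; }
    }

    // The "switch-over-op": symbolic execution of op on two known-bits inputs.
    static KnownBits infer(Op op, KnownBits a, KnownBits b) {
        switch (op) {
            case AND: return new KnownBits(a.zeros | b.zeros, a.ones & b.ones);
            case OR:  return new KnownBits(a.zeros & b.zeros, a.ones | b.ones);
            case XOR: return new KnownBits((a.zeros & b.zeros) | (a.ones & b.ones),
                                           (a.zeros & b.ones) | (a.ones & b.zeros));
            default:  throw new AssertionError(op);
        }
    }

    // (a) no concrete result escapes the prediction;
    // (b) every bit predicted "unknown" is observed as both 0 and 1.
    static boolean check(Op op) {
        for (int za = 0; za <= MASK; za++) for (int oa = 0; oa <= MASK; oa++) {
            if ((za & oa) != 0) continue;              // contradictory input, skip
            for (int zb = 0; zb <= MASK; zb++) for (int ob = 0; ob <= MASK; ob++) {
                if ((zb & ob) != 0) continue;
                KnownBits a = new KnownBits(za, oa), b = new KnownBits(zb, ob);
                KnownBits r = infer(op, a, b);
                int seenOne = 0, seenZero = 0;
                for (int x = 0; x <= MASK; x++) {
                    if (!a.contains(x)) continue;
                    for (int y = 0; y <= MASK; y++) {
                        if (!b.contains(y)) continue;
                        int v = op.applyAsInt(x, y) & MASK;
                        if (!r.contains(v)) return false;   // unsound
                        seenOne |= v;
                        seenZero |= ~v & MASK;
                    }
                }
                int unknown = MASK & ~(r.zeros | r.ones);
                if ((seenOne & unknown) != unknown || (seenZero & unknown) != unknown) {
                    return false;                            // not tight
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        for (Op op : Op.values()) {
            System.out.println(op + " sound and tight: " + check(op));
        }
    }
}
```

The small width is a deliberate choice: exhaustive enumeration is cheap at 4 bits, and a dropped parenthesis or a swapped `&`/`|` in `infer` would show up as a failed check rather than as a miscompilation much later. Arithmetic operators and the signed/unsigned bounds would need the same treatment, which this sketch does not attempt.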
------------- PR Comment: https://git.openjdk.org/jdk/pull/15440#issuecomment-1901609719 From xliu at openjdk.org Sat Jan 20 08:43:28 2024 From: xliu at openjdk.org (Xin Liu) Date: Sat, 20 Jan 2024 08:43:28 GMT Subject: RFR: 8323220: Reassociate loop invariants involved in Cmps and Add/Subs [v3] In-Reply-To: References: Message-ID: On Sat, 20 Jan 2024 00:36:41 GMT, Joshua Cao wrote: >> // inv1 == (x + inv2) => ( inv1 - inv2 ) == x >> // inv1 == (x - inv2) => ( inv1 + inv2 ) == x >> // inv1 == (inv2 - x) => (-inv1 + inv2 ) == x >> >> >> For example, >> >> >> fn(inv1, inv2) >> while(...) >> x = foobar() >> if inv1 == x + inv2 >> blackhole() >> >> >> We can transform this into >> >> >> fn(inv1, inv2) >> t = inv1 - inv2 >> while(...) >> x = foobar() >> if t == x >> blackhole() >> >> >> Here is an example: https://github.com/openjdk/jdk/blob/b78896b9aafcb15f453eaed6e154a5461581407b/src/java.base/share/classes/java/lang/invoke/LambdaFormEditor.java#L910. LHS `1` and RHS `pos` are both loop invariant >> >> Passes tier1 locally on Linux machine. Passes GHA on my fork. > > Joshua Cao has updated the pull request incrementally with one additional commit since the last revision: > > Assert for n2. Variables for n1/n2 opcode. More concise comments. > Overflow/random tests src/hotspot/share/opto/loopTransform.cpp line 356: > 354: } > 355: > 356: bool is_int = n2->bottom_type()->isa_int() != nullptr; I guess you change to n2 because n1 may be CmpI/L. I think it still works because TypeInt::CC is still isa_int(). or we add a comment to make it more clear? src/hotspot/share/opto/loopTransform.cpp line 382: > 380: } > 381: phase->register_new_node(inv, phase->get_early_ctrl(inv)); > 382: if (n1_is_cmp) { CmpNode is subclass of SubNode. if n is CmpI/L, n->is_Sub() is also true. can we use the old logic for your new comparison expressions? yes, we still need to check if n1 is CmpNode or SubNode here. 
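
For readers following the three rewrites quoted at the top of this message, the following Java sketch (illustrative only, with made-up names, and not taken from the PR's tests) checks that each rewritten equality agrees with the original one at overflow-prone values. It assumes only that Java `int` arithmetic wraps modulo 2^32, which is what makes the equality forms safe to reassociate; ordered comparisons such as `<` would not survive overflow in general, and this sketch does not cover them.

```java
// Hypothetical check, for illustration; names are not from the PR.
final class CmpReassociationCheck {
    // inv1 == (x + inv2)  =>  (inv1 - inv2) == x
    static boolean addForm(int inv1, int inv2, int x) {
        return (inv1 == x + inv2) == (inv1 - inv2 == x);
    }

    // inv1 == (x - inv2)  =>  (inv1 + inv2) == x
    static boolean subForm(int inv1, int inv2, int x) {
        return (inv1 == x - inv2) == (inv1 + inv2 == x);
    }

    // inv1 == (inv2 - x)  =>  (-inv1 + inv2) == x
    static boolean reverseSubForm(int inv1, int inv2, int x) {
        return (inv1 == inv2 - x) == (-inv1 + inv2 == x);
    }

    // True when all three rewrites agree with the originals on every triple.
    static boolean allAgree(int[] values) {
        for (int inv1 : values) for (int inv2 : values) for (int x : values) {
            if (!addForm(inv1, inv2, x) || !subForm(inv1, inv2, x)
                    || !reverseSubForm(inv1, inv2, x)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        int[] edges = {0, 1, -1, Integer.MIN_VALUE, Integer.MAX_VALUE,
                       Integer.MIN_VALUE + 1, Integer.MAX_VALUE - 1};
        System.out.println("rewrites agree on edge values: " + allAgree(edges));
    }
}
```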
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17375#discussion_r1460298766 PR Review Comment: https://git.openjdk.org/jdk/pull/17375#discussion_r1460297868 From xliu at openjdk.org Sat Jan 20 08:50:28 2024 From: xliu at openjdk.org (Xin Liu) Date: Sat, 20 Jan 2024 08:50:28 GMT Subject: RFR: 8323220: Reassociate loop invariants involved in Cmps and Add/Subs [v3] In-Reply-To: References: Message-ID: On Sat, 20 Jan 2024 08:32:10 GMT, Xin Liu wrote: >> Joshua Cao has updated the pull request incrementally with one additional commit since the last revision: >> >> Assert for n2. Variables for n1/n2 opcode. More concise comments. >> Overflow/random tests > > src/hotspot/share/opto/loopTransform.cpp line 356: > >> 354: } >> 355: >> 356: bool is_int = n2->bottom_type()->isa_int() != nullptr; > > I guess you change to n2 because n1 may be CmpI/L. > I think it still works because TypeInt::CC is still isa_int(). or we add a comment to make it more clear? Sorry, I think you're right. we have to use 'n2' here. we need to distinct CmpI and CmpL. TypeInt::CC can't. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17375#discussion_r1460304622 From jbhateja at openjdk.org Sat Jan 20 09:55:45 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Sat, 20 Jan 2024 09:55:45 GMT Subject: RFR: 8322768: Optimize non-subword vector compress and expand APIs for AVX2 target. [v8] In-Reply-To: <_udOCEVG86x9V_WvYqFTaYnvmXdiZ7LzqxzR-D_ygYs=.db3ed37e-a8fa-413b-83c6-a785aba072ff@github.com> References: <_udOCEVG86x9V_WvYqFTaYnvmXdiZ7LzqxzR-D_ygYs=.db3ed37e-a8fa-413b-83c6-a785aba072ff@github.com> Message-ID: > Hi, > > Patch optimizes non-subword vector compress and expand APIs for x86 AVX2 only targets. > Upcoming E-core Xeons (Sierra Forest) and Hybrid CPUs only support AVX2 instruction set. > These are very frequently used APIs in columnar database filter operation. > > Implementation uses a lookup table to record permute indices. 
Table index is computed using > mask argument of compress/expand operation. > > Following are the performance number of JMH micro included with the patch. > > > System : Intel(R) Xeon(R) Platinum 8480+ (Sapphire Rapids) > > Baseline: > Benchmark (size) Mode Cnt Score Error Units > ColumnFilterBenchmark.filterDoubleColumn 1024 thrpt 2 142.767 ops/ms > ColumnFilterBenchmark.filterDoubleColumn 2047 thrpt 2 71.436 ops/ms > ColumnFilterBenchmark.filterDoubleColumn 4096 thrpt 2 35.992 ops/ms > ColumnFilterBenchmark.filterFloatColumn 1024 thrpt 2 182.151 ops/ms > ColumnFilterBenchmark.filterFloatColumn 2047 thrpt 2 91.096 ops/ms > ColumnFilterBenchmark.filterFloatColumn 4096 thrpt 2 44.757 ops/ms > ColumnFilterBenchmark.filterIntColumn 1024 thrpt 2 184.099 ops/ms > ColumnFilterBenchmark.filterIntColumn 2047 thrpt 2 91.981 ops/ms > ColumnFilterBenchmark.filterIntColumn 4096 thrpt 2 45.170 ops/ms > ColumnFilterBenchmark.filterLongColumn 1024 thrpt 2 148.017 ops/ms > ColumnFilterBenchmark.filterLongColumn 2047 thrpt 2 73.516 ops/ms > ColumnFilterBenchmark.filterLongColumn 4096 thrpt 2 36.844 ops/ms > > Withopt: > Benchmark (size) Mode Cnt Score Error Units > ColumnFilterBenchmark.filterDoubleColumn 1024 thrpt 2 2051.707 ops/ms > ColumnFilterBenchmark.filterDoubleColumn 2047 thrpt 2 914.072 ops/ms > ColumnFilterBenchmark.filterDoubleColumn 4096 thrpt 2 489.898 ops/ms > ColumnFilterBenchmark.filterFloatColumn 1024 thrpt 2 5324.195 ops/ms > ColumnFilterBenchmark.filterFloatColumn 2047 thrpt 2 2587.229 ops/ms > ColumnFilterBenchmark.filterFloatColumn 4096 thrpt 2 1278.665 ops/ms > ColumnFilterBenchmark.filterIntColumn 1024 thrpt 2 4149.384 ops/ms > ColumnFilterBenchmark.filterIntColumn 2047 thrpt 2 1791.170 ops/ms > ColumnFilterBenchmark.filterIntColumn 4096... 
Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Review comments resolution ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17261/files - new: https://git.openjdk.org/jdk/pull/17261/files/b2190fc7..cd912308 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17261&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17261&range=06-07 Stats: 89 lines in 4 files changed: 20 ins; 50 del; 19 mod Patch: https://git.openjdk.org/jdk/pull/17261.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17261/head:pull/17261 PR: https://git.openjdk.org/jdk/pull/17261 From rgiulietti at openjdk.org Sat Jan 20 10:54:49 2024 From: rgiulietti at openjdk.org (Raffaello Giulietti) Date: Sat, 20 Jan 2024 10:54:49 GMT Subject: RFR: 8282365: Consolidate and improve division by constant idealizations [v45] In-Reply-To: References: Message-ID: On Fri, 19 Jan 2024 20:06:05 GMT, Quan Anh Mai wrote: >> This patch implements idealisation for unsigned divisions to change a division by a constant into a series of multiplication and shift. I also change the idealisation of `DivI` to get a more efficient series when the magic constant overflows an int32. >> >> In general, the idea behind a signed division transformation is that for a positive constant `d`, we would need to find constants `c` and `m` so that: >> >> floor(x / d) = floor(x * c / 2**m) for 0 < x < 2**(N - 1) (1) >> ceil(x / d) = floor(x * c / 2**m) + 1 for -2**(N - 1) <= x < 0 (2) >> >> The implementation in the original book takes into consideration that the machine may not be able to perform the full multiplication `x * c`, so the constant overflow and we need to add back the dividend as in `DivLNode::Ideal` cases. However, for int32 division, `x * c` cannot overflow an int64. As a result, it is always feasible to just calculate the product and shift the result. 
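As a concrete instance of conditions (1) and (2) above, here is a minimal sketch for `d = 3` on int32, using the well-known constants `c = 0x55555556` (that is, ceil(2**32 / 3)) and `m = 32`. The function names are hypothetical and this is not the code in `divconstants.cpp`; it only illustrates that the full 64-bit product followed by a shift is enough for int32.

```cpp
#include <cassert>
#include <cstdint>

// Division of an int32 by 3 via one 64-bit multiply and a shift.
// Right-shifting a negative int64_t is an arithmetic shift (guaranteed since
// C++20, and the behaviour of mainstream compilers before), which yields
// floor(x * c / 2**32).
inline int32_t div3_by_multiply(int32_t x) {
  const int64_t c = 0x55555556LL;                     // ceil(2**32 / 3)
  const int64_t q = (static_cast<int64_t>(x) * c) >> 32;
  // Condition (2): for negative x, add 1 to turn the floor into the
  // truncated quotient that x / 3 produces.
  return static_cast<int32_t>(q + (x < 0 ? 1 : 0));
}

// Compare against the hardware division on a strided sample of int32 values.
inline bool div3_agrees_on_stride(int64_t step) {
  assert(step > 0);
  for (int64_t v = INT32_MIN; v <= INT32_MAX; v += step) {
    const int32_t x = static_cast<int32_t>(v);
    if (div3_by_multiply(x) != x / 3) return false;
  }
  return true;
}
```

The product `x * c` stays well inside the int64 range for every int32 `x`, which is the point made in the paragraph above; the long case has no wider type to fall back on.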
>> >> For unsigned multiplication, the situation is somewhat trickier because the condition needs to be twice as strong (the condition (1) and (2) above are mostly the same). This results in the magic constant `c` calculated based on the method presented in Hacker's Delight by Henry S. Warren, Jr. may overflow an uintN. For int division, we can depend on the theorem devised by Arch D. Robison in N-Bit Unsigned Division Via N-Bit Multiply-Add, which states that there exists either: >> >> c1 in uint32 and m1, such that floor(x / d) = floor(x * c1 / 2**m1) for 0 < x < 2**32 (3) >> c2 in uint32 and m2, such that floor(x / d) = floor((x + 1) * c2 / 2**m2) for 0 < x < 2**32 (4) >> >> which means that either `x * c1` never overflows an uint64 or `(x + 1) * c2` never overflows an uint64. And we can perform a full multiplication. >> >> For longs, there is no way to do a full multiplication so we do some basic transformations to achieve a computable formula. The details I have written as comments in the overflow case. >> >> More tests are added to cover the possible patterns. >> >> Please take a look and have some reviews. Thank you very much. > > Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: > > suggestion src/hotspot/share/opto/divconstants.cpp line 161: > 159: c_ovf = c > min_signed; > 160: c += c - 1; > 161: rc += rc - d; // rc = 2 * rc - d Well, `rc - d` usually underflows. This is benign on the supported CPUs, but I think the code proposed originally is clearer. It also makes clear that the subtraction in `-=` is safe, since `rc > d - rc`. With an addition `+=` one has to reason in reverse, so to say. Further, Common Subexpression Elimination can easily detect that the expression `d - rc` is identical to the one in the `if` condition. Not sure if CSE can detect that `rc - d` is `-(d - rc)` and emit a subtraction. 
Suggestion: rc -= d - rc; // rc = 2 * rc - d src/hotspot/share/opto/divconstants.cpp line 170: > 168: qv_ovf = qv >= min_signed; > 169: qv += qv + 1; > 170: rv += rv - v; // rv = 2 * rv - v Same as above, with the difference that `rv - v` _always_ underflows. Moreover, the quotient `qv` increases, so it's more natural to see a decrease in the remainder `rv`, rather than an increase, IMO. Suggestion: rv -= v - rv; // rv = 2 * rv - v ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/9947#discussion_r1460367093 PR Review Comment: https://git.openjdk.org/jdk/pull/9947#discussion_r1460367207 From qamai at openjdk.org Sat Jan 20 11:53:47 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Sat, 20 Jan 2024 11:53:47 GMT Subject: RFR: 8282365: Consolidate and improve division by constant idealizations [v45] In-Reply-To: References: Message-ID: <6O50c4HNx2TL6s1JKRiMQ9YXyJkiNcbkRc-E_da85yg=.6b5b5149-0263-43de-8d6a-5aff61f7f45a@github.com> On Sat, 20 Jan 2024 10:50:23 GMT, Raffaello Giulietti wrote: >> Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: >> >> suggestion > > src/hotspot/share/opto/divconstants.cpp line 161: > >> 159: c_ovf = c > min_signed; >> 160: c += c - 1; >> 161: rc += rc - d; // rc = 2 * rc - d > > Well, `rc - d` usually underflows. This is benign on the supported CPUs, but I think the code proposed originally is clearer. > It also makes clear that the subtraction in `-=` is safe, since `rc > d - rc`. With an addition `+=` one has to reason in reverse, so to say. > Further, Common Subexpression Elimination can easily detect that the expression `d - rc` is identical to the one in the `if` condition. Not sure if CSE can detect that `rc - d` is `-(d - rc)` and emit a subtraction. > Suggestion: > > rc -= d - rc; // rc = 2 * rc - d I think it makes it more uniform with the other case and since we are writing it as `rc = 2 * rc - d`, expanding it to `rc + rc - d` seems more natural. 
Note that the type is unsigned so overflow behaviour is not undefined. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/9947#discussion_r1460395700 From qamai at openjdk.org Sat Jan 20 12:17:04 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Sat, 20 Jan 2024 12:17:04 GMT Subject: RFR: 8282365: Consolidate and improve division by constant idealizations [v46] In-Reply-To: References: Message-ID: > This patch implements idealisation for unsigned divisions to change a division by a constant into a series of multiplication and shift. I also change the idealisation of `DivI` to get a more efficient series when the magic constant overflows an int32. > > In general, the idea behind a signed division transformation is that for a positive constant `d`, we would need to find constants `c` and `m` so that: > > floor(x / d) = floor(x * c / 2**m) for 0 < x < 2**(N - 1) (1) > ceil(x / d) = floor(x * c / 2**m) + 1 for -2**(N - 1) <= x < 0 (2) > > The implementation in the original book takes into consideration that the machine may not be able to perform the full multiplication `x * c`, so the constant overflow and we need to add back the dividend as in `DivLNode::Ideal` cases. However, for int32 division, `x * c` cannot overflow an int64. As a result, it is always feasible to just calculate the product and shift the result. > > For unsigned multiplication, the situation is somewhat trickier because the condition needs to be twice as strong (the condition (1) and (2) above are mostly the same). This results in the magic constant `c` calculated based on the method presented in Hacker's Delight by Henry S. Warren, Jr. may overflow an uintN. For int division, we can depend on the theorem devised by Arch D. 
Robison in N-Bit Unsigned Division Via N-Bit Multiply-Add, which states that there exists either: > > c1 in uint32 and m1, such that floor(x / d) = floor(x * c1 / 2**m1) for 0 < x < 2**32 (3) > c2 in uint32 and m2, such that floor(x / d) = floor((x + 1) * c2 / 2**m2) for 0 < x < 2**32 (4) > > which means that either `x * c1` never overflows an uint64 or `(x + 1) * c2` never overflows an uint64. And we can perform a full multiplication. > > For longs, there is no way to do a full multiplication so we do some basic transformations to achieve a computable formula. The details I have written as comments in the overflow case. > > More tests are added to cover the possible patterns. > > Please take a look and have some reviews. Thank you very much. Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: just be simple ------------- Changes: - all: https://git.openjdk.org/jdk/pull/9947/files - new: https://git.openjdk.org/jdk/pull/9947/files/6634cd46..1400de7b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=9947&range=45 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=9947&range=44-45 Stats: 8 lines in 1 file changed: 0 ins; 0 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/9947.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/9947/head:pull/9947 PR: https://git.openjdk.org/jdk/pull/9947 From rgiulietti at openjdk.org Sat Jan 20 12:17:04 2024 From: rgiulietti at openjdk.org (Raffaello Giulietti) Date: Sat, 20 Jan 2024 12:17:04 GMT Subject: RFR: 8282365: Consolidate and improve division by constant idealizations [v45] In-Reply-To: <6O50c4HNx2TL6s1JKRiMQ9YXyJkiNcbkRc-E_da85yg=.6b5b5149-0263-43de-8d6a-5aff61f7f45a@github.com> References: <6O50c4HNx2TL6s1JKRiMQ9YXyJkiNcbkRc-E_da85yg=.6b5b5149-0263-43de-8d6a-5aff61f7f45a@github.com> Message-ID: On Sat, 20 Jan 2024 11:50:37 GMT, Quan Anh Mai wrote: >> src/hotspot/share/opto/divconstants.cpp line 161: >> >>> 159: c_ovf = c > min_signed; >>> 160: c += c - 
1; >>> 161: rc += rc - d; // rc = 2 * rc - d >> >> Well, `rc - d` usually underflows. This is benign on the supported CPUs, but I think the code proposed originally is clearer. >> It also makes clear that the subtraction in `-=` is safe, since `rc > d - rc`. With an addition `+=` one has to reason in reverse, so to say. >> Further, Common Subexpression Elimination can easily detect that the expression `d - rc` is identical to the one in the `if` condition. Not sure if CSE can detect that `rc - d` is `-(d - rc)` and emit a subtraction. >> Suggestion: >> >> rc -= d - rc; // rc = 2 * rc - d > > I think it makes it more uniform with the other case and since we are writing it as `rc = 2 * rc - d`, expanding it to `rc + rc - d` seems more natural. Note that the type is unsigned so overflow behaviour is not undefined. The purpose of the `//` comment is to make it even more clear that `rc -= d - rc` computes `rc = 2 * rc - d`. But in fact, the expansion `rc = rc - (d - rc)` is already clear enough, IMO. Sure, the behavior with the underflow is specified: that's what I mean by "benign" in my previous comment. Yet, one must reason longer to be convinced that the underflow is harmless. Of course, this is a subjective matter about clarity of the code. Everybody has their own opinion about ;-) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/9947#discussion_r1460405625 From qamai at openjdk.org Sat Jan 20 12:17:04 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Sat, 20 Jan 2024 12:17:04 GMT Subject: RFR: 8282365: Consolidate and improve division by constant idealizations [v45] In-Reply-To: References: <6O50c4HNx2TL6s1JKRiMQ9YXyJkiNcbkRc-E_da85yg=.6b5b5149-0263-43de-8d6a-5aff61f7f45a@github.com> Message-ID: On Sat, 20 Jan 2024 12:12:47 GMT, Raffaello Giulietti wrote: >> I think it makes it more uniform with the other case and since we are writing it as `rc = 2 * rc - d`, expanding it to `rc + rc - d` seems more natural. 
Note that the type is unsigned so overflow behaviour is not undefined. > > The purpose of the `//` comment is to make it even more clear that `rc -= d - rc` computes `rc = 2 * rc - d`. But in fact, the expansion `rc = rc - (d - rc)` is already clear enough, IMO. > > Sure, the behavior with the underflow is specified: that's what I mean by "benign" in my previous comment. Yet, one must reason longer to be convinced that the underflow is harmless. > > Of course, this is a subjective matter about clarity of the code. Everybody has their own opinion about ;-) Yes I agree, so I just made it simple and straightforward. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/9947#discussion_r1460406215 From qamai at openjdk.org Sat Jan 20 19:29:41 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Sat, 20 Jan 2024 19:29:41 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long Message-ID: Hi, This patch adds unsigned bounds and known bits constraints to TypeInt and TypeLong. This opens more transformation opportunities in an elegant manner as well as helps avoid some ad-hoc rules in Hotspot. In general, a TypeInt/Long represents a set of values x that satisfies: x s>= lo && x s<= hi && x u>= ulo && x u<= uhi && (x & zeros) == 0 && (~x & ones) == 0. These constraints are not independent, e.g. an int that lies in [0, 3] in signed domain must also lie in [0, 3] in unsigned domain and have all bits but the last 2 being unset. As a result, we must normalize the constraints (tighten the constraints so that they are optimal) before constructing a TypeInt/Long instance. This is extracted from #15440 , node value transformations are left for later PRs. I have also added unit tests to verify the soundness of constraint normalization. Please kindly review, thanks a lot. 
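The membership condition stated above, and the `[0, 3]` example of why normalization is needed, can be written out as a small predicate. This is a sketch under stated assumptions: the struct and field names are hypothetical and do not reflect the actual `TypeInt` layout in the PR.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical record of the six constraints described above.
struct IntConstraints {
  int32_t  lo, hi;     // signed bounds
  uint32_t ulo, uhi;   // unsigned bounds
  uint32_t zeros;      // bits known to be 0
  uint32_t ones;       // bits known to be 1
};

// x s>= lo && x s<= hi && x u>= ulo && x u<= uhi
//   && (x & zeros) == 0 && (~x & ones) == 0
inline bool contains(const IntConstraints& t, int32_t x) {
  const uint32_t u = static_cast<uint32_t>(x);
  return x >= t.lo && x <= t.hi &&
         u >= t.ulo && u <= t.uhi &&
         (u & t.zeros) == 0 &&
         (~u & t.ones) == 0;
}

// Signed [0, 3] with nothing else known (unnormalized)...
inline IntConstraints loose_0_to_3() {
  return IntConstraints{0, 3, 0u, UINT32_MAX, 0u, 0u};
}

// ...and the same set after tightening: unsigned [0, 3] and every bit
// except the low two known to be zero.
inline IntConstraints tight_0_to_3() {
  return IntConstraints{0, 3, 0u, 3u, ~3u, 0u};
}

// Both descriptions should accept exactly the same values.
inline bool same_members(const IntConstraints& a, const IntConstraints& b,
                         int32_t from, int32_t to) {
  for (int64_t v = from; v <= to; v++) {
    const int32_t x = static_cast<int32_t>(v);
    if (contains(a, x) != contains(b, x)) return false;
  }
  return true;
}
```

Both records describe the set {0, 1, 2, 3}; the normalized one simply makes the unsigned bounds and known bits explicit so that later transformations can read them directly.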
Testing - [ ] GHA - [ ] Linux x64, tier 1-4 ------------- Commit messages: - add unit tests - fix template parameter - refactor - implement unsigned bounds and known bits Changes: https://git.openjdk.org/jdk/pull/17508/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=17508&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8315066 Stats: 1442 lines in 16 files changed: 887 ins; 282 del; 273 mod Patch: https://git.openjdk.org/jdk/pull/17508.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17508/head:pull/17508 PR: https://git.openjdk.org/jdk/pull/17508 From qamai at openjdk.org Sat Jan 20 19:40:45 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Sat, 20 Jan 2024 19:40:45 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v2] In-Reply-To: References: Message-ID: > Hi, > > This patch adds unsigned bounds and known bits constraints to TypeInt and TypeLong. This opens more transformation opportunities in an elegant manner as well as helps avoid some ad-hoc rules in Hotspot. > > In general, a TypeInt/Long represents a set of values x that satisfies: x s>= lo && x s<= hi && x u>= ulo && x u<= uhi && (x & zeros) == 0 && (~x & ones) == 0. These constraints are not independent, e.g. an int that lies in [0, 3] in signed domain must also lie in [0, 3] in unsigned domain and have all bits but the last 2 being unset. As a result, we must normalize the constraints (tighten the constraints so that they are optimal) before constructing a TypeInt/Long instance. > > This is extracted from #15440 , node value transformations are left for later PRs. I have also added unit tests to verify the soundness of constraint normalization. > > Please kindly review, thanks a lot. 
> > Testing > > - [ ] GHA > - [ ] Linux x64, tier 1-4 Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: fix tests, add verify ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17508/files - new: https://git.openjdk.org/jdk/pull/17508/files/12f268a1..6b417f94 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17508&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17508&range=00-01 Stats: 14 lines in 2 files changed: 5 ins; 0 del; 9 mod Patch: https://git.openjdk.org/jdk/pull/17508.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17508/head:pull/17508 PR: https://git.openjdk.org/jdk/pull/17508 From qamai at openjdk.org Sat Jan 20 19:47:25 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Sat, 20 Jan 2024 19:47:25 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v2] In-Reply-To: References: Message-ID: On Sat, 20 Jan 2024 19:40:45 GMT, Quan Anh Mai wrote: >> Hi, >> >> This patch adds unsigned bounds and known bits constraints to TypeInt and TypeLong. This opens more transformation opportunities in an elegant manner as well as helps avoid some ad-hoc rules in Hotspot. >> >> In general, a TypeInt/Long represents a set of values x that satisfies: x s>= lo && x s<= hi && x u>= ulo && x u<= uhi && (x & zeros) == 0 && (~x & ones) == 0. These constraints are not independent, e.g. an int that lies in [0, 3] in signed domain must also lie in [0, 3] in unsigned domain and have all bits but the last 2 being unset. As a result, we must normalize the constraints (tighten the constraints so that they are optimal) before constructing a TypeInt/Long instance. >> >> This is extracted from #15440 , node value transformations are left for later PRs. I have also added unit tests to verify the soundness of constraint normalization. >> >> Please kindly review, thanks a lot. 
>> >> Testing >> >> - [ ] GHA >> - [ ] Linux x64, tier 1-4 > > Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: > > fix tests, add verify @TobiHartmann @eme64 I have extracted a part of #15440, could you take a look when you have time, please? Thanks a lot for your help. ------------- PR Comment: https://git.openjdk.org/jdk/pull/17508#issuecomment-1902249626 From jbhateja at openjdk.org Sun Jan 21 06:55:43 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Sun, 21 Jan 2024 06:55:43 GMT Subject: RFR: 8318650: Optimized subword gather for x86 targets. [v11] In-Reply-To: References: Message-ID: > Hi All, > > This patch optimizes sub-word gather operation for x86 targets with AVX2 and AVX512 features. > > Following is the summary of changes:- > > 1) Intrinsify sub-word gather using hybrid algorithm which initially partially unrolls scalar loop to accumulates values from gather indices into a quadword(64bit) slice followed by vector permutation to place the slice into appropriate vector lanes, it prevents code bloating and generates compact JIT sequence. This coupled with savings from expansive array allocation in existing java implementation translates into significant performance of 1.5-10x gains with included micro on Intel Atom family CPUs and with JVM option UseAVX=2. > > ![image](https://github.com/openjdk/jdk/assets/59989778/e25ba4ad-6a61-42fa-9566-452f741a9c6d) > > > 2) For AVX512 targets algorithm uses integral gather instructions to load values from normalized indices which are multiple of integer size, followed by shuffling and packing exact sub-word values from integral lanes. > > 3) Patch was also compared against modified java fallback implementation by replacing temporary array allocation with zero initialized vector and a scalar loops which inserts gathered values into vector. 
But, vector insert operation in higher vector lanes is a three step process which first extracts the upper vector 128 bit lane, updates it with gather subword value and then inserts the lane back to its original position. This makes inserts into higher order lanes costly w.r.t to proposed solution. In addition generated JIT code for modified fallback implementation was very bulky. This may impact in-lining decisions into caller contexts. > > Kindly review and share your feedback. > > Best Regards, > Jatin Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Review comments resolutions. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16354/files - new: https://git.openjdk.org/jdk/pull/16354/files/de47076e..9ed6b502 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16354&range=10 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16354&range=09-10 Stats: 58 lines in 1 file changed: 18 ins; 15 del; 25 mod Patch: https://git.openjdk.org/jdk/pull/16354.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16354/head:pull/16354 PR: https://git.openjdk.org/jdk/pull/16354 From yyang at openjdk.org Mon Jan 22 02:50:28 2024 From: yyang at openjdk.org (Yi Yang) Date: Mon, 22 Jan 2024 02:50:28 GMT Subject: RFR: 8318446: C2: optimize stores into primitive arrays by combining values into larger store In-Reply-To: References: Message-ID: On Wed, 18 Oct 2023 13:04:35 GMT, Emanuel Peter wrote: > This is a feature requiested by @RogerRiggs and @cl4es . > > **Idea** > > Merging multiple consecutive small stores (e.g. 8 byte stores) into larger stores (e.g. one long store) can lead to speedup. Recently, @cl4es and @RogerRiggs had to review a few PR's where people would try to get speedups by using Unsafe (e.g. `Unsafe.putLongUnaligned`), or ByteArrayLittleEndian (e.g. `ByteArrayLittleEndian.setLong`). 
They have asked if we can do such an optimization in C2, rather than in the Java library code, or even user code. > > This patch here supports a few simple use-cases, like these: > > Merge consecutive array stores, with constants. We can combine the separate constants into a larger constant: > https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/test/hotspot/jtreg/compiler/c2/TestMergeStores.java#L383-L395 > > Merge consecutive array stores, with a variable that was split (using shifts). We can essentially undo the splitting (i.e. shifting and truncation), and directly store the variable: > https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/test/hotspot/jtreg/compiler/c2/TestMergeStores.java#L444-L456 > > The idea is that this would allow the introduction of a very simple API, without any "heavy" dependencies (Unsafe or ByteArrayLittleEndian): > > https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/test/hotspot/jtreg/compiler/c2/TestMergeStores.java#L327-L338 > https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/test/hotspot/jtreg/compiler/c2/TestMergeStores.java#L467-L472 > > **Details** > > This draft currently implements the optimization in an additional special IGVN phase: > https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/src/hotspot/share/opto/compile.cpp#L2479-L2485 > > We first collect all `StoreB|C|I`, and put them in the IGVN worklist (see `Compile::gather_nodes_for_merge_stores`). During IGVN, we call `StoreNode::Ideal_merge_stores` at the end (after all other optimizations) of `StoreNode::Ideal`. We essentially try to establish a chain of mergable stores: > > https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/src/hotspot/share/opto/memnode.cpp#L2802-L2806 > > Mergable stores must have the same Opcode (implies they have the same element type and hence size). 
Further, mergable stores must have the same control (or be separated by only a RangeCheck). Further, they must either both store constants, or adjacent segments of a larger value ... Good idea, how many times can we touch such idealization during building jdk itself? This could be an indicator to some extent of whether the pattern is common. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16245#issuecomment-1902992549 From jbhateja at openjdk.org Mon Jan 22 07:11:30 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Mon, 22 Jan 2024 07:11:30 GMT Subject: RFR: JDK-8320448 Accelerate IndexOf using AVX2 [v7] In-Reply-To: References: Message-ID: On Thu, 11 Jan 2024 23:06:32 GMT, Scott Gibbons wrote: >> Re-write the IndexOf code without the use of the pcmpestri instruction, only using AVX2 instructions. This change accelerates String.IndexOf on average 1.3x for AVX2. The benchmark numbers: >> >> >> Benchmark Score Latest >> StringIndexOf.advancedWithMediumSub 343.573 317.934 0.925375393x >> StringIndexOf.advancedWithShortSub1 1039.081 1053.96 1.014319384x >> StringIndexOf.advancedWithShortSub2 55.828 110.541 1.980027943x >> StringIndexOf.constantPattern 9.361 11.906 1.271872663x >> StringIndexOf.searchCharLongSuccess 4.216 4.218 1.000474383x >> StringIndexOf.searchCharMediumSuccess 3.133 3.216 1.02649218x >> StringIndexOf.searchCharShortSuccess 3.76 3.761 1.000265957x >> StringIndexOf.success 9.186 9.713 1.057369911x >> StringIndexOf.successBig 14.341 46.343 3.231504079x >> StringIndexOfChar.latin1_AVX2_String 6220.918 12154.52 1.953814533x >> StringIndexOfChar.latin1_AVX2_char 5503.556 5540.044 1.006629895x >> StringIndexOfChar.latin1_SSE4_String 6978.854 6818.689 0.977049957x >> StringIndexOfChar.latin1_SSE4_char 5657.499 5474.624 0.967675646x >> StringIndexOfChar.latin1_Short_String 7132.541 6863.359 0.962260014x >> StringIndexOfChar.latin1_Short_char 16013.389 16162.437 1.009307711x >> StringIndexOfChar.latin1_mixed_String 7386.123 14771.622 
1.999915517x >> StringIndexOfChar.latin1_mixed_char 9901.671 9782.245 0.987938803 > > Scott Gibbons has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 22 commits: > > - Merge branch 'openjdk:master' into indexof > - Merge branch 'openjdk:master' into indexof > - Addressing review comments. > - Fix for JDK-8321599 > - Support UU IndexOf > - Only use optimization when EnableX86ECoreOpts is true > - Fix whitespace > - Merge branch 'openjdk:master' into indexof > - Comments; added exhaustive-ish test > - Subtracting 0x10 twice. > - ... and 12 more: https://git.openjdk.org/jdk/compare/8e12053e...3e58d0c2 src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 505: > 503: __ cmpb(Address(rbx, r15, Address::times_1, -0xa), rax); > 504: __ jne(L_top_loop_1); > 505: __ jmp(L_0x406019); Instead of having special handling for each tail size (3 - 31 bytes), can we directly use 32 bytes VMASKMOVPS with appropriate mask for different tail sizes and only residual part (0 - 3 bytes) can fall over to scalar tail. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1461424231 From jbhateja at openjdk.org Mon Jan 22 07:11:31 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Mon, 22 Jan 2024 07:11:31 GMT Subject: RFR: JDK-8320448 Accelerate IndexOf using AVX2 [v7] In-Reply-To: References: Message-ID: On Mon, 22 Jan 2024 07:05:56 GMT, Jatin Bhateja wrote: >> Scott Gibbons has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 22 commits: >> >> - Merge branch 'openjdk:master' into indexof >> - Merge branch 'openjdk:master' into indexof >> - Addressing review comments. >> - Fix for JDK-8321599 >> - Support UU IndexOf >> - Only use optimization when EnableX86ECoreOpts is true >> - Fix whitespace >> - Merge branch 'openjdk:master' into indexof >> - Comments; added exhaustive-ish test >> - Subtracting 0x10 twice. >> - ... 
and 12 more: https://git.openjdk.org/jdk/compare/8e12053e...3e58d0c2 > > src/hotspot/cpu/x86/stubGenerator_x86_64_string.cpp line 505: > >> 503: __ cmpb(Address(rbx, r15, Address::times_1, -0xa), rax); >> 504: __ jne(L_top_loop_1); >> 505: __ jmp(L_0x406019); > > Instead of having special handling for each tail size (3 - 31 bytes), can we directly use 32 bytes VMASKMOVPS with appropriate mask for different tail sizes and only residual part (0 - 3 bytes) can fall over to scalar tail. Basically tail size can be rounded to nearest multiple of doubleword. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16753#discussion_r1461425962 From wzhuo at openjdk.org Mon Jan 22 07:59:38 2024 From: wzhuo at openjdk.org (Wang Zhuo) Date: Mon, 22 Jan 2024 07:59:38 GMT Subject: RFR: 8324123: aarch64: fix prfm literal encoding in assembler [v2] In-Reply-To: References: Message-ID: > The current prfm literal mode encoding in the aarch64 assembler is not correct. > The prfm_literal instruction requires bits 31 and 30 to be 0b11, while the current assembler encodes these two bits as 0b01, which is a ldr instruction, not prfm. > For example, if adding the following code in stubGenerator > __ prfm(Address(__ pc())) > we get a ldr instruction like > ldr x0, 0x0000ffff83f8539c > but it should be a prfm instruction like > prfm pldl1keep, 0x0000ffff8ff8539c > > The bug is in ld_st2: in literal mode, bits 31 and 30 are set to (size & 0b01), while for prfm instructions bits 31 and 30 must be 0b11. > void ld_st2(Register Rt, const Address &adr, int size, int op, int V = 0) { > starti; > > f(V, 26); // general reg? > zrf(Rt, 0); > > // Encoding for literal loads is done here (rather than pushed > // down into Address::encode) because the encoding of this > // instruction is too different from all of the other forms to > // make it worth sharing. 
> if (adr.getMode() == Address::literal) { > assert(size == 0b10 || size == 0b11, "bad operand size in ldr"); > assert(op == 0b01, "literal form can only be used with loads"); > f(**size & 0b01, 31, 30**), f(0b011, 29, 27), f(0b00, 25, 24); > int64_t offset = (adr.target() - pc()) >> 2; > sf(offset, 23, 5); > code_section()->relocate(pc(), adr.rspec()); > return; > } > > f(size, 31, 30); > f(op, 23, 22); // str > adr.encode(&current_insn); > } Wang Zhuo has updated the pull request incrementally with one additional commit since the last revision: do not use inline for prfm encoding function ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17482/files - new: https://git.openjdk.org/jdk/pull/17482/files/bd83a8d3..eda04747 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17482&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17482&range=00-01 Stats: 33 lines in 2 files changed: 14 ins; 19 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/17482.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17482/head:pull/17482 PR: https://git.openjdk.org/jdk/pull/17482 From wzhuo at openjdk.org Mon Jan 22 08:11:28 2024 From: wzhuo at openjdk.org (Wang Zhuo) Date: Mon, 22 Jan 2024 08:11:28 GMT Subject: RFR: 8324123: aarch64: fix prfm literal encoding in assembler [v2] In-Reply-To: <4ax4z-xLJNI-LAyGEKct-lDDNMi_ZN2QV3ILYTjVeDM=.d61f5d47-10fd-4a94-aed5-e88bfe8553d3@github.com> References: <4ax4z-xLJNI-LAyGEKct-lDDNMi_ZN2QV3ILYTjVeDM=.d61f5d47-10fd-4a94-aed5-e88bfe8553d3@github.com> Message-ID: On Fri, 19 Jan 2024 08:55:42 GMT, Andrew Haley wrote: >> Wang Zhuo has updated the pull request incrementally with one additional commit since the last revision: >> >> do not use inline for prfm encoding function > > src/hotspot/cpu/aarch64/assembler_aarch64.hpp line 1585: > >> 1583: int64_t offset = (adr.target() - pc()) >> 2; \ >> 1584: sf(offset, 23, 5); \ >> 1585: } else { \ > > This looks reasonable, but we don't need it to be inline.
See the examples of `adr` and `_adrp`. Thank you APH. I have updated the patch to make prfm no inline. Please check the commit do not use inline for prfm encoding function ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17482#discussion_r1461473867 From thartmann at openjdk.org Mon Jan 22 08:25:37 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 22 Jan 2024 08:25:37 GMT Subject: RFR: JDK-8322572: AllocationMergesTests.java fails with "IRViolationException: There were one or multiple IR rule failures." [v3] In-Reply-To: References: Message-ID: On Thu, 18 Jan 2024 18:57:27 GMT, Cesar Soares Lucas wrote: >> Please review this PR to fix a test in `AllocationMergesTest.java`. The test was failing intermittently because of a random value used as parameter was causing some randomization in the shape method's IR graph and consequently the optimization tested wasn't happening. >> >> Tested this locally on Mac, Win and Linux x86_64. > > Cesar Soares Lucas has updated the pull request incrementally with one additional commit since the last revision: > > Fix copyright header date. Looks good to me too. ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/17469#pullrequestreview-1835864434 From cslucas at openjdk.org Mon Jan 22 08:25:39 2024 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Mon, 22 Jan 2024 08:25:39 GMT Subject: Integrated: JDK-8322572: AllocationMergesTests.java fails with "IRViolationException: There were one or multiple IR rule failures." In-Reply-To: References: Message-ID: <2jT2hi2pTMdyeUAl_QDvhefRryDfY8hx63qYzjUir_0=.096cc9b8-7006-4f0b-b82a-7850def1b824@github.com> On Wed, 17 Jan 2024 21:41:49 GMT, Cesar Soares Lucas wrote: > Please review this PR to fix a test in `AllocationMergesTest.java`. 
The test was failing intermittently because of a random value used as parameter was causing some randomization in the shape method's IR graph and consequently the optimization tested wasn't happening. > > Tested this locally on Mac, Win and Linux x86_64. This pull request has now been integrated. Changeset: 76afa02d Author: Cesar Soares Lucas Committer: Tobias Hartmann URL: https://git.openjdk.org/jdk/commit/76afa02dabb45a0648cc13de40657d15ded73b4a Stats: 10 lines in 1 file changed: 0 ins; 1 del; 9 mod 8322572: AllocationMergesTests.java fails with "IRViolationException: There were one or multiple IR rule failures." Reviewed-by: kvn, thartmann ------------- PR: https://git.openjdk.org/jdk/pull/17469 From thartmann at openjdk.org Mon Jan 22 08:28:32 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 22 Jan 2024 08:28:32 GMT Subject: RFR: 8320237 C2: late inlining of method handle invoke causes duplicate lines in PrintInlining output [v2] In-Reply-To: References: Message-ID: <2aK5H00lqFJ_ug-O31eL_VeKh25-y6rxaQNMx0N2ztw=.484fa930-09c0-4b57-ada0-a17a0796cb8d@github.com> On Wed, 10 Jan 2024 16:37:44 GMT, Kangcheng Xu wrote: >> This PR resolves [JDK-8320237](https://bugs.openjdk.org/browse/JDK-8320237) >> >> The original behavior produces both a failure and success message upon late inlining which is confusing. The patch removes the failure message if inlining was successful. Huge thanks to @rwestrel for reporting and working out a solution. >> >> Unit test `test/hotspot/jtreg/compiler/inlining/TestDuplicatedLateInliningOutput.java` is added and passing. > > Kangcheng Xu has updated the pull request incrementally with one additional commit since the last revision: > > fix VM crashes Yes, `-XX:+UseZGC -XX:+ZGenerational` were passed via jtreg. The failure seems to be intermittent as it happened only once with a debug build on Windows x64. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/17147#issuecomment-1903478559 From thartmann at openjdk.org Mon Jan 22 08:35:27 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 22 Jan 2024 08:35:27 GMT Subject: RFR: 8323795: jcmd Compiler.codecache should print total size of code cache [v2] In-Reply-To: References: Message-ID: On Wed, 17 Jan 2024 03:02:27 GMT, Yi Yang wrote: >> CodeHeap 'non-profiled nmethods': size=118592Kb used=29Kb max_used=29Kb free=118562Kb >> bounds [0x00007fbe84622000, 0x00007fbe84892000, 0x00007fbe8b9f2000] >> CodeHeap 'profiled nmethods': size=118588Kb used=80Kb max_used=80Kb free=118507Kb >> bounds [0x00007fbe7c9f2000, 0x00007fbe7cc62000, 0x00007fbe83dc1000] >> CodeHeap 'non-nmethods': size=8580Kb used=1258Kb max_used=1834Kb free=7321Kb >> bounds [0x00007fbe83dc1000, 0x00007fbe84031000, 0x00007fbe84622000] >> total_blobs=474 nmethods=87 adapters=293 >> compilation: enabled >> stopped_count=0, restarted_count=0 >> full_count=0 >> >> >> It's better to accumulates total size of used/free/size, for example >> >> -SegmentedCodeCache >> CodeCache: size=245760Kb used=1366Kb max_used=1943Kb free=244393Kb >> bounds [0x00007fdcc89f2000, 0x00007fdcc8c62000, 0x00007fdcd79f2000] >> total_blobs=474, nmethods=87, adapters=293 >> stopped_count=0, restarted_count=0, full_count=0 >> compilation=enabled >> >> >> >> +SegmentedCodeCache >> CodeHeap 'non-profiled nmethods': size=118592Kb used=29Kb max_used=29Kb free=118562Kb >> bounds [0x00007f89c8622000, 0x00007f89c8892000, 0x00007f89cf9f2000] >> CodeHeap 'profiled nmethods': size=118588Kb used=80Kb max_used=80Kb free=118507Kb >> bounds [0x00007f89c09f2000, 0x00007f89c0c62000, 0x00007f89c7dc1000] >> CodeHeap 'non-nmethods': size=8580Kb used=1258Kb max_used=1834Kb free=7321Kb >> bounds [0x00007f89c7dc1000, 0x00007f89c8031000, 0x00007f89c8622000] >> CodeCache: size=245760Kb, used=1367Kb, max_used=1943Kb, free=244390Kb >> total_blobs=474, nmethods=87, adapters=293 >> stopped_count=0, 
restarted_count=0, full_count=0 >> compilation=enabled > > Yi Yang has updated the pull request incrementally with one additional commit since the last revision: > > new output && fix test I agree with Vladimir, I'd prefer something like this: CodeHeap 'non-nmethods': size=8580Kb used=1258Kb max_used=1834Kb free=7321Kb bounds [0x00007f89c7dc1000, 0x00007f89c8031000, 0x00007f89c8622000] CodeCache: size=245760Kb, used=1367Kb, max_used=1943Kb, free=244390Kb total_blobs=474, nmethods=87, adapters=293, full_count=0 Compilation: enabled, stopped_count=0, restarted_count=0 ------------- PR Comment: https://git.openjdk.org/jdk/pull/17445#issuecomment-1903488665 From epeter at openjdk.org Mon Jan 22 09:16:28 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 22 Jan 2024 09:16:28 GMT Subject: RFR: 8323220: Reassociate loop invariants involved in Cmps and Add/Subs [v3] In-Reply-To: References: Message-ID: On Sat, 20 Jan 2024 00:36:41 GMT, Joshua Cao wrote: >> // inv1 == (x + inv2) => ( inv1 - inv2 ) == x >> // inv1 == (x - inv2) => ( inv1 + inv2 ) == x >> // inv1 == (inv2 - x) => (-inv1 + inv2 ) == x >> >> >> For example, >> >> >> fn(inv1, inv2) >> while(...) >> x = foobar() >> if inv1 == x + inv2 >> blackhole() >> >> >> We can transform this into >> >> >> fn(inv1, inv2) >> t = inv1 - inv2 >> while(...) >> x = foobar() >> if t == x >> blackhole() >> >> >> Here is an example: https://github.com/openjdk/jdk/blob/b78896b9aafcb15f453eaed6e154a5461581407b/src/java.base/share/classes/java/lang/invoke/LambdaFormEditor.java#L910. LHS `1` and RHS `pos` are both loop invariant >> >> Passes tier1 locally on Linux machine. Passes GHA on my fork. > > Joshua Cao has updated the pull request incrementally with one additional commit since the last revision: > > Assert for n2. Variables for n1/n2 opcode. More concise comments. > Overflow/random tests Thanks for adding some random arguments! One more nice thing might be result verification, i.e. 
verify that the generated code actually returns the correct results, for example the expected `i`. test/hotspot/jtreg/compiler/loopopts/InvariantCodeMotionReassociateCmp.java line 163: > 161: } > 162: } > 163: } Why could you not have random arguments here? ------------- Changes requested by epeter (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/17375#pullrequestreview-1835929842 PR Review Comment: https://git.openjdk.org/jdk/pull/17375#discussion_r1461544295 From epeter at openjdk.org Mon Jan 22 09:16:30 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 22 Jan 2024 09:16:30 GMT Subject: RFR: 8323220: Reassociate loop invariants involved in Cmps and Add/Subs [v3] In-Reply-To: References: Message-ID: <_xdd-exrcqA3Pbx_fT8k1K5JtVZVGQfr3JTwi7sWaHA=.d2737e88-6ce1-4863-9f36-2ccb9d3e9696@github.com> On Tue, 16 Jan 2024 17:30:34 GMT, Emanuel Peter wrote: >> Joshua Cao has updated the pull request incrementally with one additional commit since the last revision: >> >> Assert for n2. Variables for n1/n2 opcode. More concise comments. 
>> Overflow/random tests > > src/hotspot/share/opto/loopTransform.cpp line 333: > >> 331: // inv1 == (inv2 - x) => (-inv1 + inv2 ) == x >> 332: // >> 333: Node* IdealLoopTree::reassociate_add_sub_cmp(Node* n1, int inv1_idx, int inv2_idx, PhaseIdealLoop *phase) { > > Suggestion: > > Node* IdealLoopTree::reassociate_add_sub_cmp(Node* n1, int inv1_idx, int inv2_idx, PhaseIdealLoop* phase) { You should still do this change ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17375#discussion_r1461528935 From yyang at openjdk.org Mon Jan 22 09:19:39 2024 From: yyang at openjdk.org (Yi Yang) Date: Mon, 22 Jan 2024 09:19:39 GMT Subject: RFR: 8323795: jcmd Compiler.codecache should print total size of code cache [v3] In-Reply-To: References: Message-ID: > CodeHeap 'non-profiled nmethods': size=118592Kb used=29Kb max_used=29Kb free=118562Kb > bounds [0x00007fbe84622000, 0x00007fbe84892000, 0x00007fbe8b9f2000] > CodeHeap 'profiled nmethods': size=118588Kb used=80Kb max_used=80Kb free=118507Kb > bounds [0x00007fbe7c9f2000, 0x00007fbe7cc62000, 0x00007fbe83dc1000] > CodeHeap 'non-nmethods': size=8580Kb used=1258Kb max_used=1834Kb free=7321Kb > bounds [0x00007fbe83dc1000, 0x00007fbe84031000, 0x00007fbe84622000] > total_blobs=474 nmethods=87 adapters=293 > compilation: enabled > stopped_count=0, restarted_count=0 > full_count=0 > > > It's better to accumulates total size of used/free/size, for example > > -SegmentedCodeCache > CodeCache: size=245760Kb used=1366Kb max_used=1943Kb free=244393Kb > bounds [0x00007fdcc89f2000, 0x00007fdcc8c62000, 0x00007fdcd79f2000] > total_blobs=474, nmethods=87, adapters=293 > stopped_count=0, restarted_count=0, full_count=0 > compilation=enabled > > > > +SegmentedCodeCache > CodeHeap 'non-profiled nmethods': size=118592Kb used=29Kb max_used=29Kb free=118562Kb > bounds [0x00007f89c8622000, 0x00007f89c8892000, 0x00007f89cf9f2000] > CodeHeap 'profiled nmethods': size=118588Kb used=80Kb max_used=80Kb free=118507Kb > bounds 
[0x00007f89c09f2000, 0x00007f89c0c62000, 0x00007f89c7dc1000] > CodeHeap 'non-nmethods': size=8580Kb used=1258Kb max_used=1834Kb free=7321Kb > bounds [0x00007f89c7dc1000, 0x00007f89c8031000, 0x00007f89c8622000] > CodeCache: size=245760Kb, used=1367Kb, max_used=1943Kb, free=244390Kb > total_blobs=474, nmethods=87, adapters=293 > stopped_count=0, restarted_count=0, full_count=0 > compilation=enabled Yi Yang has updated the pull request incrementally with one additional commit since the last revision: new output ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17445/files - new: https://git.openjdk.org/jdk/pull/17445/files/a9939a85..841addcd Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17445&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17445&range=01-02 Stats: 44 lines in 2 files changed: 10 ins; 11 del; 23 mod Patch: https://git.openjdk.org/jdk/pull/17445.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17445/head:pull/17445 PR: https://git.openjdk.org/jdk/pull/17445 From yyang at openjdk.org Mon Jan 22 09:19:40 2024 From: yyang at openjdk.org (Yi Yang) Date: Mon, 22 Jan 2024 09:19:40 GMT Subject: RFR: 8323795: jcmd Compiler.codecache should print total size of code cache [v2] In-Reply-To: References: Message-ID: On Wed, 17 Jan 2024 03:02:27 GMT, Yi Yang wrote: >> CodeHeap 'non-profiled nmethods': size=118592Kb used=29Kb max_used=29Kb free=118562Kb >> bounds [0x00007fbe84622000, 0x00007fbe84892000, 0x00007fbe8b9f2000] >> CodeHeap 'profiled nmethods': size=118588Kb used=80Kb max_used=80Kb free=118507Kb >> bounds [0x00007fbe7c9f2000, 0x00007fbe7cc62000, 0x00007fbe83dc1000] >> CodeHeap 'non-nmethods': size=8580Kb used=1258Kb max_used=1834Kb free=7321Kb >> bounds [0x00007fbe83dc1000, 0x00007fbe84031000, 0x00007fbe84622000] >> total_blobs=474 nmethods=87 adapters=293 >> compilation: enabled >> stopped_count=0, restarted_count=0 >> full_count=0 >> >> >> It's better to accumulates total size of used/free/size, for example >> 
>> -SegmentedCodeCache >> CodeCache: size=245760Kb used=1366Kb max_used=1943Kb free=244393Kb >> bounds [0x00007fdcc89f2000, 0x00007fdcc8c62000, 0x00007fdcd79f2000] >> total_blobs=474, nmethods=87, adapters=293 >> stopped_count=0, restarted_count=0, full_count=0 >> compilation=enabled >> >> >> >> +SegmentedCodeCache >> CodeHeap 'non-profiled nmethods': size=118592Kb used=29Kb max_used=29Kb free=118562Kb >> bounds [0x00007f89c8622000, 0x00007f89c8892000, 0x00007f89cf9f2000] >> CodeHeap 'profiled nmethods': size=118588Kb used=80Kb max_used=80Kb free=118507Kb >> bounds [0x00007f89c09f2000, 0x00007f89c0c62000, 0x00007f89c7dc1000] >> CodeHeap 'non-nmethods': size=8580Kb used=1258Kb max_used=1834Kb free=7321Kb >> bounds [0x00007f89c7dc1000, 0x00007f89c8031000, 0x00007f89c8622000] >> CodeCache: size=245760Kb, used=1367Kb, max_used=1943Kb, free=244390Kb >> total_blobs=474, nmethods=87, adapters=293 >> stopped_count=0, restarted_count=0, full_count=0 >> compilation=enabled > > Yi Yang has updated the pull request incrementally with one additional commit since the last revision: > > new output && fix test Now it looks like 104640: CodeHeap 'non-profiled nmethods': size=118592Kb used=29Kb max_used=29Kb free=118562Kb bounds [0x00007f09f8622000, 0x00007f09f8892000, 0x00007f09ff9f2000] CodeHeap 'profiled nmethods': size=118588Kb used=80Kb max_used=80Kb free=118507Kb bounds [0x00007f09f09f2000, 0x00007f09f0c62000, 0x00007f09f7dc1000] CodeHeap 'non-nmethods': size=8580Kb used=1257Kb max_used=1833Kb free=7323Kb bounds [0x00007f09f7dc1000, 0x00007f09f8031000, 0x00007f09f8622000] CodeCache: size=245760Kb, used=1366Kb, max_used=1942Kb, free=244392Kb total_blobs=474, nmethods=87, adapters=293, full_count=0 Compilation: enabled, stopped_count=0, restarted_count=0 110115: CodeCache: size=245760Kb used=1366Kb max_used=1935Kb free=244393Kb bounds [0x00007ff4d89f2000, 0x00007ff4d8c62000, 0x00007ff4e79f2000] total_blobs=474, nmethods=87, adapters=293, full_count=0 Compilation: enabled, 
stopped_count=0, restarted_count=0 ------------- PR Comment: https://git.openjdk.org/jdk/pull/17445#issuecomment-1903562727 From yyang at openjdk.org Mon Jan 22 09:43:52 2024 From: yyang at openjdk.org (Yi Yang) Date: Mon, 22 Jan 2024 09:43:52 GMT Subject: RFR: 8323795: jcmd Compiler.codecache should print total size of code cache [v4] In-Reply-To: References: Message-ID: > CodeHeap 'non-profiled nmethods': size=118592Kb used=29Kb max_used=29Kb free=118562Kb > bounds [0x00007fbe84622000, 0x00007fbe84892000, 0x00007fbe8b9f2000] > CodeHeap 'profiled nmethods': size=118588Kb used=80Kb max_used=80Kb free=118507Kb > bounds [0x00007fbe7c9f2000, 0x00007fbe7cc62000, 0x00007fbe83dc1000] > CodeHeap 'non-nmethods': size=8580Kb used=1258Kb max_used=1834Kb free=7321Kb > bounds [0x00007fbe83dc1000, 0x00007fbe84031000, 0x00007fbe84622000] > total_blobs=474 nmethods=87 adapters=293 > compilation: enabled > stopped_count=0, restarted_count=0 > full_count=0 > > > It's better to accumulates total size of used/free/size, for example > > -SegmentedCodeCache > CodeCache: size=245760Kb used=1366Kb max_used=1943Kb free=244393Kb > bounds [0x00007fdcc89f2000, 0x00007fdcc8c62000, 0x00007fdcd79f2000] > total_blobs=474, nmethods=87, adapters=293 > stopped_count=0, restarted_count=0, full_count=0 > compilation=enabled > > > > +SegmentedCodeCache > CodeHeap 'non-profiled nmethods': size=118592Kb used=29Kb max_used=29Kb free=118562Kb > bounds [0x00007f89c8622000, 0x00007f89c8892000, 0x00007f89cf9f2000] > CodeHeap 'profiled nmethods': size=118588Kb used=80Kb max_used=80Kb free=118507Kb > bounds [0x00007f89c09f2000, 0x00007f89c0c62000, 0x00007f89c7dc1000] > CodeHeap 'non-nmethods': size=8580Kb used=1258Kb max_used=1834Kb free=7321Kb > bounds [0x00007f89c7dc1000, 0x00007f89c8031000, 0x00007f89c8622000] > CodeCache: size=245760Kb, used=1367Kb, max_used=1943Kb, free=244390Kb > total_blobs=474, nmethods=87, adapters=293 > stopped_count=0, restarted_count=0, full_count=0 > compilation=enabled Yi Yang 
has updated the pull request incrementally with one additional commit since the last revision: whitespace ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17445/files - new: https://git.openjdk.org/jdk/pull/17445/files/841addcd..e9ccc76d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17445&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17445&range=02-03 Stats: 2 lines in 2 files changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/17445.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17445/head:pull/17445 PR: https://git.openjdk.org/jdk/pull/17445 From epeter at openjdk.org Mon Jan 22 09:57:30 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 22 Jan 2024 09:57:30 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v2] In-Reply-To: References: Message-ID: On Sat, 20 Jan 2024 19:40:45 GMT, Quan Anh Mai wrote: >> Hi, >> >> This patch adds unsigned bounds and known bits constraints to TypeInt and TypeLong. This opens more transformation opportunities in an elegant manner as well as helps avoid some ad-hoc rules in Hotspot. >> >> In general, a TypeInt/Long represents a set of values x that satisfies: x s>= lo && x s<= hi && x u>= ulo && x u<= uhi && (x & zeros) == 0 && (~x & ones) == 0. These constraints are not independent, e.g. an int that lies in [0, 3] in signed domain must also lie in [0, 3] in unsigned domain and have all bits but the last 2 being unset. As a result, we must normalize the constraints (tighten the constraints so that they are optimal) before constructing a TypeInt/Long instance. >> >> This is extracted from #15440 , node value transformations are left for later PRs. I have also added unit tests to verify the soundness of constraint normalization. >> >> Please kindly review, thanks a lot. 
>> >> Testing >> >> - [x] GHA >> - [x] Linux x64, tier 1-4 > > Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: > > fix tests, add verify @merykitty I just had a quick look. Thanks for splitting out parts and making it more reviewable that way! Since John Rose is generally excited (https://github.com/openjdk/jdk/pull/15440#issuecomment-1901609719), I'll now put in a bit more effort into reviewing this. Thanks for adding some gtests. I would really like to see some IR tests, where we can see that this folds cases, and folds them correctly. And just some general java-code correctness tests, which test your optimizations in an end-to-end way. I have a general concern with the math functions. They have quite a few arguments, often 5-10. And on average half of them are passed as a reference. Sometimes it is hard to immediately see which are the arguments that will not be mutated, and which are return values, and which are both arguments and return values, which are simply further constrained/narrowed etc. I wonder if it might be better to have types like: SRange {lo, hi} URange {lo, hi} KnownBits {ones, zeros} Make them immutable, i.e. the fields are constant. Then as function parameters, you always pass in these as const, and return the new values (possibly in some combined type, or a pair or tuple or whatever). I think it would make the code cleaner, have fewer arguments, and a bit easier to reason about when things are immutable. Plus, then you can put the range-inference methods inside those classes, you can directly ask such an object if it is empty etc. You could for example have something like: `SRange::constrained_with(KnownBits) -> returns SRange`.
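A minimal sketch of what such immutable types might look like, purely as an illustration. The names `URange`, `KnownBits`, `is_empty` and `constrained_with` are hypothetical and not taken from the patch. The unsigned case is used because its narrowing rule is easy to state, and the rule shown is sound but deliberately not optimal:

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical immutable value types; none of these names come from the patch.
struct KnownBits {
  const uint32_t zeros;  // bits known to be 0
  const uint32_t ones;   // bits known to be 1

  // A bit cannot be known-0 and known-1 at the same time.
  bool is_consistent() const { return (zeros & ones) == 0; }
};

struct URange {
  const uint32_t lo;
  const uint32_t hi;

  bool is_empty() const { return lo > hi; }

  // Returns a new, possibly narrower range and never mutates *this.
  // Soundness argument (unsigned comparison):
  //   every x with all `ones` bits set satisfies    x >= ones
  //   every x with all `zeros` bits clear satisfies x <= ~zeros
  // The result is a valid over-approximation, not the tightest bound.
  URange constrained_with(const KnownBits& bits) const {
    assert(bits.is_consistent());
    const uint32_t new_lo = std::max(lo, bits.ones);
    const uint32_t new_hi = std::min(hi, static_cast<uint32_t>(~bits.zeros));
    return URange{new_lo, new_hi};
  }
};
```

With this shape a caller always receives a fresh value and can ask it directly whether it is empty, instead of passing several mutable references plus a `bool& empty` flag.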
Basically I'm asking for the code to be a little more object-oriented, and less C-style ;) src/hotspot/share/opto/rangeinference.cpp line 31: > 29: > 30: template > 31: static bool adjust_bounds_from_bits(bool& empty, T& lo, T& hi, T zeros, T ones) { Some nice comments at the beginning of functions would help me know what to expect here. src/hotspot/share/opto/rangeinference.cpp line 120: > 118: > 119: template > 120: void normalize_constraints(bool& empty, T& lo, T& hi, U& ulo, U& uhi, U& zeros, U& ones) { With function signatures like this it is hard to know what are the arguments and what are the return values. src/hotspot/share/opto/rangeinference.cpp line 164: > 162: U zeros2 = zeros; > 163: U ones2 = ones; > 164: normalize_constraints_simple(empty2, lo2, hi2, zeros2, ones2); At a quick glance of the code, it is not immediately clear why we need 2 ranges here. Can you add some comments, or maybe improve the naming from 1 and 2 to something more expressive? src/hotspot/share/opto/type.hpp line 558: > 556: > 557: // Use to compute join of 2 sets > 558: const bool _dual; I think you need to add some comments, explaining why this is here src/hotspot/share/opto/type.hpp line 596: > 594: const jint _lo, _hi; // Lower bound, upper bound > 595: const juint _ulo, _uhi; > 596: const juint _zeros, _ones; Add comments, say how all of these fields constrain the type. ------------- Changes requested by epeter (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/17508#pullrequestreview-1835981534 PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r1461575254 PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r1461582547 PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r1461584297 PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r1461559253 PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r1461559662 From shade at openjdk.org Mon Jan 22 10:02:29 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 22 Jan 2024 10:02:29 GMT Subject: RFR: JDK-8210858: AArch64: remove Math.log intrinsic [v2] In-Reply-To: References: <7Zgt8VOxl48Niocueb70XwxmYIAIaEZUlxKGvPP6BnU=.e354f71c-d6c6-40b9-a93e-98ef7ba8009d@github.com> Message-ID: On Thu, 18 Jan 2024 14:44:25 GMT, Tobias Holenstein wrote: >> [JDK-8215133](https://bugs.openjdk.org/browse/JDK-8215133) disabled vmIntrinsics::_dlog. Remove it now >> >> ### Why remove >> >> That Java specification says: >> >> "The computed result must be within 1 ulp of the exact result. Results must be semi-monotonic... whenever the mathematical function is non-decreasing, so is the floating-point approximation, likewise, whenever the mathematical function is non-increasing, so is the floating-point approximation" >> >> There is no proof of the monotonicity of this intrinsics at the moment. > > Tobias Holenstein has updated the pull request incrementally with one additional commit since the last revision: > > remove MacroAssembler::fast_log() and generate_dlog() Looks good, with nits. Separately, I do wonder if [JDK-8301202](https://bugs.openjdk.org/browse/JDK-8301202) gives us a reason to avoid even calling to runtime, and instead just stay in Java completely. 
src/hotspot/cpu/aarch64/c1_LIRGenerator_aarch64.cpp line 834: > 832: break; > 833: case vmIntrinsics::_dlog: > 834: assert(StubRoutines::dlog() == nullptr, "no Math.log intrinsic on AArch64"); Is there a reason to assert this? I think the comment should be enough. And it could be richer as well, something like: // Math.log intrinsic is not implemented on AArch64 (see JDK-8210858), // but we can still call the shared runtime. ------------- Marked as reviewed by shade (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/17480#pullrequestreview-1836070235 PR Review Comment: https://git.openjdk.org/jdk/pull/17480#discussion_r1461597211 From epeter at openjdk.org Mon Jan 22 10:05:31 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 22 Jan 2024 10:05:31 GMT Subject: RFR: 8318446: C2: optimize stores into primitive arrays by combining values into larger store In-Reply-To: References: Message-ID: On Fri, 19 Jan 2024 21:53:28 GMT, Dean Long wrote: > I took a quick look and I don't see where the code checks that all the candidate stores are using MemNode::unordered and that there aren't memory barriers in between. @dean-long What would the graph look like if there were memory barriers? Would there not be something on the memory graph or control graph which is neither a Store nor a RangeCheck? And what exactly is your concern about `MemNode::unordered`? Would you mind explaining or giving some examples?
------------- PR Comment: https://git.openjdk.org/jdk/pull/16245#issuecomment-1903644388 From aph at openjdk.org Mon Jan 22 10:23:26 2024 From: aph at openjdk.org (Andrew Haley) Date: Mon, 22 Jan 2024 10:23:26 GMT Subject: RFR: JDK-8210858: AArch64: remove Math.log intrinsic [v2] In-Reply-To: References: <7Zgt8VOxl48Niocueb70XwxmYIAIaEZUlxKGvPP6BnU=.e354f71c-d6c6-40b9-a93e-98ef7ba8009d@github.com> Message-ID: On Mon, 22 Jan 2024 09:59:29 GMT, Aleksey Shipilev wrote: > > Separately, I do wonder if [JDK-8301202](https://bugs.openjdk.org/browse/JDK-8301202) gives us a reason to avoid even calling to runtime, and instead just stay in Java completely. That would be nice. ------------- PR Comment: https://git.openjdk.org/jdk/pull/17480#issuecomment-1903675490 From aph at openjdk.org Mon Jan 22 10:28:27 2024 From: aph at openjdk.org (Andrew Haley) Date: Mon, 22 Jan 2024 10:28:27 GMT Subject: RFR: 8324123: aarch64: fix prfm literal encoding in assembler [v2] In-Reply-To: References: Message-ID: On Mon, 22 Jan 2024 07:59:38 GMT, Wang Zhuo wrote: >> Current prfm literal mode encoding in aarch64 assembler is not correct. >> The prfm_literal instruction requires bits 31 and 30 to be 0b11, while the current assembler encodes the two bits as 0b01, which is a ldr instruction, not prfm. >> For example, if adding the following code in stubGenerator >> __ prfm(Address(__ pc())) >> we get a ldr instruction like >> ldr x0, 0x0000ffff83f8539c >> but it should be a prfm instruction like >> prfm pldl1keep, 0x0000ffff8ff8539c >> >> The bug is caused in ld_st2, literal mode, where bits 31 and 30 are set to (size & 0b01), while for prfm instructions, bits 31 and 30 must be 0b11. >> void ld_st2(Register Rt, const Address &adr, int size, int op, int V = 0) { >> starti; >> >> f(V, 26); // general reg?
>> zrf(Rt, 0); >> >> // Encoding for literal loads is done here (rather than pushed >> // down into Address::encode) because the encoding of this >> // instruction is too different from all of the other forms to >> // make it worth sharing. >> if (adr.getMode() == Address::literal) { >> assert(size == 0b10 || size == 0b11, "bad operand size in ldr"); >> assert(op == 0b01, "literal form can only be used with loads"); >> f(**size & 0b01, 31, 30**), f(0b011, 29, 27), f(0b00, 25, 24); >> int64_t offset = (adr.target() - pc()) >> 2; >> sf(offset, 23, 5); >> code_section()->relocate(pc(), adr.rspec()); >> return; >> } >> >> f(size, 31, 30); >> f(op, 23, 22); // str >> adr.encode(&current_insn); >> } > > Wang Zhuo has updated the pull request incrementally with one additional commit since the last revision: > > do not use inline for prfm encoding function Thanks. src/hotspot/cpu/aarch64/assembler_aarch64.cpp line 190: > 188: } > 189: > 190: void Assembler::prfm(const Address &adr, prfop pfop) { Suggestion: // This encoding is similar (but not quite identical) to the encoding used // by literal ld/st. see JDK-8324123. void Assembler::prfm(const Address &adr, prfop pfop) { ------------- PR Review: https://git.openjdk.org/jdk/pull/17482#pullrequestreview-1836137299 PR Review Comment: https://git.openjdk.org/jdk/pull/17482#discussion_r1461648239 From shade at openjdk.org Mon Jan 22 10:34:28 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 22 Jan 2024 10:34:28 GMT Subject: RFR: JDK-8210858: AArch64: remove Math.log intrinsic [v2] In-Reply-To: References: <7Zgt8VOxl48Niocueb70XwxmYIAIaEZUlxKGvPP6BnU=.e354f71c-d6c6-40b9-a93e-98ef7ba8009d@github.com> Message-ID: On Mon, 22 Jan 2024 10:20:39 GMT, Andrew Haley wrote: > That would be nice.
https://bugs.openjdk.org/browse/JDK-8324296 ------------- PR Comment: https://git.openjdk.org/jdk/pull/17480#issuecomment-1903694841 From mli at openjdk.org Mon Jan 22 10:49:38 2024 From: mli at openjdk.org (Hamlin Li) Date: Mon, 22 Jan 2024 10:49:38 GMT Subject: RFR: JDK-8318228: RISC-V: C2 ConvF2HF [v2] In-Reply-To: References: Message-ID: > Hi, > Can you review the patch to add ConvF2HF intrinsic to JDK for riscv? > Thanks! > > ## Test > ### Functionality > #### hotspot tests > test/hotspot/jtreg/compiler/intrinsics/ > test/hotspot/jtreg/compiler/c2/irTests > > #### jdk tests > test/jdk/java/lang/Float/Binary16Conversion*.java > > ### Performance > tested on licheepi. > > #### with UseZfh enabled > > Benchmark (size) Mode Cnt Score Error Units > Fp16ConversionBenchmark.floatToFloat16 2048 avgt 2 4170.549 ns/op > Fp16ConversionBenchmark.floatToFloat16Memory 2048 avgt 2 21.492 ns/op > > > #### with UseZfh disabled > (i.e. disable the intrinsic) > > Benchmark (size) Mode Cnt Score Error Units > Fp16ConversionBenchmark.floatToFloat16 2048 avgt 2 25036.647 ns/op > Fp16ConversionBenchmark.floatToFloat16Memory 2048 avgt 2 27.326 ns/op Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: replace fclass with feq as performance optimization. 
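For context, the feq-based check relies on the IEEE 754 rule that NaN compares unequal to everything, including itself, so a single self-comparison is enough to detect it. Below is a small portable C++ analogue of that idea. It is only an illustration under that assumption, not the RISC-V macro-assembler code from the patch, and the helper name `is_nan_by_self_compare` is made up for this sketch:

```cpp
#include <cassert>
#include <limits>

// Illustrative helper (hypothetical name): mirrors the idea of
// "feq_s(t0, src, src); beqz(t0, label)", where a false self-comparison
// means the input is NaN.
inline bool is_nan_by_self_compare(float f) {
  // For every non-NaN value, including infinities, f == f is true.
  return !(f == f);
}
```

Note that this relies on strict IEEE comparison semantics; a compiler mode that assumes NaNs never occur may fold the comparison away.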
------------- Changes: - all: https://git.openjdk.org/jdk/pull/17450/files - new: https://git.openjdk.org/jdk/pull/17450/files/c732a3ff..01ec31c2 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17450&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17450&range=00-01 Stats: 3 lines in 1 file changed: 0 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/17450.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17450/head:pull/17450 PR: https://git.openjdk.org/jdk/pull/17450 From mli at openjdk.org Mon Jan 22 10:49:40 2024 From: mli at openjdk.org (Hamlin Li) Date: Mon, 22 Jan 2024 10:49:40 GMT Subject: RFR: JDK-8318228: RISC-V: C2 ConvF2HF [v2] In-Reply-To: References: Message-ID: On Fri, 19 Jan 2024 21:30:30 GMT, Vladimir Kempik wrote: >> Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: >> >> replace fclass with feq as performance optimization. > > src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 1863: > >> 1861: >> 1862: // check whether it's a NaN. >> 1863: fclass_s(t0, src); > > As shown in the roundD intrinsic PR ( https://github.com/openjdk/jdk/pull/16382/files#diff-7a5c3ed05b6f3f06ed1c59f5fc2a14ec566a6a5bd1d09606115767daa99115bdR4252 ), the feq_s(t0, src, src) + beqz(t0, label) seems to be a faster check for NaN. Could you check the jmh numbers with feq_s? Thanks for the suggestion. Yes, it brings better performance. After: Benchmark (size) Mode Cnt Score Error Units Fp16ConversionBenchmark.floatToFloat16 2048 avgt 5 3753.179 ± 43.557 ns/op Fp16ConversionBenchmark.floatToFloat16Memory 2048 avgt 5 19.772 ± 0.860 ns/op Before: Benchmark (size) Mode Cnt Score Error Units Fp16ConversionBenchmark.floatToFloat16 2048 avgt 5 4099.820 ± 57.671 ns/op Fp16ConversionBenchmark.floatToFloat16Memory 2048 avgt 5 20.181 ±
0.108 ns/op ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17450#discussion_r1461681485 From tholenstein at openjdk.org Mon Jan 22 11:34:38 2024 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Mon, 22 Jan 2024 11:34:38 GMT Subject: RFR: JDK-8321984: IGV: Upgrade to Netbeans Platform 20 [v6] In-Reply-To: References: Message-ID: > Upgraded IGV and dependencies to the newest Netbeans Platform 20 which was released on December 2023. > > Tested that IGV still behaves as expected after the upgrade. Tobias Holenstein has updated the pull request incrementally with one additional commit since the last revision: Revert "make IGV build work with mainline JDK version" This reverts commit 35080de727135741fee96834470072f002c32501. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17106/files - new: https://git.openjdk.org/jdk/pull/17106/files/218ab753..273c9652 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17106&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17106&range=04-05 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/17106.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17106/head:pull/17106 PR: https://git.openjdk.org/jdk/pull/17106 From chagedorn at openjdk.org Mon Jan 22 11:39:28 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 22 Jan 2024 11:39:28 GMT Subject: RFR: JDK-8321984: IGV: Upgrade to Netbeans Platform 20 [v6] In-Reply-To: References: Message-ID: On Mon, 22 Jan 2024 11:34:38 GMT, Tobias Holenstein wrote: >> Upgraded IGV and dependencies to the newest Netbeans Platform 20 which was released on December 2023. >> >> Tested that IGV still behaves as expected after the upgrade. > > Tobias Holenstein has updated the pull request incrementally with one additional commit since the last revision: > > Revert "make IGV build work with mainline JDK version" > > This reverts commit 35080de727135741fee96834470072f002c32501. 
Looks good! ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/17106#pullrequestreview-1836264005 From tholenstein at openjdk.org Mon Jan 22 11:39:31 2024 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Mon, 22 Jan 2024 11:39:31 GMT Subject: RFR: JDK-8321984: IGV: Upgrade to Netbeans Platform 20 [v4] In-Reply-To: References: Message-ID: On Fri, 19 Jan 2024 07:30:43 GMT, Roberto Castañeda Lozano wrote: >> Tobias Holenstein has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains two new commits since the last revision: >> >> - make IGV build work with mainline JDK version >> - remove requirement for nashorn > > src/utils/IdealGraphVisualizer/pom.xml line 80: > >> 78: >> 79: >> 80: 17 > > I suggest being a bit more conservative (safe) here and specifying an upper bound for the JDK version, since we cannot guarantee that all IGV dependencies will be compatible with any future JDK release. I suggest sticking to the newest JDK supported by the NetBeans Platform (for NetBeans 20 that would be JDK 21, see https://netbeans.apache.org/front/main/download/nb20/). ok, I changed it back to be JDK 17-21 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17106#discussion_r1461733837 From epeter at openjdk.org Mon Jan 22 11:47:56 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 22 Jan 2024 11:47:56 GMT Subject: RFR: 8315361: C2 SuperWord: refactor out loop analysis into shared auto-vectorization facility VLoopAnalyzer [v9] In-Reply-To: References: Message-ID: <3VLAR16c-akx8SWUTX-vLa0OkCfmYskJuSL1mYNpsaE=.4472fd88-ad72-479d-9674-fbcae3e1c4a3@github.com> > This is a refactoring of `SuperWord`. > > **Goals** > > 1. Clean up `SuperWord`: disentangle different components, make them more **modular**. > 2.
Make the loop analysis parts a **shared facility**, not just for SuperWord but also the post-loop-vectorizer ([JDK-8308994](https://bugs.openjdk.org/browse/JDK-8308994)). > 3. It is also a necessary step on my bigger plans for improvement with the C2 Auto-Vectorizer ([see my blog post](https://eme64.github.io/blog/2023/11/03/C2-AutoVectorizer-Improvement-Ideas.html)). > 4. Improve tracing in the auto-vectorization by making it more systematic. > > **Summary** > > - I wrote a summary of how C2 auto-vectorization with SuperWord works (please read!): > https://github.com/openjdk/jdk/blob/95fd361e60fc66eb91edad321662e508b2d1bdde/src/hotspot/share/opto/superword.hpp#L32-L177 > - I moved many `Superword` components out to `VLoop` and its subclass `VLoopAnalyzer`. The idea is that any vectorizer can use these facilities in the future. They are therefore made more modular, which should hopefully make future changes easier. These components are: > - Checking the pre-conditions for vectorization (e.g. no unwanted ctrl-flow). > - `VLoop::check_preconditions_helper` replaces code from old `SuperWord::transform_loop`. > - Running all submodules of `VLoopAnalyzer`: `VLoopAnalyzer::analyze_helper`. Replaces analysis part of `SuperWord::SLP_extract`. > - Finding and marking reductions -> `VLoopReductions` > - Detecting memory slices -> `VLoopMemorySlices` > - Analyzing the body -> `VLoopBody` (renamed `in_bb` -> `in_body`) > - Determining vector element types, and functions to determine the `vector_width` of a node -> `VLoopTypes` > - Constructing the dependence graph -> `VLoopDependenceGraph`. Replaces old `DepGraph` with all its components. > - New: CompileCommand option `TraceAutovectorization` > - Run with `-XX:CompileCommand=traceAutovectorization,*::*,help` to get a usage description. > - Replaced all printing with flags `TraceSuperWord` (and `Verbose`) and of `VectorizeDebug`. 
> - The advantage of a CompileCommand is that tracing can be applied selectively for only a limited set of java classes / methods. > - It uses tags, which are more readable than the `VectorizeDebug` bit-flags. These tags can be used for all parts of the vectorizer, but one can also target SuperWord specifically. > - I systematically added tracing at every point where vectorization (partially) fails (use tag `SW_REJECTIONS`). > - `TraceSuperWord` still works, and perfo... Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: use Composition for VLoop in VLoopAnalyzer, rather than inheritance ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16620/files - new: https://git.openjdk.org/jdk/pull/16620/files/4302f58b..abd9bd43 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16620&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16620&range=07-08 Stats: 74 lines in 5 files changed: 7 ins; 4 del; 63 mod Patch: https://git.openjdk.org/jdk/pull/16620.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16620/head:pull/16620 PR: https://git.openjdk.org/jdk/pull/16620 From mli at openjdk.org Mon Jan 22 11:52:31 2024 From: mli at openjdk.org (Hamlin Li) Date: Mon, 22 Jan 2024 11:52:31 GMT Subject: RFR: JDK-8318228: RISC-V: C2 ConvF2HF [v2] In-Reply-To: References: Message-ID: On Fri, 19 Jan 2024 15:15:10 GMT, Hamlin Li wrote: >> It's discarded intentionally, just like in HF2F it's [padded with zero in lower 13 bits](https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp#L1800) > > Right now, I'm not sure. 
> I have below patch: > > diff --git a/src/java.base/share/classes/java/lang/Float.java b/src/java.base/share/classes/java/lang/Float.java > index 7508c22d7f4..f96e23b568e 100644 > --- a/src/java.base/share/classes/java/lang/Float.java > +++ b/src/java.base/share/classes/java/lang/Float.java > @@ -1108,9 +1108,7 @@ public static short floatToFloat16(float f) { > // Preserve high order bit of float NaN in the > // binary16 result NaN (tenth bit); OR in remaining > // bits into lower 9 bits of binary 16 significand. > - | (doppel & 0x007f_e000) >> 13 // 10 bits > - | (doppel & 0x0000_1ff0) >> 4 // 9 bits > - | (doppel & 0x0000_000f)); // 4 bits > + | (doppel & 0x007f_e000) >> 13); // 10 bits > } > > float abs_f = Math.abs(f); > > And, test/jdk/java/lang/Float/Binary16ConversionNaN.java/Binary16Conversion.java both passed. > > Either the tests(both library and hotspot) + intrinsics (not sure if intrinsics on other platforms need improvement) needs to be improved, or the code in library needs to be simplified. (To be frank, I don't think NaN needs such a complicated spec/design, but it depends on the spec). > > I just filed a library bug to discuss it, [JDK-8324212](https://bugs.openjdk.org/browse/JDK-8324212) Per discussion at: https://mail.openjdk.org/pipermail/riscv-port-dev/2022-December/000706.html, when NaN is used as input and output is also NaN, then there is no restriction on the exact NaN number, this is confirmed by the java library team. That means we can return any NaN if the input is an NaN. So, I think we're fine to just discard the last 13 bits (as implemented in this patch), and in this way it's convenient for us as this will help to pass all the existing tests in hotspot. But, for long term and performance consideration, I think we need to re-visit all the NaN related intrinsics in riscv to check if we need to treat NaN specially rather than just leveraging the riscv default behaviour. 
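To make the "discard the last 13 bits" point concrete, here is a minimal Java sketch. It is not the HotSpot intrinsic and not the `Float.floatToFloat16` library code: `NaNTruncationSketch`, `nanBitsToBinary16` and `isBinary16NaN` are hypothetical names, and forcing the quiet bit when the kept significand would be zero is an assumption made only so that the result stays a NaN rather than becoming an infinity. The sketch works on raw bit patterns and shows that float NaNs differing only in their low 13 significand bits collapse to the same binary16 NaN.

```java
public class NaNTruncationSketch {
    // Hypothetical helper: maps the raw bits of a float NaN to binary16 NaN bits,
    // keeping the sign and the top 10 significand bits and dropping the low 13.
    static short nanBitsToBinary16(int floatBits) {
        int sign = (floatBits >>> 16) & 0x8000;              // float sign bit moved to bit 15
        int significand = (floatBits & 0x007f_e000) >> 13;   // top 10 of the 23 significand bits
        if (significand == 0) {
            // Assumption: set the quiet bit so the result is a NaN, not an infinity.
            significand = 0x200;
        }
        return (short) (sign | 0x7c00 | significand);        // all-ones binary16 exponent
    }

    // A binary16 value is a NaN when the exponent is all ones and the significand is non-zero.
    static boolean isBinary16NaN(short bits) {
        return (bits & 0x7c00) == 0x7c00 && (bits & 0x03ff) != 0;
    }

    public static void main(String[] args) {
        int[] nans = {0x7fc0_0000, 0x7fc0_0001, 0x7fc0_1fff, 0xffc0_1234};
        for (int bits : nans) {
            short h = nanBitsToBinary16(bits);
            System.out.printf("float 0x%08x -> binary16 0x%04x (NaN: %b)%n",
                              bits, h & 0xffff, isBinary16NaN(h));
        }
    }
}
```

Whether a given platform intrinsic produces exactly these payload bits is a separate question; the sketch only illustrates why the low 13 bits cannot survive a plain truncation.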
And I filed a bug to track this task: [JDK-8324303](https://bugs.openjdk.org/browse/JDK-8324303) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17450#discussion_r1461748743 From rgiulietti at openjdk.org Mon Jan 22 12:03:49 2024 From: rgiulietti at openjdk.org (Raffaello Giulietti) Date: Mon, 22 Jan 2024 12:03:49 GMT Subject: RFR: 8282365: Consolidate and improve division by constant idealizations [v44] In-Reply-To: References: Message-ID: On Thu, 18 Jan 2024 15:26:35 GMT, Raffaello Giulietti wrote: >> Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: >> >> update include order and license year > > src/hotspot/share/opto/divconstants.cpp line 122: > >> 120: // c * d - m is the intersection of (0, m / v_neg] and (0, m / v_pos). Which is (0, m / v_pos) >> 121: // if v_pos >= v_neg and (0, m / v_neg] otherwise. >> 122: // > > The analysis seems correct. The convoluted test to break out of the while loop can be simplified if we are willing to consider two cases. Equality in `c / m <= (1 / d) * ((v + 1) / v)` is equivalent to `c / m = b / v`, where `b = (v + 1) / d` is an integer. Note that the fraction `b / v` is irreducible: a common divisor of `b` and `v` has to divide any linear combination as well, in particular it has to divide `d * b - 1 * v = 1`. Thus, `c >= b` and `m >= v`. Since `m = 2^s`, `v` must be a power of 2. This means that, when `m` is a power of 2, the only way to have equality is `v_neg > v_pos` _and_ `v_neg` is a power of 2 (say `v_neg = 2^e`), which should be a rare case. This can be detected cheaply early, before the loop. In such a case, the smallest `c` is `b` and the smallest `s` is `e`, and a result meeting `s >= min_s` doesn't need any iterative algorithm. Otherwise equality does not hold. Now, `c / m < (1 / d) * ((v + 1) / v)` is equivalent to `m > v * rc`. In turn, `m > v * rc` <=> `m / v > rc` <=> `ceil(m / v) > rc`.
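The chain of equivalences above, and the way a ceiling quotient can be carried along while `m` doubles, can be checked by brute force. The following Java sketch is only an illustration under stated assumptions: `CeilQuotientSketch`, `invariantsHold` and `equivalenceHolds` are hypothetical names, plain `long` arithmetic replaces the template type used in `divconstants.cpp`, overflow tracking (`qv_ovf`) is left out, and `rc` is treated as an arbitrary non-negative integer. The doubling step mirrors the update proposed in this review comment for `qv = ceil(m / v)` and `rv = qv * v - m`.

```java
public class CeilQuotientSketch {
    // For m = 2^0 .. 2^maxShift, checks that the incremental update keeps
    // qv == ceil(m / v) and rv == qv * v - m with 0 <= rv < v.
    // Assumes v >= 1 and values small enough that long arithmetic cannot overflow.
    static boolean invariantsHold(long v, int maxShift) {
        long m = 1;
        long qv = 1;      // ceil(1 / v) for any v >= 1
        long rv = v - 1;  // qv * v - m
        for (int s = 0; s <= maxShift; s++) {
            long expectedQ = (m + v - 1) / v;  // reference ceiling division
            if (qv != expectedQ || rv != qv * v - m || rv < 0 || rv >= v) {
                return false;
            }
            // Doubling m: 2m = 2*qv*v - 2*rv, then renormalise the remainder.
            if (rv >= v - rv) {   // 2 * rv >= v
                qv = 2 * qv - 1;
                rv = 2 * rv - v;
            } else {              // 2 * rv < v
                qv = 2 * qv;
                rv = 2 * rv;
            }
            m *= 2;
        }
        return true;
    }

    // m > v * rc  <=>  ceil(m / v) > rc, for positive m, v and non-negative rc.
    static boolean equivalenceHolds(long m, long v, long rc) {
        long ceilQ = (m + v - 1) / v;
        return (m > v * rc) == (ceilQ > rc);
    }

    public static void main(String[] args) {
        System.out.println("invariants hold for v = 7: " + invariantsHold(7, 40));
        System.out.println("equivalence holds for m = 16, v = 5, rc = 3: " + equivalenceHolds(16, 5, 3));
    }
}
```

A brute-force check like this does not replace the proof; it is just a cheap way to catch an off-by-one in the update rule.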
Thus, rather than maintaining invariants for `qv = floor(m / v)` and `rv = m - qv * v` as currently defined, we can redefine them as `qv = ceil(m / v)` and `rv = qv * v - m` (`0 <= rv < v`) and maintain _these_ invariants instead. T qv = 1; T rv = v - 1; ... if (rv >= v - rv) { // 2 * rv >= v qv_ovf = qv > min_signed; qv = qv * 2 - 1; rv = rv * 2 - v; } else { // 2 * rv < v qv_ovf = qv >= min_signed; qv = qv * 2; rv = rv * 2; } The test to exit the loop then reduces to `qv > rc || qv_ovf`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/9947#discussion_r1461743490 From tholenstein at openjdk.org Mon Jan 22 12:21:38 2024 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Mon, 22 Jan 2024 12:21:38 GMT Subject: Integrated: JDK-8321984: IGV: Upgrade to Netbeans Platform 20 In-Reply-To: References: Message-ID: On Thu, 14 Dec 2023 11:36:28 GMT, Tobias Holenstein wrote: > Upgraded IGV and dependencies to the newest Netbeans Platform 20 which was released on December 2023. > > Tested that IGV still behaves as expected after the upgrade. This pull request has now been integrated. Changeset: be943a9f Author: Tobias Holenstein URL: https://git.openjdk.org/jdk/commit/be943a9fd67f957b2a44dbd6531690b3ef3895dd Stats: 31 lines in 2 files changed: 5 ins; 14 del; 12 mod 8321984: IGV: Upgrade to Netbeans Platform 20 Reviewed-by: rcastanedalo, chagedorn ------------- PR: https://git.openjdk.org/jdk/pull/17106 From rcastanedalo at openjdk.org Mon Jan 22 12:21:36 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Mon, 22 Jan 2024 12:21:36 GMT Subject: RFR: JDK-8321984: IGV: Upgrade to Netbeans Platform 20 [v6] In-Reply-To: References: Message-ID: On Mon, 22 Jan 2024 11:34:38 GMT, Tobias Holenstein wrote: >> Upgraded IGV and dependencies to the newest Netbeans Platform 20 which was released on December 2023. >> >> Tested that IGV still behaves as expected after the upgrade.
> > Tobias Holenstein has updated the pull request incrementally with one additional commit since the last revision: > > Revert "make IGV build work with mainline JDK version" > > This reverts commit 35080de727135741fee96834470072f002c32501. Thanks Toby, looks good! ------------- Marked as reviewed by rcastanedalo (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/17106#pullrequestreview-1836331187 From tholenstein at openjdk.org Mon Jan 22 12:21:37 2024 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Mon, 22 Jan 2024 12:21:37 GMT Subject: RFR: JDK-8321984: IGV: Upgrade to Netbeans Platform 20 [v6] In-Reply-To: References: Message-ID: On Mon, 22 Jan 2024 12:15:24 GMT, Roberto Castañeda Lozano wrote: >> Tobias Holenstein has updated the pull request incrementally with one additional commit since the last revision: >> >> Revert "make IGV build work with mainline JDK version" >> >> This reverts commit 35080de727135741fee96834470072f002c32501. > > Thanks Toby, looks good! Thanks @robcasloz and @chhagedorn for the reviews! ------------- PR Comment: https://git.openjdk.org/jdk/pull/17106#issuecomment-1903885497 From tholenstein at openjdk.org Mon Jan 22 12:30:28 2024 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Mon, 22 Jan 2024 12:30:28 GMT Subject: RFR: JDK-8210858: AArch64: remove Math.log intrinsic [v2] In-Reply-To: References: <7Zgt8VOxl48Niocueb70XwxmYIAIaEZUlxKGvPP6BnU=.e354f71c-d6c6-40b9-a93e-98ef7ba8009d@github.com> Message-ID: On Mon, 22 Jan 2024 10:20:39 GMT, Andrew Haley wrote: > Separately, I do wonder if [JDK-8301202](https://bugs.openjdk.org/browse/JDK-8301202) gives us a reason to avoid even calling to runtime, and instead just stay in Java completely. Yes, would be nice to stay in Java. We had a similar discussion in https://github.com/openjdk/jdk/pull/13606 where I benchmarked the intrinsics on different platforms. But, they still gave up to 40% performance improvement vs. staying in java.
I don't think much has changed performance wise since then.. ------------- PR Comment: https://git.openjdk.org/jdk/pull/17480#issuecomment-1903902536 From varadam at openjdk.org Mon Jan 22 12:31:41 2024 From: varadam at openjdk.org (Varada M) Date: Mon, 22 Jan 2024 12:31:41 GMT Subject: RFR: JDK-8322648: Improve class initialization barrier in TemplateTable::_new for PPC Message-ID: ppc port implementation of https://github.com/openjdk/jdk/pull/17006 Fastdebug and Release : build and tier1 testing successful. JBS Issue : [JDK-8322648](https://bugs.openjdk.org/browse/JDK-8322648) ------------- Commit messages: - JDK-8322648: Improve class initialization barrier in TemplateTable::_new for PPC - JDK-8322648: Improve class initialization barrier in TemplateTable::_new for PPC - JDK-8322648: Improve class initialization barrier in TemplateTable::_new for PPC Changes: https://git.openjdk.org/jdk/pull/17518/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=17518&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8322648 Stats: 7 lines in 1 file changed: 3 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/17518.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17518/head:pull/17518 PR: https://git.openjdk.org/jdk/pull/17518 From tholenstein at openjdk.org Mon Jan 22 12:38:28 2024 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Mon, 22 Jan 2024 12:38:28 GMT Subject: RFR: JDK-8210858: AArch64: remove Math.log intrinsic [v2] In-Reply-To: References: <7Zgt8VOxl48Niocueb70XwxmYIAIaEZUlxKGvPP6BnU=.e354f71c-d6c6-40b9-a93e-98ef7ba8009d@github.com> Message-ID: On Thu, 18 Jan 2024 14:44:25 GMT, Tobias Holenstein wrote: >> [JDK-8215133](https://bugs.openjdk.org/browse/JDK-8215133) disabled vmIntrinsics::_dlog. Remove it now >> >> ### Why remove >> >> That Java specification says: >> >> "The computed result must be within 1 ulp of the exact result. Results must be semi-monotonic... 
whenever the mathematical function is non-decreasing, so is the floating-point approximation, likewise, whenever the mathematical function is non-increasing, so is the floating-point approximation" >> >> There is no proof of the monotonicity of this intrinsics at the moment. > > Tobias Holenstein has updated the pull request incrementally with one additional commit since the last revision: > > remove MacroAssembler::fast_log() and generate_dlog() > > Separately, I do wonder if [JDK-8301202](https://bugs.openjdk.org/browse/JDK-8301202) gives us a reason to avoid even calling to runtime, and instead just stay in Java completely. > > Yes, would be nice to stay in Java. We had a similar discussion in #13606 where I benchmarked the intrinsics on different platforms. But, they still gave up to 40% performance improvement vs. staying in java. I don't think much has changed performance wise since then.. @jddarcy any thoughts on that? Perhaps I should re-run the benchmarks from May 2023.. ------------- PR Comment: https://git.openjdk.org/jdk/pull/17480#issuecomment-1903912495 From shade at openjdk.org Mon Jan 22 12:38:29 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 22 Jan 2024 12:38:29 GMT Subject: RFR: JDK-8210858: AArch64: remove Math.log intrinsic [v2] In-Reply-To: References: <7Zgt8VOxl48Niocueb70XwxmYIAIaEZUlxKGvPP6BnU=.e354f71c-d6c6-40b9-a93e-98ef7ba8009d@github.com> Message-ID: <-tZP1uJ_KakJpOgQT68dO5pqFOJ61YU-CV9-r2lzPz8=.894eac3e-ceaa-4908-8c67-9859caee6046@github.com> On Thu, 18 Jan 2024 14:44:25 GMT, Tobias Holenstein wrote: >> [JDK-8215133](https://bugs.openjdk.org/browse/JDK-8215133) disabled vmIntrinsics::_dlog. Remove it now >> >> ### Why remove >> >> That Java specification says: >> >> "The computed result must be within 1 ulp of the exact result. Results must be semi-monotonic... 
whenever the mathematical function is non-decreasing, so is the floating-point approximation, likewise, whenever the mathematical function is non-increasing, so is the floating-point approximation" >> >> There is no proof of the monotonicity of this intrinsics at the moment. > > Tobias Holenstein has updated the pull request incrementally with one additional commit since the last revision: > > remove MacroAssembler::fast_log() and generate_dlog() To be clear, the question on going to Java instead of runtime does not block this PR. I think the discussion on merits of removing the StubRoutines can continue in the relevant RFE. ------------- PR Comment: https://git.openjdk.org/jdk/pull/17480#issuecomment-1903915959 From tholenstein at openjdk.org Mon Jan 22 12:42:26 2024 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Mon, 22 Jan 2024 12:42:26 GMT Subject: RFR: JDK-8210858: AArch64: remove Math.log intrinsic [v2] In-Reply-To: <-tZP1uJ_KakJpOgQT68dO5pqFOJ61YU-CV9-r2lzPz8=.894eac3e-ceaa-4908-8c67-9859caee6046@github.com> References: <7Zgt8VOxl48Niocueb70XwxmYIAIaEZUlxKGvPP6BnU=.e354f71c-d6c6-40b9-a93e-98ef7ba8009d@github.com> <-tZP1uJ_KakJpOgQT68dO5pqFOJ61YU-CV9-r2lzPz8=.894eac3e-ceaa-4908-8c67-9859caee6046@github.com> Message-ID: On Mon, 22 Jan 2024 12:36:08 GMT, Aleksey Shipilev wrote: > I think the discussion on merits of removing the StubRoutines can continue in the relevant RFE. So move that discussion to https://bugs.openjdk.org/browse/JDK-8324296 ? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/17480#issuecomment-1903921032 From shade at openjdk.org Mon Jan 22 12:46:27 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 22 Jan 2024 12:46:27 GMT Subject: RFR: JDK-8210858: AArch64: remove Math.log intrinsic [v2] In-Reply-To: References: <7Zgt8VOxl48Niocueb70XwxmYIAIaEZUlxKGvPP6BnU=.e354f71c-d6c6-40b9-a93e-98ef7ba8009d@github.com> <-tZP1uJ_KakJpOgQT68dO5pqFOJ61YU-CV9-r2lzPz8=.894eac3e-ceaa-4908-8c67-9859caee6046@github.com> Message-ID: On Mon, 22 Jan 2024 12:39:20 GMT, Tobias Holenstein wrote: > > I think the discussion on merits of removing the StubRoutines can continue in the relevant RFE. > So move that discussion to https://bugs.openjdk.org/browse/JDK-8324296 ? Yes! ------------- PR Comment: https://git.openjdk.org/jdk/pull/17480#issuecomment-1903927939 From tholenstein at openjdk.org Mon Jan 22 13:04:50 2024 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Mon, 22 Jan 2024 13:04:50 GMT Subject: RFR: JDK-8210858: AArch64: remove Math.log intrinsic [v2] In-Reply-To: References: <7Zgt8VOxl48Niocueb70XwxmYIAIaEZUlxKGvPP6BnU=.e354f71c-d6c6-40b9-a93e-98ef7ba8009d@github.com> Message-ID: On Mon, 22 Jan 2024 09:53:08 GMT, Aleksey Shipilev wrote: >> Tobias Holenstein has updated the pull request incrementally with one additional commit since the last revision: >> >> remove MacroAssembler::fast_log() and generate_dlog() > > src/hotspot/cpu/aarch64/c1_LIRGenerator_aarch64.cpp line 834: > >> 832: break; >> 833: case vmIntrinsics::_dlog: >> 834: assert(StubRoutines::dlog() == nullptr, "no Math.log intrinsic on AArch64"); > > Is there a reason to assert this? I think the comment should be enough. And it could be richer as well, something like: > > > // Math.log intrinsic is not implemented on AArch64 (see JDK-8210858), > // but we can still call the shared runtime. Right, a comment makes more sense here than an assert! done. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17480#discussion_r1461830337 From tholenstein at openjdk.org Mon Jan 22 13:04:47 2024 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Mon, 22 Jan 2024 13:04:47 GMT Subject: RFR: JDK-8210858: AArch64: remove Math.log intrinsic [v3] In-Reply-To: <7Zgt8VOxl48Niocueb70XwxmYIAIaEZUlxKGvPP6BnU=.e354f71c-d6c6-40b9-a93e-98ef7ba8009d@github.com> References: <7Zgt8VOxl48Niocueb70XwxmYIAIaEZUlxKGvPP6BnU=.e354f71c-d6c6-40b9-a93e-98ef7ba8009d@github.com> Message-ID: <0VQCOFKHo_kZ8PV5f2WhDDca-N71QcLu35N9Wh3hJ8U=.78afd8f5-b503-4fe5-9a94-1c5eac73870f@github.com> > [JDK-8215133](https://bugs.openjdk.org/browse/JDK-8215133) disabled vmIntrinsics::_dlog. Remove it now > > ### Why remove > > That Java specification says: > > "The computed result must be within 1 ulp of the exact result. Results must be semi-monotonic... whenever the mathematical function is non-decreasing, so is the floating-point approximation, likewise, whenever the mathematical function is non-increasing, so is the floating-point approximation" > > There is no proof of the monotonicity of this intrinsics at the moment. 
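The semi-monotonicity requirement quoted above can be probed empirically, although sampling can only find counterexamples and never amounts to the missing proof. The Java sketch below is an illustration under stated assumptions: `MonotonicitySketch` and `countViolations` are hypothetical names, and the starting point and step count are arbitrary. It walks adjacent doubles with `Math.nextUp` and counts the places where a function that should be non-decreasing goes down.

```java
import java.util.function.DoubleUnaryOperator;

public class MonotonicitySketch {
    // Counts pairs of adjacent doubles (x, nextUp(x)) for which f(nextUp(x)) < f(x),
    // i.e. places where a supposedly non-decreasing function decreases.
    static int countViolations(DoubleUnaryOperator f, double start, int steps) {
        int violations = 0;
        double x = start;
        double fx = f.applyAsDouble(x);
        for (int i = 0; i < steps; i++) {
            double next = Math.nextUp(x);          // smallest double greater than x
            double fnext = f.applyAsDouble(next);
            if (fnext < fx) {
                violations++;
            }
            x = next;
            fx = fnext;
        }
        return violations;
    }

    public static void main(String[] args) {
        // Probe Math.log on one million adjacent doubles just above 1.0.
        int found = countViolations(Math::log, 1.0, 1_000_000);
        System.out.println("Math.log violations found near 1.0: " + found);
    }
}
```

A zero count from such a run says nothing about the rest of the input range, which is why the PR asks for a proof rather than a test.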
Tobias Holenstein has updated the pull request incrementally with one additional commit since the last revision: Update c1_LIRGenerator_aarch64.cpp replaced asserts with comment ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17480/files - new: https://git.openjdk.org/jdk/pull/17480/files/f1eaee30..e6dc5b93 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17480&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17480&range=01-02 Stats: 4 lines in 1 file changed: 1 ins; 1 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/17480.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17480/head:pull/17480 PR: https://git.openjdk.org/jdk/pull/17480 From eastigeevich at openjdk.org Mon Jan 22 13:11:27 2024 From: eastigeevich at openjdk.org (Evgeny Astigeevich) Date: Mon, 22 Jan 2024 13:11:27 GMT Subject: RFR: 8324241: Always record evol_method deps to avoid excessive method flushing In-Reply-To: References: Message-ID: On Sat, 20 Jan 2024 19:48:07 GMT, Volker Simonis wrote: > Currently we don't record dependencies on redefined methods (i.e. `evol_method` dependencies) in JIT compiled methods if none of the `can_redefine_classes`, `can_retransform_classes` or `can_generate_breakpoint_events` JVMTI capabilities is set. This means that if a JVMTI agent which requests one of these capabilities is dynamically attached, all the methods which have been JIT compiled until that point will be marked for deoptimization and flushed from the code cache. For large, warmed-up applications this means deoptimization and instant recompilation of thousands if not tens of thousands of methods, which can lead to dramatic performance/latency drops for several minutes.
> > One could argue that dynamic agent attach is now deprecated anyway (see [JEP 451: Prepare to Disallow the Dynamic Loading of Agents](https://openjdk.org/jeps/451)) and this problem could be solved by making the recording of `evol_method` dependencies dependent on the new `-XX:+EnableDynamicAgentLoading` flag instead of the concrete JVMTI capabilities (because the presence of the flag indicates that an agent will be loaded eventually). > > But there is a single, yet important, exception to this rule, and that's JFR. JFR is advertised as a low-overhead profiler which can be enabled in production at any time. However, when JFR is started dynamically (e.g. through JCMD or JMX) it will silently load a HotSpot internal JVMTI agent which requests the `can_retransform_classes` capability and retransforms some classes. This will inevitably trigger the deoptimization of all compiled methods as described above. > > I'd therefore like to propose to *always* and unconditionally record `evol_method` dependencies in JIT compiled code by exporting the relevant properties right at startup in `init_globals()`: > ```c++ > jint init_globals() { > management_init(); > JvmtiExport::initialize_oop_storage(); > +#if INCLUDE_JVMTI > + JvmtiExport::set_can_hotswap_or_post_breakpoint(true); > + JvmtiExport::set_all_dependencies_are_recorded(true); > +#endif > ``` > > My measurements indicate that the overhead of doing so is minimal (around 1% increase of nmethod size) and justifies the benefit. E.g. a Spring Petclinic application started with `-Xbatch -XX:+UnlockDiagnosticVMOptions -XX:+LogCompilation` compiles about ~11500 methods (~9000 with C1 and ~2500 with C2) resulting in an aggregated nmethod size of around ~40mb. Additionally recording `evol_method` dependencies only increases this size by about 400kb. The ratio remains about the same i... lgtm ------------- Marked as reviewed by eastigeevich (Committer).
PR Review: https://git.openjdk.org/jdk/pull/17509#pullrequestreview-1836432196 From epeter at openjdk.org Mon Jan 22 14:47:02 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 22 Jan 2024 14:47:02 GMT Subject: RFR: 8315361: C2 SuperWord: refactor out loop analysis into shared auto-vectorization facility VLoopAnalyzer [v10] In-Reply-To: References: Message-ID: <3dKc1dKlIwJOSq6LTbtnUflJzQK24f9OSwuMLYynZaU=.5acb390e-bb5f-4215-a019-588f91f3d63a@github.com> > This is a refactoring of `SuperWord`. > > **Goals** > > 1. Clean up `SuperWord`: disentangle different components, make them more **modular**. > 2. Make the loop analysis parts a **shared facility**, not just for SuperWord but also the post-loop-vectorizer ([JDK-8308994](https://bugs.openjdk.org/browse/JDK-8308994)). > 3. It is also a necessary step on my bigger plans for improvement with the C2 Auto-Vectorizer ([see my blog post](https://eme64.github.io/blog/2023/11/03/C2-AutoVectorizer-Improvement-Ideas.html)). > 4. Improve tracing in the auto-vectorization by making it more systematic. > > **Summary** > > - I wrote a summary of how C2 auto-vectorization with SuperWord works (please read!): > https://github.com/openjdk/jdk/blob/95fd361e60fc66eb91edad321662e508b2d1bdde/src/hotspot/share/opto/superword.hpp#L32-L177 > - I moved many `Superword` components out to `VLoop` and its subclass `VLoopAnalyzer`. The idea is that any vectorizer can use these facilities in the future. They are therefore made more modular, which should hopefully make future changes easier. These components are: > - Checking the pre-conditions for vectorization (e.g. no unwanted ctrl-flow). > - `VLoop::check_preconditions_helper` replaces code from old `SuperWord::transform_loop`. > - Running all submodules of `VLoopAnalyzer`: `VLoopAnalyzer::analyze_helper`. Replaces analysis part of `SuperWord::SLP_extract`. 
> - Finding and marking reductions -> `VLoopReductions` > - Detecting memory slices -> `VLoopMemorySlices` > - Analyzing the body -> `VLoopBody` (renamed `in_bb` -> `in_body`) > - Determining vector element types, and functions to determine the `vector_width` of a node -> `VLoopTypes` > - Constructing the dependence graph -> `VLoopDependenceGraph`. Replaces old `DepGraph` with all its components. > - New: CompileCommand option `TraceAutovectorization` > - Run with `-XX:CompileCommand=traceAutovectorization,*::*,help` to get a usage description. > - Replaced all printing with flags `TraceSuperWord` (and `Verbose`) and of `VectorizeDebug`. > - The advantage of a CompileCommand is that tracing can be applied selectively for only a limited set of java classes / methods. > - It uses tags, which are more readable than the `VectorizeDebug` bit-flags. These tags can be used for all parts of the vectorizer, but one can also target SuperWord specifically. > - I systematically added tracing at every point where vectorization (partially) fails (use tag `SW_REJECTIONS`). > - `TraceSuperWord` still works, and perfo... 
Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: avoid resetting, at ResourceArea, every lpt gets its own VLoopAnalyzer ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16620/files - new: https://git.openjdk.org/jdk/pull/16620/files/abd9bd43..9fe3fde9 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16620&range=09 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16620&range=08-09 Stats: 124 lines in 6 files changed: 21 ins; 46 del; 57 mod Patch: https://git.openjdk.org/jdk/pull/16620.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16620/head:pull/16620 PR: https://git.openjdk.org/jdk/pull/16620 From epeter at openjdk.org Mon Jan 22 15:23:46 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 22 Jan 2024 15:23:46 GMT Subject: RFR: 8315361: C2 SuperWord: refactor out loop analysis into shared auto-vectorization facility VLoopAnalyzer [v11] In-Reply-To: References: Message-ID: > This is a refactoring of `SuperWord`. > > **Goals** > > 1. Clean up `SuperWord`: disentangle different components, make them more **modular**. > 2. Make the loop analysis parts a **shared facility**, not just for SuperWord but also the post-loop-vectorizer ([JDK-8308994](https://bugs.openjdk.org/browse/JDK-8308994)). > 3. It is also a necessary step on my bigger plans for improvement with the C2 Auto-Vectorizer ([see my blog post](https://eme64.github.io/blog/2023/11/03/C2-AutoVectorizer-Improvement-Ideas.html)). > 4. Improve tracing in the auto-vectorization by making it more systematic. > > **Summary** > > - I wrote a summary of how C2 auto-vectorization with SuperWord works (please read!): > https://github.com/openjdk/jdk/blob/95fd361e60fc66eb91edad321662e508b2d1bdde/src/hotspot/share/opto/superword.hpp#L32-L177 > - I moved many `Superword` components out to `VLoop` and its subclass `VLoopAnalyzer`. The idea is that any vectorizer can use these facilities in the future. 
They are therefore made more modular, which should hopefully make future changes easier. These components are: > - Checking the pre-conditions for vectorization (e.g. no unwanted ctrl-flow). > - `VLoop::check_preconditions_helper` replaces code from old `SuperWord::transform_loop`. > - Running all submodules of `VLoopAnalyzer`: `VLoopAnalyzer::analyze_helper`. Replaces analysis part of `SuperWord::SLP_extract`. > - Finding and marking reductions -> `VLoopReductions` > - Detecting memory slices -> `VLoopMemorySlices` > - Analyzing the body -> `VLoopBody` (renamed `in_bb` -> `in_body`) > - Determining vector element types, and functions to determine the `vector_width` of a node -> `VLoopTypes` > - Constructing the dependence graph -> `VLoopDependenceGraph`. Replaces old `DepGraph` with all its components. > - New: CompileCommand option `TraceAutovectorization` > - Run with `-XX:CompileCommand=traceAutovectorization,*::*,help` to get a usage description. > - Replaced all printing with flags `TraceSuperWord` (and `Verbose`) and of `VectorizeDebug`. > - The advantage of a CompileCommand is that tracing can be applied selectively for only a limited set of java classes / methods. > - It uses tags, which are more readable than the `VectorizeDebug` bit-flags. These tags can be used for all parts of the vectorizer, but one can also target SuperWord specifically. > - I systematically added tracing at every point where vectorization (partially) fails (use tag `SW_REJECTIONS`). > - `TraceSuperWord` still works, and perfo... 
Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: refactor into PhaseIdealLoop::autovectorize, have VLoop before VLoopAnalyzer ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16620/files - new: https://git.openjdk.org/jdk/pull/16620/files/9fe3fde9..b05444dc Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16620&range=10 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16620&range=09-10 Stats: 67 lines in 6 files changed: 35 ins; 20 del; 12 mod Patch: https://git.openjdk.org/jdk/pull/16620.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16620/head:pull/16620 PR: https://git.openjdk.org/jdk/pull/16620 From epeter at openjdk.org Mon Jan 22 16:11:02 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 22 Jan 2024 16:11:02 GMT Subject: RFR: 8315361: C2 SuperWord: refactor out loop analysis into shared auto-vectorization facility VLoopAnalyzer [v12] In-Reply-To: References: Message-ID: > This is a refactoring of `SuperWord`. > > **Goals** > > 1. Clean up `SuperWord`: disentangle different components, make them more **modular**. > 2. Make the loop analysis parts a **shared facility**, not just for SuperWord but also the post-loop-vectorizer ([JDK-8308994](https://bugs.openjdk.org/browse/JDK-8308994)). > 3. It is also a necessary step on my bigger plans for improvement with the C2 Auto-Vectorizer ([see my blog post](https://eme64.github.io/blog/2023/11/03/C2-AutoVectorizer-Improvement-Ideas.html)). > 4. Improve tracing in the auto-vectorization by making it more systematic. > > **Summary** > > - I wrote a summary of how C2 auto-vectorization with SuperWord works (please read!): > https://github.com/openjdk/jdk/blob/95fd361e60fc66eb91edad321662e508b2d1bdde/src/hotspot/share/opto/superword.hpp#L32-L177 > - I moved many `Superword` components out to `VLoop` and its subclass `VLoopAnalyzer`. The idea is that any vectorizer can use these facilities in the future. 
They are therefore made more modular, which should hopefully make future changes easier. These components are: > - Checking the pre-conditions for vectorization (e.g. no unwanted ctrl-flow). > - `VLoop::check_preconditions_helper` replaces code from old `SuperWord::transform_loop`. > - Running all submodules of `VLoopAnalyzer`: `VLoopAnalyzer::analyze_helper`. Replaces analysis part of `SuperWord::SLP_extract`. > - Finding and marking reductions -> `VLoopReductions` > - Detecting memory slices -> `VLoopMemorySlices` > - Analyzing the body -> `VLoopBody` (renamed `in_bb` -> `in_body`) > - Determining vector element types, and functions to determine the `vector_width` of a node -> `VLoopTypes` > - Constructing the dependence graph -> `VLoopDependenceGraph`. Replaces old `DepGraph` with all its components. > - New: CompileCommand option `TraceAutovectorization` > - Run with `-XX:CompileCommand=traceAutovectorization,*::*,help` to get a usage description. > - Replaced all printing with flags `TraceSuperWord` (and `Verbose`) and of `VectorizeDebug`. > - The advantage of a CompileCommand is that tracing can be applied selectively for only a limited set of java classes / methods. > - It uses tags, which are more readable than the `VectorizeDebug` bit-flags. These tags can be used for all parts of the vectorizer, but one can also target SuperWord specifically. > - I systematically added tracing at every point where vectorization (partially) fails (use tag `SW_REJECTIONS`). > - `TraceSuperWord` still works, and perfo... 
Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: remove SuperWord::init, and reserve space in data structures ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16620/files - new: https://git.openjdk.org/jdk/pull/16620/files/b05444dc..30ef793b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16620&range=11 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16620&range=10-11 Stats: 63 lines in 3 files changed: 7 ins; 18 del; 38 mod Patch: https://git.openjdk.org/jdk/pull/16620.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16620/head:pull/16620 PR: https://git.openjdk.org/jdk/pull/16620 From kxu at openjdk.org Mon Jan 22 16:28:02 2024 From: kxu at openjdk.org (Kangcheng Xu) Date: Mon, 22 Jan 2024 16:28:02 GMT Subject: RFR: 8320237 C2: late inlining of method handle invoke causes duplicate lines in PrintInlining output [v3] In-Reply-To: References: Message-ID: > This PR resolves [JDK-8320237](https://bugs.openjdk.org/browse/JDK-8320237) > > The original behavior produces both a failure and success message upon late inlining which is confusing. The patch removes the failure message if inlining was successful. Huge thanks to @rwestrel for reporting and working out a solution. > > Unit test `test/hotspot/jtreg/compiler/inlining/TestDuplicatedLateInliningOutput.java` is added and passing. Kangcheng Xu has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains seven additional commits since the last revision: - add -XX:-BackgroundCompilation flag - Merge branch 'master' into JDK-8320237 - fix VM crashes - update test summary, requirements, and VM flags - Merge branch 'master' into JDK-8320237 - make regex whitespace consistent and to trigger GHA - 8320237: C2: late inlining of method handle invoke causes duplicate lines in PrintInlining output ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17147/files - new: https://git.openjdk.org/jdk/pull/17147/files/94d78fa1..cf23f46a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17147&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17147&range=01-02 Stats: 39112 lines in 1107 files changed: 23690 ins; 10535 del; 4887 mod Patch: https://git.openjdk.org/jdk/pull/17147.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17147/head:pull/17147 PR: https://git.openjdk.org/jdk/pull/17147 From phh at openjdk.org Mon Jan 22 16:41:26 2024 From: phh at openjdk.org (Paul Hohensee) Date: Mon, 22 Jan 2024 16:41:26 GMT Subject: RFR: 8324241: Always record evol_method deps to avoid excessive method flushing In-Reply-To: References: Message-ID: On Sat, 20 Jan 2024 19:48:07 GMT, Volker Simonis wrote: > Currently we don't record dependencies on redefined methods (i.e. `evol_method` dependencies) in JIT compiled methods if none of the `can_redefine_classes`, `can_retransform_classes` or `can_generate_breakpoint_events` JVMTI capabalities is set. This means that if a JVMTI agent which requests one of these capabilities is dynamically attached, all the methods which have been JIT compiled until that point, will be marked for deoptimization and flushed from the code cache. For large, warmed-up applications this mean deoptimization and instant recompilation of thousands if not then-thousands of methods, which can lead to dramatic performance/latency drop-downs for several minutes. 
> > One could argue that dynamic agent attach is now deprecated anyway (see [JEP 451: Prepare to Disallow the Dynamic Loading of Agents](https://openjdk.org/jeps/451)) and this problem could be solved by making the recording of `evol_method` dependencies dependent on the new `-XX:+EnableDynamicAgentLoading` flag instead of the concrete JVMTI capabilities (because the presence of the flag indicates that an agent will be loaded eventually). > > But there is a single, however important, exception to this rule and that's JFR. JFR is advertised as a low-overhead profiler which can be enabled in production at any time. However, when JFR is started dynamically (e.g. through JCMD or JMX) it will silently load a HotSpot internal JVMTI agent which requests the `can_retransform_classes` capability and retransforms some classes. This will inevitably trigger the deoptimization of all compiled methods as described above. > > I'd therefore like to propose to *always* and unconditionally record `evol_method` dependencies in JIT compiled code by exporting the relevant properties right at startup in `init_globals()`: > ```c++ > jint init_globals() { > management_init(); > JvmtiExport::initialize_oop_storage(); > +#if INCLUDE_JVMTI > + JvmtiExport::set_can_hotswap_or_post_breakpoint(true); > + JvmtiExport::set_all_dependencies_are_recorded(true); > +#endif > > > My measurements indicate that the overhead of doing so is minimal (around 1% increase of nmethod size) and justifies the benefit. E.g. a Spring Petclinic application started with `-Xbatch -XX:+UnlockDiagnosticVMOptions -XX:+LogCompilation` compiles about ~11500 methods (~9000 with C1 and ~2500 with C2) resulting in an aggregated nmethod size of around ~40mb. Additionally recording `evol_method` dependencies only increases this size by about 400kb. The ratio remains about the same i... Marked as reviewed by phh (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/17509#pullrequestreview-1836891644 From shade at openjdk.org Mon Jan 22 16:41:27 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 22 Jan 2024 16:41:27 GMT Subject: RFR: 8324241: Always record evol_method deps to avoid excessive method flushing In-Reply-To: References: Message-ID: On Sat, 20 Jan 2024 19:48:07 GMT, Volker Simonis wrote: > Instead of unconditionally recording evol_method dependencies we could guard the recording by a new flag. But this would only make sense if that flag would be on by default and I don't know if such a flag is justified just for the rare (or non-existent?) cases where somebody wants to disable the recording of the dependencies. I think introducing a diagnostic flag is sensible here. If we figure out much later that this solution comes with some other (worse) problems, the diagnostic flag gives us the options to: a) clearly point at this addition as the culprit; b) have the easily deployable solution to restore the original behavior. For the change itself, we need to amend the comment near `VM_RedefineClasses::flush_dependent_code` definition that talks about this peculiar behavior, which now changes. Actually, maybe even the implementation of `flush_dependent_code` should now trust (and assert) that all dependencies are now recorded? ------------- PR Comment: https://git.openjdk.org/jdk/pull/17509#issuecomment-1904382556 From epeter at openjdk.org Mon Jan 22 16:42:29 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 22 Jan 2024 16:42:29 GMT Subject: RFR: 8291809: Convert compiler/c2/cr7200264/TestSSE2IntVect.java to IR verification test [v3] In-Reply-To: References: Message-ID: On Thu, 18 Jan 2024 12:37:25 GMT, Daniel Lund?n wrote: >> This changeset translates the tests in `compiler/c2/cr7200264/` to use the IR verification framework. >> >> The proposed translation, to the extent possible, attempts to preserve the semantics of the original test. 
A major difference is that the IR checks are now local (for every `test_*` method) instead of global. The execution time of the new test is comparable to the old test. >> >> Testing: >> - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/7553846710) >> - Ran the new translated tests within all tier1 through tier10 contexts on windows-x64, linux-x64, linux-aarch64, macosx-x64, and macosx-aarch64. >> - Tested that manually adding `-XX:LoopUnrollLimit=0` to the test framework flags caused the translated tests to fail. Note: it is, however, no longer possible to break the test by passing `-XX:LoopUnrollLimit=0` on the command line. > > Daniel Lundén has updated the pull request incrementally with one additional commit since the last revision: > > Apply suggestions from code review > > Co-authored-by: Emanuel Peter @dlunde you are right, those examples don't contain any `LShiftVI`. See the comment below. ------------- Changes requested by epeter (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/17428#pullrequestreview-1836893808 From epeter at openjdk.org Mon Jan 22 16:42:30 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 22 Jan 2024 16:42:30 GMT Subject: RFR: 8291809: Convert compiler/c2/cr7200264/TestSSE2IntVect.java to IR verification test [v2] In-Reply-To: <1mAl4_Dta7eNeyzajQdwYz5SoJnTAFMis6GtC_IxlrQ=.b072536d-9238-4b39-8e30-ff8b6c5afb44@github.com> References: <1mAl4_Dta7eNeyzajQdwYz5SoJnTAFMis6GtC_IxlrQ=.b072536d-9238-4b39-8e30-ff8b6c5afb44@github.com> Message-ID: On Thu, 18 Jan 2024 12:04:00 GMT, Emanuel Peter wrote: >> They are shifted by 32 bit, so maybe that creates something odd? > > I see this in the old code: > `@summary 7192963 changes disabled shift vectors` @dlunde Ok, you are right, these do not have any LShiftVI, but they vectorize nonetheless. The reason is that shifting an int by 32 bits is a no-op. `SHIFT` is a known constant at compile time. 
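The shift-by-32 observation can be checked in plain Java. This is a minimal sketch, not code from the test under review: the class name `ShiftMask` and the helper `shl` are made up for illustration. It relies only on the Java language rule that an `int` shift uses the low 5 bits of the shift count (and a `long` shift the low 6 bits).

```java
public class ShiftMask {
    // Hypothetical helper: left-shifts an int by a run-time count.
    // For int operands, Java masks the count with 31 (its low 5 bits).
    static int shl(int x, int count) {
        return x << count;
    }

    public static void main(String[] args) {
        int x = 0x1234;
        // 32 & 31 == 0, so shifting an int by 32 leaves it unchanged.
        System.out.println(shl(x, 32) == x);
        // 33 & 31 == 1, so shifting by 33 behaves like shifting by 1.
        System.out.println(shl(x, 33) == (x << 1));
        // long shifts mask the count with 63, so a shift by 32 is a real shift there.
        System.out.println((1L << 32) == 4294967296L);
    }
}
```

Since the shift count is a compile-time constant, a compiler may fold such a shift away entirely, which would be consistent with no `LShiftVI` node appearing for these cases.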
For an int shift we only consider the lowest 5 bits of the shift count, so a shift by 32 is the same as a shift by 0. Hence, I would now put a negative rule for `LSHIFT_VI`, but a positive one for `LoadVector` and `StoreVector`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17428#discussion_r1462122237 From eastigeevich at openjdk.org Mon Jan 22 16:50:27 2024 From: eastigeevich at openjdk.org (Evgeny Astigeevich) Date: Mon, 22 Jan 2024 16:50:27 GMT Subject: RFR: 8324241: Always record evol_method deps to avoid excessive method flushing In-Reply-To: References: Message-ID: On Mon, 22 Jan 2024 16:37:14 GMT, Aleksey Shipilev wrote: >> Currently we don't record dependencies on redefined methods (i.e. `evol_method` dependencies) in JIT compiled methods if none of the `can_redefine_classes`, `can_retransform_classes` or `can_generate_breakpoint_events` JVMTI capabilities is set. This means that if a JVMTI agent which requests one of these capabilities is dynamically attached, all the methods which have been JIT compiled until that point will be marked for deoptimization and flushed from the code cache. For large, warmed-up applications this means deoptimization and instant recompilation of thousands if not ten-thousands of methods, which can lead to dramatic performance/latency drop-downs for several minutes. >> >> One could argue that dynamic agent attach is now deprecated anyway (see [JEP 451: Prepare to Disallow the Dynamic Loading of Agents](https://openjdk.org/jeps/451)) and this problem could be solved by making the recording of `evol_method` dependencies dependent on the new `-XX:+EnableDynamicAgentLoading` flag instead of the concrete JVMTI capabilities (because the presence of the flag indicates that an agent will be loaded eventually). >> >> But there is a single, however important, exception to this rule and that's JFR. JFR is advertised as a low-overhead profiler which can be enabled in production at any time. However, when JFR is started dynamically (e.g. 
through JCMD or JMX) it will silently load a HotSpot internl JVMTI agent which requests the `can_retransform_classes` and retransforms some classes. This will inevitably trigger the deoptimization of all compiled methods as described above. >> >> I'd therefor like to propose to *always* and unconditionally record `evol_method` dependencies in JIT compiled code by exporting the relevant properties right at startup in `init_globals()`: >> ```c++ >> jint init_globals() { >> management_init(); >> JvmtiExport::initialize_oop_storage(); >> +#if INCLUDE_JVMTI >> + JvmtiExport::set_can_hotswap_or_post_breakpoint(true); >> + JvmtiExport::set_all_dependencies_are_recorded(true); >> +#endif >> >> >> My measurements indicate that the overhead of doing so is minimal (around 1% increase of nmethod size) and justifies the benefit. E.g. a Spring Petclinic application started with `-Xbatch -XX:+UnlockDiagnosticVMOptions -XX:+LogCompilation` compiles about ~11500 methods (~9000 with C1 and ~2500 with C2) resulting in an aggregated nmethod size of around ~40bm. Additionally recording `evol_method` dependencies only increases this size be about 400kb.... > >> Instead of unconditionally recording evol_method dependencies we could guard the recording by a new flag. But this would only make sense if that flag would be on by default and I don't know if such a flag is justified just for the rare (or non-existent?) cases where somebody wants to disable the recording of the dependencies. > > I think introducing a diagnostic flag is sensible here. If we figure out much later that this solution comes with some other (worse) problems, the diagnostic flag gives us the options to: a) clearly point at this addition as the culprit; b) have the easily deployable solution to restore the original behavior. > > For the change itself, we need to amend the comment near `VM_RedefineClasses::flush_dependent_code` definition that talks about this peculiar behavior, which now changes. 
Actually, maybe even the implementation of `flush_dependent_code` should now trust (and assert) that all dependencies are now recorded? @shipilev I filed https://bugs.openjdk.org/browse/JDK-8324318 which is related to this change. ------------- PR Comment: https://git.openjdk.org/jdk/pull/17509#issuecomment-1904402169 From duke at openjdk.org Mon Jan 22 16:57:31 2024 From: duke at openjdk.org (Joshua Cao) Date: Mon, 22 Jan 2024 16:57:31 GMT Subject: RFR: 8323220: Reassociate loop invariants involved in Cmps and Add/Subs [v3] In-Reply-To: References: Message-ID: On Sat, 20 Jan 2024 08:29:47 GMT, Xin Liu wrote: >> Joshua Cao has updated the pull request incrementally with one additional commit since the last revision: >> >> Assert for n2. Variables for n1/n2 opcode. More concise comments. >> Overflow/random tests > > src/hotspot/share/opto/loopTransform.cpp line 382: > >> 380: } >> 381: phase->register_new_node(inv, phase->get_early_ctrl(inv)); >> 382: if (n1_is_cmp) { > > CmpNode is subclass of SubNode. if n is CmpI/L, n->is_Sub() is also true. > can we use the old logic for your new comparison expressions? > > yes, we still need to check if n1 is CmpNode or SubNode here. I am not following. We are explicitly checking for Cmp here. Why do we also need to check if it is a Sub? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17375#discussion_r1462145062 From qamai at openjdk.org Mon Jan 22 18:31:50 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Mon, 22 Jan 2024 18:31:50 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v3] In-Reply-To: References: Message-ID: <2770hngfjAroYOCvePmwPAQngHDsMQSWbGWCWEVFtw4=.6a955e2a-69b0-4923-8fd7-8fc8f03291f6@github.com> > Hi, > > This patch adds unsigned bounds and known bits constraints to TypeInt and TypeLong. This opens more transformation opportunities in an elegant manner as well as helps avoid some ad-hoc rules in Hotspot. 
> > In general, a TypeInt/Long represents a set of values x that satisfies: x s>= lo && x s<= hi && x u>= ulo && x u<= uhi && (x & zeros) == 0 && (~x & ones) == 0. These constraints are not independent, e.g. an int that lies in [0, 3] in signed domain must also lie in [0, 3] in unsigned domain and have all bits but the last 2 being unset. As a result, we must normalize the constraints (tighten the constraints so that they are optimal) before constructing a TypeInt/Long instance. > > This is extracted from #15440 , node value transformations are left for later PRs. I have also added unit tests to verify the soundness of constraint normalization. > > Please kindly review, thanks a lot. > > Testing > > - [x] GHA > - [x] Linux x64, tier 1-4 Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: add comments, group arguments to reduce C-style reference passing arguments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17508/files - new: https://git.openjdk.org/jdk/pull/17508/files/6b417f94..756d6159 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17508&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17508&range=01-02 Stats: 413 lines in 5 files changed: 117 ins; 95 del; 201 mod Patch: https://git.openjdk.org/jdk/pull/17508.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17508/head:pull/17508 PR: https://git.openjdk.org/jdk/pull/17508 From qamai at openjdk.org Mon Jan 22 18:31:53 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Mon, 22 Jan 2024 18:31:53 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v2] In-Reply-To: References: Message-ID: <_87LnDzDPxsFHSJmnHUJcNy_BABKu8uCd7MLW6p28Is=.e58bf7a8-1df9-40c8-952d-28ea9d309f35@github.com> On Mon, 22 Jan 2024 09:42:33 GMT, Emanuel Peter wrote: >> Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: >> >> fix tests, add verify > > 
src/hotspot/share/opto/rangeinference.cpp line 164: > >> 162: U zeros2 = zeros; >> 163: U ones2 = ones; >> 164: normalize_constraints_simple(empty2, lo2, hi2, zeros2, ones2); > > At a quick glance of the code, it is not immediately clear why we need 2 ranges here. Can you add some comments, or maybe improve the naming from 1 and 2 to something more expressive? I have added explanations. Basically, the intersection of the signed and unsigned ranges is the union of two ranges, one in the negative range and the other in the positive one. We then process these separately and merge the results. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r1462249445 From qamai at openjdk.org Mon Jan 22 18:34:27 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Mon, 22 Jan 2024 18:34:27 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v2] In-Reply-To: References: Message-ID: <_AtgnWt3Zi-VZB-Ghpkb3OUTl82mgz9aCB_X5Eft0e4=.7d33b5b2-8050-46d5-9443-6f5dbc209ae6@github.com> On Mon, 22 Jan 2024 09:54:59 GMT, Emanuel Peter wrote: >> Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: >> >> fix tests, add verify > > @merykitty I just had a quick look. Thanks for splitting out parts and making it more reviewable that way! Since John Rose is generally excited (https://github.com/openjdk/jdk/pull/15440#issuecomment-1901609719), I'll now put in a bit more effort into reviewing this. > > Thanks for adding some gtests. > I would really like to see some IR tests, where we can see that this folds cases, and folds them correctly. > And just some general Java-code correctness tests, which test your optimizations in an end-to-end way. > > I have a general concern with the math functions. They have quite a few arguments, often 5-10. And on average half of them are passed as a reference. 
Sometimes it is hard to immediately see which are the arguments that will not be mutated, and which are return values, and which are both arguments and return values, which are simply further constrained/narrowed etc. > > I wonder if it might be better to have types like: > > SRange {lo, hi} > URange {lo, hi} > KnownBits {ones, zeros} > > Make them immutable, i.e. the fields are constant. > Then as function parameters, you always pass in these as const, and return the new values (possibly in some combined type, or a pair or tuple or whatever). > > I think it would make the code cleaner, have fewer arguments, and a bit easier to reason about when things are immutable. > > Plus, then you can put the range-inference methods inside those classes, you can directly ask such an object if it is empty etc. You could for example have something like: > `SRange::constrained_with(KnownBits) -> returns SRange`. Basically I'm asking for the code to be a little more object-oriented, and less C-style ;) @eme64 Thanks a lot for your reviews, I hope that I have addressed your concerns. Regarding IR tests, I don't think I can come up with any as there is no node taking advantage of the additional information yet. ------------- PR Comment: https://git.openjdk.org/jdk/pull/17508#issuecomment-1904576181 From sviswanathan at openjdk.org Mon Jan 22 19:57:27 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Mon, 22 Jan 2024 19:57:27 GMT Subject: RFR: 8322768: Optimize non-subword vector compress and expand APIs for AVX2 target. [v8] In-Reply-To: References: <_udOCEVG86x9V_WvYqFTaYnvmXdiZ7LzqxzR-D_ygYs=.db3ed37e-a8fa-413b-83c6-a785aba072ff@github.com> Message-ID: On Sat, 20 Jan 2024 09:55:45 GMT, Jatin Bhateja wrote: >> Hi, >> >> Patch optimizes non-subword vector compress and expand APIs for x86 AVX2 only targets. >> Upcoming E-core Xeons (Sierra Forest) and Hybrid CPUs only support AVX2 instruction set. 
>> These are very frequently used APIs in columnar database filter operation. >> >> Implementation uses a lookup table to record permute indices. Table index is computed using >> mask argument of compress/expand operation. >> >> Following are the performance number of JMH micro included with the patch. >> >> >> System : Intel(R) Xeon(R) Platinum 8480+ (Sapphire Rapids) >> >> Baseline: >> Benchmark (size) Mode Cnt Score Error Units >> ColumnFilterBenchmark.filterDoubleColumn 1024 thrpt 2 142.767 ops/ms >> ColumnFilterBenchmark.filterDoubleColumn 2047 thrpt 2 71.436 ops/ms >> ColumnFilterBenchmark.filterDoubleColumn 4096 thrpt 2 35.992 ops/ms >> ColumnFilterBenchmark.filterFloatColumn 1024 thrpt 2 182.151 ops/ms >> ColumnFilterBenchmark.filterFloatColumn 2047 thrpt 2 91.096 ops/ms >> ColumnFilterBenchmark.filterFloatColumn 4096 thrpt 2 44.757 ops/ms >> ColumnFilterBenchmark.filterIntColumn 1024 thrpt 2 184.099 ops/ms >> ColumnFilterBenchmark.filterIntColumn 2047 thrpt 2 91.981 ops/ms >> ColumnFilterBenchmark.filterIntColumn 4096 thrpt 2 45.170 ops/ms >> ColumnFilterBenchmark.filterLongColumn 1024 thrpt 2 148.017 ops/ms >> ColumnFilterBenchmark.filterLongColumn 2047 thrpt 2 73.516 ops/ms >> ColumnFilterBenchmark.filterLongColumn 4096 thrpt 2 36.844 ops/ms >> >> Withopt: >> Benchmark (size) Mode Cnt Score Error Units >> ColumnFilterBenchmark.filterDoubleColumn 1024 thrpt 2 2051.707 ops/ms >> ColumnFilterBenchmark.filterDoubleColumn 2047 thrpt 2 914.072 ops/ms >> ColumnFilterBenchmark.filterDoubleColumn 4096 thrpt 2 489.898 ops/ms >> ColumnFilterBenchmark.filterFloatColumn 1024 thrpt 2 5324.195 ops/ms >> ColumnFilterBenchmark.filterFloatColumn 2047 thrpt 2 2587.229 ops/ms >> ColumnFilterBenchmark.filterFloatColumn 4096 thrpt 2 1278.665 ops/ms >> ColumnFilterBenchmark.filterIntColumn 1024 thrpt 2 4149.384 ops/ms >> ColumnFilterBenchmark.filterIntColumn 2047 thrpt ... 
> > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Review comments resolution Nice work! The patch looks good to me now. ------------- Marked as reviewed by sviswanathan (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/17261#pullrequestreview-1837250879 From qamai at openjdk.org Mon Jan 22 19:58:50 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Mon, 22 Jan 2024 19:58:50 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v4] In-Reply-To: References: Message-ID: > Hi, > > This patch adds unsigned bounds and known bits constraints to TypeInt and TypeLong. This opens more transformation opportunities in an elegant manner as well as helps avoid some ad-hoc rules in Hotspot. > > In general, a TypeInt/Long represents a set of values x that satisfies: x s>= lo && x s<= hi && x u>= ulo && x u<= uhi && (x & zeros) == 0 && (~x & ones) == 0. These constraints are not independent, e.g. an int that lies in [0, 3] in signed domain must also lie in [0, 3] in unsigned domain and have all bits but the last 2 being unset. As a result, we must normalize the constraints (tighten the constraints so that they are optimal) before constructing a TypeInt/Long instance. > > This is extracted from #15440 , node value transformations are left for later PRs. I have also added unit tests to verify the soundness of constraint normalization. > > Please kindly review, thanks a lot. 
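The constraint set described in this PR summary (signed bounds, unsigned bounds, known-zero and known-one bits) can be illustrated with a small membership check. This is only a sketch in Java under stated assumptions: the class `IntConstraints` and its `contains` method are hypothetical and are not the HotSpot `TypeInt` implementation. It replays the `[0, 3]` example from the description, comparing a loose constraint set against a tightened one that describes the same values.

```java
public class IntConstraints {
    // Hypothetical membership test mirroring the predicate in the description:
    // x s>= lo && x s<= hi && x u>= ulo && x u<= uhi
    //   && (x & zeros) == 0 && (~x & ones) == 0
    static boolean contains(int x, int lo, int hi, int ulo, int uhi,
                            int zeros, int ones) {
        return x >= lo && x <= hi
            && Integer.compareUnsigned(x, ulo) >= 0
            && Integer.compareUnsigned(x, uhi) <= 0
            && (x & zeros) == 0    // bits known to be 0 must be clear
            && (~x & ones) == 0;   // bits known to be 1 must be set
    }

    public static void main(String[] args) {
        boolean same = true;
        for (int x = -1000; x <= 1000; x++) {
            // Loose: only the signed range [0, 3] is given; the unsigned
            // range is the full [0, 0xFFFFFFFF] and no bits are known.
            boolean loose = contains(x, 0, 3, 0, -1, 0, 0);
            // Tight: unsigned range [0, 3] and all bits above the low two
            // known to be zero, as the description says must follow.
            boolean tight = contains(x, 0, 3, 0, 3, ~3, 0);
            same &= (loose == tight);
        }
        System.out.println(same);
    }
}
```

Both constraint sets accept exactly the same values, which is the sense in which the description calls the constraints "not independent": normalization would pick the tight form.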
> > Testing > > - [x] GHA > - [x] Linux x64, tier 1-4 Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: fix release build ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17508/files - new: https://git.openjdk.org/jdk/pull/17508/files/756d6159..1faa48b5 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17508&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17508&range=02-03 Stats: 7 lines in 1 file changed: 4 ins; 3 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/17508.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17508/head:pull/17508 PR: https://git.openjdk.org/jdk/pull/17508 From coleenp at openjdk.org Mon Jan 22 20:53:27 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Mon, 22 Jan 2024 20:53:27 GMT Subject: RFR: 8324241: Always record evol_method deps to avoid excessive method flushing In-Reply-To: References: Message-ID: On Sat, 20 Jan 2024 19:48:07 GMT, Volker Simonis wrote: > Currently we don't record dependencies on redefined methods (i.e. `evol_method` dependencies) in JIT compiled methods if none of the `can_redefine_classes`, `can_retransform_classes` or `can_generate_breakpoint_events` JVMTI capabalities is set. This means that if a JVMTI agent which requests one of these capabilities is dynamically attached, all the methods which have been JIT compiled until that point, will be marked for deoptimization and flushed from the code cache. For large, warmed-up applications this mean deoptimization and instant recompilation of thousands if not then-thousands of methods, which can lead to dramatic performance/latency drop-downs for several minutes. 
> > One could argue that dynamic agent attach is now deprecated anyway (see [JEP 451: Prepare to Disallow the Dynamic Loading of Agents](https://openjdk.org/jeps/451)) and this problem could be solved by making the recording of `evol_method` dependencies dependent on the new `-XX:+EnableDynamicAgentLoading` flag isntead of the concrete JVMTI capabilities (because the presence of the flag indicates that an agent will be loaded eventually). > > But there a single, however important exception to this rule and that's JFR. JFR is advertised as low overhead profiler which can be enabled in production at any time. However, when JFR is started dynamically (e.g. through JCMD or JMX) it will silently load a HotSpot internl JVMTI agent which requests the `can_retransform_classes` and retransforms some classes. This will inevitably trigger the deoptimization of all compiled methods as described above. > > I'd therefor like to propose to *always* and unconditionally record `evol_method` dependencies in JIT compiled code by exporting the relevant properties right at startup in `init_globals()`: > ```c++ > jint init_globals() { > management_init(); > JvmtiExport::initialize_oop_storage(); > +#if INCLUDE_JVMTI > + JvmtiExport::set_can_hotswap_or_post_breakpoint(true); > + JvmtiExport::set_all_dependencies_are_recorded(true); > +#endif > > > My measurements indicate that the overhead of doing so is minimal (around 1% increase of nmethod size) and justifies the benefit. E.g. a Spring Petclinic application started with `-Xbatch -XX:+UnlockDiagnosticVMOptions -XX:+LogCompilation` compiles about ~11500 methods (~9000 with C1 and ~2500 with C2) resulting in an aggregated nmethod size of around ~40bm. Additionally recording `evol_method` dependencies only increases this size be about 400kb. The ration remains about the same i... I support adding this as a diagnostic option, therefore you won't have to make the change in VM_RedefineClasses::flush_dependent_code. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/17509#issuecomment-1904784430 From simonis at openjdk.org Mon Jan 22 21:26:41 2024 From: simonis at openjdk.org (Volker Simonis) Date: Mon, 22 Jan 2024 21:26:41 GMT Subject: RFR: 8324241: Always record evol_method deps to avoid excessive method flushing [v2] In-Reply-To: References: Message-ID: > Currently we don't record dependencies on redefined methods (i.e. `evol_method` dependencies) in JIT compiled methods if none of the `can_redefine_classes`, `can_retransform_classes` or `can_generate_breakpoint_events` JVMTI capabalities is set. This means that if a JVMTI agent which requests one of these capabilities is dynamically attached, all the methods which have been JIT compiled until that point, will be marked for deoptimization and flushed from the code cache. For large, warmed-up applications this mean deoptimization and instant recompilation of thousands if not then-thousands of methods, which can lead to dramatic performance/latency drop-downs for several minutes. > > One could argue that dynamic agent attach is now deprecated anyway (see [JEP 451: Prepare to Disallow the Dynamic Loading of Agents](https://openjdk.org/jeps/451)) and this problem could be solved by making the recording of `evol_method` dependencies dependent on the new `-XX:+EnableDynamicAgentLoading` flag isntead of the concrete JVMTI capabilities (because the presence of the flag indicates that an agent will be loaded eventually). > > But there a single, however important exception to this rule and that's JFR. JFR is advertised as low overhead profiler which can be enabled in production at any time. However, when JFR is started dynamically (e.g. through JCMD or JMX) it will silently load a HotSpot internl JVMTI agent which requests the `can_retransform_classes` and retransforms some classes. This will inevitably trigger the deoptimization of all compiled methods as described above. 
> > I'd therefor like to propose to *always* and unconditionally record `evol_method` dependencies in JIT compiled code by exporting the relevant properties right at startup in `init_globals()`: > ```c++ > jint init_globals() { > management_init(); > JvmtiExport::initialize_oop_storage(); > +#if INCLUDE_JVMTI > + JvmtiExport::set_can_hotswap_or_post_breakpoint(true); > + JvmtiExport::set_all_dependencies_are_recorded(true); > +#endif > > > My measurements indicate that the overhead of doing so is minimal (around 1% increase of nmethod size) and justifies the benefit. E.g. a Spring Petclinic application started with `-Xbatch -XX:+UnlockDiagnosticVMOptions -XX:+LogCompilation` compiles about ~11500 methods (~9000 with C1 and ~2500 with C2) resulting in an aggregated nmethod size of around ~40bm. Additionally recording `evol_method` dependencies only increases this size be about 400kb. The ration remains about the same i... Volker Simonis has updated the pull request incrementally with one additional commit since the last revision: Guard the feature with a diagnostic option and update the comments in the code ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17509/files - new: https://git.openjdk.org/jdk/pull/17509/files/95db4a72..6d3e24ab Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17509&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17509&range=00-01 Stats: 20 lines in 3 files changed: 8 ins; 0 del; 12 mod Patch: https://git.openjdk.org/jdk/pull/17509.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17509/head:pull/17509 PR: https://git.openjdk.org/jdk/pull/17509 From simonis at openjdk.org Mon Jan 22 21:26:42 2024 From: simonis at openjdk.org (Volker Simonis) Date: Mon, 22 Jan 2024 21:26:42 GMT Subject: RFR: 8324241: Always record evol_method deps to avoid excessive method flushing In-Reply-To: References: Message-ID: On Sat, 20 Jan 2024 19:48:07 GMT, Volker Simonis wrote: > Currently we don't record dependencies on 
redefined methods (i.e. `evol_method` dependencies) in JIT compiled methods if none of the `can_redefine_classes`, `can_retransform_classes` or `can_generate_breakpoint_events` JVMTI capabalities is set. This means that if a JVMTI agent which requests one of these capabilities is dynamically attached, all the methods which have been JIT compiled until that point, will be marked for deoptimization and flushed from the code cache. For large, warmed-up applications this mean deoptimization and instant recompilation of thousands if not then-thousands of methods, which can lead to dramatic performance/latency drop-downs for several minutes. > > One could argue that dynamic agent attach is now deprecated anyway (see [JEP 451: Prepare to Disallow the Dynamic Loading of Agents](https://openjdk.org/jeps/451)) and this problem could be solved by making the recording of `evol_method` dependencies dependent on the new `-XX:+EnableDynamicAgentLoading` flag isntead of the concrete JVMTI capabilities (because the presence of the flag indicates that an agent will be loaded eventually). > > But there a single, however important exception to this rule and that's JFR. JFR is advertised as low overhead profiler which can be enabled in production at any time. However, when JFR is started dynamically (e.g. through JCMD or JMX) it will silently load a HotSpot internl JVMTI agent which requests the `can_retransform_classes` and retransforms some classes. This will inevitably trigger the deoptimization of all compiled methods as described above. 
> > I'd therefor like to propose to *always* and unconditionally record `evol_method` dependencies in JIT compiled code by exporting the relevant properties right at startup in `init_globals()`: > ```c++ > jint init_globals() { > management_init(); > JvmtiExport::initialize_oop_storage(); > +#if INCLUDE_JVMTI > + JvmtiExport::set_can_hotswap_or_post_breakpoint(true); > + JvmtiExport::set_all_dependencies_are_recorded(true); > +#endif > > > My measurements indicate that the overhead of doing so is minimal (around 1% increase of nmethod size) and justifies the benefit. E.g. a Spring Petclinic application started with `-Xbatch -XX:+UnlockDiagnosticVMOptions -XX:+LogCompilation` compiles about ~11500 methods (~9000 with C1 and ~2500 with C2) resulting in an aggregated nmethod size of around ~40bm. Additionally recording `evol_method` dependencies only increases this size be about 400kb. The ration remains about the same i... Thanks everybody for looking at this. I've now guarded the feature with a diagnostic command, updated the source code comments around `VM_RedefineClasses::flush_dependent_code` and added an assertion for the new flag. ------------- PR Comment: https://git.openjdk.org/jdk/pull/17509#issuecomment-1904842722 From duke at openjdk.org Mon Jan 22 21:52:28 2024 From: duke at openjdk.org (Joshua Cao) Date: Mon, 22 Jan 2024 21:52:28 GMT Subject: RFR: 8323220: Reassociate loop invariants involved in Cmps and Add/Subs [v3] In-Reply-To: References: Message-ID: On Mon, 22 Jan 2024 09:11:13 GMT, Emanuel Peter wrote: >> Joshua Cao has updated the pull request incrementally with one additional commit since the last revision: >> >> Assert for n2. Variables for n1/n2 opcode. More concise comments. >> Overflow/random tests > > test/hotspot/jtreg/compiler/loopopts/InvariantCodeMotionReassociateCmp.java line 163: > >> 161: } >> 162: } >> 163: } > > Why could you not have random arguments here? 
[notEqualsInvariantSubVariantLong.txt](https://github.com/openjdk/jdk/files/14015703/notEqualsInvariantSubVariantLong.txt)

Attaching the IdealGraphVisualizer file (renamed to `.txt` because GitHub requires that).

When we use `Argument.NUMBER_42`, we can avoid generating traps. When `i == 0` we enter the if statement, otherwise we exit.

If we use random numbers, we most likely never enter the if statement, so we can generate a trap there. In this case, the trap block has an extra `SubL` for `inv1 - i`.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/17375#discussion_r1462462105

From duke at openjdk.org Mon Jan 22 21:58:42 2024
From: duke at openjdk.org (Joshua Cao)
Date: Mon, 22 Jan 2024 21:58:42 GMT
Subject: RFR: 8323220: Reassociate loop invariants involved in Cmps and Add/Subs [v4]
In-Reply-To:
References:
Message-ID:

> // inv1 == (x + inv2) => ( inv1 - inv2 ) == x
> // inv1 == (x - inv2) => ( inv1 + inv2 ) == x
> // inv1 == (inv2 - x) => (-inv1 + inv2 ) == x
>
>
> For example,
>
>
> fn(inv1, inv2)
>   while(...)
>     x = foobar()
>     if inv1 == x + inv2
>       blackhole()
>
>
> We can transform this into
>
>
> fn(inv1, inv2)
>   t = inv1 - inv2
>   while(...)
>     x = foobar()
>     if t == x
>       blackhole()
>
>
> Here is an example: https://github.com/openjdk/jdk/blob/b78896b9aafcb15f453eaed6e154a5461581407b/src/java.base/share/classes/java/lang/invoke/LambdaFormEditor.java#L910. LHS `1` and RHS `pos` are both loop invariant
>
> Passes tier1 locally on Linux machine. Passes GHA on my fork.
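The three rewrite rules quoted above can be checked outside the compiler with a small hand-written sketch. The class and method names below are made up for illustration and are not part of the patch; the sketch only assumes that Java `int` arithmetic wraps on overflow, which is what keeps the rewrite valid for `==` even at the extremes of the value range.

```java
public class ReassociateCmpSketch {
    // Loop as written: x + inv2 is recomputed on every iteration.
    static int countOriginal(int inv1, int inv2, int[] xs) {
        int hits = 0;
        for (int x : xs) {
            if (inv1 == x + inv2) {
                hits++;
            }
        }
        return hits;
    }

    // Hand-reassociated form: the invariant part is hoisted into t.
    static int countReassociated(int inv1, int inv2, int[] xs) {
        int t = inv1 - inv2; // computed once, before the loop
        int hits = 0;
        for (int x : xs) {
            if (t == x) {
                hits++;
            }
        }
        return hits;
    }

    // Checks all three quoted rules for one (inv1, inv2, x) triple.
    static boolean rulesAgree(int inv1, int inv2, int x) {
        boolean add  = (inv1 == x + inv2) == (inv1 - inv2 == x);
        boolean sub  = (inv1 == x - inv2) == (inv1 + inv2 == x);
        boolean rsub = (inv1 == inv2 - x) == (-inv1 + inv2 == x);
        return add && sub && rsub;
    }

    public static void main(String[] args) {
        // Values chosen so that x + inv2 overflows for one element.
        int[] xs = {0, 1, 42, Integer.MAX_VALUE, Integer.MIN_VALUE};
        int before = countOriginal(Integer.MIN_VALUE, 1, xs);
        int after = countReassociated(Integer.MIN_VALUE, 1, xs);
        System.out.println(before == after);
        System.out.println(rulesAgree(Integer.MIN_VALUE, 1, Integer.MAX_VALUE));
    }
}
```

Both forms count the same elements; the only difference is that the subtraction happens once instead of once per iteration. The sketch deliberately covers equality only and says nothing about ordered comparisons, where wraparound would matter.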
Joshua Cao has updated the pull request incrementally with one additional commit since the last revision: Formatting and fix typo ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17375/files - new: https://git.openjdk.org/jdk/pull/17375/files/cb6d24b4..5ea7a53a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17375&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17375&range=02-03 Stats: 3 lines in 2 files changed: 0 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/17375.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17375/head:pull/17375 PR: https://git.openjdk.org/jdk/pull/17375 From eastigeevich at openjdk.org Mon Jan 22 23:28:28 2024 From: eastigeevich at openjdk.org (Evgeny Astigeevich) Date: Mon, 22 Jan 2024 23:28:28 GMT Subject: RFR: 8324241: Always record evol_method deps to avoid excessive method flushing [v2] In-Reply-To: References: Message-ID: <7YLjZUNrhGYSma_Hop1JnZWj75lytQMnXczCP5JSQQc=.7da0a1c2-423f-4144-be20-2aa3707ce332@github.com> On Mon, 22 Jan 2024 21:26:41 GMT, Volker Simonis wrote: >> Currently we don't record dependencies on redefined methods (i.e. `evol_method` dependencies) in JIT compiled methods if none of the `can_redefine_classes`, `can_retransform_classes` or `can_generate_breakpoint_events` JVMTI capabalities is set. This means that if a JVMTI agent which requests one of these capabilities is dynamically attached, all the methods which have been JIT compiled until that point, will be marked for deoptimization and flushed from the code cache. For large, warmed-up applications this mean deoptimization and instant recompilation of thousands if not then-thousands of methods, which can lead to dramatic performance/latency drop-downs for several minutes. 
> > Volker Simonis has updated the pull request incrementally with one additional commit since the last revision: > > Guard the feature with a diagnostic option and update the comments in the code src/hotspot/share/runtime/globals.hpp line 2013: > 2011: "Profile exception handlers") \ > 2012: \ > 2013: product(bool, AlwaysRecordEvolDependencies, true, DIAGNOSTIC, \ As we record all dependencies not only evol_method ones, should we name it just: `AlwaysRecordDependencies`? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17509#discussion_r1462550445 From eastigeevich at openjdk.org Mon Jan 22 23:35:28 2024 From: eastigeevich at openjdk.org (Evgeny Astigeevich) Date: Mon, 22 Jan 2024 23:35:28 GMT Subject: RFR: 8324241: Always record evol_method deps to avoid excessive method flushing [v2] In-Reply-To: References: Message-ID: On Mon, 22 Jan 2024 21:26:41 GMT, Volker Simonis wrote: >> Currently we don't record dependencies on redefined methods (i.e. `evol_method` dependencies) in JIT compiled methods if none of the `can_redefine_classes`, `can_retransform_classes` or `can_generate_breakpoint_events` JVMTI capabalities is set. This means that if a JVMTI agent which requests one of these capabilities is dynamically attached, all the methods which have been JIT compiled until that point, will be marked for deoptimization and flushed from the code cache. For large, warmed-up applications this mean deoptimization and instant recompilation of thousands if not then-thousands of methods, which can lead to dramatic performance/latency drop-downs for several minutes. 
> > Volker Simonis has updated the pull request incrementally with one additional commit since the last revision: > > Guard the feature with a diagnostic option and update the comments in the code src/hotspot/share/runtime/globals.hpp line 2014: > 2012: \ > 2013: product(bool, AlwaysRecordEvolDependencies, true, DIAGNOSTIC, \ > 2014: "Unconditionally record method dependencies on class " \ "... record compiled method dependencies ..."? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17509#discussion_r1462553968 From coleenp at openjdk.org Mon Jan 22 23:41:30 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Mon, 22 Jan 2024 23:41:30 GMT Subject: RFR: 8324241: Always record evol_method deps to avoid excessive method flushing [v2] In-Reply-To: References: Message-ID: On Mon, 22 Jan 2024 21:26:41 GMT, Volker Simonis wrote: >> Currently we don't record dependencies on redefined methods (i.e. `evol_method` dependencies) in JIT compiled methods if none of the `can_redefine_classes`, `can_retransform_classes` or `can_generate_breakpoint_events` JVMTI capabalities is set. This means that if a JVMTI agent which requests one of these capabilities is dynamically attached, all the methods which have been JIT compiled until that point, will be marked for deoptimization and flushed from the code cache. For large, warmed-up applications this mean deoptimization and instant recompilation of thousands if not then-thousands of methods, which can lead to dramatic performance/latency drop-downs for several minutes. >> >> One could argue that dynamic agent attach is now deprecated anyway (see [JEP 451: Prepare to Disallow the Dynamic Loading of Agents](https://openjdk.org/jeps/451)) and this problem could be solved by making the recording of `evol_method` dependencies dependent on the new `-XX:+EnableDynamicAgentLoading` flag isntead of the concrete JVMTI capabilities (because the presence of the flag indicates that an agent will be loaded eventually). 
>> >> But there a single, however important exception to this rule and that's JFR. JFR is advertised as low overhead profiler which can be enabled in production at any time. However, when JFR is started dynamically (e.g. through JCMD or JMX) it will silently load a HotSpot internl JVMTI agent which requests the `can_retransform_classes` and retransforms some classes. This will inevitably trigger the deoptimization of all compiled methods as described above. >> >> I'd therefor like to propose to *always* and unconditionally record `evol_method` dependencies in JIT compiled code by exporting the relevant properties right at startup in `init_globals()`: >> ```c++ >> jint init_globals() { >> management_init(); >> JvmtiExport::initialize_oop_storage(); >> +#if INCLUDE_JVMTI >> + JvmtiExport::set_can_hotswap_or_post_breakpoint(true); >> + JvmtiExport::set_all_dependencies_are_recorded(true); >> +#endif >> >> >> My measurements indicate that the overhead of doing so is minimal (around 1% increase of nmethod size) and justifies the benefit. E.g. a Spring Petclinic application started with `-Xbatch -XX:+UnlockDiagnosticVMOptions -XX:+LogCompilation` compiles about ~11500 methods (~9000 with C1 and ~2500 with C2) resulting in an aggregated nmethod size of around ~40bm. Additionally recording `evol_method` dependencies only increases this size be about 400kb.... > > Volker Simonis has updated the pull request incrementally with one additional commit since the last revision: > > Guard the feature with a diagnostic option and update the comments in the code This looks good. ------------- Marked as reviewed by coleenp (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/17509#pullrequestreview-1837600406

From jrose at openjdk.org Mon Jan 22 23:41:33 2024
From: jrose at openjdk.org (John R Rose)
Date: Mon, 22 Jan 2024 23:41:33 GMT
Subject: RFR: 8324241: Always record evol_method deps to avoid excessive method flushing [v2]
In-Reply-To: <7YLjZUNrhGYSma_Hop1JnZWj75lytQMnXczCP5JSQQc=.7da0a1c2-423f-4144-be20-2aa3707ce332@github.com>
References: <7YLjZUNrhGYSma_Hop1JnZWj75lytQMnXczCP5JSQQc=.7da0a1c2-423f-4144-be20-2aa3707ce332@github.com>
Message-ID:

On Mon, 22 Jan 2024 23:26:08 GMT, Evgeny Astigeevich wrote:

>> Volker Simonis has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   Guard the feature with a diagnostic option and update the comments in the code
>
> src/hotspot/share/runtime/globals.hpp line 2013:
>
>> 2011: "Profile exception handlers") \
>> 2012: \
>> 2013: product(bool, AlwaysRecordEvolDependencies, true, DIAGNOSTIC, \
>
> As we record all dependencies not only evol_method ones, should we name it just: `AlwaysRecordDependencies`?

That's not exactly right either. `RecordAllDependencies` would be more like it. Because:

- We might record some dependencies that we know we need, and leave others out.
- Or, we might record all dependencies, even ones we think we might not need.

(But we will need them all if somebody turns on JFR.)

I like this change. Having a diagnostic switch means we can do a rough triage if something seems to go wrong with this change, down the road.

> src/hotspot/share/runtime/globals.hpp line 2014:
>
>> 2012: \
>> 2013: product(bool, AlwaysRecordEvolDependencies, true, DIAGNOSTIC, \
>> 2014: "Unconditionally record method dependencies on class " \
>
> "... record compiled method dependencies ..."?

(yes, "compiled methods" or even "nmethods", or "methods in code cache")
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/17509#discussion_r1462556769
PR Review Comment: https://git.openjdk.org/jdk/pull/17509#discussion_r1462557389

From coleenp at openjdk.org Mon Jan 22 23:41:34 2024
From: coleenp at openjdk.org (Coleen Phillimore)
Date: Mon, 22 Jan 2024 23:41:34 GMT
Subject: RFR: 8324241: Always record evol_method deps to avoid excessive method flushing [v2]
In-Reply-To:
References: <7YLjZUNrhGYSma_Hop1JnZWj75lytQMnXczCP5JSQQc=.7da0a1c2-423f-4144-be20-2aa3707ce332@github.com>
Message-ID:

On Mon, 22 Jan 2024 23:37:35 GMT, John R Rose wrote:

>> src/hotspot/share/runtime/globals.hpp line 2013:
>>
>>> 2011: "Profile exception handlers") \
>>> 2012: \
>>> 2013: product(bool, AlwaysRecordEvolDependencies, true, DIAGNOSTIC, \
>>
>> As we record all dependencies not only evol_method ones, should we name it just: `AlwaysRecordDependencies`?
>
> That's not exactly right either. `RecordAllDependencies` would be more like it. Because:
>
> - We might record some dependencies that we know we need, and leave others out.
> - Or, we might record all dependencies, even ones we think we might not need.
>
> (But we will need them all if somebody turns on JFR.)
>
> I like this change. Having a diagnostic switch means we can do a rough triage if something seems to go wrong with this change, down the road.

No, because you want to turn on/off evol_method independently of the other dependencies that the compiler is recording.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17509#discussion_r1462556836 From xliu at openjdk.org Mon Jan 22 23:48:26 2024 From: xliu at openjdk.org (Xin Liu) Date: Mon, 22 Jan 2024 23:48:26 GMT Subject: RFR: 8323220: Reassociate loop invariants involved in Cmps and Add/Subs [v3] In-Reply-To: References: Message-ID: On Mon, 22 Jan 2024 21:50:11 GMT, Joshua Cao wrote: >> test/hotspot/jtreg/compiler/loopopts/InvariantCodeMotionReassociateCmp.java line 163: >> >>> 161: } >>> 162: } >>> 163: } >> >> Why could you not have random arguments here? > > [notEqualsInvariantSubVariantLong.txt](https://github.com/openjdk/jdk/files/14015703/notEqualsInvariantSubVariantLong.txt) > > Attaching the IdealGraphVisualizer file (renamed to `.txt` cause GitHub wants that). > > When we use `Argument.NUMBER_42`, we can avoid generating traps. When `i == 0` we enter the if statement, otherwise we exit. > > If we use random numbers, we most likely never enter the if statement so we can generate a trap there. In this case, the trap block as an extra `SubL` for `inv1 - i`. As long as you have inv1 == inv2, your comparison will have a different result when i == 0. Can you randomize just one value and use them for both inv1 and inv2? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17375#discussion_r1462561383 From xliu at openjdk.org Tue Jan 23 01:10:28 2024 From: xliu at openjdk.org (Xin Liu) Date: Tue, 23 Jan 2024 01:10:28 GMT Subject: RFR: 8323220: Reassociate loop invariants involved in Cmps and Add/Subs [v3] In-Reply-To: References: Message-ID: On Mon, 22 Jan 2024 16:54:57 GMT, Joshua Cao wrote: >> src/hotspot/share/opto/loopTransform.cpp line 382: >> >>> 380: } >>> 381: phase->register_new_node(inv, phase->get_early_ctrl(inv)); >>> 382: if (n1_is_cmp) { >> >> CmpNode is subclass of SubNode. if n is CmpI/L, n->is_Sub() is also true. >> can we use the old logic for your new comparison expressions? 
>>
>> yes, we still need to check if n1 is CmpNode or SubNode here.
>
> I am not following. We are explicitly checking for Cmp here. Why do we also need to check if it is a Sub?

I think CmpNode has the same re-association rule as SubNode. Take your own example:

> inv1 == (x - inv2) => ( inv1 + inv2 ) == x

Cmp(inv1, (x-inv2)) => Eq(0, Sub(inv1, (x-inv2))) => Eq(0, Sub(inv1+inv2, x)) => Cmp(inv1+inv2, x)

Originally, n1->is_Sub() covers both CmpNode and SubNode. I don't think you need to split them into 2 cases. I think you only need to check whether n1 is a Cmp or a Sub when you are going to return the Cmp/SubNode.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/17375#discussion_r1462611394

From dlong at openjdk.org Tue Jan 23 01:42:27 2024
From: dlong at openjdk.org (Dean Long)
Date: Tue, 23 Jan 2024 01:42:27 GMT
Subject: RFR: 8324213: C1: There is no need for Canonicalizer to handle IfOp [v2]
In-Reply-To: <5PJm9P2Eyap_zvmmVIGpbuyMAY9M4BMJUOouWKndftc=.f1924da6-a1e2-453c-81c2-0dbb81935b9f@github.com>
References: <5PJm9P2Eyap_zvmmVIGpbuyMAY9M4BMJUOouWKndftc=.f1924da6-a1e2-453c-81c2-0dbb81935b9f@github.com>
Message-ID:

On Fri, 19 Jan 2024 23:37:36 GMT, Denghui Dong wrote:

>> IfOp will not be created when building the graph, so there is no need for Canonicalizer to handle IfOp.
>
> Denghui Dong has updated the pull request incrementally with one additional commit since the last revision:
>
>   update

Marked as reviewed by dlong (Reviewer).

-------------

PR Review: https://git.openjdk.org/jdk/pull/17499#pullrequestreview-1837718828

From dlong at openjdk.org Tue Jan 23 03:21:28 2024
From: dlong at openjdk.org (Dean Long)
Date: Tue, 23 Jan 2024 03:21:28 GMT
Subject: RFR: 8318446: C2: optimize stores into primitive arrays by combining values into larger store
In-Reply-To:
References:
Message-ID:

On Mon, 22 Jan 2024 10:02:41 GMT, Emanuel Peter wrote:

> What would the graph look like if there were memory barriers?
Would there not be something on the memory graph or control graph which is neither a Store nor a RangeCheck?

Yes, I think so. See for example GraphKit::insert_mem_bar().

> And what exactly is your concern about `MemNode::unordered`? Would you mind explaining or giving some examples?

Any stores with explicit Release semantics, for example, probably expect that ordering to be preserved. It looks like such a store can be generated using Unsafe.putByteRelease(). So I would think that the only stores that can take part in this optimization should be using "unordered".

How would your optimization apply to the following cases?

case 1:

    Unsafe.putByteRelease(array1, offset+0, 'A');
    Unsafe.putByteRelease(array1, offset+1, 'A');

case 2:

    Unsafe.putByteRelease(array1, offset+0, 'A');
    Unsafe.putByteRelease(array1, offset+3, 'A');
    Unsafe.putByteRelease(array1, offset+1, 'A');

In case 1, I don't think it's safe to turn this into a 16-bit store unless the store is atomic, otherwise another thread might see the writes in the wrong order. Likewise for case 2, another thread shouldn't be able to observe the writes in the wrong order.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/16245#issuecomment-1905223447

From wzhuo at openjdk.org Tue Jan 23 03:33:39 2024
From: wzhuo at openjdk.org (Wang Zhuo)
Date: Tue, 23 Jan 2024 03:33:39 GMT
Subject: RFR: 8324123: aarch64: fix prfm literal encoding in assembler [v3]
In-Reply-To:
References:
Message-ID:

> Current prfm literal mode encoding in aarch64 assembler is not correct.
> The prfm_literal instruction requires bits 31 and 30 to be 0b11, while the current assembler encodes the two bits as 0b01, which yields an ldr instruction, not prfm.
> For example, if we add the following code in stubGenerator
> __ prfm(Address(__ pc()))
> we get a ldr instruction like
> ldr x0, 0x0000ffff83f8539c
> but it should be a prfm instruction like
> prfm pldl1keep, 0x0000ffff8ff8539c
>
> The bug is in ld_st2: in literal mode, bits 31 and 30 are set to (size & 0b01), while for prfm instructions bits 31 and 30 must be 0b11.
> void ld_st2(Register Rt, const Address &adr, int size, int op, int V = 0) {
>   starti;
>
>   f(V, 26); // general reg?
>   zrf(Rt, 0);
>
>   // Encoding for literal loads is done here (rather than pushed
>   // down into Address::encode) because the encoding of this
>   // instruction is too different from all of the other forms to
>   // make it worth sharing.
>   if (adr.getMode() == Address::literal) {
>     assert(size == 0b10 || size == 0b11, "bad operand size in ldr");
>     assert(op == 0b01, "literal form can only be used with loads");
>     f(**size & 0b01, 31, 30**), f(0b011, 29, 27), f(0b00, 25, 24);
>     int64_t offset = (adr.target() - pc()) >> 2;
>     sf(offset, 23, 5);
>     code_section()->relocate(pc(), adr.rspec());
>     return;
>   }
>
>   f(size, 31, 30);
>   f(op, 23, 22); // str
>   adr.encode(&current_insn);
> }

Wang Zhuo has updated the pull request incrementally with one additional commit since the last revision:

  adding some comments

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/17482/files
  - new: https://git.openjdk.org/jdk/pull/17482/files/eda04747..9c0f06ef

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=17482&range=02
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17482&range=01-02

  Stats: 2 lines in 1 file changed: 2 ins; 0 del; 0 mod
  Patch: https://git.openjdk.org/jdk/pull/17482.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/17482/head:pull/17482

PR: https://git.openjdk.org/jdk/pull/17482

From dlong at openjdk.org Tue Jan 23 03:34:29 2024
From: dlong at openjdk.org (Dean Long)
Date: Tue, 23 Jan 2024 03:34:29 GMT
Subject: RFR: 8324241: Always record evol_method deps to avoid
excessive method flushing [v2] In-Reply-To: References: Message-ID: On Mon, 22 Jan 2024 21:26:41 GMT, Volker Simonis wrote: >> Currently we don't record dependencies on redefined methods (i.e. `evol_method` dependencies) in JIT compiled methods if none of the `can_redefine_classes`, `can_retransform_classes` or `can_generate_breakpoint_events` JVMTI capabalities is set. This means that if a JVMTI agent which requests one of these capabilities is dynamically attached, all the methods which have been JIT compiled until that point, will be marked for deoptimization and flushed from the code cache. For large, warmed-up applications this mean deoptimization and instant recompilation of thousands if not then-thousands of methods, which can lead to dramatic performance/latency drop-downs for several minutes. >> >> One could argue that dynamic agent attach is now deprecated anyway (see [JEP 451: Prepare to Disallow the Dynamic Loading of Agents](https://openjdk.org/jeps/451)) and this problem could be solved by making the recording of `evol_method` dependencies dependent on the new `-XX:+EnableDynamicAgentLoading` flag isntead of the concrete JVMTI capabilities (because the presence of the flag indicates that an agent will be loaded eventually). >> >> But there a single, however important exception to this rule and that's JFR. JFR is advertised as low overhead profiler which can be enabled in production at any time. However, when JFR is started dynamically (e.g. through JCMD or JMX) it will silently load a HotSpot internl JVMTI agent which requests the `can_retransform_classes` and retransforms some classes. This will inevitably trigger the deoptimization of all compiled methods as described above. 
>> >> I'd therefore like to propose to *always* and unconditionally record `evol_method` dependencies in JIT compiled code by exporting the relevant properties right at startup in `init_globals()`: >> ```c++ >> jint init_globals() { >> management_init(); >> JvmtiExport::initialize_oop_storage(); >> +#if INCLUDE_JVMTI >> + JvmtiExport::set_can_hotswap_or_post_breakpoint(true); >> + JvmtiExport::set_all_dependencies_are_recorded(true); >> +#endif >> >> >> My measurements indicate that the overhead of doing so is minimal (around 1% increase of nmethod size) and justifies the benefit. E.g. a Spring Petclinic application started with `-Xbatch -XX:+UnlockDiagnosticVMOptions -XX:+LogCompilation` compiles about ~11500 methods (~9000 with C1 and ~2500 with C2) resulting in an aggregated nmethod size of around ~40mb. Additionally recording `evol_method` dependencies only increases this size by about 400kb.... > > Volker Simonis has updated the pull request incrementally with one additional commit since the last revision: > > Guard the feature with a diagnostic option and update the comments in the code src/hotspot/share/runtime/init.cpp line 121: > 119: if (AlwaysRecordEvolDependencies) { > 120: JvmtiExport::set_can_hotswap_or_post_breakpoint(true); > 121: JvmtiExport::set_all_dependencies_are_recorded(true); I think the check for AlwaysRecordEvolDependencies needs to be moved into set_can_hotswap_or_post_breakpoint and set_all_dependencies_are_recorded, otherwise don't we risk the value being accidentally reset to false when set_can_hotswap_or_post_breakpoint() is called again by JvmtiManageCapabilities::update()?
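[Editorial illustration, not part of the original message.] The reset risk described above can be modelled in a few lines of plain Java. This is only a sketch of the pattern, not HotSpot code: `alwaysRecord`, `setUnguarded`, `setGuarded` and `update` are hypothetical stand-ins for `AlwaysRecordEvolDependencies`, the two `JvmtiExport` setters and `JvmtiManageCapabilities::update()`, and the assumption is that a later update simply calls the setter again with a recomputed value.

```java
// Java model of the reset risk; all names are hypothetical stand-ins,
// not HotSpot APIs.
class StickyFlagSketch {
    // Stand-in for the AlwaysRecordEvolDependencies diagnostic option.
    static boolean alwaysRecord = true;

    static boolean unguardedFlag;
    static boolean guardedFlag;

    // Plain setter: whoever calls last wins.
    static void setUnguarded(boolean on) {
        unguardedFlag = on;
    }

    // Setter with the option check moved inside: while the option is on,
    // a later call with 'false' cannot clear the flag.
    static void setGuarded(boolean on) {
        guardedFlag = alwaysRecord || on;
    }

    // Stand-in for a later capability recomputation that calls the
    // setters again with whatever the current capabilities imply.
    static void update(boolean anyCapabilityRequested) {
        setUnguarded(anyCapabilityRequested);
        setGuarded(anyCapabilityRequested);
    }

    public static void main(String[] args) {
        // Startup: both flags are switched on.
        setUnguarded(true);
        setGuarded(true);
        // Later: capabilities are recomputed and none is requested.
        update(false);
        System.out.println("unguarded=" + unguardedFlag + " guarded=" + guardedFlag);
    }
}
```

In this model the unguarded flag ends up false after the update while the guarded one stays true, which is the behaviour the review comment asks for.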
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17509#discussion_r1462685351 From wzhuo at openjdk.org Tue Jan 23 03:41:27 2024 From: wzhuo at openjdk.org (Wang Zhuo) Date: Tue, 23 Jan 2024 03:41:27 GMT Subject: RFR: 8324123: aarch64: fix prfm literal encoding in assembler [v2] In-Reply-To: References: Message-ID: <9Mngg9xcYfpFB9HhBOXv-Mr8G76BAfvWVgDQhyrCcdY=.dbb25134-39d1-4a57-b56d-cb8974b45a99@github.com> On Mon, 22 Jan 2024 10:25:38 GMT, Andrew Haley wrote: >> Wang Zhuo has updated the pull request incrementally with one additional commit since the last revision: >> >> do not use inline for prfm encoding function > > src/hotspot/cpu/aarch64/assembler_aarch64.cpp line 190: > >> 188: } >> 189: >> 190: void Assembler::prfm(const Address &adr, prfop pfop) { > > Suggestion: > > // This encoding is similar (but not quite identical) to the encoding used > // by literal ld/st. see JDK-8324123. > void Assembler::prfm(const Address &adr, prfop pfop) { Thanks for your suggestions. I have added the comments, please check ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17482#discussion_r1462687942 From qamai at openjdk.org Tue Jan 23 08:18:41 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Tue, 23 Jan 2024 08:18:41 GMT Subject: RFR: 8324433: Introduce a way to determine if an expression is evaluated as a constant by the Jit compiler Message-ID: Hi, This patch introduces `JitCompiler::isConstantExpression` which can be used to statically determine whether an expression has been constant-folded by the Jit compiler, leading to more constant-folding opportunities. For example, it can be used in `MemorySessionImpl::checkValidStateRaw` to eliminate the lifetime check on global sessions without imposing additional branches on other non-global sessions. This is inspired by `std::is_constant_evaluated` in C++. Please kindly give your opinion as well as your reviews, thanks very much. 
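[Editorial illustration, not part of the original message.] A rough sketch of how such a check might be used follows. It is an assumption about the usage pattern, not the code in the patch: `isConstantExpression` here is a hypothetical plain-Java stand-in that always returns false (as a non-intrinsified fallback would), and `Session` is a made-up simplification of a memory session.

```java
// Sketch only: hypothetical stand-ins, not the JDK classes from the patch.
class ConstantCheckSketch {
    // Stand-in for the proposed query. A JIT intrinsic would answer true
    // when the argument has been constant-folded; the plain-Java fallback
    // has to answer false.
    static boolean isConstantExpression(boolean value) {
        return false;
    }

    static final class Session {
        private final boolean global;
        private volatile boolean closed;

        Session(boolean global) {
            this.global = global;
        }

        void close() {
            if (global) {
                throw new UnsupportedOperationException("global session cannot be closed");
            }
            closed = true;
        }

        void checkValidState() {
            // If the compiler has proven 'global' to be the constant true,
            // the whole check folds away. Otherwise the query itself folds
            // to false, so no extra branch remains for other sessions.
            if (isConstantExpression(global) && global) {
                return;
            }
            if (closed) {
                throw new IllegalStateException("Already closed");
            }
        }
    }

    public static void main(String[] args) {
        Session global = new Session(true);
        global.checkValidState();          // never closed, always passes

        Session confined = new Session(false);
        confined.checkValidState();        // still open, passes
        confined.close();
        try {
            confined.checkValidState();
        } catch (IllegalStateException e) {
            System.out.println("closed session rejected: " + e.getMessage());
        }
    }
}
```

Because a global session is never closed, skipping the liveness check on the folded path does not change behaviour; it only removes work when the compiler already knows the answer.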
------------- Commit messages: - bug number - add isConstantExpression Changes: https://git.openjdk.org/jdk/pull/17527/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=17527&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8324433 Stats: 162 lines in 6 files changed: 158 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/17527.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17527/head:pull/17527 PR: https://git.openjdk.org/jdk/pull/17527 From shade at openjdk.org Tue Jan 23 08:18:41 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 23 Jan 2024 08:18:41 GMT Subject: RFR: 8324433: Introduce a way to determine if an expression is evaluated as a constant by the Jit compiler In-Reply-To: References: Message-ID: On Tue, 23 Jan 2024 08:10:54 GMT, Quan Anh Mai wrote: > Hi, > > This patch introduces `JitCompiler::isConstantExpression` which can be used to statically determine whether an expression has been constant-folded by the Jit compiler, leading to more constant-folding opportunities. For example, it can be used in `MemorySessionImpl::checkValidStateRaw` to eliminate the lifetime check on global sessions without imposing additional branches on other non-global sessions. This is inspired by `std::is_constant_evaluated` in C++. > > Please kindly give your opinion as well as your reviews, thanks very much. Nice. I had a similar thing stashed in my todo queue. Note that there is already `isCompileConstant` that does similar thing: https://github.com/openjdk/jdk/blob/5a74c2a67ebcb47e51732f03c4be694bdf920469/src/hotspot/share/opto/library_call.cpp#L8189-L8193 -- maybe we should just expose that more widely. I would suggest we just do the private `java.lang.{Integer,...}.isCompileConstant` methods and bind them to that intrinsic. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/17527#issuecomment-1905504206 From epeter at openjdk.org Tue Jan 23 08:22:33 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 23 Jan 2024 08:22:33 GMT Subject: RFR: 8322768: Optimize non-subword vector compress and expand APIs for AVX2 target. [v8] In-Reply-To: References: <_udOCEVG86x9V_WvYqFTaYnvmXdiZ7LzqxzR-D_ygYs=.db3ed37e-a8fa-413b-83c6-a785aba072ff@github.com> Message-ID: On Sat, 20 Jan 2024 09:55:45 GMT, Jatin Bhateja wrote: >> Hi, >> >> Patch optimizes non-subword vector compress and expand APIs for x86 AVX2 only targets. >> Upcoming E-core Xeons (Sierra Forest) and Hybrid CPUs only support AVX2 instruction set. >> These are very frequently used APIs in columnar database filter operation. >> >> Implementation uses a lookup table to record permute indices. Table index is computed using >> mask argument of compress/expand operation. >> >> Following are the performance number of JMH micro included with the patch. 
>>
>>
>> System : Intel(R) Xeon(R) Platinum 8480+ (Sapphire Rapids)
>>
>> Baseline:
>> Benchmark                                 (size)   Mode  Cnt      Score  Error  Units
>> ColumnFilterBenchmark.filterDoubleColumn    1024  thrpt    2    142.767         ops/ms
>> ColumnFilterBenchmark.filterDoubleColumn    2047  thrpt    2     71.436         ops/ms
>> ColumnFilterBenchmark.filterDoubleColumn    4096  thrpt    2     35.992         ops/ms
>> ColumnFilterBenchmark.filterFloatColumn     1024  thrpt    2    182.151         ops/ms
>> ColumnFilterBenchmark.filterFloatColumn     2047  thrpt    2     91.096         ops/ms
>> ColumnFilterBenchmark.filterFloatColumn     4096  thrpt    2     44.757         ops/ms
>> ColumnFilterBenchmark.filterIntColumn       1024  thrpt    2    184.099         ops/ms
>> ColumnFilterBenchmark.filterIntColumn       2047  thrpt    2     91.981         ops/ms
>> ColumnFilterBenchmark.filterIntColumn       4096  thrpt    2     45.170         ops/ms
>> ColumnFilterBenchmark.filterLongColumn      1024  thrpt    2    148.017         ops/ms
>> ColumnFilterBenchmark.filterLongColumn      2047  thrpt    2     73.516         ops/ms
>> ColumnFilterBenchmark.filterLongColumn      4096  thrpt    2     36.844         ops/ms
>>
>> Withopt:
>> Benchmark                                 (size)   Mode  Cnt      Score  Error  Units
>> ColumnFilterBenchmark.filterDoubleColumn    1024  thrpt    2   2051.707         ops/ms
>> ColumnFilterBenchmark.filterDoubleColumn    2047  thrpt    2    914.072         ops/ms
>> ColumnFilterBenchmark.filterDoubleColumn    4096  thrpt    2    489.898         ops/ms
>> ColumnFilterBenchmark.filterFloatColumn     1024  thrpt    2   5324.195         ops/ms
>> ColumnFilterBenchmark.filterFloatColumn     2047  thrpt    2   2587.229         ops/ms
>> ColumnFilterBenchmark.filterFloatColumn     4096  thrpt    2   1278.665         ops/ms
>> ColumnFilterBenchmark.filterIntColumn       1024  thrpt    2   4149.384         ops/ms
>> ColumnFilterBenchmark.filterIntColumn       2047  thrpt ...
>
> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:
>
>   Review comments resolution

Looks good, except for one detail ;)

src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 5301:

> 5299:     vmovmskps(rtmp, mask, vec_enc);
> 5300:   }
> 5301:   shlq(rtmp, 5); // for 32 byte permute row of 8 x 32 bits.
Suggestion: shlq(rtmp, 5); // for 32 byte permute row of 8 x 32 bits / 4 x 64 bits. Since you now merged the code of the two paths. ------------- Changes requested by epeter (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/17261#pullrequestreview-1838095271 PR Review Comment: https://git.openjdk.org/jdk/pull/17261#discussion_r1462873120 From mli at openjdk.org Tue Jan 23 08:27:29 2024 From: mli at openjdk.org (Hamlin Li) Date: Tue, 23 Jan 2024 08:27:29 GMT Subject: RFR: JDK-8318228: RISC-V: C2 ConvF2HF [v2] In-Reply-To: References: Message-ID: On Thu, 18 Jan 2024 07:10:58 GMT, Fei Yang wrote: >> Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: >> >> replace fclass with feq as performance optimization. > > Hi, Thanks for this change. I have a small question. @RealFYang Based on the [latest confirmation](https://bugs.openjdk.org/browse/JDK-8324212), we don't have to implement it in the same way as [Float.java](https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/lang/Float.java#L1112-L1113). Given performance and testing considerations, it's convenient for us to keep the current implementation. But for long-term and performance reasons, I filed a bug to revisit all the NaN-related intrinsics in riscv: [JDK-8324303](https://bugs.openjdk.org/browse/JDK-8324303) ------------- PR Comment: https://git.openjdk.org/jdk/pull/17450#issuecomment-1905536920 From fyang at openjdk.org Tue Jan 23 08:34:27 2024 From: fyang at openjdk.org (Fei Yang) Date: Tue, 23 Jan 2024 08:34:27 GMT Subject: RFR: JDK-8318228: RISC-V: C2 ConvF2HF [v2] In-Reply-To: References: Message-ID: On Mon, 22 Jan 2024 10:49:38 GMT, Hamlin Li wrote: >> Hi, >> Can you review the patch to add ConvF2HF intrinsic to JDK for riscv? >> Thanks!
>> >> ## Test >> ### Functionality >> #### hotspot tests >> test/hotspot/jtreg/compiler/intrinsics/ >> test/hotspot/jtreg/compiler/c2/irTests >> >> #### jdk tests >> test/jdk/java/lang/Float/Binary16Conversion*.java >> >> ### Performance >> tested on licheepi. >> >> #### with UseZfh enabled >> >> Benchmark (size) Mode Cnt Score Error Units >> Fp16ConversionBenchmark.floatToFloat16 2048 avgt 2 4170.549 ns/op >> Fp16ConversionBenchmark.floatToFloat16Memory 2048 avgt 2 21.492 ns/op >> >> >> #### with UseZfh disabled >> (i.e. disable the intrinsic) >> >> Benchmark (size) Mode Cnt Score Error Units >> Fp16ConversionBenchmark.floatToFloat16 2048 avgt 2 25036.647 ns/op >> Fp16ConversionBenchmark.floatToFloat16Memory 2048 avgt 2 27.326 ns/op > > Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: > > replace fclass with feq as performance optimization. All right then! Thanks for finding this out. ------------- Marked as reviewed by fyang (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/17450#pullrequestreview-1838157232 From vkempik at openjdk.org Tue Jan 23 08:37:31 2024 From: vkempik at openjdk.org (Vladimir Kempik) Date: Tue, 23 Jan 2024 08:37:31 GMT Subject: RFR: JDK-8318228: RISC-V: C2 ConvF2HF [v2] In-Reply-To: References: Message-ID: On Mon, 22 Jan 2024 10:49:38 GMT, Hamlin Li wrote: >> Hi, >> Can you review the patch to add ConvF2HF intrinsic to JDK for riscv? >> Thanks! >> >> ## Test >> ### Functionality >> #### hotspot tests >> test/hotspot/jtreg/compiler/intrinsics/ >> test/hotspot/jtreg/compiler/c2/irTests >> >> #### jdk tests >> test/jdk/java/lang/Float/Binary16Conversion*.java >> >> ### Performance >> tested on licheepi. >> >> #### with UseZfh enabled >> >> Benchmark (size) Mode Cnt Score Error Units >> Fp16ConversionBenchmark.floatToFloat16 2048 avgt 2 4170.549 ns/op >> Fp16ConversionBenchmark.floatToFloat16Memory 2048 avgt 2 21.492 ns/op >> >> >> #### with UseZfh disabled >> (i.e. 
disable the intrinsic) >> >> Benchmark (size) Mode Cnt Score Error Units >> Fp16ConversionBenchmark.floatToFloat16 2048 avgt 2 25036.647 ns/op >> Fp16ConversionBenchmark.floatToFloat16Memory 2048 avgt 2 27.326 ns/op > > Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: > > replace fclass with feq as performance optimization. Marked as reviewed by vkempik (Committer). ------------- PR Review: https://git.openjdk.org/jdk/pull/17450#pullrequestreview-1838161730 From epeter at openjdk.org Tue Jan 23 08:51:30 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 23 Jan 2024 08:51:30 GMT Subject: RFR: 8318446: C2: optimize stores into primitive arrays by combining values into larger store In-Reply-To: References: Message-ID: On Tue, 23 Jan 2024 03:18:39 GMT, Dean Long wrote: >>> I took a quick look and I don't see where the code checks that all the candidate stores are using MemNode::unordered and that there aren't memory barriers in between. >> >> @dean-long How would the graph look like if there were memory barriers? Would there not be something on the memory graph or control graph which is neither a Store nor a RangeCheck? Any what exactly is your concern about `MemNode::unordered`? Would you mind explaining or giving some examples? > >> How would the graph look like if there were memory barriers? Would there not be something on the memory graph or control graph which is neither a Store nor a RangeCheck? > > Yes, I think so. See for example GraphKit::insert_mem_bar(). > >> Any what exactly is your concern about `MemNode::unordered`? Would you mind explaining or giving some examples? > > Any stores with explicit Release semantics, for example, probably expect that ordering to preserved. It looks like such a store can be generated using Unsafe.putByteRelease(). So I would think that the only stores that can take part in this optimization should be using "unordered". 
How would your optimization apply to the following cases? > > case 1: > Unsafe.putByteRelease(array1, offset+0, 'A'); > Unsafe.putByteRelease(array1, offset+1, 'A'); > > case2: > Unsafe.putByteRelease(array1, offset+0, 'A'); > Unsafe.putByteRelease(array1, offset+3, 'A'); > Unsafe.putByteRelease(array1, offset+1, 'A'); > > In case 1, I don't think it's safe to turn this into a 16-bit store unless the store is atomic, otherwise another thread might see the writes in the wrong order. Likewise for case 2, another thread shouldn't be able to observe the writes in the wrong order. @dean-long Ok, but if there is something else on the memory graph than the Stores, or something else on the control-graph than the RangeCheck, then the optimization bails out. I also created an example with `Unsafe.putByteRelease`, and some other variants: import jdk.internal.misc.Unsafe; public class Test { static final Unsafe UNSAFE = Unsafe.getUnsafe(); public static void main(String[] strArr) { byte[] a = new byte[100]; for (int i = 0; i < 10_000; i++) { test1(a); test2(a); test3(a); test4(a); } } static void test1(byte[] a) { UNSAFE.putByte(a, UNSAFE.ARRAY_BYTE_BASE_OFFSET + 0, (byte)0xf1); UNSAFE.putByte(a, UNSAFE.ARRAY_BYTE_BASE_OFFSET + 1, (byte)0xf2); UNSAFE.putByte(a, UNSAFE.ARRAY_BYTE_BASE_OFFSET + 2, (byte)0xf3); UNSAFE.putByte(a, UNSAFE.ARRAY_BYTE_BASE_OFFSET + 3, (byte)0xf4); } static void test2(byte[] a) { UNSAFE.putByteVolatile(a, UNSAFE.ARRAY_BYTE_BASE_OFFSET + 0, (byte)0xf1); UNSAFE.putByteVolatile(a, UNSAFE.ARRAY_BYTE_BASE_OFFSET + 1, (byte)0xf2); UNSAFE.putByteVolatile(a, UNSAFE.ARRAY_BYTE_BASE_OFFSET + 2, (byte)0xf3); UNSAFE.putByteVolatile(a, UNSAFE.ARRAY_BYTE_BASE_OFFSET + 3, (byte)0xf4); } static void test3(byte[] a) { UNSAFE.putByteRelease(a, UNSAFE.ARRAY_BYTE_BASE_OFFSET + 0, (byte)0xf1); UNSAFE.putByteRelease(a, UNSAFE.ARRAY_BYTE_BASE_OFFSET + 1, (byte)0xf2); UNSAFE.putByteRelease(a, UNSAFE.ARRAY_BYTE_BASE_OFFSET + 2, (byte)0xf3); UNSAFE.putByteRelease(a, 
UNSAFE.ARRAY_BYTE_BASE_OFFSET + 3, (byte)0xf4); } static void test4(byte[] a) { UNSAFE.putByteOpaque(a, UNSAFE.ARRAY_BYTE_BASE_OFFSET + 0, (byte)0xf1); UNSAFE.putByteOpaque(a, UNSAFE.ARRAY_BYTE_BASE_OFFSET + 1, (byte)0xf2); UNSAFE.putByteOpaque(a, UNSAFE.ARRAY_BYTE_BASE_OFFSET + 2, (byte)0xf3); UNSAFE.putByteOpaque(a, UNSAFE.ARRAY_BYTE_BASE_OFFSET + 3, (byte)0xf4); } } `java --add-modules java.base --add-exports java.base/jdk.internal.misc=ALL-UNNAMED --add-exports java.base/jdk.internal.util=ALL-UNNAMED -XX:CompileCommand=compileonly,Test::test* -XX:CompileCommand=printcompilation,Test::test* -XX:+TraceMergeStores -XX:-PrintIdeal Test.java` Only `test1` applies the optimization. If you enable `PrintIdeal`, then you can see that all others have `MemBarCPUOrder` nodes, which are on both the memory and control path. That's the reason my algorithm does not merge them. I'll add this to the regression tests. Your "case 2" would not optimize anyway, since I do not re-order anything; I only merge the stores if they are adjacent with increasing indices. Does this satisfy you, or are you worried about other cases too? ------------- PR Comment: https://git.openjdk.org/jdk/pull/16245#issuecomment-1905577461 From epeter at openjdk.org Tue Jan 23 09:13:04 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 23 Jan 2024 09:13:04 GMT Subject: RFR: 8318446: C2: optimize stores into primitive arrays by combining values into larger store [v2] In-Reply-To: References: Message-ID: > This is a feature requested by @RogerRiggs and @cl4es . > > **Idea** > > Merging multiple consecutive small stores (e.g. 8 byte stores) into larger stores (e.g. one long store) can lead to a speedup. Recently, @cl4es and @RogerRiggs had to review a few PRs where people would try to get speedups by using Unsafe (e.g. `Unsafe.putLongUnaligned`), or ByteArrayLittleEndian (e.g. `ByteArrayLittleEndian.setLong`).
They have asked if we can do such an optimization in C2, rather than in the Java library code, or even user code. > > This patch here supports a few simple use-cases, like these: > > Merge consecutive array stores, with constants. We can combine the separate constants into a larger constant: > https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/test/hotspot/jtreg/compiler/c2/TestMergeStores.java#L383-L395 > > Merge consecutive array stores, with a variable that was split (using shifts). We can essentially undo the splitting (i.e. shifting and truncation), and directly store the variable: > https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/test/hotspot/jtreg/compiler/c2/TestMergeStores.java#L444-L456 > > The idea is that this would allow the introduction of a very simple API, without any "heavy" dependencies (Unsafe or ByteArrayLittleEndian): > > https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/test/hotspot/jtreg/compiler/c2/TestMergeStores.java#L327-L338 > https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/test/hotspot/jtreg/compiler/c2/TestMergeStores.java#L467-L472 > > **Details** > > This draft currently implements the optimization in an additional special IGVN phase: > https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/src/hotspot/share/opto/compile.cpp#L2479-L2485 > > We first collect all `StoreB|C|I`, and put them in the IGVN worklist (see `Compile::gather_nodes_for_merge_stores`). During IGVN, we call `StoreNode::Ideal_merge_stores` at the end (after all other optimizations) of `StoreNode::Ideal`. We essentially try to establish a chain of mergable stores: > > https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/src/hotspot/share/opto/memnode.cpp#L2802-L2806 > > Mergable stores must have the same Opcode (implies they have the same element type and hence size). 
Further, mergable stores must have the same control (or be separated by only a RangeCheck). Further, they must either both store constants, or adjacent segments of a larger value ... Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: Add Release/Volatile/Opaque store test, limit optimization to little-endian builds only ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16245/files - new: https://git.openjdk.org/jdk/pull/16245/files/adca9e22..e5a93414 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16245&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16245&range=00-01 Stats: 73 lines in 2 files changed: 72 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/16245.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16245/head:pull/16245 PR: https://git.openjdk.org/jdk/pull/16245 From qamai at openjdk.org Tue Jan 23 09:20:46 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Tue, 23 Jan 2024 09:20:46 GMT Subject: RFR: 8324433: Introduce a way to determine if an expression is evaluated as a constant by the Jit compiler [v2] In-Reply-To: References: Message-ID: > Hi, > > This patch introduces `JitCompiler::isConstantExpression` which can be used to statically determine whether an expression has been constant-folded by the Jit compiler, leading to more constant-folding opportunities. For example, it can be used in `MemorySessionImpl::checkValidStateRaw` to eliminate the lifetime check on global sessions without imposing additional branches on other non-global sessions. This is inspired by `std::is_constant_evaluated` in C++. > > Please kindly give your opinion as well as your reviews, thanks very much. 
Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: use inline_isCompileConstant ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17527/files - new: https://git.openjdk.org/jdk/pull/17527/files/4d0fc3dd..9dd95393 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17527&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17527&range=00-01 Stats: 13 lines in 3 files changed: 0 ins; 9 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/17527.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17527/head:pull/17527 PR: https://git.openjdk.org/jdk/pull/17527 From qamai at openjdk.org Tue Jan 23 09:30:26 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Tue, 23 Jan 2024 09:30:26 GMT Subject: RFR: 8324433: Introduce a way to determine if an expression is evaluated as a constant by the Jit compiler In-Reply-To: References: Message-ID: On Tue, 23 Jan 2024 08:16:07 GMT, Aleksey Shipilev wrote: >> Hi, >> >> This patch introduces `JitCompiler::isConstantExpression` which can be used to statically determine whether an expression has been constant-folded by the Jit compiler, leading to more constant-folding opportunities. For example, it can be used in `MemorySessionImpl::checkValidStateRaw` to eliminate the lifetime check on global sessions without imposing additional branches on other non-global sessions. This is inspired by `std::is_constant_evaluated` in C++. >> >> Please kindly give your opinion as well as your reviews, thanks very much. > > Nice. I had a similar thing stashed in my todo queue. Note that there is already `isCompileConstant` that does similar thing: https://github.com/openjdk/jdk/blob/5a74c2a67ebcb47e51732f03c4be694bdf920469/src/hotspot/share/opto/library_call.cpp#L8189-L8193 -- maybe we should just expose that more widely. I would suggest we just do the private `java.lang.{Integer,...}.isCompileConstant` methods and bind them to that intrinsic. 
@shipilev Thanks a lot for your suggestions. Yes I can just use `inline_isCompileConstant` instead. Regarding the place of the method, I'm not really sure as putting in `java.lang.Long` seems out-of-place for an internal mechanism that is obviously not only used in `java.lang`, which will force a new entry in `JavaLangAccess`. Finally, I think accepting a `long` would be enough (maybe `double`, too?) since `int`, `boolean` etc can be converted losslessly to `long`. ------------- PR Comment: https://git.openjdk.org/jdk/pull/17527#issuecomment-1905641141 From qamai at openjdk.org Tue Jan 23 09:34:27 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Tue, 23 Jan 2024 09:34:27 GMT Subject: RFR: 8324433: Introduce a way to determine if an expression is evaluated as a constant by the Jit compiler In-Reply-To: References: Message-ID: On Tue, 23 Jan 2024 08:16:07 GMT, Aleksey Shipilev wrote: > I would suggest we just do the private `java.lang.{Integer,...}.isCompileConstant` methods and bind them to that intrinsic. Maybe I am ignorant but doesn't the definition of an intrinsics contain the signature of the method as well? ------------- PR Comment: https://git.openjdk.org/jdk/pull/17527#issuecomment-1905648723 From shade at openjdk.org Tue Jan 23 09:38:24 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 23 Jan 2024 09:38:24 GMT Subject: RFR: 8324433: Introduce a way to determine if an expression is evaluated as a constant by the Jit compiler In-Reply-To: References: Message-ID: On Tue, 23 Jan 2024 08:16:07 GMT, Aleksey Shipilev wrote: >> Hi, >> >> This patch introduces `JitCompiler::isConstantExpression` which can be used to statically determine whether an expression has been constant-folded by the Jit compiler, leading to more constant-folding opportunities. For example, it can be used in `MemorySessionImpl::checkValidStateRaw` to eliminate the lifetime check on global sessions without imposing additional branches on other non-global sessions. 
This is inspired by `std::is_constant_evaluated` in C++. >> >> Please kindly give your opinion as well as your reviews, thanks very much. > > Nice. I had a similar thing stashed in my todo queue. Note that there is already `isCompileConstant` that does similar thing: https://github.com/openjdk/jdk/blob/5a74c2a67ebcb47e51732f03c4be694bdf920469/src/hotspot/share/opto/library_call.cpp#L8189-L8193 -- maybe we should just expose that more widely. I would suggest we just do the private `java.lang.{Integer,...}.isCompileConstant` methods and bind them to that intrinsic. > @shipilev Thanks a lot for your suggestions. Yes I can just use `inline_isCompileConstant` instead. > > Regarding the place of the method, I'm not really sure as putting in `java.lang.Long` seems out-of-place for an internal mechanism that is obviously not only used in `java.lang`, which will force a new entry in `JavaLangAccess`. Ah yes, if you need to use it across module boundaries, putting the private/protected method would require `JavaLangAccess`, which is burdensome. I am just icky about introducing a whole new internal class for this. Is there anything in current `jdk.internal.vm.*` that fits it? Maybe `misc.Unsafe` or `misc.VM`? > Finally, I think accepting a `long` would be enough (maybe `double`, too?) since `int`, `boolean` etc can be converted losslessly to `long`. Right, that would work for primitives, since we could probably rely on conversion for constants to be folded. But I also see the value for asking `isCompileConstant(Object)`, which is not easily convertible. So I would just do the overloads for all primitives and `Object`. The C2 intrinsic would not care about the `arg(0)` type, it would reply `isCon` on those constants just the same. 
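[Editorial illustration, not part of the original message.] The overload shape suggested above could look roughly like the following. This is a sketch under assumptions: the class name, the chosen subset of overloads and the `scale` caller are hypothetical, and the bodies are the plain-Java fallback only. In the JDK such methods would additionally be marked as intrinsic candidates and registered in `vmIntrinsics` with one full signature each.

```java
// Sketch only: hypothetical holder class, fallback bodies without any
// intrinsic binding.
class CompileConstantOverloads {
    // One overload per argument kind; a JIT intrinsic shared by all of
    // them would only ask whether the argument node is a constant.
    static boolean isCompileConstant(boolean v) { return false; }
    static boolean isCompileConstant(int v)     { return false; }
    static boolean isCompileConstant(long v)    { return false; }
    static boolean isCompileConstant(double v)  { return false; }
    static boolean isCompileConstant(Object v)  { return false; }

    // Hypothetical caller: the shortcut is only taken when the compiler
    // already knows the factor; the general path stays correct either way.
    static int scale(int x, int factor) {
        if (isCompileConstant(factor) && factor == 1) {
            return x;
        }
        return x * factor;
    }

    public static void main(String[] args) {
        System.out.println(scale(3, 1));
        System.out.println(scale(3, 4));
        // Overload resolution picks the Object variant for references.
        System.out.println(isCompileConstant("text"));
    }
}
```

The duplication is confined to the declarations; the result of `scale` does not depend on what the query answers, which is the property any caller of such a check needs to keep.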
------------- PR Comment: https://git.openjdk.org/jdk/pull/17527#issuecomment-1905655407 From shade at openjdk.org Tue Jan 23 09:45:27 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 23 Jan 2024 09:45:27 GMT Subject: RFR: 8324433: Introduce a way to determine if an expression is evaluated as a constant by the Jit compiler In-Reply-To: References: Message-ID: <9xpFtQX-wVtXnENNNiQZZFJqa7cy-n7_yS6uU3UjsQ8=.55c6a9b6-0e89-45f7-bfb9-11b13f9ef605@github.com> On Tue, 23 Jan 2024 09:31:51 GMT, Quan Anh Mai wrote: > Maybe I am ignorant but doesn't the definition of an intrinsics contain the signature of the method as well? The definitions in `vmIntrinsics`, sure, they require full signature for `@IntrinsicCandidate` methods. It would yield some unfortunate duplication. But after that, we can map on the same `inline_isCompileConstant` intrinsic that just asks `arg(0)->is_Con()`, and it would not care about the type of the constant. ------------- PR Comment: https://git.openjdk.org/jdk/pull/17527#issuecomment-1905667006 From aph at openjdk.org Tue Jan 23 09:46:28 2024 From: aph at openjdk.org (Andrew Haley) Date: Tue, 23 Jan 2024 09:46:28 GMT Subject: RFR: JDK-8210858: AArch64: remove Math.log intrinsic [v2] In-Reply-To: References: <7Zgt8VOxl48Niocueb70XwxmYIAIaEZUlxKGvPP6BnU=.e354f71c-d6c6-40b9-a93e-98ef7ba8009d@github.com> Message-ID: <6UYDWc1V3v6tHKxHyYJkxHAsvlbNMuseFiry8KI_kYc=.cb95a0ab-d27e-404b-96af-85fe8178d6c0@github.com> On Mon, 22 Jan 2024 12:33:53 GMT, Tobias Holenstein wrote: > > > Separately, I do wonder if [JDK-8301202](https://bugs.openjdk.org/browse/JDK-8301202) gives us a reason to avoid even calling to runtime, and instead just stay in Java completely. > > > > > > Yes, would be nice to stay in Java. We had a similar discussion in #13606 where I benchmarked the intrinsics on different platforms. But, they still gave up to "up to" isn't very interesting here. I think we care more about averages. 40% performance improvement vs. staying in java. 
I don't think much has changed performance wise since then.. > > @jddarcy any thoughts on that? > > Perhaps I should re-run the benchmarks from May 2023.. That would be good. Significant performance deficits on purely numeric code demand explanation. ------------- PR Comment: https://git.openjdk.org/jdk/pull/17480#issuecomment-1905668165 From aph at openjdk.org Tue Jan 23 09:52:29 2024 From: aph at openjdk.org (Andrew Haley) Date: Tue, 23 Jan 2024 09:52:29 GMT Subject: RFR: 8324123: aarch64: fix prfm literal encoding in assembler [v3] In-Reply-To: References: Message-ID: On Tue, 23 Jan 2024 03:33:39 GMT, Wang Zhuo wrote: >> Current prfm literal mode encoding in aarch64 assembler is not correct. >> The prfm_literal instruction requires bits 31 and 30 to be 0b11, while the current assembler encodes the two bits as 0b01, which is a ldr instruction, not prfm. >> For example, if adding the following code in stubGenerator >> __ prfm(Address(__ pc())) >> we get a ldr instruction like >> ldr x0, 0x0000ffff83f8539c >> but it should be a prfm instruction like >> prfm pldl1keep, 0x0000ffff8ff8539c >> >> The bug is caused in ld_st2, literal mode: bits 31 and 30 are set to (size & 0b01), while for prfm instructions, bits 31 and 30 must be 0b11. >> void ld_st2(Register Rt, const Address &adr, int size, int op, int V = 0) { >> starti; >> >> f(V, 26); // general reg? >> zrf(Rt, 0); >> >> // Encoding for literal loads is done here (rather than pushed >> // down into Address::encode) because the encoding of this >> // instruction is too different from all of the other forms to >> // make it worth sharing.
>> if (adr.getMode() == Address::literal) { >> assert(size == 0b10 || size == 0b11, "bad operand size in ldr"); >> assert(op == 0b01, "literal form can only be used with loads"); >> f(**size & 0b01, 31, 30**), f(0b011, 29, 27), f(0b00, 25, 24); >> int64_t offset = (adr.target() - pc()) >> 2; >> sf(offset, 23, 5); >> code_section()->relocate(pc(), adr.rspec()); >> return; >> } >> >> f(size, 31, 30); >> f(op, 23, 22); // str >> adr.encode(&current_insn); >> } > > Wang Zhuo has updated the pull request incrementally with one additional commit since the last revision: > > adding some comments src/hotspot/cpu/aarch64/assembler_aarch64.hpp line 1584: > 1582: INSN(prfm, 0b11, 0b10); // FIXME: PRFM should not be used with > 1583: // writeback modes, but the assembler > 1584: // doesn't enfore that. Don't we still need this comment? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17482#discussion_r1463002983 From wzhuo at openjdk.org Tue Jan 23 10:03:38 2024 From: wzhuo at openjdk.org (Wang Zhuo) Date: Tue, 23 Jan 2024 10:03:38 GMT Subject: RFR: 8324123: aarch64: fix prfm literal encoding in assembler [v4] In-Reply-To: References: Message-ID: > Current prfm literal mode encoding in aarch64 assembler is not correct. > The prfm_literal instruction requires bits 31 and 30 to be 0b11, while the current assembler encodes the two bits as 0b01, which is a ldr instruction, not prfm. > For example, if adding the following code in stubGenerator > __ prfm(Address(__ pc())) > we get a ldr instruction like > ldr x0, 0x0000ffff83f8539c > but it should be a prfm instruction like > prfm pldl1keep, 0x0000ffff8ff8539c > > The bug is caused in ld_st2, literal mode: bits 31 and 30 are set to (size & 0b01), while for prfm instructions, bits 31 and 30 must be 0b11. > void ld_st2(Register Rt, const Address &adr, int size, int op, int V = 0) { > starti; > > f(V, 26); // general reg?
> zrf(Rt, 0); > > // Encoding for literal loads is done here (rather than pushed > // down into Address::encode) because the encoding of this > // instruction is too different from all of the other forms to > // make it worth sharing. > if (adr.getMode() == Address::literal) { > assert(size == 0b10 || size == 0b11, "bad operand size in ldr"); > assert(op == 0b01, "literal form can only be used with loads"); > f(**size & 0b01, 31, 30**), f(0b011, 29, 27), f(0b00, 25, 24); > int64_t offset = (adr.target() - pc()) >> 2; > sf(offset, 23, 5); > code_section()->relocate(pc(), adr.rspec()); > return; > } > > f(size, 31, 30); > f(op, 23, 22); // str > adr.encode(&current_insn); > } Wang Zhuo has updated the pull request incrementally with one additional commit since the last revision: get some comments back ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17482/files - new: https://git.openjdk.org/jdk/pull/17482/files/9c0f06ef..7f59b473 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17482&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17482&range=02-03 Stats: 2 lines in 1 file changed: 2 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/17482.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17482/head:pull/17482 PR: https://git.openjdk.org/jdk/pull/17482 From shade at openjdk.org Tue Jan 23 10:11:30 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 23 Jan 2024 10:11:30 GMT Subject: RFR: 8324241: Always record evol_method deps to avoid excessive method flushing [v2] In-Reply-To: References: Message-ID: <9yE5_ZZtvuN0dMMV2HSztNLu4pbSSl6eVWeQHXf5Iqc=.20f2084e-d296-4994-95b0-e4e4ed528449@github.com> On Mon, 22 Jan 2024 21:26:41 GMT, Volker Simonis wrote: >> Currently we don't record dependencies on redefined methods (i.e. `evol_method` dependencies) in JIT compiled methods if none of the `can_redefine_classes`, `can_retransform_classes` or `can_generate_breakpoint_events` JVMTI capabilities is set.
This means that if a JVMTI agent which requests one of these capabilities is dynamically attached, all the methods which have been JIT compiled until that point will be marked for deoptimization and flushed from the code cache. For large, warmed-up applications this means deoptimization and instant recompilation of thousands if not tens of thousands of methods, which can lead to dramatic performance/latency drop-downs for several minutes. >> >> One could argue that dynamic agent attach is now deprecated anyway (see [JEP 451: Prepare to Disallow the Dynamic Loading of Agents](https://openjdk.org/jeps/451)) and this problem could be solved by making the recording of `evol_method` dependencies dependent on the new `-XX:+EnableDynamicAgentLoading` flag instead of the concrete JVMTI capabilities (because the presence of the flag indicates that an agent will be loaded eventually). >> >> But there is a single, however important, exception to this rule and that's JFR. JFR is advertised as a low overhead profiler which can be enabled in production at any time. However, when JFR is started dynamically (e.g. through JCMD or JMX) it will silently load a HotSpot internal JVMTI agent which requests the `can_retransform_classes` capability and retransforms some classes. This will inevitably trigger the deoptimization of all compiled methods as described above. >> >> I'd therefore like to propose to *always* and unconditionally record `evol_method` dependencies in JIT compiled code by exporting the relevant properties right at startup in `init_globals()`: >> ```c++ >> jint init_globals() { >> management_init(); >> JvmtiExport::initialize_oop_storage(); >> +#if INCLUDE_JVMTI >> + JvmtiExport::set_can_hotswap_or_post_breakpoint(true); >> + JvmtiExport::set_all_dependencies_are_recorded(true); >> +#endif >> >> >> My measurements indicate that the overhead of doing so is minimal (around 1% increase of nmethod size) and justifies the benefit. E.g.
a Spring Petclinic application started with `-Xbatch -XX:+UnlockDiagnosticVMOptions -XX:+LogCompilation` compiles about ~11500 methods (~9000 with C1 and ~2500 with C2) resulting in an aggregated nmethod size of around ~40mb. Additionally recording `evol_method` dependencies only increases this size by about 400kb.... > > Volker Simonis has updated the pull request incrementally with one additional commit since the last revision: > > Guard the feature with a diagnostic option and update the comments in the code src/hotspot/share/prims/jvmtiRedefineClasses.cpp line 4078: > 4076: void VM_RedefineClasses::flush_dependent_code() { > 4077: assert(SafepointSynchronize::is_at_safepoint(), "sanity check"); > 4078: assert(AlwaysRecordEvolDependencies ? JvmtiExport::all_dependencies_are_recorded() : true, "sanity check"); This is just "assert all dependencies are recorded, unless we specifically requested not to do so", right? assert(JvmtiExport::all_dependencies_are_recorded() || !AlwaysRecordEvolDependencies, "sanity check"); ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17509#discussion_r1463026061 From dlunden at openjdk.org Tue Jan 23 10:14:00 2024 From: dlunden at openjdk.org (Daniel Lundén) Date: Tue, 23 Jan 2024 10:14:00 GMT Subject: RFR: 8322996: BoxLockNode creation fails with assert(reg < CHUNK_SIZE) failed: sanity [v4] In-Reply-To: References: Message-ID: > This changeset fixes an issue where deeply nested synchronized statements triggered an assert in C2. > > Changes: > - Bail out on compilation when we create a `BoxLockNode` with a slot index that cannot fit in a `RegMask`. This is similar to how we handle the case when we do not have space to represent arguments in [`opto/matcher.cpp`](https://github.com/openjdk/jdk/blob/58b01dce054c50bcb5a28aad4c1b574acaa90f6d/src/hotspot/share/opto/matcher.cpp#L314-L318) > - Generalize `RegMask::can_represent` to take an additional and optional size argument to facilitate reuse.
The default size value, 1, corresponds to the previous functionality. Rewrite `can_represent_arg` to directly call `can_represent(reg, SlotsPerVecZ)`. > - Add a regression test. > > Testing: > - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/7612890820) > - tier1 to tier5 on windows-x64, linux-x64, linux-aarch64, macosx-x64, and macosx-aarch64. > - The new regression test in all tier1 to tier10 contexts on windows-x64, linux-x64, linux-aarch64, macosx-x64, and macosx-aarch64. Daniel Lundén has updated the pull request incrementally with two additional commits since the last revision: - Update missed copyright - Refactor ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17370/files - new: https://git.openjdk.org/jdk/pull/17370/files/9ab6e561..bf87138f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17370&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17370&range=02-03 Stats: 17 lines in 3 files changed: 1 ins; 8 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/17370.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17370/head:pull/17370 PR: https://git.openjdk.org/jdk/pull/17370 From dlunden at openjdk.org Tue Jan 23 10:17:30 2024 From: dlunden at openjdk.org (Daniel Lundén) Date: Tue, 23 Jan 2024 10:17:30 GMT Subject: RFR: 8322996: BoxLockNode creation fails with assert(reg < CHUNK_SIZE) failed: sanity [v4] In-Reply-To: References: Message-ID: On Thu, 11 Jan 2024 12:41:18 GMT, Roberto Castañeda Lozano wrote: >> Daniel Lundén has updated the pull request incrementally with two additional commits since the last revision: >> >> - Update missed copyright >> - Refactor > > Changes requested by rcastanedalo (Reviewer). @robcasloz @vnkozlov: I have now made the changes we've discussed. Please have a look when you have some time to spare.
------------- PR Comment: https://git.openjdk.org/jdk/pull/17370#issuecomment-1905723363 From wzhuo at openjdk.org Tue Jan 23 10:23:30 2024 From: wzhuo at openjdk.org (Wang Zhuo) Date: Tue, 23 Jan 2024 10:23:30 GMT Subject: RFR: 8324123: aarch64: fix prfm literal encoding in assembler [v4] In-Reply-To: References: Message-ID: On Tue, 23 Jan 2024 09:49:47 GMT, Andrew Haley wrote: >> Wang Zhuo has updated the pull request incrementally with one additional commit since the last revision: >> >> get some comments back > > src/hotspot/cpu/aarch64/assembler_aarch64.hpp line 1584: > >> 1582: INSN(prfm, 0b11, 0b10); // FIXME: PRFM should not be used with >> 1583: // writeback modes, but the assembler >> 1584: // doesn't enfore that. > > Don't we still need this comment? Got the comments back. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17482#discussion_r1463043546 From qamai at openjdk.org Tue Jan 23 11:18:43 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Tue, 23 Jan 2024 11:18:43 GMT Subject: RFR: 8324433: Introduce a way to determine if an expression is evaluated as a constant by the Jit compiler [v3] In-Reply-To: References: Message-ID: > Hi, > > This patch introduces `JitCompiler::isConstantExpression` which can be used to statically determine whether an expression has been constant-folded by the Jit compiler, leading to more constant-folding opportunities. For example, it can be used in `MemorySessionImpl::checkValidStateRaw` to eliminate the lifetime check on global sessions without imposing additional branches on other non-global sessions. This is inspired by `std::is_constant_evaluated` in C++. > > Please kindly give your opinion as well as your reviews, thanks very much. 
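As a rough illustration of the pattern the description above has in mind (a sketch only, not code from the patch): the `isConstantExpression` stand-in below is hypothetical and always returns false, on the assumption that the real method may only answer true when the JIT has actually folded its argument. `Session` and `checkValidState` are invented names that loosely mirror `MemorySessionImpl::checkValidStateRaw`.

```java
final class ConstantFoldSketch {
    // Hypothetical stand-in for the proposed JitCompiler.isConstantExpression(boolean).
    // Always answering false is the conservative choice: callers fall back to the
    // general path, so the result never affects correctness.
    static boolean isConstantExpression(boolean value) {
        return false;
    }

    // Invented session type for this sketch; a "global" session is never closed.
    static final class Session {
        final boolean global;
        boolean closed;

        Session(boolean global) {
            this.global = global;
        }

        void checkValidState() {
            // Fast path: only taken when the compiler has proven `global` to be a
            // constant and that constant is true. No extra branch is meant to
            // remain for sessions where nothing is known at compile time.
            if (isConstantExpression(global) && global) {
                return;
            }
            // General path: the ordinary lifetime check.
            if (closed) {
                throw new IllegalStateException("Already closed");
            }
        }
    }

    public static void main(String[] args) {
        Session global = new Session(true);
        global.checkValidState(); // accepted through the general path with this stand-in

        Session confined = new Session(false);
        confined.closed = true;
        try {
            confined.checkValidState();
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

With the stand-in both calls go through the general path; the point of the sketch is only that the guarded fast path must be a pure optimization, so the program behaves the same whichever answer the query gives.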
Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: add more overloads ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17527/files - new: https://git.openjdk.org/jdk/pull/17527/files/9dd95393..31403d6f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17527&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17527&range=01-02 Stats: 346 lines in 6 files changed: 333 ins; 1 del; 12 mod Patch: https://git.openjdk.org/jdk/pull/17527.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17527/head:pull/17527 PR: https://git.openjdk.org/jdk/pull/17527 From qamai at openjdk.org Tue Jan 23 11:26:26 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Tue, 23 Jan 2024 11:26:26 GMT Subject: RFR: 8324433: Introduce a way to determine if an expression is evaluated as a constant by the Jit compiler [v3] In-Reply-To: References: Message-ID: <3pUibPS3iTDkTve0coFDSCFt61RLryW5Hc7Jve6Cfk8=.9bb8d2f8-59fd-4958-8d04-b8b13f17b7b3@github.com> On Tue, 23 Jan 2024 11:18:43 GMT, Quan Anh Mai wrote: >> Hi, >> >> This patch introduces `JitCompiler::isConstantExpression` which can be used to statically determine whether an expression has been constant-folded by the Jit compiler, leading to more constant-folding opportunities. For example, it can be used in `MemorySessionImpl::checkValidStateRaw` to eliminate the lifetime check on global sessions without imposing additional branches on other non-global sessions. This is inspired by `std::is_constant_evaluated` in C++. >> >> Please kindly give your opinion as well as your reviews, thanks very much. > > Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: > > add more overloads I get your idea. I have added overloads for all types. They will all invoke `inline_isCompileConstant`. Given that there are now 7 methods I think a separate class is more justified.
Another issue is the duplication of `isConstantExpression(Object)`, but I think a separate issue to deduplicate it would be easier. Thanks a lot. ------------- PR Comment: https://git.openjdk.org/jdk/pull/17527#issuecomment-1905836929 From jbhateja at openjdk.org Tue Jan 23 11:56:58 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 23 Jan 2024 11:56:58 GMT Subject: RFR: 8322768: Optimize non-subword vector compress and expand APIs for AVX2 target. [v9] In-Reply-To: <_udOCEVG86x9V_WvYqFTaYnvmXdiZ7LzqxzR-D_ygYs=.db3ed37e-a8fa-413b-83c6-a785aba072ff@github.com> References: <_udOCEVG86x9V_WvYqFTaYnvmXdiZ7LzqxzR-D_ygYs=.db3ed37e-a8fa-413b-83c6-a785aba072ff@github.com> Message-ID: <8oB-M1TUk9aqQIYOGijNmykLCyM1AUTXLTsgy4r8Wk4=.49c90c06-5f2e-47f0-9ac1-ffd6eb438fa4@github.com> > Hi, > > Patch optimizes non-subword vector compress and expand APIs for x86 AVX2 only targets. > Upcoming E-core Xeons (Sierra Forest) and Hybrid CPUs only support AVX2 instruction set. > These are very frequently used APIs in columnar database filter operation. > > Implementation uses a lookup table to record permute indices. Table index is computed using > mask argument of compress/expand operation. > > Following are the performance number of JMH micro included with the patch. 
> > > System : Intel(R) Xeon(R) Platinum 8480+ (Sapphire Rapids) > > Baseline: > Benchmark (size) Mode Cnt Score Error Units > ColumnFilterBenchmark.filterDoubleColumn 1024 thrpt 2 142.767 ops/ms > ColumnFilterBenchmark.filterDoubleColumn 2047 thrpt 2 71.436 ops/ms > ColumnFilterBenchmark.filterDoubleColumn 4096 thrpt 2 35.992 ops/ms > ColumnFilterBenchmark.filterFloatColumn 1024 thrpt 2 182.151 ops/ms > ColumnFilterBenchmark.filterFloatColumn 2047 thrpt 2 91.096 ops/ms > ColumnFilterBenchmark.filterFloatColumn 4096 thrpt 2 44.757 ops/ms > ColumnFilterBenchmark.filterIntColumn 1024 thrpt 2 184.099 ops/ms > ColumnFilterBenchmark.filterIntColumn 2047 thrpt 2 91.981 ops/ms > ColumnFilterBenchmark.filterIntColumn 4096 thrpt 2 45.170 ops/ms > ColumnFilterBenchmark.filterLongColumn 1024 thrpt 2 148.017 ops/ms > ColumnFilterBenchmark.filterLongColumn 2047 thrpt 2 73.516 ops/ms > ColumnFilterBenchmark.filterLongColumn 4096 thrpt 2 36.844 ops/ms > > Withopt: > Benchmark (size) Mode Cnt Score Error Units > ColumnFilterBenchmark.filterDoubleColumn 1024 thrpt 2 2051.707 ops/ms > ColumnFilterBenchmark.filterDoubleColumn 2047 thrpt 2 914.072 ops/ms > ColumnFilterBenchmark.filterDoubleColumn 4096 thrpt 2 489.898 ops/ms > ColumnFilterBenchmark.filterFloatColumn 1024 thrpt 2 5324.195 ops/ms > ColumnFilterBenchmark.filterFloatColumn 2047 thrpt 2 2587.229 ops/ms > ColumnFilterBenchmark.filterFloatColumn 4096 thrpt 2 1278.665 ops/ms > ColumnFilterBenchmark.filterIntColumn 1024 thrpt 2 4149.384 ops/ms > ColumnFilterBenchmark.filterIntColumn 2047 thrpt 2 1791.170 ops/ms > ColumnFilterBenchmark.filterIntColumn 4096... Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 10 additional commits since the last revision: - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8322768 - Modifying comments. 
- Review comments resolution - Modified code comment for clarity. - Space fixup - Using emulated variable blend E-Core optimized instruction. - Review suggestions incorporated. - Review comments resolutions. - Updating copyright year of modified files. - 8322768: Optimize non-subword vector compress and expand APIs for AVX2 target. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17261/files - new: https://git.openjdk.org/jdk/pull/17261/files/cd912308..83e4065e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17261&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17261&range=07-08 Stats: 41105 lines in 1072 files changed: 24738 ins; 11390 del; 4977 mod Patch: https://git.openjdk.org/jdk/pull/17261.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17261/head:pull/17261 PR: https://git.openjdk.org/jdk/pull/17261 From jbhateja at openjdk.org Tue Jan 23 11:56:58 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 23 Jan 2024 11:56:58 GMT Subject: RFR: 8322768: Optimize non-subword vector compress and expand APIs for AVX2 target. [v8] In-Reply-To: References: <_udOCEVG86x9V_WvYqFTaYnvmXdiZ7LzqxzR-D_ygYs=.db3ed37e-a8fa-413b-83c6-a785aba072ff@github.com> Message-ID: On Tue, 23 Jan 2024 08:17:13 GMT, Emanuel Peter wrote: >> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: >> >> Review comments resolution > > src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 5301: > >> 5299: vmovmskps(rtmp, mask, vec_enc); >> 5300: } >> 5301: shlq(rtmp, 5); // for 32 byte permute row of 8 x 32 bits. > > Suggestion: > > shlq(rtmp, 5); // for 32 byte permute row of 8 x 32 bits / 4 x 64 bits. 
> > Since you now merged the code of the two paths As per the latest patch, we are doing a double word permute, hence semantically it's ok and in accordance with the instruction sequence :-) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17261#discussion_r1463160336 From aph at openjdk.org Tue Jan 23 11:57:32 2024 From: aph at openjdk.org (Andrew Haley) Date: Tue, 23 Jan 2024 11:57:32 GMT Subject: RFR: 8324123: aarch64: fix prfm literal encoding in assembler [v4] In-Reply-To: References: Message-ID: On Tue, 23 Jan 2024 10:03:38 GMT, Wang Zhuo wrote: >> Current prfm literal mode encoding in aarch64 assembler is not correct. >> The prfm_literal instruction requires bits 31 and 30 to be 0b11, while the current assembler encodes the two bits as 0b01, which is a ldr instruction, not prfm. >> For example, if adding the following code in stubGenerator >> __ prfm(Address(__ pc())) >> we get a ldr instruction like >> ldr x0, 0x0000ffff83f8539c >> but it should be a prfm instruction like >> prfm pldl1keep, 0x0000ffff8ff8539c >> >> The bug is in ld_st2: in literal mode, bits 31 and 30 are set to (size & 0b01), while for prfm instructions bits 31 and 30 must be 0b11. >> void ld_st2(Register Rt, const Address &adr, int size, int op, int V = 0) { >> starti; >> >> f(V, 26); // general reg? >> zrf(Rt, 0); >> >> // Encoding for literal loads is done here (rather than pushed >> // down into Address::encode) because the encoding of this >> // instruction is too different from all of the other forms to >> // make it worth sharing.
>> if (adr.getMode() == Address::literal) { >> assert(size == 0b10 || size == 0b11, "bad operand size in ldr"); >> assert(op == 0b01, "literal form can only be used with loads"); >> f(**size & 0b01, 31, 30**), f(0b011, 29, 27), f(0b00, 25, 24); >> int64_t offset = (adr.target() - pc()) >> 2; >> sf(offset, 23, 5); >> code_section()->relocate(pc(), adr.rspec()); >> return; >> } >> >> f(size, 31, 30); >> f(op, 23, 22); // str >> adr.encode(&current_insn); >> } > > Wang Zhuo has updated the pull request incrementally with one additional commit since the last revision: > > get some comments back Good. src/hotspot/cpu/aarch64/assembler_aarch64.cpp line 193: > 191: // by literal ld/st. see JDK-8324123. > 192: // FIXME: PRFM should not be used with writeback modes, but the assembler > 193: // doesn't enfore that. Suggestion: // doesn't enforce that. ------------- Marked as reviewed by aph (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/17482#pullrequestreview-1838577155 PR Review Comment: https://git.openjdk.org/jdk/pull/17482#discussion_r1463160481 From mli at openjdk.org Tue Jan 23 12:05:35 2024 From: mli at openjdk.org (Hamlin Li) Date: Tue, 23 Jan 2024 12:05:35 GMT Subject: RFR: JDK-8318228: RISC-V: C2 ConvF2HF [v2] In-Reply-To: References: Message-ID: On Tue, 23 Jan 2024 08:34:39 GMT, Vladimir Kempik wrote: >> Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: >> >> replace fclass with feq as performance optimization. > > Marked as reviewed by vkempik (Committer). @VladimirKempik @RealFYang Thanks for your review.
------------- PR Comment: https://git.openjdk.org/jdk/pull/17450#issuecomment-1905890879 From mli at openjdk.org Tue Jan 23 12:05:37 2024 From: mli at openjdk.org (Hamlin Li) Date: Tue, 23 Jan 2024 12:05:37 GMT Subject: Integrated: JDK-8318228: RISC-V: C2 ConvF2HF In-Reply-To: References: Message-ID: On Tue, 16 Jan 2024 18:43:03 GMT, Hamlin Li wrote: > Hi, > Can you review the patch to add ConvF2HF intrinsic to JDK for riscv? > Thanks! > > ## Test > ### Functionality > #### hotspot tests > test/hotspot/jtreg/compiler/intrinsics/ > test/hotspot/jtreg/compiler/c2/irTests > > #### jdk tests > test/jdk/java/lang/Float/Binary16Conversion*.java > > ### Performance > tested on licheepi. > > #### with UseZfh enabled > > Benchmark (size) Mode Cnt Score Error Units > Fp16ConversionBenchmark.floatToFloat16 2048 avgt 2 4170.549 ns/op > Fp16ConversionBenchmark.floatToFloat16Memory 2048 avgt 2 21.492 ns/op > > > #### with UseZfh disabled > (i.e. disable the intrinsic) > > Benchmark (size) Mode Cnt Score Error Units > Fp16ConversionBenchmark.floatToFloat16 2048 avgt 2 25036.647 ns/op > Fp16ConversionBenchmark.floatToFloat16Memory 2048 avgt 2 27.326 ns/op This pull request has now been integrated. Changeset: bcaad515 Author: Hamlin Li URL: https://git.openjdk.org/jdk/commit/bcaad515fdedd0c41a719d2a88b2da3036c766a3 Stats: 57 lines in 3 files changed: 57 ins; 0 del; 0 mod 8318228: RISC-V: C2 ConvF2HF Reviewed-by: fyang, vkempik ------------- PR: https://git.openjdk.org/jdk/pull/17450 From rgiulietti at openjdk.org Tue Jan 23 13:33:51 2024 From: rgiulietti at openjdk.org (Raffaello Giulietti) Date: Tue, 23 Jan 2024 13:33:51 GMT Subject: RFR: 8282365: Consolidate and improve division by constant idealizations [v44] In-Reply-To: References: Message-ID: On Mon, 22 Jan 2024 11:44:50 GMT, Raffaello Giulietti wrote: > such a case, the smallest c is b and the smallest s is e, and a result meeting s >= min_s doesn't need any iterative algorithm. I must correct this claim, my bad. 
A result meeting `s >= min_s` might still need an iterative algorithm if `e < min_s` and if `c` must be minimal. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/9947#discussion_r1463286075 From qamai at openjdk.org Tue Jan 23 14:56:52 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Tue, 23 Jan 2024 14:56:52 GMT Subject: RFR: 8282365: Consolidate and improve division by constant idealizations [v44] In-Reply-To: References: Message-ID: On Tue, 23 Jan 2024 13:30:46 GMT, Raffaello Giulietti wrote: >> The convoluted test to break out of the while loop can be simplified if we are willing to consider two cases. >> >> Equality in `c / m <= (1 / d) * ((v + 1) / v)` is equivalent to `c / m = b / v`, where `b = (v + 1) / d` is an integer. Note that the fraction `b / v` is irreducible: a common divisor of `b` and `v` has to divide any linear combination as well, in particular it has to divide `d * b - 1 * v = 1`. Thus, `c >= b` and `m >= v`. Since `m = 2^s`, `v` must be a power of 2. >> >> This means that, when `m` is a power of 2, the only way to have equality is `v_neg > v_pos` _and_ `v_neg` is a power of 2 (say `v_neg = 2^e`), which should be a rare case. This can be detected cheaply early, before the loop. In such a case, the smallest `c` is `b` and the smallest `s` is `e`, and a result meeting `s >= min_s` doesn't need any iterative algorithm. >> >> Otherwise equality does not hold. Now, `c / m < (1 / d) * ((v + 1) / v)` is equivalent to `m > v * rc`. In turn, `m > v * rc` <=> `m / v > rc` <=> `ceil(m / v) > rc`. >> >> Thus, rather than maintaining invariants for `qv = floor(m / v)` and `rv = m - qv * v` as currently defined, we can redefine them as `qv = ceil(m / v)` and `rv = qv * v - m` (`0 <= rv < v`) and maintain _these_ invariants instead. >> >> T qv = 1; >> T rv = v - 1; >> ...
>> if (rv >= v - rv) { // 2 * rv >= v >> qv_ovf = qv > min_signed; >> qv = qv * 2 - 1; >> rv = rv * 2 - v; >> } else { // 2 * rv < v >> qv_ovf = qv >= min_signed; >> qv = qv * 2; >> rv = rv * 2; >> } >> >> The test to exit the loop then reduces to `qv > rc || qv_ovf`. > >> such a case, the smallest c is b and the smallest s is e, and a result meeting s >= min_s doesn't need any iterative algorithm. > > I must correct this claim, my bad. > A result meeting `s >= min_s` might still need an iterative algorithm if `e < min_s` and if `c` must be minimal. That is an excellent analysis. To add to the analysis, we do not really need the minimal value of `c`, since 2 values of `c` that both satisfy the inequations must give the same upper bits for all input values. As a result, for the purpose of the algorithm, they are equivalent. My concern is that it will complicate the analysis, which is complicated enough, for a minor improvement in the exit conditions. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/9947#discussion_r1463403251 From rgiulietti at openjdk.org Tue Jan 23 15:05:51 2024 From: rgiulietti at openjdk.org (Raffaello Giulietti) Date: Tue, 23 Jan 2024 15:05:51 GMT Subject: RFR: 8282365: Consolidate and improve division by constant idealizations [v44] In-Reply-To: References: Message-ID: On Tue, 23 Jan 2024 14:54:08 GMT, Quan Anh Mai wrote: >>> such a case, the smallest c is b and the smallest s is e, and a result meeting s >= min_s doesn't need any iterative algorithm. >> >> I must correct this claim, my bad. >> A result meeting `s >= min_s` might still need an iterative algorithm if `e < min_s` and if `c` must be minimal. > > That is an excellent analysis. To add to the analysis, we do not really need the minimal value of `c`, since 2 values of `c` that both satisfy the inequations must give the same upper bits for all input values. As a result, for the purpose of the algorithm, they are equivalent. 
> > My concern is that it will complicate the analysis, which is complicated enough, for a minor improvement in the exit conditions. I have no idea about the timing difference with the current exit condition or with the simplified one. It might indeed be negligible. Anyway, there's a choice now ;-) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/9947#discussion_r1463415760 From epeter at openjdk.org Tue Jan 23 15:18:29 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 23 Jan 2024 15:18:29 GMT Subject: RFR: 8323220: Reassociate loop invariants involved in Cmps and Add/Subs [v3] In-Reply-To: References: Message-ID: On Mon, 22 Jan 2024 21:50:11 GMT, Joshua Cao wrote: >> test/hotspot/jtreg/compiler/loopopts/InvariantCodeMotionReassociateCmp.java line 163: >> >>> 161: } >>> 162: } >>> 163: } >> >> Why could you not have random arguments here? > > [notEqualsInvariantSubVariantLong.txt](https://github.com/openjdk/jdk/files/14015703/notEqualsInvariantSubVariantLong.txt) > > Attaching the IdealGraphVisualizer file (renamed to `.txt` cause GitHub wants that). > > When we use `Argument.NUMBER_42`, we can avoid generating traps. When `i == 0` we enter the if statement, otherwise we exit. > > If we use random numbers, we most likely never enter the if statement so we can generate a trap there. In this case, the trap block as an extra `SubL` for `inv1 - i`. @caojoshua I see. That is a sad limitation. You could now add a `@Run` statement, but that is overkill as well. I hope to improve the IR framework to make it a bit easier to pass better arguments soon. For now I think we could just leave it with constants, for simplicity. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17375#discussion_r1463435275 From epeter at openjdk.org Tue Jan 23 15:23:32 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 23 Jan 2024 15:23:32 GMT Subject: RFR: 8322768: Optimize non-subword vector compress and expand APIs for AVX2 target. 
[v9] In-Reply-To: <8oB-M1TUk9aqQIYOGijNmykLCyM1AUTXLTsgy4r8Wk4=.49c90c06-5f2e-47f0-9ac1-ffd6eb438fa4@github.com> References: <_udOCEVG86x9V_WvYqFTaYnvmXdiZ7LzqxzR-D_ygYs=.db3ed37e-a8fa-413b-83c6-a785aba072ff@github.com> <8oB-M1TUk9aqQIYOGijNmykLCyM1AUTXLTsgy4r8Wk4=.49c90c06-5f2e-47f0-9ac1-ffd6eb438fa4@github.com> Message-ID: On Tue, 23 Jan 2024 11:56:58 GMT, Jatin Bhateja wrote: >> Hi, >> >> Patch optimizes non-subword vector compress and expand APIs for x86 AVX2 only targets. >> Upcoming E-core Xeons (Sierra Forest) and Hybrid CPUs only support AVX2 instruction set. >> These are very frequently used APIs in columnar database filter operation. >> >> Implementation uses a lookup table to record permute indices. Table index is computed using >> mask argument of compress/expand operation. >> >> Following are the performance number of JMH micro included with the patch. >> >> >> System : Intel(R) Xeon(R) Platinum 8480+ (Sapphire Rapids) >> >> Baseline: >> Benchmark (size) Mode Cnt Score Error Units >> ColumnFilterBenchmark.filterDoubleColumn 1024 thrpt 2 142.767 ops/ms >> ColumnFilterBenchmark.filterDoubleColumn 2047 thrpt 2 71.436 ops/ms >> ColumnFilterBenchmark.filterDoubleColumn 4096 thrpt 2 35.992 ops/ms >> ColumnFilterBenchmark.filterFloatColumn 1024 thrpt 2 182.151 ops/ms >> ColumnFilterBenchmark.filterFloatColumn 2047 thrpt 2 91.096 ops/ms >> ColumnFilterBenchmark.filterFloatColumn 4096 thrpt 2 44.757 ops/ms >> ColumnFilterBenchmark.filterIntColumn 1024 thrpt 2 184.099 ops/ms >> ColumnFilterBenchmark.filterIntColumn 2047 thrpt 2 91.981 ops/ms >> ColumnFilterBenchmark.filterIntColumn 4096 thrpt 2 45.170 ops/ms >> ColumnFilterBenchmark.filterLongColumn 1024 thrpt 2 148.017 ops/ms >> ColumnFilterBenchmark.filterLongColumn 2047 thrpt 2 73.516 ops/ms >> ColumnFilterBenchmark.filterLongColumn 4096 thrpt 2 36.844 ops/ms >> >> Withopt: >> Benchmark (size) Mode Cnt Score Error Units >> ColumnFilterBenchmark.filterDoubleColumn 1024 thrpt 2 2051.707 ops/ms >> 
ColumnFilterBenchmark.filterDoubleColumn 2047 thrpt 2 914.072 ops/ms >> ColumnFilterBenchmark.filterDoubleColumn 4096 thrpt 2 489.898 ops/ms >> ColumnFilterBenchmark.filterFloatColumn 1024 thrpt 2 5324.195 ops/ms >> ColumnFilterBenchmark.filterFloatColumn 2047 thrpt 2 2587.229 ops/ms >> ColumnFilterBenchmark.filterFloatColumn 4096 thrpt 2 1278.665 ops/ms >> ColumnFilterBenchmark.filterIntColumn 1024 thrpt 2 4149.384 ops/ms >> ColumnFilterBenchmark.filterIntColumn 2047 thrpt ... > > Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 10 additional commits since the last revision: > > - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8322768 > - Modifying comments. > - Review comments resolution > - Modified code comment for clarity. > - Space fixup > - Using emulated variable blend E-Core optimized instruction. > - Review suggestions incorporated. > - Review comments resolutions. > - Updating copyright year of modified files. > - 8322768: Optimize non-subword vector compress and expand APIs for AVX2 target. Ok, I'll just run the testing again, and then I will approve this :) ------------- PR Review: https://git.openjdk.org/jdk/pull/17261#pullrequestreview-1839057311 From kbarrett at openjdk.org Tue Jan 23 15:42:27 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Tue, 23 Jan 2024 15:42:27 GMT Subject: RFR: 8324433: Introduce a way to determine if an expression is evaluated as a constant by the Jit compiler [v3] In-Reply-To: References: Message-ID: On Tue, 23 Jan 2024 11:18:43 GMT, Quan Anh Mai wrote: >> Hi, >> >> This patch introduces `JitCompiler::isConstantExpression` which can be used to statically determine whether an expression has been constant-folded by the Jit compiler, leading to more constant-folding opportunities. 
For example, it can be used in `MemorySessionImpl::checkValidStateRaw` to eliminate the lifetime check on global sessions without imposing additional branches on other non-global sessions. This is inspired by `std::is_constant_evaluated` in C++. >> >> Please kindly give your opinion as well as your reviews, thanks very much. > > Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: > > add more overloads Not a review, just a drive-by comment. From the description for this PR: "This is inspired by std::is_constant_evaluated in C++." I think what is being proposed here is more like gcc's `__builtin_constant_p`. std::is_constant_evaluated is a different thing, used to detect evaluation in a manifestly constexpr context. ------------- PR Comment: https://git.openjdk.org/jdk/pull/17527#issuecomment-1906337427 From shade at openjdk.org Tue Jan 23 15:51:32 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 23 Jan 2024 15:51:32 GMT Subject: RFR: 8324433: Introduce a way to determine if an expression is evaluated as a constant by the Jit compiler [v3] In-Reply-To: References: Message-ID: On Tue, 23 Jan 2024 11:18:43 GMT, Quan Anh Mai wrote: >> Hi, >> >> This patch introduces `JitCompiler::isConstantExpression` which can be used to statically determine whether an expression has been constant-folded by the Jit compiler, leading to more constant-folding opportunities.
> > Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: > > add more overloads All right, this is very close :) I now have stylistic comments: src/hotspot/share/classfile/vmIntrinsics.hpp line 912: > 910: do_intrinsic(_getAndSetInt, jdk_internal_misc_Unsafe, getAndSetInt_name, getAndSetInt_signature, F_R) \ > 911: do_name( getAndSetInt_name, "getAndSetInt") \ > 912: do_alias( getAndSetInt_signature, /*"(Ljava/lang/Object;JI)I"*/ getAndAddInt_signature) \ I don't think we need to do these formatting changes in this PR. src/hotspot/share/classfile/vmIntrinsics.hpp line 927: > 925: \ > 926: do_class(jdk_internal_misc_JitCompiler, "jdk/internal/misc/JitCompiler") \ > 927: do_intrinsic(_isConstantExpressionZ, jdk_internal_misc_JitCompiler,isConstantExpression_name, bool_bool_signature, F_S) \ It would be cleaner to follow the current naming for existing intrinsic: do_intrinsic(_isCompileConstant, java_lang_invoke_MethodHandleImpl, isCompileConstant_name, isCompileConstant_signature, F_S) \ do_name( isCompileConstant_name, "isCompileConstant") \ do_alias( isCompileConstant_signature, object_boolean_signature) \ I.e. rename `isConstantExpression` -> `isCompileConstant`. It clearly communicates that we are not dealing with expressions as arguments, and that we underline this is the (JIT) _compile_ constant, not just a constant expression from JLS 15.28 "Constant Expressions". Maybe even replace that `MHImpl` method with the new intrinsic. src/hotspot/share/opto/c2compiler.cpp line 727: > 725: case vmIntrinsics::_storeStoreFence: > 726: case vmIntrinsics::_fullFence: > 727: case vmIntrinsics::_isConstantExpressionZ: Move this closer to `vmIntrinsics::_isCompileConstant:`, if not outright replace it? src/hotspot/share/opto/library_call.hpp line 2: > 1: /* > 2: * Copyright (c) 2020, 2024, Oracle and/or its affiliates. All rights reserved. Unnecessary update? 
------------- PR Review: https://git.openjdk.org/jdk/pull/17527#pullrequestreview-1839148507 PR Review Comment: https://git.openjdk.org/jdk/pull/17527#discussion_r1463490470 PR Review Comment: https://git.openjdk.org/jdk/pull/17527#discussion_r1463493124 PR Review Comment: https://git.openjdk.org/jdk/pull/17527#discussion_r1463497227 PR Review Comment: https://git.openjdk.org/jdk/pull/17527#discussion_r1463497518 From alanb at openjdk.org Tue Jan 23 15:55:29 2024 From: alanb at openjdk.org (Alan Bateman) Date: Tue, 23 Jan 2024 15:55:29 GMT Subject: RFR: 8324433: Introduce a way to determine if an expression is evaluated as a constant by the Jit compiler [v3] In-Reply-To: References: Message-ID: <02_Q7SYNI7MYYOeNsq1xGPsOY502JbeXfJyvUGZTtZg=.8a6dcf5c-4dfd-4688-97c1-95497b637cd3@github.com> On Tue, 23 Jan 2024 11:18:43 GMT, Quan Anh Mai wrote: >> Hi, >> >> This patch introduces `JitCompiler::isConstantExpression` which can be used to statically determine whether an expression has been constant-folded by the Jit compiler, leading to more constant-folding opportunities. For example, it can be used in `MemorySessionImpl::checkValidStateRaw` to eliminate the lifetime check on global sessions without imposing additional branches on other non-global sessions. This is inspired by `std::is_constant_evaluated` in C++. >> >> Please kindly give your opinion as well as your reviews, thanks very much. > > Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: > > add more overloads Would it be possible to list further examples where this might be used? Asking because I'm wondering about the usability and maintainability of if-then-else code. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/17527#issuecomment-1906362127 From shade at openjdk.org Tue Jan 23 16:03:29 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 23 Jan 2024 16:03:29 GMT Subject: RFR: 8324433: Introduce a way to determine if an expression is evaluated as a constant by the Jit compiler [v3] In-Reply-To: <02_Q7SYNI7MYYOeNsq1xGPsOY502JbeXfJyvUGZTtZg=.8a6dcf5c-4dfd-4688-97c1-95497b637cd3@github.com> References: <02_Q7SYNI7MYYOeNsq1xGPsOY502JbeXfJyvUGZTtZg=.8a6dcf5c-4dfd-4688-97c1-95497b637cd3@github.com> Message-ID: On Tue, 23 Jan 2024 15:52:29 GMT, Alan Bateman wrote: > Would it be possible to list further examples where this might be used? Asking because I'm wondering about the usability and maintainability of if-then-else code. A similar thing is already used in JDK: https://github.com/openjdk/jdk/blob/2a01c798d346656a0ee3553c0964feab75b5dfb6/src/java.base/share/classes/java/lang/invoke/Invokers.java#L622-L624 Extending this for more common use allows doing things like optimizing `Integer.toString(int)`: @Stable static final String[] CONST_STRINGS = {"-1", "0", "1"}; @IntrinsicCandidate public static String toString(int i) { if (isCompileConstant(i) && (i >= -1) && (i <= 1)) { return CONST_STRINGS[i + 1]; } ... 
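To make the shape of that `Integer.toString(int)` idea concrete, here is a minimal, self-contained sketch. It is not JDK code: `CompileConstantSketch`, `toStringSketch` and the local `isCompileConstant` are hypothetical stand-ins, and the stand-in simply returns `false`, on the assumption that this is what non-intrinsified code would see. The point it illustrates is that the guarded fast path and the general path must produce the same result, because the query may answer `false` at any time.

```java
// Hypothetical sketch, not the JDK implementation.
final class CompileConstantSketch {
    // Table of precomputed results for the small constants -1, 0 and 1.
    static final String[] CONST_STRINGS = {"-1", "0", "1"};

    // Stand-in for the proposed intrinsic (assumption: without JIT support
    // the query conservatively reports "not known to be a constant").
    static boolean isCompileConstant(int value) {
        return false;
    }

    static String toStringSketch(int i) {
        // Fast path: only taken when the compiler has proven i constant.
        if (isCompileConstant(i) && i >= -1 && i <= 1) {
            return CONST_STRINGS[i + 1];
        }
        // General path: always correct, used whenever the query says false.
        return Integer.toString(i);
    }

    public static void main(String[] args) {
        for (int i = -2; i <= 2; i++) {
            System.out.println(toStringSketch(i));
        }
    }
}
```

Because both branches agree for every input in the guarded range, a compiler is free to keep either one; correctness never depends on the answer of the query.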
------------- PR Comment: https://git.openjdk.org/jdk/pull/17527#issuecomment-1906379544 From simonis at openjdk.org Tue Jan 23 16:40:29 2024 From: simonis at openjdk.org (Volker Simonis) Date: Tue, 23 Jan 2024 16:40:29 GMT Subject: RFR: 8324241: Always record evol_method deps to avoid excessive method flushing [v2] In-Reply-To: References: Message-ID: On Tue, 23 Jan 2024 03:31:59 GMT, Dean Long wrote: >> Volker Simonis has updated the pull request incrementally with one additional commit since the last revision: >> >> Guard the feature with a diagnostic option and update the comments in the code > > src/hotspot/share/runtime/init.cpp line 121: > >> 119: if (AlwaysRecordEvolDependencies) { >> 120: JvmtiExport::set_can_hotswap_or_post_breakpoint(true); >> 121: JvmtiExport::set_all_dependencies_are_recorded(true); > > I think the check for AlwaysRecordEvolDependencies needs to be moved into set_can_hotswap_or_post_breakpoint and set_all_dependencies_are_recorded, otherwise don't we risk the value being accidentally reset to false when set_can_hotswap_or_post_breakpoint() is called again by JvmtiManageCapabilities::update()? A good question, but after some deep digging (it took me quite some time to figure this out myself :) I don't think that `can_hotswap_or_post_breakpoint`/`all_dependencies_are_recorded` can ever be reset. Here's how it works: - there's a global set of `always_capabilities` which is initialized when the first JVMTI environment is created: // capabilities which are always potentially available jvmtiCapabilities JvmtiManageCapabilities::always_capabilities; void JvmtiManageCapabilities::initialize() { _capabilities_lock = new Mutex(Mutex::nosafepoint, "Capabilities_lock"); always_capabilities = init_always_capabilities(); - as the name implies, this set of capabilities contains all generally available capabilities (except for the `onload` and `solo` capabilities, which are maintained in separate sets).
- `JvmtiManageCapabilities::update()` only operates on these global sets of capabilities (and *not* on the concrete capabilities of a specific JVMTI environment): void JvmtiManageCapabilities::update() { jvmtiCapabilities avail; // all capabilities either(&always_capabilities, &always_solo_capabilities, &avail); ... // If can_redefine_classes is enabled in the onload phase then we know that the // dependency information recorded by the compiler is complete. if ((avail.can_redefine_classes || avail.can_retransform_classes) && JvmtiEnv::get_phase() == JVMTI_PHASE_ONLOAD) { JvmtiExport::set_all_dependencies_are_recorded(true); } ... JvmtiExport::set_can_hotswap_or_post_breakpoint( avail.can_generate_breakpoint_events || avail.can_redefine_classes || avail.can_retransform_classes); - This means that `JvmtiManageCapabilities::update()` is always conservative regarding its exports to `JvmtiExport`, as it always exports *all* potentially available capabilities once a JVMTI environment is created, and not only the specific capabilities requested by a concrete JVMTI environment. This means that even if we create a JVMTI agent with an empty set of capabilities, `can_hotswap_or_post_breakpoint`/`all_dependencies_are_recorded` will still be set in `JvmtiExport`. - There's no code path (at least I couldn't find one) which takes capabilities away from the `always_capabilities` set. Only if an environment requests `on_load` capabilities, they will be transferred from the global `onload_capabilities` to the `always_capabilities` set: jvmtiError JvmtiManageCapabilities::add_capabilities(const jvmtiCapabilities *current, const jvmtiCapabilities *prohibited, const jvmtiCapabilities *desired, jvmtiCapabilities *result) { ...
// onload capabilities that got added are now permanent - so, also remove from onload both(&onload_capabilities, desired, &temp); either(&always_capabilities, &temp, &always_capabilities); exclude(&onload_capabilities, &temp, &onload_capabilities); If you like I could put an assertion into `JvmtiExport::set_can_hotswap_or_post_breakpoint()` which verifies that `can_hotswap_or_post_breakpoint` never gets reset once it was set to `true`. What do you think? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17509#discussion_r1463584725 From qamai at openjdk.org Tue Jan 23 16:51:50 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Tue, 23 Jan 2024 16:51:50 GMT Subject: RFR: 8324433: Introduce a way to determine if an expression is evaluated as a constant by the Jit compiler [v4] In-Reply-To: References: Message-ID: > Hi, > > This patch introduces `JitCompiler::isConstantExpression` which can be used to statically determine whether an expression has been constant-folded by the Jit compiler, leading to more constant-folding opportunities. For example, it can be used in `MemorySessionImpl::checkValidStateRaw` to eliminate the lifetime check on global sessions without imposing additional branches on other non-global sessions. This is inspired by `std::is_constant_evaluated` in C++. > > Please kindly give your opinion as well as your reviews, thanks very much. 
Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: address reviews: rename to isCompileConstant, remove duplication, revert unnecessary changes ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17527/files - new: https://git.openjdk.org/jdk/pull/17527/files/31403d6f..18f7d482 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17527&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17527&range=02-03 Stats: 92 lines in 8 files changed: 10 ins; 13 del; 69 mod Patch: https://git.openjdk.org/jdk/pull/17527.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17527/head:pull/17527 PR: https://git.openjdk.org/jdk/pull/17527 From simonis at openjdk.org Tue Jan 23 16:52:31 2024 From: simonis at openjdk.org (Volker Simonis) Date: Tue, 23 Jan 2024 16:52:31 GMT Subject: RFR: 8324241: Always record evol_method deps to avoid excessive method flushing [v2] In-Reply-To: <9yE5_ZZtvuN0dMMV2HSztNLu4pbSSl6eVWeQHXf5Iqc=.20f2084e-d296-4994-95b0-e4e4ed528449@github.com> References: <9yE5_ZZtvuN0dMMV2HSztNLu4pbSSl6eVWeQHXf5Iqc=.20f2084e-d296-4994-95b0-e4e4ed528449@github.com> Message-ID: On Tue, 23 Jan 2024 10:06:34 GMT, Aleksey Shipilev wrote: >> Volker Simonis has updated the pull request incrementally with one additional commit since the last revision: >> >> Guard the feature with a diagnostic option and update the comments in the code > > src/hotspot/share/prims/jvmtiRedefineClasses.cpp line 4078: > >> 4076: void VM_RedefineClasses::flush_dependent_code() { >> 4077: assert(SafepointSynchronize::is_at_safepoint(), "sanity check"); >> 4078: assert(AlwaysRecordEvolDependencies ? JvmtiExport::all_dependencies_are_recorded() : true, "sanity check"); > > This is just "assert all dependencies are recorded, unless we specifically requested not to do so", right? 
> > > assert(JvmtiExport::all_dependencies_are_recorded() || !AlwaysRecordEvolDependencies, "sanity check"); Yes that's true (I must confess I had to use a truth table to verify it :) I'll take your version though, because I think it's simpler to understand. Do you have other assertions in mind (also see my answer to Dean above)? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17509#discussion_r1463608499 From qamai at openjdk.org Tue Jan 23 16:56:29 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Tue, 23 Jan 2024 16:56:29 GMT Subject: RFR: 8324433: Introduce a way to determine if an expression is evaluated as a constant by the Jit compiler [v3] In-Reply-To: References: Message-ID: On Tue, 23 Jan 2024 15:44:40 GMT, Aleksey Shipilev wrote: >> Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: >> >> add more overloads > > src/hotspot/share/classfile/vmIntrinsics.hpp line 927: > >> 925: \ >> 926: do_class(jdk_internal_misc_JitCompiler, "jdk/internal/misc/JitCompiler") \ >> 927: do_intrinsic(_isConstantExpressionZ, jdk_internal_misc_JitCompiler,isConstantExpression_name, bool_bool_signature, F_S) \ > > It would be cleaner to follow the current naming for existing intrinsic: > > > do_intrinsic(_isCompileConstant, java_lang_invoke_MethodHandleImpl, isCompileConstant_name, isCompileConstant_signature, F_S) \ > do_name( isCompileConstant_name, "isCompileConstant") \ > do_alias( isCompileConstant_signature, object_boolean_signature) \ > > > I.e. rename `isConstantExpression` -> `isCompileConstant`. It clearly communicates that we are not dealing with expressions as arguments, and that we underline this is the (JIT) _compile_ constant, not just a constant expression from JLS 15.28 "Constant Expressions". > > Maybe even replace that `MHImpl` method with the new intrinsic. Yes you are right, I have renamed it to `isCompileConstant`.
> src/hotspot/share/opto/c2compiler.cpp line 727: > >> 725: case vmIntrinsics::_storeStoreFence: >> 726: case vmIntrinsics::_fullFence: >> 727: case vmIntrinsics::_isConstantExpressionZ: > > Move this closer to `vmIntrinsics::_isCompileConstant:`, if not outright replace it? I have replaced `MHImpl::isCompileConstant` with the new one. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17527#discussion_r1463617016 PR Review Comment: https://git.openjdk.org/jdk/pull/17527#discussion_r1463616039 From epeter at openjdk.org Tue Jan 23 17:14:29 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 23 Jan 2024 17:14:29 GMT Subject: RFR: 8324123: aarch64: fix prfm literal encoding in assembler [v4] In-Reply-To: References: Message-ID: On Tue, 23 Jan 2024 10:03:38 GMT, Wang Zhuo wrote: >> The current prfm literal mode encoding in the aarch64 assembler is not correct. >> The prfm literal instruction requires bits 31 and 30 to be 0b11, while the current assembler encodes these two bits as 0b01, which is an ldr instruction, not prfm. >> For example, if adding the following code in stubGenerator >> __ prfm(Address(__ pc())) >> we get an ldr instruction like >> ldr x0, 0x0000ffff83f8539c >> but it should be a prfm instruction like >> prfm pldl1keep, 0x0000ffff8ff8539c >> >> The bug is in ld_st2: in literal mode, bits 31 and 30 are set to (size & 0b01), while for prfm instructions bits 31 and 30 must be 0b11. >> void ld_st2(Register Rt, const Address &adr, int size, int op, int V = 0) { >> starti; >> >> f(V, 26); // general reg? >> zrf(Rt, 0); >> >> // Encoding for literal loads is done here (rather than pushed >> // down into Address::encode) because the encoding of this >> // instruction is too different from all of the other forms to >> // make it worth sharing.
>> if (adr.getMode() == Address::literal) { >> assert(size == 0b10 || size == 0b11, "bad operand size in ldr"); >> assert(op == 0b01, "literal form can only be used with loads"); >> f(**size & 0b01, 31, 30**), f(0b011, 29, 27), f(0b00, 25, 24); >> int64_t offset = (adr.target() - pc()) >> 2; >> sf(offset, 23, 5); >> code_section()->relocate(pc(), adr.rspec()); >> return; >> } >> >> f(size, 31, 30); >> f(op, 23, 22); // str >> adr.encode(&current_insn); >> } > > Wang Zhuo has updated the pull request incrementally with one additional commit since the last revision: > > get some comments back @sandlerwang could there be a regression test for this bug? ------------- PR Comment: https://git.openjdk.org/jdk/pull/17482#issuecomment-1906538410 From qamai at openjdk.org Tue Jan 23 17:21:47 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Tue, 23 Jan 2024 17:21:47 GMT Subject: RFR: 8324433: Introduce a way to determine if an expression is evaluated as a constant by the Jit compiler [v5] In-Reply-To: References: Message-ID: > Hi, > > This patch introduces `JitCompiler::isConstantExpression` which can be used to statically determine whether an expression has been constant-folded by the Jit compiler, leading to more constant-folding opportunities. For example, it can be used in `MemorySessionImpl::checkValidStateRaw` to eliminate the lifetime check on global sessions without imposing additional branches on other non-global sessions. This is similar to `__builtin_constant_p` in GCC and clang. > > Please kindly give your opinion as well as your reviews, thanks very much.
Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: ident ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17527/files - new: https://git.openjdk.org/jdk/pull/17527/files/18f7d482..3ecb2c66 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17527&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17527&range=03-04 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/17527.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17527/head:pull/17527 PR: https://git.openjdk.org/jdk/pull/17527 From qamai at openjdk.org Tue Jan 23 17:21:49 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Tue, 23 Jan 2024 17:21:49 GMT Subject: RFR: 8324433: Introduce a way to determine if an expression is evaluated as a constant by the Jit compiler [v3] In-Reply-To: References: <02_Q7SYNI7MYYOeNsq1xGPsOY502JbeXfJyvUGZTtZg=.8a6dcf5c-4dfd-4688-97c1-95497b637cd3@github.com> Message-ID: On Tue, 23 Jan 2024 16:01:05 GMT, Aleksey Shipilev wrote: >> Would it be possible to list further examples where this might be used? Asking because I'm wondering about the usability and maintainability of if-then-else code. > >> Would it be possible to list further examples where this might be used? Asking because I'm wondering about the usability and maintainability of if-then-else code. > > A similar thing is already used in JDK: https://github.com/openjdk/jdk/blob/2a01c798d346656a0ee3553c0964feab75b5dfb6/src/java.base/share/classes/java/lang/invoke/Invokers.java#L622-L624 > > Extending this for more common use allows doing things like optimizing `Integer.toString(int)`: > > > @Stable > static final String[] CONST_STRINGS = {"-1", "0", "1"}; > > @IntrinsicCandidate > public static String toString(int i) { > if (isCompileConstant(i) && (i >= -1) && (i <= 1)) { > return CONST_STRINGS[i + 1]; > } > ... 
> > > Note how this code would fold away to one of the paths, depending on whether the compiler knows it is a constant or not. Generated-code-wise it is a zero-cost thing :) @shipilev Thanks a lot for the detailed reviews and suggestions, I hope I have addressed all of them. @kimbarrett TIL about that builtin, updated the PR description to mention that instead. Thanks very much. @AlanBateman Another potential usage I mentioned in the JBS issue is that `GlobalSession` is noncloseable, but there is no way for the accessor to know that without doing a checkcast. Using this we can eliminate the check if the session is statically known to be a global session without imposing additional checks on other kinds of memory sessions. ------------- PR Comment: https://git.openjdk.org/jdk/pull/17527#issuecomment-1906549618 From shade at openjdk.org Tue Jan 23 17:50:31 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 23 Jan 2024 17:50:31 GMT Subject: RFR: 8324433: Introduce a way to determine if an expression is evaluated as a constant by the Jit compiler [v5] In-Reply-To: References: Message-ID: On Tue, 23 Jan 2024 17:21:47 GMT, Quan Anh Mai wrote: >> Hi, >> >> This patch introduces `JitCompiler::isConstantExpression` which can be used to statically determine whether an expression has been constant-folded by the Jit compiler, leading to more constant-folding opportunities. For example, it can be used in `MemorySessionImpl::checkValidStateRaw` to eliminate the lifetime check on global sessions without imposing additional branches on other non-global sessions. This is similar to `__builtin_constant_p` in GCC and clang. >> >> Please kindly give your opinion as well as your reviews, thanks very much. > > Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: > > ident A few more stylistic comments :) Still thinking the better home for these might be just `jdk.internal.misc.VM`... 
But I would not insist, if others are happy. src/java.base/share/classes/jdk/internal/misc/JitCompiler.java line 56: > 54: */ > 55: @IntrinsicCandidate > 56: public static boolean isCompileConstant(boolean expr) { Here and in other places: probably not `expr`, but just `val` or something? src/java.base/share/classes/jdk/internal/misc/JitCompiler.java line 119: > 117: * @see #isCompileConstant(boolean) > 118: */ > 119: @IntrinsicCandidate Note how the Java entry for MH intrinsic we have replaced had `@Hidden`. These methods should have `@Hidden` too then? Probably applies to other entries too. ------------- PR Review: https://git.openjdk.org/jdk/pull/17527#pullrequestreview-1839475907 PR Review Comment: https://git.openjdk.org/jdk/pull/17527#discussion_r1463705907 PR Review Comment: https://git.openjdk.org/jdk/pull/17527#discussion_r1463703771 From shade at openjdk.org Tue Jan 23 17:50:33 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 23 Jan 2024 17:50:33 GMT Subject: RFR: 8324433: Introduce a way to determine if an expression is evaluated as a constant by the Jit compiler [v3] In-Reply-To: References: <02_Q7SYNI7MYYOeNsq1xGPsOY502JbeXfJyvUGZTtZg=.8a6dcf5c-4dfd-4688-97c1-95497b637cd3@github.com> Message-ID: On Tue, 23 Jan 2024 16:01:05 GMT, Aleksey Shipilev wrote: >> Would it be possible to list further examples where this might be used? Asking because I'm wondering about the usability and maintainability of if-then-else code. > >> Would it be possible to list further examples where this might be used? Asking because I'm wondering about the usability and maintainability of if-then-else code. 
> > A similar thing is already used in JDK: https://github.com/openjdk/jdk/blob/2a01c798d346656a0ee3553c0964feab75b5dfb6/src/java.base/share/classes/java/lang/invoke/Invokers.java#L622-L624 > > Extending this for more common use allows doing things like optimizing `Integer.toString(int)`: > > > @Stable > static final String[] CONST_STRINGS = {"-1", "0", "1"}; > > @IntrinsicCandidate > public static String toString(int i) { > if (isCompileConstant(i) && (i >= -1) && (i <= 1)) { > return CONST_STRINGS[i + 1]; > } > ... > > > Note how this code would fold away to one of the paths, depending on whether the compiler knows it is a constant or not. Generated-code-wise it is a zero-cost thing :) > @shipilev Thanks a lot for the detailed reviews and suggestions, I hope I have addressed all of them. Sure thing, I just effectively merged my draft implementation into yours :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/17527#issuecomment-1906602556 From simonis at openjdk.org Tue Jan 23 18:27:30 2024 From: simonis at openjdk.org (Volker Simonis) Date: Tue, 23 Jan 2024 18:27:30 GMT Subject: RFR: 8324241: Always record evol_method deps to avoid excessive method flushing [v2] In-Reply-To: References: <7YLjZUNrhGYSma_Hop1JnZWj75lytQMnXczCP5JSQQc=.7da0a1c2-423f-4144-be20-2aa3707ce332@github.com> Message-ID: On Mon, 22 Jan 2024 23:38:40 GMT, John R Rose wrote: >> src/hotspot/share/runtime/globals.hpp line 2014: >> >>> 2012: \ >>> 2013: product(bool, AlwaysRecordEvolDependencies, true, DIAGNOSTIC, \ >>> 2014: "Unconditionally record method dependencies on class " \ >> >> "... record compiled method dependencies ..."? > > (yes, "compiled methods" or even "nmethods", or "methods in code cache") Thanks.
I went with "nmethods" because it's the shortest alternative :) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17509#discussion_r1463767652 From duke at openjdk.org Tue Jan 23 18:36:27 2024 From: duke at openjdk.org (Joshua Cao) Date: Tue, 23 Jan 2024 18:36:27 GMT Subject: RFR: 8323220: Reassociate loop invariants involved in Cmps and Add/Subs [v3] In-Reply-To: References: Message-ID: On Tue, 23 Jan 2024 01:08:06 GMT, Xin Liu wrote: >> I am not following. We are explicitly checking for Cmp here. Why do we also need to check if it is a Sub? > > I think CmpNode has the same re-association rule as SubNode. > take your own example, >> inv1 == (x - inv2) => ( inv1 + inv2 ) == x > > Cmp(inv1, (x-inv2)) => Eq(0, Sub(inv1, (x-inv2)) => Eq(0, Sub(inv1+inv2, x)) => Cmp(inv1+inv2, x) > > Originally, n1->is_Sub() covers both CmpNode and SubNode. I don't think you need to split them into 2 cases. > > I think you only need to check whether n1 is a Cmp or a Sub when you are going to return a Cmp/SubNode The rules are not always the same. For example > inv1 - (inv2 - x) => (inv1 - inv2) + x > inv1 == (inv2 - x) => (-inv1 + inv2) == x The signs of inv1 and inv2 are flipped. I don't think we can fold these into a single conditional ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17375#discussion_r1463805762 From simonis at openjdk.org Tue Jan 23 19:00:51 2024 From: simonis at openjdk.org (Volker Simonis) Date: Tue, 23 Jan 2024 19:00:51 GMT Subject: RFR: 8324241: Always record evol_method deps to avoid excessive method flushing [v3] In-Reply-To: References: Message-ID: > Currently we don't record dependencies on redefined methods (i.e. `evol_method` dependencies) in JIT compiled methods if none of the `can_redefine_classes`, `can_retransform_classes` or `can_generate_breakpoint_events` JVMTI capabilities is set.
This means that if a JVMTI agent which requests one of these capabilities is dynamically attached, all the methods which have been JIT compiled until that point will be marked for deoptimization and flushed from the code cache. For large, warmed-up applications this means deoptimization and instant recompilation of thousands if not tens of thousands of methods, which can lead to dramatic performance/latency drops for several minutes. > > One could argue that dynamic agent attach is now deprecated anyway (see [JEP 451: Prepare to Disallow the Dynamic Loading of Agents](https://openjdk.org/jeps/451)) and this problem could be solved by making the recording of `evol_method` dependencies dependent on the new `-XX:+EnableDynamicAgentLoading` flag instead of the concrete JVMTI capabilities (because the presence of the flag indicates that an agent will be loaded eventually). > > But there is a single, however important, exception to this rule and that's JFR. JFR is advertised as a low-overhead profiler which can be enabled in production at any time. However, when JFR is started dynamically (e.g. through JCMD or JMX) it will silently load a HotSpot internal JVMTI agent which requests the `can_retransform_classes` capability and retransforms some classes. This will inevitably trigger the deoptimization of all compiled methods as described above. > > I'd therefore like to propose to *always* and unconditionally record `evol_method` dependencies in JIT compiled code by exporting the relevant properties right at startup in `init_globals()`: > ```c++ > jint init_globals() { > management_init(); > JvmtiExport::initialize_oop_storage(); > +#if INCLUDE_JVMTI > + JvmtiExport::set_can_hotswap_or_post_breakpoint(true); > + JvmtiExport::set_all_dependencies_are_recorded(true); > +#endif > ``` > > My measurements indicate that the overhead of doing so is minimal (around 1% increase of nmethod size) and justifies the benefit. E.g.
a Spring Petclinic application started with `-Xbatch -XX:+UnlockDiagnosticVMOptions -XX:+LogCompilation` compiles about 11500 methods (~9000 with C1 and ~2500 with C2), resulting in an aggregated nmethod size of around 40 MB. Additionally recording `evol_method` dependencies only increases this size by about 400 KB. The ratio remains about the same i... Volker Simonis has updated the pull request incrementally with one additional commit since the last revision: Updated option description and assertion based on review feedback ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17509/files - new: https://git.openjdk.org/jdk/pull/17509/files/6d3e24ab..7b750da5 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17509&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17509&range=01-02 Stats: 2 lines in 2 files changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/17509.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17509/head:pull/17509 PR: https://git.openjdk.org/jdk/pull/17509 From eastigeevich at openjdk.org Tue Jan 23 19:49:28 2024 From: eastigeevich at openjdk.org (Evgeny Astigeevich) Date: Tue, 23 Jan 2024 19:49:28 GMT Subject: RFR: 8324241: Always record evol_method deps to avoid excessive method flushing [v3] In-Reply-To: References: Message-ID: On Tue, 23 Jan 2024 19:00:51 GMT, Volker Simonis wrote: >> Currently we don't record dependencies on redefined methods (i.e. `evol_method` dependencies) in JIT compiled methods if none of the `can_redefine_classes`, `can_retransform_classes` or `can_generate_breakpoint_events` JVMTI capabilities is set. This means that if a JVMTI agent which requests one of these capabilities is dynamically attached, all the methods which have been JIT compiled until that point will be marked for deoptimization and flushed from the code cache.
For large, warmed-up applications this means deoptimization and instant recompilation of thousands if not tens of thousands of methods, which can lead to dramatic performance/latency drops for several minutes. >> >> One could argue that dynamic agent attach is now deprecated anyway (see [JEP 451: Prepare to Disallow the Dynamic Loading of Agents](https://openjdk.org/jeps/451)) and this problem could be solved by making the recording of `evol_method` dependencies dependent on the new `-XX:+EnableDynamicAgentLoading` flag instead of the concrete JVMTI capabilities (because the presence of the flag indicates that an agent will be loaded eventually). >> >> But there is a single, however important, exception to this rule and that's JFR. JFR is advertised as a low-overhead profiler which can be enabled in production at any time. However, when JFR is started dynamically (e.g. through JCMD or JMX) it will silently load a HotSpot internal JVMTI agent which requests the `can_retransform_classes` capability and retransforms some classes. This will inevitably trigger the deoptimization of all compiled methods as described above. >> >> I'd therefore like to propose to *always* and unconditionally record `evol_method` dependencies in JIT compiled code by exporting the relevant properties right at startup in `init_globals()`: >> ```c++ >> jint init_globals() { >> management_init(); >> JvmtiExport::initialize_oop_storage(); >> +#if INCLUDE_JVMTI >> + JvmtiExport::set_can_hotswap_or_post_breakpoint(true); >> + JvmtiExport::set_all_dependencies_are_recorded(true); >> +#endif >> ``` >> >> My measurements indicate that the overhead of doing so is minimal (around 1% increase of nmethod size) and justifies the benefit. E.g. a Spring Petclinic application started with `-Xbatch -XX:+UnlockDiagnosticVMOptions -XX:+LogCompilation` compiles about 11500 methods (~9000 with C1 and ~2500 with C2), resulting in an aggregated nmethod size of around 40 MB.
Additionally recording `evol_method` dependencies only increases this size by about 400 KB.... > > Volker Simonis has updated the pull request incrementally with one additional commit since the last revision: > > Updated option description and assertion based on review feedback lgtm ------------- Marked as reviewed by eastigeevich (Committer). PR Review: https://git.openjdk.org/jdk/pull/17509#pullrequestreview-1839792795 From vlivanov at openjdk.org Tue Jan 23 20:00:29 2024 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Tue, 23 Jan 2024 20:00:29 GMT Subject: RFR: 8324241: Always record evol_method deps to avoid excessive method flushing [v3] In-Reply-To: References: Message-ID: On Tue, 23 Jan 2024 19:00:51 GMT, Volker Simonis wrote: >> Currently we don't record dependencies on redefined methods (i.e. `evol_method` dependencies) in JIT compiled methods if none of the `can_redefine_classes`, `can_retransform_classes` or `can_generate_breakpoint_events` JVMTI capabilities is set. This means that if a JVMTI agent which requests one of these capabilities is dynamically attached, all the methods which have been JIT compiled until that point will be marked for deoptimization and flushed from the code cache. For large, warmed-up applications this means deoptimization and instant recompilation of thousands if not tens of thousands of methods, which can lead to dramatic performance/latency drops for several minutes. >> >> One could argue that dynamic agent attach is now deprecated anyway (see [JEP 451: Prepare to Disallow the Dynamic Loading of Agents](https://openjdk.org/jeps/451)) and this problem could be solved by making the recording of `evol_method` dependencies dependent on the new `-XX:+EnableDynamicAgentLoading` flag instead of the concrete JVMTI capabilities (because the presence of the flag indicates that an agent will be loaded eventually). >> >> But there is a single, however important, exception to this rule and that's JFR.
JFR is advertised as a low-overhead profiler which can be enabled in production at any time. However, when JFR is started dynamically (e.g. through JCMD or JMX) it will silently load a HotSpot-internal JVMTI agent which requests the `can_retransform_classes` capability and retransforms some classes. This will inevitably trigger the deoptimization of all compiled methods as described above. >> >> I'd therefore like to propose to *always* and unconditionally record `evol_method` dependencies in JIT compiled code by exporting the relevant properties right at startup in `init_globals()`: >> ```c++ >> jint init_globals() { >> management_init(); >> JvmtiExport::initialize_oop_storage(); >> +#if INCLUDE_JVMTI >> + JvmtiExport::set_can_hotswap_or_post_breakpoint(true); >> + JvmtiExport::set_all_dependencies_are_recorded(true); >> +#endif >> ``` >> >> My measurements indicate that the overhead of doing so is minimal (around 1% increase of nmethod size) and is justified by the benefit. E.g. a Spring Petclinic application started with `-Xbatch -XX:+UnlockDiagnosticVMOptions -XX:+LogCompilation` compiles about 11500 methods (~9000 with C1 and ~2500 with C2) resulting in an aggregated nmethod size of around 40 MB. Additionally recording `evol_method` dependencies only increases this size by about 400 KB.... > > Volker Simonis has updated the pull request incrementally with one additional commit since the last revision: > > Updated option description and assertion based on review feedback I support keeping the logic under a flag. I have some concerns about unconditionally turning it on. I expect significantly higher footprint overhead when an application has plenty of tiny methods and deep inlining trees. And the java.lang.invoke implementation pushes it even further (with arbitrarily deep MethodHandle trees and unconditional inlining through them), so heavy users of the MethodHandle API should experience higher overheads when evol dependencies are recorded. I suggest making the flag experimental.
Once JFR implementation is improved, it can be superseded by `-XX:+EnableDynamicAgentLoading` check. ------------- PR Comment: https://git.openjdk.org/jdk/pull/17509#issuecomment-1906824455 From psandoz at openjdk.org Tue Jan 23 20:04:29 2024 From: psandoz at openjdk.org (Paul Sandoz) Date: Tue, 23 Jan 2024 20:04:29 GMT Subject: RFR: 8324433: Introduce a way to determine if an expression is evaluated as a constant by the Jit compiler [v5] In-Reply-To: References: Message-ID: On Tue, 23 Jan 2024 17:21:47 GMT, Quan Anh Mai wrote: >> Hi, >> >> This patch introduces `JitCompiler::isConstantExpression` which can be used to statically determine whether an expression has been constant-folded by the Jit compiler, leading to more constant-folding opportunities. For example, it can be used in `MemorySessionImpl::checkValidStateRaw` to eliminate the lifetime check on global sessions without imposing additional branches on other non-global sessions. This is similar to `__builtin_constant_p` in GCC and clang. >> >> Please kindly give your opinion as well as your reviews, thanks very much. > > Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: > > ident src/java.base/share/classes/jdk/internal/misc/JitCompiler.java line 32: > 30: * Just-in-time-compiler-related queries > 31: */ > 32: public class JitCompiler { An alternative name and location is `jdk.internal.vm.ConstantSupport` with initial class doc: Defines methods to test if a value has been evaluated to a compile-time constant value by the HotSpot VM. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17527#discussion_r1463926393 From dlong at openjdk.org Tue Jan 23 20:39:30 2024 From: dlong at openjdk.org (Dean Long) Date: Tue, 23 Jan 2024 20:39:30 GMT Subject: RFR: 8318446: C2: optimize stores into primitive arrays by combining values into larger store In-Reply-To: References: Message-ID: <2qjSLptpXzT1_zz0oD8GOUS0g8XaaHZAPQio1mtuWaQ=.0246cf09-f814-421d-87b5-64b0e06b7f39@github.com> On Tue, 23 Jan 2024 08:48:35 GMT, Emanuel Peter wrote: > Does this satisfy you, or are you worried about other cases too? Yes, that sounds good. Thanks. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16245#issuecomment-1906877762 From duke at openjdk.org Tue Jan 23 22:01:26 2024 From: duke at openjdk.org (Joshua Cao) Date: Tue, 23 Jan 2024 22:01:26 GMT Subject: RFR: 8323220: Reassociate loop invariants involved in Cmps and Add/Subs [v3] In-Reply-To: References: Message-ID: On Tue, 23 Jan 2024 15:15:47 GMT, Emanuel Peter wrote: >> [notEqualsInvariantSubVariantLong.txt](https://github.com/openjdk/jdk/files/14015703/notEqualsInvariantSubVariantLong.txt) >> >> Attaching the IdealGraphVisualizer file (renamed to `.txt` cause GitHub wants that). >> >> When we use `Argument.NUMBER_42`, we can avoid generating traps. When `i == 0` we enter the if statement, otherwise we exit. >> >> If we use random numbers, we most likely never enter the if statement so we can generate a trap there. In this case, the trap block as an extra `SubL` for `inv1 - i`. > > @caojoshua I see. That is a sad limitation. You could now add a `@Run` statement, but that is overkill as well. I hope to improve the IR framework to make it a bit easier to pass better arguments soon. For now I think we could just leave it with constants, for simplicity. > Can you randomize just one value and use them for both inv1 and inv2? Yeah we can just take a single argument `inv1`. I'd prefer having two arguments and leaving them constant. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17375#discussion_r1464045674 From qamai at openjdk.org Tue Jan 23 22:44:28 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Tue, 23 Jan 2024 22:44:28 GMT Subject: RFR: 8324433: Introduce a way to determine if an expression is evaluated as a constant by the Jit compiler [v5] In-Reply-To: References: Message-ID: On Tue, 23 Jan 2024 17:40:52 GMT, Aleksey Shipilev wrote: >> Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: >> >> ident > > src/java.base/share/classes/jdk/internal/misc/JitCompiler.java line 119: > >> 117: * @see #isCompileConstant(boolean) >> 118: */ >> 119: @IntrinsicCandidate > > Note how the Java entry for MH intrinsic we have replaced had `@Hidden`. These methods should have `@Hidden` too then? Probably applies to other entries too. I don't understand why this needs to be `@Hidden`, the javadoc says that a function annotated with `@Hidden` is omitted from the stacktraces. This function does not call anything so what is the point of hiding it? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17527#discussion_r1464081953 From qamai at openjdk.org Tue Jan 23 22:49:28 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Tue, 23 Jan 2024 22:49:28 GMT Subject: RFR: 8324433: Introduce a way to determine if an expression is evaluated as a constant by the Jit compiler [v5] In-Reply-To: References: Message-ID: <_pAlUJwzkoFkCnQW_IQK-zUkUMMjq6KjZoDldS34CyA=.984549da-d6de-4977-a87f-18a33d58824d@github.com> On Tue, 23 Jan 2024 17:42:40 GMT, Aleksey Shipilev wrote: >> Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: >> >> ident > > src/java.base/share/classes/jdk/internal/misc/JitCompiler.java line 56: > >> 54: */ >> 55: @IntrinsicCandidate >> 56: public static boolean isCompileConstant(boolean expr) { > > Here and in other places: probably not `expr`, but just `val` or something? I think of this as an expression that is always evaluated to the same value. The value itself is not interesting, it is the set of values that this expression can take that we are talking about. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17527#discussion_r1464085126 From qamai at openjdk.org Tue Jan 23 22:52:27 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Tue, 23 Jan 2024 22:52:27 GMT Subject: RFR: 8324433: Introduce a way to determine if an expression is evaluated as a constant by the Jit compiler [v5] In-Reply-To: References: Message-ID: <81tjQoutCZRej3wZAnPDJIq31hz7D7tbiJLWyWpXXv0=.5786bc11-7aa2-4290-a1d5-37c82452ed41@github.com> On Tue, 23 Jan 2024 20:01:45 GMT, Paul Sandoz wrote: >> Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: >> >> ident > > src/java.base/share/classes/jdk/internal/misc/JitCompiler.java line 32: > >> 30: * Just-in-time-compiler-related queries >> 31: */ >> 32: public class JitCompiler { > > An alternative name and location is `jdk.internal.vm.ConstantSupport` with initial class doc: > > Defines methods to test if a value has been evaluated to a compile-time constant value by the HotSpot VM. That sounds like a better name for the class, although I think `jdk.internal.misc` is more suitable than `jdk.internal.vm`. Do you have any preference? Thanks. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17527#discussion_r1464087772 From dlong at openjdk.org Tue Jan 23 23:39:26 2024 From: dlong at openjdk.org (Dean Long) Date: Tue, 23 Jan 2024 23:39:26 GMT Subject: RFR: 8324241: Always record evol_method deps to avoid excessive method flushing [v2] In-Reply-To: References: Message-ID: <95WlbkH5Kqt53YrRttF5wzO8vjpZbJPetFlVEOR9Q-s=.8f6d6359-9705-457b-bb13-4621f1bb3b1d@github.com> On Tue, 23 Jan 2024 16:37:35 GMT, Volker Simonis wrote: >> src/hotspot/share/runtime/init.cpp line 121: >> >>> 119: if (AlwaysRecordEvolDependencies) { >>> 120: JvmtiExport::set_can_hotswap_or_post_breakpoint(true); >>> 121: JvmtiExport::set_all_dependencies_are_recorded(true); >> >> I think the check for AlwaysRecordEvolDependencies needs to be moved into set_can_hotswap_or_post_breakpoint and set_all_dependencies_are_recorded, otherwise don't we risk the value being accidentally reset to false when set_can_hotswap_or_post_breakpoint() is called again by JvmtiManageCapabilities::update()? > > A good question, but after deep digging (it took me quite some time to figure this out myself :) I don't think that `can_hotswap_or_post_breakpoint`/`all_dependencies_are_recorded` can ever be reset. Here's how it works: > > - there's a global set of `always_capabilities` which is initialized when the first JVMTI environment is created: > > // capabilities which are always potentially available > jvmtiCapabilities JvmtiManageCapabilities::always_capabilities; > > void JvmtiManageCapabilities::initialize() { > _capabilities_lock = new Mutex(Mutex::nosafepoint, "Capabilities_lock"); > always_capabilities = init_always_capabilities(); > > > - as the name implies, this set of capabilities contains all generally available capabilities (except for the `onload` and `solo` capabilites, which are maintained in separate sets). 
> - `JvmtiManageCapabilities::update()` only operates on this global sets of capabilites (and *not* on the concrete capabilities of a specific JVMTI environment): > > void JvmtiManageCapabilities::update() { > jvmtiCapabilities avail; > // all capabilities > either(&always_capabilities, &always_solo_capabilities, &avail); > ... > // If can_redefine_classes is enabled in the onload phase then we know that the > // dependency information recorded by the compiler is complete. > if ((avail.can_redefine_classes || avail.can_retransform_classes) && > JvmtiEnv::get_phase() == JVMTI_PHASE_ONLOAD) { > JvmtiExport::set_all_dependencies_are_recorded(true); > } > ... > JvmtiExport::set_can_hotswap_or_post_breakpoint( > avail.can_generate_breakpoint_events || > avail.can_redefine_classes || > avail.can_retransform_classes); > > - This means that `JvmtiManageCapabilities::update()` is always conservative regarding its exports to `JvmtiExport` as it always exports *all* potentially available capabilites once a JVMTI environment is created and not only the specific capabilities requested by concrete JVMTI environment. This means that even if we create a JVMTI agent with an empty set of capabilities, `can_hotswap_or_post_breakpoint`/`all_dependencies_are_recorded` will still be set in `JvmtiExport`. > - There's no code path (at least I couldn't find one) which takes capabilities away from the `always_capabilities` set. Only if an environment requests `on_load` capabilities, they will be transferred from the global `onload_capabilities` to the `always_capabilities` set: > > jvmtiError JvmtiManageCapabilities::add_capabilities(const jvmtiCapabilities *current, > ... An assert works for me. Thanks. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17509#discussion_r1464120039 From wzhuo at openjdk.org Wed Jan 24 02:05:26 2024 From: wzhuo at openjdk.org (Wang Zhuo) Date: Wed, 24 Jan 2024 02:05:26 GMT Subject: RFR: 8324123: aarch64: fix prfm literal encoding in assembler [v4] In-Reply-To: References: Message-ID: On Tue, 23 Jan 2024 17:11:32 GMT, Emanuel Peter wrote: > @sandlerwang could there be a regression test for this bug? Sorry, I tried but could not come up with a test for this bug, because the prfm literal encoding is not used in the current tip. I hit the bug when I tried to use prfm to prefetch instructions, but there is no similar case in tip. ------------- PR Comment: https://git.openjdk.org/jdk/pull/17482#issuecomment-1907226231 From xliu at openjdk.org Wed Jan 24 03:18:28 2024 From: xliu at openjdk.org (Xin Liu) Date: Wed, 24 Jan 2024 03:18:28 GMT Subject: RFR: 8323220: Reassociate loop invariants involved in Cmps and Add/Subs [v3] In-Reply-To: References: Message-ID: <6DVcORGbZtAwJjbI6yQItDdL42FzTa-1Xs4h7Z---8g=.1e009c86-abbb-43e8-96f4-e1c0c7bfae11@github.com> On Tue, 23 Jan 2024 18:34:13 GMT, Joshua Cao wrote: >> I think CmpNode has the same re-association rule as SubNode. >> Take your own example, >>> inv1 == (x - inv2) => ( inv1 + inv2 ) == x >> >> Cmp(inv1, (x-inv2)) => Eq(0, Sub(inv1, (x-inv2)) => Eq(0, Sub(inv1+inv2, x)) => Cmp(inv1+inv2, x) >> >> Originally, n1->is_Sub() covers both CmpNode and SubNode. I don't think you need to split them into 2 cases. >> >> I think you only need to check whether n1 is a Cmp or a Sub when you are going to return a Cmp/SubNode > > The rules are not always the same. For example > >> inv1 - (inv2 - x) => (inv1 - inv2) + x >> inv1 == (inv2 - x) => (-inv1 + inv2) == x > > The signs of inv1 and inv2 are flipped. I don't think we can fold these into a single conditional Okay, I see what you mean.
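The sign difference discussed in this exchange can be checked mechanically. What follows is only a minimal sketch, not code from the patch: the class `ReassociateCheck` and its method names are hypothetical, and it assumes plain Java `int` arithmetic with two's-complement wraparound. It tests the Sub rule and the three Cmp rules from the PR description, and shows that reusing the Sub-style signs for `inv1 == (inv2 - x)` would give a wrong answer.

```java
import java.util.Random;

// Hypothetical helper, for illustration only; not part of the C2 patch.
public class ReassociateCheck {
    // inv1 - (inv2 - x)  =>  (inv1 - inv2) + x
    static boolean subRuleHolds(int inv1, int inv2, int x) {
        return inv1 - (inv2 - x) == (inv1 - inv2) + x;
    }

    // inv1 == (x + inv2)  =>  (inv1 - inv2) == x
    static boolean cmpAddRuleHolds(int inv1, int inv2, int x) {
        return (inv1 == x + inv2) == (inv1 - inv2 == x);
    }

    // inv1 == (x - inv2)  =>  (inv1 + inv2) == x
    static boolean cmpSubRuleHolds(int inv1, int inv2, int x) {
        return (inv1 == x - inv2) == (inv1 + inv2 == x);
    }

    // inv1 == (inv2 - x)  =>  (-inv1 + inv2) == x
    static boolean cmpNegRuleHolds(int inv1, int inv2, int x) {
        return (inv1 == inv2 - x) == (-inv1 + inv2 == x);
    }

    // Deliberately wrong variant: applies the Sub-style signs (inv1 - inv2)
    // to the comparison inv1 == (inv2 - x).
    static boolean cmpWithSubSignsHolds(int inv1, int inv2, int x) {
        return (inv1 == inv2 - x) == (inv1 - inv2 == x);
    }

    public static void main(String[] args) {
        Random rnd = new Random(42); // fixed seed so runs are repeatable
        for (int i = 0; i < 1_000_000; i++) {
            int inv1 = rnd.nextInt();
            int inv2 = rnd.nextInt();
            int x = rnd.nextInt();
            if (!(subRuleHolds(inv1, inv2, x)
                    && cmpAddRuleHolds(inv1, inv2, x)
                    && cmpSubRuleHolds(inv1, inv2, x)
                    && cmpNegRuleHolds(inv1, inv2, x))) {
                throw new AssertionError("rule violated for " + inv1 + ", " + inv2 + ", " + x);
            }
        }
        System.out.println("All four rules held on the sampled triples.");
        // For (inv1, inv2, x) = (1, 3, 2): 1 == 3 - 2 is true, but 1 - 3 == 2 is false.
        System.out.println("Sub-style signs valid for (1, 3, 2)? "
                + cmpWithSubSignsHolds(1, 3, 2));
    }
}
```

The triple `(1, 3, 2)` is enough to show why the two cases cannot share one sign pattern: the comparison is true before the rewrite and false after it when the Sub-style signs are used.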
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17375#discussion_r1464245651 From xliu at openjdk.org Wed Jan 24 03:24:27 2024 From: xliu at openjdk.org (Xin Liu) Date: Wed, 24 Jan 2024 03:24:27 GMT Subject: RFR: 8323220: Reassociate loop invariants involved in Cmps and Add/Subs [v4] In-Reply-To: References: Message-ID: On Mon, 22 Jan 2024 21:58:42 GMT, Joshua Cao wrote: >> // inv1 == (x + inv2) => ( inv1 - inv2 ) == x >> // inv1 == (x - inv2) => ( inv1 + inv2 ) == x >> // inv1 == (inv2 - x) => (-inv1 + inv2 ) == x >> >> >> For example, >> >> >> fn(inv1, inv2) >> while(...) >> x = foobar() >> if inv1 == x + inv2 >> blackhole() >> >> >> We can transform this into >> >> >> fn(inv1, inv2) >> t = inv1 - inv2 >> while(...) >> x = foobar() >> if t == x >> blackhole() >> >> >> Here is an example: https://github.com/openjdk/jdk/blob/b78896b9aafcb15f453eaed6e154a5461581407b/src/java.base/share/classes/java/lang/invoke/LambdaFormEditor.java#L910. LHS `1` and RHS `pos` are both loop invariant >> >> Passes tier1 locally on Linux machine. Passes GHA on my fork. > > Joshua Cao has updated the pull request incrementally with one additional commit since the last revision: > > Formatting and fix typo LTGM. I am not a reviewer. just one suggestion. Should we also update the comments of 'reassociate' ? this line: > For add/sub expressions: see "reassociate_add_sub" ------------- Marked as reviewed by xliu (Committer). PR Review: https://git.openjdk.org/jdk/pull/17375#pullrequestreview-1840319203 From jkarthikeyan at openjdk.org Wed Jan 24 04:59:27 2024 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Wed, 24 Jan 2024 04:59:27 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v4] In-Reply-To: References: Message-ID: On Mon, 22 Jan 2024 19:58:50 GMT, Quan Anh Mai wrote: >> Hi, >> >> This patch adds unsigned bounds and known bits constraints to TypeInt and TypeLong. 
This opens more transformation opportunities in an elegant manner as well as helps avoid some ad-hoc rules in Hotspot. >> >> In general, a TypeInt/Long represents a set of values x that satisfies: x s>= lo && x s<= hi && x u>= ulo && x u<= uhi && (x & zeros) == 0 && (~x & ones) == 0. These constraints are not independent, e.g. an int that lies in [0, 3] in signed domain must also lie in [0, 3] in unsigned domain and have all bits but the last 2 being unset. As a result, we must normalize the constraints (tighten the constraints so that they are optimal) before constructing a TypeInt/Long instance. >> >> This is extracted from #15440 , node value transformations are left for later PRs. I have also added unit tests to verify the soundness of constraint normalization. >> >> Please kindly review, thanks a lot. >> >> Testing >> >> - [x] GHA >> - [x] Linux x64, tier 1-4 > > Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: > > fix release build I took a quick look through the patch, this is really impressive :) Early last year I had [an attempt at the same idea](https://github.com/openjdk/jdk/compare/master...jaskarth:jdk:bit-tracking) (extremely rough patch, sorry), where I went with the approach of using a 2-bit value for each bit position to represent `0`, `1`, `BOTTOM`, and `TOP`. My general idea was to create a boolean lattice so that the meet() and dual() operations were easier to implement, before I realized how difficult reasoning about multiple constraints in the meet and dual operations was. I think your idea of marking the dual makes more sense and is cleaner, especially with how the constraints interact. 
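As a rough illustration of the constraint set quoted above, the following sketch spells out the membership predicate in Java. It is not HotSpot code: the class `IntConstraintSketch` and the flat parameter list are assumptions made for this example, and the real `TypeInt` is a C++ class with additional state. The `main` method replays the `[0, 3]` example from the description, where a signed range already implies the unsigned range and the known-zero bits.

```java
// Hypothetical Java rendering of the quoted constraints; not the C2 implementation.
public class IntConstraintSketch {
    // x s>= lo && x s<= hi && x u>= ulo && x u<= uhi
    //   && (x & zeros) == 0 && (~x & ones) == 0
    static boolean contains(int x, int lo, int hi, int ulo, int uhi, int zeros, int ones) {
        return x >= lo && x <= hi
                && Integer.compareUnsigned(x, ulo) >= 0
                && Integer.compareUnsigned(x, uhi) <= 0
                && (x & zeros) == 0    // bits set in 'zeros' must be 0 in x
                && (~x & ones) == 0;   // bits set in 'ones' must be 1 in x
    }

    public static void main(String[] args) {
        // Signed range [0, 3] with no unsigned or bit constraints
        // (unsigned range [0, 0xFFFFFFFF], no known bits).
        // Normalized form: unsigned [0, 3], all bits except the low two known zero.
        for (int x = -100_000; x <= 100_000; x++) {
            boolean signedOnly = contains(x, 0, 3, 0, -1, 0, 0);
            boolean normalized = contains(x, 0, 3, 0, 3, ~3, 0);
            if (signedOnly != normalized) {
                throw new AssertionError("constraint sets disagree at " + x);
            }
        }
        System.out.println("Signed-only and normalized constraints agree on the sampled range.");
    }
}
```

Both parameter sets describe the same values here; normalization only makes the implied unsigned bounds and known bits explicit so that later transformations can read them directly.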
src/hotspot/share/opto/type.cpp line 1642: > 1640: > 1641: const Type* TypeInt::widen(const Type* old, const Type* limit) const { > 1642: assert(!_dual, ""); I think it'd be helpful for these `!_dual` asserts to have a message, even if it's something simple like `dual not expected here` src/hotspot/share/opto/type.hpp line 617: > 615: bool is_con(jint i) const { return is_con() && _lo == i; } > 616: jint get_con() const { assert(is_con(), ""); return _lo; } > 617: bool contains(jint i) const; I think the `contains` and `properly_contains` functions could use some documentation as well to explain under what conditions they return true or false. For `contains(TypeInt* t)`, would it return the same as `t->higher_equal(this)`? If so, that might simplify the implementation of it. ------------- PR Review: https://git.openjdk.org/jdk/pull/17508#pullrequestreview-1839638653 PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r1463870704 PR Review Comment: https://git.openjdk.org/jdk/pull/17508#discussion_r1463814326 From dholmes at openjdk.org Wed Jan 24 06:30:27 2024 From: dholmes at openjdk.org (David Holmes) Date: Wed, 24 Jan 2024 06:30:27 GMT Subject: RFR: 8324433: Introduce a way to determine if an expression is evaluated as a constant by the Jit compiler [v5] In-Reply-To: <_pAlUJwzkoFkCnQW_IQK-zUkUMMjq6KjZoDldS34CyA=.984549da-d6de-4977-a87f-18a33d58824d@github.com> References: <_pAlUJwzkoFkCnQW_IQK-zUkUMMjq6KjZoDldS34CyA=.984549da-d6de-4977-a87f-18a33d58824d@github.com> Message-ID: <5AWq0nDx_AQPwnEp1cMisZ6ytn2ieq9FHDwDQp5A4QQ=.5043ac3e-04bf-4fd8-a680-448f392e5cb1@github.com> On Tue, 23 Jan 2024 22:46:20 GMT, Quan Anh Mai wrote: >> src/java.base/share/classes/jdk/internal/misc/JitCompiler.java line 56: >> >>> 54: */ >>> 55: @IntrinsicCandidate >>> 56: public static boolean isCompileConstant(boolean expr) { >> >> Here and in other places: probably not `expr`, but just `val` or something? 
> > I think of this as an expression that is always evaluated to the same value. The value itself is not interesting, it is the set of values that this expression can take that we are talking about. This seems really weird to me for Java code. The method doesn't get the original "expression" it only gets the value of that expression after it has been evaluated. Is there some kind of weird "magic" happening here? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17527#discussion_r1464361310 From qamai at openjdk.org Wed Jan 24 07:17:27 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Wed, 24 Jan 2024 07:17:27 GMT Subject: RFR: 8324433: Introduce a way to determine if an expression is evaluated as a constant by the Jit compiler [v5] In-Reply-To: <5AWq0nDx_AQPwnEp1cMisZ6ytn2ieq9FHDwDQp5A4QQ=.5043ac3e-04bf-4fd8-a680-448f392e5cb1@github.com> References: <_pAlUJwzkoFkCnQW_IQK-zUkUMMjq6KjZoDldS34CyA=.984549da-d6de-4977-a87f-18a33d58824d@github.com> <5AWq0nDx_AQPwnEp1cMisZ6ytn2ieq9FHDwDQp5A4QQ=.5043ac3e-04bf-4fd8-a680-448f392e5cb1@github.com> Message-ID: <9iDFu8I4w_i1Uso5q7oEi0Le1JvgDNgNyuSZlmKQiuE=.5739d448-fc73-4bcf-bec8-26b3a1b75d21@github.com> On Wed, 24 Jan 2024 06:27:20 GMT, David Holmes wrote: >> I think of this as an expression that is always evaluated to the same value. The value itself is not interesting, it is the set of values that this expression can take that we are talking about. > > This seems really weird to me for Java code. The method doesn't get the original "expression" it only gets the value of that expression after it has been evaluated. Is there some kind of weird "magic" happening here? @dholmes-ora Indeed it's a compiler magic, albeit not really weird. While the method execution only receives the evaluated value of `expr`, the method compilation has the expression in its original form. As a result, it can determine the result based on this information. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17527#discussion_r1464415357 From epeter at openjdk.org Wed Jan 24 08:45:31 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 24 Jan 2024 08:45:31 GMT Subject: RFR: 8324123: aarch64: fix prfm literal encoding in assembler [v4] In-Reply-To: References: Message-ID: On Tue, 23 Jan 2024 10:03:38 GMT, Wang Zhuo wrote: >> The current prfm literal mode encoding in the aarch64 assembler is not correct. >> The prfm_literal instruction requires bits 31 and 30 to be 0b11, while the current assembler encodes the two bits as 0b01, which is an ldr instruction, not prfm. >> For example, if adding the following code in stubGenerator >> __ prfm(Address(__ pc())) >> we get a ldr instruction like >> ldr x0, 0x0000ffff83f8539c >> but it should be a prfm instruction like >> prfm pldl1keep, 0x0000ffff8ff8539c >> >> The bug is in ld_st2: in literal mode, bits 31 and 30 are set to (size & 0b01), while for prfm instructions bits 31 and 30 must be 0b11. >> void ld_st2(Register Rt, const Address &adr, int size, int op, int V = 0) { >> starti; >> >> f(V, 26); // general reg? >> zrf(Rt, 0); >> >> // Encoding for literal loads is done here (rather than pushed >> // down into Address::encode) because the encoding of this >> // instruction is too different from all of the other forms to >> // make it worth sharing.
>> if (adr.getMode() == Address::literal) { >> assert(size == 0b10 || size == 0b11, "bad operand size in ldr"); >> assert(op == 0b01, "literal form can only be used with loads"); >> f(**size & 0b01, 31, 30**), f(0b011, 29, 27), f(0b00, 25, 24); >> int64_t offset = (adr.target() - pc()) >> 2; >> sf(offset, 23, 5); >> code_section()->relocate(pc(), adr.rspec()); >> return; >> } >> >> f(size, 31, 30); >> f(op, 23, 22); // str >> adr.encode(&current_insn); >> } > > Wang Zhuo has updated the pull request incrementally with one additional commit since the last revision: > > get some comments back src/hotspot/cpu/aarch64/assembler_aarch64.cpp line 192: > 190: // This encoding is similar (but not quite identical) to the encoding used > 191: // by literal ld/st. see JDK-8324123. > 192: // FIXME: PRFM should not be used with writeback modes, but the assembler FIXME: is it ok to leave this in the code? I think we prefer filed RFE's to comments in the code that nobody will ever look at again. You can put the RFE number in the code though. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17482#discussion_r1464525152 From shade at openjdk.org Wed Jan 24 08:58:30 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 24 Jan 2024 08:58:30 GMT Subject: RFR: 8324433: Introduce a way to determine if an expression is evaluated as a constant by the Jit compiler [v5] In-Reply-To: References: Message-ID: On Tue, 23 Jan 2024 22:41:44 GMT, Quan Anh Mai wrote: >> src/java.base/share/classes/jdk/internal/misc/JitCompiler.java line 119: >> >>> 117: * @see #isCompileConstant(boolean) >>> 118: */ >>> 119: @IntrinsicCandidate >> >> Note how the Java entry for MH intrinsic we have replaced had `@Hidden`. These methods should have `@Hidden` too then? Probably applies to other entries too. > > I don't understand why this needs to be `@Hidden`, the javadoc says that a function annotated with `@Hidden` is omitted from the stacktraces.
This function does not call anything so what is the point of hiding it? I suspect there is a code that counts stack traces somewhere that relies on it in MH parts. There is no harm for doing `@Hidden` here, I think. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17527#discussion_r1464541674 From epeter at openjdk.org Wed Jan 24 09:08:30 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 24 Jan 2024 09:08:30 GMT Subject: RFR: 8323220: Reassociate loop invariants involved in Cmps and Add/Subs [v4] In-Reply-To: References: Message-ID: <9my9QvARsJXotzo7rit4d3CeE9k8-NOoml4kXdmYzvU=.f5748f8a-8b57-451a-8350-cab6b52e83c7@github.com> On Mon, 22 Jan 2024 21:58:42 GMT, Joshua Cao wrote: >> // inv1 == (x + inv2) => ( inv1 - inv2 ) == x >> // inv1 == (x - inv2) => ( inv1 + inv2 ) == x >> // inv1 == (inv2 - x) => (-inv1 + inv2 ) == x >> >> >> For example, >> >> >> fn(inv1, inv2) >> while(...) >> x = foobar() >> if inv1 == x + inv2 >> blackhole() >> >> >> We can transform this into >> >> >> fn(inv1, inv2) >> t = inv1 - inv2 >> while(...) >> x = foobar() >> if t == x >> blackhole() >> >> >> Here is an example: https://github.com/openjdk/jdk/blob/b78896b9aafcb15f453eaed6e154a5461581407b/src/java.base/share/classes/java/lang/invoke/LambdaFormEditor.java#L910. LHS `1` and RHS `pos` are both loop invariant >> >> Passes tier1 locally on Linux machine. Passes GHA on my fork. > > Joshua Cao has updated the pull request incrementally with one additional commit since the last revision: > > Formatting and fix typo src/hotspot/share/opto/loopTransform.cpp line 333: > 331: // inv1 == (inv2 - x) => (-inv1 + inv2 ) == x > 332: // > 333: Node* IdealLoopTree::reassociate_add_sub_cmp(Node* n1, int inv1_idx, int inv2_idx, PhaseIdealLoop* phase) { as @navyxliu said, you should also update any comments with the old name. 
One thing I have spotted is this: `//---------------------reassociate_add_sub------------------------` ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17375#discussion_r1464554665 From shade at openjdk.org Wed Jan 24 09:06:29 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 24 Jan 2024 09:06:29 GMT Subject: RFR: 8324433: Introduce a way to determine if an expression is evaluated as a constant by the Jit compiler [v5] In-Reply-To: <81tjQoutCZRej3wZAnPDJIq31hz7D7tbiJLWyWpXXv0=.5786bc11-7aa2-4290-a1d5-37c82452ed41@github.com> References: <81tjQoutCZRej3wZAnPDJIq31hz7D7tbiJLWyWpXXv0=.5786bc11-7aa2-4290-a1d5-37c82452ed41@github.com> Message-ID: On Tue, 23 Jan 2024 22:49:49 GMT, Quan Anh Mai wrote: >> src/java.base/share/classes/jdk/internal/misc/JitCompiler.java line 32: >> >>> 30: * Just-in-time-compiler-related queries >>> 31: */ >>> 32: public class JitCompiler { >> >> An alternative name and location is `jdk.internal.vm.ConstantSupport` with initial class doc: >> >> Defines methods to test if a value has been evaluated to a compile-time constant value by the HotSpot VM. > > That sounds like a better name for the class, although I think `jdk.internal.misc` is more suitable than `jdk.internal.vm`. Do you have any preference? Thanks. +1 to `ConstantSupport`. I think `jdk.internal.vm` is a proper place for it. There is adjacent `jdk.internal.vm.vector.VectorSupport`, and whole `jdk.internal.vm.annotations` package is there too. `jdk.internal.misc` sounds like a place for utility classes. `Unsafe` is a historical exception, I think. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17527#discussion_r1464551793 From dfenacci at openjdk.org Wed Jan 24 09:31:38 2024 From: dfenacci at openjdk.org (Damon Fenacci) Date: Wed, 24 Jan 2024 09:31:38 GMT Subject: RFR: JDK-8317299: safepoint scalarization doesn't keep track of the depth of the JVM state Message-ID: # Issue The origin of the problem is tied to the fact that, when C2 optimizes vector boxes, it performs safepoint object scalarization before late inlining. This can lead to situations in which scalarization adds scalarized values to the JVM state and late inlining of further methods adds further JVM state entries on top for each inlined method. With the example of the reported bug (_TestIntrinsicBailOut.java_) we get to a situation like this: ... bc: JVMS depth=6 loc=20 stk=23 arg=23 mon=23 scalar=23 end=23 mondepth=0 sp=0 bci=1 reexecute=false method=virtual jobject jdk.incubator.vector.ByteVector.rearrangeTemplate(jobject, jobject) bc: JVMS depth=7 loc=23 stk=27 arg=27 mon=27 scalar=27 end=27 mondepth=0 sp=0 bci=36 reexecute=false method=virtual jobject jdk.incubator.vector.AbstractShuffle.checkIndexes() bc: JVMS depth=8 loc=27 stk=28 arg=28 mon=28 scalar=28 end=28 mondepth=0 sp=0 bci=1 reexecute=false method=virtual jobject jdk.incubator.vector.AbstractShuffle.reorder() bc: JVMS depth=9 loc=28 stk=29 arg=29 mon=29 scalar=29 end=31 mondepth=0 sp=0 bci=1 reexecute=false method=virtual jobject jdk.internal.vm.vector.VectorSupport$VectorPayload.getPayload() bc: JVMS depth=10 loc=31 stk=32 arg=32 mon=32 scalar=32 end=32 mondepth=0 sp=0 bci=3 reexecute=false method=static jobject jdk.internal.vm.vector.VectorSupport.maybeRebox(jobject) bc: JVMS depth=11 loc=32 stk=33 arg=33 mon=33 scalar=33 end=33 mondepth=0 sp=0 bci=1 reexecute=false method=virtual void jdk.internal.misc.Unsafe.loadFence() `JVMS depth=9` shows 2 scalars but 2 further inlines added 2 more JVM states (with no scalars). 
The corresponding node looks like this: (image not included in the archive) To keep track of its scalarized inputs, `SafePointScalarObjectNode` keeps a field `_first_index`, which is supposed to be "relative to the last (youngest) jvms->_scloff"... https://github.com/openjdk/jdk/blob/c5e72450966ad50d57a8d22e9d634bfcb319aee9/src/hotspot/share/opto/callnode.hpp#L509-L511 but if there are late inlined methods, this field is going to be relative to the JVM state at the depth before inlining happened (e.g. depth=9 in the example) and not relative to the youngest depth. # Solution In order to keep track of the right depth a `_depth` field is added to `SafePointScalarObjectNode`, which refers to the depth of the JVM state the `_first_index` field refers to. The method `uint first_index(JVMState* jvms)` is adapted accordingly. ------------- Commit messages: - JDK-8317299: fix copyright year - JDK-8317299: add comment for depth field - JDK-8317299: fix test - JDK-8317299: assert(local) failed: use _top instead of null Changes: https://git.openjdk.org/jdk/pull/17500/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=17500&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8317299 Stats: 22 lines in 5 files changed: 13 ins; 0 del; 9 mod Patch: https://git.openjdk.org/jdk/pull/17500.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17500/head:pull/17500 PR: https://git.openjdk.org/jdk/pull/17500 From rcastanedalo at openjdk.org Wed Jan 24 09:39:28 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Wed, 24 Jan 2024 09:39:28 GMT Subject: RFR: 8322996: BoxLockNode creation fails with assert(reg < CHUNK_SIZE) failed: sanity [v4] In-Reply-To: References: Message-ID: On Tue, 23 Jan 2024 10:14:00 GMT, Daniel Lundén wrote: >> This changeset fixes an issue where deeply nested synchronized statements triggered an assert in C2. >> >> Changes: >> - Bail out on compilation when we create a `BoxLockNode` with a slot index that cannot fit in a `RegMask`.
This is similar to how we handle the case when we do not have space to represent arguments in [`opto/matcher.cpp`](https://github.com/openjdk/jdk/blob/58b01dce054c50bcb5a28aad4c1b574acaa90f6d/src/hotspot/share/opto/matcher.cpp#L314-L318) >> - Generalize `RegMask::can_represent` to take an additional and optional size argument to facilitate reuse. The default size value, 1, corresponds to the previous functionality. Rewrite `can_represent_arg` to directly call `can_represent(reg, SlotsPerVecZ)`. >> - Add a regression test. >> >> Testing: >> - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/7612890820) >> - tier1 to tier5 on windows-x64, linux-x64, linux-aarch64, macosx-x64, and macosx-aarch64. >> - The new regression test in all tier1 to tier10 contexts on windows-x64, linux-x64, linux-aarch64, macosx-x64, and macosx-aarch64. > > Daniel Lundén has updated the pull request incrementally with two additional commits since the last revision: > > - Update missed copyright > - Refactor The fix itself looks good to me. Would it make sense, for better coverage, to add a couple of additional test cases that exercise the boundaries of the condition that is tested? E.g. one with one `synchronized` statement less than the current one and one with one `synchronized` statement more. test/hotspot/jtreg/compiler/locks/TestNestedSynchronize.java line 34: > 32: */ > 33: > 34: package compiler.c2; Update package name to `compiler.locks`. ------------- Changes requested by rcastanedalo (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/17370#pullrequestreview-1840875518 PR Review Comment: https://git.openjdk.org/jdk/pull/17370#discussion_r1464590718 From dlunden at openjdk.org Wed Jan 24 09:47:27 2024 From: dlunden at openjdk.org (Daniel Lundén) Date: Wed, 24 Jan 2024 09:47:27 GMT Subject: RFR: 8322996: BoxLockNode creation fails with assert(reg < CHUNK_SIZE) failed: sanity [v4] In-Reply-To: References: Message-ID: On Wed, 24 Jan 2024 09:22:50 GMT, Roberto Castañeda Lozano wrote: >> Daniel Lundén has updated the pull request incrementally with two additional commits since the last revision: >> >> - Update missed copyright >> - Refactor > > test/hotspot/jtreg/compiler/locks/TestNestedSynchronize.java line 34: > >> 32: */ >> 33: >> 34: package compiler.c2; > > Update package name to `compiler.locks`. Oops, thanks. Updating and rerunning tests. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17370#discussion_r1464631044 From dlunden at openjdk.org Wed Jan 24 09:53:27 2024 From: dlunden at openjdk.org (Daniel Lundén) Date: Wed, 24 Jan 2024 09:53:27 GMT Subject: RFR: 8322996: BoxLockNode creation fails with assert(reg < CHUNK_SIZE) failed: sanity [v4] In-Reply-To: References: Message-ID: On Wed, 24 Jan 2024 09:36:28 GMT, Roberto Castañeda Lozano wrote: > The fix itself looks good to me. Would it make sense, for better coverage, to add a couple of additional test cases that exercise the boundaries of the condition that is tested? E.g. one with one `synchronized` statement less than the current one and one with one `synchronized` statement more. I have experimented with such test cases (various edge cases) and as a result found a related (but separate) issue from this one. I was planning to add these additional tests for that separate issue, to not introduce unnecessary test failures before that fix is integrated. Maybe it is better to add the additional tests directly as part of this changeset instead?
------------- PR Comment: https://git.openjdk.org/jdk/pull/17370#issuecomment-1907776710 From simonis at openjdk.org Wed Jan 24 10:13:29 2024 From: simonis at openjdk.org (Volker Simonis) Date: Wed, 24 Jan 2024 10:13:29 GMT Subject: RFR: 8324241: Always record evol_method deps to avoid excessive method flushing [v2] In-Reply-To: <95WlbkH5Kqt53YrRttF5wzO8vjpZbJPetFlVEOR9Q-s=.8f6d6359-9705-457b-bb13-4621f1bb3b1d@github.com> References: <95WlbkH5Kqt53YrRttF5wzO8vjpZbJPetFlVEOR9Q-s=.8f6d6359-9705-457b-bb13-4621f1bb3b1d@github.com> Message-ID: On Tue, 23 Jan 2024 23:36:37 GMT, Dean Long wrote: >> A good question, but after deep digging (it took me quite some time to figure this out myself :) I don't think that `can_hotswap_or_post_breakpoint`/`all_dependencies_are_recorded` can ever be reset. Here's how it works: >> >> - there's a global set of `always_capabilities` which is initialized when the first JVMTI environment is created: >> >> // capabilities which are always potentially available >> jvmtiCapabilities JvmtiManageCapabilities::always_capabilities; >> >> void JvmtiManageCapabilities::initialize() { >> _capabilities_lock = new Mutex(Mutex::nosafepoint, "Capabilities_lock"); >> always_capabilities = init_always_capabilities(); >> >> >> - as the name implies, this set of capabilities contains all generally available capabilities (except for the `onload` and `solo` capabilites, which are maintained in separate sets). >> - `JvmtiManageCapabilities::update()` only operates on this global sets of capabilites (and *not* on the concrete capabilities of a specific JVMTI environment): >> >> void JvmtiManageCapabilities::update() { >> jvmtiCapabilities avail; >> // all capabilities >> either(&always_capabilities, &always_solo_capabilities, &avail); >> ... >> // If can_redefine_classes is enabled in the onload phase then we know that the >> // dependency information recorded by the compiler is complete. 
>> if ((avail.can_redefine_classes || avail.can_retransform_classes) && >> JvmtiEnv::get_phase() == JVMTI_PHASE_ONLOAD) { >> JvmtiExport::set_all_dependencies_are_recorded(true); >> } >> ... >> JvmtiExport::set_can_hotswap_or_post_breakpoint( >> avail.can_generate_breakpoint_events || >> avail.can_redefine_classes || >> avail.can_retransform_classes); >> >> - This means that `JvmtiManageCapabilities::update()` is always conservative regarding its exports to `JvmtiExport` as it always exports *all* potentially available capabilites once a JVMTI environment is created and not only the specific capabilities requested by concrete JVMTI environment. This means that even if we create a JVMTI agent with an empty set of capabilities, `can_hotswap_or_post_breakpoint`/`all_dependencies_are_recorded` will still be set in `JvmtiExport`. >> - There's no code path (at least I couldn't find one) which takes capabilities away from the `always_capabilities` set. Only if an environment requests `on_load` capabilities, they will be transferred from the global `onload_capabilities` to the `always_capabilities` set: >> >> jvmtiError JvmtiManageCapabilities::... > > An assert works for me. Thanks. Added assert in `JvmtiExport::set_can_hotswap_or_post_breakpoint()`. Thanks. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17509#discussion_r1464670162 From thartmann at openjdk.org Wed Jan 24 10:08:28 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 24 Jan 2024 10:08:28 GMT Subject: RFR: JDK-8317299: safepoint scalarization doesn't keep track of the depth of the JVM state In-Reply-To: References: Message-ID: <0DXVMJeiCUpJPjYw7knuLuXzoPRY2pKMj2InB98gJJ4=.73ea6aa8-795a-4922-94cc-9ebf428a63fa@github.com> On Fri, 19 Jan 2024 16:31:25 GMT, Damon Fenacci wrote: > # Issue > > The origin of the problem is tied to the fact that, when C2 optimizes vector boxes, it performs safepoint object scalarization before late inlining. 
> This can lead to situations in which scalarization adds scalarized values to the JVM state and late inlining of further methods adds further JVM state entries on top for each inlined method. > With the example of the reported bug (_TestIntrinsicBailOut.java_) we get to a situation like this: > > ... > bc: JVMS depth=6 loc=20 stk=23 arg=23 mon=23 scalar=23 end=23 mondepth=0 sp=0 bci=1 reexecute=false method=virtual jobject jdk.incubator.vector.ByteVector.rearrangeTemplate(jobject, jobject) > bc: JVMS depth=7 loc=23 stk=27 arg=27 mon=27 scalar=27 end=27 mondepth=0 sp=0 bci=36 reexecute=false method=virtual jobject jdk.incubator.vector.AbstractShuffle.checkIndexes() > bc: JVMS depth=8 loc=27 stk=28 arg=28 mon=28 scalar=28 end=28 mondepth=0 sp=0 bci=1 reexecute=false method=virtual jobject jdk.incubator.vector.AbstractShuffle.reorder() > bc: JVMS depth=9 loc=28 stk=29 arg=29 mon=29 scalar=29 end=31 mondepth=0 sp=0 bci=1 reexecute=false method=virtual jobject jdk.internal.vm.vector.VectorSupport$VectorPayload.getPayload() > bc: JVMS depth=10 loc=31 stk=32 arg=32 mon=32 scalar=32 end=32 mondepth=0 sp=0 bci=3 reexecute=false method=static jobject jdk.internal.vm.vector.VectorSupport.maybeRebox(jobject) > bc: JVMS depth=11 loc=32 stk=33 arg=33 mon=33 scalar=33 end=33 mondepth=0 sp=0 bci=1 reexecute=false method=virtual void jdk.internal.misc.Unsafe.loadFence() > > `JVMS depth=9` shows 2 scalars but 2 further inlines added 2 more JVM states (with no scalars). > > The corresponding node looks like this: > image > > To keep track of its scalarized inputs, `SafePointScalarObjectNode` keeps a field `_first_index`, which is supposed to be "relative to the last (youngest) jvms->_scloff"... > https://github.com/openjdk/jdk/blob/c5e72450966ad50d57a8d22e9d634bfcb319aee9/src/hotspot/share/opto/callnode.hpp#L509-L511 > but if there are late inlined methods, this field is going to be relative to the JVM state at the depth before inlining happened (e.g. 
depth=9 in the example) and not relative to the youngest depth. > > # Solution > > In order to keep track of the right depth a `_depth` field is added to `SafePointScalarObjectNode`, which refers to the depth of the JVM state the `_first_index` field refers to. The method `uint first_index(JVMState*... Great work, Damon. When we discussed this, I always wondered why we don't hit the same issue in Valhalla, where we perform even more aggressive scalarization during IGVN. Turns out that I postponed scalarization to after inlining, to work around that exact problem: https://github.com/openjdk/valhalla/blob/c48006dfc05bb0c41ab9ae55ead226356259c46d/src/hotspot/share/opto/compile.cpp#L2002 With your fix, we can remove that limitation. I filed [JDK-8324605](https://bugs.openjdk.org/browse/JDK-8324605) for this. ------------- PR Review: https://git.openjdk.org/jdk/pull/17500#pullrequestreview-1840984132 From qamai at openjdk.org Wed Jan 24 10:33:05 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Wed, 24 Jan 2024 10:33:05 GMT Subject: RFR: 8324433: Introduce a way to determine if an expression is evaluated as a constant by the Jit compiler [v6] In-Reply-To: References: Message-ID: > Hi, > > This patch introduces `JitCompiler::isConstantExpression` which can be used to statically determine whether an expression has been constant-folded by the Jit compiler, leading to more constant-folding opportunities. For example, it can be used in `MemorySessionImpl::checkValidStateRaw` to eliminate the lifetime check on global sessions without imposing additional branches on other non-global sessions. This is similar to `__builtin_constant_p` in GCC and clang. > > Please kindly give your opinion as well as your reviews, thanks very much. 
Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: address reviews ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17527/files - new: https://git.openjdk.org/jdk/pull/17527/files/3ecb2c66..b4445e2e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17527&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17527&range=04-05 Stats: 36 lines in 4 files changed: 10 ins; 0 del; 26 mod Patch: https://git.openjdk.org/jdk/pull/17527.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17527/head:pull/17527 PR: https://git.openjdk.org/jdk/pull/17527 From qamai at openjdk.org Wed Jan 24 10:40:27 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Wed, 24 Jan 2024 10:40:27 GMT Subject: RFR: 8324433: Introduce a way to determine if an expression is evaluated as a constant by the Jit compiler [v5] In-Reply-To: References: <81tjQoutCZRej3wZAnPDJIq31hz7D7tbiJLWyWpXXv0=.5786bc11-7aa2-4290-a1d5-37c82452ed41@github.com> Message-ID: On Wed, 24 Jan 2024 09:03:43 GMT, Aleksey Shipilev wrote: >> That sounds like a better name for the class, although I think `jdk.internal.misc` is more suitable than `jdk.internal.vm`. Do you have any preference? Thanks. > > +1 to `ConstantSupport`. I think `jdk.internal.vm` is a proper place for it. There is adjacent `jdk.internal.vm.vector.VectorSupport`, and whole `jdk.internal.vm.annotations` package is there too. > > `jdk.internal.misc` sounds like a place for utility classes. `Unsafe` is a historical exception, I think. I see, my main premise is that it is somewhat similar to `Unsafe` which turns out to be an exception :) Thanks a lot for your suggestions, I have updated the PR, also added `@Hidden` back. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17527#discussion_r1464707689 From dlunden at openjdk.org Wed Jan 24 11:18:56 2024 From: dlunden at openjdk.org (Daniel =?UTF-8?B?THVuZMOpbg==?=) Date: Wed, 24 Jan 2024 11:18:56 GMT Subject: RFR: 8322996: BoxLockNode creation fails with assert(reg < CHUNK_SIZE) failed: sanity [v5] In-Reply-To: References: Message-ID: > This changeset fixes an issue where deeply nested synchronized statements triggered an assert in C2. > > Changes: > - Bail out on compilation when we create a `BoxLockNode` with a slot index that cannot fit in a `RegMask`. This is similar to how we handle the case when we do not have space to represent arguments in [`opto/matcher.cpp`](https://github.com/openjdk/jdk/blob/58b01dce054c50bcb5a28aad4c1b574acaa90f6d/src/hotspot/share/opto/matcher.cpp#L314-L318) > - Generalize `RegMask::can_represent` to take an additional and optional size argument to facilitate reuse. The default size value, 1, corresponds to the previous functionality. Rewrite `can_represent_arg` to directly call `can_represent(reg, SlotsPerVecZ)`. > - Add a regression test. > > Testing: > - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/7612890820) > - tier1 to tier5 on windows-x64, linux-x64, linux-aarch64, macosx-x64, and macosx-aarch64. > - The new regression test in all tier1 to tier10 contexts on windows-x64, linux-x64, linux-aarch64, macosx-x64, and macosx-aarch64. 
Daniel Lundén has updated the pull request incrementally with one additional commit since the last revision: Fix incorrect package name ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17370/files - new: https://git.openjdk.org/jdk/pull/17370/files/bf87138f..524438ca Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17370&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17370&range=03-04 Stats: 3 lines in 1 file changed: 0 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/17370.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17370/head:pull/17370 PR: https://git.openjdk.org/jdk/pull/17370 From dlunden at openjdk.org Wed Jan 24 13:05:51 2024 From: dlunden at openjdk.org (Daniel Lundén) Date: Wed, 24 Jan 2024 13:05:51 GMT Subject: RFR: 8291809: Convert compiler/c2/cr7200264/TestSSE2IntVect.java to IR verification test [v4] In-Reply-To: References: Message-ID: <2Cdgobz7FKpgudHqcu_PHt1mLjN0q03Z6ss0OtOMnaU=.5dda527c-6e09-484c-9b7a-aa870c908b51@github.com> > This changeset translates the tests in `compiler/c2/cr7200264/` to use the IR verification framework. > > The proposed translation, to the extent possible, attempts to preserve the semantics of the original test. A major difference is that the IR checks are now local (for every `test_*` method) instead of global. The execution time of the new test is comparable to the old test. > > Testing: > - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/7553846710) > - Ran the new translated tests within all tier1 through tier10 contexts on windows-x64, linux-x64, linux-aarch64, macosx-x64, and macosx-aarch64. > - Tested that manually adding `-XX:LoopUnrollLimit=0` to the test framework flags caused the translated tests to fail. Note: it is, however, no longer possible to break the test by passing `-XX:LoopUnrollLimit=0` on the command line. Daniel Lundén has updated the pull request with a new target base due to a merge or a rebase.
The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 11 additional commits since the last revision: - Updates after reviews - Merge remote-tracking branch 'upstream/master' into test-sse2-int-vect-8291809-tmp - Apply suggestions from code review Co-authored-by: Emanuel Peter - Refactor test to use multiple @Test - Remove TestDriver - Readd verification - Finalize changes - Use static initialization block - Experiments - Naive translation complete - ... and 1 more: https://git.openjdk.org/jdk/compare/cbebc6cd...1e4e74f0 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17428/files - new: https://git.openjdk.org/jdk/pull/17428/files/87715718..1e4e74f0 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17428&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17428&range=02-03 Stats: 7545 lines in 265 files changed: 4574 ins; 1735 del; 1236 mod Patch: https://git.openjdk.org/jdk/pull/17428.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17428/head:pull/17428 PR: https://git.openjdk.org/jdk/pull/17428 From dlunden at openjdk.org Wed Jan 24 13:05:52 2024 From: dlunden at openjdk.org (Daniel =?UTF-8?B?THVuZMOpbg==?=) Date: Wed, 24 Jan 2024 13:05:52 GMT Subject: RFR: 8291809: Convert compiler/c2/cr7200264/TestSSE2IntVect.java to IR verification test [v2] In-Reply-To: References: Message-ID: On Thu, 18 Jan 2024 12:24:42 GMT, Emanuel Peter wrote: >> Daniel Lund?n has updated the pull request incrementally with one additional commit since the last revision: >> >> Refactor test to use multiple @Test > > Well, I think at least some of the `shift` examples should also vectorize: > `./java -XX:CompileCommand=compileonly,Test::test1 -XX:+TraceSuperWord -XX:+TraceLoopOpts -XX:+TraceNewVectors -XX:UseAVX=2 Test.java` > > Not sure if for all SSE and AVX levels, but all that I quickly checked with the UseSSE and USEAVX flags. 
> > > TraceNewVectors [SuperWord]: 832 LoadVector === 347 766 740 [[ 738 734 731 727 619 616 518 136 ]] @int[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=5; mismatched #vectory[8]:{int} !orig=[739],[620],[519],[135] !jvms: Test::test2 @ bci:12 (line 21) > TraceNewVectors [SuperWord]: 836 LShiftVI === _ 832 835 [[ 736 733 730 725 618 615 516 157 ]] #vectory[8]:{int} !orig=[738],[619],[518],[136] !jvms: Test::test2 @ bci:14 (line 21) > TraceNewVectors [SuperWord]: 837 StoreVector === 763 766 737 836 [[ 341 766 160 339 ]] @int[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=5; mismatched Memory: @int[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=5; !orig=[736],[618],[516],[157],535 !jvms: Test::test2 @ bci:15 (line 21) > > > Test.java: > > public class Test { > static int RANGE = 10_000; > > public static void main(String[] args) { > int[] a = new int[RANGE]; > int[] b = new int[RANGE]; > for (int i = 0; i < 10_000; i++) { > test1(a, b); > test2(a, b, i % 200 - 100); > } > } > > static void test1(int[] a, int[] b) { > for (int i = 0; i < a.length; i++) { > a[i] = (int)(b[i] << 32); > } > } > > static void test2(int[] a, int[] b, int s) { > for (int i = 0; i < a.length; i++) { > a[i] = (int)(b[i] << s); > } > } > } > > > I also found this test in `test/hotspot/jtreg/compiler/vectorization/runner/BasicIntOpTest.java`: > > @Test > @IR(applyIfCPUFeatureOr = {"asimd", "true", "sse2", "true"}, > counts = {IRNode.LSHIFT_VI, ">0"}) > public int[] vectorShiftLeft() { > int[] res = new int[SIZE]; > for (int i = 0; i < SIZE; i++) { > res[i] = a[i] << 3; > } > return res; > } > > > Plus, I see `test.addExpectedVectorization("LShiftVI", 5);` in `test/hotspot/jtreg/compiler/c2/cr7200264/TestSSE2IntVect.java`, which you now deleted. > > @dlunde would you mind investigating a bit more if you can add some IR rules for all (or at least a few) of the shift examples? 
> If you think they really do not vectorize, can you paste me a Test.java with comm... Thanks @eme64. I've addressed all comments now; please have a look again. ------------- PR Comment: https://git.openjdk.org/jdk/pull/17428#issuecomment-1908081925 From epeter at openjdk.org Wed Jan 24 13:33:29 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 24 Jan 2024 13:33:29 GMT Subject: RFR: 8291809: Convert compiler/c2/cr7200264/TestSSE2IntVect.java to IR verification test [v4] In-Reply-To: <2Cdgobz7FKpgudHqcu_PHt1mLjN0q03Z6ss0OtOMnaU=.5dda527c-6e09-484c-9b7a-aa870c908b51@github.com> References: <2Cdgobz7FKpgudHqcu_PHt1mLjN0q03Z6ss0OtOMnaU=.5dda527c-6e09-484c-9b7a-aa870c908b51@github.com> Message-ID: On Wed, 24 Jan 2024 13:05:51 GMT, Daniel Lund?n wrote: >> This changeset translates the tests in `compiler/c2/cr7200264/` to use the IR verification framework. >> >> The proposed translation, to the extent possible, attempts to preserve the semantics of the original test. A major difference is that the IR checks are now local (for every `test_*` method) instead of global. The execution time of the new test is comparable to the old test. >> >> Testing: >> - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/7553846710) >> - Ran the new translated tests within all tier1 through tier10 contexts on windows-x64, linux-x64, linux-aarch64, macosx-x64, and macosx-aarch64. >> - Tested that manually adding `-XX:LoopUnrollLimit=0` to the test framework flags caused the translated tests to fail. Note: it is, however, no longer possible to break the test by passing `-XX:LoopUnrollLimit=0` on the command line. > > Daniel Lund?n has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains 11 additional commits since the last revision: > > - Updates after reviews > - Merge remote-tracking branch 'upstream/master' into test-sse2-int-vect-8291809-tmp > - Apply suggestions from code review > > Co-authored-by: Emanuel Peter > - Refactor test to use multiple @Test > - Remove TestDriver > - Readd verification > - Finalize changes > - Use static initialization block > - Experiments > - Naive translation complete > - ... and 1 more: https://git.openjdk.org/jdk/compare/9eb9fff4...1e4e74f0 LGTM, thanks for the work @dlunde ------------- Marked as reviewed by epeter (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/17428#pullrequestreview-1841381587 From ddong at openjdk.org Wed Jan 24 13:42:34 2024 From: ddong at openjdk.org (Denghui Dong) Date: Wed, 24 Jan 2024 13:42:34 GMT Subject: RFR: 8324630: C1: Canonicalizer::do_LookupSwitch doesn't break the loop when the successor is found Message-ID: Hi, Please review the small change that breaks the loop in Canonicalizer::do_LookupSwitch if the successor is found. The keys of LookupSwitch are sorted, so there is no need to continue the loop once matched. Thanks. 
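The early exit described above can be illustrated with a small, self-contained Java sketch. This is not the C1 code from `Canonicalizer::do_LookupSwitch`; the class `LookupSwitchSketch` and the method `selectSuccessor` are hypothetical names used only for illustration, and the sketch assumes the keys are sorted and distinct, as in a `lookupswitch`.

```java
// Hypothetical illustration only; not the HotSpot implementation.
final class LookupSwitchSketch {
    // Returns the index of the successor selected by `tag`.
    // Index sortedKeys.length stands for the default successor.
    static int selectSuccessor(int[] sortedKeys, int tag) {
        int successor = sortedKeys.length; // start with the default successor
        for (int i = 0; i < sortedKeys.length; i++) {
            if (sortedKeys[i] == tag) {
                successor = i;
                // Keys are assumed sorted and distinct, so no later key can
                // match; stop scanning instead of visiting the remaining keys.
                break;
            }
        }
        return successor;
    }
}
```

Without the `break` the result would be the same under these assumptions; the loop would only do redundant comparisons after the match.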
------------- Commit messages: - 8324630: C1: Canonicalizer::do_LookupSwitch doesn't break the loop when the successor is found Changes: https://git.openjdk.org/jdk/pull/17553/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=17553&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8324630 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/17553.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17553/head:pull/17553 PR: https://git.openjdk.org/jdk/pull/17553 From mdoerr at openjdk.org Wed Jan 24 13:48:29 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Wed, 24 Jan 2024 13:48:29 GMT Subject: RFR: JDK-8322648: Improve class initialization barrier in TemplateTable::_new for PPC In-Reply-To: References: Message-ID: On Mon, 22 Jan 2024 12:26:51 GMT, Varada M wrote: > ppc port implementation of https://github.com/openjdk/jdk/pull/17006 > > Fastdebug and Release : build and tier1 testing successful. > > JBS Issue : [JDK-8322648](https://bugs.openjdk.org/browse/JDK-8322648) You need to adapt the succeeding code and remove the dependent `crnand` instruction. I suggest to use `cmpdi(CCR0, Rscratch, InstanceKlass::fully_initialized);`. ------------- Changes requested by mdoerr (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/17518#pullrequestreview-1841412353 From mdoerr at openjdk.org Wed Jan 24 13:53:29 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Wed, 24 Jan 2024 13:53:29 GMT Subject: RFR: 8322649: Improve class initialization barrier in TemplateTable::_new for S390 In-Reply-To: References: Message-ID: On Thu, 18 Jan 2024 09:49:37 GMT, Amit Kumar wrote: > s390 Port implementation for https://github.com/openjdk/jdk/pull/17006, > > Testing: > Build: fastdebug + release > Test: Tier1 {fastdebug} LGTM. ------------- Marked as reviewed by mdoerr (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/17481#pullrequestreview-1841424805 From chagedorn at openjdk.org Wed Jan 24 13:53:31 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 24 Jan 2024 13:53:31 GMT Subject: RFR: 8291809: Convert compiler/c2/cr7200264/TestSSE2IntVect.java to IR verification test [v2] In-Reply-To: References: Message-ID: On Thu, 18 Jan 2024 13:28:17 GMT, Christian Hagedorn wrote: >> test/hotspot/jtreg/compiler/c2/cr7200264/TestIntVect.java line 510: >> >>> 508: } >>> 509: >>> 510: void test_divc(int[] a0, int[] a1) { >> >> Suggestion: >> >> // Not vectorized: no vector div. Might vectorize after JDK-8282365 (transform div to mul/add/shift). >> void test_divc(int[] a0, int[] a1) { > > I was curious about that and it actually does: > > TraceNewVectors [SuperWord]: 744 LoadVector === 380 693 677 [[ 675 671 669 664 660 658 558 554 552 145 150 154 ]] @int[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=5; mismatched #vectorx[4]:{int} !orig=[676],[559],[134] !jvms: Test::test_divc @ bci:12 (line 36) > TraceNewVectors [SuperWord]: 746 RShiftVI === _ 744 745 [[ 668 657 551 155 ]] #vectorx[4]:{int} !orig=[669],[552],[154] !jvms: Test::test_divc @ bci:15 (line 36) > TraceNewVectors [SuperWord]: 747 VectorCastI2X === _ 744 [[ 674 663 557 146 ]] #vectory[4]:{long} !orig=[675],[558],[145] !jvms: Test::test_divc @ bci:15 (line 36) > TraceNewVectors [SuperWord]: 748 Replicate === _ 144 [[ ]] #vectory[4]:{long} > TraceNewVectors [SuperWord]: 749 MulVL === _ 747 748 [[ 673 662 556 148 ]] #vectory[4]:{long} !orig=[674],[557],[146] !jvms: Test::test_divc @ bci:15 (line 36) > TraceNewVectors [SuperWord]: 751 RShiftVL === _ 749 750 [[ 672 661 555 149 ]] #vectory[4]:{long} !orig=[673],[556],[148] !jvms: Test::test_divc @ bci:15 (line 36) > TraceNewVectors [SuperWord]: 752 VectorCastL2X === _ 751 [[ 671 660 554 150 ]] #vectorx[4]:{int} !orig=[672],[555],[149] !jvms: Test::test_divc @ bci:15 (line 36) > TraceNewVectors 
[SuperWord]: 753 AddVI === _ 752 744 [[ 670 659 553 152 ]] #vectorx[4]:{int} !orig=[671],[554],[150] !jvms: Test::test_divc @ bci:15 (line 36) > TraceNewVectors [SuperWord]: 755 RShiftVI === _ 753 754 [[ 668 657 551 155 ]] #vectorx[4]:{int} !orig=[670],[553],[152] !jvms: Test::test_divc @ bci:15 (line 36) > TraceNewVectors [SuperWord]: 756 SubVI === _ 755 746 [[ 666 656 549 176 ]] #vectorx[4]:{int} !orig=[668],[551],[155] !jvms: Test::test_divc @ bci:15 (line 36) > TraceNewVectors [SuperWord]: 757 StoreVector === 687 693 667 756 [[ 374 693 372 179 ]] @int[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=5; mismatched Memory: @int[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=5; !orig=[666],[549],[176],575 !jvms: Test::test_divc @ bci:16 (line 36) > > I have not checked any other methods but it might indeed be possible to vectorize some them. I think it's a good idea to check all methods and add a comment with a short explanation why it's not possible or if there are plans to support vectorization in the future. All these tests look like a good collection of (seemingly good) vectorization opportunities. Thanks @eme64 for your hel... Should we also add checks for these vectors (same for `test_divc_n()`)? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17428#discussion_r1464943591 From rcastanedalo at openjdk.org Wed Jan 24 13:59:27 2024 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Wed, 24 Jan 2024 13:59:27 GMT Subject: RFR: 8322996: BoxLockNode creation fails with assert(reg < CHUNK_SIZE) failed: sanity [v5] In-Reply-To: References: Message-ID: On Wed, 24 Jan 2024 11:18:56 GMT, Daniel Lund?n wrote: >> This changeset fixes an issue where deeply nested synchronized statements triggered an assert in C2. >> >> Changes: >> - Bail out on compilation when we create a `BoxLockNode` with a slot index that cannot fit in a `RegMask`. 
This is similar to how we handle the case when we do not have space to represent arguments in [`opto/matcher.cpp`](https://github.com/openjdk/jdk/blob/58b01dce054c50bcb5a28aad4c1b574acaa90f6d/src/hotspot/share/opto/matcher.cpp#L314-L318) >> - Generalize `RegMask::can_represent` to take an additional and optional size argument to facilitate reuse. The default size value, 1, corresponds to the previous functionality. Rewrite `can_represent_arg` to directly call `can_represent(reg, SlotsPerVecZ)`. >> - Add a regression test. >> >> Testing: >> - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/7612890820) >> - tier1 to tier5 on windows-x64, linux-x64, linux-aarch64, macosx-x64, and macosx-aarch64. >> - The new regression test in all tier1 to tier10 contexts on windows-x64, linux-x64, linux-aarch64, macosx-x64, and macosx-aarch64. > > Daniel Lundén has updated the pull request incrementally with one additional commit since the last revision: > > Fix incorrect package name Marked as reviewed by rcastanedalo (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/17370#pullrequestreview-1841438442 From rcastanedalo at openjdk.org Wed Jan 24 13:59:29 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Wed, 24 Jan 2024 13:59:29 GMT Subject: RFR: 8322996: BoxLockNode creation fails with assert(reg < CHUNK_SIZE) failed: sanity [v4] In-Reply-To: References: Message-ID: On Wed, 24 Jan 2024 09:50:23 GMT, Daniel Lundén wrote: > I have experimented with such test cases (various edge cases) and as a result found a related (but separate) issue from this one. I was planning to add these additional tests for that separate issue, to not introduce unnecessary test failures before that fix is integrated. Maybe it is better to add the additional tests directly as part of this changeset instead?
If the additional tests trigger failures after this fix is applied, I would suggest including them as part of the fix to the separate issue. ------------- PR Comment: https://git.openjdk.org/jdk/pull/17370#issuecomment-1908172373 From kxu at openjdk.org Wed Jan 24 14:27:29 2024 From: kxu at openjdk.org (Kangcheng Xu) Date: Wed, 24 Jan 2024 14:27:29 GMT Subject: RFR: 8320237 C2: late inlining of method handle invoke causes duplicate lines in PrintInlining output [v2] In-Reply-To: References: Message-ID: <9cvD9MsMO1NM2tO8CmP1emKQ0gbK3TdinJIoW5t01is=.27fb231d-1a43-475c-98e6-55cf3265087b@github.com> On Thu, 18 Jan 2024 09:29:57 GMT, Roland Westrelin wrote: >> Kangcheng Xu has updated the pull request incrementally with one additional commit since the last revision: >> >> fix VM crashes > > I think the subprocess needs to be run with `-XX:-BackgroundCompilation`, otherwise there's a chance it completes before the compilation finishes and the print inlining output is produced. I've failed to recreate this failure on Windows after multiple attempts. At this time, I can only suspect it is indeed an intermittent failure caused by what @rwestrel suggested. The latest commit adds the `-XX:-BackgroundCompilation` flag, which will hopefully make the test more stable. ------------- PR Comment: https://git.openjdk.org/jdk/pull/17147#issuecomment-1908234678 From simonis at openjdk.org Wed Jan 24 14:48:52 2024 From: simonis at openjdk.org (Volker Simonis) Date: Wed, 24 Jan 2024 14:48:52 GMT Subject: RFR: 8324241: Always record evol_method deps to avoid excessive method flushing [v4] In-Reply-To: References: Message-ID: > Currently we don't record dependencies on redefined methods (i.e. `evol_method` dependencies) in JIT compiled methods if none of the `can_redefine_classes`, `can_retransform_classes` or `can_generate_breakpoint_events` JVMTI capabilities is set.
This means that if a JVMTI agent which requests one of these capabilities is dynamically attached, all the methods which have been JIT compiled until that point, will be marked for deoptimization and flushed from the code cache. For large, warmed-up applications this mean deoptimization and instant recompilation of thousands if not then-thousands of methods, which can lead to dramatic performance/latency drop-downs for several minutes. > > One could argue that dynamic agent attach is now deprecated anyway (see [JEP 451: Prepare to Disallow the Dynamic Loading of Agents](https://openjdk.org/jeps/451)) and this problem could be solved by making the recording of `evol_method` dependencies dependent on the new `-XX:+EnableDynamicAgentLoading` flag isntead of the concrete JVMTI capabilities (because the presence of the flag indicates that an agent will be loaded eventually). > > But there a single, however important exception to this rule and that's JFR. JFR is advertised as low overhead profiler which can be enabled in production at any time. However, when JFR is started dynamically (e.g. through JCMD or JMX) it will silently load a HotSpot internl JVMTI agent which requests the `can_retransform_classes` and retransforms some classes. This will inevitably trigger the deoptimization of all compiled methods as described above. > > I'd therefor like to propose to *always* and unconditionally record `evol_method` dependencies in JIT compiled code by exporting the relevant properties right at startup in `init_globals()`: > ```c++ > jint init_globals() { > management_init(); > JvmtiExport::initialize_oop_storage(); > +#if INCLUDE_JVMTI > + JvmtiExport::set_can_hotswap_or_post_breakpoint(true); > + JvmtiExport::set_all_dependencies_are_recorded(true); > +#endif > > > My measurements indicate that the overhead of doing so is minimal (around 1% increase of nmethod size) and justifies the benefit. E.g. 
a Spring Petclinic application started with `-Xbatch -XX:+UnlockDiagnosticVMOptions -XX:+LogCompilation` compiles about 11500 methods (~9000 with C1 and ~2500 with C2) resulting in an aggregated nmethod size of around 40mb. Additionally recording `evol_method` dependencies only increases this size by about 400kb. The ratio remains about the same i...

Volker Simonis has updated the pull request incrementally with one additional commit since the last revision:

  Made the flag experimental and added an assertion to set_can_hotswap_or_post_breakpoint()

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/17509/files
  - new: https://git.openjdk.org/jdk/pull/17509/files/7b750da5..29966635

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=17509&range=03
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17509&range=02-03

  Stats: 10 lines in 2 files changed: 8 ins; 0 del; 2 mod
  Patch: https://git.openjdk.org/jdk/pull/17509.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/17509/head:pull/17509

PR: https://git.openjdk.org/jdk/pull/17509

From dlunden at openjdk.org Wed Jan 24 15:06:31 2024
From: dlunden at openjdk.org (Daniel Lundén)
Date: Wed, 24 Jan 2024 15:06:31 GMT
Subject: RFR: 8291809: Convert compiler/c2/cr7200264/TestSSE2IntVect.java to IR verification test [v2]
In-Reply-To:
References:
Message-ID:

On Wed, 24 Jan 2024 13:50:19 GMT, Christian Hagedorn wrote:

>> I was curious about that and it actually does:
>>
>> TraceNewVectors [SuperWord]: 744 LoadVector === 380 693 677 [[ 675 671 669 664 660 658 558 554 552 145 150 154 ]] @int[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=5; mismatched #vectorx[4]:{int} !orig=[676],[559],[134] !jvms: Test::test_divc @ bci:12 (line 36)
>> TraceNewVectors [SuperWord]: 746 RShiftVI === _ 744 745 [[ 668 657 551 155 ]] #vectorx[4]:{int} !orig=[669],[552],[154] !jvms: Test::test_divc @ bci:15 (line 36)
>> TraceNewVectors [SuperWord]: 747 VectorCastI2X === _ 744 [[ 674 663 557 146 ]] #vectory[4]:{long} !orig=[675],[558],[145] !jvms: Test::test_divc @ bci:15 (line 36)
>> TraceNewVectors [SuperWord]: 748 Replicate === _ 144 [[ ]] #vectory[4]:{long}
>> TraceNewVectors [SuperWord]: 749 MulVL === _ 747 748 [[ 673 662 556 148 ]] #vectory[4]:{long} !orig=[674],[557],[146] !jvms: Test::test_divc @ bci:15 (line 36)
>> TraceNewVectors [SuperWord]: 751 RShiftVL === _ 749 750 [[ 672 661 555 149 ]] #vectory[4]:{long} !orig=[673],[556],[148] !jvms: Test::test_divc @ bci:15 (line 36)
>> TraceNewVectors [SuperWord]: 752 VectorCastL2X === _ 751 [[ 671 660 554 150 ]] #vectorx[4]:{int} !orig=[672],[555],[149] !jvms: Test::test_divc @ bci:15 (line 36)
>> TraceNewVectors [SuperWord]: 753 AddVI === _ 752 744 [[ 670 659 553 152 ]] #vectorx[4]:{int} !orig=[671],[554],[150] !jvms: Test::test_divc @ bci:15 (line 36)
>> TraceNewVectors [SuperWord]: 755 RShiftVI === _ 753 754 [[ 668 657 551 155 ]] #vectorx[4]:{int} !orig=[670],[553],[152] !jvms: Test::test_divc @ bci:15 (line 36)
>> TraceNewVectors [SuperWord]: 756 SubVI === _ 755 746 [[ 666 656 549 176 ]] #vectorx[4]:{int} !orig=[668],[551],[155] !jvms: Test::test_divc @ bci:15 (line 36)
>> TraceNewVectors [SuperWord]: 757 StoreVector === 687 693 667 756 [[ 374 693 372 179 ]] @int[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=5; mismatched Memory: @int[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=5; !orig=[666],[549],[176],575 !jvms: Test::test_divc @ bci:16 (line 36)
>>
>> I have not checked any other methods but it might indeed be possible to vectorize some of them. I think it's a good idea to check all methods and add a comment with a short explanation why it's not possible or if there are plans to support vectorization in the future. All these tests look like a good collection of (seemingly good) vectorization opportuniti...
>
> Should we also add checks for these vectors (same for `test_divc_n()`)?
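
The node sequence in the trace above (sign shift, widen to long, multiply, shift, narrow, add, shift, subtract) has the shape of a multiply-high division by a constant. Below is a minimal scalar sketch of that shape in Java. It assumes a divisor of 7 purely for illustration, since the divisor used by `test_divc` is not shown in this thread; `DivByConstSketch` and `div7` are hypothetical names, and the multiplier and shift are the commonly tabulated values for signed 32-bit division by 7.

```java
final class DivByConstSketch {
    // Assumed for illustration: magic multiplier and post-shift for signed
    // 32-bit division by 7. The divisor in test_divc may well be different.
    private static final int MAGIC = 0x92492493;
    private static final int SHIFT = 2;

    // Computes n / 7 without a division instruction.
    static int div7(int n) {
        int sign = n >> 31;                    // -1 for negative n, else 0 (RShiftVI)
        long wide = (long) n * (long) MAGIC;   // widen and multiply (VectorCastI2X, MulVL)
        int hi = (int) (wide >> 32);           // high 32 bits (RShiftVL, VectorCastL2X)
        int q = (hi + n) >> SHIFT;             // correction add and shift (AddVI, RShiftVI)
        return q - sign;                       // round toward zero (SubVI)
    }

    public static void main(String[] args) {
        int[] samples = {0, 1, -1, 6, 7, -7, 49, -49, Integer.MAX_VALUE, Integer.MIN_VALUE};
        for (int n : samples) {
            System.out.println(n + " / 7 = " + div7(n) + " (expected " + (n / 7) + ")");
        }
    }
}
```

The mapping from statements to node names in the comments is a reading of the trace, not something confirmed in the thread.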
@chhagedorn: Do you mean that `test_divc` and `test_divc_n` vectorize after JDK-8282365? They don't vectorize on my machine (on this PR). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17428#discussion_r1465053003 From qamai at openjdk.org Wed Jan 24 18:09:49 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Wed, 24 Jan 2024 18:09:49 GMT Subject: RFR: 8282365: Consolidate and improve division by constant idealizations [v44] In-Reply-To: References: Message-ID: On Tue, 23 Jan 2024 15:02:26 GMT, Raffaello Giulietti wrote: >> That is an excellent analysis. To add to the analysis, we do not really need the minimal value of `c`, since 2 values of `c` that both satisfy the inequations must give the same upper bits for all input values. As a result, for the purpose of the algorithm, they are equivalent. >> >> My concern is that it will complicate the analysis, which is complicated enough, for a minor improvement in the exit conditions. > > I have no idea about the timing difference with the current exit condition or with the simplified one. It might indeed be negligible. > Anyway, there's a choice now ;-) @rgiulietti Thanks a lot for your patience in reviewing this patch. Do you have any more concerns or suggestions? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/9947#discussion_r1465355389 From rgiulietti at openjdk.org Wed Jan 24 18:15:56 2024 From: rgiulietti at openjdk.org (Raffaello Giulietti) Date: Wed, 24 Jan 2024 18:15:56 GMT Subject: RFR: 8282365: Consolidate and improve division by constant idealizations [v44] In-Reply-To: References: Message-ID: On Wed, 24 Jan 2024 18:06:18 GMT, Quan Anh Mai wrote: >> I have no idea about the timing difference with the current exit condition or with the simplified one. It might indeed be negligible. >> Anyway, there's a choice now ;-) > > @rgiulietti Thanks a lot for your patience in reviewing this patch. Do you have any more concerns or suggestions? 
I've the impression that we can replace `m < c * d <= m + m / v` with the stricter `m < c * d < m + m / v` by using `N_neg - 1` instead of `N_neg`, but I need some time to have a solid proof. That would simplify the code of the algorithm. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/9947#discussion_r1465362000 From rgiulietti at openjdk.org Wed Jan 24 18:33:50 2024 From: rgiulietti at openjdk.org (Raffaello Giulietti) Date: Wed, 24 Jan 2024 18:33:50 GMT Subject: RFR: 8282365: Consolidate and improve division by constant idealizations [v44] In-Reply-To: References: Message-ID: On Wed, 24 Jan 2024 18:12:37 GMT, Raffaello Giulietti wrote: >> @rgiulietti Thanks a lot for your patience in reviewing this patch. Do you have any more concerns or suggestions? > > I've the impression that we can replace `m < c * d <= m + m / v` with the stricter `m < c * d < m + m / v` by using `N_neg - 1` instead of `N_neg`, but I need some time to have a solid proof. > > That would simplify the code of the algorithm. But IMO the current algorithm is correct. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/9947#discussion_r1465383617 From mcimadamore at openjdk.org Wed Jan 24 18:51:29 2024 From: mcimadamore at openjdk.org (Maurizio Cimadamore) Date: Wed, 24 Jan 2024 18:51:29 GMT Subject: RFR: 8324433: Introduce a way to determine if an expression is evaluated as a constant by the Jit compiler [v6] In-Reply-To: References: Message-ID: <9SikKzxs8M1JdTLvTB6JTozvpCw2CSziF2koHw0ELAQ=.fb2e7696-6fa6-4a2b-87b8-ec57d4fef05c@github.com> On Wed, 24 Jan 2024 10:33:05 GMT, Quan Anh Mai wrote: >> Hi, >> >> This patch introduces `JitCompiler::isConstantExpression` which can be used to statically determine whether an expression has been constant-folded by the Jit compiler, leading to more constant-folding opportunities. 
For example, it can be used in `MemorySessionImpl::checkValidStateRaw` to eliminate the lifetime check on global sessions without imposing additional branches on other non-global sessions. This is similar to `__builtin_constant_p` in GCC and clang. >> >> Please kindly give your opinion as well as your reviews, thanks very much. > > Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: > > address reviews Naive question: the right way to use this would be almost invariably be like this: if (isCompileConstant(foo) && fooHasCertainStaticProperties(foo)) { // fast-path } // slow path Right? Then the expectation is that during interpreter and C1, `isCompileConstant` always returns false, so we just never take the fast path (but we probably still pay for the branch, right?). And, when we get to C2 and this method is inlined, at this point we know that either `foo` is constant or not. If it is constant we can check other conditions on foo (which presumably is cheap because `foo` is constant) and maybe take the fast-path. In both cases, there's no branch in the generated code because we know "statically" when inlining if `foo` has the right shape or not. Correct? 
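
The usage pattern asked about above can be sketched in plain Java as follows. This is a sketch under stated assumptions: `isCompileConstant` here is a hypothetical local stand-in that always returns false (the behaviour the thread expects in the interpreter), not the proposed intrinsic, and `CompileConstantSketch` and `scale` are illustrative names. The point it tries to show is that the fast path and the generic path must compute the same result, because which one runs depends on what the compiler knows.

```java
final class CompileConstantSketch {
    // Hypothetical stand-in for the proposed intrinsic. It always answers
    // "not a constant", so only the generic path runs in this sketch.
    static boolean isCompileConstant(int value) {
        return false;
    }

    // Multiplies x by factor. If the compiler could prove factor to be a
    // constant power of two, the fast path would fold to a single shift.
    static int scale(int x, int factor) {
        if (isCompileConstant(factor) && Integer.bitCount(factor) == 1) {
            // fast path: must agree with the generic path for every input it accepts
            return x << Integer.numberOfTrailingZeros(factor);
        }
        // generic path
        return x * factor;
    }

    public static void main(String[] args) {
        System.out.println(scale(3, 8));
        System.out.println(scale(5, 6));
    }
}
```

A fast path that returned something different from the generic path would make results depend on the compilation tier, which is one pitfall the javadoc could call out.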
------------- PR Comment: https://git.openjdk.org/jdk/pull/17527#issuecomment-1908724632 From shade at openjdk.org Wed Jan 24 18:51:31 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 24 Jan 2024 18:51:31 GMT Subject: RFR: 8324433: Introduce a way to determine if an expression is evaluated as a constant by the Jit compiler [v5] In-Reply-To: <9iDFu8I4w_i1Uso5q7oEi0Le1JvgDNgNyuSZlmKQiuE=.5739d448-fc73-4bcf-bec8-26b3a1b75d21@github.com> References: <_pAlUJwzkoFkCnQW_IQK-zUkUMMjq6KjZoDldS34CyA=.984549da-d6de-4977-a87f-18a33d58824d@github.com> <5AWq0nDx_AQPwnEp1cMisZ6ytn2ieq9FHDwDQp5A4QQ=.5043ac3e-04bf-4fd8-a680-448f392e5cb1@github.com> <9iDFu8I4w_i1Uso5q7oEi0Le1JvgDNgNyuSZlmKQiuE=.5739d448-fc73-4bcf-bec8-26b3a1b75d21@github.com> Message-ID: On Wed, 24 Jan 2024 07:15:12 GMT, Quan Anh Mai wrote: >> This seems really weird to me for Java code. The method doesn't get the original "expression" it only gets the value of that expression after it has been evaluated. Is there some kind of weird "magic" happening here? > > @dholmes-ora Indeed it's a compiler magic, albeit not really weird. While the method execution only receives the evaluated value of `expr`, the method compilation has the expression in its original form. As a result, it can determine the result based on this information. It is still weird to talk about expressions at this level. We really check if the value is constant, like the method name suggests now. Yes, this implicitly tests that the expression that produced that value is fully constant-folded. But that's a detail that we do not need to capture here. Let's rename `expr` -> `val`, and tighten up the javadoc for the method to mention we only test the constness of the final value. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17527#discussion_r1465401456 From shade at openjdk.org Wed Jan 24 18:51:32 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 24 Jan 2024 18:51:32 GMT Subject: RFR: 8324433: Introduce a way to determine if an expression is evaluated as a constant by the Jit compiler [v6] In-Reply-To: References: Message-ID: On Wed, 24 Jan 2024 10:33:05 GMT, Quan Anh Mai wrote: >> Hi, >> >> This patch introduces `JitCompiler::isConstantExpression` which can be used to statically determine whether an expression has been constant-folded by the Jit compiler, leading to more constant-folding opportunities. For example, it can be used in `MemorySessionImpl::checkValidStateRaw` to eliminate the lifetime check on global sessions without imposing additional branches on other non-global sessions. This is similar to `__builtin_constant_p` in GCC and clang. >> >> Please kindly give your opinion as well as your reviews, thanks very much. > > Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: > > address reviews src/java.base/share/classes/jdk/internal/vm/ConstantSupport.java line 32: > 30: /** > 31: * Just-in-time-compiler-related queries > 32: */ This looks like a stale comment. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17527#discussion_r1465397036 From mcimadamore at openjdk.org Wed Jan 24 18:54:28 2024 From: mcimadamore at openjdk.org (Maurizio Cimadamore) Date: Wed, 24 Jan 2024 18:54:28 GMT Subject: RFR: 8324433: Introduce a way to determine if an expression is evaluated as a constant by the Jit compiler [v6] In-Reply-To: <9SikKzxs8M1JdTLvTB6JTozvpCw2CSziF2koHw0ELAQ=.fb2e7696-6fa6-4a2b-87b8-ec57d4fef05c@github.com> References: <9SikKzxs8M1JdTLvTB6JTozvpCw2CSziF2koHw0ELAQ=.fb2e7696-6fa6-4a2b-87b8-ec57d4fef05c@github.com> Message-ID: On Wed, 24 Jan 2024 18:48:03 GMT, Maurizio Cimadamore wrote: > Naive question: the right way to use this would be almost invariably be like this: > > ``` > if (isCompileConstant(foo) && fooHasCertainStaticProperties(foo)) { > // fast-path > } > // slow path > ``` > > Right? Then the expectation is that during interpreter and C1, `isCompileConstant` always returns false, so we just never take the fast path (but we probably still pay for the branch, right?). And, when we get to C2 and this method is inlined, at this point we know that either `foo` is constant or not. If it is constant we can check other conditions on foo (which presumably is cheap because `foo` is constant) and maybe take the fast-path. In both cases, there's no branch in the generated code because we know "statically" when inlining if `foo` has the right shape or not. Correct? P.S. if this is correct, please consider adding something along those lines in the javadoc of `isCompileConstant`; as it stands it is a bit obscure to understand how this thing might be used, and what are the common pitfalls to avoid when using it. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/17527#issuecomment-1908729766 From shade at openjdk.org Wed Jan 24 18:58:28 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 24 Jan 2024 18:58:28 GMT Subject: RFR: 8324433: Introduce a way to determine if an expression is evaluated as a constant by the Jit compiler [v6] In-Reply-To: References: <9SikKzxs8M1JdTLvTB6JTozvpCw2CSziF2koHw0ELAQ=.fb2e7696-6fa6-4a2b-87b8-ec57d4fef05c@github.com> Message-ID: On Wed, 24 Jan 2024 18:51:27 GMT, Maurizio Cimadamore wrote: > Naive question: the right way to use this would be almost invariably be like this: > > ``` > if (isCompileConstant(foo) && fooHasCertainStaticProperties(foo)) { > // fast-path > } > // slow path > ``` > > Right? Yes, I think so. > Then the expectation is that during interpreter and C1, `isCompileConstant` always returns false, so we just never take the fast path (but we probably still pay for the branch, right?). Yes, I think so. For C1, we would still prune the "dead" path, because C1 is able to know that `if (false)` is never taken. We do pay with the branch and the method call in interpreter. (There are ways to special-case these intrinsics for interpreter too, if we choose to care.) > And, when we get to C2 and this method is inlined, at this point we know that either `foo` is constant or not. If it is constant we can check other conditions on foo (which presumably is cheap because `foo` is constant) and maybe take the fast-path. In both cases, there's no branch in the generated code because we know "statically" when inlining if `foo` has the right shape or not. Correct? Yes. I think the major use would be using `constexpr`-like code on "const" path, so that the entire "const" branch constant-folds completely. In [my experiments](https://github.com/openjdk/jdk/pull/17527#issuecomment-1906379544) with `Integer.toString` that certainly happens. 
But that is not a requirement, and we could probably still reap some benefits from partial constant folds; but at that point we would need to prove that a "partially const" path is better than generic "non-const" path under the same conditions. I agree it would be convenient to put some examples in javadoc. ------------- PR Comment: https://git.openjdk.org/jdk/pull/17527#issuecomment-1908736651 From psandoz at openjdk.org Wed Jan 24 19:40:27 2024 From: psandoz at openjdk.org (Paul Sandoz) Date: Wed, 24 Jan 2024 19:40:27 GMT Subject: RFR: 8324433: Introduce a way to determine if an expression is evaluated as a constant by the Jit compiler [v5] In-Reply-To: References: <_pAlUJwzkoFkCnQW_IQK-zUkUMMjq6KjZoDldS34CyA=.984549da-d6de-4977-a87f-18a33d58824d@github.com> <5AWq0nDx_AQPwnEp1cMisZ6ytn2ieq9FHDwDQp5A4QQ=.5043ac3e-04bf-4fd8-a680-448f392e5cb1@github.com> <9iDFu8I4w_i1Uso5q7oEi0Le1JvgDNgNyuSZlmKQiuE=.5739d448-fc73-4bcf-bec8-26b3a1b75d21@github.com> Message-ID: <-msFouQp2kpWPf6LTKgbDAeLPUkfET6wVesLbAz-6T4=.54ca377c-2e49-4229-a060-daa34485eead@github.com> On Wed, 24 Jan 2024 18:48:34 GMT, Aleksey Shipilev wrote: >> @dholmes-ora Indeed it's a compiler magic, albeit not really weird. While the method execution only receives the evaluated value of `expr`, the method compilation has the expression in its original form. As a result, it can determine the result based on this information. > > It is still weird to talk about expressions at this level. We really check if the value is constant, like the method name suggests now. Yes, this implicitly tests that the expression that produced that value is fully constant-folded. But that's a detail that we do not need to capture here. Let's rename `expr` -> `val`, and tighten up the javadoc for the method to mention we only test the constness of the final value. I agree. All values are produced by evaluating expressions. 
In this case we want to query whether a value produced by the compiler evaluating its expression is a constant value (inputs to the expression are constants and the expression had no material side-effects). Meaning if the method returns true then we could use that knowledge in subsequent expressions that may also produce constants or some specific behavior. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17527#discussion_r1465449454 From duke at openjdk.org Wed Jan 24 23:24:48 2024 From: duke at openjdk.org (Joshua Cao) Date: Wed, 24 Jan 2024 23:24:48 GMT Subject: RFR: 8323220: Reassociate loop invariants involved in Cmps and Add/Subs [v5] In-Reply-To: References: Message-ID: > // inv1 == (x + inv2) => ( inv1 - inv2 ) == x > // inv1 == (x - inv2) => ( inv1 + inv2 ) == x > // inv1 == (inv2 - x) => (-inv1 + inv2 ) == x > > > For example, > > > fn(inv1, inv2) > while(...) > x = foobar() > if inv1 == x + inv2 > blackhole() > > > We can transform this into > > > fn(inv1, inv2) > t = inv1 - inv2 > while(...) > x = foobar() > if t == x > blackhole() > > > Here is an example: https://github.com/openjdk/jdk/blob/b78896b9aafcb15f453eaed6e154a5461581407b/src/java.base/share/classes/java/lang/invoke/LambdaFormEditor.java#L910. LHS `1` and RHS `pos` are both loop invariant > > Passes tier1 locally on Linux machine. Passes GHA on my fork. 
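
The first rewrite rule quoted above can be checked by hand in Java. The sketch below is illustrative only (`ReassociateSketch` and both method names are hypothetical, and the hoisting is done manually rather than by C2). It relies on the fact that `int` arithmetic wraps around, so `inv1 == x + inv2` and `inv1 - inv2 == x` agree even when the addition or subtraction overflows.

```java
final class ReassociateSketch {
    // Original shape: the loop body recomputes x + inv2 on every iteration.
    static int countOriginal(int inv1, int inv2, int[] xs) {
        int hits = 0;
        for (int x : xs) {
            if (inv1 == x + inv2) {
                hits++;
            }
        }
        return hits;
    }

    // Reassociated shape: the invariant part is computed once, outside the loop.
    static int countHoisted(int inv1, int inv2, int[] xs) {
        int t = inv1 - inv2;
        int hits = 0;
        for (int x : xs) {
            if (t == x) {
                hits++;
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        int[] xs = {Integer.MAX_VALUE, 0, -1, 41};
        System.out.println(countOriginal(42, 1, xs) + " " + countHoisted(42, 1, xs));
        System.out.println(countOriginal(Integer.MIN_VALUE, 1, xs) + " "
                + countHoisted(Integer.MIN_VALUE, 1, xs));
    }
}
```

The equivalence shown here is for equality tests; ordering comparisons such as `<` do not in general survive the same rewrite once wraparound is possible.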
Joshua Cao has updated the pull request incrementally with one additional commit since the last revision:

  reassociate_add_sub -> reassociate_add_sub_cmp

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/17375/files
  - new: https://git.openjdk.org/jdk/pull/17375/files/5ea7a53a..a08df7f7

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=17375&range=04
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17375&range=03-04

  Stats: 3 lines in 1 file changed: 0 ins; 0 del; 3 mod
  Patch: https://git.openjdk.org/jdk/pull/17375.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/17375/head:pull/17375

PR: https://git.openjdk.org/jdk/pull/17375

From cslucas at openjdk.org Wed Jan 24 23:24:52 2024
From: cslucas at openjdk.org (Cesar Soares Lucas)
Date: Wed, 24 Jan 2024 23:24:52 GMT
Subject: RFR: JDK-8322854: Incorrect rematerialization of scalar replaced objects in C2
Message-ID:

The current implementation of `PhaseMacroExpand::value_from_mem` returns `_igvn.zerocon(ft)` when it hits a sentinel while searching for a memory operation on a given slice. One of the sentinels is the memory input of the Allocate node that is the origin of the memory slice. Therefore, `value_from_mem` may return `zerocon(ft)` if `sfpt_mem` is the same memory edge used by the Allocate node that is the origin of the memory slice being traversed.

The scalar replacement implementation uses `value_from_mem` during creation of the metadata describing scalar-replaced objects (see `PhaseMacroExpand::create_scalarized_object_description`). The `create_scalarized_object_description` method is also used as part of the RAM optimization implementation. The RAM optimization targets Phi nodes, and therefore a memory graph loop created by a _memory phi_ node may be seen as part of the transformation. See image below:

This pattern doesn't show up when scalarizing objects that don't participate in allocation merges.
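
For readers unfamiliar with the term, the following Java fragment is a minimal sketch of what an allocation merge looks like at the source level: two allocations whose references meet at a control-flow join, which in the compiler's graph becomes a Phi over the two objects. The class and method names are hypothetical, and this fragment is not claimed to reproduce the bug described above or to be scalar replaced by any particular JDK build.

```java
final class AllocationMergeSketch {
    static final class Point {
        final int x;
        final int y;

        Point(int x, int y) {
            this.x = x;
            this.y = y;
        }
    }

    // The two 'new Point' expressions are merged into 'p' at the join after
    // the conditional; neither object escapes the method.
    static int sum(boolean cond, int a, int b) {
        Point p = cond ? new Point(a, b) : new Point(b, a + 1);
        return p.x + p.y;
    }

    public static void main(String[] args) {
        System.out.println(sum(true, 1, 2));
        System.out.println(sum(false, 1, 2));
    }
}
```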
To fix the issue I changed the code in `value_from_mem` to instead of using the _input_ memory edge of the Allocate as a stop condition, it will now use the projection memory edge of the Allocate. Tested locally on windows, mac and linux x86_64 with JTREG tier1-3 and didn't observe any regression. ------------- Commit messages: - Make value_from_mem able to detect memory loop. Changes: https://git.openjdk.org/jdk/pull/17562/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=17562&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8322854 Stats: 74 lines in 2 files changed: 70 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/17562.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17562/head:pull/17562 PR: https://git.openjdk.org/jdk/pull/17562 From qamai at openjdk.org Thu Jan 25 02:59:28 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Thu, 25 Jan 2024 02:59:28 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v4] In-Reply-To: References: Message-ID: <7ZqplWMMT9Rs-UNV94VY4cXldlPbYVZ2FafssMTSRKg=.b6cc2967-0040-4452-bd6d-fa4eec2d545d@github.com> On Wed, 24 Jan 2024 04:57:10 GMT, Jasmine Karthikeyan wrote: >> Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: >> >> fix release build > > I took a quick look through the patch, this is really impressive :) > > Early last year I had [an attempt at the same idea](https://github.com/openjdk/jdk/compare/master...jaskarth:jdk:bit-tracking) (extremely rough patch, sorry), where I went with the approach of using a 2-bit value for each bit position to represent `0`, `1`, `BOTTOM`, and `TOP`. My general idea was to create a boolean lattice so that the meet() and dual() operations were easier to implement, before I realized how difficult reasoning about multiple constraints in the meet and dual operations was. 
I think your idea of marking the dual makes more sense and is cleaner, especially with how the constraints interact. @jaskarth Thanks for looking into this patch. I have tried not having an explicit `_dual` field but in the end it is too hard and cumbersome without any benefits so I end up with this approach. I will address your suggestions in the next iteration. Regarding `contains` vs `higher_equal`, it is mainly due to the fact that `contains` being a much cheaper operation while `higher_equal` will do a `meet` followed by a hash table indexing. ------------- PR Comment: https://git.openjdk.org/jdk/pull/17508#issuecomment-1909262717 From qamai at openjdk.org Thu Jan 25 03:13:27 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Thu, 25 Jan 2024 03:13:27 GMT Subject: RFR: 8324433: Introduce a way to determine if an expression is evaluated as a constant by the Jit compiler [v6] In-Reply-To: References: <9SikKzxs8M1JdTLvTB6JTozvpCw2CSziF2koHw0ELAQ=.fb2e7696-6fa6-4a2b-87b8-ec57d4fef05c@github.com> Message-ID: On Wed, 24 Jan 2024 18:56:15 GMT, Aleksey Shipilev wrote: >>> Naive question: the right way to use this would be almost invariably be like this: >>> >>> ``` >>> if (isCompileConstant(foo) && fooHasCertainStaticProperties(foo)) { >>> // fast-path >>> } >>> // slow path >>> ``` >>> >>> Right? Then the expectation is that during interpreter and C1, `isCompileConstant` always returns false, so we just never take the fast path (but we probably still pay for the branch, right?). And, when we get to C2 and this method is inlined, at this point we know that either `foo` is constant or not. If it is constant we can check other conditions on foo (which presumably is cheap because `foo` is constant) and maybe take the fast-path. In both cases, there's no branch in the generated code because we know "statically" when inlining if `foo` has the right shape or not. Correct? >> >> P.S. 
if this is correct, please consider adding something along those lines in the javadoc of `isCompileConstant`; as it stands it is a bit obscure to understand how this thing might be used, and what are the common pitfalls to avoid when using it. > >> Naive question: the right way to use this would be almost invariably be like this: >> >> ``` >> if (isCompileConstant(foo) && fooHasCertainStaticProperties(foo)) { >> // fast-path >> } >> // slow path >> ``` >> >> Right? > > Yes, I think so. > >> Then the expectation is that during interpreter and C1, `isCompileConstant` always returns false, so we just never take the fast path (but we probably still pay for the branch, right?). > > Yes, I think so. For C1, we would still prune the "dead" path, because C1 is able to know that `if (false)` is never taken. We do pay with the branch and the method call in interpreter. (There are ways to special-case these intrinsics for interpreter too, if we choose to care.) > >> And, when we get to C2 and this method is inlined, at this point we know that either `foo` is constant or not. If it is constant we can check other conditions on foo (which presumably is cheap because `foo` is constant) and maybe take the fast-path. In both cases, there's no branch in the generated code because we know "statically" when inlining if `foo` has the right shape or not. Correct? > > Yes. I think the major use would be using `constexpr`-like code on "const" path, so that the entire code constant-folds completely, _or_ just compiles to branch-less "generic" version. In [my experiments](https://github.com/openjdk/jdk/pull/17527#issuecomment-1906379544) with `Integer.toString` that certainly happens. But that is not a requirement, and we could probably still reap some benefits from partial constant folds; but at that point we would need to prove that a "partially const" path is better than generic "non-const" path under the same conditions. 
>
> I agree it would be convenient to put some examples in javadoc. @merykitty, I can help you with that, if you want.

@shipilev I can come up with 2 examples that are pretty generic:

    void checkIndex(int index, int length) {
        boolean indexPositive = index >= 0;
        if (ConstantSupport.isCompileConstant(indexPositive) && indexPositive) {
            if (index >= length) {
                throw new IndexOutOfBoundsException();
            }
            return;
        }
        if (length < 0 || Integer.compareUnsigned(index, length) >= 0) {
            throw new IndexOutOfBoundsException();
        }
    }

    boolean equals(Point p1, Point p2) {
        boolean idEqual = p1 == p2;
        if (ConstantSupport.isCompileConstant(idEqual) && idEqual) {
            return true;
        }
        return p1.x == p2.x && p1.y == p2.y;
    }

@mcimadamore Yes I believe your expectations are correct. Pitfalls may vary case-by-case, but I just realised that since we do not have profile information in the fast path, the compiler may be less willing to inline the callees here. While it has not been an issue, a solution I can think of is to have something like `ConstantSupport::evaluate` in which the compiler will try to inline infinitely expecting constant-folding similar to how a `constexpr` variable behaves in C++ (and maybe bail-out compilation if the final result is not a constant, too).

-------------

PR Comment: https://git.openjdk.org/jdk/pull/17527#issuecomment-1909272480

From jbhateja at openjdk.org Thu Jan 25 03:14:31 2024
From: jbhateja at openjdk.org (Jatin Bhateja)
Date: Thu, 25 Jan 2024 03:14:31 GMT
Subject: RFR: 8322768: Optimize non-subword vector compress and expand APIs for AVX2 target. [v9]
In-Reply-To:
References: <_udOCEVG86x9V_WvYqFTaYnvmXdiZ7LzqxzR-D_ygYs=.db3ed37e-a8fa-413b-83c6-a785aba072ff@github.com> <8oB-M1TUk9aqQIYOGijNmykLCyM1AUTXLTsgy4r8Wk4=.49c90c06-5f2e-47f0-9ac1-ffd6eb438fa4@github.com>
Message-ID:

On Tue, 23 Jan 2024 15:20:47 GMT, Emanuel Peter wrote:

>> Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains 10 additional commits since the last revision:
>>
>> - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8322768
>> - Modifying comments.
>> - Review comments resolution
>> - Modified code comment for clarity.
>> - Space fixup
>> - Using emulated variable blend E-Core optimized instruction.
>> - Review suggestions incorporated.
>> - Review comments resolutions.
>> - Updating copyright year of modified files.
>> - 8322768: Optimize non-subword vector compress and expand APIs for AVX2 target.
>
> Ok, I'll just run the testing again, and then I will approve this :)

Hi @eme64, let us know the test results.

Best Regards

-------------

PR Comment: https://git.openjdk.org/jdk/pull/17261#issuecomment-1909272731

From kvn at openjdk.org Thu Jan 25 03:51:27 2024
From: kvn at openjdk.org (Vladimir Kozlov)
Date: Thu, 25 Jan 2024 03:51:27 GMT
Subject: RFR: 8322996: BoxLockNode creation fails with assert(reg < CHUNK_SIZE) failed: sanity [v5]
In-Reply-To:
References:
Message-ID:

On Wed, 24 Jan 2024 11:18:56 GMT, Daniel Lundén wrote:

>> This changeset fixes an issue where deeply nested synchronized statements triggered an assert in C2.
>>
>> Changes:
>> - Bail out on compilation when we create a `BoxLockNode` with a slot index that cannot fit in a `RegMask`. This is similar to how we handle the case when we do not have space to represent arguments in [`opto/matcher.cpp`](https://github.com/openjdk/jdk/blob/58b01dce054c50bcb5a28aad4c1b574acaa90f6d/src/hotspot/share/opto/matcher.cpp#L314-L318)
>> - Generalize `RegMask::can_represent` to take an additional and optional size argument to facilitate reuse. The default size value, 1, corresponds to the previous functionality. Rewrite `can_represent_arg` to directly call `can_represent(reg, SlotsPerVecZ)`.
>> - Add a regression test.
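
To make "deeply nested synchronized statements" concrete, here is a small Java sketch that generates the source text of a method with a chosen nesting depth. Everything in it is an assumption for illustration: `NestedSyncSketch`, the generated method shape, and the `counter` field are hypothetical, the depth needed to hit the assert is not stated in this thread, and the regression test in the PR may be written quite differently.

```java
final class NestedSyncSketch {
    // Builds the source of a method whose body is 'depth' synchronized blocks
    // nested inside each other, all locking on the same parameter.
    static String nestedSynchronized(int depth) {
        StringBuilder sb = new StringBuilder("static void test(Object lock) {\n");
        for (int i = 0; i < depth; i++) {
            sb.append("synchronized (lock) {\n");
        }
        sb.append("counter++;\n");
        for (int i = 0; i < depth; i++) {
            sb.append("}\n");
        }
        sb.append("}\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(nestedSynchronized(3));
    }
}
```

Generating the source avoids writing hundreds of nested blocks by hand; the resulting text would still need to be compiled and run under C2 to exercise the code path discussed here.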
>> >> Testing: >> - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/7612890820) >> - tier1 to tier5 on windows-x64, linux-x64, linux-aarch64, macosx-x64, and macosx-aarch64. >> - The new regression test in all tier1 to tier10 contexts on windows-x64, linux-x64, linux-aarch64, macosx-x64, and macosx-aarch64. > > Daniel Lund?n has updated the pull request incrementally with one additional commit since the last revision: > > Fix incorrect package name Looks good. I have one question. src/hotspot/share/opto/graphKit.cpp line 3473: > 3471: Node* box = _gvn.transform(new BoxLockNode(next_monitor())); > 3472: // Check for bailout after new BoxLockNode > 3473: if (failing()) { return nullptr; } Do all callers of `shared_lock()` checks for `failing()` or returned `nullptr`? ------------- PR Review: https://git.openjdk.org/jdk/pull/17370#pullrequestreview-1842788723 PR Review Comment: https://git.openjdk.org/jdk/pull/17370#discussion_r1465809098 From dholmes at openjdk.org Thu Jan 25 05:08:26 2024 From: dholmes at openjdk.org (David Holmes) Date: Thu, 25 Jan 2024 05:08:26 GMT Subject: RFR: 8324433: Introduce a way to determine if an expression is evaluated as a constant by the Jit compiler [v5] In-Reply-To: <-msFouQp2kpWPf6LTKgbDAeLPUkfET6wVesLbAz-6T4=.54ca377c-2e49-4229-a060-daa34485eead@github.com> References: <_pAlUJwzkoFkCnQW_IQK-zUkUMMjq6KjZoDldS34CyA=.984549da-d6de-4977-a87f-18a33d58824d@github.com> <5AWq0nDx_AQPwnEp1cMisZ6ytn2ieq9FHDwDQp5A4QQ=.5043ac3e-04bf-4fd8-a680-448f392e5cb1@github.com> <9iDFu8I4w_i1Uso5q7oEi0Le1JvgDNgNyuSZlmKQiuE=.5739d448-fc73-4bcf-bec8-26b3a1b75d21@github.com> <-msFouQp2kpWPf6LTKgbDAeLPUkfET6wVesLbAz-6T4=.54ca377c-2e49-4229-a060-daa34485eead@github.com> Message-ID: On Wed, 24 Jan 2024 19:37:40 GMT, Paul Sandoz wrote: >> It is still weird to talk about expressions at this level. We really check if the value is constant, like the method name suggests now. 
Yes, this implicitly tests that the expression that produced that value is fully constant-folded. But that's a detail that we do not need to capture here. Let's rename `expr` -> `val`, and tighten up the javadoc for the method to mention we only test the constness of the final value. > > I agree. All values are produced by evaluating expressions. In this case we want to query whether a value produced by the compiler evaluating its expression is a constant value (inputs to the expression are constants and the expression had no material side-effects). Meaning if the method returns true then we could use that knowledge in subsequent expressions that may also produce constants or some specific behavior. > the method compilation has the expression in its original form So the JIT analyses the bytecode used to place the result on the call stack, before the call, and from that determines if the expression were a constant? This kind of self-analysis is not something I was aware of. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17527#discussion_r1465846860 From amitkumar at openjdk.org Thu Jan 25 05:39:45 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Thu, 25 Jan 2024 05:39:45 GMT Subject: RFR: 8322649: Improve class initialization barrier in TemplateTable::_new for S390 [v2] In-Reply-To: References: Message-ID: > s390 Port implementation for https://github.com/openjdk/jdk/pull/17006, > > Testing: > Build: fastdebug + release > Test: Tier1 {fastdebug} Amit Kumar has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains two commits: - Merge master - s390 port ------------- Changes: https://git.openjdk.org/jdk/pull/17481/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=17481&range=01 Stats: 11 lines in 1 file changed: 0 ins; 5 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/17481.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17481/head:pull/17481 PR: https://git.openjdk.org/jdk/pull/17481 From roland at openjdk.org Thu Jan 25 07:44:28 2024 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 25 Jan 2024 07:44:28 GMT Subject: RFR: 8324433: Introduce a way to determine if an expression is evaluated as a constant by the Jit compiler [v6] In-Reply-To: References: <9SikKzxs8M1JdTLvTB6JTozvpCw2CSziF2koHw0ELAQ=.fb2e7696-6fa6-4a2b-87b8-ec57d4fef05c@github.com> Message-ID: On Wed, 24 Jan 2024 18:56:15 GMT, Aleksey Shipilev wrote: > > Naive question: the right way to use this would be almost invariably be like this: > > ``` > > if (isCompileConstant(foo) && fooHasCertainStaticProperties(foo)) { > > // fast-path > > } > > // slow path > > ``` > > > > > > > > > > > > > > > > > > > > > > > > Right? > > Yes, I think so. But then whatever is in the fast path and `fooHasCertainStaticProperties` are never profiled because never executed by the interpreter or c1. So `fooHasCertainStaticProperties` will likely not be inlined and c2 will do a poor (or rather not as good as you'd like) job of compiling whatever is in the fast path. 
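For readers following the thread, the guard pattern discussed above can be sketched roughly as below. This is only an illustration under stated assumptions: `isCompileConstant` here is a hypothetical stand-in for the API proposed in the PR (it is not an existing JDK method), and the class name `CompileConstantSketch` is made up. The stand-in always returns false, which mirrors what the interpreter would see, so the fast path below is never executed (and hence never profiled) before C2 compiles it.

```java
// Sketch only. isCompileConstant is a hypothetical stand-in for the
// intrinsic proposed in the PR; it is NOT an existing JDK API.
final class CompileConstantSketch {

    // Assumption: outside of C2 the proposed intrinsic answers "false",
    // so this plain Java stub always returns false.
    static boolean isCompileConstant(boolean value) {
        return false;
    }

    static void checkIndex(int index, int length) {
        boolean indexNonNegative = index >= 0;
        if (isCompileConstant(indexNonNegative) && indexNonNegative) {
            // Fast path: the compiler has proven index >= 0, so one
            // signed comparison against length is enough.
            if (index >= length) {
                throw new IndexOutOfBoundsException("index " + index + ", length " + length);
            }
            return;
        }
        // Slow path: the general check, valid for any index and length.
        if (length < 0 || Integer.compareUnsigned(index, length) >= 0) {
            throw new IndexOutOfBoundsException("index " + index + ", length " + length);
        }
    }

    public static void main(String[] args) {
        checkIndex(3, 10); // in range, returns normally
        try {
            checkIndex(10, 10); // out of range
        } catch (IndexOutOfBoundsException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Both paths must agree on every input; the fast path is purely an optimization that the compiler may select when the guard folds to a constant.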
------------- PR Comment: https://git.openjdk.org/jdk/pull/17527#issuecomment-1909531426 From gcao at openjdk.org Thu Jan 25 07:51:35 2024 From: gcao at openjdk.org (Gui Cao) Date: Thu, 25 Jan 2024 07:51:35 GMT Subject: RFR: 8324125: Improve class initialization barrier in TemplateTable::_new for RISC-V Message-ID: <_TyDNsY_znk2GR9l7rpALqDOuwMwj4hkuF5kKcyJBQg=.1d9f5448-ad18-49bf-9fde-a68e88626470@github.com> Hi, This RISC-V Port implementation for https://github.com/openjdk/jdk/pull/17006, ### Testing: - [x] Run tier1-3 tests on qemu 8.1.0 with UseRVV (fastdebug) - [x] Run tier1-3 tests with SiFive unmatched (release) ------------- Commit messages: - JDK-8324125: Improve class initialization barrier in TemplateTable::_new for RISC-V Changes: https://git.openjdk.org/jdk/pull/17548/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=17548&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8324125 Stats: 5 lines in 1 file changed: 0 ins; 2 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/17548.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17548/head:pull/17548 PR: https://git.openjdk.org/jdk/pull/17548 From rcastanedalo at openjdk.org Thu Jan 25 08:00:28 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Thu, 25 Jan 2024 08:00:28 GMT Subject: RFR: 8291809: Convert compiler/c2/cr7200264/TestSSE2IntVect.java to IR verification test [v2] In-Reply-To: References: Message-ID: On Wed, 24 Jan 2024 15:03:28 GMT, Daniel Lundén wrote: >> Should we also add checks for these vectors (same for `test_divc_n()`)? > > @chhagedorn: Do you mean that `test_divc` and `test_divc_n` vectorize after JDK-8282365? They don't vectorize on my machine (on this PR). I just checked on my machine (on top of commit fb822e49f2a84423c8fd17db2e95bbdd5e7ec191) and these division tests do seem to vectorize, this is e.g.
the innermost loop in `test_divc` right before code emission: ![test_divc](https://github.com/openjdk/jdk/assets/8792647/129d51c2-a1ad-4d02-ab81-02cd849af36f) Here are my processor features in case it helps (subset of `lscpu` output): Architecture: x86_64 CPU op-mode(s): 32-bit, 64-bit Address sizes: 39 bits physical, 48 bits virtual Byte Order: Little Endian CPU(s): 12 On-line CPU(s) list: 0-11 Vendor ID: GenuineIntel Model name: Intel(R) Core(TM) i7-9850H CPU @ 2.60GHz CPU family: 6 Model: 158 Thread(s) per core: 2 Core(s) per socket: 6 (...) Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb invpcid_single ssbd ibrs ibpb stibp ibrs_enhanced tpr_shadow flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx rdseed adx smap clflushopt intel_pt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp vnmi md_clear flush_l1d arch_capabilities (...) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17428#discussion_r1465971933 From wzhuo at openjdk.org Thu Jan 25 08:38:41 2024 From: wzhuo at openjdk.org (Wang Zhuo) Date: Thu, 25 Jan 2024 08:38:41 GMT Subject: RFR: 8324123: aarch64: fix prfm literal encoding in assembler [v5] In-Reply-To: References: Message-ID: > Current prfm literal mode encoding in aarch64 assembler is not correct. > The prfm_literal instruction requires bits 31 and 30 to be 0b11, while the current assembler encodes the two bits as 0b01, which yields a ldr instruction, not prfm.
> For example, if adding the following code in stubGenerator > __ prfm(Address(__ pc())) > we get a ldr instruction like > ldr x0, 0x0000ffff83f8539c > but it should be a prfm instruction like > prfm pldl1keep, 0x0000ffff8ff8539c > > The bug is in ld_st2: in literal mode, bits 31 and 30 are set to (size & 0b01), while for prfm instructions bits 31 and 30 must be 0b11. > void ld_st2(Register Rt, const Address &adr, int size, int op, int V = 0) { > starti; > > f(V, 26); // general reg? > zrf(Rt, 0); > > // Encoding for literal loads is done here (rather than pushed > // down into Address::encode) because the encoding of this > // instruction is too different from all of the other forms to > // make it worth sharing. > if (adr.getMode() == Address::literal) { > assert(size == 0b10 || size == 0b11, "bad operand size in ldr"); > assert(op == 0b01, "literal form can only be used with loads"); > f(**size & 0b01, 31, 30**), f(0b011, 29, 27), f(0b00, 25, 24); > int64_t offset = (adr.target() - pc()) >> 2; > sf(offset, 23, 5); > code_section()->relocate(pc(), adr.rspec()); > return; > } > > f(size, 31, 30); > f(op, 23, 22); // str > adr.encode(&current_insn); > } Wang Zhuo has updated the pull request incrementally with one additional commit since the last revision: adding checks in prfm encoding to avoid using pre/post index ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17482/files - new: https://git.openjdk.org/jdk/pull/17482/files/7f59b473..11d46ff7 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17482&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17482&range=03-04 Stats: 11 lines in 1 file changed: 8 ins; 1 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/17482.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17482/head:pull/17482 PR: https://git.openjdk.org/jdk/pull/17482 From epeter at openjdk.org Thu Jan 25 09:18:36 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 25 Jan 2024 09:18:36 GMT Subject: RFR:
8322768: Optimize non-subword vector compress and expand APIs for AVX2 target. [v9] In-Reply-To: <8oB-M1TUk9aqQIYOGijNmykLCyM1AUTXLTsgy4r8Wk4=.49c90c06-5f2e-47f0-9ac1-ffd6eb438fa4@github.com> References: <_udOCEVG86x9V_WvYqFTaYnvmXdiZ7LzqxzR-D_ygYs=.db3ed37e-a8fa-413b-83c6-a785aba072ff@github.com> <8oB-M1TUk9aqQIYOGijNmykLCyM1AUTXLTsgy4r8Wk4=.49c90c06-5f2e-47f0-9ac1-ffd6eb438fa4@github.com> Message-ID: On Tue, 23 Jan 2024 11:56:58 GMT, Jatin Bhateja wrote: >> Hi, >> >> Patch optimizes non-subword vector compress and expand APIs for x86 AVX2 only targets. >> Upcoming E-core Xeons (Sierra Forest) and Hybrid CPUs only support AVX2 instruction set. >> These are very frequently used APIs in columnar database filter operation. >> >> Implementation uses a lookup table to record permute indices. Table index is computed using >> mask argument of compress/expand operation. >> >> Following are the performance number of JMH micro included with the patch. >> >> >> System : Intel(R) Xeon(R) Platinum 8480+ (Sapphire Rapids) >> >> Baseline: >> Benchmark (size) Mode Cnt Score Error Units >> ColumnFilterBenchmark.filterDoubleColumn 1024 thrpt 2 142.767 ops/ms >> ColumnFilterBenchmark.filterDoubleColumn 2047 thrpt 2 71.436 ops/ms >> ColumnFilterBenchmark.filterDoubleColumn 4096 thrpt 2 35.992 ops/ms >> ColumnFilterBenchmark.filterFloatColumn 1024 thrpt 2 182.151 ops/ms >> ColumnFilterBenchmark.filterFloatColumn 2047 thrpt 2 91.096 ops/ms >> ColumnFilterBenchmark.filterFloatColumn 4096 thrpt 2 44.757 ops/ms >> ColumnFilterBenchmark.filterIntColumn 1024 thrpt 2 184.099 ops/ms >> ColumnFilterBenchmark.filterIntColumn 2047 thrpt 2 91.981 ops/ms >> ColumnFilterBenchmark.filterIntColumn 4096 thrpt 2 45.170 ops/ms >> ColumnFilterBenchmark.filterLongColumn 1024 thrpt 2 148.017 ops/ms >> ColumnFilterBenchmark.filterLongColumn 2047 thrpt 2 73.516 ops/ms >> ColumnFilterBenchmark.filterLongColumn 4096 thrpt 2 36.844 ops/ms >> >> Withopt: >> Benchmark (size) Mode Cnt Score Error Units >> 
ColumnFilterBenchmark.filterDoubleColumn 1024 thrpt 2 2051.707 ops/ms >> ColumnFilterBenchmark.filterDoubleColumn 2047 thrpt 2 914.072 ops/ms >> ColumnFilterBenchmark.filterDoubleColumn 4096 thrpt 2 489.898 ops/ms >> ColumnFilterBenchmark.filterFloatColumn 1024 thrpt 2 5324.195 ops/ms >> ColumnFilterBenchmark.filterFloatColumn 2047 thrpt 2 2587.229 ops/ms >> ColumnFilterBenchmark.filterFloatColumn 4096 thrpt 2 1278.665 ops/ms >> ColumnFilterBenchmark.filterIntColumn 1024 thrpt 2 4149.384 ops/ms >> ColumnFilterBenchmark.filterIntColumn 2047 thrpt ... > > Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 10 additional commits since the last revision: > > - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8322768 > - Modifying comments. > - Review comments resolution > - Modified code comment for clarity. > - Space fixup > - Using emulated variable blend E-Core optimized instruction. > - Review suggestions incorporated. > - Review comments resolutions. > - Updating copyright year of modified files. > - 8322768: Optimize non-subword vector compress and expand APIs for AVX2 target. Testing passed, looks good now :) Nice progress, the code now is simpler and much more understandable! ------------- Marked as reviewed by epeter (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/17261#pullrequestreview-1843198049 From aph at openjdk.org Thu Jan 25 09:26:33 2024 From: aph at openjdk.org (Andrew Haley) Date: Thu, 25 Jan 2024 09:26:33 GMT Subject: RFR: 8324123: aarch64: fix prfm literal encoding in assembler [v5] In-Reply-To: References: Message-ID: On Thu, 25 Jan 2024 08:38:41 GMT, Wang Zhuo wrote: >> Current prfm literal mode encoding in aarch64 assembler is not correct. 
>> The prfm_literal instruction requires bits 31 and 30 to be 0b11, while the current assembler encodes the two bits as 0b01, which yields a ldr instruction, not prfm. >> For example, if adding the following code in stubGenerator >> __ prfm(Address(__ pc())) >> we get a ldr instruction like >> ldr x0, 0x0000ffff83f8539c >> but it should be a prfm instruction like >> prfm pldl1keep, 0x0000ffff8ff8539c >> >> The bug is in ld_st2: in literal mode, bits 31 and 30 are set to (size & 0b01), while for prfm instructions bits 31 and 30 must be 0b11. >> void ld_st2(Register Rt, const Address &adr, int size, int op, int V = 0) { >> starti; >> >> f(V, 26); // general reg? >> zrf(Rt, 0); >> >> // Encoding for literal loads is done here (rather than pushed >> // down into Address::encode) because the encoding of this >> // instruction is too different from all of the other forms to >> // make it worth sharing. >> if (adr.getMode() == Address::literal) { >> assert(size == 0b10 || size == 0b11, "bad operand size in ldr"); >> assert(op == 0b01, "literal form can only be used with loads"); >> f(**size & 0b01, 31, 30**), f(0b011, 29, 27), f(0b00, 25, 24); >> int64_t offset = (adr.target() - pc()) >> 2; >> sf(offset, 23, 5); >> code_section()->relocate(pc(), adr.rspec()); >> return; >> } >> >> f(size, 31, 30); >> f(op, 23, 22); // str >> adr.encode(&current_insn); >> } > > Wang Zhuo has updated the pull request incrementally with one additional commit since the last revision: > > adding checks in prfm encoding to avoid using pre/post index Still good. src/hotspot/cpu/aarch64/assembler_aarch64.cpp line 197: > 195: // PRFM does not support pre/post index > 196: // Passing Address with pre/post mode to ld_st2 will generate an undefined instruction.
> 197: // So use guarantee to avoid pre/post mode Address operand Suggestion: src/hotspot/cpu/aarch64/assembler_aarch64.cpp line 199: > 197: // So use guarantee to avoid pre/post mode Address operand > 198: guarantee((mode != Address::pre), "prfm does not support pre index"); > 199: guarantee((mode != Address::post), "prfm does not support post index"); Suggestion: guarantee((mode != Address::pre) && (mode != Address::post), "prfm does not support pre/post indexing"); ------------- Marked as reviewed by aph (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/17482#pullrequestreview-1843213993 PR Review Comment: https://git.openjdk.org/jdk/pull/17482#discussion_r1466073793 PR Review Comment: https://git.openjdk.org/jdk/pull/17482#discussion_r1466073361 From epeter at openjdk.org Thu Jan 25 09:28:33 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 25 Jan 2024 09:28:33 GMT Subject: RFR: 8323220: Reassociate loop invariants involved in Cmps and Add/Subs [v5] In-Reply-To: References: Message-ID: On Wed, 24 Jan 2024 23:24:48 GMT, Joshua Cao wrote: >> // inv1 == (x + inv2) => ( inv1 - inv2 ) == x >> // inv1 == (x - inv2) => ( inv1 + inv2 ) == x >> // inv1 == (inv2 - x) => (-inv1 + inv2 ) == x >> >> >> For example, >> >> >> fn(inv1, inv2) >> while(...) >> x = foobar() >> if inv1 == x + inv2 >> blackhole() >> >> >> We can transform this into >> >> >> fn(inv1, inv2) >> t = inv1 - inv2 >> while(...) >> x = foobar() >> if t == x >> blackhole() >> >> >> Here is an example: https://github.com/openjdk/jdk/blob/b78896b9aafcb15f453eaed6e154a5461581407b/src/java.base/share/classes/java/lang/invoke/LambdaFormEditor.java#L910. LHS `1` and RHS `pos` are both loop invariant >> >> Passes tier1 locally on Linux machine. Passes GHA on my fork. > > Joshua Cao has updated the pull request incrementally with one additional commit since the last revision: > > reassociate_add_sub -> reassociate_add_sub_cmp Tests pass, thanks for the updates. I left a few optional comments. 
One request I would still like to see: You could improve the tests by returning a value that indicates when the loop was exited, i.e. `return i`. In the future, I intend to verify the return values from test methods, and then we would have additional coverage for free ;) src/hotspot/share/opto/loopnode.hpp line 745: > 743: // Reassociate invariant binary expressions. > 744: Node* reassociate(Node* n1, PhaseIdealLoop *phase); > 745: // Reassociate invariant add and subtract expressions. I guess you could mention cmp here too. test/hotspot/jtreg/compiler/loopopts/InvariantCodeMotionReassociateCmp.java line 2: > 1: /* > 2: * Copyright Amazon.com Inc. or its affiliates. All Rights Reserved. Does the Amazon copyright header not have a year associated with it? ------------- Marked as reviewed by epeter (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/17375#pullrequestreview-1843206952 PR Review Comment: https://git.openjdk.org/jdk/pull/17375#discussion_r1466069682 PR Review Comment: https://git.openjdk.org/jdk/pull/17375#discussion_r1466071003 From epeter at openjdk.org Thu Jan 25 09:28:35 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 25 Jan 2024 09:28:35 GMT Subject: RFR: 8323220: Reassociate loop invariants involved in Cmps and Add/Subs [v5] In-Reply-To: References: Message-ID: On Thu, 25 Jan 2024 09:20:52 GMT, Emanuel Peter wrote: >> Joshua Cao has updated the pull request incrementally with one additional commit since the last revision: >> >> reassociate_add_sub -> reassociate_add_sub_cmp > > test/hotspot/jtreg/compiler/loopopts/InvariantCodeMotionReassociateCmp.java line 2: > >> 1: /* >> 2: * Copyright Amazon.com Inc. or its affiliates. All Rights Reserved. > > Does the Amazon copyright header not have a year associated with it? I guess not, I see other files without a year. Still, a bit strange.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17375#discussion_r1466072733 From fyang at openjdk.org Thu Jan 25 09:30:31 2024 From: fyang at openjdk.org (Fei Yang) Date: Thu, 25 Jan 2024 09:30:31 GMT Subject: RFR: 8324125: Improve class initialization barrier in TemplateTable::_new for RISC-V In-Reply-To: <_TyDNsY_znk2GR9l7rpALqDOuwMwj4hkuF5kKcyJBQg=.1d9f5448-ad18-49bf-9fde-a68e88626470@github.com> References: <_TyDNsY_znk2GR9l7rpALqDOuwMwj4hkuF5kKcyJBQg=.1d9f5448-ad18-49bf-9fde-a68e88626470@github.com> Message-ID: On Wed, 24 Jan 2024 09:16:09 GMT, Gui Cao wrote: > Hi, This RISC-V Port implementation for https://github.com/openjdk/jdk/pull/17006, > > ### Testing: > > - [x] Run tier1-3 tests on qemu 8.1.0 with UseRVV (fastdebug) > - [x] Run tier1-3 tests with SiFive unmatched (release) Marked as reviewed by fyang (Reviewer). src/hotspot/cpu/riscv/templateTable_riscv.cpp line 3551: > 3549: > 3550: // make sure klass is initialized > 3551: assert(VM_Version::supports_fast_class_init_checks(), "Optimization requires support for fast class initialization checks"); Nit: better to put the msg string on a separate line. 
------------- PR Review: https://git.openjdk.org/jdk/pull/17548#pullrequestreview-1843223066 PR Review Comment: https://git.openjdk.org/jdk/pull/17548#discussion_r1466079862 From chagedorn at openjdk.org Thu Jan 25 09:35:29 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 25 Jan 2024 09:35:29 GMT Subject: RFR: 8291809: Convert compiler/c2/cr7200264/TestSSE2IntVect.java to IR verification test [v2] In-Reply-To: References: Message-ID: On Wed, 24 Jan 2024 13:50:19 GMT, Christian Hagedorn wrote: >> I was curious about that and it actually does: >> >> TraceNewVectors [SuperWord]: 744 LoadVector === 380 693 677 [[ 675 671 669 664 660 658 558 554 552 145 150 154 ]] @int[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=5; mismatched #vectorx[4]:{int} !orig=[676],[559],[134] !jvms: Test::test_divc @ bci:12 (line 36) >> TraceNewVectors [SuperWord]: 746 RShiftVI === _ 744 745 [[ 668 657 551 155 ]] #vectorx[4]:{int} !orig=[669],[552],[154] !jvms: Test::test_divc @ bci:15 (line 36) >> TraceNewVectors [SuperWord]: 747 VectorCastI2X === _ 744 [[ 674 663 557 146 ]] #vectory[4]:{long} !orig=[675],[558],[145] !jvms: Test::test_divc @ bci:15 (line 36) >> TraceNewVectors [SuperWord]: 748 Replicate === _ 144 [[ ]] #vectory[4]:{long} >> TraceNewVectors [SuperWord]: 749 MulVL === _ 747 748 [[ 673 662 556 148 ]] #vectory[4]:{long} !orig=[674],[557],[146] !jvms: Test::test_divc @ bci:15 (line 36) >> TraceNewVectors [SuperWord]: 751 RShiftVL === _ 749 750 [[ 672 661 555 149 ]] #vectory[4]:{long} !orig=[673],[556],[148] !jvms: Test::test_divc @ bci:15 (line 36) >> TraceNewVectors [SuperWord]: 752 VectorCastL2X === _ 751 [[ 671 660 554 150 ]] #vectorx[4]:{int} !orig=[672],[555],[149] !jvms: Test::test_divc @ bci:15 (line 36) >> TraceNewVectors [SuperWord]: 753 AddVI === _ 752 744 [[ 670 659 553 152 ]] #vectorx[4]:{int} !orig=[671],[554],[150] !jvms: Test::test_divc @ bci:15 (line 36) >> TraceNewVectors [SuperWord]: 755 RShiftVI === _ 753 754 [[ 
668 657 551 155 ]] #vectorx[4]:{int} !orig=[670],[553],[152] !jvms: Test::test_divc @ bci:15 (line 36) >> TraceNewVectors [SuperWord]: 756 SubVI === _ 755 746 [[ 666 656 549 176 ]] #vectorx[4]:{int} !orig=[668],[551],[155] !jvms: Test::test_divc @ bci:15 (line 36) >> TraceNewVectors [SuperWord]: 757 StoreVector === 687 693 667 756 [[ 374 693 372 179 ]] @int[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=5; mismatched Memory: @int[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=5; !orig=[666],[549],[176],575 !jvms: Test::test_divc @ bci:16 (line 36) >> >> I have not checked any other methods but it might indeed be possible to vectorize some them. I think it's a good idea to check all methods and add a comment with a short explanation why it's not possible or if there are plans to support vectorization in the future. All these tests look like a good collection of (seemingly good) vectorization opportuniti... > > Should we also add checks for these vectors (same for `test_divc_n()`)? > @chhagedorn: Do you mean that `test_divc` and `test_divc_n` vectorize after JDK-8282365? They don't vectorize on my machine (on this PR). Ah, I see. Yes, as for Roberto, it does vectorize on my machine as well. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17428#discussion_r1466086887 From jbhateja at openjdk.org Thu Jan 25 10:10:50 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 25 Jan 2024 10:10:50 GMT Subject: RFR: 8322768: Optimize non-subword vector compress and expand APIs for AVX2 target. [v9] In-Reply-To: References: <_udOCEVG86x9V_WvYqFTaYnvmXdiZ7LzqxzR-D_ygYs=.db3ed37e-a8fa-413b-83c6-a785aba072ff@github.com> <8oB-M1TUk9aqQIYOGijNmykLCyM1AUTXLTsgy4r8Wk4=.49c90c06-5f2e-47f0-9ac1-ffd6eb438fa4@github.com> Message-ID: On Thu, 25 Jan 2024 09:15:26 GMT, Emanuel Peter wrote: >> Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. 
The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 10 additional commits since the last revision: >> >> - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8322768 >> - Modifying comments. >> - Review comments resolution >> - Modified code comment for clarity. >> - Space fixup >> - Using emulated variable blend E-Core optimized instruction. >> - Review suggestions incorporated. >> - Review comments resolutions. >> - Updating copyright year of modified files. >> - 8322768: Optimize non-subword vector compress and expand APIs for AVX2 target. > > Testing passed, looks good now :) > Nice progress, the code now is simpler and much more understandable! Thanks @eme64 and @sviswa7 for your comments. ------------- PR Comment: https://git.openjdk.org/jdk/pull/17261#issuecomment-1909805107 From jbhateja at openjdk.org Thu Jan 25 10:10:52 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 25 Jan 2024 10:10:52 GMT Subject: Integrated: 8322768: Optimize non-subword vector compress and expand APIs for AVX2 target. In-Reply-To: <_udOCEVG86x9V_WvYqFTaYnvmXdiZ7LzqxzR-D_ygYs=.db3ed37e-a8fa-413b-83c6-a785aba072ff@github.com> References: <_udOCEVG86x9V_WvYqFTaYnvmXdiZ7LzqxzR-D_ygYs=.db3ed37e-a8fa-413b-83c6-a785aba072ff@github.com> Message-ID: <12ztWzMqNa9AHlWy7O9fx6YWqZDbEcB7xPkBH0nYD-o=.ff640895-79f8-4a75-b1be-800a410a4c28@github.com> On Thu, 4 Jan 2024 05:28:59 GMT, Jatin Bhateja wrote: > Hi, > > Patch optimizes non-subword vector compress and expand APIs for x86 AVX2 only targets. > Upcoming E-core Xeons (Sierra Forest) and Hybrid CPUs only support AVX2 instruction set. > These are very frequently used APIs in columnar database filter operation. > > Implementation uses a lookup table to record permute indices. Table index is computed using > mask argument of compress/expand operation. > > Following are the performance number of JMH micro included with the patch. 
> > > System : Intel(R) Xeon(R) Platinum 8480+ (Sapphire Rapids) > > Baseline: > Benchmark (size) Mode Cnt Score Error Units > ColumnFilterBenchmark.filterDoubleColumn 1024 thrpt 2 142.767 ops/ms > ColumnFilterBenchmark.filterDoubleColumn 2047 thrpt 2 71.436 ops/ms > ColumnFilterBenchmark.filterDoubleColumn 4096 thrpt 2 35.992 ops/ms > ColumnFilterBenchmark.filterFloatColumn 1024 thrpt 2 182.151 ops/ms > ColumnFilterBenchmark.filterFloatColumn 2047 thrpt 2 91.096 ops/ms > ColumnFilterBenchmark.filterFloatColumn 4096 thrpt 2 44.757 ops/ms > ColumnFilterBenchmark.filterIntColumn 1024 thrpt 2 184.099 ops/ms > ColumnFilterBenchmark.filterIntColumn 2047 thrpt 2 91.981 ops/ms > ColumnFilterBenchmark.filterIntColumn 4096 thrpt 2 45.170 ops/ms > ColumnFilterBenchmark.filterLongColumn 1024 thrpt 2 148.017 ops/ms > ColumnFilterBenchmark.filterLongColumn 2047 thrpt 2 73.516 ops/ms > ColumnFilterBenchmark.filterLongColumn 4096 thrpt 2 36.844 ops/ms > > Withopt: > Benchmark (size) Mode Cnt Score Error Units > ColumnFilterBenchmark.filterDoubleColumn 1024 thrpt 2 2051.707 ops/ms > ColumnFilterBenchmark.filterDoubleColumn 2047 thrpt 2 914.072 ops/ms > ColumnFilterBenchmark.filterDoubleColumn 4096 thrpt 2 489.898 ops/ms > ColumnFilterBenchmark.filterFloatColumn 1024 thrpt 2 5324.195 ops/ms > ColumnFilterBenchmark.filterFloatColumn 2047 thrpt 2 2587.229 ops/ms > ColumnFilterBenchmark.filterFloatColumn 4096 thrpt 2 1278.665 ops/ms > ColumnFilterBenchmark.filterIntColumn 1024 thrpt 2 4149.384 ops/ms > ColumnFilterBenchmark.filterIntColumn 2047 thrpt 2 1791.170 ops/ms > ColumnFilterBenchmark.filterIntColumn 4096... This pull request has now been integrated. Changeset: 6d36eb78 Author: Jatin Bhateja URL: https://git.openjdk.org/jdk/commit/6d36eb78ad781ecd80d66d1319921a8746820394 Stats: 372 lines in 10 files changed: 354 ins; 8 del; 10 mod 8322768: Optimize non-subword vector compress and expand APIs for AVX2 target. 
Reviewed-by: epeter, sviswanathan ------------- PR: https://git.openjdk.org/jdk/pull/17261 From chagedorn at openjdk.org Thu Jan 25 10:32:49 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 25 Jan 2024 10:32:49 GMT Subject: [jdk22] RFR: 8324688: C2: Disable ReduceAllocationMerges by default Message-ID: Due to several recent bug reports after the integration of [JDK-8287061](https://bugs.openjdk.org/browse/JDK-8287061) in JDK 22 (latest one being [JDK-8322854](https://bugs.openjdk.org/browse/JDK-8322854)), we've decided together with @JohnTortugo that we want to minimize the risk and that it's best to disable `ReduceAllocationMerges` by default for JDK 22 which disables the optimizations of [JDK-8287061](https://bugs.openjdk.org/browse/JDK-8287061). Thanks, Christian ------------- Commit messages: - 8324688: C2: Disable ReduceAllocationMerges by default Changes: https://git.openjdk.org/jdk22/pull/97/files Webrev: https://webrevs.openjdk.org/?repo=jdk22&pr=97&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8324688 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk22/pull/97.diff Fetch: git fetch https://git.openjdk.org/jdk22.git pull/97/head:pull/97 PR: https://git.openjdk.org/jdk22/pull/97 From thartmann at openjdk.org Thu Jan 25 10:32:49 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 25 Jan 2024 10:32:49 GMT Subject: [jdk22] RFR: 8324688: C2: Disable ReduceAllocationMerges by default In-Reply-To: References: Message-ID: On Thu, 25 Jan 2024 10:27:37 GMT, Christian Hagedorn wrote: > Due to several recent bug reports after the integration of [JDK-8287061](https://bugs.openjdk.org/browse/JDK-8287061) in JDK 22 (latest one being [JDK-8322854](https://bugs.openjdk.org/browse/JDK-8322854)), we've decided together with @JohnTortugo that we want to minimize the risk and that it's best to disable `ReduceAllocationMerges` by default for JDK 22 which disables the optimizations of 
[JDK-8287061](https://bugs.openjdk.org/browse/JDK-8287061). > > Thanks, > Christian Looks good and trivial. ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk22/pull/97#pullrequestreview-1843355672 From chagedorn at openjdk.org Thu Jan 25 10:36:43 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 25 Jan 2024 10:36:43 GMT Subject: [jdk22] RFR: 8324688: C2: Disable ReduceAllocationMerges by default In-Reply-To: References: Message-ID: On Thu, 25 Jan 2024 10:27:37 GMT, Christian Hagedorn wrote: > Due to several recent bug reports after the integration of [JDK-8287061](https://bugs.openjdk.org/browse/JDK-8287061) in JDK 22 (latest one being [JDK-8322854](https://bugs.openjdk.org/browse/JDK-8322854)), we've decided together with @JohnTortugo that we want to minimize the risk and that it's best to disable `ReduceAllocationMerges` by default for JDK 22 which disables the optimizations of [JDK-8287061](https://bugs.openjdk.org/browse/JDK-8287061). > > Thanks, > Christian Thanks Tobias for your review! ------------- PR Comment: https://git.openjdk.org/jdk22/pull/97#issuecomment-1909862094 From dlunden at openjdk.org Thu Jan 25 10:44:50 2024 From: dlunden at openjdk.org (Daniel =?UTF-8?B?THVuZMOpbg==?=) Date: Thu, 25 Jan 2024 10:44:50 GMT Subject: RFR: 8291809: Convert compiler/c2/cr7200264/TestSSE2IntVect.java to IR verification test [v5] In-Reply-To: References: Message-ID: > This changeset translates the tests in `compiler/c2/cr7200264/` to use the IR verification framework. > > The proposed translation, to the extent possible, attempts to preserve the semantics of the original test. A major difference is that the IR checks are now local (for every `test_*` method) instead of global. The execution time of the new test is comparable to the old test. 
> > Testing: > - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/7553846710) > - Ran the new translated tests within all tier1 through tier10 contexts on windows-x64, linux-x64, linux-aarch64, macosx-x64, and macosx-aarch64. > - Tested that manually adding `-XX:LoopUnrollLimit=0` to the test framework flags caused the translated tests to fail. Note: it is, however, no longer possible to break the test by passing `-XX:LoopUnrollLimit=0` on the command line. Daniel Lundén has updated the pull request incrementally with one additional commit since the last revision: Add checks to test_divc and test_divc_n ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17428/files - new: https://git.openjdk.org/jdk/pull/17428/files/1e4e74f0..cb575780 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17428&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17428&range=03-04 Stats: 16 lines in 1 file changed: 6 ins; 0 del; 10 mod Patch: https://git.openjdk.org/jdk/pull/17428.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17428/head:pull/17428 PR: https://git.openjdk.org/jdk/pull/17428 From wzhuo at openjdk.org Thu Jan 25 11:34:00 2024 From: wzhuo at openjdk.org (Wang Zhuo) Date: Thu, 25 Jan 2024 11:34:00 GMT Subject: RFR: 8324123: aarch64: fix prfm literal encoding in assembler [v6] In-Reply-To: References: Message-ID: <1F918465vpiJUQ0XbadPAJLrs58TmBj3sVK5TapAWqA=.768a151b-c0ce-447d-951e-f440df83e9f1@github.com> > The current prfm literal mode encoding in the aarch64 assembler is not correct. > The prfm_literal instruction requires bits 31 and 30 to be 0b11, while the current assembler sets these two bits to `size & 0b01` (that is, 0b01 for prfm), which encodes a ldr instruction, not prfm.
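As a rough illustration of the two-bit field described above, the following sketch assembles an A64 "load register (literal)" word by hand. The class and method names are hypothetical and are not part of HotSpot. The field layout is an assumption taken from the A64 encoding as commonly documented (opc in bits 31:30, the fixed pattern 011 in bits 29:27, V in bit 26, imm19 in bits 23:5, Rt in bits 4:0), and the sketch leaves out relocation and range checks.

```java
// Hypothetical helper, not HotSpot code: builds an A64 "load register (literal)" word.
final class LiteralEncodingSketch {
    static int encodeLiteral(int opc, int v, long byteOffset, int rt) {
        // The literal offset is stored as a word offset in 19 bits.
        int imm19 = (int) ((byteOffset >> 2) & 0x7FFFF);
        return (opc << 30) | (0b011 << 27) | (v << 26) | (imm19 << 5) | (rt & 0x1F);
    }

    public static void main(String[] args) {
        int size = 0b11; // operand size value used for prfm in this sketch
        // Masking the size with 0b01 leaves opc = 0b01 in bits 31:30.
        System.out.printf("size & 0b01 -> 0x%08X%n", encodeLiteral(size & 0b01, 0, 0, 0));
        // Using 0b11 directly sets both top bits.
        System.out.printf("0b11        -> 0x%08X%n", encodeLiteral(0b11, 0, 0, 0));
    }
}
```

With a zero offset and register field, the masked variant yields the 0x58000000 pattern (a 64-bit ldr literal), while 0b11 yields 0xD8000000 (prfm literal), which matches the symptom reported in this message.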
> For example, adding the following code in stubGenerator
>     __ prfm(Address(__ pc()))
> we get a ldr instruction like
>     ldr x0, 0x0000ffff83f8539c
> but it should be a prfm instruction like
>     prfm pldl1keep, 0x0000ffff8ff8539c
>
> The bug is in ld_st2: in literal mode, bits 31 and 30 are set to (size & 0b01), while for prfm instructions bits 31 and 30 must be 0b11.
> void ld_st2(Register Rt, const Address &adr, int size, int op, int V = 0) {
>   starti;
>
>   f(V, 26); // general reg?
>   zrf(Rt, 0);
>
>   // Encoding for literal loads is done here (rather than pushed
>   // down into Address::encode) because the encoding of this
>   // instruction is too different from all of the other forms to
>   // make it worth sharing.
>   if (adr.getMode() == Address::literal) {
>     assert(size == 0b10 || size == 0b11, "bad operand size in ldr");
>     assert(op == 0b01, "literal form can only be used with loads");
>     f(**size & 0b01, 31, 30**), f(0b011, 29, 27), f(0b00, 25, 24);
>     int64_t offset = (adr.target() - pc()) >> 2;
>     sf(offset, 23, 5);
>     code_section()->relocate(pc(), adr.rspec());
>     return;
>   }
>
>   f(size, 31, 30);
>   f(op, 23, 22); // str
>   adr.encode(&current_insn);
> }

Wang Zhuo has updated the pull request incrementally with two additional commits since the last revision:

 - Update assembler_aarch64.cpp, merge guarantee

   Co-authored-by: Andrew Haley
 - Update assembler_aarch64.cpp delete some comments

   Co-authored-by: Andrew Haley

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/17482/files
  - new: https://git.openjdk.org/jdk/pull/17482/files/11d46ff7..62cce404

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=17482&range=05
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17482&range=04-05

  Stats: 4 lines in 1 file changed: 0 ins; 3 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/17482.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/17482/head:pull/17482

PR: https://git.openjdk.org/jdk/pull/17482

From wzhuo at openjdk.org Thu Jan 25 11:56:38 2024
From: wzhuo at openjdk.org (Wang Zhuo) Date: Thu, 25 Jan 2024 11:56:38 GMT Subject: RFR: 8324123: aarch64: fix prfm literal encoding in assembler [v4] In-Reply-To: References: Message-ID: <5qD7GIRLkOqkdf25fm48rDaHtGMdv_TkqBGsIe6iUdU=.2e90c47b-dd87-42f4-b2a2-6e7019df5c74@github.com> On Wed, 24 Jan 2024 08:42:34 GMT, Emanuel Peter wrote: >> Wang Zhuo has updated the pull request incrementally with one additional commit since the last revision: >> >> get some comments back > > src/hotspot/cpu/aarch64/assembler_aarch64.cpp line 192: > >> 190: // This encoding is similar (but not quite identical) to the encoding used >> 191: // by literal ld/st. see JDK-8324123. >> 192: // FIXME: PRFM should not be used with writeback modes, but the assembler > > FIXME: is it ok to leave this in the code? > I think we prefer filed RFE's to comments in the code that nobody will ever look at again. > You can put the RFE number in the code though. Thanks. The FIXME was there because PRFM did not support pre/post indexing and we had no check for that. Some guarantee checks were added and FIXME was removed. Please check the last 3 patches. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17482#discussion_r1466260270 From qamai at openjdk.org Thu Jan 25 11:59:52 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Thu, 25 Jan 2024 11:59:52 GMT Subject: RFR: 8282365: Consolidate and improve division by constant idealizations [v33] In-Reply-To: <0D9E-3Nj0VvCYUmIXKgMoRI7W3xioc6n5phQ_TGNHRE=.80f0ef3a-243d-4eea-9351-c407ed92b6b8@github.com> References: <0D9E-3Nj0VvCYUmIXKgMoRI7W3xioc6n5phQ_TGNHRE=.80f0ef3a-243d-4eea-9351-c407ed92b6b8@github.com> Message-ID: On Mon, 30 Oct 2023 15:38:52 GMT, Raffaello Giulietti wrote: >> Quan Anh Mai has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 74 commits: >> >> - fix proof >> - Merge branch 'master' into unsignedDiv >> - fix assert macro, benchmarks >> - comment styles >> - disable test with Xcomp >> - remove verify >> - fix x86 test >> - more rigorous control >> - verify the effectiveness of test >> - require x64 >> - ... and 64 more: https://git.openjdk.org/jdk/compare/5224e979...529bd0f9 > > I agree this is not necessarily optimal, but it's far easier to prove and to code. > > There's a tradeoff here between simplicity of the proofs and of the code on one side, and squeezing out the last sub-nanosecond on the other side. > Hard choice! @rgiulietti Thanks very much for your reviews @vnkozlov @eme64 Could you do another round of reviews, please? There has not been much change, though. ------------- PR Comment: https://git.openjdk.org/jdk/pull/9947#issuecomment-1910035550 From epeter at openjdk.org Thu Jan 25 12:00:32 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 25 Jan 2024 12:00:32 GMT Subject: RFR: 8324123: aarch64: fix prfm literal encoding in assembler [v4] In-Reply-To: <5qD7GIRLkOqkdf25fm48rDaHtGMdv_TkqBGsIe6iUdU=.2e90c47b-dd87-42f4-b2a2-6e7019df5c74@github.com> References: <5qD7GIRLkOqkdf25fm48rDaHtGMdv_TkqBGsIe6iUdU=.2e90c47b-dd87-42f4-b2a2-6e7019df5c74@github.com> Message-ID: On Thu, 25 Jan 2024 11:54:11 GMT, Wang Zhuo wrote: >> src/hotspot/cpu/aarch64/assembler_aarch64.cpp line 192: >> >>> 190: // This encoding is similar (but not quite identical) to the encoding used >>> 191: // by literal ld/st. see JDK-8324123. >>> 192: // FIXME: PRFM should not be used with writeback modes, but the assembler >> >> FIXME: is it ok to leave this in the code? >> I think we prefer filed RFE's to comments in the code that nobody will ever look at again. >> You can put the RFE number in the code though. > > Thanks. The FIXME was there because PRFM did not support pre/post indexing and we had no check for that. > Some guarantee checks were added and FIXME was removed. 
> Please check the last 3 patches. Thanks @sandlerwang ! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17482#discussion_r1466264412 From epeter at openjdk.org Thu Jan 25 12:04:00 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 25 Jan 2024 12:04:00 GMT Subject: RFR: 8282365: Consolidate and improve division by constant idealizations [v33] In-Reply-To: References: <0D9E-3Nj0VvCYUmIXKgMoRI7W3xioc6n5phQ_TGNHRE=.80f0ef3a-243d-4eea-9351-c407ed92b6b8@github.com> Message-ID: On Thu, 25 Jan 2024 11:57:09 GMT, Quan Anh Mai wrote: >> I agree this is not necessarily optimal, but it's far easier to prove and to code. >> >> There's a tradeoff here between simplicity of the proofs and of the code on one side, and squeezing out the last sub-nanosecond on the other side. >> Hard choice! > > @rgiulietti Thanks very much for your reviews > @vnkozlov @eme64 Could you do another round of reviews, please? There has not been much change, though. @merykitty I see there are some proofs now, great! I'll have a look soon :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/9947#issuecomment-1910041871 From dlunden at openjdk.org Thu Jan 25 12:29:29 2024 From: dlunden at openjdk.org (Daniel Lundén) Date: Thu, 25 Jan 2024 12:29:29 GMT Subject: RFR: 8322996: BoxLockNode creation fails with assert(reg < CHUNK_SIZE) failed: sanity [v5] In-Reply-To: References: Message-ID: On Thu, 25 Jan 2024 03:47:12 GMT, Vladimir Kozlov wrote: >> Daniel Lundén has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix incorrect package name > > src/hotspot/share/opto/graphKit.cpp line 3473: > >> 3471: Node* box = _gvn.transform(new BoxLockNode(next_monitor())); >> 3472: // Check for bailout after new BoxLockNode >> 3473: if (failing()) { return nullptr; } > > Do all callers of `shared_lock()` check for `failing()` or a returned `nullptr`? No, not the immediate callers at least.
Below is a quick call graph analysis for the places where we create `BoxLockNode`s (up to the first bailout check). Should I add returns at all points in the call chain up to the first checks? graphKit.cpp:3471 (this is in shared_lock) locknode.cpp:196 parse2.cpp:2759 parse1.cpp:1594 (Checks for bailout at parse1.cpp:1595) parse1.cpp:1264 parse1.cpp:582 (Checks for bailout at parse1.cpp:596) parse1.cpp:227 parse1.cpp:579 (Checks for bailout at parse1.cpp:596) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17370#discussion_r1466298523 From rgiulietti at openjdk.org Thu Jan 25 12:42:58 2024 From: rgiulietti at openjdk.org (Raffaello Giulietti) Date: Thu, 25 Jan 2024 12:42:58 GMT Subject: RFR: 8282365: Consolidate and improve division by constant idealizations [v46] In-Reply-To: References: Message-ID: On Sat, 20 Jan 2024 12:17:04 GMT, Quan Anh Mai wrote: >> This patch implements idealisation for unsigned divisions to change a division by a constant into a series of multiplication and shift. I also change the idealisation of `DivI` to get a more efficient series when the magic constant overflows an int32. >> >> In general, the idea behind a signed division transformation is that for a positive constant `d`, we would need to find constants `c` and `m` so that: >> >> floor(x / d) = floor(x * c / 2**m) for 0 < x < 2**(N - 1) (1) >> ceil(x / d) = floor(x * c / 2**m) + 1 for -2**(N - 1) <= x < 0 (2) >> >> The implementation in the original book takes into consideration that the machine may not be able to perform the full multiplication `x * c`, so the constant overflow and we need to add back the dividend as in `DivLNode::Ideal` cases. However, for int32 division, `x * c` cannot overflow an int64. As a result, it is always feasible to just calculate the product and shift the result. >> >> For unsigned multiplication, the situation is somewhat trickier because the condition needs to be twice as strong (the condition (1) and (2) above are mostly the same). 
This results in the magic constant `c` calculated based on the method presented in Hacker's Delight by Henry S. Warren, Jr. may overflow an uintN. For int division, we can depend on the theorem devised by Arch D. Robison in N-Bit Unsigned Division Via N-Bit Multiply-Add, which states that there exists either: >> >> c1 in uint32 and m1, such that floor(x / d) = floor(x * c1 / 2**m1) for 0 < x < 2**32 (3) >> c2 in uint32 and m2, such that floor(x / d) = floor((x + 1) * c2 / 2**m2) for 0 < x < 2**32 (4) >> >> which means that either `x * c1` never overflows an uint64 or `(x + 1) * c2` never overflows an uint64. And we can perform a full multiplication. >> >> For longs, there is no way to do a full multiplication so we do some basic transformations to achieve a computable formula. The details I have written as comments in the overflow case. >> >> More tests are added to cover the possible patterns. >> >> Please take a look and have some reviews. Thank you very much. > > Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: > > just be simple src/hotspot/share/opto/divconstants.cpp line 126: > 124: // > 125: // c * d - rc = 2**s with 0 < rc <= d > 126: // qv * v + rv = 2**s with 0 <= rv < v To clarify the roles of these quantities, I suggest to extend the comment a bit, like so // Let r = m - floor(m / d) * d, that is, let r be the remainder of the indicated floor division. // Then // c = floor(m / d) + 1, rc = d - r. // Further // qv = floor(m / v), rv = m - floor(m / v) * d, that is, qv and rv are the quotient, // resp., the remainder of the floor division. 
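To make conditions (1) and (2) and the `(x + 1) * c2` variant (4) from the quoted description concrete, here is a small sketch for the single divisor d = 7. The constants were worked out by hand for this one divisor (c = floor(2**34 / 7) + 1 for the signed case, c2 = floor(2**34 / 7) for the unsigned case, shift 34 in both), and the class name is made up. This is not the code in `divconstants.cpp`, and it does not show how C2 selects constants in general.

```java
// Illustrative only: hand-picked constants for d = 7, not taken from the patch.
final class DivBy7Sketch {
    static final long C_SIGNED = 2454267027L;   // floor(2**34 / 7) + 1
    static final long C_UNSIGNED = 2454267026L; // floor(2**34 / 7)
    static final int SHIFT = 34;

    // Signed: floor(x * c / 2**34), plus 1 for negative x, following (1) and (2).
    static int divSigned(int x) {
        long q = ((long) x * C_SIGNED) >> SHIFT; // arithmetic shift rounds toward -infinity
        return (int) q + (x < 0 ? 1 : 0);
    }

    // Unsigned: floor((x + 1) * c2 / 2**34), following (4).
    // The product stays below 2**64, so the wrapped long holds the right unsigned bits.
    static int divUnsigned(int x) {
        long prod = (Integer.toUnsignedLong(x) + 1) * C_UNSIGNED;
        return (int) (prod >>> SHIFT); // logical shift treats prod as unsigned
    }

    public static void main(String[] args) {
        int[] samples = {0, 1, 6, 7, 8, -1, -6, -7, -8, Integer.MAX_VALUE, Integer.MIN_VALUE};
        for (int x : samples) {
            if (divSigned(x) != x / 7 || divUnsigned(x) != Integer.divideUnsigned(x, 7)) {
                throw new AssertionError("mismatch for " + x);
            }
        }
        System.out.println("all samples agree");
    }
}
```

The signed product never leaves the int64 range because |x| <= 2**31 and c < 2**32, which is the observation the description relies on for int32 division.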
src/hotspot/share/opto/divconstants.cpp line 130: > 128: void magic_divide_constants(T d, T N_neg, T N_pos, juint min_s, T& c, bool& c_ovf, juint& s) { > 129: static_assert(std::is_unsigned::value, "calculations must be done in the unsigned domain"); > 130: assert(!is_power_of_2(d), "this case should be handled separately"); The algorithm also works for `d` a power of 2 (except, perhaps, `d = 1`). But of course, it makes more sense to handle them separately. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/9947#discussion_r1466309214 PR Review Comment: https://git.openjdk.org/jdk/pull/9947#discussion_r1466313682 From mcimadamore at openjdk.org Thu Jan 25 12:55:36 2024 From: mcimadamore at openjdk.org (Maurizio Cimadamore) Date: Thu, 25 Jan 2024 12:55:36 GMT Subject: RFR: 8324433: Introduce a way to determine if an expression is evaluated as a constant by the Jit compiler [v6] In-Reply-To: References: <9SikKzxs8M1JdTLvTB6JTozvpCw2CSziF2koHw0ELAQ=.fb2e7696-6fa6-4a2b-87b8-ec57d4fef05c@github.com> Message-ID: On Thu, 25 Jan 2024 07:41:27 GMT, Roland Westrelin wrote: > > > Naive question: the right way to use this would be almost invariably be like this: > > > ``` > > > if (isCompileConstant(foo) && fooHasCertainStaticProperties(foo)) { > > > // fast-path > > > } > > > // slow path > > > ``` > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Right? > > > > > > Yes, I think so. > > But then whatever is in the fast path and `fooHasCertainStaticProperties` are never profiled because never executed by the interpreter or c1. So `fooHasCertainStaticProperties` will likely not be inlined and c2 will do a poor (or rather not as good as you'd like) job of compiling whatever is in the fast path. I suppose perhaps it is implied that `fooHasCertainStaticProperties` should have `@ForceInline` ? 
But yes, there seems to be several assumptions in how this logic is supposed to be used, and at the moment, it seems to me more of a footgun than something actually useful (but I admit my ignorance on the subject). ------------- PR Comment: https://git.openjdk.org/jdk/pull/17527#issuecomment-1910140980 From simonis at openjdk.org Thu Jan 25 13:28:40 2024 From: simonis at openjdk.org (Volker Simonis) Date: Thu, 25 Jan 2024 13:28:40 GMT Subject: RFR: 8324241: Always record evol_method deps to avoid excessive method flushing [v3] In-Reply-To: References: Message-ID: On Tue, 23 Jan 2024 19:57:14 GMT, Vladimir Ivanov wrote: >> Volker Simonis has updated the pull request incrementally with one additional commit since the last revision: >> >> Updated option description and assertion based on review feedback > > I support keeping the logic under a flag. I have some concerns about unconditionally turning it on. > > I expect significantly higher footprint overhead when an application has plenty of tiny methods and deep inlining trees. And java.lang.invoke implementation pushes it even further (with arbitrarily deep MethodHandle trees and unconditional inlining through them), so heavy users of MethodHandle API should experience higher overheads when evol dependencies are recorded. > > I suggest to make the flag experimental. Once JFR implementation is improved, it can be superseded by `-XX:+EnableDynamicAgentLoading` check. @iwanowww , @shipilev , @dean-long any more comments or concerns? Otherwise I'd like to finish this. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/17509#issuecomment-1910218698 From dlunden at openjdk.org Thu Jan 25 13:41:27 2024 From: dlunden at openjdk.org (Daniel Lundén) Date: Thu, 25 Jan 2024 13:41:27 GMT Subject: RFR: 8291809: Convert compiler/c2/cr7200264/TestSSE2IntVect.java to IR verification test [v2] In-Reply-To: References: Message-ID: <1hgiSYOX1vJdXoKJyMqd-YfHaDqaG5qaJI8n2ZwmF08=.16fdb641-1ace-484a-a3f3-2438e9fd4cfa@github.com> On Thu, 25 Jan 2024 07:58:14 GMT, Roberto Castañeda Lozano wrote: >> @chhagedorn: Do you mean that `test_divc` and `test_divc_n` vectorize after JDK-8282365? They don't vectorize on my machine (on this PR). > > I just checked in my machine (on top of commit fb822e49f2a84423c8fd17db2e95bbdd5e7ec191) and these division tests do seem to vectorize, this is e.g. the innermost loop in `test_divc` right before code emission: > > ![test_divc](https://github.com/openjdk/jdk/assets/8792647/129d51c2-a1ad-4d02-ab81-02cd849af36f) > > Here are my processor features in case it helps (subset of `lscpu` output): > > > Architecture: x86_64 > CPU op-mode(s): 32-bit, 64-bit > Address sizes: 39 bits physical, 48 bits virtual > Byte Order: Little Endian > CPU(s): 12 > On-line CPU(s) list: 0-11 > Vendor ID: GenuineIntel > Model name: Intel(R) Core(TM) i7-9850H CPU @ 2.60GHz > CPU family: 6 > Model: 158 > Thread(s) per core: 2 > Core(s) per socket: 6 > (...)
> Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mc > a cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss > ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art > arch_perfmon pebs bts rep_good nopl xtopology nonstop_ > tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cp > l vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid ss > e4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes > xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_f > ault epb invpcid_single ssbd ibrs ibpb stibp ibrs_enhan > ced tpr_shadow flexpriority ept vpid ept_ad fsgsbase ts > c_adjust bmi1 avx2 smep bmi2 erms invpcid mpx rdseed ad > x smap clflushopt intel_pt xsaveopt xsavec xgetbv1 xsav > es dtherm ida arat pln pts hwp hwp_notify hwp_act_windo > w hwp_epp vnmi md_clear flush_l1d arch_capabilities > (...) Thanks for the clarification @robcasloz and @chhagedorn. I've investigated now, and they do vectorize on my machine as well. I was confused because, before the change below, the IR framework did not register the nodes (wrong vector size of 4 instead of the default of 8). Is that expected, and should we specify something else instead of the catch-all `IRNode.VECTOR_SIZE_ANY`? 
@Test - @IR(counts = { IRNode.ADD_VI, "> 0", - IRNode.RSHIFT_VI, "> 0", - IRNode.SUB_VI, "> 0" }, + @IR(counts = { IRNode.ADD_VI, IRNode.VECTOR_SIZE_ANY, "> 0", + IRNode.RSHIFT_VI, IRNode.VECTOR_SIZE_ANY, "> 0", + IRNode.SUB_VI, IRNode.VECTOR_SIZE_ANY, "> 0" }, applyIfCPUFeatureOr = {"sse2", "true", "asimd", "true"}) void test_divc(int[] a0, int[] a1) { for (int i = 0; i < a0.length; i+=1) { @@ -519,9 +519,9 @@ void test_divc(int[] a0, int[] a1) { } @Test - @IR(counts = { IRNode.ADD_VI, "> 0", - IRNode.RSHIFT_VI, "> 0", - IRNode.SUB_VI, "> 0" }, + @IR(counts = { IRNode.ADD_VI, IRNode.VECTOR_SIZE_ANY, "> 0", + IRNode.RSHIFT_VI, IRNode.VECTOR_SIZE_ANY, "> 0", + IRNode.SUB_VI, IRNode.VECTOR_SIZE_ANY, "> 0" }, applyIfCPUFeatureOr = {"sse2", "true", "asimd", "true"}) void test_divc_n(int[] a0, int[] a1) { for (int i = 0; i < a0.length; i+=1) { ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17428#discussion_r1466392626 From qamai at openjdk.org Thu Jan 25 14:01:59 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Thu, 25 Jan 2024 14:01:59 GMT Subject: RFR: 8324433: Introduce a way to determine if an expression is evaluated as a constant by the Jit compiler [v7] In-Reply-To: References: Message-ID: > Hi, > > This patch introduces `JitCompiler::isConstantExpression` which can be used to statically determine whether an expression has been constant-folded by the Jit compiler, leading to more constant-folding opportunities. For example, it can be used in `MemorySessionImpl::checkValidStateRaw` to eliminate the lifetime check on global sessions without imposing additional branches on other non-global sessions. This is similar to `__builtin_constant_p` in GCC and clang. > > Please kindly give your opinion as well as your reviews, thanks very much. 
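The usage pattern discussed in this thread (a constant check guarding a fast path, with the general code as the fallback) might look roughly like the sketch below. `isConstantExpression` here is a local stand-in that always returns false, which is an assumption about how the proposed method would answer in the interpreter. It is not the real `JitCompiler` API, and the class and method names are invented for illustration.

```java
// Illustrative only: a stand-in for the proposed intrinsic, not the JDK implementation.
final class ConstantGuardSketch {
    // Assumed behaviour outside C2-compiled code: never report a constant.
    static boolean isConstantExpression(long val) {
        return false;
    }

    static long remainderUnsigned(long x, long d) {
        if (isConstantExpression(d) && d > 0 && (d & (d - 1)) == 0) {
            // Fast path: d is a power of two, so the remainder is a bit mask.
            return x & (d - 1);
        }
        // Slow path: always correct, taken whenever the guard is not folded to true.
        return Long.remainderUnsigned(x, d);
    }

    public static void main(String[] args) {
        System.out.println(remainderUnsigned(10, 4));
        System.out.println(remainderUnsigned(10, 3));
    }
}
```

Both branches have to compute the same result, because which one runs depends only on what the compiler happens to know about `d` at that call site.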
Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: change expr to val, add examples ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17527/files - new: https://git.openjdk.org/jdk/pull/17527/files/b4445e2e..84f9f7eb Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17527&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17527&range=05-06 Stats: 51 lines in 1 file changed: 28 ins; 4 del; 19 mod Patch: https://git.openjdk.org/jdk/pull/17527.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17527/head:pull/17527 PR: https://git.openjdk.org/jdk/pull/17527 From qamai at openjdk.org Thu Jan 25 14:02:00 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Thu, 25 Jan 2024 14:02:00 GMT Subject: RFR: 8324433: Introduce a way to determine if an expression is evaluated as a constant by the Jit compiler [v6] In-Reply-To: References: <9SikKzxs8M1JdTLvTB6JTozvpCw2CSziF2koHw0ELAQ=.fb2e7696-6fa6-4a2b-87b8-ec57d4fef05c@github.com> Message-ID: On Thu, 25 Jan 2024 12:52:21 GMT, Maurizio Cimadamore wrote: >>> > Naive question: the right way to use this would be almost invariably be like this: >>> > ``` >>> > if (isCompileConstant(foo) && fooHasCertainStaticProperties(foo)) { >>> > // fast-path >>> > } >>> > // slow path >>> > ``` >>> > >>> > >>> > >>> > >>> > >>> > >>> > >>> > >>> > >>> > >>> > >>> > Right? >>> >>> Yes, I think so. >> >> But then whatever is in the fast path and `fooHasCertainStaticProperties` are never profiled because never executed by the interpreter or c1. So `fooHasCertainStaticProperties` will likely not be inlined and c2 will do a poor (or rather not as good as you'd like) job of compiling whatever is in the fast path. 
> >> > > Naive question: the right way to use this would be almost invariably be like this: >> > > ``` >> > > if (isCompileConstant(foo) && fooHasCertainStaticProperties(foo)) { >> > > // fast-path >> > > } >> > > // slow path >> > > ``` >> > > >> > > >> > > >> > > >> > > >> > > >> > > >> > > >> > > >> > > >> > > >> > > Right? >> > >> > >> > Yes, I think so. >> >> But then whatever is in the fast path and `fooHasCertainStaticProperties` are never profiled because never executed by the interpreter or c1. So `fooHasCertainStaticProperties` will likely not be inlined and c2 will do a poor (or rather not as good as you'd like) job of compiling whatever is in the fast path. > > I suppose perhaps it is implied that `fooHasCertainStaticProperties` should have `@ForceInline` ? But yes, there seems to be several assumptions in how this logic is supposed to be used, and at the moment, it seems to me more of a footgun than something actually useful (but I admit my ignorance on the subject). @mcimadamore Yes this is hard to use apart from the simple cases. Considering we have already used this technique in the `MethodHandle` implementation, I think there are valid use cases. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/17527#issuecomment-1910271891 From qamai at openjdk.org Thu Jan 25 14:02:01 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Thu, 25 Jan 2024 14:02:01 GMT Subject: RFR: 8324433: Introduce a way to determine if an expression is evaluated as a constant by the Jit compiler [v5] In-Reply-To: References: <_pAlUJwzkoFkCnQW_IQK-zUkUMMjq6KjZoDldS34CyA=.984549da-d6de-4977-a87f-18a33d58824d@github.com> <5AWq0nDx_AQPwnEp1cMisZ6ytn2ieq9FHDwDQp5A4QQ=.5043ac3e-04bf-4fd8-a680-448f392e5cb1@github.com> <9iDFu8I4w_i1Uso5q7oEi0Le1JvgDNgNyuSZlmKQiuE=.5739d448-fc73-4bcf-bec8-26b3a1b75d21@github.com> <-msFouQp2kpWPf6LTKgbDAeLPUkfET6wVesLbAz-6T4=.54ca377c-2e49-4229-a060-daa34485eead@github.com> Message-ID: On Thu, 25 Jan 2024 05:06:12 GMT, David Holmes wrote: >> I agree. All values are produced by evaluating expressions. In this case we want to query whether a value produced by the compiler evaluating its expression is a constant value (inputs to the expression are constants and the expression had no material side-effects). Meaning if the method returns true then we could use that knowledge in subsequent expressions that may also produce constants or some specific behavior. > >> the method compilation has the expression in its original form > > So the JIT analyses the bytecode used to place the result on the call stack, before the call, and from that determines if the expression were a constant? This kind of self-analysis is not something I was aware of. I see, changed `expr` to `val`. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17527#discussion_r1466418470 From rcastanedalo at openjdk.org Thu Jan 25 14:04:38 2024 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Thu, 25 Jan 2024 14:04:38 GMT Subject: RFR: 8291809: Convert compiler/c2/cr7200264/TestSSE2IntVect.java to IR verification test [v2] In-Reply-To: <1hgiSYOX1vJdXoKJyMqd-YfHaDqaG5qaJI8n2ZwmF08=.16fdb641-1ace-484a-a3f3-2438e9fd4cfa@github.com> References: <1hgiSYOX1vJdXoKJyMqd-YfHaDqaG5qaJI8n2ZwmF08=.16fdb641-1ace-484a-a3f3-2438e9fd4cfa@github.com> Message-ID: On Thu, 25 Jan 2024 13:38:20 GMT, Daniel Lund?n wrote: >> I just checked in my machine (on top of commit fb822e49f2a84423c8fd17db2e95bbdd5e7ec191) and these division tests do seem to vectorize, this is e.g. the innermost loop in `test_divc` right before code emission: >> >> ![test_divc](https://github.com/openjdk/jdk/assets/8792647/129d51c2-a1ad-4d02-ab81-02cd849af36f) >> >> Here are my processor features in case it helps (subset of `lscpu` output): >> >> >> Architecture: x86_64 >> CPU op-mode(s): 32-bit, 64-bit >> Address sizes: 39 bits physical, 48 bits virtual >> Byte Order: Little Endian >> CPU(s): 12 >> On-line CPU(s) list: 0-11 >> Vendor ID: GenuineIntel >> Model name: Intel(R) Core(TM) i7-9850H CPU @ 2.60GHz >> CPU family: 6 >> Model: 158 >> Thread(s) per core: 2 >> Core(s) per socket: 6 >> (...) 
>> Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mc >> a cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss >> ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art >> arch_perfmon pebs bts rep_good nopl xtopology nonstop_ >> tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cp >> l vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid ss >> e4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes >> xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_f >> ault epb invpcid_single ssbd ibrs ibpb stibp ibrs_enhan >> ced tpr_shadow flexpriority ept vpid ept_ad fsgsbase ts >> c_adjust bmi1 avx2 smep bmi2 erms invpcid mpx rdseed ad >> x smap clflushopt intel_pt xsaveopt xsavec xgetbv1 xsav >> es dtherm ida arat pln pts hwp hwp_notify hwp_act_windo >> w hwp_epp vnmi md_clear flush_l1d arch_capabilities >> (...) > > Thanks for the clarification @robcasloz and @chhagedorn. I've investigated now, and they do vectorize on my machine as well. I was confused because, before the change below, the IR framework did not register the nodes (wrong vector size of 4 instead of the default of 8). Is that expected, and should we specify something else instead of the catch-all `IRNode.VECTOR_SIZE_ANY`? 
> > > @Test > - @IR(counts = { IRNode.ADD_VI, "> 0", > - IRNode.RSHIFT_VI, "> 0", > - IRNode.SUB_VI, "> 0" }, > + @IR(counts = { IRNode.ADD_VI, IRNode.VECTOR_SIZE_ANY, "> 0", > + IRNode.RSHIFT_VI, IRNode.VECTOR_SIZE_ANY, "> 0", > + IRNode.SUB_VI, IRNode.VECTOR_SIZE_ANY, "> 0" }, > applyIfCPUFeatureOr = {"sse2", "true", "asimd", "true"}) > void test_divc(int[] a0, int[] a1) { > for (int i = 0; i < a0.length; i+=1) { > @@ -519,9 +519,9 @@ void test_divc(int[] a0, int[] a1) { > } > > @Test > - @IR(counts = { IRNode.ADD_VI, "> 0", > - IRNode.RSHIFT_VI, "> 0", > - IRNode.SUB_VI, "> 0" }, > + @IR(counts = { IRNode.ADD_VI, IRNode.VECTOR_SIZE_ANY, "> 0", > + IRNode.RSHIFT_VI, IRNode.VECTOR_SIZE_ANY, "> 0", > + IRNode.SUB_VI, IRNode.VECTOR_SIZE_ANY, "> 0" }, > applyIfCPUFeatureOr = {"sse2", "true", "asimd", "true"}) > void test_divc_n(int[] a0, int[] a1) { > for (int i = 0; i < a0.length; i+=1) { In my opinion, using `IRNode.VECTOR_SIZE_ANY` as you propose is a reasonable trade-off, we just want to check that some vectorization is performed, and do not want to over-specify behavior that may be affected by subtle platform details. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17428#discussion_r1466422340 From epeter at openjdk.org Thu Jan 25 14:07:36 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 25 Jan 2024 14:07:36 GMT Subject: RFR: 8291809: Convert compiler/c2/cr7200264/TestSSE2IntVect.java to IR verification test [v2] In-Reply-To: <1hgiSYOX1vJdXoKJyMqd-YfHaDqaG5qaJI8n2ZwmF08=.16fdb641-1ace-484a-a3f3-2438e9fd4cfa@github.com> References: <1hgiSYOX1vJdXoKJyMqd-YfHaDqaG5qaJI8n2ZwmF08=.16fdb641-1ace-484a-a3f3-2438e9fd4cfa@github.com> Message-ID: On Thu, 25 Jan 2024 13:38:20 GMT, Daniel Lund?n wrote: >> I just checked in my machine (on top of commit fb822e49f2a84423c8fd17db2e95bbdd5e7ec191) and these division tests do seem to vectorize, this is e.g. 
the innermost loop in `test_divc` right before code emission: >> >> ![test_divc](https://github.com/openjdk/jdk/assets/8792647/129d51c2-a1ad-4d02-ab81-02cd849af36f) >> >> Here are my processor features in case it helps (subset of `lscpu` output): >> >> >> Architecture: x86_64 >> CPU op-mode(s): 32-bit, 64-bit >> Address sizes: 39 bits physical, 48 bits virtual >> Byte Order: Little Endian >> CPU(s): 12 >> On-line CPU(s) list: 0-11 >> Vendor ID: GenuineIntel >> Model name: Intel(R) Core(TM) i7-9850H CPU @ 2.60GHz >> CPU family: 6 >> Model: 158 >> Thread(s) per core: 2 >> Core(s) per socket: 6 >> (...) >> Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mc >> a cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss >> ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art >> arch_perfmon pebs bts rep_good nopl xtopology nonstop_ >> tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cp >> l vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid ss >> e4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes >> xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_f >> ault epb invpcid_single ssbd ibrs ibpb stibp ibrs_enhan >> ced tpr_shadow flexpriority ept vpid ept_ad fsgsbase ts >> c_adjust bmi1 avx2 smep bmi2 erms invpcid mpx rdseed ad >> x smap clflushopt intel_pt xsaveopt xsavec xgetbv1 xsav >> es dtherm ida arat pln pts hwp hwp_notify hwp_act_windo >> w hwp_epp vnmi md_clear flush_l1d arch_capabilities >> (...) > > Thanks for the clarification @robcasloz and @chhagedorn. I've investigated now, and they do vectorize on my machine as well. I was confused because, before the change below, the IR framework did not register the nodes (wrong vector size of 4 instead of the default of 8). Is that expected, and should we specify something else instead of the catch-all `IRNode.VECTOR_SIZE_ANY`? 
> > > @Test > - @IR(counts = { IRNode.ADD_VI, "> 0", > - IRNode.RSHIFT_VI, "> 0", > - IRNode.SUB_VI, "> 0" }, > + @IR(counts = { IRNode.ADD_VI, IRNode.VECTOR_SIZE_ANY, "> 0", > + IRNode.RSHIFT_VI, IRNode.VECTOR_SIZE_ANY, "> 0", > + IRNode.SUB_VI, IRNode.VECTOR_SIZE_ANY, "> 0" }, > applyIfCPUFeatureOr = {"sse2", "true", "asimd", "true"}) > void test_divc(int[] a0, int[] a1) { > for (int i = 0; i < a0.length; i+=1) { > @@ -519,9 +519,9 @@ void test_divc(int[] a0, int[] a1) { > } > > @Test > - @IR(counts = { IRNode.ADD_VI, "> 0", > - IRNode.RSHIFT_VI, "> 0", > - IRNode.SUB_VI, "> 0" }, > + @IR(counts = { IRNode.ADD_VI, IRNode.VECTOR_SIZE_ANY, "> 0", > + IRNode.RSHIFT_VI, IRNode.VECTOR_SIZE_ANY, "> 0", > + IRNode.SUB_VI, IRNode.VECTOR_SIZE_ANY, "> 0" }, > applyIfCPUFeatureOr = {"sse2", "true", "asimd", "true"}) > void test_divc_n(int[] a0, int[] a1) { > for (int i = 0; i < a0.length; i+=1) { @dlunde do you understand what factors determine the length of the vector? Why is the default of `IRNode.VECTOR_SIZE_MAX` not working? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17428#discussion_r1466426158 From rcastanedalo at openjdk.org Thu Jan 25 14:19:38 2024 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Thu, 25 Jan 2024 14:19:38 GMT Subject: RFR: 8291809: Convert compiler/c2/cr7200264/TestSSE2IntVect.java to IR verification test [v2] In-Reply-To: <1hgiSYOX1vJdXoKJyMqd-YfHaDqaG5qaJI8n2ZwmF08=.16fdb641-1ace-484a-a3f3-2438e9fd4cfa@github.com> References: <1hgiSYOX1vJdXoKJyMqd-YfHaDqaG5qaJI8n2ZwmF08=.16fdb641-1ace-484a-a3f3-2438e9fd4cfa@github.com> Message-ID: On Thu, 25 Jan 2024 13:38:20 GMT, Daniel Lund?n wrote: >> I just checked in my machine (on top of commit fb822e49f2a84423c8fd17db2e95bbdd5e7ec191) and these division tests do seem to vectorize, this is e.g. 
the innermost loop in `test_divc` right before code emission: >> >> ![test_divc](https://github.com/openjdk/jdk/assets/8792647/129d51c2-a1ad-4d02-ab81-02cd849af36f) >> >> Here are my processor features in case it helps (subset of `lscpu` output): >> >> >> Architecture: x86_64 >> CPU op-mode(s): 32-bit, 64-bit >> Address sizes: 39 bits physical, 48 bits virtual >> Byte Order: Little Endian >> CPU(s): 12 >> On-line CPU(s) list: 0-11 >> Vendor ID: GenuineIntel >> Model name: Intel(R) Core(TM) i7-9850H CPU @ 2.60GHz >> CPU family: 6 >> Model: 158 >> Thread(s) per core: 2 >> Core(s) per socket: 6 >> (...) >> Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mc >> a cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss >> ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art >> arch_perfmon pebs bts rep_good nopl xtopology nonstop_ >> tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cp >> l vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid ss >> e4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes >> xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_f >> ault epb invpcid_single ssbd ibrs ibpb stibp ibrs_enhan >> ced tpr_shadow flexpriority ept vpid ept_ad fsgsbase ts >> c_adjust bmi1 avx2 smep bmi2 erms invpcid mpx rdseed ad >> x smap clflushopt intel_pt xsaveopt xsavec xgetbv1 xsav >> es dtherm ida arat pln pts hwp hwp_notify hwp_act_windo >> w hwp_epp vnmi md_clear flush_l1d arch_capabilities >> (...) > > Thanks for the clarification @robcasloz and @chhagedorn. I've investigated now, and they do vectorize on my machine as well. I was confused because, before the change below, the IR framework did not register the nodes (wrong vector size of 4 instead of the default of 8). Is that expected, and should we specify something else instead of the catch-all `IRNode.VECTOR_SIZE_ANY`? 
> > > @Test > - @IR(counts = { IRNode.ADD_VI, "> 0", > - IRNode.RSHIFT_VI, "> 0", > - IRNode.SUB_VI, "> 0" }, > + @IR(counts = { IRNode.ADD_VI, IRNode.VECTOR_SIZE_ANY, "> 0", > + IRNode.RSHIFT_VI, IRNode.VECTOR_SIZE_ANY, "> 0", > + IRNode.SUB_VI, IRNode.VECTOR_SIZE_ANY, "> 0" }, > applyIfCPUFeatureOr = {"sse2", "true", "asimd", "true"}) > void test_divc(int[] a0, int[] a1) { > for (int i = 0; i < a0.length; i+=1) { > @@ -519,9 +519,9 @@ void test_divc(int[] a0, int[] a1) { > } > > @Test > - @IR(counts = { IRNode.ADD_VI, "> 0", > - IRNode.RSHIFT_VI, "> 0", > - IRNode.SUB_VI, "> 0" }, > + @IR(counts = { IRNode.ADD_VI, IRNode.VECTOR_SIZE_ANY, "> 0", > + IRNode.RSHIFT_VI, IRNode.VECTOR_SIZE_ANY, "> 0", > + IRNode.SUB_VI, IRNode.VECTOR_SIZE_ANY, "> 0" }, > applyIfCPUFeatureOr = {"sse2", "true", "asimd", "true"}) > void test_divc_n(int[] a0, int[] a1) { > for (int i = 0; i < a0.length; i+=1) { > @dlunde do you understand what factors determine the length of the vector? Why is the default of IRNode.VECTOR_SIZE_MAX not working? Perhaps C2 hits the loop unrolling limit? @dlunde you can test this by trying out a large value for `-XX:LoopUnrollLimit`. But even if this turned out to be the case, I would still suggest using `IRNode.VECTOR_SIZE_ANY` rather than forcing a higher loop unroll limit value for the tests. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17428#discussion_r1466441530 From roland at openjdk.org Thu Jan 25 14:24:44 2024 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 25 Jan 2024 14:24:44 GMT Subject: RFR: 8320649: C2: Optimize scoped values [v5] In-Reply-To: References: Message-ID: <7JL2c1nMOZLit9ZmOPyLepIWDWeWHq85NFlDRO3LFv0=.0c3bbfa6-5ea5-4706-bea6-e74844682b87@github.com> > This change implements C2 optimizations for calls to > ScopedValue.get(). Indeed, in: > > > v1 = scopedValue.get(); > ... > v2 = scopedValue.get(); > > > `v2` can be replaced by `v1` and the second call to `get()` can be > optimized out. 
That's true whatever is between the 2 calls unless a
> new mapping for `scopedValue` is created in between (when that happens
> no optimization is performed for the method being compiled). Hoisting
> a `get()` call out of a loop for a loop-invariant `scopedValue` should
> also be legal in most cases.
>
> `ScopedValue.get()` is implemented in Java code as a 2-step process. A
> cache is attached to the current thread object. If the `ScopedValue`
> object is in the cache then the result from `get()` is read from
> there. Otherwise a slow call is performed that also inserts the
> mapping in the cache. The cache itself is lazily allocated. One
> `ScopedValue` can be hashed to 2 different indexes in the cache. On a
> cache probe, both indexes are checked. As a consequence, the process
> of probing the cache is a multi-step process (check if the cache is
> present, check first index, check second index if first index
> failed). If the cache is populated early on, then when the method that
> calls `ScopedValue.get()` is compiled, profile reports the slow path
> as never taken and only the read from the cache is compiled.
>
> To perform the optimizations, I added 3 new node types to C2:
>
> - the pair
>   ScopedValueGetHitsInCacheNode/ScopedValueGetLoadFromCacheNode for
>   the cache probe
>
> - a cfg node ScopedValueGetResultNode to help locate the result of the
>   `get()` call in the IR graph.
>
> In pseudo code, once the nodes are inserted, the code of a `get()` is:
>
>
>   hits_in_the_cache = ScopedValueGetHitsInCache(scopedValue)
>   if (hits_in_the_cache) {
>     res = ScopedValueGetLoadFromCache(hits_in_the_cache);
>   } else {
>     res = ..; // slow call, possibly inlined. Subgraph can be arbitrarily complex
>   }
>   res = ScopedValueGetResult(res)
>
>
> In the snippet:
>
>
>   v1 = scopedValue.get();
>   ...
>   v2 = scopedValue.get();
>
>
> Replacing `v2` by `v1` is then done by starting from the
> `ScopedValueGetResult` node for the second `get()` and looking for a
> dominating `ScopedValueGetResult` for the same `ScopedValue`
> object. When one is found, it is used as a replacement. Eliminating
> the second `get()` call is achieved by making
> `ScopedValueGetHitsInCache` always successful if there's a dominating
> `ScopedValueGetResult` and replacing its companion
> `ScopedValueGetLoadFromCache` by the dominating
> `ScopedValueGetResult`.
>
> Hoisting a `g...

Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision:

  Update test/hotspot/jtreg/compiler/c2/irTests/TestScopedValue.java

  Co-authored-by: Andrey Turbanov

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/16966/files
  - new: https://git.openjdk.org/jdk/pull/16966/files/28fa7f74..6e797117

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=16966&range=04
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16966&range=03-04

  Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/16966.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/16966/head:pull/16966

PR: https://git.openjdk.org/jdk/pull/16966

From epeter at openjdk.org  Thu Jan 25 14:33:38 2024
From: epeter at openjdk.org (Emanuel Peter)
Date: Thu, 25 Jan 2024 14:33:38 GMT
Subject: RFR: 8291809: Convert compiler/c2/cr7200264/TestSSE2IntVect.java to IR verification test [v2]
In-Reply-To:
References: <1hgiSYOX1vJdXoKJyMqd-YfHaDqaG5qaJI8n2ZwmF08=.16fdb641-1ace-484a-a3f3-2438e9fd4cfa@github.com>
Message-ID:

On Thu, 25 Jan 2024 14:16:50 GMT, Roberto Castañeda Lozano wrote:

>> Thanks for the clarification @robcasloz and @chhagedorn. I've investigated now, and they do vectorize on my machine as well. I was confused because, before the change below, the IR framework did not register the nodes (wrong vector size of 4 instead of the default of 8).
Is that expected, and should we specify something else instead of the catch-all `IRNode.VECTOR_SIZE_ANY`?
>>
>>
>>      @Test
>> -    @IR(counts = { IRNode.ADD_VI, "> 0",
>> -                   IRNode.RSHIFT_VI, "> 0",
>> -                   IRNode.SUB_VI, "> 0" },
>> +    @IR(counts = { IRNode.ADD_VI, IRNode.VECTOR_SIZE_ANY, "> 0",
>> +                   IRNode.RSHIFT_VI, IRNode.VECTOR_SIZE_ANY, "> 0",
>> +                   IRNode.SUB_VI, IRNode.VECTOR_SIZE_ANY, "> 0" },
>>          applyIfCPUFeatureOr = {"sse2", "true", "asimd", "true"})
>>      void test_divc(int[] a0, int[] a1) {
>>          for (int i = 0; i < a0.length; i+=1) {
>> @@ -519,9 +519,9 @@ void test_divc(int[] a0, int[] a1) {
>>      }
>>
>>      @Test
>> -    @IR(counts = { IRNode.ADD_VI, "> 0",
>> -                   IRNode.RSHIFT_VI, "> 0",
>> -                   IRNode.SUB_VI, "> 0" },
>> +    @IR(counts = { IRNode.ADD_VI, IRNode.VECTOR_SIZE_ANY, "> 0",
>> +                   IRNode.RSHIFT_VI, IRNode.VECTOR_SIZE_ANY, "> 0",
>> +                   IRNode.SUB_VI, IRNode.VECTOR_SIZE_ANY, "> 0" },
>>          applyIfCPUFeatureOr = {"sse2", "true", "asimd", "true"})
>>      void test_divc_n(int[] a0, int[] a1) {
>>          for (int i = 0; i < a0.length; i+=1) {
>
>> @dlunde do you understand what factors determine the length of the vector? Why is the default of IRNode.VECTOR_SIZE_MAX not working?
>
> Perhaps C2 hits the loop unrolling limit? @dlunde you can test this by trying out a large value for `-XX:LoopUnrollLimit`. But even if this turned out to be the case, I would still suggest using `IRNode.VECTOR_SIZE_ANY` rather than forcing a higher loop unroll limit value for the tests.

Ah, I see what the issue is here: the loop does not just contain `int` vectors but also `long` vectors.
Specifically, I see `VectorCastI2X` and `VectorCastL2X` nodes in the loop, which convert `int` to/from `long`.
Hence, if you have a `32 byte` vector, you can only have `4 long`, and so the loop-unrolling is limited to 4x.
And then you only see `4 int` vectors, when you were expecting `8 int` vectors.
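
To make the lane arithmetic above concrete, here is a minimal sketch in plain Java. It does not use the IR framework; the class `VectorLanes` and the helpers `lanes` and `commonLanes` are hypothetical names introduced only for illustration, and the 32-byte register width is an assumption that matches the example above.

```java
public class VectorLanes {
    // Number of elements of the given size that fit in one vector register.
    static int lanes(int vectorBytes, int elementBytes) {
        return vectorBytes / elementBytes;
    }

    // When a loop mixes element types, every vector in the loop is limited
    // to the lane count of the widest element type, i.e. the minimum.
    static int commonLanes(int vectorBytes, int... elementBytes) {
        int min = Integer.MAX_VALUE;
        for (int size : elementBytes) {
            min = Math.min(min, lanes(vectorBytes, size));
        }
        return min;
    }

    public static void main(String[] args) {
        int vectorBytes = 32; // assumption: a 32-byte (256-bit) vector register
        System.out.println("int lanes:    " + lanes(vectorBytes, Integer.BYTES));
        System.out.println("long lanes:   " + lanes(vectorBytes, Long.BYTES));
        System.out.println("common lanes: "
                + commonLanes(vectorBytes, Integer.BYTES, Long.BYTES));
    }
}
```

For a 32-byte vector this gives 8 `int` lanes and 4 `long` lanes, so a loop containing both ends up with 4 lanes for every vector, including the `int` ones.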

You will probably be able to fix the issue with this:
`IRNode.VECTOR_SIZE + "min(max_int, max_long)"`

For more examples, check out:
`grep "IRNode.VECTOR_SIZE +" test/hotspot/jtreg/ -r`

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/17428#discussion_r1466460916

From epeter at openjdk.org  Thu Jan 25 14:37:38 2024
From: epeter at openjdk.org (Emanuel Peter)
Date: Thu, 25 Jan 2024 14:37:38 GMT
Subject: RFR: 8291809: Convert compiler/c2/cr7200264/TestSSE2IntVect.java to IR verification test [v2]
In-Reply-To:
References:
Message-ID:

On Wed, 24 Jan 2024 13:03:15 GMT, Daniel Lundén wrote:

>> Well, I think at least some of the `shift` examples should also vectorize:
>> `./java -XX:CompileCommand=compileonly,Test::test1 -XX:+TraceSuperWord -XX:+TraceLoopOpts -XX:+TraceNewVectors -XX:UseAVX=2 Test.java`
>>
>> Not sure if for all SSE and AVX levels, but all that I quickly checked with the UseSSE and UseAVX flags.
>>
>>
>> TraceNewVectors [SuperWord]:  832  LoadVector  === 347 766 740  [[ 738 734 731 727 619 616 518 136 ]]  @int[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=5; mismatched #vectory[8]:{int} !orig=[739],[620],[519],[135] !jvms: Test::test2 @ bci:12 (line 21)
>> TraceNewVectors [SuperWord]:  836  LShiftVI  === _ 832 835  [[ 736 733 730 725 618 615 516 157 ]]  #vectory[8]:{int} !orig=[738],[619],[518],[136] !jvms: Test::test2 @ bci:14 (line 21)
>> TraceNewVectors [SuperWord]:  837  StoreVector  === 763 766 737 836  [[ 341 766 160 339 ]]  @int[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=5; mismatched  Memory: @int[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact+any *, idx=5; !orig=[736],[618],[516],[157],535 !jvms: Test::test2 @ bci:15 (line 21)
>>
>>
>> Test.java:
>>
>> public class Test {
>>     static int RANGE = 10_000;
>>
>>     public static void main(String[] args) {
>>         int[] a = new int[RANGE];
>>         int[] b = new int[RANGE];
>>         for (int i = 0; i < 10_000; i++) {
>>             test1(a, b);
>>             test2(a, b, i % 200 - 100);
>>         }
>>     }
>>
>>     static void test1(int[] a, int[] b) {
>>         for (int i = 0; i < a.length; i++) {
>>             a[i] = (int)(b[i] << 32);
>>         }
>>     }
>>
>>     static void test2(int[] a, int[] b, int s) {
>>         for (int i = 0; i < a.length; i++) {
>>             a[i] = (int)(b[i] << s);
>>         }
>>     }
>> }
>>
>>
>> I also found this test in `test/hotspot/jtreg/compiler/vectorization/runner/BasicIntOpTest.java`:
>>
>>     @Test
>>     @IR(applyIfCPUFeatureOr = {"asimd", "true", "sse2", "true"},
>>         counts = {IRNode.LSHIFT_VI, ">0"})
>>     public int[] vectorShiftLeft() {
>>         int[] res = new int[SIZE];
>>         for (int i = 0; i < SIZE; i++) {
>>             res[i] = a[i] << 3;
>>         }
>>         return res;
>>     }
>>
>>
>> Plus, I see `test.addExpectedVectorization("LShiftVI", 5);` in `test/hotspot/jtreg/compiler/c2/cr7200264/TestSSE2IntVect.java`, which you now deleted.
>>
>> @dlunde would you mind investigating a bit more if you can add some IR rules for all (or at least...
>
> Thanks @eme64. I've addressed all comments now; please have a look again.

@dlunde Given the findings here:
https://github.com/openjdk/jdk/pull/17428#discussion_r1466460916
I think you should add an IR rule on every test. And for the ones that do not currently vectorize, please add a negative IR rule, so that we can detect when that changes. For example, when we implement a feature, we can then properly fix up the IR rule.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/17428#issuecomment-1910334886

From roland at openjdk.org  Thu Jan 25 14:44:41 2024
From: roland at openjdk.org (Roland Westrelin)
Date: Thu, 25 Jan 2024 14:44:41 GMT
Subject: RFR: 8320649: C2: Optimize scoped values [v4]
In-Reply-To:
References:
Message-ID:

On Tue, 16 Jan 2024 18:35:37 GMT, Emanuel Peter wrote:

> Personal wishlist: can you add a case where this optimization enables vectorization? Or do your optimizations happen too late for that?

In its simplest form, a ScopedValue get will have a test (to check it's indeed in the cache) and a load so I don't think vectorization is possible. > test/hotspot/jtreg/compiler/c2/irTests/TestScopedValue.java line 169: > >> 167: // MyLong long2 = (MyLong)scopedValue.get(); >> 168: // return long1.getValue() + long2.getValue(); >> 169: // } > > Are you still working on this? No. I couldn't make the test work unfortunately, so I wasn't sure whether to leave the test commented out (in case someone revisits that later) or not. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16966#issuecomment-1910338270 PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1466472279 From epeter at openjdk.org Thu Jan 25 14:44:42 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 25 Jan 2024 14:44:42 GMT Subject: RFR: 8320649: C2: Optimize scoped values [v4] In-Reply-To: References: Message-ID: On Thu, 25 Jan 2024 14:36:26 GMT, Roland Westrelin wrote: > In its simplest form, a ScopedValue get will have a test (to check it's indeed in the cache) and a load so I don't think vectorization is possible. So you think this could not be predicated and hoisted out of the loop? That would also be a sad limitation ? >> test/hotspot/jtreg/compiler/c2/irTests/TestScopedValue.java line 169: >> >>> 167: // MyLong long2 = (MyLong)scopedValue.get(); >>> 168: // return long1.getValue() + long2.getValue(); >>> 169: // } >> >> Are you still working on this? > > No. I couldn't make the test work unfortunately, so I wasn't sure whether to leave the test commented out (in case someone revisits that later) or not. Maybe have the body of the test put in, and the IR-rules commented out, with a follow-up RFE for investigation, if you think there is something one can do about it? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/16966#issuecomment-1910341086 PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1466476164 From duke at openjdk.org Thu Jan 25 14:47:47 2024 From: duke at openjdk.org (Yuri Gaevsky) Date: Thu, 25 Jan 2024 14:47:47 GMT Subject: RFR: 8322174: RISC-V: C2 VectorizedHashCode RVV Version [v2] In-Reply-To: References: Message-ID: > The patch adds possibility to use RVV instructions for faster vectorizedHashCode calculations on RVV v1.0.0 capable hardware. > > Testing: hotspot/jtreg/compiler/ under QEMU-8.1 with RVV v1.0.0. Yuri Gaevsky has updated the pull request incrementally with two additional commits since the last revision: - num_8b_elems_in_vec --> nof_vec_elems - Removed checks for (MaxVectorSize >= 16) per @RealFYang suggestion. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17413/files - new: https://git.openjdk.org/jdk/pull/17413/files/1d6bc62c..7ed3d86e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17413&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17413&range=00-01 Stats: 6 lines in 3 files changed: 0 ins; 0 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/17413.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17413/head:pull/17413 PR: https://git.openjdk.org/jdk/pull/17413 From duke at openjdk.org Thu Jan 25 14:47:50 2024 From: duke at openjdk.org (Yuri Gaevsky) Date: Thu, 25 Jan 2024 14:47:50 GMT Subject: RFR: 8322174: RISC-V: C2 VectorizedHashCode RVV Version [v2] In-Reply-To: References: Message-ID: On Wed, 17 Jan 2024 07:28:48 GMT, Fei Yang wrote: >> Yuri Gaevsky has updated the pull request incrementally with two additional commits since the last revision: >> >> - num_8b_elems_in_vec --> nof_vec_elems >> - Removed checks for (MaxVectorSize >= 16) per @RealFYang suggestion. 
>
> src/hotspot/cpu/riscv/riscv_v.ad line 2681:
>
>> 2679:                          iRegLNoSp tmp4, iRegLNoSp tmp5, iRegLNoSp tmp6, rFlagsReg cr)
>> 2680: %{
>> 2681:   predicate(UseRVV && (MaxVectorSize >= 16));
>
> Similar here: `MaxVectorSize >= 16` condition is already checked and ensured on JVM startup.

Fixed.

> src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 5266:
>
>> 5264:     }
>> 5265:
>> 5266:     if (UseVectorizedHashCodeIntrinsic && UseRVV && (MaxVectorSize >= 16)) {
>
> I think `MaxVectorSize >= 16` condition is already checked and ensured on JVM startup when RVV extension is available.

Fixed.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/17413#discussion_r1466480812
PR Review Comment: https://git.openjdk.org/jdk/pull/17413#discussion_r1466480645

From roland at openjdk.org  Thu Jan 25 14:48:06 2024
From: roland at openjdk.org (Roland Westrelin)
Date: Thu, 25 Jan 2024 14:48:06 GMT
Subject: RFR: 8320649: C2: Optimize scoped values [v6]
In-Reply-To:
References:
Message-ID: <57L2s5zBxDqLiYHAPtgHZH0wBulUkcN0YNrR_kndj7k=.6badbd56-6423-4e9c-b808-93b96a2adbce@github.com>

> This change implements C2 optimizations for calls to
> ScopedValue.get(). Indeed, in:
>
>
>   v1 = scopedValue.get();
>   ...
>   v2 = scopedValue.get();
>
>
> `v2` can be replaced by `v1` and the second call to `get()` can be
> optimized out. That's true whatever is between the 2 calls unless a
> new mapping for `scopedValue` is created in between (when that happens
> no optimization is performed for the method being compiled). Hoisting
> a `get()` call out of a loop for a loop-invariant `scopedValue` should
> also be legal in most cases.
>
> `ScopedValue.get()` is implemented in Java code as a 2-step process. A
> cache is attached to the current thread object. If the `ScopedValue`
> object is in the cache then the result from `get()` is read from
> there. Otherwise a slow call is performed that also inserts the
> mapping in the cache. The cache itself is lazily allocated. One
> `ScopedValue` can be hashed to 2 different indexes in the cache. On a
> cache probe, both indexes are checked. As a consequence, the process
> of probing the cache is a multi-step process (check if the cache is
> present, check first index, check second index if first index
> failed). If the cache is populated early on, then when the method that
> calls `ScopedValue.get()` is compiled, profile reports the slow path
> as never taken and only the read from the cache is compiled.
>
> To perform the optimizations, I added 3 new node types to C2:
>
> - the pair
>   ScopedValueGetHitsInCacheNode/ScopedValueGetLoadFromCacheNode for
>   the cache probe
>
> - a cfg node ScopedValueGetResultNode to help locate the result of the
>   `get()` call in the IR graph.
>
> In pseudo code, once the nodes are inserted, the code of a `get()` is:
>
>
>   hits_in_the_cache = ScopedValueGetHitsInCache(scopedValue)
>   if (hits_in_the_cache) {
>     res = ScopedValueGetLoadFromCache(hits_in_the_cache);
>   } else {
>     res = ..; // slow call, possibly inlined. Subgraph can be arbitrarily complex
>   }
>   res = ScopedValueGetResult(res)
>
>
> In the snippet:
>
>
>   v1 = scopedValue.get();
>   ...
>   v2 = scopedValue.get();
>
>
> Replacing `v2` by `v1` is then done by starting from the
> `ScopedValueGetResult` node for the second `get()` and looking for a
> dominating `ScopedValueGetResult` for the same `ScopedValue`
> object. When one is found, it is used as a replacement. Eliminating
> the second `get()` call is achieved by making
> `ScopedValueGetHitsInCache` always successful if there's a dominating
> `ScopedValueGetResult` and replacing its companion
> `ScopedValueGetLoadFromCache` by the dominating
> `ScopedValueGetResult`.
>
> Hoisting a `g...

Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: Update src/hotspot/share/opto/callGenerator.cpp Co-authored-by: Emanuel Peter ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16966/files - new: https://git.openjdk.org/jdk/pull/16966/files/6e797117..a24f729f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16966&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16966&range=04-05 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/16966.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16966/head:pull/16966 PR: https://git.openjdk.org/jdk/pull/16966 From dnsimon at openjdk.org Thu Jan 25 14:48:44 2024 From: dnsimon at openjdk.org (Doug Simon) Date: Thu, 25 Jan 2024 14:48:44 GMT Subject: RFR: 8324717: Remove HotSpotJVMCICompilerFactory Message-ID: There has been no active use of `jdk.vm.ci.hotspot.HotSpotJVMCICompilerFactory.CompilationLevelAdjustment` since [JDK-8219403](https://bugs.openjdk.org/browse/JDK-8219403) effectively [disabled](https://github.com/openjdk/jdk/commit/61f35bf898d6a0f4e7b6e514821b40efd87396dc#diff-4d3a3b7e7e12e1d5b4cf3e4677d9e0de5e9df3bbf1bbfa0d8d43d12098d67dc4R302-R306) it. Since `HotSpotJVMCICompilerFactory` exists solely for `CompilationLevelAdjustment` related logic, this PR removes the whole `HotSpotJVMCICompilerFactory` class. 
------------- Commit messages: - remove unsupported HotSpotJVMCICompilerFactory class Changes: https://git.openjdk.org/jdk/pull/17570/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=17570&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8324717 Stats: 100 lines in 2 files changed: 0 ins; 100 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/17570.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17570/head:pull/17570 PR: https://git.openjdk.org/jdk/pull/17570 From mcimadamore at openjdk.org Thu Jan 25 14:51:36 2024 From: mcimadamore at openjdk.org (Maurizio Cimadamore) Date: Thu, 25 Jan 2024 14:51:36 GMT Subject: RFR: 8324433: Introduce a way to determine if an expression is evaluated as a constant by the Jit compiler [v6] In-Reply-To: References: <9SikKzxs8M1JdTLvTB6JTozvpCw2CSziF2koHw0ELAQ=.fb2e7696-6fa6-4a2b-87b8-ec57d4fef05c@github.com> Message-ID: On Thu, 25 Jan 2024 12:52:21 GMT, Maurizio Cimadamore wrote: >>> > Naive question: the right way to use this would be almost invariably be like this: >>> > ``` >>> > if (isCompileConstant(foo) && fooHasCertainStaticProperties(foo)) { >>> > // fast-path >>> > } >>> > // slow path >>> > ``` >>> > >>> > >>> > >>> > >>> > >>> > >>> > >>> > >>> > >>> > >>> > >>> > Right? >>> >>> Yes, I think so. >> >> But then whatever is in the fast path and `fooHasCertainStaticProperties` are never profiled because never executed by the interpreter or c1. So `fooHasCertainStaticProperties` will likely not be inlined and c2 will do a poor (or rather not as good as you'd like) job of compiling whatever is in the fast path. > >> > > Naive question: the right way to use this would be almost invariably be like this: >> > > ``` >> > > if (isCompileConstant(foo) && fooHasCertainStaticProperties(foo)) { >> > > // fast-path >> > > } >> > > // slow path >> > > ``` >> > > >> > > >> > > >> > > >> > > >> > > >> > > >> > > >> > > >> > > >> > > >> > > Right? >> > >> > >> > Yes, I think so. 
>> >> But then whatever is in the fast path and `fooHasCertainStaticProperties` are never profiled because never executed by the interpreter or c1. So `fooHasCertainStaticProperties` will likely not be inlined and c2 will do a poor (or rather not as good as you'd like) job of compiling whatever is in the fast path. > > I suppose perhaps it is implied that `fooHasCertainStaticProperties` should have `@ForceInline` ? But yes, there seems to be several assumptions in how this logic is supposed to be used, and at the moment, it seems to me more of a footgun than something actually useful (but I admit my ignorance on the subject). > @mcimadamore Yes this is hard to use apart from the simple cases. Considering we have already used this technique in the `MethodHandle` implementation, I think there are valid use cases. I don't 100% buy the `MethodHandleImpl` analogy. In that case the check is not simply used to save a branch, but to spare spinning of a completely new lambda form. That is a very heavy operation. What I'm trying to say is that I'm not too sure how robust of a mechanism this is in the context of micro(nano?)-optimizations (such as the one you are considering). 
------------- PR Comment: https://git.openjdk.org/jdk/pull/17527#issuecomment-1910359911 From dnsimon at openjdk.org Thu Jan 25 14:51:36 2024 From: dnsimon at openjdk.org (Doug Simon) Date: Thu, 25 Jan 2024 14:51:36 GMT Subject: RFR: 8324717: Remove HotSpotJVMCICompilerFactory In-Reply-To: References: Message-ID: <3fpmn387bb-nq8Ld5Ik7HK0CsEU4WmEMZm6noKT62rQ=.bcf2e742-d2b7-4edc-a495-80a0664f483a@github.com> On Thu, 25 Jan 2024 14:42:19 GMT, Doug Simon wrote: > There has been no active use of `jdk.vm.ci.hotspot.HotSpotJVMCICompilerFactory.CompilationLevelAdjustment` since [JDK-8219403](https://bugs.openjdk.org/browse/JDK-8219403) effectively [disabled](https://github.com/openjdk/jdk/commit/61f35bf898d6a0f4e7b6e514821b40efd87396dc#diff-4d3a3b7e7e12e1d5b4cf3e4677d9e0de5e9df3bbf1bbfa0d8d43d12098d67dc4R302-R306) it. Since `HotSpotJVMCICompilerFactory` exists solely for `CompilationLevelAdjustment` related logic, this PR removes the whole `HotSpotJVMCICompilerFactory` class. Graal PR to adopt this change: https://github.com/oracle/graal/pull/8252 ------------- PR Comment: https://git.openjdk.org/jdk/pull/17570#issuecomment-1910360359 From duke at openjdk.org Thu Jan 25 15:00:32 2024 From: duke at openjdk.org (Yuri Gaevsky) Date: Thu, 25 Jan 2024 15:00:32 GMT Subject: RFR: 8322174: RISC-V: C2 VectorizedHashCode RVV Version [v2] In-Reply-To: References: Message-ID: <_XgH4VuPRDJLnIXlkoNtfdDIvvujW-zf1o3UeYlCrn8=.cb21e8dd-6160-4245-a169-4db548776aec@github.com> On Wed, 17 Jan 2024 07:56:03 GMT, Fei Yang wrote: >> Yuri Gaevsky has updated the pull request incrementally with two additional commits since the last revision: >> >> - num_8b_elems_in_vec --> nof_vec_elems >> - Removed checks for (MaxVectorSize >= 16) per @RealFYang suggestion. 
>
> src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 1603:
>
>> 1601:   la(pows31, ExternalAddress(adr_pows31));
>> 1602:   mv(t1, num_8b_elems_in_vec);
>> 1603:   vsetvli(t0, t1, Assembler::e32, Assembler::m4);
>
> I wonder if the scalar code for handling `WIDE_TAIL` could be eliminated with RVV's design for stripmining approach [1]? Looks like the current code doesn't take advantage of this design as new vl returned by `vsetvli` is not checked and used.
>
> [1] https://github.com/riscv/riscv-v-spec/blob/master/v-spec.adoc#sec-vector-config
>
> One of the common approaches to handling a large number of elements is "stripmining" where each iteration of
> a loop handles some number of elements, and the iterations continue until all elements have been processed.
> The RISC-V vector specification provides direct, portable support for this approach. The application specifies the
> total number of elements to be processed (the application vector length or AVL) as a candidate value for vl, and
> the hardware responds via a general-purpose register with the (frequently smaller) number of elements that the
> hardware will handle per iteration (stored in vl), based on the microarchitectural implementation and the vtype
> setting. A straightforward loop structure, shown in [Example of stripmining and changes to SEW]
> (https://github.com/riscv/riscv-v-spec/blob/master/v-spec.adoc#example-stripmine-sew), depicts the ease with
> which the code keeps track of the remaining number of elements and the amount per iteration handled by hardware.

Thank you for your comments, @RealFYang.

I have tried to use vector instructions (m4 ==> m2) for the tail calculations, but that only makes the performance numbers worse. :-(

I've made additional measurements with more granularity:

                                                   [ -XX:-UseRVV ]    [ -XX:+UseRVV ]
ArraysHashCode.multiints      10  avgt   30    12.460 ± 0.155     13.836 ± 0.054  ns/op
ArraysHashCode.multiints      11  avgt   30    14.541 ± 0.140     14.613 ± 0.084  ns/op
ArraysHashCode.multiints      12  avgt   30    15.097 ± 0.052     15.517 ± 0.097  ns/op
ArraysHashCode.multiints      13  avgt   30    13.632 ± 0.137     14.486 ± 0.181  ns/op
ArraysHashCode.multiints      14  avgt   30    15.771 ± 0.108     16.153 ± 0.092  ns/op
ArraysHashCode.multiints      15  avgt   30    14.726 ± 0.088     15.930 ± 0.077  ns/op
ArraysHashCode.multiints      16  avgt   30    15.533 ± 0.067     15.496 ± 0.083  ns/op
ArraysHashCode.multiints      17  avgt   30    15.875 ± 0.173     16.878 ± 0.172  ns/op
ArraysHashCode.multiints      18  avgt   30    15.740 ± 0.114     16.465 ± 0.089  ns/op
ArraysHashCode.multiints      19  avgt   30    17.252 ± 0.051     17.628 ± 0.155  ns/op
ArraysHashCode.multiints      20  avgt   30    20.193 ± 0.282     19.039 ± 0.441  ns/op
ArraysHashCode.multiints      25  avgt   30    20.209 ± 0.070     20.513 ± 0.071  ns/op
ArraysHashCode.multiints      30  avgt   30    23.157 ± 0.068     23.290 ± 0.165  ns/op
ArraysHashCode.multiints      35  avgt   30    28.671 ± 0.116     26.198 ± 0.127  ns/op  <---
ArraysHashCode.multiints      40  avgt   30    30.992 ± 0.068     27.342 ± 0.072  ns/op
ArraysHashCode.multiints      45  avgt   30    39.408 ± 1.428     32.170 ± 0.230  ns/op
ArraysHashCode.multiints      50  avgt   30    41.976 ± 0.442     33.103 ± 0.090  ns/op
ArraysHashCode.multiints      55  avgt   30    45.379 ± 0.236     35.899 ± 0.692  ns/op
ArraysHashCode.multiints      60  avgt   30    48.615 ± 0.249     35.709 ± 0.477  ns/op
ArraysHashCode.multiints      65  avgt   30    51.455 ± 0.213     38.275 ± 0.266  ns/op
ArraysHashCode.multiints      70  avgt   30    54.032 ± 0.324     37.985 ± 0.264  ns/op
ArraysHashCode.multiints      75  avgt   30    56.759 ± 0.164     39.446 ± 0.425  ns/op
ArraysHashCode.multiints      80  avgt   30    61.334 ± 0.267     41.521 ± 0.310  ns/op
ArraysHashCode.multiints      85  avgt   30    66.177 ± 0.299     44.136 ± 0.407  ns/op
ArraysHashCode.multiints      90  avgt   30    67.444 ± 0.282     42.909 ± 0.275  ns/op
ArraysHashCode.multiints      95  avgt   30    77.312 ± 0.303     49.078 ± 1.166  ns/op
ArraysHashCode.multiints     100  avgt   30    78.405 ± 0.220     47.499 ± 0.553  ns/op
ArraysHashCode.multiints     105  avgt   30    75.706 ± 0.265     46.029 ± 0.579  ns/op

As you can see, the numbers become better with +UseRVV only after length >= 30, and perhaps that can explain why my attempt to improve the tail with RVV instructions was unsuccessful: the cost of setting up the Vector Unit for small lengths is too high. :-(

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/17413#discussion_r1466499576

From roland at openjdk.org  Thu Jan 25 15:02:37 2024
From: roland at openjdk.org (Roland Westrelin)
Date: Thu, 25 Jan 2024 15:02:37 GMT
Subject: RFR: 8320649: C2: Optimize scoped values [v4]
In-Reply-To:
References:
Message-ID:

On Thu, 25 Jan 2024 14:37:58 GMT, Emanuel Peter wrote:

> > In its simplest form, a ScopedValue get will have a test (to check it's indeed in the cache) and a load so I don't think vectorization is possible.
>
> So you think this could not be predicated and hoisted out of the loop? That would also be a sad limitation.

Even if that was possible, ScopedValue get loads are from the cache indexed by a hash stored as a field in the ScopedValue object. I'm not sure how you would be able to tell which of the loads from several get() calls are contiguous in memory.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/16966#issuecomment-1910382083

From dlunden at openjdk.org  Thu Jan 25 15:32:36 2024
From: dlunden at openjdk.org (Daniel Lundén)
Date: Thu, 25 Jan 2024 15:32:36 GMT
Subject: RFR: 8291809: Convert compiler/c2/cr7200264/TestSSE2IntVect.java to IR verification test [v2]
In-Reply-To:
References: <1hgiSYOX1vJdXoKJyMqd-YfHaDqaG5qaJI8n2ZwmF08=.16fdb641-1ace-484a-a3f3-2438e9fd4cfa@github.com>
Message-ID:

On Thu, 25 Jan 2024 14:30:41 GMT, Emanuel Peter wrote:

>>> @dlunde do you understand what factors determine the length of the vector? Why is the default of IRNode.VECTOR_SIZE_MAX not working?
>>
>> Perhaps C2 hits the loop unrolling limit? @dlunde you can test this by trying out a large value for `-XX:LoopUnrollLimit`.
But even if this turned out to be the case, I would still suggest using `IRNode.VECTOR_SIZE_ANY` rather than forcing a higher loop unroll limit value for the tests.
>
> Ah, I see what the issue is here: the loop does not just contain `int` vectors but also `long` vectors.
> Specifically, I see `VectorCastI2X` and `VectorCastL2X` nodes in the loop, which convert `int` to/from `long`.
> Hence, if you have a `32 byte` vector, you can only have `4 long`, and so the loop-unrolling is limited to 4x.
> And then you only see `4 int` vectors, when you were expecting `8 int` vectors.
>
> You will probably be able to fix the issue with this:
> `IRNode.VECTOR_SIZE + "min(max_int, max_long)"`
>
> For more examples, check out:
> `grep "IRNode.VECTOR_SIZE +" test/hotspot/jtreg/ -r`

@eme64: Thanks, I'll check that `IRNode.VECTOR_SIZE + "min(max_int, max_long)"` works and use that.

@robcasloz: Thanks for the good idea; it indeed had to do with an unrolling limit. Just to clarify regarding `IRNode.VECTOR_SIZE_ANY` (after discussions with @eme64): I also thought `IRNode.VECTOR_SIZE_ANY` seemed like a good solution, but learned that `IRNode.VECTOR_SIZE_MAX` is the default because less than max vectorization usually indicates a problem. I may be wrong, but I believe `IRNode.VECTOR_SIZE_MAX` also varies depending on the platform. Therefore, I'll go with what Emanuel suggests for the `test_divc` and `test_divc_n` IR checks.
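
The lane arithmetic behind `min(max_int, max_long)` can be sketched in plain Java. This is only an illustration under stated assumptions: the class and method names (`LaneCount`, `lanes`, `commonLanes`) are hypothetical and not part of the IR test framework, and the 32-byte vector width is simply the example used in the discussion above.

```java
// Hypothetical illustration, not IR framework code.
final class LaneCount {
    // Number of elements of the given size that fit into one vector register.
    static int lanes(int vectorBytes, int elementBytes) {
        return vectorBytes / elementBytes;
    }

    // A loop that mixes element types can only use as many lanes
    // as its widest element type allows.
    static int commonLanes(int vectorBytes, int... elementSizes) {
        int min = Integer.MAX_VALUE;
        for (int size : elementSizes) {
            min = Math.min(min, lanes(vectorBytes, size));
        }
        return min;
    }

    public static void main(String[] args) {
        int vectorBytes = 32; // assumed vector width, as in the example above
        System.out.println("int lanes:    " + lanes(vectorBytes, Integer.BYTES));
        System.out.println("long lanes:   " + lanes(vectorBytes, Long.BYTES));
        System.out.println("common lanes: " + commonLanes(vectorBytes, Integer.BYTES, Long.BYTES));
    }
}
```

With a 32-byte vector this gives 8 `int` lanes, 4 `long` lanes, and therefore 4 lanes for a loop that converts between the two, which matches the 4x unrolling described above.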
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/17428#discussion_r1466545244

From shade at openjdk.org Thu Jan 25 15:36:39 2024
From: shade at openjdk.org (Aleksey Shipilev)
Date: Thu, 25 Jan 2024 15:36:39 GMT
Subject: RFR: 8324433: Introduce a way to determine if an expression is evaluated as a constant by the Jit compiler [v6]
In-Reply-To:
References: <9SikKzxs8M1JdTLvTB6JTozvpCw2CSziF2koHw0ELAQ=.fb2e7696-6fa6-4a2b-87b8-ec57d4fef05c@github.com>
Message-ID:

On Thu, 25 Jan 2024 14:48:16 GMT, Maurizio Cimadamore wrote:

> I don't 100% buy the `MethodHandleImpl` analogy. In that case the check is not simply used to save a branch, but to spare spinning of a completely new lambda form.

Doing this to save a few instructions would likely not be worth the hassle outside the _really performance critical paths_, but even then it might be useful for hot JDK code. On larger examples, you can avoid memory accesses, allocations, etc. by coding up the constant-foldable path that you know the compiler would not be able to extract when propagating constants through the generic code.

For example, giving quantitative substance to my previous example:

diff --git a/src/java.base/share/classes/java/lang/Integer.java b/src/java.base/share/classes/java/lang/Integer.java
index 1c5b3c414ba..d50748c369e 100644
--- a/src/java.base/share/classes/java/lang/Integer.java
+++ b/src/java.base/share/classes/java/lang/Integer.java
@@ -28,4 +28,5 @@
 import jdk.internal.misc.CDS;
 import jdk.internal.misc.VM;
+import jdk.internal.vm.ConstantSupport;
 import jdk.internal.vm.annotation.ForceInline;
 import jdk.internal.vm.annotation.IntrinsicCandidate;
@@ -416,4 +417,7 @@ private static void formatUnsignedIntUTF16(int val, int shift, byte[] buf, int l
     }
 
+    @Stable
+    static final String[] TO_STRINGS = { "-1", "0", "1" };
+
     /**
      * Returns a {@code String} object representing the
@@ -428,4 +432,8 @@ private static void formatUnsignedIntUTF16(int val, int shift, byte[] buf, int l
     @IntrinsicCandidate
     public static String toString(int i) {
+        if (ConstantSupport.isCompileConstant(i) &&
+            (i >= -1) && (i <= 1)) {
+            return TO_STRINGS[i + 1];
+        }
         int size = stringSize(i);
         if (COMPACT_STRINGS) {
diff --git a/test/micro/org/openjdk/bench/java/lang/Integers.java b/test/micro/org/openjdk/bench/java/lang/Integers.java
index 43ceb5d18d2..28248593a73 100644
--- a/test/micro/org/openjdk/bench/java/lang/Integers.java
+++ b/test/micro/org/openjdk/bench/java/lang/Integers.java
@@ -91,4 +91,18 @@ public void decode(Blackhole bh) {
     }
 
+    @Benchmark
+    @OutputTimeUnit(TimeUnit.NANOSECONDS)
+    public String toStringConstYay() {
+        return Integer.toString(0);
+    }
+
+    int v = 0;
+
+    @Benchmark
+    @OutputTimeUnit(TimeUnit.NANOSECONDS)
+    public String toStringConstNope() {
+        return Integer.toString(v);
+    }
+
     /** Performs toString on small values, just a couple of digits. */
     @Benchmark

Benchmark                                      (size)  Mode  Cnt    Score   Error  Units
Integers.toStringConstNope                        500  avgt   15    3,599 ± 0,034  ns/op
Integers.toStringConstNope:gc.alloc.rate.norm     500  avgt   15   48,000 ± 0,001   B/op
Integers.toStringConstNope:gc.time                500  avgt   15  223,000             ms
Integers.toStringConstYay                         500  avgt   15    0,568 ± 0,046  ns/op
Integers.toStringConstYay:gc.alloc.rate.norm      500  avgt   15        ? 10??      B/op

Think about it as simplifying/avoiding the need for full compiler intrinsics. I could, in principle, do this by intrinsifying `Integer.toString` completely, check the same `isCon`, and then either construct the access to some String constant, or arrange the call to the actual toString slow path. That would not be as simple as doing the same thing in plain Java, with just a little compiler support in the form of `ConstantSupport`.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/17527#issuecomment-1910449450

From dlunden at openjdk.org Thu Jan 25 15:38:38 2024
From: dlunden at openjdk.org (Daniel Lundén)
Date: Thu, 25 Jan 2024 15:38:38 GMT
Subject: RFR: 8291809: Convert compiler/c2/cr7200264/TestSSE2IntVect.java to IR verification test [v2]
In-Reply-To:
References:
Message-ID:

On Thu, 25 Jan 2024 14:34:27 GMT, Emanuel Peter wrote:

>> Thanks @eme64. I've addressed all comments now; please have a look again.
>
> @dlunde Given the findings here: https://github.com/openjdk/jdk/pull/17428#discussion_r1466460916
> I think you should add an IR rule on every test.
> And for the ones that do not currently vectorize, please add a negative IR rule, so that we can detect when that changes.
> For example when we implement a feature, then we can properly fix up the IR rule.

@eme64

> @dlunde Given the findings here: [#17428 (comment)](https://github.com/openjdk/jdk/pull/17428#discussion_r1466460916) I think you should add an IR rule on every test.

I don't quite understand, what IR rule do you want me to add for every test? I'll of course add what we discussed for the problematic `test_divc` and `test_divc_n`.

> And for the ones that do not currently vectorize, please add a negative IR rule, so that we can detect when that changes.
For example when we implement a feature, then we can properly fix up the IR rule.

Sure, I'll add negative rules for tests that do not currently vectorize (at all).

-------------

PR Comment: https://git.openjdk.org/jdk/pull/17428#issuecomment-1910452632

From gcao at openjdk.org Thu Jan 25 15:40:51 2024
From: gcao at openjdk.org (Gui Cao)
Date: Thu, 25 Jan 2024 15:40:51 GMT
Subject: RFR: 8324125: Improve class initialization barrier in TemplateTable::_new for RISC-V [v2]
In-Reply-To: <_TyDNsY_znk2GR9l7rpALqDOuwMwj4hkuF5kKcyJBQg=.1d9f5448-ad18-49bf-9fde-a68e88626470@github.com>
References: <_TyDNsY_znk2GR9l7rpALqDOuwMwj4hkuF5kKcyJBQg=.1d9f5448-ad18-49bf-9fde-a68e88626470@github.com>
Message-ID:

> Hi, This RISC-V Port implementation for https://github.com/openjdk/jdk/pull/17006,
>
> ### Testing:
>
> - [x] Run tier1-3 tests on qemu 8.1.0 with UseRVV (fastdebug)
> - [x] Run tier1-3 tests with SiFive unmatched (release)

Gui Cao has updated the pull request incrementally with one additional commit since the last revision:

  Put the msg string on a separate line

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/17548/files
  - new: https://git.openjdk.org/jdk/pull/17548/files/ae736225..0935a0b7

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=17548&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17548&range=00-01

  Stats: 2 lines in 1 file changed: 1 ins; 0 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/17548.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/17548/head:pull/17548

PR: https://git.openjdk.org/jdk/pull/17548

From epeter at openjdk.org Thu Jan 25 15:59:30 2024
From: epeter at openjdk.org (Emanuel Peter)
Date: Thu, 25 Jan 2024 15:59:30 GMT
Subject: RFR: 8320649: C2: Optimize scoped values [v4]
In-Reply-To:
References:
Message-ID:

On Thu, 25 Jan 2024 14:59:52 GMT, Roland Westrelin wrote:

> > > In its simplest form, a ScopedValue get will have a test (to check it's indeed in the cache) and a load so I don't think
vectorization is possible.
> >
> >
> > So you think this could not be predicated and hoisted out of the loop? That would also be a sad limitation.
>
> Even if that was possible, ScopedValue get loads are from the cache indexed by a hash stored as a field in the ScopedValue object. I'm not sure how you would be able to tell which of the loads from several get() calls are contiguous in memory.

A simple example would be to add/multiply it to every element in an array. Imagine we stream over a container (some array or memory segment), and some "map" method is applied to every element, which adds in the scoped value. If everything is inlined, then this might become a loop where we add `scopedValue.get()` to every element, and we would hope the `get` floats out of the loop completely, so we can broadcast that value and vectorize the loop.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/16966#issuecomment-1910493248

From roland at openjdk.org Thu Jan 25 16:25:40 2024
From: roland at openjdk.org (Roland Westrelin)
Date: Thu, 25 Jan 2024 16:25:40 GMT
Subject: RFR: 8320649: C2: Optimize scoped values [v4]
In-Reply-To:
References:
Message-ID:

On Wed, 17 Jan 2024 10:59:07 GMT, Emanuel Peter wrote:

> And what is the argument for why they always succeed?

Because of the way the cache is constructed:

    theCache = new Object[CACHE_TABLE_SIZE * 2];
    setScopedValueCache(theCache);

and

    CACHE_TABLE_SIZE = cacheSize;
    SLOT_MASK = cacheSize - 1;

accessed with:

    int n = (hash & Cache.SLOT_MASK) * 2;
    if (objects[n] == this) {
        return (T)objects[n + 1];
    }
    n = ((hash >>> Cache.INDEX_BITS) & Cache.SLOT_MASK) * 2;
    if (objects[n] == this) {
        return (T)objects[n + 1];
    }

> How do we know we do not accidentally kill an unrelated RangeCheck?

They are the only RangeChecks in `ScopedValue.get()`. Of course, because the c2 code pattern matches the IR of `ScopedValue.get()` it assumes it has a certain shape. The java code and c2 code have to stay in sync.
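
The bounds argument above can be made concrete with a small stand-alone sketch. It is not the JDK's `ScopedValue` cache: the class `TwoProbeCache`, its `put` method, and the choice of `INDEX_BITS = 4` are assumptions made for illustration only. What it shares with the quoted snippets is the layout (key at an even index, value right after it) and the masking, which keeps both probes inside the array for any hash as long as the table size is a power of two.

```java
// Hypothetical stand-in for the cache layout quoted above; not JDK code.
final class TwoProbeCache {
    static final int INDEX_BITS = 4;               // assumed value for illustration
    static final int TABLE_SIZE = 1 << INDEX_BITS; // power of two, so the mask below is valid
    static final int SLOT_MASK = TABLE_SIZE - 1;

    // Key stored at even index n, its value at n + 1.
    final Object[] objects = new Object[TABLE_SIZE * 2];

    static int primarySlot(int hash) {
        return (hash & SLOT_MASK) * 2;
    }

    static int secondarySlot(int hash) {
        return ((hash >>> INDEX_BITS) & SLOT_MASK) * 2;
    }

    // Simplified insertion: always uses the primary slot.
    void put(Object key, int hash, Object value) {
        int n = primarySlot(hash);
        objects[n] = key;
        objects[n + 1] = value;
    }

    // Two probes, mirroring the access pattern quoted above.
    Object get(Object key, int hash) {
        int n = primarySlot(hash);
        if (objects[n] == key) {
            return objects[n + 1];
        }
        n = secondarySlot(hash);
        if (objects[n] == key) {
            return objects[n + 1];
        }
        return null; // miss; a real implementation would take a slow path here
    }

    public static void main(String[] args) {
        int len = TABLE_SIZE * 2;
        for (int hash : new int[] {0, 1, -1, Integer.MIN_VALUE, Integer.MAX_VALUE}) {
            // (hash & SLOT_MASK) is in [0, TABLE_SIZE - 1], so n + 1 <= 2 * TABLE_SIZE - 1.
            System.out.println(hash + " -> " + primarySlot(hash) + ", " + secondarySlot(hash)
                    + " (array length " + len + ")");
        }
    }
}
```

Because the masked value is at most `TABLE_SIZE - 1`, the largest index touched is `2 * TABLE_SIZE - 1`, the last element of the array, which is why the range checks on these accesses can never fail.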

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1466619427

From roland at openjdk.org Thu Jan 25 16:34:31 2024
From: roland at openjdk.org (Roland Westrelin)
Date: Thu, 25 Jan 2024 16:34:31 GMT
Subject: RFR: 8320649: C2: Optimize scoped values [v4]
In-Reply-To:
References:
Message-ID:

On Wed, 17 Jan 2024 11:10:08 GMT, Emanuel Peter wrote:

> Why can this not be done in the first traversal, and why does this (down) traversal do the right thing?

The first traversal starts from the end of the method and follows control paths until it reaches the `Thread.scopedValueCache()` call. Given the shape of the method and that some paths may have been trimmed and end with an uncommon trap, it could reach either the first or the second `if` that probes the cache first.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1466635670

From roland at openjdk.org Thu Jan 25 16:38:39 2024
From: roland at openjdk.org (Roland Westrelin)
Date: Thu, 25 Jan 2024 16:38:39 GMT
Subject: RFR: 8320649: C2: Optimize scoped values [v4]
In-Reply-To:
References:
Message-ID: <6tKBArzS5G1fEbPq3_xJfkU0Pp29fXtk4hn_7fQfv3g=.f0937e90-0f7a-4420-afb3-6c79aab27b50@github.com>

On Wed, 17 Jan 2024 11:12:24 GMT, Emanuel Peter wrote:

> No visited set. Can this trigger an exponential explosion with if/region diamonds?

It only follows the control subgraph for the `ScopedValue.get()` which is fairly simple.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1466640810

From roland at openjdk.org Thu Jan 25 16:46:38 2024
From: roland at openjdk.org (Roland Westrelin)
Date: Thu, 25 Jan 2024 16:46:38 GMT
Subject: RFR: 8320649: C2: Optimize scoped values [v4]
In-Reply-To:
References:
Message-ID:

On Thu, 25 Jan 2024 14:59:52 GMT, Roland Westrelin wrote:

>>> In its simplest form, a ScopedValue get will have a test (to check it's indeed in the cache) and a load so I don't think vectorization is possible.
>>
>> So you think this could not be predicated and hoisted out of the loop? That would also be a sad limitation.
>
>> > In its simplest form, a ScopedValue get will have a test (to check it's indeed in the cache) and a load so I don't think vectorization is possible.
>>
>> So you think this could not be predicated and hoisted out of the loop? That would also be a sad limitation.
>
> Even if that was possible, ScopedValue get loads are from the cache indexed by a hash stored as a field in the ScopedValue object. I'm not sure how you would be able to tell which of the loads from several get() calls are contiguous in memory.

> Nice work @rwestrel I'm sending out a first batch of comments, more coming later.

Thanks for the careful review.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/16966#issuecomment-1910586098

From kvn at openjdk.org Thu Jan 25 16:50:40 2024
From: kvn at openjdk.org (Vladimir Kozlov)
Date: Thu, 25 Jan 2024 16:50:40 GMT
Subject: [jdk22] RFR: 8324688: C2: Disable ReduceAllocationMerges by default
In-Reply-To:
References:
Message-ID:

On Thu, 25 Jan 2024 10:27:37 GMT, Christian Hagedorn wrote:

> Due to several recent bug reports after the integration of [JDK-8287061](https://bugs.openjdk.org/browse/JDK-8287061) in JDK 22 (latest one being [JDK-8322854](https://bugs.openjdk.org/browse/JDK-8322854)), we've decided together with @JohnTortugo that we want to minimize the risk and that it's best to disable `ReduceAllocationMerges` by default for JDK 22 which disables the optimizations of [JDK-8287061](https://bugs.openjdk.org/browse/JDK-8287061).
>
> Thanks,
> Christian

Good. I approved this fix request for JDK 22.

-------------

Marked as reviewed by kvn (Reviewer).

PR Review: https://git.openjdk.org/jdk22/pull/97#pullrequestreview-1844206989

From duke at openjdk.org Thu Jan 25 18:03:46 2024
From: duke at openjdk.org (Joshua Cao)
Date: Thu, 25 Jan 2024 18:03:46 GMT
Subject: RFR: 8324667: fold Parse::seems_stable_comparison()
Message-ID:

passes GHA

-------------

Commit messages:
 - 8324667: fold Parse::seems_stable_comparison()

Changes: https://git.openjdk.org/jdk/pull/17573/files
 Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=17573&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8324667
  Stats: 17 lines in 1 file changed: 1 ins; 15 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/17573.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/17573/head:pull/17573

PR: https://git.openjdk.org/jdk/pull/17573

From jkarthikeyan at openjdk.org Thu Jan 25 18:23:53 2024
From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan)
Date: Thu, 25 Jan 2024 18:23:53 GMT
Subject: RFR: 8324655: Identify integer minimum and maximum patterns created with if statements
Message-ID:

Hi all, I've created this patch which aims to convert common integer minimum and maximum patterns created using if statements into Min and Max nodes. These patterns are usually in the form of `a > b ? a : b` and similar, as well as patterns such as `if (a > b) b = a;`. While this transform doesn't generally improve code generation on its own, it simplifies control flow and creates new opportunities for vectorization.

I've created a benchmark for the PR, and I've attached some data from my (Zen 3) machine:

                                        Baseline                    Patch              Improvement
Benchmark                   Mode  Cnt    Score    Error  Units      Score    Error  Units
IfMinMax.testReductionInt   avgt   15  500.307 ± 16.687  ns/op    509.383 ± 32.645  ns/op  (no change)*
IfMinMax.testReductionLong  avgt   15  493.184 ± 17.596  ns/op    513.587 ± 28.339  ns/op  (no change)*
IfMinMax.testSingleInt      avgt   15    3.588 ±  0.540  ns/op      2.965 ±  1.380  ns/op  (no change)
IfMinMax.testSingleLong     avgt   15    3.673 ±  0.128  ns/op      3.506 ±  0.590  ns/op  (no change)
IfMinMax.testVectorInt      avgt   15  340.425 ± 13.123  ns/op     59.689 ±  7.509  ns/op  + 5.7x
IfMinMax.testVectorLong     avgt   15  326.420 ± 15.554  ns/op    117.190 ±  5.622  ns/op  + 2.8x

* After writing this benchmark I discovered that the compiler doesn't seem to create some simple min/max reductions, even when using Math.min/max() directly. Is this known or should I create a followup RFE for this?

The patch passes tier 1-3 testing on linux x64. Reviews or comments would be appreciated!

-------------

Commit messages:
 - Convert integer min/max patterns to Min/Max nodes

Changes: https://git.openjdk.org/jdk/pull/17574/files
 Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=17574&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8324655
  Stats: 387 lines in 4 files changed: 381 ins; 0 del; 6 mod
  Patch: https://git.openjdk.org/jdk/pull/17574.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/17574/head:pull/17574

PR: https://git.openjdk.org/jdk/pull/17574

From shade at openjdk.org Thu Jan 25 18:27:28 2024
From: shade at openjdk.org (Aleksey Shipilev)
Date: Thu, 25 Jan 2024 18:27:28 GMT
Subject: RFR: 8324655: Identify integer minimum and maximum patterns created with if statements
In-Reply-To:
References:
Message-ID:

On Thu, 25 Jan 2024 18:15:21 GMT, Jasmine Karthikeyan wrote:

> Hi all, I've created this patch which aims to convert common integer minimum and maximum patterns created using if statements into Min and Max nodes. These patterns are usually in the form of `a > b ? a : b` and similar, as well as patterns such as `if (a > b) b = a;`. While this transform doesn't generally improve code generation on its own, it simplifies control flow and creates new opportunities for vectorization.
>
> I've created a benchmark for the PR, and I've attached some data from my (Zen 3) machine:
>
>                                         Baseline                    Patch              Improvement
> Benchmark                   Mode  Cnt    Score    Error  Units      Score    Error  Units
> IfMinMax.testReductionInt   avgt   15  500.307 ± 16.687  ns/op    509.383 ± 32.645  ns/op  (no change)*
> IfMinMax.testReductionLong  avgt   15  493.184 ± 17.596  ns/op    513.587 ± 28.339  ns/op  (no change)*
> IfMinMax.testSingleInt      avgt   15    3.588 ±  0.540  ns/op      2.965 ±  1.380  ns/op  (no change)
> IfMinMax.testSingleLong     avgt   15    3.673 ±  0.128  ns/op      3.506 ±  0.590  ns/op  (no change)
> IfMinMax.testVectorInt      avgt   15  340.425 ± 13.123  ns/op     59.689 ±  7.509  ns/op  + 5.7x
> IfMinMax.testVectorLong     avgt   15  326.420 ± 15.554  ns/op    117.190 ±  5.622  ns/op  + 2.8x
>
>
> * After writing this benchmark I discovered that the compiler doesn't seem to create some simple min/max reductions, even when using Math.min/max() directly. Is this known or should I create a followup RFE for this?
>
> The patch passes tier 1-3 testing on linux x64. Reviews or comments would be appreciated!

Drive-by comment: The problem I see with this approach is that _sometimes_ we replace `Math.{min|max}` with explicit branching to avoid cmov-s, e.g. when we know that branch would be fully predicted. Matching these branches back to `Min/Max` nodes shuts down that escape hatch :)

-------------

PR Comment: https://git.openjdk.org/jdk/pull/17574#issuecomment-1910756060

From jkarthikeyan at openjdk.org Thu Jan 25 18:45:36 2024
From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan)
Date: Thu, 25 Jan 2024 18:45:36 GMT
Subject: RFR: 8324667: fold Parse::seems_stable_comparison()
In-Reply-To:
References:
Message-ID:

On Thu, 25 Jan 2024 17:57:55 GMT, Joshua Cao wrote:

> The function has a long comment block that seems irrelevant since https://github.com/openjdk/jdk/commit/8bd4b5624c6ece31d965259aadc290a24d44423a. We can just fold away this method. It only has one caller.
>
>
> passes GHA

I think you should update the copyright year to 2024 and remove the definition of the function in `parse.hpp` as well: https://github.com/openjdk/jdk/blob/12b89cd2eeb5c2c43a2ce425c96fc4f718e30514/src/hotspot/share/opto/parse.hpp#L567

-------------

PR Review: https://git.openjdk.org/jdk/pull/17573#pullrequestreview-1844408584

From jkarthikeyan at openjdk.org Thu Jan 25 19:02:35 2024
From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan)
Date: Thu, 25 Jan 2024 19:02:35 GMT
Subject: RFR: 8324655: Identify integer minimum and maximum patterns created with if statements
In-Reply-To:
References:
Message-ID:

On Thu, 25 Jan 2024 18:15:21 GMT, Jasmine Karthikeyan wrote:

> Hi all, I've created this patch which aims to convert common integer minimum and maximum patterns created using if statements into Min and Max nodes. These patterns are usually in the form of `a > b ? a : b` and similar, as well as patterns such as `if (a > b) b = a;`. While this transform doesn't generally improve code generation on its own, it simplifies control flow and creates new opportunities for vectorization.
>
> I've created a benchmark for the PR, and I've attached some data from my (Zen 3) machine:
>
>                                         Baseline                    Patch              Improvement
> Benchmark                   Mode  Cnt    Score    Error  Units      Score    Error  Units
> IfMinMax.testReductionInt   avgt   15  500.307 ± 16.687  ns/op    509.383 ± 32.645  ns/op  (no change)*
> IfMinMax.testReductionLong  avgt   15  493.184 ± 17.596  ns/op    513.587 ± 28.339  ns/op  (no change)*
> IfMinMax.testSingleInt      avgt   15    3.588 ±  0.540  ns/op      2.965 ±  1.380  ns/op  (no change)
> IfMinMax.testSingleLong     avgt   15    3.673 ±  0.128  ns/op      3.506 ±  0.590  ns/op  (no change)
> IfMinMax.testVectorInt      avgt   15  340.425 ± 13.123  ns/op     59.689 ±  7.509  ns/op  + 5.7x
> IfMinMax.testVectorLong     avgt   15  326.420 ± 15.554  ns/op    117.190 ±  5.622  ns/op  + 2.8x
>
>
> * After writing this benchmark I discovered that the compiler doesn't seem to create some simple min/max reductions, even when using Math.min/max() directly.
Is this known or should I create a followup RFE for this?
>
> The patch passes tier 1-3 testing on linux x64. Reviews or comments would be appreciated!

Ah true, I hadn't considered that. Do you think it makes sense to only do the transform if the if statement isn't highly predictable? I'm not sure what an appropriate threshold value would be, but it seems `PhaseIdealLoop::conditional_move` doesn't make cmoves if the percentage is < 1% or > 99% (disregarding the different handling for loops). And it seems the other transforms in this group suffer from this as well.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/17574#issuecomment-1910806853

From duke at openjdk.org Thu Jan 25 19:05:52 2024
From: duke at openjdk.org (Joshua Cao)
Date: Thu, 25 Jan 2024 19:05:52 GMT
Subject: RFR: 8324667: fold Parse::seems_stable_comparison() [v2]
In-Reply-To:
References:
Message-ID: <6HdCmuewEQMp7J4LFUTiJz3IeLnscqVFBJrYZBTtrJQ=.c2355671-80b2-4f9b-86b9-cdefcaf935f2@github.com>

On Thu, 25 Jan 2024 18:42:21 GMT, Jasmine Karthikeyan wrote:

> I think you should update the copyright year to 2024 and remove the definition of the function in `parse.hpp` as well:

Thanks, that was a miss. Updated.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/17573#issuecomment-1910810369

From duke at openjdk.org Thu Jan 25 19:05:50 2024
From: duke at openjdk.org (Joshua Cao)
Date: Thu, 25 Jan 2024 19:05:50 GMT
Subject: RFR: 8324667: fold Parse::seems_stable_comparison() [v2]
In-Reply-To:
References:
Message-ID:

> The function has a long comment block that seems irrelevant since https://github.com/openjdk/jdk/commit/8bd4b5624c6ece31d965259aadc290a24d44423a. We can just fold away this method. It only has one caller.
>
>
> passes GHA

Joshua Cao has updated the pull request incrementally with two additional commits since the last revision:

 - Update copyright for parse.hpp
 - Remove seems_stable_comparison() from header and remove copyright

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/17573/files
  - new: https://git.openjdk.org/jdk/pull/17573/files/42943fd5..a7884324

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=17573&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17573&range=00-01

  Stats: 3 lines in 2 files changed: 0 ins; 1 del; 2 mod
  Patch: https://git.openjdk.org/jdk/pull/17573.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/17573/head:pull/17573

PR: https://git.openjdk.org/jdk/pull/17573

From jkarthikeyan at openjdk.org Thu Jan 25 19:19:29 2024
From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan)
Date: Thu, 25 Jan 2024 19:19:29 GMT
Subject: RFR: 8324667: fold Parse::seems_stable_comparison() [v2]
In-Reply-To:
References:
Message-ID:

On Thu, 25 Jan 2024 19:05:50 GMT, Joshua Cao wrote:

>> The function has a long comment block that seems irrelevant since https://github.com/openjdk/jdk/commit/8bd4b5624c6ece31d965259aadc290a24d44423a. We can just fold away this method. It only has one caller.
>>
>>
>> passes GHA
>
> Joshua Cao has updated the pull request incrementally with two additional commits since the last revision:
>
>  - Update copyright for parse.hpp
>  - Remove seems_stable_comparison() from header and remove copyright

Thanks! Looks good to me (I'm not a reviewer though).

-------------

Marked as reviewed by jkarthikeyan (Author).

PR Review: https://git.openjdk.org/jdk/pull/17573#pullrequestreview-1844461037

From kvn at openjdk.org Thu Jan 25 20:10:41 2024
From: kvn at openjdk.org (Vladimir Kozlov)
Date: Thu, 25 Jan 2024 20:10:41 GMT
Subject: RFR: 8315361: C2 SuperWord: refactor out loop analysis into shared auto-vectorization facility VLoopAnalyzer [v12]
In-Reply-To:
References:
Message-ID:

On Mon, 22 Jan 2024 16:11:02 GMT, Emanuel Peter wrote:

>> This is a refactoring of `SuperWord`.
>>
>> **Goals**
>>
>> 1. Clean up `SuperWord`: disentangle different components, make them more **modular**.
>> 2. Make the loop analysis parts a **shared facility**, not just for SuperWord but also the post-loop-vectorizer ([JDK-8308994](https://bugs.openjdk.org/browse/JDK-8308994)).
>> 3. It is also a necessary step on my bigger plans for improvement with the C2 Auto-Vectorizer ([see my blog post](https://eme64.github.io/blog/2023/11/03/C2-AutoVectorizer-Improvement-Ideas.html)).
>> 4. Improve tracing in the auto-vectorization by making it more systematic.
>>
>> **Summary**
>>
>> - I wrote a summary of how C2 auto-vectorization with SuperWord works (please read!):
>> https://github.com/openjdk/jdk/blob/95fd361e60fc66eb91edad321662e508b2d1bdde/src/hotspot/share/opto/superword.hpp#L32-L177
>> - I moved many `Superword` components out to `VLoop` and to `VLoopAnalyzer`. The idea is that any vectorizer can use these facilities in the future. They are therefore made more modular, which should hopefully make future changes easier. These components are:
>>   - Checking the pre-conditions for vectorization (e.g. no unwanted ctrl-flow).
>>     - `VLoop::check_preconditions_helper` replaces code from old `SuperWord::transform_loop`.
>>   - Running all submodules of `VLoopAnalyzer`: `VLoopAnalyzer::analyze_helper`. Replaces analysis part of `SuperWord::SLP_extract`.
>>     - Finding and marking reductions -> `VLoopReductions`
>>     - Detecting memory slices -> `VLoopMemorySlices`
>>     - Analyzing the body -> `VLoopBody` (renamed `in_bb` -> `in_body`)
>>     - Determining vector element types, and functions to determine the `vector_width` of a node -> `VLoopTypes`
>>     - Constructing the dependence graph -> `VLoopDependenceGraph`. Replaces old `DepGraph` with all its components.
>> - New: CompileCommand option `TraceAutovectorization`
>>   - Run with `-XX:CompileCommand=traceAutovectorization,*::*,help` to get a usage description.
>>   - Replaced all printing with flags `TraceSuperWord` (and `Verbose`) and of `VectorizeDebug`.
>>   - The advantage of a CompileCommand is that tracing can be applied selectively for only a limited set of java classes / methods.
>>   - It uses tags, which are more readable than the `VectorizeDebug` bit-flags. These tags can be used for all parts of the vectorizer, but one can also target SuperWord specifically.
>> - I systematically added tracing at every point where vectorization (partially) fails (use tag `SW_REJECTIONS`...
>
> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision:
>
>   remove SuperWord::init, and reserve space in data structures

For easier reviewing, I would suggest splitting TraceAutovectorization out into a separate RFE. I also think an upper case `V` in the name looks better: TraceAutoVectorization.
The renaming which affects platform specific code could also go into a separate RFE.
These 2 changes are easy to review and can be pushed first.

I have started looking at the VLoop related changes, and it will take time.

-------------

PR Review: https://git.openjdk.org/jdk/pull/16620#pullrequestreview-1844593295

From cslucas at openjdk.org Thu Jan 25 22:12:43 2024
From: cslucas at openjdk.org (Cesar Soares Lucas)
Date: Thu, 25 Jan 2024 22:12:43 GMT
Subject: [jdk22] RFR: 8324688: C2: Disable ReduceAllocationMerges by default
In-Reply-To:
References:
Message-ID:

On Thu, 25 Jan 2024 10:27:37 GMT, Christian Hagedorn wrote:

> Due to several recent bug reports after the integration of [JDK-8287061](https://bugs.openjdk.org/browse/JDK-8287061) in JDK 22 (latest one being [JDK-8322854](https://bugs.openjdk.org/browse/JDK-8322854)), we've decided together with @JohnTortugo that we want to minimize the risk and that it's best to disable `ReduceAllocationMerges` by default for JDK 22 which disables the optimizations of [JDK-8287061](https://bugs.openjdk.org/browse/JDK-8287061).
>
> Thanks,
> Christian

LGTM!

-------------

PR Comment: https://git.openjdk.org/jdk22/pull/97#issuecomment-1911080416

From dlong at openjdk.org Fri Jan 26 00:58:29 2024
From: dlong at openjdk.org (Dean Long)
Date: Fri, 26 Jan 2024 00:58:29 GMT
Subject: RFR: 8324241: Always record evol_method deps to avoid excessive method flushing [v4]
In-Reply-To:
References:
Message-ID:

On Wed, 24 Jan 2024 14:48:52 GMT, Volker Simonis wrote:

>> Currently we don't record dependencies on redefined methods (i.e. `evol_method` dependencies) in JIT compiled methods if none of the `can_redefine_classes`, `can_retransform_classes` or `can_generate_breakpoint_events` JVMTI capabilities is set. This means that if a JVMTI agent which requests one of these capabilities is dynamically attached, all the methods which have been JIT compiled until that point will be marked for deoptimization and flushed from the code cache. For large, warmed-up applications this means deoptimization and instant recompilation of thousands if not tens of thousands of methods, which can lead to dramatic performance/latency drops for several minutes.
>>
>> One could argue that dynamic agent attach is now deprecated anyway (see [JEP 451: Prepare to Disallow the Dynamic Loading of Agents](https://openjdk.org/jeps/451)) and this problem could be solved by making the recording of `evol_method` dependencies dependent on the new `-XX:+EnableDynamicAgentLoading` flag instead of the concrete JVMTI capabilities (because the presence of the flag indicates that an agent will be loaded eventually).
>>
>> But there is a single, however important, exception to this rule, and that's JFR. JFR is advertised as a low overhead profiler which can be enabled in production at any time. However, when JFR is started dynamically (e.g. through JCMD or JMX) it will silently load a HotSpot internal JVMTI agent which requests the `can_retransform_classes` capability and retransforms some classes. This will inevitably trigger the deoptimization of all compiled methods as described above.
>>
>> I'd therefore like to propose to *always* and unconditionally record `evol_method` dependencies in JIT compiled code by exporting the relevant properties right at startup in `init_globals()`:
>> ```c++
>> jint init_globals() {
>>   management_init();
>>   JvmtiExport::initialize_oop_storage();
>> +#if INCLUDE_JVMTI
>> +  JvmtiExport::set_can_hotswap_or_post_breakpoint(true);
>> +  JvmtiExport::set_all_dependencies_are_recorded(true);
>> +#endif
>> ```
>>
>> My measurements indicate that the overhead of doing so is minimal (around 1% increase of nmethod size) and justifies the benefit. E.g. a Spring Petclinic application started with `-Xbatch -XX:+UnlockDiagnosticVMOptions -XX:+LogCompilation` compiles about 11500 methods (~9000 with C1 and ~2500 with C2) resulting in an aggregated nmethod size of around 40 MB. Additionally recording `evol_method` dependencies only increases this size by about 400 KB....
> > Volker Simonis has updated the pull request incrementally with one additional commit since the last revision: > > Made the flag experimental and added an assertion to set_can_hotswap_or_post_breakpoint() Marked as reviewed by dlong (Reviewer). No, go ahead. ------------- PR Review: https://git.openjdk.org/jdk/pull/17509#pullrequestreview-1844939013 PR Comment: https://git.openjdk.org/jdk/pull/17509#issuecomment-1911240540 From dlong at openjdk.org Fri Jan 26 01:03:37 2024 From: dlong at openjdk.org (Dean Long) Date: Fri, 26 Jan 2024 01:03:37 GMT Subject: RFR: 8324123: aarch64: fix prfm literal encoding in assembler [v6] In-Reply-To: <1F918465vpiJUQ0XbadPAJLrs58TmBj3sVK5TapAWqA=.768a151b-c0ce-447d-951e-f440df83e9f1@github.com> References: <1F918465vpiJUQ0XbadPAJLrs58TmBj3sVK5TapAWqA=.768a151b-c0ce-447d-951e-f440df83e9f1@github.com> Message-ID: On Thu, 25 Jan 2024 11:34:00 GMT, Wang Zhuo wrote: >> Current prfm literal mode encoding in aarch64 assembler is not correct. >> The prfm_literal instruction requires bits 31 and 30 to be 0b11, while the current assembler encodes the two bits as 0b01, which is a ldr instruction, not prfm. >> For example, if adding the following code in stubGenerator >> __ prfm(Address(__ pc())) >> we get a ldr instruction like >> ldr x0, 0x0000ffff83f8539c >> but it should be a prfm instruction like >> prfm pldl1keep, 0x0000ffff8ff8539c >> >> The bug is in ld_st2: in literal mode, bits 31 and 30 are set to (size & 0b01), while for prfm instructions bits 31 and 30 must be 0b11. >> void ld_st2(Register Rt, const Address &adr, int size, int op, int V = 0) { >> starti; >> >> f(V, 26); // general reg? >> zrf(Rt, 0); >> >> // Encoding for literal loads is done here (rather than pushed >> // down into Address::encode) because the encoding of this >> // instruction is too different from all of the other forms to >> // make it worth sharing.
>> if (adr.getMode() == Address::literal) { >> assert(size == 0b10 || size == 0b11, "bad operand size in ldr"); >> assert(op == 0b01, "literal form can only be used with loads"); >> f(**size & 0b01, 31, 30**), f(0b011, 29, 27), f(0b00, 25, 24); >> int64_t offset = (adr.target() - pc()) >> 2; >> sf(offset, 23, 5); >> code_section()->relocate(pc(), adr.rspec()); >> return; >> } >> >> f(size, 31, 30); >> f(op, 23, 22); // str >> adr.encode(&current_insn); >> } > > Wang Zhuo has updated the pull request incrementally with two additional commits since the last revision: > > - Update assembler_aarch64.cpp, merge guarantee > > Co-authored-by: Andrew Haley > - Update assembler_aarch64.cpp delete some comments > > Co-authored-by: Andrew Haley Marked as reviewed by dlong (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/17482#pullrequestreview-1844942756 From dlong at openjdk.org Fri Jan 26 01:07:35 2024 From: dlong at openjdk.org (Dean Long) Date: Fri, 26 Jan 2024 01:07:35 GMT Subject: RFR: 8321308: AArch64: Fix matching predication for cbz/cbnz In-Reply-To: References: Message-ID: <2n5LYk8qmqsiwx3ZXae8pFhg9V91pMIt9reKSRUib8M=.cc280bf7-bf66-4371-8d37-0de8df347b08@github.com> On Wed, 6 Dec 2023 01:54:59 GMT, Fei Gao wrote: > For array length check like: > > if (a.length > 0) { > [Block 1] > } else { > [Block 2] > } > > > Since `a.length` is unsigned, it's semantically equivalent to: > > if (a.length != 0) { > [Block 1] > } else { > [Block 2] > } > > > On aarch64 port, we can do the conversion like above, during c2 compiler instruction matching, for certain unsigned integral comparisons. > > For example, > > cmpw w11, #0 # unsigned > bls label # unsigned > [Block 1] > > label: > [Block 2] > > > can be converted to: > > cbz w11, label > [Block 1] > > label: > [Block 2] > > > Currently, we have some matching rules to do the conversion [[1]](https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/aarch64/aarch64.ad#L16179).
But the predicate here [[2]](https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/aarch64/aarch64.ad#L6140) matches wrong `BoolTest` masks, so these rules fail to convert. I guess it's a typo introduced in [JDK-8160006](https://bugs.openjdk.org/browse/JDK-8160006). The patch fixes it. Marked as reviewed by dlong (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/16989#pullrequestreview-1844945198 From fyang at openjdk.org Fri Jan 26 02:04:25 2024 From: fyang at openjdk.org (Fei Yang) Date: Fri, 26 Jan 2024 02:04:25 GMT Subject: RFR: 8321308: AArch64: Fix matching predication for cbz/cbnz In-Reply-To: References: Message-ID: <4K2aineo5-XU0yF3Dpms_DBBu_5b2LrkgM10EEpjiCo=.f1a0cb0c-d8d0-4ec0-bf95-3ad2f4a81fda@github.com> On Wed, 6 Dec 2023 01:54:59 GMT, Fei Gao wrote: > For array length check like: > > if (a.length > 0) { > [Block 1] > } else { > [Block 2] > } > > > Since `a.length` is unsigned, it's semantically equivalent to: > > if (a.length != 0) { > [Block 1] > } else { > [Block 2] > } > > > On aarch64 port, we can do the conversion like above, during c2 compiler instruction matching, for certain unsigned integral comparisons. > > For example, > > cmpw w11, #0 # unsigned > bls label # unsigned > [Block 1] > > label: > [Block 2] > > > can be converted to: > > cbz w11, label > [Block 1] > > label: > [Block 2] > > > Currently, we have some matching rules to do the conversion [[1]](https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/aarch64/aarch64.ad#L16179). But the predicate here [[2]](https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/aarch64/aarch64.ad#L6140) matches wrong `BoolTest` masks, so these rules fail to convert. I guess it's a typo introduced in [JDK-8160006](https://bugs.openjdk.org/browse/JDK-8160006). The patch fixes it. LGTM. Now the predicate the same as riscv's `cmpOpUEqNeLeGt` operand. ------------- Marked as reviewed by fyang (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/16989#pullrequestreview-1844980318 From wzhuo at openjdk.org Fri Jan 26 02:33:41 2024 From: wzhuo at openjdk.org (Wang Zhuo) Date: Fri, 26 Jan 2024 02:33:41 GMT Subject: Integrated: 8324123: aarch64: fix prfm literal encoding in assembler In-Reply-To: References: Message-ID: On Thu, 18 Jan 2024 10:02:59 GMT, Wang Zhuo wrote: > Current prfm literal mode encoding in aarch64 assembler is not correct. > The prfm_literal instruction requires bits 31 and 30 to be 0b11, while the current assembler encodes the two bits as 0b01, which is a ldr instruction, not prfm. > For example, if adding the following code in stubGenerator > __ prfm(Address(__ pc())) > we get a ldr instruction like > ldr x0, 0x0000ffff83f8539c > but it should be a prfm instruction like > prfm pldl1keep, 0x0000ffff8ff8539c > > The bug is in ld_st2: in literal mode, bits 31 and 30 are set to (size & 0b01), while for prfm instructions bits 31 and 30 must be 0b11. > void ld_st2(Register Rt, const Address &adr, int size, int op, int V = 0) { > starti; > > f(V, 26); // general reg? > zrf(Rt, 0); > > // Encoding for literal loads is done here (rather than pushed > // down into Address::encode) because the encoding of this > // instruction is too different from all of the other forms to > // make it worth sharing. > if (adr.getMode() == Address::literal) { > assert(size == 0b10 || size == 0b11, "bad operand size in ldr"); > assert(op == 0b01, "literal form can only be used with loads"); > f(**size & 0b01, 31, 30**), f(0b011, 29, 27), f(0b00, 25, 24); > int64_t offset = (adr.target() - pc()) >> 2; > sf(offset, 23, 5); > code_section()->relocate(pc(), adr.rspec()); > return; > } > > f(size, 31, 30); > f(op, 23, 22); // str > adr.encode(&current_insn); > } This pull request has now been integrated.
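The encoding issue described above can be reproduced outside HotSpot. The following is a minimal standalone sketch, not the HotSpot assembler: `encode_literal` is a hypothetical helper that assembles the fixed fields named in `ld_st2` (bits 29..27 = 0b011, V = 0, bits 25..24 = 0b00) together with an `opc` value for bits 31..30. It shows that masking size 0b11 with 0b01 produces the 64-bit `ldr` literal opcode, whereas `prfm` (literal) needs 0b11 in those two bits.

```cpp
#include <cstdint>

// Hypothetical helper (not HotSpot code): builds an A64 "load register
// (literal)" class instruction word with V = 0.
//   opc   -> bits 31..30 (0b00 ldr w, 0b01 ldr x, 0b10 ldrsw, 0b11 prfm)
//   imm19 -> bits 23..5, PC-relative offset in words
//   rt    -> bits 4..0, target register or prefetch operation
constexpr uint32_t encode_literal(uint32_t opc, int32_t imm19, uint32_t rt) {
  return (opc << 30)
       | (0b011u << 27)
       | ((static_cast<uint32_t>(imm19) & 0x7FFFFu) << 5)
       | (rt & 0x1Fu);
}

// What the masking in the old code does to the two size bits.
constexpr uint32_t masked_size(uint32_t size) { return size & 0b01u; }

// size 0b11 is masked to 0b01, so the word is an ldr x0 literal.
constexpr uint32_t kOldEncoding = encode_literal(masked_size(0b11u), 0, 0);

// prfm pldl1keep with offset 0: opc stays 0b11, prfop 0b00000 sits in Rt.
constexpr uint32_t kPrfmEncoding = encode_literal(0b11u, 0, 0);
```

With a zero offset the two words differ only in bit 31: 0x58000000 for the `ldr` form and 0xD8000000 for the `prfm` form.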
Changeset: bde87895 Author: Wang Zhuo Committer: Denghui Dong URL: https://git.openjdk.org/jdk/commit/bde87895c8b1b9df198e3883d24cd9ea840efc98 Stats: 33 lines in 2 files changed: 22 ins; 11 del; 0 mod 8324123: aarch64: fix prfm literal encoding in assembler Reviewed-by: aph, dlong ------------- PR: https://git.openjdk.org/jdk/pull/17482 From kvn at openjdk.org Fri Jan 26 03:17:39 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 26 Jan 2024 03:17:39 GMT Subject: RFR: 8322996: BoxLockNode creation fails with assert(reg < CHUNK_SIZE) failed: sanity [v5] In-Reply-To: References: Message-ID: On Thu, 25 Jan 2024 12:26:45 GMT, Daniel Lundén wrote: >> src/hotspot/share/opto/graphKit.cpp line 3473: >> >>> 3471: Node* box = _gvn.transform(new BoxLockNode(next_monitor())); >>> 3472: // Check for bailout after new BoxLockNode >>> 3473: if (failing()) { return nullptr; } >> >> Do all callers of `shared_lock()` check for `failing()` or a returned `nullptr`? > > No, not the immediate callers at least. Below is a quick call graph analysis for the places where we create `BoxLockNode`s (up to the first bailout check). Should I add returns at all points in the call chain up to the first checks? > > graphKit.cpp:3471 (this is in shared_lock) > locknode.cpp:196 > parse2.cpp:2759 > parse1.cpp:1594 (Checks for bailout at parse1.cpp:1595) > parse1.cpp:1264 > parse1.cpp:582 (Checks for bailout at parse1.cpp:596) > > parse1.cpp:227 > parse1.cpp:579 (Checks for bailout at parse1.cpp:596) I am not worried about the exit in `load_interpreter_state()`; there is a check after the call. There is a call to `record_profiled_parameters_for_speculation()` after the `shared_lock()` call in `do_method_entry()`. It does not check `failing()`. Maybe add a check `if (!failing())` when calling it (or inside it). For the `do_one_bytecode()` case the only question is whether IGVN code can handle such a bailout. It is called at the end of the method. If not, we need to add the check.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17370#discussion_r1467186357 From dlong at openjdk.org Fri Jan 26 08:34:34 2024 From: dlong at openjdk.org (Dean Long) Date: Fri, 26 Jan 2024 08:34:34 GMT Subject: RFR: 8324630: C1: Canonicalizer::do_LookupSwitch doesn't break the loop when the successor is found In-Reply-To: References: Message-ID: On Wed, 24 Jan 2024 13:37:05 GMT, Denghui Dong wrote: > Hi, > > Please review the small change that breaks the loop in Canonicalizer::do_LookupSwitch if the successor is found. > > The keys of LookupSwitch are sorted, so there is no need to continue the loop once matched. > > Thanks. src/hotspot/share/c1/c1_Canonicalizer.cpp line 848: > 846: if (v == x->key_at(i)) { > 847: sux = x->sux_at(i); > 848: break; Shouldn't we also break when `v < x->key_at(i)`, meaning no key will match? Maybe we should consider binary search? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17553#discussion_r1467351976 From ddong at openjdk.org Fri Jan 26 09:08:42 2024 From: ddong at openjdk.org (Denghui Dong) Date: Fri, 26 Jan 2024 09:08:42 GMT Subject: RFR: 8324630: C1: Canonicalizer::do_LookupSwitch doesn't break the loop when the successor is found [v2] In-Reply-To: References: Message-ID: On Fri, 26 Jan 2024 08:31:40 GMT, Dean Long wrote: >> Denghui Dong has updated the pull request incrementally with one additional commit since the last revision: >> >> update > > src/hotspot/share/c1/c1_Canonicalizer.cpp line 848: > >> 846: if (v == x->key_at(i)) { >> 847: sux = x->sux_at(i); >> 848: break; > > Shouldn't we also break when `v < x->key_at(i)`, meaning no key will match? Maybe we should consider binary search? Updated: break when `v < x->key_at(i)` > Maybe we should consider binary search? From a performance perspective, binary search is worth considering. If you agree, I can change to using binary search here.
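For comparison, the two lookup strategies discussed in this exchange can be sketched on a plain sorted key table. This is a minimal standalone illustration, not the C1 code: `LookupSwitchSketch`, `find_linear` and `find_binary` are hypothetical names, and integers stand in for successor blocks. It assumes, as stated in the review, that the keys are sorted in ascending order.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a lookup switch: sorted keys with one
// successor per key plus a default successor.
struct LookupSwitchSketch {
  std::vector<int> keys;        // strictly ascending
  std::vector<int> successors;  // successors[i] belongs to keys[i]
  int default_successor;
};

// Linear scan that stops on a match, and also as soon as the current
// key exceeds v, because no later key can match in a sorted table.
int find_linear(const LookupSwitchSketch& sw, int v) {
  for (std::size_t i = 0; i < sw.keys.size(); i++) {
    if (v == sw.keys[i]) return sw.successors[i];
    if (v < sw.keys[i]) break;
  }
  return sw.default_successor;
}

// Binary search variant: std::lower_bound returns the first key that is
// not less than v, which is the only position where v could be stored.
int find_binary(const LookupSwitchSketch& sw, int v) {
  auto it = std::lower_bound(sw.keys.begin(), sw.keys.end(), v);
  if (it != sw.keys.end() && *it == v) {
    return sw.successors[static_cast<std::size_t>(it - sw.keys.begin())];
  }
  return sw.default_successor;
}
```

Both functions return the same successor for every input; the binary search only pays off when the key table is large, which is the trade-off the reviewers are weighing.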
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17553#discussion_r1467385164 From epeter at openjdk.org Fri Jan 26 09:35:50 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 26 Jan 2024 09:35:50 GMT Subject: RFR: 8318446: C2: optimize stores into primitive arrays by combining values into larger store [v4] In-Reply-To: References: Message-ID: > This is a feature requested by @RogerRiggs and @cl4es . > > **Idea** > > Merging multiple consecutive small stores (e.g. 8 byte stores) into larger stores (e.g. one long store) can lead to a speedup. Recently, @cl4es and @RogerRiggs had to review a few PRs where people would try to get speedups by using Unsafe (e.g. `Unsafe.putLongUnaligned`), or ByteArrayLittleEndian (e.g. `ByteArrayLittleEndian.setLong`). They have asked if we can do such an optimization in C2, rather than in the Java library code, or even user code. > > This patch here supports a few simple use-cases, like these: > > Merge consecutive array stores, with constants. We can combine the separate constants into a larger constant: > https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/test/hotspot/jtreg/compiler/c2/TestMergeStores.java#L383-L395 > > Merge consecutive array stores, with a variable that was split (using shifts). We can essentially undo the splitting (i.e.
shifting and truncation), and directly store the variable: > https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/test/hotspot/jtreg/compiler/c2/TestMergeStores.java#L444-L456 > > The idea is that this would allow the introduction of a very simple API, without any "heavy" dependencies (Unsafe or ByteArrayLittleEndian): > > https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/test/hotspot/jtreg/compiler/c2/TestMergeStores.java#L327-L338 > https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/test/hotspot/jtreg/compiler/c2/TestMergeStores.java#L467-L472 > > **Details** > > This draft currently implements the optimization in an additional special IGVN phase: > https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/src/hotspot/share/opto/compile.cpp#L2479-L2485 > > We first collect all `StoreB|C|I`, and put them in the IGVN worklist (see `Compile::gather_nodes_for_merge_stores`). During IGVN, we call `StoreNode::Ideal_merge_stores` at the end (after all other optimizations) of `StoreNode::Ideal`. We essentially try to establish a chain of mergable stores: > > https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/src/hotspot/share/opto/memnode.cpp#L2802-L2806 > > Mergable stores must have the same Opcode (implies they have the same element type and hence size). Further, mergable stores must have the same control (or be separated by only a RangeCheck). Further, they must either both store constants, or adjacent segments of a larger value ... 
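The effect of the merge described above can be illustrated outside the JVM. The sketch below is not C2 code and not the test from the PR: `store_bytes_split`, `store_bytes_merged` and `host_is_little_endian` are hypothetical names, and `std::memcpy` stands in for a single wide store. It assumes a little-endian byte order for the split stores, matching the second use-case (a variable split with shifts and truncation).

```cpp
#include <cstdint>
#include <cstring>

// Eight narrow stores: the value is split with shifts and truncation,
// least significant byte first.
void store_bytes_split(uint8_t* a, uint64_t v) {
  a[0] = static_cast<uint8_t>(v >> 0);
  a[1] = static_cast<uint8_t>(v >> 8);
  a[2] = static_cast<uint8_t>(v >> 16);
  a[3] = static_cast<uint8_t>(v >> 24);
  a[4] = static_cast<uint8_t>(v >> 32);
  a[5] = static_cast<uint8_t>(v >> 40);
  a[6] = static_cast<uint8_t>(v >> 48);
  a[7] = static_cast<uint8_t>(v >> 56);
}

// One wide store of the unsplit value; this is what the eight stores
// above amount to on a little-endian machine.
void store_bytes_merged(uint8_t* a, uint64_t v) {
  std::memcpy(a, &v, sizeof(v));
}

// True when the host keeps the least significant byte at the lowest address.
bool host_is_little_endian() {
  const uint16_t probe = 1;
  uint8_t first = 0;
  std::memcpy(&first, &probe, 1);
  return first == 1;
}
```

On a little-endian host both functions leave identical bytes in the array, which is why the sequence of narrow stores can be replaced by one store of the original variable.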
Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: Add diagnostic flag MergeStores ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16245/files - new: https://git.openjdk.org/jdk/pull/16245/files/42a24bdf..83290c57 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16245&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16245&range=02-03 Stats: 4 lines in 2 files changed: 3 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/16245.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16245/head:pull/16245 PR: https://git.openjdk.org/jdk/pull/16245 From shade at openjdk.org Fri Jan 26 09:48:31 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Fri, 26 Jan 2024 09:48:31 GMT Subject: RFR: 8324241: Always record evol_method deps to avoid excessive method flushing [v4] In-Reply-To: References: Message-ID: <8GMtoehKpsPFAXIBfZnT9eLn9PjRuSvGfqvL6os8H9k=.9303d525-b1e5-4d8e-a45e-d55884063540@github.com> On Wed, 24 Jan 2024 14:48:52 GMT, Volker Simonis wrote: >> Currently we don't record dependencies on redefined methods (i.e. `evol_method` dependencies) in JIT compiled methods if none of the `can_redefine_classes`, `can_retransform_classes` or `can_generate_breakpoint_events` JVMTI capabilities is set. This means that if a JVMTI agent which requests one of these capabilities is dynamically attached, all the methods which have been JIT compiled until that point will be marked for deoptimization and flushed from the code cache. For large, warmed-up applications this means deoptimization and instant recompilation of thousands if not tens of thousands of methods, which can lead to dramatic performance/latency drop-downs for several minutes.
>> >> One could argue that dynamic agent attach is now deprecated anyway (see [JEP 451: Prepare to Disallow the Dynamic Loading of Agents](https://openjdk.org/jeps/451)) and this problem could be solved by making the recording of `evol_method` dependencies dependent on the new `-XX:+EnableDynamicAgentLoading` flag instead of the concrete JVMTI capabilities (because the presence of the flag indicates that an agent will be loaded eventually). >> >> But there is a single, however important, exception to this rule and that's JFR. JFR is advertised as a low-overhead profiler which can be enabled in production at any time. However, when JFR is started dynamically (e.g. through JCMD or JMX) it will silently load a HotSpot internal JVMTI agent which requests the `can_retransform_classes` capability and retransforms some classes. This will inevitably trigger the deoptimization of all compiled methods as described above. >> >> I'd therefore like to propose to *always* and unconditionally record `evol_method` dependencies in JIT compiled code by exporting the relevant properties right at startup in `init_globals()`: >> ```c++ >> jint init_globals() { >> management_init(); >> JvmtiExport::initialize_oop_storage(); >> +#if INCLUDE_JVMTI >> + JvmtiExport::set_can_hotswap_or_post_breakpoint(true); >> + JvmtiExport::set_all_dependencies_are_recorded(true); >> +#endif >> >> >> My measurements indicate that the overhead of doing so is minimal (around 1% increase of nmethod size) and justifies the benefit. E.g. a Spring Petclinic application started with `-Xbatch -XX:+UnlockDiagnosticVMOptions -XX:+LogCompilation` compiles about ~11500 methods (~9000 with C1 and ~2500 with C2) resulting in an aggregated nmethod size of around ~40mb. Additionally recording `evol_method` dependencies only increases this size by about 400kb....
> > Volker Simonis has updated the pull request incrementally with one additional commit since the last revision: > > Made the flag experimental and added an assertion to set_can_hotswap_or_post_breakpoint() I am good with these changes, only a few stylistic nits. I am on the fence about where the default for this flag should stand. The 1% loss in nmethod size is probably okay, given that we gained as much with denser code improvements like [JDK-8319406](https://bugs.openjdk.org/browse/JDK-8319406). Maybe there are some tricks in dependency encoding that would drive this down even more. src/hotspot/share/prims/jvmtiExport.hpp line 159: > 157: // recorded from that point on. > 158: assert(!_can_hotswap_or_post_breakpoint || on, "sanity check"); > 159: _can_hotswap_or_post_breakpoint = (on != 0); Pre-existing: wild that this code checks `!= 0` against `bool`, when it could have just used the bool directly, like the new assert does. I see no reason to keep `!= 0` here. src/hotspot/share/runtime/globals.hpp line 2016: > 2014: "Unconditionally record nmethod dependencies on class " \ > 2015: "rewriting/transformation independently of the JVMTI " \ > 2016: " can_{retransform/redefine}_classes capabilities.") \ Probably match the indenting of previous declarations? ------------- Marked as reviewed by shade (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/17509#pullrequestreview-1845365087 PR Review Comment: https://git.openjdk.org/jdk/pull/17509#discussion_r1467418701 PR Review Comment: https://git.openjdk.org/jdk/pull/17509#discussion_r1467415176 From shade at openjdk.org Fri Jan 26 09:55:36 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Fri, 26 Jan 2024 09:55:36 GMT Subject: RFR: 8324655: Identify integer minimum and maximum patterns created with if statements In-Reply-To: References: Message-ID: On Thu, 25 Jan 2024 18:59:23 GMT, Jasmine Karthikeyan wrote: > Ah true, I hadn't considered that- do you think it makes sense to only do the transform if the if statement isn't highly predictable? Yeah, I think if this is effectively translating branches to cmovs, it should be gated by cmov conversion heuristics somehow. Not sure how to do this cleanly, given the choice for cmov-s for min/max is done only later in matching rules. ------------- PR Comment: https://git.openjdk.org/jdk/pull/17574#issuecomment-1911769670 From dlunden at openjdk.org Fri Jan 26 09:58:51 2024 From: dlunden at openjdk.org (Daniel =?UTF-8?B?THVuZMOpbg==?=) Date: Fri, 26 Jan 2024 09:58:51 GMT Subject: RFR: 8322996: BoxLockNode creation fails with assert(reg < CHUNK_SIZE) failed: sanity [v6] In-Reply-To: References: Message-ID: <_xAqDkA3y334spGXTfWzhIgAtpbQVk1OJr7dqI39slo=.942138ce-827b-4072-8407-8d6f21f1be1b@github.com> > This changeset fixes an issue where deeply nested synchronized statements triggered an assert in C2. > > Changes: > - Bail out on compilation when we create a `BoxLockNode` with a slot index that cannot fit in a `RegMask`. This is similar to how we handle the case when we do not have space to represent arguments in [`opto/matcher.cpp`](https://github.com/openjdk/jdk/blob/58b01dce054c50bcb5a28aad4c1b574acaa90f6d/src/hotspot/share/opto/matcher.cpp#L314-L318) > - Generalize `RegMask::can_represent` to take an additional and optional size argument to facilitate reuse. 
The default size value, 1, corresponds to the previous functionality. Rewrite `can_represent_arg` to directly call `can_represent(reg, SlotsPerVecZ)`. > - Add a regression test. > > Testing: > - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/7612890820) > - tier1 to tier5 on windows-x64, linux-x64, linux-aarch64, macosx-x64, and macosx-aarch64. > - The new regression test in all tier1 to tier10 contexts on windows-x64, linux-x64, linux-aarch64, macosx-x64, and macosx-aarch64. Daniel Lundén has updated the pull request incrementally with one additional commit since the last revision: Revise bailout checks ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17370/files - new: https://git.openjdk.org/jdk/pull/17370/files/524438ca..5fedbc3b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17370&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17370&range=04-05 Stats: 6 lines in 2 files changed: 4 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/17370.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17370/head:pull/17370 PR: https://git.openjdk.org/jdk/pull/17370 From dlunden at openjdk.org Fri Jan 26 09:58:52 2024 From: dlunden at openjdk.org (Daniel =?UTF-8?B?THVuZMOpbg==?=) Date: Fri, 26 Jan 2024 09:58:52 GMT Subject: RFR: 8322996: BoxLockNode creation fails with assert(reg < CHUNK_SIZE) failed: sanity [v5] In-Reply-To: References: Message-ID: <5qX3luVttnofEYa0QzvwJIEjXu7ndtmt8Bz2vdat-4c=.b73470b4-ec7b-4e1d-81ba-f6edb46a9cb0@github.com> On Fri, 26 Jan 2024 03:14:47 GMT, Vladimir Kozlov wrote: > There is a call to record_profiled_parameters_for_speculation() after the shared_lock() call in do_method_entry(). It does not check failing(). Maybe add a check if (!failing()) when calling it (or inside it). I've now added a check just after `_synch_lock = shared_lock(lock_obj);`, thanks. > For the do_one_bytecode() case the only question is whether IGVN code can handle such a bailout. **It is called at the end of the method**.
If not, we need to add the check. I cannot see any call to IGVN after `do_monitor_enter()` in `do_one_bytecode()`. Can you elaborate? Maybe related, I also changed to Node *box = new BoxLockNode(next_monitor()); // Check for bailout after new BoxLockNode if (failing()) { return; } box = _gvn.transform(box); from the previous Node *box = _gvn.transform(new BoxLockNode(next_monitor())); // Check for bailout after new BoxLockNode if (failing()) { return; } to avoid potential issues with `_gvn.transform` due to the new bailout. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17370#discussion_r1467438125 From epeter at openjdk.org Fri Jan 26 09:59:59 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 26 Jan 2024 09:59:59 GMT Subject: RFR: 8324750: C2: rename Matcher methods using "superword" -> "autovectorization" Message-ID: Subtask of https://github.com/openjdk/jdk/pull/16620 Matcher::superword_max_vector_size -> Matcher::max_vector_size_autovectorization Matcher::match_rule_supported_superword -> match_rule_supported_autovectorization This is to make the naming more general, since these methods can be used by any autovectorizer in the future. 
------------- Commit messages: - 8324750 Changes: https://git.openjdk.org/jdk/pull/17583/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=17583&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8324750 Stats: 34 lines in 12 files changed: 0 ins; 0 del; 34 mod Patch: https://git.openjdk.org/jdk/pull/17583.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17583/head:pull/17583 PR: https://git.openjdk.org/jdk/pull/17583 From rcastanedalo at openjdk.org Fri Jan 26 10:01:38 2024 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Fri, 26 Jan 2024 10:01:38 GMT Subject: RFR: 8291809: Convert compiler/c2/cr7200264/TestSSE2IntVect.java to IR verification test [v2] In-Reply-To: References: Message-ID: On Thu, 25 Jan 2024 15:35:24 GMT, Daniel Lundén wrote: > Sure, I'll add negative rules for tests that do not currently vectorize (at all). Please, if you do this add a comment to each negative rule clarifying that it does not document the desired behavior of the system but the current behavior, to support future development and improvements. ------------- PR Comment: https://git.openjdk.org/jdk/pull/17428#issuecomment-1911777962 From epeter at openjdk.org Fri Jan 26 10:19:58 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 26 Jan 2024 10:19:58 GMT Subject: RFR: 8324752: C2 Superword: remove SuperWordRTDepCheck Message-ID: <-P8KzeUnHvnu9mv1F2IqxJBZIsx_EJYRvYaR4CsKkCY=.73c6743d-25ea-491d-9be6-70b89a76ae7f@github.com> Subtask of https://github.com/openjdk/jdk/pull/16620 SuperWordRTDepCheck is a debug-only flag, which detects if there are arrays in the same slice that have different bases, i.e. may be different arrays. This could be the basis for alias-analysis. We should do aliasing-analysis properly in a future RFE ([JDK-8324751](https://bugs.openjdk.org/browse/JDK-8324751)).
If we can prove (statically or with a runtime-check) that two arrays are different, then this removes edges from the dependency graph, and may allow vectorization that would otherwise not be possible. ------------- Commit messages: - 8324752 Changes: https://git.openjdk.org/jdk/pull/17585/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=17585&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8324752 Stats: 56 lines in 3 files changed: 0 ins; 55 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/17585.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17585/head:pull/17585 PR: https://git.openjdk.org/jdk/pull/17585 From dlunden at openjdk.org Fri Jan 26 10:40:43 2024 From: dlunden at openjdk.org (Daniel =?UTF-8?B?THVuZMOpbg==?=) Date: Fri, 26 Jan 2024 10:40:43 GMT Subject: RFR: 8291809: Convert compiler/c2/cr7200264/TestSSE2IntVect.java to IR verification test [v6] In-Reply-To: References: Message-ID: > This changeset translates the tests in `compiler/c2/cr7200264/` to use the IR verification framework. > > The proposed translation, to the extent possible, attempts to preserve the semantics of the original test. A major difference is that the IR checks are now local (for every `test_*` method) instead of global. The execution time of the new test is comparable to the old test. > > Testing: > - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/7553846710) > - Ran the new translated tests within all tier1 through tier10 contexts on windows-x64, linux-x64, linux-aarch64, macosx-x64, and macosx-aarch64. > - Tested that manually adding `-XX:LoopUnrollLimit=0` to the test framework flags caused the translated tests to fail. Note: it is, however, no longer possible to break the test by passing `-XX:LoopUnrollLimit=0` on the command line. 
Daniel Lundén has updated the pull request incrementally with one additional commit since the last revision: Update after discussions and comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17428/files - new: https://git.openjdk.org/jdk/pull/17428/files/cb575780..b2dd79ed Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17428&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17428&range=04-05 Stats: 70 lines in 1 file changed: 49 ins; 0 del; 21 mod Patch: https://git.openjdk.org/jdk/pull/17428.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17428/head:pull/17428 PR: https://git.openjdk.org/jdk/pull/17428 From dlunden at openjdk.org Fri Jan 26 10:43:37 2024 From: dlunden at openjdk.org (Daniel =?UTF-8?B?THVuZMOpbg==?=) Date: Fri, 26 Jan 2024 10:43:37 GMT Subject: RFR: 8291809: Convert compiler/c2/cr7200264/TestSSE2IntVect.java to IR verification test [v2] In-Reply-To: References: Message-ID: <8SCoq1z8_QeQPAWR013XqrIK80FGuRBRs5Q6TdmpH40=.7170407a-52f9-4b15-a877-96d6cd183969@github.com> On Fri, 26 Jan 2024 09:58:34 GMT, Roberto Castañeda Lozano wrote: >> @eme64 >> >>> @dlunde Given the findings here: [#17428 (comment)](https://github.com/openjdk/jdk/pull/17428#discussion_r1466460916) I think you should add an IR rule on every test. >> >> I don't quite understand, what IR rule do you want me to add for every test? I'll of course add what we discussed for the problematic `test_divc` and `test_divc_n`. >> >>> And for the ones that do not currently vectorize, please add a negative IR rule, so that we can detect when that changes. For example when we implement a feature, then we can properly fix up the IR rule. >> >> Sure, I'll add negative rules for tests that do not currently vectorize (at all). > >> Sure, I'll add negative rules for tests that do not currently vectorize (at all).
> > Please, if you do this add a comment to each negative rule clarifying that it does not document the desired behavior of the system but the current behavior, to support future development and improvements. @robcasloz @eme64 @chhagedorn: I've now added everything that we've discussed. Please have another look. I'll wait for approval before rerunning tests. ------------- PR Comment: https://git.openjdk.org/jdk/pull/17428#issuecomment-1911836874 From rcastanedalo at openjdk.org Fri Jan 26 11:58:40 2024 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Fri, 26 Jan 2024 11:58:40 GMT Subject: RFR: 8291809: Convert compiler/c2/cr7200264/TestSSE2IntVect.java to IR verification test [v6] In-Reply-To: References: Message-ID: On Fri, 26 Jan 2024 10:40:43 GMT, Daniel Lundén wrote: >> This changeset translates the tests in `compiler/c2/cr7200264/` to use the IR verification framework. >> >> The proposed translation, to the extent possible, attempts to preserve the semantics of the original test. A major difference is that the IR checks are now local (for every `test_*` method) instead of global. The execution time of the new test is comparable to the old test. >> >> Testing: >> - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/7553846710) >> - Ran the new translated tests within all tier1 through tier10 contexts on windows-x64, linux-x64, linux-aarch64, macosx-x64, and macosx-aarch64. >> - Tested that manually adding `-XX:LoopUnrollLimit=0` to the test framework flags caused the translated tests to fail. Note: it is, however, no longer possible to break the test by passing `-XX:LoopUnrollLimit=0` on the command line. > > Daniel Lundén has updated the pull request incrementally with one additional commit since the last revision: > > Update after discussions and comments Looks good! Please also check that the [linux-x86 tests](https://github.com/openjdk/jdk/pull/17428/checks?check_run_id=20897748570) pass before integration.
------------- Marked as reviewed by rcastanedalo (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/17428#pullrequestreview-1845625469 From epeter at openjdk.org Fri Jan 26 12:02:36 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 26 Jan 2024 12:02:36 GMT Subject: RFR: 8291809: Convert compiler/c2/cr7200264/TestSSE2IntVect.java to IR verification test [v6] In-Reply-To: References: Message-ID: On Fri, 26 Jan 2024 10:40:43 GMT, Daniel Lundén wrote: >> This changeset translates the tests in `compiler/c2/cr7200264/` to use the IR verification framework. >> >> The proposed translation, to the extent possible, attempts to preserve the semantics of the original test. A major difference is that the IR checks are now local (for every `test_*` method) instead of global. The execution time of the new test is comparable to the old test. >> >> Testing: >> - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/7553846710) >> - Ran the new translated tests within all tier1 through tier10 contexts on windows-x64, linux-x64, linux-aarch64, macosx-x64, and macosx-aarch64. >> - Tested that manually adding `-XX:LoopUnrollLimit=0` to the test framework flags caused the translated tests to fail. Note: it is, however, no longer possible to break the test by passing `-XX:LoopUnrollLimit=0` on the command line. > > Daniel Lundén has updated the pull request incrementally with one additional commit since the last revision: > > Update after discussions and comments Looks great now, thanks @dlunde ! ------------- Marked as reviewed by epeter (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/17428#pullrequestreview-1845630659 From fyang at openjdk.org Fri Jan 26 12:51:27 2024 From: fyang at openjdk.org (Fei Yang) Date: Fri, 26 Jan 2024 12:51:27 GMT Subject: RFR: 8322174: RISC-V: C2 VectorizedHashCode RVV Version [v2] In-Reply-To: <_XgH4VuPRDJLnIXlkoNtfdDIvvujW-zf1o3UeYlCrn8=.cb21e8dd-6160-4245-a169-4db548776aec@github.com> References: <_XgH4VuPRDJLnIXlkoNtfdDIvvujW-zf1o3UeYlCrn8=.cb21e8dd-6160-4245-a169-4db548776aec@github.com> Message-ID: On Thu, 25 Jan 2024 14:57:48 GMT, Yuri Gaevsky wrote: >> src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 1603: >> >>> 1601: la(pows31, ExternalAddress(adr_pows31)); >>> 1602: mv(t1, num_8b_elems_in_vec); >>> 1603: vsetvli(t0, t1, Assembler::e32, Assembler::m4); >> >> I wonder if the scalar code for handling `WIDE_TAIL` could be eliminated with RVV's design for stripmining approach [1]? Looks like the current code doesn't take advantage of this design as new vl returned by `vsetvli` is not checked and used. >> >> [1] https://github.com/riscv/riscv-v-spec/blob/master/v-spec.adoc#sec-vector-config >> >> One of the common approaches to handling a large number of elements is "stripmining" where each iteration of >> a loop handles some number of elements, and the iterations continue until all elements have been processed. >> The RISC-V vector specification provides direct, portable support for this approach. The application specifies the >> total number of elements to be processed (the application vector length or AVL) as a candidate value for vl, and >> the hardware responds via a general-purpose register with the (frequently smaller) number of elements that the >> hardware will handle per iteration (stored in vl), based on the microarchitectural implementation and the vtype >> setting. 
A straightforward loop structure, shown in [Example of stripmining and changes to SEW]
>> (https://github.com/riscv/riscv-v-spec/blob/master/v-spec.adoc#example-stripmine-sew), depicts the ease with
>> which the code keeps track of the remaining number of elements and the amount per iteration handled by hardware.
>
> Thank you for your comments, @RealFYang. I have tried to use vector instructions (m4 ==> m2) for the tail calculations but that only makes the performance numbers worse. :-(
>
> I've made additional measurements with more granularity:
>
>                                         [ -XX:-UseRVV ] [ -XX:+UseRVV ]
> ArraysHashCode.multiints  10  avgt  30  12.460 ± 0.155  13.836 ± 0.054  ns/op
> ArraysHashCode.multiints  11  avgt  30  14.541 ± 0.140  14.613 ± 0.084  ns/op
> ArraysHashCode.multiints  12  avgt  30  15.097 ± 0.052  15.517 ± 0.097  ns/op
> ArraysHashCode.multiints  13  avgt  30  13.632 ± 0.137  14.486 ± 0.181  ns/op
> ArraysHashCode.multiints  14  avgt  30  15.771 ± 0.108  16.153 ± 0.092  ns/op
> ArraysHashCode.multiints  15  avgt  30  14.726 ± 0.088  15.930 ± 0.077  ns/op
> ArraysHashCode.multiints  16  avgt  30  15.533 ± 0.067  15.496 ± 0.083  ns/op
> ArraysHashCode.multiints  17  avgt  30  15.875 ± 0.173  16.878 ± 0.172  ns/op
> ArraysHashCode.multiints  18  avgt  30  15.740 ± 0.114  16.465 ± 0.089  ns/op
> ArraysHashCode.multiints  19  avgt  30  17.252 ± 0.051  17.628 ± 0.155  ns/op
> ArraysHashCode.multiints  20  avgt  30  20.193 ± 0.282  19.039 ± 0.441  ns/op
> ArraysHashCode.multiints  25  avgt  30  20.209 ± 0.070  20.513 ± 0.071  ns/op
> ArraysHashCode.multiints  30  avgt  30  23.157 ± 0.068  23.290 ± 0.165  ns/op
> ArraysHashCode.multiints  35  avgt  30  28.671 ± 0.116  26.198 ± 0.127  ns/op  <---
> ArraysHashCode.multiints  40  avgt  30  30.992 ± 0.068  27.342 ± 0.072  ns/op
> ArraysHashCode.multiints  45  avgt  30  39.408 ± 1.428  32.170 ± 0.230  ns/op
> ArraysHashCode.multiints  50  avgt  30  41.976 ± 0.442  33.103 ± 0.090  ns/op
> ArraysHashCode.multiints  55  avgt  30  45.379 ± 0.236  35.899 ± 0.692  ns/op
> ArraysHashCode.multiints  60  avgt  30  48.615 ± 0.249  35.709 ± 0.477  ns/op
> ArraysHashCode.multiints  65  avgt  30  51.455 ± 0.213  38.275 ± 0.266  ns/op
> ArraysHashCode.multiints  70  avgt  30  54.032 ± 0.324  37.985 ± 0.264  ns/op
> ArraysHashCode.multiints  75  avgt  30  56.759 ± 0.164  39.446 ± 0.425  ns/op
> ArraysHashCode.multiints  80  avgt  30  61.334 ± 0.267  41.521 ± 0.310  ns/op
> ArraysHashCode.multiints  85  avgt  30  66.177 ± 0.299  44.136 ± 0.407  ns/op
> ArraysHashCode.multiints  90  avgt  30  67.444 ± 0.282  42.909 ± 0.275  ns/op
> ArraysHashCode.multiints  95  avgt  30  77.312 ± 0.303  49.078 ± 1.166  ns/op
> ArraysHashCode.multiints ...

Hi, I don't quite understand why there is a need to change LMUL from `m4` to `m2` if we are switching to the stripmining approach. The tail calculation should normally share the code for `VEC_LOOP`, which also means we need to use some vector mask instructions to filter out the active elements for each loop iteration, especially the iteration handling the tail elements. And the vl returned by `vsetvli` tells us the number of elements which can be processed in parallel in a given iteration ([1] is one example). I am not sure if you are trying it this way. Do you have more details or code changes to share? Thanks.
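For readers following the stripmining argument, here is a minimal scalar model in C++ of how one loop can cover both the full-width iterations and the tail. It is only a sketch under stated assumptions, not the RVV assembly from the PR: `kVlMax` and `model_vsetvli` are hypothetical stand-ins for the hardware limit and for the `vl` value that `vsetvli` returns, and the inner loop stands in for what a vector multiply by a table of powers of 31 plus a reduction would compute.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for VLMAX (elements per vector register group).
// On real hardware this depends on VLEN, SEW and LMUL.
constexpr std::size_t kVlMax = 8;

// Models the value of vl returned by `vsetvli` for a requested element
// count (AVL). Taking min(AVL, VLMAX) is one behaviour the RVV spec
// permits; hardware may pick a different split of the last two iterations.
std::size_t model_vsetvli(std::size_t avl) {
  return std::min(avl, kVlMax);
}

// Reference: h = 31 * h + a[i], one element at a time. Unsigned arithmetic
// wraps modulo 2^32, matching Java int overflow.
std::uint32_t hash_scalar(const std::vector<std::int32_t>& a, std::uint32_t h) {
  for (std::int32_t v : a) {
    h = 31u * h + static_cast<std::uint32_t>(v);
  }
  return h;
}

// Strip-mined variant: each iteration asks how many elements it may process
// (vl) and folds exactly that many into the hash, so the final, shorter
// chunk needs no separate scalar tail loop.
std::uint32_t hash_stripmined(const std::vector<std::int32_t>& a, std::uint32_t h) {
  std::size_t i = 0;
  std::size_t remaining = a.size();
  while (remaining > 0) {
    const std::size_t vl = model_vsetvli(remaining);
    assert(vl > 0 && vl <= remaining);
    // Chunk contribution: sum of a[i + j] * 31^(vl - 1 - j) for j in [0, vl).
    std::uint32_t pow = 1;  // becomes 31^vl after the loop
    std::uint32_t sum = 0;
    for (std::size_t j = vl; j-- > 0;) {
      sum += static_cast<std::uint32_t>(a[i + j]) * pow;
      pow *= 31u;
    }
    h = h * pow + sum;
    i += vl;
    remaining -= vl;
  }
  return h;
}
```

The point of the model is that the tail is just an iteration with a smaller `vl`; whether that is faster than a scalar tail on a given core is an empirical question, as the measurements quoted above suggest.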
[1] https://github.com/riscv/riscv-v-spec/blob/master/v-spec.adoc#example-stripmine-sew ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17413#discussion_r1467614985 From simonis at openjdk.org Fri Jan 26 12:53:41 2024 From: simonis at openjdk.org (Volker Simonis) Date: Fri, 26 Jan 2024 12:53:41 GMT Subject: RFR: 8324241: Always record evol_method deps to avoid excessive method flushing [v4] In-Reply-To: <8GMtoehKpsPFAXIBfZnT9eLn9PjRuSvGfqvL6os8H9k=.9303d525-b1e5-4d8e-a45e-d55884063540@github.com> References: <8GMtoehKpsPFAXIBfZnT9eLn9PjRuSvGfqvL6os8H9k=.9303d525-b1e5-4d8e-a45e-d55884063540@github.com> Message-ID: <5EFsj5nHnxdCfiKaJrkrhmn1XjMCFZ0209AEzklnIPU=.5667fffc-b365-41db-9378-bfd4547287aa@github.com> On Fri, 26 Jan 2024 09:33:17 GMT, Aleksey Shipilev wrote: >> Volker Simonis has updated the pull request incrementally with one additional commit since the last revision: >> >> Made the flag experimental and added an assertion to set_can_hotswap_or_post_breakpoint() > > src/hotspot/share/runtime/globals.hpp line 2016: > >> 2014: "Unconditionally record nmethod dependencies on class " \ >> 2015: "rewriting/transformation independently of the JVMTI " \ >> 2016: " can_{retransform/redefine}_classes capabilities.") \ > > Probably match the indenting of previous delcarations? Done ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17509#discussion_r1467617316 From simonis at openjdk.org Fri Jan 26 13:14:57 2024 From: simonis at openjdk.org (Volker Simonis) Date: Fri, 26 Jan 2024 13:14:57 GMT Subject: RFR: 8324241: Always record evol_method deps to avoid excessive method flushing [v5] In-Reply-To: References: Message-ID: > Currently we don't record dependencies on redefined methods (i.e. `evol_method` dependencies) in JIT compiled methods if none of the `can_redefine_classes`, `can_retransform_classes` or `can_generate_breakpoint_events` JVMTI capabalities is set. 
This means that if a JVMTI agent which requests one of these capabilities is dynamically attached, all the methods which have been JIT compiled until that point will be marked for deoptimization and flushed from the code cache. For large, warmed-up applications this means deoptimization and instant recompilation of thousands if not tens of thousands of methods, which can lead to dramatic performance/latency degradations for several minutes.
>
> One could argue that dynamic agent attach is now deprecated anyway (see [JEP 451: Prepare to Disallow the Dynamic Loading of Agents](https://openjdk.org/jeps/451)) and this problem could be solved by making the recording of `evol_method` dependencies dependent on the new `-XX:+EnableDynamicAgentLoading` flag instead of the concrete JVMTI capabilities (because the presence of the flag indicates that an agent will be loaded eventually).
>
> But there is a single, however important, exception to this rule and that's JFR. JFR is advertised as a low-overhead profiler which can be enabled in production at any time. However, when JFR is started dynamically (e.g. through JCMD or JMX) it will silently load a HotSpot internal JVMTI agent which requests the `can_retransform_classes` capability and retransforms some classes. This will inevitably trigger the deoptimization of all compiled methods as described above.
>
> I'd therefore like to propose to *always* and unconditionally record `evol_method` dependencies in JIT compiled code by exporting the relevant properties right at startup in `init_globals()`:
> ```c++
>  jint init_globals() {
>    management_init();
>    JvmtiExport::initialize_oop_storage();
> +#if INCLUDE_JVMTI
> +  JvmtiExport::set_can_hotswap_or_post_breakpoint(true);
> +  JvmtiExport::set_all_dependencies_are_recorded(true);
> +#endif
> ```
>
> My measurements indicate that the overhead of doing so is minimal (around 1% increase of nmethod size) and justifies the benefit. E.g.
a Spring Petclinic application started with `-Xbatch -XX:+UnlockDiagnosticVMOptions -XX:+LogCompilation` compiles about ~11500 methods (~9000 with C1 and ~2500 with C2) resulting in an aggregated nmethod size of around ~40bm. Additionally recording `evol_method` dependencies only increases this size be about 400kb. The ration remains about the same i... Volker Simonis has updated the pull request incrementally with one additional commit since the last revision: Fixed whitepspace in flag documentation ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17509/files - new: https://git.openjdk.org/jdk/pull/17509/files/29966635..fd1dbd9e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17509&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17509&range=03-04 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/17509.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17509/head:pull/17509 PR: https://git.openjdk.org/jdk/pull/17509 From simonis at openjdk.org Fri Jan 26 13:14:58 2024 From: simonis at openjdk.org (Volker Simonis) Date: Fri, 26 Jan 2024 13:14:58 GMT Subject: RFR: 8324241: Always record evol_method deps to avoid excessive method flushing [v4] In-Reply-To: <8GMtoehKpsPFAXIBfZnT9eLn9PjRuSvGfqvL6os8H9k=.9303d525-b1e5-4d8e-a45e-d55884063540@github.com> References: <8GMtoehKpsPFAXIBfZnT9eLn9PjRuSvGfqvL6os8H9k=.9303d525-b1e5-4d8e-a45e-d55884063540@github.com> Message-ID: On Fri, 26 Jan 2024 09:36:49 GMT, Aleksey Shipilev wrote: >> Volker Simonis has updated the pull request incrementally with one additional commit since the last revision: >> >> Made the flag experimental and added an assertion to set_can_hotswap_or_post_breakpoint() > > src/hotspot/share/prims/jvmtiExport.hpp line 159: > >> 157: // recorded from that point on. 
>> 158: assert(!_can_hotswap_or_post_breakpoint || on, "sanity check"); >> 159: _can_hotswap_or_post_breakpoint = (on != 0); > > Pre-existing: wild that this code checks `!= 0` against `bool`, when it could have just used the bool directly, like the new assert does. I see no reason to keep `!= 0` here. I wondered myself about this strange style. But it's there since the first OpenJDK commit (probably it was written by a C programmer where `bool` isn't a builtin type and he wanted to make sure that all non-zero values get mapped to 1). But as this style is used throughout the entire file, I prefer to keep it as is and not change a single occurrence only. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17509#discussion_r1467632151 From simonis at openjdk.org Fri Jan 26 13:14:58 2024 From: simonis at openjdk.org (Volker Simonis) Date: Fri, 26 Jan 2024 13:14:58 GMT Subject: Integrated: 8324241: Always record evol_method deps to avoid excessive method flushing In-Reply-To: References: Message-ID: On Sat, 20 Jan 2024 19:48:07 GMT, Volker Simonis wrote: > Currently we don't record dependencies on redefined methods (i.e. `evol_method` dependencies) in JIT compiled methods if none of the `can_redefine_classes`, `can_retransform_classes` or `can_generate_breakpoint_events` JVMTI capabalities is set. This means that if a JVMTI agent which requests one of these capabilities is dynamically attached, all the methods which have been JIT compiled until that point, will be marked for deoptimization and flushed from the code cache. For large, warmed-up applications this mean deoptimization and instant recompilation of thousands if not then-thousands of methods, which can lead to dramatic performance/latency drop-downs for several minutes. 
>
> One could argue that dynamic agent attach is now deprecated anyway (see [JEP 451: Prepare to Disallow the Dynamic Loading of Agents](https://openjdk.org/jeps/451)) and this problem could be solved by making the recording of `evol_method` dependencies dependent on the new `-XX:+EnableDynamicAgentLoading` flag instead of the concrete JVMTI capabilities (because the presence of the flag indicates that an agent will be loaded eventually).
>
> But there is a single, however important, exception to this rule and that's JFR. JFR is advertised as a low-overhead profiler which can be enabled in production at any time. However, when JFR is started dynamically (e.g. through JCMD or JMX) it will silently load a HotSpot internal JVMTI agent which requests the `can_retransform_classes` capability and retransforms some classes. This will inevitably trigger the deoptimization of all compiled methods as described above.
>
> I'd therefore like to propose to *always* and unconditionally record `evol_method` dependencies in JIT compiled code by exporting the relevant properties right at startup in `init_globals()`:
> ```c++
>  jint init_globals() {
>    management_init();
>    JvmtiExport::initialize_oop_storage();
> +#if INCLUDE_JVMTI
> +  JvmtiExport::set_can_hotswap_or_post_breakpoint(true);
> +  JvmtiExport::set_all_dependencies_are_recorded(true);
> +#endif
> ```
>
> My measurements indicate that the overhead of doing so is minimal (around 1% increase of nmethod size) and justifies the benefit. E.g. a Spring Petclinic application started with `-Xbatch -XX:+UnlockDiagnosticVMOptions -XX:+LogCompilation` compiles about ~11500 methods (~9000 with C1 and ~2500 with C2) resulting in an aggregated nmethod size of around ~40 MB. Additionally recording `evol_method` dependencies only increases this size by about 400 KB. The ratio remains about the same i...

This pull request has now been integrated.
Changeset: 62b3293d Author: Volker Simonis URL: https://git.openjdk.org/jdk/commit/62b3293df0442b06cd00488774db7b608baca774 Stats: 31 lines in 4 files changed: 20 ins; 0 del; 11 mod 8324241: Always record evol_method deps to avoid excessive method flushing Reviewed-by: eastigeevich, phh, coleenp, dlong, shade ------------- PR: https://git.openjdk.org/jdk/pull/17509 From ddong at openjdk.org Fri Jan 26 13:21:36 2024 From: ddong at openjdk.org (Denghui Dong) Date: Fri, 26 Jan 2024 13:21:36 GMT Subject: RFR: 8322694: C1: Handle Constant and IfOp in NullCheckEliminator [v2] In-Reply-To: References: <1vvyuwLRjlWItKwCyighjCSM5SNbO4CSEE59hQtCU24=.b4783e52-328a-4ce3-8c92-7b736cea7546@github.com> Message-ID: On Wed, 3 Jan 2024 13:37:21 GMT, Denghui Dong wrote: >> This patch added the support for Constant and IfOn in NullCheckEliminator to eliminate more null check. >> >> testing: tier 1-4 no extra test failure > > Denghui Dong has updated the pull request incrementally with one additional commit since the last revision: > > update Could I have a second review? ------------- PR Comment: https://git.openjdk.org/jdk/pull/17191#issuecomment-1912056841 From epeter at openjdk.org Fri Jan 26 13:34:48 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 26 Jan 2024 13:34:48 GMT Subject: RFR: 8317572: C2 SuperWord: refactor/improve TraceSuperWord, replace VectorizeDebugOption with TraceAutoVectorization Message-ID: Subtask of https://github.com/openjdk/jdk/pull/16620 I got approval to remove VectorizeDebugOption: [JDK-8320668](https://bugs.openjdk.org/browse/JDK-8320668) I want a more general flag for AutoVectorization, that can trace different components of AutoVectorization. It should be a CompileCommand, so that it can select which methods it traces for. TraceSuperWord should still look similar, and select a subset of the TraceAutoVectorization components (those for SuperWord), but still apply to all classes/methods. 
With more refactoring later in [JDK-8315361](https://bugs.openjdk.org/browse/JDK-8315361), this flag should become more usable and interpretable. How to use the flag: TODO ------------- Commit messages: - a bit more - reordering some things - 8317572 Changes: https://git.openjdk.org/jdk/pull/17586/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=17586&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8317572 Stats: 476 lines in 10 files changed: 380 ins; 40 del; 56 mod Patch: https://git.openjdk.org/jdk/pull/17586.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17586/head:pull/17586 PR: https://git.openjdk.org/jdk/pull/17586 From epeter at openjdk.org Fri Jan 26 13:34:52 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 26 Jan 2024 13:34:52 GMT Subject: RFR: 8317572: C2 SuperWord: refactor/improve TraceSuperWord, replace VectorizeDebugOption with TraceAutoVectorization In-Reply-To: References: Message-ID: On Fri, 26 Jan 2024 12:49:50 GMT, Emanuel Peter wrote: > Subtask of https://github.com/openjdk/jdk/pull/16620 > > I got approval to remove VectorizeDebugOption: [JDK-8320668](https://bugs.openjdk.org/browse/JDK-8320668) > > I want a more general flag for AutoVectorization, that can trace different components of AutoVectorization. > It should be a CompileCommand, so that it can select which methods it traces for. > > TraceSuperWord should still look similar, and select a subset of the TraceAutoVectorization components (those for SuperWord), but still apply to all classes/methods. > > With more refactoring later in [JDK-8315361](https://bugs.openjdk.org/browse/JDK-8315361), this flag should become more usable and interpretable. > > How to use the flag: > TODO src/hotspot/share/opto/superword.cpp line 83: > 81: } > 82: > 83: #endif Note: initialization now happens via `_vtrace` field, and is constructed implicitly. 
src/hotspot/share/opto/superword.cpp line 539: > 537: } > 538: #endif > 539: This was only printed if `_do_vector_loop` was on, i.e. if `OptionVectorize` enabled (kinda odd anyway). And we already do `print_bb`, which prints all relevant nodes (enabled with `SW_INFO` or `TraceSuperWord`). src/hotspot/share/opto/superword.cpp line 873: > 871: } > 872: } > 873: #endif We already print the mem slice in `mem_slice_preds`. src/hotspot/share/opto/superword.cpp line 2421: > 2419: uint vlen_in_bytes = 0; > 2420: Node* vn = nullptr; > 2421: NOT_PRODUCT(if(is_trace_cmov()) {tty->print_cr("VPointer::output: %d executed first, %d executed last in pack", first->_idx, n->_idx); print_pack(p);}) This was behind the wrong flag `is_trace_cmov`, and I think it was never used anyway. src/hotspot/share/opto/vectorization.cpp line 49: > 47: _nstack(nstack), _analyze_only(analyze_only), _stack_idx(0) > 48: #ifndef PRODUCT > 49: , _tracer(phase->C->directive()->traceautovectorization_tags().at(TraceAutoVectorizationTag::POINTER_ANALYSIS)) I now do the ugly thing. Later, with the bigger refactoring, I will pass `VLoop` into the `VPointer`, and then we can access the flag via `VPointer -> VLoop -> VTrace`. src/hotspot/share/opto/vectorization.hpp line 50: > 48: }; > 49: #endif > 50: The idea is that this is going to be a "component" of VLoop, once I do the bigger refactoring. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17586#discussion_r1467641637 PR Review Comment: https://git.openjdk.org/jdk/pull/17586#discussion_r1467644151 PR Review Comment: https://git.openjdk.org/jdk/pull/17586#discussion_r1467644620 PR Review Comment: https://git.openjdk.org/jdk/pull/17586#discussion_r1467646109 PR Review Comment: https://git.openjdk.org/jdk/pull/17586#discussion_r1467647587 PR Review Comment: https://git.openjdk.org/jdk/pull/17586#discussion_r1467648209 From epeter at openjdk.org Fri Jan 26 13:45:38 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 26 Jan 2024 13:45:38 GMT Subject: RFR: 8315361: C2 SuperWord: refactor out loop analysis into shared auto-vectorization facility VLoopAnalyzer [v12] In-Reply-To: References: Message-ID: <0UhGHPIfA4MjsXR2VkW5b6oUrbJcId6MeMJemEv85gc=.a2d4a1a7-9373-4b75-bd6d-de54813d35d7@github.com> On Thu, 25 Jan 2024 20:07:33 GMT, Vladimir Kozlov wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> remove SuperWord::init, and reserve space in data structures > > For easy reviewing I would suggest to separate TraceAutovectorization to separate RFE. > I also think upper case `V` in name looks better: TraceAutoVectorization. > > Also renaming which affects platform specific code could be in separate RFE. > > These 2 changes are easy to review and can be pushed first. > > I start looking on VLoop related changes and it will take time. @vnkozlov I sent out 3 RFE's to split this up. I may produce more. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/16620#issuecomment-1912089472 From chagedorn at openjdk.org Fri Jan 26 14:28:43 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 26 Jan 2024 14:28:43 GMT Subject: [jdk22] RFR: 8324688: C2: Disable ReduceAllocationMerges by default In-Reply-To: References: Message-ID: On Thu, 25 Jan 2024 10:27:37 GMT, Christian Hagedorn wrote: > Due to several recent bug reports after the integration of [JDK-8287061](https://bugs.openjdk.org/browse/JDK-8287061) in JDK 22 (latest one being [JDK-8322854](https://bugs.openjdk.org/browse/JDK-8322854)), we've decided together with @JohnTortugo that we want to minimize the risk and that it's best to disable `ReduceAllocationMerges` by default for JDK 22 which disables the optimizations of [JDK-8287061](https://bugs.openjdk.org/browse/JDK-8287061). > > Thanks, > Christian Thanks Vladimir for the approval and for the review! Thanks Cesar for your review! ------------- PR Comment: https://git.openjdk.org/jdk22/pull/97#issuecomment-1912148889 From chagedorn at openjdk.org Fri Jan 26 14:28:44 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 26 Jan 2024 14:28:44 GMT Subject: [jdk22] Integrated: 8324688: C2: Disable ReduceAllocationMerges by default In-Reply-To: References: Message-ID: <-bdmpQhpQjWe1wLzJB7lNdBEZJsIQEVDCFhZzjZlopQ=.e2228649-5047-4273-bacd-361964763bb3@github.com> On Thu, 25 Jan 2024 10:27:37 GMT, Christian Hagedorn wrote: > Due to several recent bug reports after the integration of [JDK-8287061](https://bugs.openjdk.org/browse/JDK-8287061) in JDK 22 (latest one being [JDK-8322854](https://bugs.openjdk.org/browse/JDK-8322854)), we've decided together with @JohnTortugo that we want to minimize the risk and that it's best to disable `ReduceAllocationMerges` by default for JDK 22 which disables the optimizations of [JDK-8287061](https://bugs.openjdk.org/browse/JDK-8287061). 
> > Thanks, > Christian This pull request has now been integrated. Changeset: b7f38fc5 Author: Christian Hagedorn URL: https://git.openjdk.org/jdk22/commit/b7f38fc54f7121e518db1e19c1ca7e80744b6798 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod 8324688: C2: Disable ReduceAllocationMerges by default Reviewed-by: thartmann, kvn ------------- PR: https://git.openjdk.org/jdk22/pull/97 From chagedorn at openjdk.org Fri Jan 26 15:19:48 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 26 Jan 2024 15:19:48 GMT Subject: RFR: 8317572: C2 SuperWord: refactor/improve TraceSuperWord, replace VectorizeDebugOption with TraceAutoVectorization In-Reply-To: References: Message-ID: On Fri, 26 Jan 2024 12:49:50 GMT, Emanuel Peter wrote: > Subtask of https://github.com/openjdk/jdk/pull/16620 > > I got approval to remove VectorizeDebugOption: [JDK-8320668](https://bugs.openjdk.org/browse/JDK-8320668) > > I want a more general flag for AutoVectorization, that can trace different components of AutoVectorization. > It should be a CompileCommand, so that it can select which methods it traces for. > > TraceSuperWord should still look similar, and select a subset of the TraceAutoVectorization components (those for SuperWord), but still apply to all classes/methods. > > With more refactoring later in [JDK-8315361](https://bugs.openjdk.org/browse/JDK-8315361), this flag should become more usable and interpretable. Especially, the idea is that different components of the `VLoop / VLoopAnalyzer` can have tracing enabled / disabled. > > **How to use the flag:** > Get "help", i.e. see all available tags: > `./java -Xcomp -XX:CompileCommand=TraceAutoVectorization,*::*,help --version` > > See "rejections" (i.e. failures where we don't vectorize) and successes (using TraceNewVectors): > `./java -Xcomp -XX:CompileCommand=TraceAutoVectorization,*::*,SW_REJECTIONS -XX:+TraceNewVectors --version` > The results are currently underwhealming. 
I will have to track many more failures, and I will do that with the bigger refactoring, when I move around the code and require error code returning everywhere, and then I can use that error code for printing. Generally looks good! I have some comments. src/hotspot/share/compiler/compilerDirectives.cpp line 305: > 303: _directive(d), > 304: _ideal_phase_name_set(PHASE_NUM_TYPES, mtCompiler), > 305: _traceautovectorization_tags(TRACEAUTOVECTORIZATION_TAG_NUM, mtCompiler) I also suggest to use underlines and at other places to separate the words trace, auto, and vectorization: Suggestion: _trace_auto_vectorization_tags(TRACE_AUTO_VECTORIZATION_TAG_NUM, mtCompiler) src/hotspot/share/opto/superword.cpp line 925: > 923: void SuperWord::mem_slice_preds(Node* start, Node* stop, GrowableArray &preds) { > 924: assert(preds.length() == 0, "start empty"); > 925: Node* n = start; There is still a usage of `TraceSuperWord` on L927. Should this also be replaced? src/hotspot/share/opto/superword.cpp line 1256: > 1254: > 1255: #ifndef PRODUCT > 1256: if(is_trace_superword_alignment()) { Suggestion: if (is_trace_superword_alignment()) { src/hotspot/share/opto/superword.cpp line 1280: > 1278: _packset.append(pair); > 1279: #ifndef PRODUCT > 1280: if(is_trace_superword_alignment()) { Suggestion: if (is_trace_superword_alignment()) { src/hotspot/share/opto/superword.cpp line 1307: > 1305: int align = alignment(s1); > 1306: #ifndef PRODUCT > 1307: if(is_trace_superword_alignment()) { Suggestion: if (is_trace_superword_alignment()) { src/hotspot/share/opto/superword.cpp line 1354: > 1352: _packset.append(pair); > 1353: #ifndef PRODUCT > 1354: if(is_trace_superword_alignment()) { Suggestion: if (is_trace_superword_alignment()) { src/hotspot/share/opto/superword.hpp line 282: > 280: return TraceSuperWord || > 281: _vtrace.is_trace(TraceAutoVectorizationTag::SW_PRECONDITION); > 282: } I suggest to add new lines between the methods src/hotspot/share/opto/traceAutoVectorizationTag.hpp line 
31: > 29: > 30: // TODO: adjust tags to what we need > 31: #define COMPILER_TRACEAUTOVECTORIZATION_TAG(flags) \ I suggest to use underlines, same for `TRACEAUTOVECTORIZATION_TAG_NUM/NONE`: Suggestion: #define COMPILER_TRACE_AUTO_VECTORIZATION_TAG(flags) \ src/hotspot/share/opto/traceAutoVectorizationTag.hpp line 41: > 39: flags(SW_REJECTIONS, "Trace SuperWord rejections (non vectorizations)") \ > 40: flags(SW_PACKSET, "Trace SuperWord packset at different stages") \ > 41: flags(SW_INFO, "Trace SuperWord info") \ Maybe mention here that this tag prints some general info + the most important SW tags src/hotspot/share/opto/traceAutoVectorizationTag.hpp line 42: > 40: flags(SW_PACKSET, "Trace SuperWord packset at different stages") \ > 41: flags(SW_INFO, "Trace SuperWord info") \ > 42: flags(SW_VERBOSE, "Trace SuperWord verbose (all)") \ Maybe mention here that verbose prints all SW tags src/hotspot/share/opto/traceAutoVectorizationTag.hpp line 75: > 73: } > 74: > 75: class TraceAutoVectorizationTagNameIter { This class is almost identical to `PhaseNameIter`. Can the code somehow be shared? src/hotspot/share/opto/traceAutoVectorizationTag.hpp line 144: > 142: set_bit = false; > 143: } > 144: TraceAutoVectorizationTag tat = find_tag(tag_name); `tat` is not very intuitive when I've read the code below. I suggest to go with tag as it should be clear what kind of tags we mean in this context. 
Suggestion: TraceAutoVectorizationTag tag = find_tag(tag_name); ------------- PR Review: https://git.openjdk.org/jdk/pull/17586#pullrequestreview-1845862666 PR Review Comment: https://git.openjdk.org/jdk/pull/17586#discussion_r1467721995 PR Review Comment: https://git.openjdk.org/jdk/pull/17586#discussion_r1467775132 PR Review Comment: https://git.openjdk.org/jdk/pull/17586#discussion_r1467726494 PR Review Comment: https://git.openjdk.org/jdk/pull/17586#discussion_r1467726661 PR Review Comment: https://git.openjdk.org/jdk/pull/17586#discussion_r1467726799 PR Review Comment: https://git.openjdk.org/jdk/pull/17586#discussion_r1467726943 PR Review Comment: https://git.openjdk.org/jdk/pull/17586#discussion_r1467736522 PR Review Comment: https://git.openjdk.org/jdk/pull/17586#discussion_r1467720837 PR Review Comment: https://git.openjdk.org/jdk/pull/17586#discussion_r1467783144 PR Review Comment: https://git.openjdk.org/jdk/pull/17586#discussion_r1467752599 PR Review Comment: https://git.openjdk.org/jdk/pull/17586#discussion_r1467733616 PR Review Comment: https://git.openjdk.org/jdk/pull/17586#discussion_r1467758954 From rkennke at openjdk.org Fri Jan 26 15:23:46 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Fri, 26 Jan 2024 15:23:46 GMT Subject: RFR: 8324734: Remove too-strict assert(VM_Version::supports_evex()) in Assembler::locate_operand() Message-ID: <776ui1DnJfcgz89q2aPAeOyhOYKalPhAE1bMR28wK6I=.19728df4-3710-4503-a726-8ce217501f67@github.com> Details see bug report. The gist is that HotSpot downgrades to UseAVX=2 on some processors, and reports supports_evex() == false, but the instruction decoder can still encounter EVEX instructions when (e.g.) hitting a SIGBUS in memset() - which does have EVEX instructions. 
Testing: - [x] runtime/Unsafe/InternalErrorTest.java - [x] tier1 ------------- Commit messages: - 8324734: Remove too-strict assert(VM_Version::supports_evex()) in Assembler::locate_operand() Changes: https://git.openjdk.org/jdk/pull/17590/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=17590&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8324734 Stats: 1 line in 1 file changed: 0 ins; 1 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/17590.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17590/head:pull/17590 PR: https://git.openjdk.org/jdk/pull/17590 From epeter at openjdk.org Fri Jan 26 15:34:36 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 26 Jan 2024 15:34:36 GMT Subject: RFR: 8317572: C2 SuperWord: refactor/improve TraceSuperWord, replace VectorizeDebugOption with TraceAutoVectorization In-Reply-To: References: Message-ID: On Fri, 26 Jan 2024 15:09:43 GMT, Christian Hagedorn wrote: >> Subtask of https://github.com/openjdk/jdk/pull/16620 >> >> I got approval to remove VectorizeDebugOption: [JDK-8320668](https://bugs.openjdk.org/browse/JDK-8320668) >> >> I want a more general flag for AutoVectorization, that can trace different components of AutoVectorization. >> It should be a CompileCommand, so that it can select which methods it traces for. >> >> TraceSuperWord should still look similar, and select a subset of the TraceAutoVectorization components (those for SuperWord), but still apply to all classes/methods. >> >> With more refactoring later in [JDK-8315361](https://bugs.openjdk.org/browse/JDK-8315361), this flag should become more usable and interpretable. Especially, the idea is that different components of the `VLoop / VLoopAnalyzer` can have tracing enabled / disabled. >> >> **How to use the flag:** >> Get "help", i.e. see all available tags: >> `./java -Xcomp -XX:CompileCommand=TraceAutoVectorization,*::*,help --version` >> >> See "rejections" (i.e. 
failures where we don't vectorize) and successes (using TraceNewVectors): >> `./java -Xcomp -XX:CompileCommand=TraceAutoVectorization,*::*,SW_REJECTIONS -XX:+TraceNewVectors --version` >> The results are currently underwhelming. I will have to track many more failures, and I will do that with the bigger refactoring, when I move around the code and require error code returning everywhere, and then I can use that error code for printing. > > src/hotspot/share/opto/superword.cpp line 925: > >> 923: void SuperWord::mem_slice_preds(Node* start, Node* stop, GrowableArray &preds) { >> 924: assert(preds.length() == 0, "start empty"); >> 925: Node* n = start; > > There is still a usage of `TraceSuperWord` on L927. Should this also be replaced? It will be removed with https://github.com/openjdk/jdk/pull/17585 anyway ;) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17586#discussion_r1467800698 From epeter at openjdk.org Fri Jan 26 15:42:50 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 26 Jan 2024 15:42:50 GMT Subject: RFR: 8317572: C2 SuperWord: refactor/improve TraceSuperWord, replace VectorizeDebugOption with TraceAutoVectorization [v2] In-Reply-To: References: Message-ID: > Subtask of https://github.com/openjdk/jdk/pull/16620 > > I got approval to remove VectorizeDebugOption: [JDK-8320668](https://bugs.openjdk.org/browse/JDK-8320668) > > I want a more general flag for AutoVectorization, that can trace different components of AutoVectorization. > It should be a CompileCommand, so that it can select which methods it traces for. > > TraceSuperWord should still look similar, and select a subset of the TraceAutoVectorization components (those for SuperWord), but still apply to all classes/methods. > > With more refactoring later in [JDK-8315361](https://bugs.openjdk.org/browse/JDK-8315361), this flag should become more usable and interpretable.
Especially, the idea is that different components of the `VLoop / VLoopAnalyzer` can have tracing enabled / disabled. > > **How to use the flag:** > Get "help", i.e. see all available tags: > `./java -Xcomp -XX:CompileCommand=TraceAutoVectorization,*::*,help --version` > > See "rejections" (i.e. failures where we don't vectorize) and successes (using TraceNewVectors): > `./java -Xcomp -XX:CompileCommand=TraceAutoVectorization,*::*,SW_REJECTIONS -XX:+TraceNewVectors --version` > The results are currently underwhelming. I will have to track many more failures, and I will do that with the bigger refactoring, when I move around the code and require error code returning everywhere, and then I can use that error code for printing. Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: Apply suggestions from code review Co-authored-by: Christian Hagedorn ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17586/files - new: https://git.openjdk.org/jdk/pull/17586/files/0ef53ac4..7772f60a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17586&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17586&range=00-01 Stats: 7 lines in 3 files changed: 0 ins; 0 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/17586.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17586/head:pull/17586 PR: https://git.openjdk.org/jdk/pull/17586 From rgiulietti at openjdk.org Fri Jan 26 15:48:46 2024 From: rgiulietti at openjdk.org (Raffaello Giulietti) Date: Fri, 26 Jan 2024 15:48:46 GMT Subject: RFR: 8282365: Consolidate and improve division by constant idealizations [v44] In-Reply-To: References: Message-ID: On Wed, 24 Jan 2024 18:31:07 GMT, Raffaello Giulietti wrote: >> I've the impression that we can replace `m < c * d <= m + m / v` with the stricter `m < c * d < m + m / v` by using `N_neg - 1` instead of `N_neg`, but I need some time to have a solid proof.
>> >> That would simplify the code of the algorithm. > > But IMO the current algorithm is correct. Nope, my impression [here](https://github.com/openjdk/jdk/pull/9947#discussion_r1465362000) turns out to be wrong. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/9947#discussion_r1467816736 From epeter at openjdk.org Fri Jan 26 15:52:51 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 26 Jan 2024 15:52:51 GMT Subject: RFR: 8317572: C2 SuperWord: refactor/improve TraceSuperWord, replace VectorizeDebugOption with TraceAutoVectorization [v3] In-Reply-To: References: Message-ID: <9lEoqJjOm2eIwSRxJVGWHmr6lj7ETwte7YjZSaSI2NY=.5b8b0854-045f-4d27-9cc1-c4dc38985917@github.com> > Subtask of https://github.com/openjdk/jdk/pull/16620 > > I got approval to remove VectorizeDebugOption: [JDK-8320668](https://bugs.openjdk.org/browse/JDK-8320668) > > I want a more general flag for AutoVectorization, that can trace different components of AutoVectorization. > It should be a CompileCommand, so that it can select which methods it traces for. > > TraceSuperWord should still look similar, and select a subset of the TraceAutoVectorization components (those for SuperWord), but still apply to all classes/methods. > > With more refactoring later in [JDK-8315361](https://bugs.openjdk.org/browse/JDK-8315361), this flag should become more usable and interpretable. Especially, the idea is that different components of the `VLoop / VLoopAnalyzer` can have tracing enabled / disabled. > > **How to use the flag:** > Get "help", i.e. see all available tags: > `./java -Xcomp -XX:CompileCommand=TraceAutoVectorization,*::*,help --version` > > See "rejections" (i.e. failures where we don't vectorize) and successes (using TraceNewVectors): > `./java -Xcomp -XX:CompileCommand=TraceAutoVectorization,*::*,SW_REJECTIONS -XX:+TraceNewVectors --version` > The results are currently underwhelming.
I will have to track many more failures, and I will do that with the bigger refactoring, when I move around the code and require error code returning everywhere, and then I can use that error code for printing. Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: more for Christian ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17586/files - new: https://git.openjdk.org/jdk/pull/17586/files/7772f60a..57063915 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17586&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17586&range=01-02 Stats: 42 lines in 7 files changed: 11 ins; 1 del; 30 mod Patch: https://git.openjdk.org/jdk/pull/17586.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17586/head:pull/17586 PR: https://git.openjdk.org/jdk/pull/17586 From jkarthikeyan at openjdk.org Fri Jan 26 15:54:25 2024 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Fri, 26 Jan 2024 15:54:25 GMT Subject: RFR: 8324655: Identify integer minimum and maximum patterns created with if statements In-Reply-To: References: Message-ID: On Thu, 25 Jan 2024 18:15:21 GMT, Jasmine Karthikeyan wrote: > Hi all, I've created this patch which aims to convert common integer minimum and maximum patterns created using if statements into Min and Max nodes. These patterns are usually in the form of `a > b ? a : b` and similar, as well as patterns such as `if (a > b) b = a;`. While this transform doesn't generally improve code generation on its own, it simplifies control flow and creates new opportunities for vectorization. > > I've created a benchmark for the PR, and I've attached some data from my (Zen 3) machine:
>
>                                           Baseline                    Patch             Improvement
> Benchmark                   Mode  Cnt    Score     Error  Units     Score     Error  Units
> IfMinMax.testReductionInt   avgt   15  500.307 ±  16.687  ns/op   509.383 ±  32.645  ns/op  (no change)*
> IfMinMax.testReductionLong  avgt   15  493.184 ±  17.596  ns/op   513.587 ±  28.339  ns/op  (no change)*
> IfMinMax.testSingleInt      avgt   15    3.588 ±   0.540  ns/op     2.965 ±   1.380  ns/op  (no change)
> IfMinMax.testSingleLong     avgt   15    3.673 ±   0.128  ns/op     3.506 ±   0.590  ns/op  (no change)
> IfMinMax.testVectorInt      avgt   15  340.425 ±  13.123  ns/op    59.689 ±   7.509  ns/op  + 5.7x
> IfMinMax.testVectorLong     avgt   15  326.420 ±  15.554  ns/op   117.190 ±   5.622  ns/op  + 2.8x
>
> * After writing this benchmark I discovered that the compiler doesn't seem to create some simple min/max reductions, even when using Math.min/max() directly. Is this known or should I create a followup RFE for this? > > The patch passes tier 1-3 testing on linux x64. Reviews or comments would be appreciated! I'm not super sure how to do that cleanly either, and it seems some backends (like ppc) don't use cmovs to implement min/max in all cases as well. I think in general it's a good idea to gate the transform based on the if-block probability though, so I'll go ahead and make that change. ------------- PR Comment: https://git.openjdk.org/jdk/pull/17574#issuecomment-1912286243 From jkarthikeyan at openjdk.org Fri Jan 26 16:05:45 2024 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Fri, 26 Jan 2024 16:05:45 GMT Subject: RFR: 8324655: Identify integer minimum and maximum patterns created with if statements [v2] In-Reply-To: References: Message-ID: > Hi all, I've created this patch which aims to convert common integer minimum and maximum patterns created using if statements into Min and Max nodes. These patterns are usually in the form of `a > b ? a : b` and similar, as well as patterns such as `if (a > b) b = a;`. While this transform doesn't generally improve code generation on its own, it simplifies control flow and creates new opportunities for vectorization. > > I've created a benchmark for the PR, and I've attached some data from my (Zen 3) machine:
>
>                                           Baseline                    Patch             Improvement
> Benchmark                   Mode  Cnt    Score     Error  Units     Score     Error  Units
> IfMinMax.testReductionInt   avgt   15  500.307 ±  16.687  ns/op   509.383 ±  32.645  ns/op  (no change)*
> IfMinMax.testReductionLong  avgt   15  493.184 ±  17.596  ns/op   513.587 ±  28.339  ns/op  (no change)*
> IfMinMax.testSingleInt      avgt   15    3.588 ±   0.540  ns/op     2.965 ±   1.380  ns/op  (no change)
> IfMinMax.testSingleLong     avgt   15    3.673 ±   0.128  ns/op     3.506 ±   0.590  ns/op  (no change)
> IfMinMax.testVectorInt      avgt   15  340.425 ±  13.123  ns/op    59.689 ±   7.509  ns/op  + 5.7x
> IfMinMax.testVectorLong     avgt   15  326.420 ±  15.554  ns/op   117.190 ±   5.622  ns/op  + 2.8x
>
> * After writing this benchmark I discovered that the compiler doesn't seem to create some simple min/max reductions, even when using Math.min/max() directly. Is this known or should I create a followup RFE for this? > > The patch passes tier 1-3 testing on linux x64. Reviews or comments would be appreciated! Jasmine Karthikeyan has updated the pull request incrementally with one additional commit since the last revision: Don't transform highly predictable branches ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17574/files - new: https://git.openjdk.org/jdk/pull/17574/files/8c85ab6a..bafabad2 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17574&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17574&range=00-01 Stats: 10 lines in 1 file changed: 8 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/17574.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17574/head:pull/17574 PR: https://git.openjdk.org/jdk/pull/17574 From epeter at openjdk.org Fri Jan 26 16:17:09 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 26 Jan 2024 16:17:09 GMT Subject: RFR: 8317572: C2 SuperWord: refactor/improve TraceSuperWord, replace VectorizeDebugOption with TraceAutoVectorization [v4] In-Reply-To: References: Message-ID: > Subtask of https://github.com/openjdk/jdk/pull/16620 > > I got approval to remove VectorizeDebugOption: [JDK-8320668](https://bugs.openjdk.org/browse/JDK-8320668) > > I want a more general flag for AutoVectorization, that
can trace different components of AutoVectorization. > It should be a CompileCommand, so that it can select which methods it traces for. > > TraceSuperWord should still look similar, and select a subset of the TraceAutoVectorization components (those for SuperWord), but still apply to all classes/methods. > > With more refactoring later in [JDK-8315361](https://bugs.openjdk.org/browse/JDK-8315361), this flag should become more usable and interpretable. Especially, the idea is that different components of the `VLoop / VLoopAnalyzer` can have tracing enabled / disabled. > > **How to use the flag:** > Get "help", i.e. see all available tags: > `./java -Xcomp -XX:CompileCommand=TraceAutoVectorization,*::*,help --version` > > See "rejections" (i.e. failures where we don't vectorize) and successes (using TraceNewVectors): > `./java -Xcomp -XX:CompileCommand=TraceAutoVectorization,*::*,SW_REJECTIONS -XX:+TraceNewVectors --version` > The results are currently underwhelming. I will have to track many more failures, and I will do that with the bigger refactoring, when I move around the code and require error code returning everywhere, and then I can use that error code for printing.
Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: move code to StringUtils::CommaSeparatedStringIterator ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17586/files - new: https://git.openjdk.org/jdk/pull/17586/files/57063915..fc59b6bb Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17586&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17586&range=02-03 Stats: 131 lines in 3 files changed: 45 ins; 82 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/17586.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17586/head:pull/17586 PR: https://git.openjdk.org/jdk/pull/17586 From epeter at openjdk.org Fri Jan 26 16:17:09 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 26 Jan 2024 16:17:09 GMT Subject: RFR: 8317572: C2 SuperWord: refactor/improve TraceSuperWord, replace VectorizeDebugOption with TraceAutoVectorization [v4] In-Reply-To: References: Message-ID: On Fri, 26 Jan 2024 14:40:15 GMT, Christian Hagedorn wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> move code to StringUtils::CommaSeparatedStringIterator > > src/hotspot/share/opto/traceAutoVectorizationTag.hpp line 75: > >> 73: } >> 74: >> 75: class TraceAutoVectorizationTagNameIter { > > This class is almost identical to `PhaseNameIter`. Can the code somehow be shared? 
I moved both to stringUtils.hpp ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17586#discussion_r1467849055 From epeter at openjdk.org Fri Jan 26 16:19:56 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 26 Jan 2024 16:19:56 GMT Subject: RFR: 8317572: C2 SuperWord: refactor/improve TraceSuperWord, replace VectorizeDebugOption with TraceAutoVectorization [v5] In-Reply-To: References: Message-ID: > Subtask of https://github.com/openjdk/jdk/pull/16620 > > I got approval to remove VectorizeDebugOption: [JDK-8320668](https://bugs.openjdk.org/browse/JDK-8320668) > > I want a more general flag for AutoVectorization, that can trace different components of AutoVectorization. > It should be a CompileCommand, so that it can select which methods it traces for. > > TraceSuperWord should still look similar, and select a subset of the TraceAutoVectorization components (those for SuperWord), but still apply to all classes/methods. > > With more refactoring later in [JDK-8315361](https://bugs.openjdk.org/browse/JDK-8315361), this flag should become more usable and interpretable. Especially, the idea is that different components of the `VLoop / VLoopAnalyzer` can have tracing enabled / disabled. > > **How to use the flag:** > Get "help", i.e. see all available tags: > `./java -Xcomp -XX:CompileCommand=TraceAutoVectorization,*::*,help --version` > > See "rejections" (i.e. failures where we don't vectorize) and successes (using TraceNewVectors): > `./java -Xcomp -XX:CompileCommand=TraceAutoVectorization,*::*,SW_REJECTIONS -XX:+TraceNewVectors --version` > The results are currently underwhelming. I will have to track many more failures, and I will do that with the bigger refactoring, when I move around the code and require error code returning everywhere, and then I can use that error code for printing.
Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: fix a test ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17586/files - new: https://git.openjdk.org/jdk/pull/17586/files/fc59b6bb..c9ac4344 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17586&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17586&range=03-04 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/17586.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17586/head:pull/17586 PR: https://git.openjdk.org/jdk/pull/17586 From epeter at openjdk.org Fri Jan 26 16:23:55 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 26 Jan 2024 16:23:55 GMT Subject: RFR: 8324765: C2 SuperWord: remove dead code: SuperWord::insert_extracts Message-ID: Subtask of https://github.com/openjdk/jdk/pull/16620 As far as all tests and code study have shown me, SuperWord::insert_extracts is dead. I am replacing it with verification code that checks that no ExtractNode is required.
**Details** All the relevant cases are marked as "unprofitable" in `SuperWord::profitable`, see: https://github.com/openjdk/jdk/blob/dfdd2174d7af5e3e995147484db17b45b006f6d0/src/hotspot/share/opto/superword.cpp#L1912-L1915 ------------- Commit messages: - fix reduction case - 8324765 Changes: https://git.openjdk.org/jdk/pull/17589/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=17589&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8324765 Stats: 54 lines in 2 files changed: 3 ins; 28 del; 23 mod Patch: https://git.openjdk.org/jdk/pull/17589.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17589/head:pull/17589 PR: https://git.openjdk.org/jdk/pull/17589 From lucy at openjdk.org Fri Jan 26 16:36:37 2024 From: lucy at openjdk.org (Lutz Schmidt) Date: Fri, 26 Jan 2024 16:36:37 GMT Subject: RFR: 8322649: Improve class initialization barrier in TemplateTable::_new for S390 [v2] In-Reply-To: References: Message-ID: On Thu, 25 Jan 2024 05:39:45 GMT, Amit Kumar wrote: >> s390 Port implementation for https://github.com/openjdk/jdk/pull/17006, >> >> Testing: >> Build: fastdebug + release >> Test: Tier1 {fastdebug} > > Amit Kumar has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains two commits: > > - Merge master > - s390 port Looks good. ------------- Marked as reviewed by lucy (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/17481#pullrequestreview-1846098485 From epeter at openjdk.org Fri Jan 26 17:11:40 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 26 Jan 2024 17:11:40 GMT Subject: RFR: 8324775: C2 SuperWord: refactor visited sets Message-ID: Subtask of https://github.com/openjdk/jdk/pull/16620 Currently, the visited set is a "global" set that is reused all through SuperWord: `_visited`, `_post_visited`, and also `_stk`. This makes other refactorings difficult. I am refactoring all related code to make the visited sets local, with use of ResourceMark. 
I am also refactoring the `independent` queries: from using recursive function calls to iterative. At the same time, I wanted the code of `find_dependence` to become easier, and closer to `independent`, hence I re-defined it to `mutually_independent`. ------------- Commit messages: - more removal - 8324775 Changes: https://git.openjdk.org/jdk/pull/17594/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=17594&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8324775 Stats: 128 lines in 2 files changed: 27 ins; 54 del; 47 mod Patch: https://git.openjdk.org/jdk/pull/17594.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17594/head:pull/17594 PR: https://git.openjdk.org/jdk/pull/17594 From rgiulietti at openjdk.org Fri Jan 26 17:25:55 2024 From: rgiulietti at openjdk.org (Raffaello Giulietti) Date: Fri, 26 Jan 2024 17:25:55 GMT Subject: RFR: 8282365: Consolidate and improve division by constant idealizations [v46] In-Reply-To: References: Message-ID: On Sat, 20 Jan 2024 12:17:04 GMT, Quan Anh Mai wrote: >> This patch implements idealisation for unsigned divisions to change a division by a constant into a series of multiplication and shift. I also change the idealisation of `DivI` to get a more efficient series when the magic constant overflows an int32. >> >> In general, the idea behind a signed division transformation is that for a positive constant `d`, we would need to find constants `c` and `m` so that: >> >> floor(x / d) = floor(x * c / 2**m) for 0 < x < 2**(N - 1) (1) >> ceil(x / d) = floor(x * c / 2**m) + 1 for -2**(N - 1) <= x < 0 (2) >> >> The implementation in the original book takes into consideration that the machine may not be able to perform the full multiplication `x * c`, so the constant overflow and we need to add back the dividend as in `DivLNode::Ideal` cases. However, for int32 division, `x * c` cannot overflow an int64. As a result, it is always feasible to just calculate the product and shift the result. 
>> >> For unsigned multiplication, the situation is somewhat trickier because the condition needs to be twice as strong (the condition (1) and (2) above are mostly the same). This results in the magic constant `c` calculated based on the method presented in Hacker's Delight by Henry S. Warren, Jr. may overflow an uintN. For int division, we can depend on the theorem devised by Arch D. Robison in N-Bit Unsigned Division Via N-Bit Multiply-Add, which states that there exists either: >> >> c1 in uint32 and m1, such that floor(x / d) = floor(x * c1 / 2**m1) for 0 < x < 2**32 (3) >> c2 in uint32 and m2, such that floor(x / d) = floor((x + 1) * c2 / 2**m2) for 0 < x < 2**32 (4) >> >> which means that either `x * c1` never overflows an uint64 or `(x + 1) * c2` never overflows an uint64. And we can perform a full multiplication. >> >> For longs, there is no way to do a full multiplication so we do some basic transformations to achieve a computable formula. The details I have written as comments in the overflow case. >> >> More tests are added to cover the possible patterns. >> >> Please take a look and have some reviews. Thank you very much. > > Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: > > just be simple src/hotspot/share/opto/divconstants.cpp line 55: > 53: // > 54: // ceil(x / d) = floor(x * c / m) + 1 for every integer x in [-N, 0) > 55: // For the record, the domain for non-negative dividends can be extended to `[0, v + d)`, which is usually larger than `[0, N]`, since `v <= N < v + d`. Similarly, the domain for negative dividends can be extended to `(-(v + d), 0)`. 
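As an aside for readers following the quoted description: conditions (1) and (2) for the signed int32 case can be checked with a small textbook-style sketch. This is not the code from the patch or from `divconstants.cpp`; the class name `MagicDivision` and its methods are hypothetical, and the choice `m = 31 + ceil(log2(d))`, `c = floor(2**m / d) + 1` is one standard way to pick the constants, assumed here for illustration. With this choice `c < 2**32`, so `x * c` fits in a signed 64-bit value, which is the point the description makes about int32 division.

```java
// Illustrative sketch only: signed 32-bit division by a positive constant d,
// computed as a 64-bit multiply and an arithmetic shift. Names are hypothetical.
final class MagicDivision {
    // Shift amount m = 31 + ceil(log2(d)) for d >= 1.
    static int shiftFor(int d) {
        if (d <= 0) {
            throw new IllegalArgumentException("d must be positive");
        }
        return 31 + (32 - Integer.numberOfLeadingZeros(d - 1));
    }

    // Multiplier c = floor(2^m / d) + 1; always below 2^32 for int d.
    static long multiplierFor(int d) {
        return (1L << shiftFor(d)) / d + 1;
    }

    // Truncating division x / d without a divide instruction.
    static int divide(int x, int d) {
        long c = multiplierFor(d);
        int m = shiftFor(d);
        // The arithmetic shift is floor(x * c / 2^m), also for negative x.
        long q = ((long) x * c) >> m;
        // Condition (2): for negative x, ceil(x / d) = floor(x * c / 2^m) + 1.
        return (int) (q + (x < 0 ? 1 : 0));
    }

    public static void main(String[] args) {
        int[] divisors = {1, 3, 7, 10, 641, Integer.MAX_VALUE};
        int[] dividends = {0, 1, -1, 100, -100, Integer.MAX_VALUE, Integer.MIN_VALUE};
        for (int d : divisors) {
            for (int x : dividends) {
                if (divide(x, d) != x / d) {
                    throw new AssertionError(x + " / " + d);
                }
            }
        }
        System.out.println("all sampled quotients match");
    }
}
```

In a compiler the multiplier and shift would be computed once per constant divisor rather than on every call; the sketch recomputes them only to stay short.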
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/9947#discussion_r1467917399 From kvn at openjdk.org Fri Jan 26 17:31:41 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 26 Jan 2024 17:31:41 GMT Subject: RFR: 8318446: C2: optimize stores into primitive arrays by combining values into larger store [v4] In-Reply-To: References: Message-ID: <2s8lhnt7FIJ1EuMpSIZ_zu9H6kMnzUQf2gCgdPUjUjQ=.b2eae978-fbc3-4cfc-a54f-483a3b517a09@github.com> On Fri, 26 Jan 2024 09:35:50 GMT, Emanuel Peter wrote: >> This is a feature requested by @RogerRiggs and @cl4es . >> >> **Idea** >> >> Merging multiple consecutive small stores (e.g. 8 byte stores) into larger stores (e.g. one long store) can lead to a speedup. Recently, @cl4es and @RogerRiggs had to review a few PRs where people would try to get speedups by using Unsafe (e.g. `Unsafe.putLongUnaligned`), or ByteArrayLittleEndian (e.g. `ByteArrayLittleEndian.setLong`). They have asked if we can do such an optimization in C2, rather than in the Java library code, or even user code. >> >> This patch here supports a few simple use-cases, like these: >> >> Merge consecutive array stores, with constants. We can combine the separate constants into a larger constant: >> https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/test/hotspot/jtreg/compiler/c2/TestMergeStores.java#L383-L395 >> >> Merge consecutive array stores, with a variable that was split (using shifts). We can essentially undo the splitting (i.e.
shifting and truncation), and directly store the variable: >> https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/test/hotspot/jtreg/compiler/c2/TestMergeStores.java#L444-L456 >> >> The idea is that this would allow the introduction of a very simple API, without any "heavy" dependencies (Unsafe or ByteArrayLittleEndian): >> >> https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/test/hotspot/jtreg/compiler/c2/TestMergeStores.java#L327-L338 >> https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/test/hotspot/jtreg/compiler/c2/TestMergeStores.java#L467-L472 >> >> **Details** >> >> This draft currently implements the optimization in an additional special IGVN phase: >> https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/src/hotspot/share/opto/compile.cpp#L2479-L2485 >> >> We first collect all `StoreB|C|I`, and put them in the IGVN worklist (see `Compile::gather_nodes_for_merge_stores`). During IGVN, we call `StoreNode::Ideal_merge_stores` at the end (after all other optimizations) of `StoreNode::Ideal`. We essentially try to establish a chain of mergeable stores: >> >> https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/src/hotspot/share/opto/memnode.cpp#L2802-L2806 >> >> Mergeable stores must have the same Opcode (implies they have the same element type and hence size). Further, mergeable stores must have the same control (or be separated by only a RangeCheck). Further, they must either bot... > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > Add diagnostic flag MergeStores If you think about it, this is vectorization. The question is: can we use our auto-vectorization code for it? Or at least part of it. I understand that we may need new code to combine stored values into one vector.
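For readers without the linked test file at hand, the two store patterns named in the quoted description (adjacent constant stores, and a value split with shifts) can be sketched roughly as below. This is an illustration written for this thread, not the contents of `TestMergeStores.java`; the class and method names are hypothetical, and whether C2 actually merges either pattern into one 8-byte store depends on the patch under review, the platform, and the flags in use.

```java
// Illustrative sketch only; names are hypothetical and not taken from the patch.
final class MergeStoresSketch {
    // Pattern 1: eight adjacent byte stores of constants. A compiler could
    // combine the constants into a single 64-bit constant store.
    static void storeConstants(byte[] a, int offset) {
        a[offset + 0] = (byte) 0xbe;
        a[offset + 1] = (byte) 0xba;
        a[offset + 2] = (byte) 0xad;
        a[offset + 3] = (byte) 0xba;
        a[offset + 4] = (byte) 0xef;
        a[offset + 5] = (byte) 0xbe;
        a[offset + 6] = (byte) 0xad;
        a[offset + 7] = (byte) 0xde;
    }

    // Pattern 2: a long split into bytes with shifts and truncating casts,
    // stored in little-endian order. Undoing the split yields one long store.
    static void storeLongLE(byte[] a, int offset, long v) {
        a[offset + 0] = (byte) (v >> 0);
        a[offset + 1] = (byte) (v >> 8);
        a[offset + 2] = (byte) (v >> 16);
        a[offset + 3] = (byte) (v >> 24);
        a[offset + 4] = (byte) (v >> 32);
        a[offset + 5] = (byte) (v >> 40);
        a[offset + 6] = (byte) (v >> 48);
        a[offset + 7] = (byte) (v >> 56);
    }

    // Helper used only to check the stored bytes: reads them back as a
    // little-endian long.
    static long loadLongLE(byte[] a, int offset) {
        long v = 0;
        for (int i = 7; i >= 0; i--) {
            v = (v << 8) | (a[offset + i] & 0xffL);
        }
        return v;
    }
}
```

Both methods are plain Java and behave the same with or without the optimization; the only intended difference is the machine code C2 may emit for the store sequence.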
src/hotspot/share/opto/c2_globals.hpp line 362: > 360: notproduct(bool, TraceMergeStores, false, \ > 361: "Trace creation of merged stores") \ > 362: \ The flag should be `develop` since it is under `#ifdef ASSERT`. ------------- PR Review: https://git.openjdk.org/jdk/pull/16245#pullrequestreview-1846173446 PR Review Comment: https://git.openjdk.org/jdk/pull/16245#discussion_r1467918074 From kvn at openjdk.org Fri Jan 26 17:44:37 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 26 Jan 2024 17:44:37 GMT Subject: RFR: 8322996: BoxLockNode creation fails with assert(reg < CHUNK_SIZE) failed: sanity [v5] In-Reply-To: <5qX3luVttnofEYa0QzvwJIEjXu7ndtmt8Bz2vdat-4c=.b73470b4-ec7b-4e1d-81ba-f6edb46a9cb0@github.com> References: <5qX3luVttnofEYa0QzvwJIEjXu7ndtmt8Bz2vdat-4c=.b73470b4-ec7b-4e1d-81ba-f6edb46a9cb0@github.com> Message-ID: On Fri, 26 Jan 2024 09:56:05 GMT, Daniel Lundén wrote: >> I am not worried about the exit in `load_interpreter_state()`; there is a check after the call. >> >> There is call to `record_profiled_parameters_for_speculation()` after `shared_lock()` call in `do_method_entry()`. It does not check `failing()`. May be add check `if (!failing())` when calling it (or inside it). >> >> For `do_one_bytecode()` case the only question is if IGVN code can handle such bailout. It is called at the end of method. If not, we need to add the check. > >> There is call to record_profiled_parameters_for_speculation() after shared_lock() call in do_method_entry(). It does not check failing(). May be add check if (!failing()) when calling it (or inside it). > > I've now added a check just after `_synch_lock = shared_lock(lock_obj);`, thanks. > >> For do_one_bytecode() case the only question is if IGVN code can handle such bailout. **It is called at the end of method**. If not, we need to add the check. > > I cannot see any call to IGVN after `do_monitor_enter()` in `do_one_bytecode()`. Can you elaborate?
Maybe related, I also changed to > > Node *box = new BoxLockNode(next_monitor()); > // Check for bailout after new BoxLockNode > if (failing()) { return; } > box = _gvn.transform(box); > > from the previous > > Node *box = _gvn.transform(new BoxLockNode(next_monitor())); > // Check for bailout after new BoxLockNode > if (failing()) { return; } > > to avoid potential issues with `_gvn.transform` due to the new bailout. Does `_gvn.transform` handle bailout? > I cannot see any call to IGVN after `do_monitor_enter()` in `do_one_bytecode()`. Can you elaborate? Sorry, I mean IGV (IdealGraphVisualizer) [parse2.cpp#L2782] (https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/parse2.cpp#L2782) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17370#discussion_r1467939143 From kvn at openjdk.org Fri Jan 26 17:47:23 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 26 Jan 2024 17:47:23 GMT Subject: RFR: 8324750: C2: rename Matcher methods using "superword" -> "autovectorization" In-Reply-To: References: Message-ID: On Fri, 26 Jan 2024 09:53:53 GMT, Emanuel Peter wrote: > Subtask of https://github.com/openjdk/jdk/pull/16620 > > Matcher::superword_max_vector_size -> Matcher::max_vector_size_autovectorization > > Matcher::match_rule_supported_superword -> match_rule_supported_autovectorization > > This is to make the naming more general, since these methods can be used by any autovectorizer in the future. Good and I think trivial. ------------- Marked as reviewed by kvn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/17583#pullrequestreview-1846213375 From kvn at openjdk.org Fri Jan 26 17:52:33 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 26 Jan 2024 17:52:33 GMT Subject: RFR: 8324752: C2 Superword: remove SuperWordRTDepCheck In-Reply-To: <-P8KzeUnHvnu9mv1F2IqxJBZIsx_EJYRvYaR4CsKkCY=.73c6743d-25ea-491d-9be6-70b89a76ae7f@github.com> References: <-P8KzeUnHvnu9mv1F2IqxJBZIsx_EJYRvYaR4CsKkCY=.73c6743d-25ea-491d-9be6-70b89a76ae7f@github.com> Message-ID: On Fri, 26 Jan 2024 10:11:29 GMT, Emanuel Peter wrote: > Subtask of https://github.com/openjdk/jdk/pull/16620 > > SuperWordRTDepCheck is a debug-only flag, which detects if there are arrays in the same slice that have different bases, i.e. may be different arrays. This could be the basis for alias-analysis. > > We should do aliasing-analysis properly in a future RFE ([JDK-8324751](https://bugs.openjdk.org/browse/JDK-8324751)). If we can prove (statically or with a runtime-check) that two arrays are different, then this removes edges from the dependency graph, and may allow vectorization that would otherwise not be possible. I did not even know we had such code. I agree that we should do proper analysis instead of this experimental code (which is off by default and can't be switched on in product). ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/17585#pullrequestreview-1846227740 From dlunden at openjdk.org Fri Jan 26 18:23:52 2024 From: dlunden at openjdk.org (Daniel Lundén) Date: Fri, 26 Jan 2024 18:23:52 GMT Subject: RFR: 8322996: BoxLockNode creation fails with assert(reg < CHUNK_SIZE) failed: sanity [v7] In-Reply-To: References: Message-ID: > This changeset fixes an issue where deeply nested synchronized statements triggered an assert in C2. > > Changes: > - Bail out on compilation when we create a `BoxLockNode` with a slot index that cannot fit in a `RegMask`.
This is similar to how we handle the case when we do not have space to represent arguments in [`opto/matcher.cpp`](https://github.com/openjdk/jdk/blob/58b01dce054c50bcb5a28aad4c1b574acaa90f6d/src/hotspot/share/opto/matcher.cpp#L314-L318) > - Generalize `RegMask::can_represent` to take an additional and optional size argument to facilitate reuse. The default size value, 1, corresponds to the previous functionality. Rewrite `can_represent_arg` to directly call `can_represent(reg, SlotsPerVecZ)`. > - Add a regression test. > > Testing: > - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/7612890820) > - tier1 to tier5 on windows-x64, linux-x64, linux-aarch64, macosx-x64, and macosx-aarch64. > - The new regression test in all tier1 to tier10 contexts on windows-x64, linux-x64, linux-aarch64, macosx-x64, and macosx-aarch64. Daniel Lundén has updated the pull request incrementally with one additional commit since the last revision: Guard for bailout before IGV ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17370/files - new: https://git.openjdk.org/jdk/pull/17370/files/5fedbc3b..ebe23a23 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17370&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17370&range=05-06 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/17370.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17370/head:pull/17370 PR: https://git.openjdk.org/jdk/pull/17370 From dlunden at openjdk.org Fri Jan 26 18:23:52 2024 From: dlunden at openjdk.org (Daniel Lundén) Date: Fri, 26 Jan 2024 18:23:52 GMT Subject: RFR: 8322996: BoxLockNode creation fails with assert(reg < CHUNK_SIZE) failed: sanity [v5] In-Reply-To: References: <5qX3luVttnofEYa0QzvwJIEjXu7ndtmt8Bz2vdat-4c=.b73470b4-ec7b-4e1d-81ba-f6edb46a9cb0@github.com> Message-ID: On Fri, 26 Jan 2024 17:41:59 GMT, Vladimir Kozlov wrote: > Does _gvn.transform handle bailout?
Not sure, but with my recent update we check for bailout in between creating the `BoxLockNode` and calling `_gvn.transform` on it. So we guard against it in any case. > Sorry, I mean IGV Ah, OK. I've added a guard here as well now (inside the `#ifndef PRODUCT`). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17370#discussion_r1467991339 From dlunden at openjdk.org Fri Jan 26 18:28:02 2024 From: dlunden at openjdk.org (Daniel Lundén) Date: Fri, 26 Jan 2024 18:28:02 GMT Subject: RFR: 8291809: Convert compiler/c2/cr7200264/TestSSE2IntVect.java to IR verification test [v7] In-Reply-To: References: Message-ID: > This changeset translates the tests in `compiler/c2/cr7200264/` to use the IR verification framework. > > The proposed translation, to the extent possible, attempts to preserve the semantics of the original test. A major difference is that the IR checks are now local (for every `test_*` method) instead of global. The execution time of the new test is comparable to the old test. > > Testing: > - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/7671780152) > - Ran the new translated tests within all tier1 through tier10 contexts on windows-x64, linux-x64, linux-aarch64, macosx-x64, and macosx-aarch64. > - Tested that manually adding `-XX:LoopUnrollLimit=0` to the test framework flags caused the translated tests to fail. Note: it is, however, no longer possible to break the test by passing `-XX:LoopUnrollLimit=0` on the command line.
Daniel Lundén has updated the pull request incrementally with one additional commit since the last revision: Change preconditions for test_divc and test_divc_n ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17428/files - new: https://git.openjdk.org/jdk/pull/17428/files/b2dd79ed..66f5ad44 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17428&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17428&range=05-06 Stats: 18 lines in 1 file changed: 16 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/17428.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17428/head:pull/17428 PR: https://git.openjdk.org/jdk/pull/17428 From kvn at openjdk.org Fri Jan 26 19:12:39 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 26 Jan 2024 19:12:39 GMT Subject: RFR: 8317572: C2 SuperWord: refactor/improve TraceSuperWord, replace VectorizeDebugOption with TraceAutoVectorization [v5] In-Reply-To: References: Message-ID: On Fri, 26 Jan 2024 16:19:56 GMT, Emanuel Peter wrote: >> Subtask of https://github.com/openjdk/jdk/pull/16620 >> >> I got approval to remove VectorizeDebugOption: [JDK-8320668](https://bugs.openjdk.org/browse/JDK-8320668) >> >> I want a more general flag for AutoVectorization, that can trace different components of AutoVectorization. >> It should be a CompileCommand, so that it can select which methods it traces for. >> >> TraceSuperWord should still look similar, and select a subset of the TraceAutoVectorization components (those for SuperWord), but still apply to all classes/methods. >> >> With more refactoring later in [JDK-8315361](https://bugs.openjdk.org/browse/JDK-8315361), this flag should become more usable and interpretable. Especially, the idea is that different components of the `VLoop / VLoopAnalyzer` can have tracing enabled / disabled. >> >> **How to use the flag:** >> Get "help", i.e.
see all available tags: >> `./java -Xcomp -XX:CompileCommand=TraceAutoVectorization,*::*,help --version` >> >> See "rejections" (i.e. failures where we don't vectorize) and successes (using TraceNewVectors): >> `./java -Xcomp -XX:CompileCommand=TraceAutoVectorization,*::*,SW_REJECTIONS -XX:+TraceNewVectors --version` >> The results are currently underwhelming. I will have to track many more failures, and I will do that with the bigger refactoring, when I move around the code and require error code returning everywhere, and then I can use that error code for printing. > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > fix a test It is good. Two comments. src/hotspot/share/compiler/compilerOracle.cpp line 779: > 777: } > 778: } > 779: #ifndef PRODUCT Missing `#ifdef COMPILER2` for this and `PrintIdealPhase`. src/hotspot/share/compiler/directivesParser.cpp line 339: > 337: error(VALUE_ERROR, "Unrecognized intrinsic detected in DisableIntrinsic: %s", validator.what()); > 338: } > 339: } else if (strncmp(option_key->name, "TraceAutoVectorization", 22) == 0) { Missing `#ifndef PRODUCT` and `#ifdef COMPILER2` for this and for `PrintIdealPhase`. ------------- PR Review: https://git.openjdk.org/jdk/pull/17586#pullrequestreview-1846370954 PR Review Comment: https://git.openjdk.org/jdk/pull/17586#discussion_r1468038341 PR Review Comment: https://git.openjdk.org/jdk/pull/17586#discussion_r1468039546 From ccheung at openjdk.org Fri Jan 26 19:29:48 2024 From: ccheung at openjdk.org (Calvin Cheung) Date: Fri, 26 Jan 2024 19:29:48 GMT Subject: [jdk22] RFR: 8323556: CDS archive space addresses should be randomized with ArchiveRelocationMode=1 Message-ID: Hi all, This pull request contains a backport of commit [437342b9](https://github.com/openjdk/jdk/commit/437342b93e9e66340ac57bd1c6fdc948b3302db0) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository.
The commit being backported was authored by Calvin Cheung on 19 Jan 2024 and was reviewed by Ioi Lam and Matias Saavedra Silva. Thanks! ------------- Commit messages: - Backport 437342b93e9e66340ac57bd1c6fdc948b3302db0 Changes: https://git.openjdk.org/jdk22/pull/98/files Webrev: https://webrevs.openjdk.org/?repo=jdk22&pr=98&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8323556 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk22/pull/98.diff Fetch: git fetch https://git.openjdk.org/jdk22.git pull/98/head:pull/98 PR: https://git.openjdk.org/jdk22/pull/98 From kvn at openjdk.org Fri Jan 26 19:33:35 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 26 Jan 2024 19:33:35 GMT Subject: RFR: 8324734: Remove too-strict assert(VM_Version::supports_evex()) in Assembler::locate_operand() In-Reply-To: <776ui1DnJfcgz89q2aPAeOyhOYKalPhAE1bMR28wK6I=.19728df4-3710-4503-a726-8ce217501f67@github.com> References: <776ui1DnJfcgz89q2aPAeOyhOYKalPhAE1bMR28wK6I=.19728df4-3710-4503-a726-8ce217501f67@github.com> Message-ID: On Fri, 26 Jan 2024 15:18:39 GMT, Roman Kennke wrote: > Details see bug report. The gist is that HotSpot downgrades to UseAVX=2 on some processors, and reports supports_evex() == false, but the instruction decoder can still encounter EVEX instructions when (e.g.) hitting a SIGBUS in memset() - which does have EVEX instructions. > > Testing: > - [x] runtime/Unsafe/InternalErrorTest.java > - [x] tier1 The assert is there to make sure HotSpot generates correct code. What output you get in your test case with this change? You still crash in `memset`. 
------------- PR Review: https://git.openjdk.org/jdk/pull/17590#pullrequestreview-1846415132 From kvn at openjdk.org Fri Jan 26 19:35:35 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 26 Jan 2024 19:35:35 GMT Subject: RFR: 8324765: C2 SuperWord: remove dead code: SuperWord::insert_extracts In-Reply-To: References: Message-ID: On Fri, 26 Jan 2024 15:14:13 GMT, Emanuel Peter wrote: > Subtask of https://github.com/openjdk/jdk/pull/16620 > > As far as all tests and code study have showed me, SuperWord::insert_extracts is dead. > > I am replacing it with verification code, that checks that no ExtractNode is required. > > **Details** > > All the relevant cases are marked as "unprofitable" in `SuperWord::profitable`, see: > https://github.com/openjdk/jdk/blob/dfdd2174d7af5e3e995147484db17b45b006f6d0/src/hotspot/share/opto/superword.cpp#L1912-L1915 Good. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/17589#pullrequestreview-1846419744 From kvn at openjdk.org Fri Jan 26 19:42:35 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 26 Jan 2024 19:42:35 GMT Subject: RFR: 8324775: C2 SuperWord: refactor visited sets In-Reply-To: References: Message-ID: On Fri, 26 Jan 2024 16:53:46 GMT, Emanuel Peter wrote: > Subtask of https://github.com/openjdk/jdk/pull/16620 > > Currently, the visited set is a "global" set that is reused all through SuperWord: > `_visited`, `_post_visited`, and also `_stk`. > > This makes other refactorings difficult. I am refactoring all related code to make the visited sets local, with use of ResourceMark. > > I am also refactoring the `independent` queries: from using recursive function calls to iterative. This is necessary, unless we want to pass the visited set around into the recursive calls (I think not!) 
> > At the same time, I wanted the code of `find_dependence` to become easier, and closer to `independent`, hence I re-defined it to `mutually_independent`, and had to slightly adapt also its usages. Good refactoring. Thank you for replacing recursion in `independent_path()`. src/hotspot/share/opto/superword.cpp line 1110: > 1108: } > 1109: > 1110: // Are all nodes in nodes mutually independent? // Are all nodes in nodes list ------------- PR Review: https://git.openjdk.org/jdk/pull/17594#pullrequestreview-1846429616 PR Review Comment: https://git.openjdk.org/jdk/pull/17594#discussion_r1468074515 From rkennke at openjdk.org Fri Jan 26 19:44:34 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Fri, 26 Jan 2024 19:44:34 GMT Subject: RFR: 8324734: Remove too-strict assert(VM_Version::supports_evex()) in Assembler::locate_operand() In-Reply-To: References: <776ui1DnJfcgz89q2aPAeOyhOYKalPhAE1bMR28wK6I=.19728df4-3710-4503-a726-8ce217501f67@github.com> Message-ID: On Fri, 26 Jan 2024 19:30:32 GMT, Vladimir Kozlov wrote: > The assert is there to make sure HotSpot generates correct code. I understand, but the decoder here tries to decode non-HotSpot code, which can legitimately use EVEX instructions. > What output you get in your test case with this change? You still crash in `memset`. The test-case passes with the change. The test provokes a SIGBUS and checks that it is properly turned into an InternalError. Without the change, it would crash with the assert; with the change, the SIGBUS would be handled and turned into the expected InternalError. What we *could* do is what Jorn suggests in JBS: "As a long term solution, maybe VM_Version could distinguish between CPU features that are supported by the CPU, and features that are enabled. Then, the decoder could check whether evex is supported, while other code could be changed to check whether use of evex is enabled for hotspot's own code gen."
However, that seems more complex than what I'm currently willing to spend time on. ;-) ------------- PR Comment: https://git.openjdk.org/jdk/pull/17590#issuecomment-1912597285 From kvn at openjdk.org Fri Jan 26 19:44:37 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 26 Jan 2024 19:44:37 GMT Subject: RFR: 8322996: BoxLockNode creation fails with assert(reg < CHUNK_SIZE) failed: sanity [v7] In-Reply-To: References: Message-ID: On Fri, 26 Jan 2024 18:23:52 GMT, Daniel Lundén wrote: >> This changeset fixes an issue where deeply nested synchronized statements triggered an assert in C2. >> >> Changes: >> - Bail out on compilation when we create a `BoxLockNode` with a slot index that cannot fit in a `RegMask`. This is similar to how we handle the case when we do not have space to represent arguments in [`opto/matcher.cpp`](https://github.com/openjdk/jdk/blob/58b01dce054c50bcb5a28aad4c1b574acaa90f6d/src/hotspot/share/opto/matcher.cpp#L314-L318) >> - Generalize `RegMask::can_represent` to take an additional and optional size argument to facilitate reuse. The default size value, 1, corresponds to the previous functionality. Rewrite `can_represent_arg` to directly call `can_represent(reg, SlotsPerVecZ)`. >> - Add a regression test. >> >> Testing: >> - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/7612890820) >> - tier1 to tier5 on windows-x64, linux-x64, linux-aarch64, macosx-x64, and macosx-aarch64. >> - The new regression test in all tier1 to tier10 contexts on windows-x64, linux-x64, linux-aarch64, macosx-x64, and macosx-aarch64. > > Daniel Lundén has updated the pull request incrementally with one additional commit since the last revision: > > Guard for bailout before IGV Looks good. ------------- Marked as reviewed by kvn (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/17370#pullrequestreview-1846432753 From coleenp at openjdk.org Fri Jan 26 20:04:34 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Fri, 26 Jan 2024 20:04:34 GMT Subject: [jdk22] RFR: 8323556: CDS archive space addresses should be randomized with ArchiveRelocationMode=1 In-Reply-To: References: Message-ID: On Fri, 26 Jan 2024 19:24:27 GMT, Calvin Cheung wrote: > Hi all, > > This pull request contains a backport of commit [437342b9](https://github.com/openjdk/jdk/commit/437342b93e9e66340ac57bd1c6fdc948b3302db0) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository. > > The commit being backported was authored by Calvin Cheung on 19 Jan 2024 and was reviewed by Ioi Lam and Matias Saavedra Silva. > > Thanks! Backport looks good. ------------- Marked as reviewed by coleenp (Reviewer). PR Review: https://git.openjdk.org/jdk22/pull/98#pullrequestreview-1846458855 From ccheung at openjdk.org Fri Jan 26 20:13:58 2024 From: ccheung at openjdk.org (Calvin Cheung) Date: Fri, 26 Jan 2024 20:13:58 GMT Subject: [jdk22] RFR: 8323556: CDS archive space addresses should be randomized with ArchiveRelocationMode=1 In-Reply-To: References: Message-ID: On Fri, 26 Jan 2024 20:01:30 GMT, Coleen Phillimore wrote: >> Hi all, >> >> This pull request contains a backport of commit [437342b9](https://github.com/openjdk/jdk/commit/437342b93e9e66340ac57bd1c6fdc948b3302db0) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository. >> >> The commit being backported was authored by Calvin Cheung on 19 Jan 2024 and was reviewed by Ioi Lam and Matias Saavedra Silva. >> >> Thanks! > > Backport looks good. Thanks @coleenp for the review! 
------------- PR Comment: https://git.openjdk.org/jdk22/pull/98#issuecomment-1912629646 From ccheung at openjdk.org Fri Jan 26 20:13:58 2024 From: ccheung at openjdk.org (Calvin Cheung) Date: Fri, 26 Jan 2024 20:13:58 GMT Subject: [jdk22] Integrated: 8323556: CDS archive space addresses should be randomized with ArchiveRelocationMode=1 In-Reply-To: References: Message-ID: On Fri, 26 Jan 2024 19:24:27 GMT, Calvin Cheung wrote: > Hi all, > > This pull request contains a backport of commit [437342b9](https://github.com/openjdk/jdk/commit/437342b93e9e66340ac57bd1c6fdc948b3302db0) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository. > > The commit being backported was authored by Calvin Cheung on 19 Jan 2024 and was reviewed by Ioi Lam and Matias Saavedra Silva. > > Thanks! This pull request has now been integrated. Changeset: 4338cb3d Author: Calvin Cheung URL: https://git.openjdk.org/jdk22/commit/4338cb3d706b356fdfebc0e0f3da3c3e9b4e6ddc Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod 8323556: CDS archive space addresses should be randomized with ArchiveRelocationMode=1 Reviewed-by: coleenp Backport-of: 437342b93e9e66340ac57bd1c6fdc948b3302db0 ------------- PR: https://git.openjdk.org/jdk22/pull/98 From dlong at openjdk.org Fri Jan 26 20:19:39 2024 From: dlong at openjdk.org (Dean Long) Date: Fri, 26 Jan 2024 20:19:39 GMT Subject: RFR: 8324630: C1: Canonicalizer::do_LookupSwitch doesn't break the loop when the successor is found [v2] In-Reply-To: References: Message-ID: On Fri, 26 Jan 2024 09:05:47 GMT, Denghui Dong wrote: >> src/hotspot/share/c1/c1_Canonicalizer.cpp line 848: >> >>> 846: if (v == x->key_at(i)) { >>> 847: sux = x->sux_at(i); >>> 848: break; >> >> Shouldn't we also break when `v < x->key_at(i) `, meaning no key will match? Maybe we should consider binary search? > > Updated: break when `v < x->key_at(i)` > >> Maybe we should consider binary search? > > From the perspective of pursuing performance, binary search can be considered. 
If you agree, I can change to using binary search here. Yes, I'd like to see how it looks. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17553#discussion_r1468105724 From chagedorn at openjdk.org Fri Jan 26 20:25:40 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 26 Jan 2024 20:25:40 GMT Subject: RFR: 8324765: C2 SuperWord: remove dead code: SuperWord::insert_extracts In-Reply-To: References: Message-ID: On Fri, 26 Jan 2024 15:14:13 GMT, Emanuel Peter wrote: > Subtask of https://github.com/openjdk/jdk/pull/16620 > > As far as all tests and code study have showed me, SuperWord::insert_extracts is dead. > > I am replacing it with verification code, that checks that no ExtractNode is required. > > **Details** > > All the relevant cases are marked as "unprofitable" in `SuperWord::profitable`, see: > https://github.com/openjdk/jdk/blob/dfdd2174d7af5e3e995147484db17b45b006f6d0/src/hotspot/share/opto/superword.cpp#L1912-L1915 Looks good! src/hotspot/share/opto/superword.cpp line 2854: > 2852: #ifdef ASSERT > 2853: // We check that every packset (name it p_def) only has vector uses (p_use), > 2854: // which are properly vector uses of def. Suggestion: // which are proper vector uses of def. ------------- Marked as reviewed by chagedorn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/17589#pullrequestreview-1846495504 PR Review Comment: https://git.openjdk.org/jdk/pull/17589#discussion_r1468113205 From chagedorn at openjdk.org Fri Jan 26 20:30:35 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 26 Jan 2024 20:30:35 GMT Subject: RFR: 8324750: C2: rename Matcher methods using "superword" -> "autovectorization" In-Reply-To: References: Message-ID: <2V2AqXhURYL6Iy68R4_Ab--PTxLgeu2Dgm2qQVGJBnA=.2252cf6f-a4e5-4fb7-8849-71c882f2abf4@github.com> On Fri, 26 Jan 2024 09:53:53 GMT, Emanuel Peter wrote: > Subtask of https://github.com/openjdk/jdk/pull/16620 > > Matcher::superword_max_vector_size -> Matcher::max_vector_size_autovectorization > > Matcher::match_rule_supported_superword -> match_rule_supported_autovectorization > > This is to make the naming more general, since these methods can be used by any autovectorizer in the future. src/hotspot/cpu/aarch64/aarch64.ad line 2380: > 2378: } > 2379: > 2380: int Matcher::max_vector_size_autovectorization(const BasicType bt) { Should we use two separate words like `auto_vectorization`? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17583#discussion_r1468116668 From chagedorn at openjdk.org Fri Jan 26 20:31:34 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 26 Jan 2024 20:31:34 GMT Subject: RFR: 8324752: C2 Superword: remove SuperWordRTDepCheck In-Reply-To: <-P8KzeUnHvnu9mv1F2IqxJBZIsx_EJYRvYaR4CsKkCY=.73c6743d-25ea-491d-9be6-70b89a76ae7f@github.com> References: <-P8KzeUnHvnu9mv1F2IqxJBZIsx_EJYRvYaR4CsKkCY=.73c6743d-25ea-491d-9be6-70b89a76ae7f@github.com> Message-ID: On Fri, 26 Jan 2024 10:11:29 GMT, Emanuel Peter wrote: > Subtask of https://github.com/openjdk/jdk/pull/16620 > > SuperWordRTDepCheck is a debug-only flag, which detects if there are arrays in the same slice that have different bases, i.e. may be different arrays. This could be the basis for alias-analysis. 
> > We should do aliasing-analysis properly in a future RFE ([JDK-8324751](https://bugs.openjdk.org/browse/JDK-8324751)). If we can prove (statically or with a runtime-check) that two arrays are different, then this removes edges from the dependency graph, and may allow vectorization that would otherwise not be possible. Looks good! ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/17585#pullrequestreview-1846502169 From chagedorn at openjdk.org Fri Jan 26 20:39:37 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 26 Jan 2024 20:39:37 GMT Subject: RFR: 8291809: Convert compiler/c2/cr7200264/TestSSE2IntVect.java to IR verification test [v7] In-Reply-To: References: Message-ID: On Fri, 26 Jan 2024 18:28:02 GMT, Daniel Lund?n wrote: >> This changeset translates the tests in `compiler/c2/cr7200264/` to use the IR verification framework. >> >> The proposed translation, to the extent possible, attempts to preserve the semantics of the original test. A major difference is that the IR checks are now local (for every `test_*` method) instead of global. The execution time of the new test is comparable to the old test. >> >> Testing: >> - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/7671780152) >> - Ran the new translated tests within all tier1 through tier10 contexts on windows-x64, linux-x64, linux-aarch64, macosx-x64, and macosx-aarch64. >> - Tested that manually adding `-XX:LoopUnrollLimit=0` to the test framework flags caused the translated tests to fail. Note: it is, however, no longer possible to break the test by passing `-XX:LoopUnrollLimit=0` on the command line. > > Daniel Lund?n has updated the pull request incrementally with one additional commit since the last revision: > > Change preconditions for test_divc and test_divc_n Thanks @dlunde and also to @eme64 for putting the extra effort in and to add negative rules as well! Looks good to me, too. 
------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/17428#pullrequestreview-1846511264 From kvn at openjdk.org Fri Jan 26 21:02:34 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 26 Jan 2024 21:02:34 GMT Subject: RFR: 8324734: Remove too-strict assert(VM_Version::supports_evex()) in Assembler::locate_operand() In-Reply-To: References: <776ui1DnJfcgz89q2aPAeOyhOYKalPhAE1bMR28wK6I=.19728df4-3710-4503-a726-8ce217501f67@github.com> Message-ID: On Fri, 26 Jan 2024 19:42:08 GMT, Roman Kennke wrote: > VM_Version could distinguish between CPU features that are supported by the CPU We can start with just EVEX check. It is not big change: $ git diff diff --git a/src/hotspot/cpu/x86/vm_version_x86.cpp b/src/hotspot/cpu/x86/vm_version_x86.cpp index df1ea6edd30..8b4ca442b5a 100644 --- a/src/hotspot/cpu/x86/vm_version_x86.cpp +++ b/src/hotspot/cpu/x86/vm_version_x86.cpp @@ -809,7 +809,8 @@ void VM_Version::get_processor_features() { _stepping = cpu_stepping(); if (cpu_family() > 4) { // it supports CPUID - _features = feature_flags(); + _features = feature_flags(); // It can be changed by VM flags + _cpu_features = _features; // Preserve features // Logical processors are only available on P4s and above, // and only if hyperthreading is available. 
_logical_processors_per_package = logical_processor_count(); diff --git a/src/hotspot/cpu/x86/vm_version_x86.hpp b/src/hotspot/cpu/x86/vm_version_x86.hpp index e521a6ee3bc..de86ce51541 100644 --- a/src/hotspot/cpu/x86/vm_version_x86.hpp +++ b/src/hotspot/cpu/x86/vm_version_x86.hpp @@ -640,7 +640,7 @@ class VM_Version : public Abstract_VM_Version { } // - // Feature identification + // Feature identification which can be affected by VM flags // static bool supports_cpuid() { return _features != 0; } static bool supports_cmov() { return (_features & CPU_CMOV) != 0; } @@ -703,6 +703,11 @@ class VM_Version : public Abstract_VM_Version { static bool supports_cet_ss() { return (_features & CPU_CET_SS) != 0; } static bool supports_cet_ibt() { return (_features & CPU_CET_IBT) != 0; } + // + // Feature identification not affected by VM flags + // + static bool cpu_supports_evex() { return (_cpu_features & CPU_AVX512F) != 0; } + // Intel features static bool is_intel_family_core() { return is_intel() && extended_cpu_family() == CPU_FAMILY_INTEL_CORE; } diff --git a/src/hotspot/share/runtime/abstract_vm_version.hpp b/src/hotspot/share/runtime/abstract_vm_version.hpp index d8ffca8de81..05675cc683a 100644 --- a/src/hotspot/share/runtime/abstract_vm_version.hpp +++ b/src/hotspot/share/runtime/abstract_vm_version.hpp @@ -54,10 +54,13 @@ class Abstract_VM_Version: AllStatic { static const char* _s_vm_release; static const char* _s_internal_vm_info_string; - // CPU feature flags. + // CPU feature flags which can be restricted by VM flags. static uint64_t _features; static const char* _features_string; + // CPU feature flags not affected by VM flags. 
+ static uint64_t _cpu_features; + // These are set by machine-dependent initializations #ifndef SUPPORTS_NATIVE_CX8 static bool _supports_cx8; ------------- PR Comment: https://git.openjdk.org/jdk/pull/17590#issuecomment-1912689152 From chagedorn at openjdk.org Fri Jan 26 21:27:38 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 26 Jan 2024 21:27:38 GMT Subject: RFR: 8324775: C2 SuperWord: refactor visited sets In-Reply-To: References: Message-ID: On Fri, 26 Jan 2024 16:53:46 GMT, Emanuel Peter wrote: > Subtask of https://github.com/openjdk/jdk/pull/16620 > > Currently, the visited set is a "global" set that is reused all through SuperWord: > `_visited`, `_post_visited`, and also `_stk`. > > This makes other refactorings difficult. I am refactoring all related code to make the visited sets local, with use of ResourceMark. > > I am also refactoring the `independent` queries: from using recursive function calls to iterative. This is necessary, unless we want to pass the visited set around into the recursive calls (I think not!) > > At the same time, I wanted the code of `find_dependence` to become easier, and closer to `independent`, hence I re-defined it to `mutually_independent`, and had to slightly adapt also its usages. Looks good! ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/17594#pullrequestreview-1846572613 From chagedorn at openjdk.org Fri Jan 26 21:47:49 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 26 Jan 2024 21:47:49 GMT Subject: RFR: 8324236: compiler/ciReplay/TestInliningProtectionDomain.java failed with RuntimeException: should only dump inline information for ... 
expected true, was false Message-ID: The test failed when trying to match the compile id of the interesting method by looking for "` But it first found the earlier line: which unfortunately also matched "` References: Message-ID: On Thu, 25 Jan 2024 09:22:14 GMT, Emanuel Peter wrote: >> test/hotspot/jtreg/compiler/loopopts/InvariantCodeMotionReassociateCmp.java line 2: >> >>> 1: /* >>> 2: * Copyright Amazon.com Inc. or its affiliates. All Rights Reserved. >> >> Does the Amazon copyright header not have a year associated with it? > > I guess not, I see other files without a year. Still, a bit strange. Yes. Our open-source board advise us to do so. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17375#discussion_r1468191394 From kvn at openjdk.org Fri Jan 26 22:15:34 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 26 Jan 2024 22:15:34 GMT Subject: RFR: 8324236: compiler/ciReplay/TestInliningProtectionDomain.java failed with RuntimeException: should only dump inline information for ... expected true, was false In-Reply-To: References: Message-ID: On Fri, 26 Jan 2024 21:42:27 GMT, Christian Hagedorn wrote: > The test failed when trying to match the compile id of the interesting method by looking for "` > > But it first found the earlier line: > > > > which unfortunately also matched "` > Thanks, > Christian Good and trivial. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/17598#pullrequestreview-1846625514 From kvn at openjdk.org Fri Jan 26 22:35:25 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 26 Jan 2024 22:35:25 GMT Subject: RFR: 8322743: assert(held_monitor_count() == jni_monitor_count()) failed [v2] In-Reply-To: References: Message-ID: <8NFT8OvCnKrGqiun9zJdwe_9KHIEsMyTjguoLGs16Mo=.022e5ab0-565b-4a5b-8f2c-576806a5620b@github.com> On Wed, 17 Jan 2024 20:20:05 GMT, Vladimir Kozlov wrote: >> Corner case with a local (not escaped) object used for synchronization. 
C2 Escape Analysis thinks that it can eliminate locks for it. In most cases that is true, but not in this case. >> >> >> for (int i = 0; i < 2; ++i) { >> Object o = new Object(); >> synchronized (o) { // monitorenter >> // Trigger OSR compilation >> for (int j = 0; j < 100_000; ++j) { >> >> The test has a nested loop which triggers OSR compilation. The locked object comes from the Interpreter into compiled OSR code. During parsing, C2 creates another non-escaped object and correctly merges both together (with a Phi node) so that the non-escaped object is not scalar replaceable. Because it does not globally escape, EA still removes locks for it and, as a result, also for the merged locked object from the Interpreter, which is the bug. >> >> The fix is to check that the synchronized block does not have any associated escaped objects when EA decides if locks can be eliminated. >> >> Added regression test prepared by @TobiHartmann. Tested tier1-5, xcomp and stress. >> Performance testing shows no difference. > > Vladimir Kozlov has updated the pull request incrementally with one additional commit since the last revision: > > Address review comments I am still working on it. I have to address Emanuel's suggestions (renaming is not trivial). And also take into account executions when the `EliminateNestedLocks` flag is switched off. As I explained, several synchronized regions and different objects can be referenced by one `BoxLockNode` in such a case.
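For readers who want to see the quoted fragment as a complete program, here is a minimal sketch of the code shape under discussion. It is not the regression test from the pull request: the class name `OsrLocalLockSketch`, the method `run`, and the loop body are hypothetical and were chosen only to make the fragment compile. Whether it actually reaches the problematic lock elimination path depends on the JVM build and flags, which is an assumption left untested here.

```java
// Hypothetical sketch of the code shape quoted above; NOT the regression test
// from the pull request. Class, method, and variable names are illustrative.
final class OsrLocalLockSketch {

    // Locks a fresh, non-escaping object on each outer iteration and runs a
    // long inner loop while the monitor is held. A loop of this length is the
    // kind that can trigger OSR compilation in the middle of the synchronized
    // block, so the locked object enters compiled code from the interpreter.
    static int run() {
        int oddCount = 0;
        for (int i = 0; i < 2; ++i) {
            Object o = new Object();              // local, never escapes
            synchronized (o) {                    // monitorenter
                for (int j = 0; j < 100_000; ++j) {
                    oddCount += j & 1;            // cheap work inside the lock
                }
            }                                     // monitorexit
        }
        return oddCount;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

The inner loop only counts odd values of `j`, so the program has no interesting output of its own; the point is the structure, a lock on a local object held across a loop hot enough to be compiled while it is running.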
------------- PR Comment: https://git.openjdk.org/jdk/pull/17331#issuecomment-1912782595 From ddong at openjdk.org Fri Jan 26 22:47:46 2024 From: ddong at openjdk.org (Denghui Dong) Date: Fri, 26 Jan 2024 22:47:46 GMT Subject: RFR: 8324630: C1: Canonicalizer::do_LookupSwitch doesn't break the loop when the successor is found [v3] In-Reply-To: References: Message-ID: <-3LeFaCHrhS593HcrdEaKfE0LMdbQkg1hP6y3iLJ2Bc=.c96dd8de-ef94-4757-b5c7-83309074e036@github.com> > Hi, > > Please review the small change that breaks the loop in Canonicalizer::do_LookupSwitch if the successor is found. > > The keys of LookupSwitch are sorted, so there is no need to continue the loop once matched. > > Thanks. Denghui Dong has updated the pull request incrementally with one additional commit since the last revision: binary search ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17553/files - new: https://git.openjdk.org/jdk/pull/17553/files/dc755321..9827f0c4 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17553&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17553&range=01-02 Stats: 11 lines in 1 file changed: 4 ins; 0 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/17553.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17553/head:pull/17553 PR: https://git.openjdk.org/jdk/pull/17553 From ddong at openjdk.org Fri Jan 26 22:47:47 2024 From: ddong at openjdk.org (Denghui Dong) Date: Fri, 26 Jan 2024 22:47:47 GMT Subject: RFR: 8324630: C1: Canonicalizer::do_LookupSwitch doesn't break the loop when the successor is found [v3] In-Reply-To: References: Message-ID: <_tQl0kwDvygBnmfc-xd4QtnnzO9T1JqNk-1eebshYi4=.dbe12041-0ae5-4dc9-93e5-3d2401df1c5c@github.com> On Fri, 26 Jan 2024 20:16:42 GMT, Dean Long wrote: >> Updated: break when `v < x->key_at(i)` >> >>> Maybe we should consider binary search? >> >> From the perspective of pursuing performance, binary search can be considered. If you agree, I can change to using binary search here. 
> > Yes, I'd like to see how it looks. Updated. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17553#discussion_r1468214822 From duke at openjdk.org Fri Jan 26 23:35:42 2024 From: duke at openjdk.org (Joshua Cao) Date: Fri, 26 Jan 2024 23:35:42 GMT Subject: RFR: 8324790: ifnode::fold_compares_helper cleanup Message-ID: I hope my assumptions in `filtered_int_type` are correct here: * we assert that `if_proj` is an `IfTrue` or `IfFalse`, so it is safe to assume `if_proj->_in` is an `IfNode` * the 1'th input of a CmpNode is a BoolNode * the 1'th input of an IfNode is **not always a BoolNode**, it can be a constant. We need to leave this check in. We also remove some of the if-checks in `compare_folds_cleanup` which seem unnecessary. Passes tier1 locally. ------------- Commit messages: - 8324790: ifnode::fold_compares_helper cleanup Changes: https://git.openjdk.org/jdk/pull/17601/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=17601&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8324790 Stats: 145 lines in 1 file changed: 47 ins; 58 del; 40 mod Patch: https://git.openjdk.org/jdk/pull/17601.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17601/head:pull/17601 PR: https://git.openjdk.org/jdk/pull/17601 From duke at openjdk.org Sat Jan 27 00:11:50 2024 From: duke at openjdk.org (Joshua Cao) Date: Sat, 27 Jan 2024 00:11:50 GMT Subject: RFR: 8323220: Reassociate loop invariants involved in Cmps and Add/Subs [v6] In-Reply-To: References: Message-ID: > // inv1 == (x + inv2) => ( inv1 - inv2 ) == x > // inv1 == (x - inv2) => ( inv1 + inv2 ) == x > // inv1 == (inv2 - x) => (-inv1 + inv2 ) == x > > > For example, > > > fn(inv1, inv2) > while(...) > x = foobar() > if inv1 == x + inv2 > blackhole() > > > We can transform this into > > > fn(inv1, inv2) > t = inv1 - inv2 > while(...)
> x = foobar() > if t == x > blackhole() > > > Here is an example: https://github.com/openjdk/jdk/blob/b78896b9aafcb15f453eaed6e154a5461581407b/src/java.base/share/classes/java/lang/invoke/LambdaFormEditor.java#L910. LHS `1` and RHS `pos` are both loop invariant > > Passes tier1 locally on Linux machine. Passes GHA on my fork. Joshua Cao has updated the pull request incrementally with one additional commit since the last revision: Small fixes and add check methods for tests ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17375/files - new: https://git.openjdk.org/jdk/pull/17375/files/a08df7f7..1603864f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17375&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17375&range=04-05 Stats: 33 lines in 3 files changed: 24 ins; 0 del; 9 mod Patch: https://git.openjdk.org/jdk/pull/17375.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17375/head:pull/17375 PR: https://git.openjdk.org/jdk/pull/17375 From dlong at openjdk.org Sat Jan 27 02:30:34 2024 From: dlong at openjdk.org (Dean Long) Date: Sat, 27 Jan 2024 02:30:34 GMT Subject: RFR: 8324630: C1: Canonicalizer::do_LookupSwitch doesn't break the loop when the successor is found [v3] In-Reply-To: <-3LeFaCHrhS593HcrdEaKfE0LMdbQkg1hP6y3iLJ2Bc=.c96dd8de-ef94-4757-b5c7-83309074e036@github.com> References: <-3LeFaCHrhS593HcrdEaKfE0LMdbQkg1hP6y3iLJ2Bc=.c96dd8de-ef94-4757-b5c7-83309074e036@github.com> Message-ID: On Fri, 26 Jan 2024 22:47:46 GMT, Denghui Dong wrote: >> Hi, >> >> Please review the small change that breaks the loop in Canonicalizer::do_LookupSwitch if the successor is found. >> >> The keys of LookupSwitch are sorted, so there is no need to continue the loop once matched. >> >> Thanks. 
> > Denghui Dong has updated the pull request incrementally with one additional commit since the last revision: > > binary search src/hotspot/share/c1/c1_Canonicalizer.cpp line 848: > 846: int high = x->length() - 1; > 847: while (low <= high) { > 848: int mid = low + ((high - low) >> 1); Isn't this the same as `int mid = (low + high) >> 1;` ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17553#discussion_r1468281722 From ddong at openjdk.org Sat Jan 27 03:23:35 2024 From: ddong at openjdk.org (Denghui Dong) Date: Sat, 27 Jan 2024 03:23:35 GMT Subject: RFR: 8324630: C1: Canonicalizer::do_LookupSwitch doesn't break the loop when the successor is found [v3] In-Reply-To: References: <-3LeFaCHrhS593HcrdEaKfE0LMdbQkg1hP6y3iLJ2Bc=.c96dd8de-ef94-4757-b5c7-83309074e036@github.com> Message-ID: On Sat, 27 Jan 2024 02:28:13 GMT, Dean Long wrote: >> Denghui Dong has updated the pull request incrementally with one additional commit since the last revision: >> >> binary search > > src/hotspot/share/c1/c1_Canonicalizer.cpp line 848: > >> 846: int high = x->length() - 1; >> 847: while (low <= high) { >> 848: int mid = low + ((high - low) >> 1); > > Isn't this the same as > `int mid = (low + high) >> 1;` `low + ((high - low) >> 1)` can avoid integer overflow ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17553#discussion_r1468292902 From epeter at openjdk.org Sat Jan 27 05:06:47 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Sat, 27 Jan 2024 05:06:47 GMT Subject: RFR: 8324765: C2 SuperWord: remove dead code: SuperWord::insert_extracts [v2] In-Reply-To: References: Message-ID: > Subtask of https://github.com/openjdk/jdk/pull/16620 > > As far as all tests and code study have showed me, SuperWord::insert_extracts is dead. > > I am replacing it with verification code, that checks that no ExtractNode is required. 
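A note on the midpoint question from the 8324630 review above (`low + ((high - low) >> 1)` versus `(low + high) >> 1`): the two forms agree whenever `low + high` fits in an `int`, and differ only when the sum wraps. The following is a minimal Java sketch, not the C1 code; the names `MidpointOverflowDemo`, `safeMid`, `naiveMid` and `find` are made up for illustration. Java is used because `int` overflow wraps deterministically there, whereas signed overflow in C++ is undefined behaviour.

```java
class MidpointOverflowDemo {
    // Midpoint in the form used by the patch: cannot overflow when 0 <= low <= high.
    static int safeMid(int low, int high) {
        return low + ((high - low) >> 1);
    }

    // Shorter form from the review question: low + high may wrap for very large indices.
    static int naiveMid(int low, int high) {
        return (low + high) >> 1;
    }

    // Binary search over sorted keys; returns the index of v, or -1 if absent.
    static int find(int[] keys, int v) {
        int low = 0;
        int high = keys.length - 1;
        while (low <= high) {
            int mid = safeMid(low, high);
            if (keys[mid] == v) {
                return mid;
            } else if (keys[mid] < v) {
                low = mid + 1;
            } else {
                high = mid - 1;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Small indices: both forms give the same midpoint.
        System.out.println(safeMid(2, 9) + " " + naiveMid(2, 9));
        // Large indices: the sum 3_500_000_000 does not fit in an int and wraps.
        System.out.println(safeMid(1_500_000_000, 2_000_000_000));
        System.out.println(naiveMid(1_500_000_000, 2_000_000_000));
        System.out.println(find(new int[] {-5, 0, 3, 9, 42}, 9));
    }
}
```

For the key counts a real `lookupswitch` is likely to have, both forms would presumably yield the same index; the subtraction form simply removes the need to reason about the range at all.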
> > **Details** > > All the relevant cases are marked as "unprofitable" in `SuperWord::profitable`, see: > https://github.com/openjdk/jdk/blob/dfdd2174d7af5e3e995147484db17b45b006f6d0/src/hotspot/share/opto/superword.cpp#L1912-L1915 Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: typo fix by Christian Co-authored-by: Christian Hagedorn ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17589/files - new: https://git.openjdk.org/jdk/pull/17589/files/8a08a4f2..c9fe0153 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17589&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17589&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/17589.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17589/head:pull/17589 PR: https://git.openjdk.org/jdk/pull/17589 From epeter at openjdk.org Sat Jan 27 05:13:35 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Sat, 27 Jan 2024 05:13:35 GMT Subject: RFR: 8324775: C2 SuperWord: refactor visited sets [v2] In-Reply-To: References: Message-ID: > Subtask of https://github.com/openjdk/jdk/pull/16620 > > Currently, the visited set is a "global" set that is reused all through SuperWord: > `_visited`, `_post_visited`, and also `_stk`. > > This makes other refactorings difficult. I am refactoring all related code to make the visited sets local, with use of ResourceMark. > > I am also refactoring the `independent` queries: from using recursive function calls to iterative. This is necessary, unless we want to pass the visited set around into the recursive calls (I think not!) > > At the same time, I wanted the code of `find_dependence` to become easier, and closer to `independent`, hence I re-defined it to `mutually_independent`, and had to slightly adapt also its usages. 
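To illustrate the refactoring described above (recursive `independent` queries replaced by iterative ones, with a visited set that is local to each query), here is a small Java sketch on a toy dependency graph. It is not the SuperWord code: the graph representation, the `reaches` helper and all names are assumptions chosen for illustration, and the real change works on C2 nodes and, per the description, relies on `ResourceMark` for the local sets.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class IndependenceSketch {
    // Toy graph (hypothetical): maps a node to the nodes it directly depends on.
    static Map<String, List<String>> toyGraph() {
        return Map.of(
            "b", List.of("a"),
            "c", List.of("a"),
            "d", List.of("b"));
    }

    // Is any node in targets reachable from 'from' by following input edges?
    // The visited set and the worklist live only for the duration of this call.
    static boolean reaches(Map<String, List<String>> inputs, String from, Set<String> targets) {
        Set<String> visited = new HashSet<>();
        Deque<String> worklist = new ArrayDeque<>(); // explicit stack instead of recursion
        worklist.push(from);
        while (!worklist.isEmpty()) {
            String n = worklist.pop();
            for (String in : inputs.getOrDefault(n, List.of())) {
                if (targets.contains(in)) {
                    return true;
                }
                if (visited.add(in)) {
                    worklist.push(in);
                }
            }
        }
        return false;
    }

    // No path from s1 to s2 and none from s2 to s1.
    static boolean independent(Map<String, List<String>> inputs, String s1, String s2) {
        return !reaches(inputs, s1, Set.of(s2)) && !reaches(inputs, s2, Set.of(s1));
    }

    // No node in the list depends, directly or transitively, on another node in the list.
    static boolean mutuallyIndependent(Map<String, List<String>> inputs, List<String> nodes) {
        Set<String> targets = new HashSet<>(nodes);
        for (String n : nodes) {
            if (reaches(inputs, n, targets)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Map<String, List<String>> g = toyGraph();
        System.out.println(independent(g, "b", "c"));
        System.out.println(independent(g, "d", "a"));
        System.out.println(mutuallyIndependent(g, List.of("b", "c", "d")));
    }
}
```

Because each query owns its visited set, nothing has to be reset between queries and nothing has to be threaded through recursive calls, which is the property the description is after.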
Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: comment improvement by Vladimir ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17594/files - new: https://git.openjdk.org/jdk/pull/17594/files/220fc173..59a9205e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17594&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17594&range=00-01 Stats: 2 lines in 2 files changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/17594.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17594/head:pull/17594 PR: https://git.openjdk.org/jdk/pull/17594 From epeter at openjdk.org Sat Jan 27 05:13:36 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Sat, 27 Jan 2024 05:13:36 GMT Subject: RFR: 8324775: C2 SuperWord: refactor visited sets [v2] In-Reply-To: References: Message-ID: On Sat, 27 Jan 2024 05:10:56 GMT, Emanuel Peter wrote: >> Subtask of https://github.com/openjdk/jdk/pull/16620 >> >> Currently, the visited set is a "global" set that is reused all through SuperWord: >> `_visited`, `_post_visited`, and also `_stk`. >> >> This makes other refactorings difficult. I am refactoring all related code to make the visited sets local, with use of ResourceMark. >> >> I am also refactoring the `independent` queries: from using recursive function calls to iterative. This is necessary, unless we want to pass the visited set around into the recursive calls (I think not!) >> >> At the same time, I wanted the code of `find_dependence` to become easier, and closer to `independent`, hence I re-defined it to `mutually_independent`, and had to slightly adapt also its usages. > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > comment improvement by Vladimir src/hotspot/share/opto/superword.cpp line 1110: > 1108: } > 1109: > 1110: // Are all nodes in nodes mutually independent? 
Suggestion: // Are all nodes in nodes list mutually independent? src/hotspot/share/opto/superword.hpp line 452: > 450: // Is there no data path from s1 to s2 or s2 to s1? > 451: bool independent(Node* s1, Node* s2); > 452: // Are all nodes in nodes mutually independent? Suggestion: // Are all nodes in nodes list mutually independent? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17594#discussion_r1468344947 PR Review Comment: https://git.openjdk.org/jdk/pull/17594#discussion_r1468344729 From epeter at openjdk.org Sat Jan 27 05:13:38 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Sat, 27 Jan 2024 05:13:38 GMT Subject: RFR: 8324775: C2 SuperWord: refactor visited sets [v2] In-Reply-To: References: Message-ID: On Fri, 26 Jan 2024 19:38:52 GMT, Vladimir Kozlov wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> comment improvement by Vladimir > > src/hotspot/share/opto/superword.cpp line 1110: > >> 1108: } >> 1109: >> 1110: // Are all nodes in nodes mutually independent? > > // Are all nodes in nodes list fixed! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17594#discussion_r1468347871 From epeter at openjdk.org Sat Jan 27 05:28:48 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Sat, 27 Jan 2024 05:28:48 GMT Subject: RFR: 8317572: C2 SuperWord: refactor/improve TraceSuperWord, replace VectorizeDebugOption with TraceAutoVectorization [v6] In-Reply-To: References: Message-ID: > Subtask of https://github.com/openjdk/jdk/pull/16620 > > I got approval to remove VectorizeDebugOption: [JDK-8320668](https://bugs.openjdk.org/browse/JDK-8320668) > > I want a more general flag for AutoVectorization, that can trace different components of AutoVectorization. > It should be a CompileCommand, so that it can select which methods it traces for. 
> > TraceSuperWord should still look similar, and select a subset of the TraceAutoVectorization components (those for SuperWord), but still apply to all classes/methods. > > With more refactoring later in [JDK-8315361](https://bugs.openjdk.org/browse/JDK-8315361), this flag should become more usable and interpretable. Especially, the idea is that different components of the `VLoop / VLoopAnalyzer` can have tracing enabled / disabled. > > **How to use the flag:** > Get "help", i.e. see all available tags: > `./java -Xcomp -XX:CompileCommand=TraceAutoVectorization,*::*,help --version` > > See "rejections" (i.e. failures where we don't vectorize) and successes (using TraceNewVectors): > `./java -Xcomp -XX:CompileCommand=TraceAutoVectorization,*::*,SW_REJECTIONS -XX:+TraceNewVectors --version` > The results are currently underwhelming. I will have to track many more failures, and I will do that with the bigger refactoring, when I move around the code and require error code returning everywhere, and then I can use that error code for printing. 
Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: add PRODUCT and COMPILER2 ifdefs for Vladimir ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17586/files - new: https://git.openjdk.org/jdk/pull/17586/files/c9ac4344..be11c9de Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17586&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17586&range=04-05 Stats: 5 lines in 2 files changed: 3 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/17586.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17586/head:pull/17586 PR: https://git.openjdk.org/jdk/pull/17586 From epeter at openjdk.org Sat Jan 27 05:39:51 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Sat, 27 Jan 2024 05:39:51 GMT Subject: RFR: 8317572: C2 SuperWord: refactor/improve TraceSuperWord, replace VectorizeDebugOption with TraceAutoVectorization [v7] In-Reply-To: References: Message-ID: > Subtask of https://github.com/openjdk/jdk/pull/16620 > > I got approval to remove VectorizeDebugOption: [JDK-8320668](https://bugs.openjdk.org/browse/JDK-8320668) > > I want a more general flag for AutoVectorization, that can trace different components of AutoVectorization. > It should be a CompileCommand, so that it can select which methods it traces for. > > TraceSuperWord should still look similar, and select a subset of the TraceAutoVectorization components (those for SuperWord), but still apply to all classes/methods. > > With more refactoring later in [JDK-8315361](https://bugs.openjdk.org/browse/JDK-8315361), this flag should become more usable and interpretable. Especially, the idea is that different components of the `VLoop / VLoopAnalyzer` can have tracing enabled / disabled. > > **How to use the flag:** > Get "help", i.e. see all available tags: > `./java -Xcomp -XX:CompileCommand=TraceAutoVectorization,*::*,help --version` > > See "rejections" (i.e. 
failures where we don't vectorize) and successes (using TraceNewVectors): > `./java -Xcomp -XX:CompileCommand=TraceAutoVectorization,*::*,SW_REJECTIONS -XX:+TraceNewVectors --version` > The results are currently underwhelming. I will have to track many more failures, and I will do that with the bigger refactoring, when I move around the code and require error code returning everywhere, and then I can use that error code for printing. Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: fix missing include ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17586/files - new: https://git.openjdk.org/jdk/pull/17586/files/be11c9de..28e0893f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17586&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17586&range=05-06 Stats: 40 lines in 2 files changed: 21 ins; 16 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/17586.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17586/head:pull/17586 PR: https://git.openjdk.org/jdk/pull/17586 From epeter at openjdk.org Sat Jan 27 05:42:37 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Sat, 27 Jan 2024 05:42:37 GMT Subject: RFR: 8317572: C2 SuperWord: refactor/improve TraceSuperWord, replace VectorizeDebugOption with TraceAutoVectorization [v5] In-Reply-To: References: Message-ID: On Fri, 26 Jan 2024 19:02:16 GMT, Vladimir Kozlov wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> fix a test > > src/hotspot/share/compiler/compilerOracle.cpp line 779: > >> 777: } >> 778: } >> 779: #ifndef PRODUCT > > Missing `#ifdef COMPILER2` for this and `PrintIdealPhase`. done! 
> src/hotspot/share/compiler/directivesParser.cpp line 339: > >> 337: error(VALUE_ERROR, "Unrecognized intrinsic detected in DisableIntrinsic: %s", validator.what()); >> 338: } >> 339: } else if (strncmp(option_key->name, "TraceAutoVectorization", 22) == 0) { > > Missing `#ifndef PRODUCT` and `#ifdef COMPILER2` for this and for `PrintIdealPhase`. done ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17586#discussion_r1468353324 PR Review Comment: https://git.openjdk.org/jdk/pull/17586#discussion_r1468353329 From epeter at openjdk.org Sat Jan 27 05:47:47 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Sat, 27 Jan 2024 05:47:47 GMT Subject: RFR: 8324750: C2: rename Matcher methods using "superword" -> "autovectorization" [v2] In-Reply-To: References: Message-ID: > Subtask of https://github.com/openjdk/jdk/pull/16620 > > Matcher::superword_max_vector_size -> Matcher::max_vector_size_autovectorization > > Matcher::match_rule_supported_superword -> match_rule_supported_autovectorization > > This is to make the naming more general, since these methods can be used by any autovectorizer in the future. 
Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: underscore for Christian ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17583/files - new: https://git.openjdk.org/jdk/pull/17583/files/e85e7e1b..f380c671 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17583&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17583&range=00-01 Stats: 28 lines in 12 files changed: 0 ins; 0 del; 28 mod Patch: https://git.openjdk.org/jdk/pull/17583.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17583/head:pull/17583 PR: https://git.openjdk.org/jdk/pull/17583 From epeter at openjdk.org Sat Jan 27 05:47:47 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Sat, 27 Jan 2024 05:47:47 GMT Subject: RFR: 8324750: C2: rename Matcher methods using "superword" -> "autovectorization" [v2] In-Reply-To: <2V2AqXhURYL6Iy68R4_Ab--PTxLgeu2Dgm2qQVGJBnA=.2252cf6f-a4e5-4fb7-8849-71c882f2abf4@github.com> References: <2V2AqXhURYL6Iy68R4_Ab--PTxLgeu2Dgm2qQVGJBnA=.2252cf6f-a4e5-4fb7-8849-71c882f2abf4@github.com> Message-ID: On Fri, 26 Jan 2024 20:27:29 GMT, Christian Hagedorn wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> underscore for Christian > > src/hotspot/cpu/aarch64/aarch64.ad line 2380: > >> 2378: } >> 2379: >> 2380: int Matcher::max_vector_size_autovectorization(const BasicType bt) { > > Should we use two separate words like `auto_vectorization`? 
Done :) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17583#discussion_r1468353901 From epeter at openjdk.org Sat Jan 27 06:19:45 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Sat, 27 Jan 2024 06:19:45 GMT Subject: RFR: 8324794: C2 SuperWord: do not ignore reductions in SuperWord::unrolling_analysis Message-ID: <4Dv9KCdW7B9Ms3a-MO-59PL-3qebNhNJejuuM6LCB0w=.7575de49-9d14-4c12-b95d-8aca28b07a05@github.com> Subtask of https://github.com/openjdk/jdk/pull/16620 Ignoring reductions in unrolling_analysis is unnecessary, and it adds unnecessary dependency of unrolling_analysis on reduction-analysis. That dependency needs to be removed for further refactoring. ------------- Commit messages: - 8324794 Changes: https://git.openjdk.org/jdk/pull/17604/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=17604&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8324794 Stats: 14 lines in 2 files changed: 11 ins; 1 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/17604.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17604/head:pull/17604 PR: https://git.openjdk.org/jdk/pull/17604 From kvn at openjdk.org Sat Jan 27 20:27:35 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Sat, 27 Jan 2024 20:27:35 GMT Subject: RFR: 8324775: C2 SuperWord: refactor visited sets [v2] In-Reply-To: References: Message-ID: On Sat, 27 Jan 2024 05:13:35 GMT, Emanuel Peter wrote: >> Subtask of https://github.com/openjdk/jdk/pull/16620 >> >> Currently, the visited set is a "global" set that is reused all through SuperWord: >> `_visited`, `_post_visited`, and also `_stk`. >> >> This makes other refactorings difficult. I am refactoring all related code to make the visited sets local, with use of ResourceMark. >> >> I am also refactoring the `independent` queries: from using recursive function calls to iterative. This is necessary, unless we want to pass the visited set around into the recursive calls (I think not!) 
>> >> At the same time, I wanted the code of `find_dependence` to become easier, and closer to `independent`, hence I re-defined it to `mutually_independent`, and had to slightly adapt also its usages. > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > comment improvement by Vladimir Good. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/17594#pullrequestreview-1847259410 From kvn at openjdk.org Sat Jan 27 20:29:36 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Sat, 27 Jan 2024 20:29:36 GMT Subject: RFR: 8317572: C2 SuperWord: refactor/improve TraceSuperWord, replace VectorizeDebugOption with TraceAutoVectorization [v7] In-Reply-To: References: Message-ID: On Sat, 27 Jan 2024 05:39:51 GMT, Emanuel Peter wrote: >> Subtask of https://github.com/openjdk/jdk/pull/16620 >> >> I got approval to remove VectorizeDebugOption: [JDK-8320668](https://bugs.openjdk.org/browse/JDK-8320668) >> >> I want a more general flag for AutoVectorization, that can trace different components of AutoVectorization. >> It should be a CompileCommand, so that it can select which methods it traces for. >> >> TraceSuperWord should still look similar, and select a subset of the TraceAutoVectorization components (those for SuperWord), but still apply to all classes/methods. >> >> With more refactoring later in [JDK-8315361](https://bugs.openjdk.org/browse/JDK-8315361), this flag should become more usable and interpretable. Especially, the idea is that different components of the `VLoop / VLoopAnalyzer` can have tracing enabled / disabled. >> >> **How to use the flag:** >> Get "help", i.e. see all available tags: >> `./java -Xcomp -XX:CompileCommand=TraceAutoVectorization,*::*,help --version` >> >> See "rejections" (i.e. 
failures where we don't vectorize) and successes (using TraceNewVectors): >> `./java -Xcomp -XX:CompileCommand=TraceAutoVectorization,*::*,SW_REJECTIONS -XX:+TraceNewVectors --version` >> The results are currently underwhelming. I will have to track many more failures, and I will do that with the bigger refactoring, when I move around the code and require error code returning everywhere, and then I can use that error code for printing. > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > fix missing include Looks good. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/17586#pullrequestreview-1847259580 From kvn at openjdk.org Sat Jan 27 20:30:34 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Sat, 27 Jan 2024 20:30:34 GMT Subject: RFR: 8324750: C2: rename Matcher methods using "superword" -> "autovectorization" [v2] In-Reply-To: References: Message-ID: On Sat, 27 Jan 2024 05:47:47 GMT, Emanuel Peter wrote: >> Subtask of https://github.com/openjdk/jdk/pull/16620 >> >> Matcher::superword_max_vector_size -> Matcher::max_vector_size_autovectorization >> >> Matcher::match_rule_supported_superword -> match_rule_supported_autovectorization >> >> This is to make the naming more general, since these methods can be used by any autovectorizer in the future. > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > underscore for Christian Update is good. ------------- Marked as reviewed by kvn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/17583#pullrequestreview-1847259724 From kvn at openjdk.org Sat Jan 27 20:39:36 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Sat, 27 Jan 2024 20:39:36 GMT Subject: RFR: 8324794: C2 SuperWord: do not ignore reductions in SuperWord::unrolling_analysis In-Reply-To: <4Dv9KCdW7B9Ms3a-MO-59PL-3qebNhNJejuuM6LCB0w=.7575de49-9d14-4c12-b95d-8aca28b07a05@github.com> References: <4Dv9KCdW7B9Ms3a-MO-59PL-3qebNhNJejuuM6LCB0w=.7575de49-9d14-4c12-b95d-8aca28b07a05@github.com> Message-ID: <-f9_1s2hXm20N3DABo4Hq1dyKzAxRHq3kJ7I_D-PONo=.5566f4f5-e975-4bee-b274-c7a136672c1b@github.com> On Sat, 27 Jan 2024 06:12:40 GMT, Emanuel Peter wrote: > Subtask of https://github.com/openjdk/jdk/pull/16620 > > Ignoring reductions in unrolling_analysis is unnecessary, and it adds unnecessary dependency of unrolling_analysis on reduction-analysis. That dependency needs to be removed for further refactoring. The change matches the subject, but your description is confusing: > Ignoring reductions in unrolling_analysis is unnecessary, and it adds unnecessary dependency of unrolling_analysis on reduction-analysis. That dependency needs to be removed for further refactoring. The only dependency you are talking about is the `is_marked_reduction()` call in the condition you are removing. Right? Do you know why this check was added? ------------- PR Comment: https://git.openjdk.org/jdk/pull/17604#issuecomment-1913329053 From kvn at openjdk.org Sat Jan 27 21:10:24 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Sat, 27 Jan 2024 21:10:24 GMT Subject: RFR: JDK-8322854: Incorrect rematerialization of scalar replaced objects in C2 In-Reply-To: References: Message-ID: On Wed, 24 Jan 2024 22:40:59 GMT, Cesar Soares Lucas wrote: > Current implementation of `PhaseMacroExpand::value_from_mem` returns `return _igvn.zerocon(ft);` when it hits a sentinel while searching for a memory operation on a given slice. 
One of the sentinels is the memory input of the allocate node origin of the memory slice. Therefore, `value_from_mem` may return `zerocon(ft)` if `sfpt_mem` is the same memory edge used by the Allocate node origin of the memory slice being traversed. > > The scalar replacement implementation uses `value_from_mem` during creation of metadata describing object scalar replaced (see `PhaseMacroExpand::create_scalarized_object_description`). The `create_scalarized_object_description` method is also used as part of RAM optimization implementation. The RAM optimization targets Phi nodes and therefore a memory graph loop created by a _memory phi_ node may be seen as part of the transformation. See image below: > > [image not preserved in this archive] > > This pattern doesn't show up when scalarizing objects that don't participate in allocation merges. > > To fix the issue I changed the code in `value_from_mem` so that, instead of using the _input_ memory edge of the Allocate as a stop condition, it now uses the projection memory edge of the Allocate. > > Tested locally on windows, mac and linux x86_64 with JTREG tier1-3 and didn't observe any regression. @JohnTortugo do I understand correctly that we have a loop and the Phi node we are processing is memory input to Allocation? If I recall correctly, the only way we get to `alloc->in(Mem)` is if there is no `Initialize` node (there are such cases). In such case `Allocation` may not have memory out projection. Why does your case see `alloc->in(Mem)`? What other `Phi` node's edge points to? I am concerned that if you use projection memory edge of the Allocate you may miss/skip it during search and start searching unrelated path. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/17562#issuecomment-1913336084 From jkarthikeyan at openjdk.org Sat Jan 27 23:16:34 2024 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Sat, 27 Jan 2024 23:16:34 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v4] In-Reply-To: <7ZqplWMMT9Rs-UNV94VY4cXldlPbYVZ2FafssMTSRKg=.b6cc2967-0040-4452-bd6d-fa4eec2d545d@github.com> References: <7ZqplWMMT9Rs-UNV94VY4cXldlPbYVZ2FafssMTSRKg=.b6cc2967-0040-4452-bd6d-fa4eec2d545d@github.com> Message-ID: On Thu, 25 Jan 2024 02:57:08 GMT, Quan Anh Mai wrote: > Regarding `contains` vs `higher_equal`, it is mainly due to the fact that `contains` being a much cheaper operation while `higher_equal` will do a `meet` followed by a hash table indexing. I forgot that `higher_equal` requires hashconsing- that makes sense to me! ------------- PR Comment: https://git.openjdk.org/jdk/pull/17508#issuecomment-1913363510 From kvn at openjdk.org Sat Jan 27 23:19:33 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Sat, 27 Jan 2024 23:19:33 GMT Subject: RFR: JDK-8317299: safepoint scalarization doesn't keep track of the depth of the JVM state In-Reply-To: References: Message-ID: On Fri, 19 Jan 2024 16:31:25 GMT, Damon Fenacci wrote: > # Issue > > The origin of the problem is tied to the fact that, when C2 optimizes vector boxes, it performs safepoint object scalarization before late inlining. > This can lead to situations in which scalarization adds scalarized values to the JVM state and late inlining of further methods adds further JVM state entries on top for each inlined method. > With the example of the reported bug (_TestIntrinsicBailOut.java_) we get to a situation like this: > > ... 
> bc: JVMS depth=6 loc=20 stk=23 arg=23 mon=23 scalar=23 end=23 mondepth=0 sp=0 bci=1 reexecute=false method=virtual jobject jdk.incubator.vector.ByteVector.rearrangeTemplate(jobject, jobject) > bc: JVMS depth=7 loc=23 stk=27 arg=27 mon=27 scalar=27 end=27 mondepth=0 sp=0 bci=36 reexecute=false method=virtual jobject jdk.incubator.vector.AbstractShuffle.checkIndexes() > bc: JVMS depth=8 loc=27 stk=28 arg=28 mon=28 scalar=28 end=28 mondepth=0 sp=0 bci=1 reexecute=false method=virtual jobject jdk.incubator.vector.AbstractShuffle.reorder() > bc: JVMS depth=9 loc=28 stk=29 arg=29 mon=29 scalar=29 end=31 mondepth=0 sp=0 bci=1 reexecute=false method=virtual jobject jdk.internal.vm.vector.VectorSupport$VectorPayload.getPayload() > bc: JVMS depth=10 loc=31 stk=32 arg=32 mon=32 scalar=32 end=32 mondepth=0 sp=0 bci=3 reexecute=false method=static jobject jdk.internal.vm.vector.VectorSupport.maybeRebox(jobject) > bc: JVMS depth=11 loc=32 stk=33 arg=33 mon=33 scalar=33 end=33 mondepth=0 sp=0 bci=1 reexecute=false method=virtual void jdk.internal.misc.Unsafe.loadFence() > > `JVMS depth=9` shows 2 scalars but 2 further inlines added 2 more JVM states (with no scalars). > > The corresponding node looks like this: > image > > To keep track of its scalarized inputs, `SafePointScalarObjectNode` keeps a field `_first_index`, which is supposed to be "relative to the last (youngest) jvms->_scloff"... > https://github.com/openjdk/jdk/blob/c5e72450966ad50d57a8d22e9d634bfcb319aee9/src/hotspot/share/opto/callnode.hpp#L509-L511 > but if there are late inlined methods, this field is going to be relative to the JVM state at the depth before inlining happened (e.g. depth=9 in the example) and not relative to the youngest depth. > > # Solution > > In order to keep track of the right depth a `_depth` field is added to `SafePointScalarObjectNode`, which refers to the depth of the JVM state the `_first_index` field refers to. The method `uint first_index(JVMState*... Interesting. 
Could it resolve the issue [JDK-8276112](https://bugs.openjdk.org/browse/JDK-8276112) so we can REDO [JDK-8276998](https://bugs.openjdk.org/browse/JDK-8276998)? ------------- PR Comment: https://git.openjdk.org/jdk/pull/17500#issuecomment-1913364161 From epeter at openjdk.org Sun Jan 28 09:48:24 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Sun, 28 Jan 2024 09:48:24 GMT Subject: RFR: 8324794: C2 SuperWord: do not ignore reductions in SuperWord::unrolling_analysis In-Reply-To: <-f9_1s2hXm20N3DABo4Hq1dyKzAxRHq3kJ7I_D-PONo=.5566f4f5-e975-4bee-b274-c7a136672c1b@github.com> References: <4Dv9KCdW7B9Ms3a-MO-59PL-3qebNhNJejuuM6LCB0w=.7575de49-9d14-4c12-b95d-8aca28b07a05@github.com> <-f9_1s2hXm20N3DABo4Hq1dyKzAxRHq3kJ7I_D-PONo=.5566f4f5-e975-4bee-b274-c7a136672c1b@github.com> Message-ID: On Sat, 27 Jan 2024 20:37:00 GMT, Vladimir Kozlov wrote: >> Subtask of https://github.com/openjdk/jdk/pull/16620 >> >> Ignoring reductions in unrolling_analysis is unnecessary, and it adds unnecessary dependency of unrolling_analysis on reduction-analysis. That dependency needs to be removed for further refactoring. > > The change matches the subject, but your description is confusing: > >> Ignoring reductions in unrolling_analysis is unnecessary, and it adds unnecessary dependency of unrolling_analysis on reduction-analysis. That dependency needs to be removed for further refactoring. > > The only dependency you are talking about is the `is_marked_reduction()` call in the condition you are removing. Right? > > Do you know why this check was added? @vnkozlov Yes, exactly, the call to `is_marked_reduction`. Other than that, unrolling_analysis could be static, and does not need any information from SuperWord. I'd like to split off unrolling_analysis, and so I'll have to remove the call to `is_marked_reduction`. 
It seems like this was in from the beginning, when Michael Berg added the unrolling_analysis with https://github.com/openjdk/jdk/commit/7c7b91845f94d13b8fed7911be7f933cf0df28d4 I can see no reason stated in the RFE or the code itself. I can only speculate: maybe the idea was that reductions are not profitable, unless there are other nodes, like stores and loads. So if we only find reductions, then we would not adjust the unrolling, since we are not expecting vectorization anyway. Again: only speculation. You reviewed the code in 2015, maybe you still remember the reason ;) ------------- PR Comment: https://git.openjdk.org/jdk/pull/17604#issuecomment-1913535400 From rgiulietti at openjdk.org Sun Jan 28 18:07:55 2024 From: rgiulietti at openjdk.org (Raffaello Giulietti) Date: Sun, 28 Jan 2024 18:07:55 GMT Subject: RFR: 8282365: Consolidate and improve division by constant idealizations [v46] In-Reply-To: References: Message-ID: <33m-Ki1mBcpVfzEeS0xi3MNUmhUwMYjD78akStWJmdo=.fdb5268a-ad38-4c44-8948-f3a347969996@github.com> On Sat, 20 Jan 2024 12:17:04 GMT, Quan Anh Mai wrote: >> This patch implements idealisation for unsigned divisions to change a division by a constant into a series of multiplication and shift. I also change the idealisation of `DivI` to get a more efficient series when the magic constant overflows an int32. >> >> In general, the idea behind a signed division transformation is that for a positive constant `d`, we would need to find constants `c` and `m` so that: >> >> floor(x / d) = floor(x * c / 2**m) for 0 < x < 2**(N - 1) (1) >> ceil(x / d) = floor(x * c / 2**m) + 1 for -2**(N - 1) <= x < 0 (2) >> >> The implementation in the original book takes into consideration that the machine may not be able to perform the full multiplication `x * c`, so the constant overflows and we need to add back the dividend as in `DivLNode::Ideal` cases. However, for int32 division, `x * c` cannot overflow an int64. 
As a result, it is always feasible to just calculate the product and shift the result. >> >> For unsigned multiplication, the situation is somewhat trickier because the condition needs to be twice as strong (the condition (1) and (2) above are mostly the same). This results in the magic constant `c` calculated based on the method presented in Hacker's Delight by Henry S. Warren, Jr. may overflow an uintN. For int division, we can depend on the theorem devised by Arch D. Robison in N-Bit Unsigned Division Via N-Bit Multiply-Add, which states that there exists either: >> >> c1 in uint32 and m1, such that floor(x / d) = floor(x * c1 / 2**m1) for 0 < x < 2**32 (3) >> c2 in uint32 and m2, such that floor(x / d) = floor((x + 1) * c2 / 2**m2) for 0 < x < 2**32 (4) >> >> which means that either `x * c1` never overflows an uint64 or `(x + 1) * c2` never overflows an uint64. And we can perform a full multiplication. >> >> For longs, there is no way to do a full multiplication so we do some basic transformations to achieve a computable formula. The details I have written as comments in the overflow case. >> >> More tests are added to cover the possible patterns. >> >> Please take a look and have some reviews. Thank you very much. > > Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: > > just be simple src/hotspot/share/opto/divconstants.cpp line 192: > 190: // > 191: // floor(x / d) = floor((x + 1) * c / 2**s) for every integer x in [0, 2**W). > 192: // For the record, here's a slightly more general result. As above, let `N >= 0` be an upper bound for the non-negative dividend `n`, that is, `n` in `[0, N]`, and let divisor `d > 1`. 
Further, let `v = floor(N / d) * d`. The following predicates on real number `x` are equivalent: (a) `(1 / d) * (1 - 1 / (v + 1)) <= x < (1 / d)` (b) `floor(n / d) = floor((n + 1) * x)` for all `n` in `[0, v + d)` ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/9947#discussion_r1468911815 From jkarthikeyan at openjdk.org Mon Jan 29 04:39:34 2024 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Mon, 29 Jan 2024 04:39:34 GMT Subject: RFR: 8324655: Identify integer minimum and maximum patterns created with if statements [v2] In-Reply-To: References: Message-ID: On Fri, 26 Jan 2024 16:05:45 GMT, Jasmine Karthikeyan wrote: >> Hi all, I've created this patch which aims to convert common integer minimum and maximum patterns created using if statements into Min and Max nodes. These patterns are usually in the form of `a > b ? a : b` and similar, as well as patterns such as `if (a > b) b = a;`. While this transform doesn't generally improve code generation on its own, it simplifies control flow and creates new opportunities for vectorization. >> >> I've created a benchmark for the PR, and I've attached some data from my (Zen 3) machine:
>>
>>                                           Baseline                     Patch               Improvement
>> Benchmark                   Mode  Cnt    Score    Error  Units     Score    Error  Units
>> IfMinMax.testReductionInt   avgt   15  500.307 ± 16.687  ns/op   509.383 ± 32.645  ns/op   (no change)*
>> IfMinMax.testReductionLong  avgt   15  493.184 ± 17.596  ns/op   513.587 ± 28.339  ns/op   (no change)*
>> IfMinMax.testSingleInt      avgt   15    3.588 ±  0.540  ns/op     2.965 ±  1.380  ns/op   (no change)
>> IfMinMax.testSingleLong     avgt   15    3.673 ±  0.128  ns/op     3.506 ±  0.590  ns/op   (no change)
>> IfMinMax.testVectorInt      avgt   15  340.425 ± 13.123  ns/op    59.689 ±  7.509  ns/op   + 5.7x
>> IfMinMax.testVectorLong     avgt   15  326.420 ± 15.554  ns/op   117.190 ±  5.622  ns/op   + 2.8x
>>
>> * After writing this benchmark I discovered that the compiler doesn't seem to create some simple min/max reductions, even when using Math.min/max() directly.
Is this known or should I create a followup RFE for this? >> >> The patch passes tier 1-3 testing on linux x64. Reviews or comments would be appreciated! > > Jasmine Karthikeyan has updated the pull request incrementally with one additional commit since the last revision: > > Don't transform highly predictable branches > * After writing this benchmark I discovered that the compiler doesn't seem to create some simple min/max reductions, even when using Math.min/max() directly. Is this known or should I create a followup RFE for this? After a bit of digging, I think I'm running into: [JDK-8188313](https://bugs.openjdk.org/browse/JDK-8188313) ------------- PR Comment: https://git.openjdk.org/jdk/pull/17574#issuecomment-1913946831 From varadam at openjdk.org Mon Jan 29 05:51:45 2024 From: varadam at openjdk.org (Varada M) Date: Mon, 29 Jan 2024 05:51:45 GMT Subject: RFR: JDK-8322648: Improve class initialization barrier in TemplateTable::_new for PPC [v2] In-Reply-To: References: Message-ID: > ppc port implementation of https://github.com/openjdk/jdk/pull/17006 > > Fastdebug and Release : build and tier1 testing successful. 
> > JBS Issue : [JDK-8322648](https://bugs.openjdk.org/browse/JDK-8322648) Varada M has updated the pull request incrementally with one additional commit since the last revision: 8322648: Improve class initialization barrier in TemplateTable::_new for PPC ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17518/files - new: https://git.openjdk.org/jdk/pull/17518/files/b84d7d5d..2fb4cc08 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17518&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17518&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/17518.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17518/head:pull/17518 PR: https://git.openjdk.org/jdk/pull/17518 From varadam at openjdk.org Mon Jan 29 06:00:34 2024 From: varadam at openjdk.org (Varada M) Date: Mon, 29 Jan 2024 06:00:34 GMT Subject: RFR: JDK-8322648: Improve class initialization barrier in TemplateTable::_new for PPC [v2] In-Reply-To: References: Message-ID: On Wed, 24 Jan 2024 13:45:35 GMT, Martin Doerr wrote: > You need to adapt the succeeding code and remove the dependent `crnand` instruction. I suggest to use `cmpdi(CCR0, Rscratch, InstanceKlass::fully_initialized);`. Thank you @TheRealMDoerr for the review. I have applied the suggested changes. ------------- PR Comment: https://git.openjdk.org/jdk/pull/17518#issuecomment-1914012379 From mdoerr at openjdk.org Mon Jan 29 06:16:33 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Mon, 29 Jan 2024 06:16:33 GMT Subject: RFR: JDK-8322648: Improve class initialization barrier in TemplateTable::_new for PPC [v2] In-Reply-To: References: Message-ID: On Mon, 29 Jan 2024 05:51:45 GMT, Varada M wrote: >> ppc port implementation of https://github.com/openjdk/jdk/pull/17006 >> >> Fastdebug and Release : build and tier1 testing successful. 
>> >> JBS Issue : [JDK-8322648](https://bugs.openjdk.org/browse/JDK-8322648) > > Varada M has updated the pull request incrementally with one additional commit since the last revision: > > 8322648: Improve class initialization barrier in TemplateTable::_new for PPC Sorry, my comment was wrong. "fully_initialized" check is done in `clinit_barrier`. We only need __ lwz(Rinstance_size, in_bytes(Klass::layout_helper_offset()), RinstanceKlass); __ andi_(R0, Rinstance_size, Klass::_lh_instance_slow_path_bit); __ bne(CCR0, Lslow_case); (See other platforms for reference.) Bugs in this code are likely to cause interpreter performance loss and no test failures. The whole purpose of this change is to improve interpreter performance. So, this should be checked. We should either check the petclinic benchmark referenced in the original issue or at least compare performance using -Xint. ------------- Changes requested by mdoerr (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/17518#pullrequestreview-1847895107 From mdoerr at openjdk.org Mon Jan 29 06:38:36 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Mon, 29 Jan 2024 06:38:36 GMT Subject: RFR: JDK-8322648: Improve class initialization barrier in TemplateTable::_new for PPC [v2] In-Reply-To: References: Message-ID: On Mon, 29 Jan 2024 05:51:45 GMT, Varada M wrote: >> ppc port implementation of https://github.com/openjdk/jdk/pull/17006 >> >> Fastdebug and Release : build and tier1 testing successful. >> >> JBS Issue : [JDK-8322648](https://bugs.openjdk.org/browse/JDK-8322648) > > Varada M has updated the pull request incrementally with one additional commit since the last revision: > > 8322648: Improve class initialization barrier in TemplateTable::_new for PPC src/hotspot/cpu/ppc/templateTable_ppc_64.cpp line 3808: > 3806: // Make sure klass is initialized. 
> 3807: assert(VM_Version::supports_fast_class_init_checks(), "Optimization requires support for fast class initialization checks"); > 3808: __ clinit_barrier(Rcpool, R16_thread, nullptr /*L_fast_path*/, &Lslow_case); `Rcpool` is the wrong parameter. You need to use `RinstanceKlass`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17518#discussion_r1469131963 From vlivanov at openjdk.org Mon Jan 29 06:40:28 2024 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Mon, 29 Jan 2024 06:40:28 GMT Subject: RFR: 8324433: Introduce a way to determine if an expression is evaluated as a constant by the Jit compiler [v7] In-Reply-To: References: Message-ID: On Thu, 25 Jan 2024 14:01:59 GMT, Quan Anh Mai wrote: >> Hi, >> >> This patch introduces `JitCompiler::isConstantExpression` which can be used to statically determine whether an expression has been constant-folded by the Jit compiler, leading to more constant-folding opportunities. For example, it can be used in `MemorySessionImpl::checkValidStateRaw` to eliminate the lifetime check on global sessions without imposing additional branches on other non-global sessions. This is similar to `__builtin_constant_p` in GCC and clang. >> >> Please kindly give your opinion as well as your reviews, thanks very much. > > Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: > > change expr to val, add examples I share Roland's concerns w.r.t. profiling. If there's any code guarded by `isCompileConstant(value) == true`, the only way to trigger its profiling is by deoptimizing from C2-generated code. I added `MHI.isCompileConstant` intrinsic as part of a point fix for a performance problem caused by Java-level code profiling/specialization happening in `java.lang.invoke`. It guards profiling logic which is pruned completely once C2 kicks in. So, absence of profiling is not a problem there. 
Also, there's a constraint on implementation side: the current implementation supports only parse-time folding. If a value turns into a constant later (either during parsing after the call is encountered or during post-parsing phase), it won't have any effect. So, as it is now (both on API and implementation sides) it's hard to correctly use `isCompileConstant` for more general cases. It would be helpful to see more examples illustrating possible usage scenarios. ------------- PR Comment: https://git.openjdk.org/jdk/pull/17527#issuecomment-1914052884 From varadam at openjdk.org Mon Jan 29 06:52:27 2024 From: varadam at openjdk.org (Varada M) Date: Mon, 29 Jan 2024 06:52:27 GMT Subject: RFR: JDK-8322648: Improve class initialization barrier in TemplateTable::_new for PPC [v2] In-Reply-To: References: Message-ID: On Mon, 29 Jan 2024 06:35:38 GMT, Martin Doerr wrote: >> Varada M has updated the pull request incrementally with one additional commit since the last revision: >> >> 8322648: Improve class initialization barrier in TemplateTable::_new for PPC > > src/hotspot/cpu/ppc/templateTable_ppc_64.cpp line 3808: > >> 3806: // Make sure klass is initialized. >> 3807: assert(VM_Version::supports_fast_class_init_checks(), "Optimization requires support for fast class initialization checks"); >> 3808: __ clinit_barrier(Rcpool, R16_thread, nullptr /*L_fast_path*/, &Lslow_case); > > `Rcpool` is the wrong parameter. You need to use `RinstanceKlass`. Thank you for the correction @TheRealMDoerr . I have applied the code change. Tier1 test is running. I will push once it is done. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17518#discussion_r1469140673 From epeter at openjdk.org Mon Jan 29 06:57:42 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 29 Jan 2024 06:57:42 GMT Subject: RFR: 8324752: C2 Superword: remove SuperWordRTDepCheck In-Reply-To: References: <-P8KzeUnHvnu9mv1F2IqxJBZIsx_EJYRvYaR4CsKkCY=.73c6743d-25ea-491d-9be6-70b89a76ae7f@github.com> Message-ID: On Fri, 26 Jan 2024 17:49:51 GMT, Vladimir Kozlov wrote: >> Subtask of https://github.com/openjdk/jdk/pull/16620 >> >> SuperWordRTDepCheck is a debug-only flag, which detects if there are arrays in the same slice that have different bases, i.e. may be different arrays. This could be the basis for alias-analysis. >> >> We should do aliasing-analysis properly in a future RFE ([JDK-8324751](https://bugs.openjdk.org/browse/JDK-8324751)). If we can prove (statically or with a runtime-check) that two arrays are different, then this removes edges from the dependency graph, and may allow vectorization that would otherwise not be possible. > > I did not even know we had such code. > I agree that we should do proper analysis instead of this experimental code (which is off by default and can't be switched on in product). Thanks for the review @vnkozlov @chhagedorn ! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/17585#issuecomment-1914070277 From epeter at openjdk.org Mon Jan 29 06:57:43 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 29 Jan 2024 06:57:43 GMT Subject: Integrated: 8324752: C2 Superword: remove SuperWordRTDepCheck In-Reply-To: <-P8KzeUnHvnu9mv1F2IqxJBZIsx_EJYRvYaR4CsKkCY=.73c6743d-25ea-491d-9be6-70b89a76ae7f@github.com> References: <-P8KzeUnHvnu9mv1F2IqxJBZIsx_EJYRvYaR4CsKkCY=.73c6743d-25ea-491d-9be6-70b89a76ae7f@github.com> Message-ID: <3X5c9lJQK_b4xmTR1uIlnI-VwR5Jfd3U70sy0t8qd6s=.bfa7533f-1773-4036-9c67-423879fe69a5@github.com> On Fri, 26 Jan 2024 10:11:29 GMT, Emanuel Peter wrote: > Subtask of https://github.com/openjdk/jdk/pull/16620 > > SuperWordRTDepCheck is a debug-only flag, which detects if there are arrays in the same slice that have different bases, i.e. may be different arrays. This could be the basis for alias-analysis. > > We should do aliasing-analysis properly in a future RFE ([JDK-8324751](https://bugs.openjdk.org/browse/JDK-8324751)). If we can prove (statically or with a runtime-check) that two arrays are different, then this removes edges from the dependency graph, and may allow vectorization that would otherwise not be possible. This pull request has now been integrated. 
Changeset: 525c0cd0 Author: Emanuel Peter URL: https://git.openjdk.org/jdk/commit/525c0cd09f98c3a9965cf20d2ac3b306a938a910 Stats: 56 lines in 3 files changed: 0 ins; 55 del; 1 mod 8324752: C2 Superword: remove SuperWordRTDepCheck Reviewed-by: kvn, chagedorn ------------- PR: https://git.openjdk.org/jdk/pull/17585 From epeter at openjdk.org Mon Jan 29 07:03:52 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 29 Jan 2024 07:03:52 GMT Subject: RFR: 8324765: C2 SuperWord: remove dead code: SuperWord::insert_extracts [v2] In-Reply-To: References: Message-ID: On Fri, 26 Jan 2024 19:32:54 GMT, Vladimir Kozlov wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> typo fix by Christian >> >> Co-authored-by: Christian Hagedorn > > Good. Thanks for the reviews @vnkozlov @chhagedorn ! ------------- PR Comment: https://git.openjdk.org/jdk/pull/17589#issuecomment-1914075146 From epeter at openjdk.org Mon Jan 29 07:03:52 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 29 Jan 2024 07:03:52 GMT Subject: Integrated: 8324765: C2 SuperWord: remove dead code: SuperWord::insert_extracts In-Reply-To: References: Message-ID: On Fri, 26 Jan 2024 15:14:13 GMT, Emanuel Peter wrote: > Subtask of https://github.com/openjdk/jdk/pull/16620 > > As far as all tests and code study have showed me, SuperWord::insert_extracts is dead. > > I am replacing it with verification code, that checks that no ExtractNode is required. > > **Details** > > All the relevant cases are marked as "unprofitable" in `SuperWord::profitable`, see: > https://github.com/openjdk/jdk/blob/dfdd2174d7af5e3e995147484db17b45b006f6d0/src/hotspot/share/opto/superword.cpp#L1912-L1915 This pull request has now been integrated. 
Changeset: 65d6bc1d Author: Emanuel Peter URL: https://git.openjdk.org/jdk/commit/65d6bc1d4c1054e82ace2355d6802e0a7ba24a7f Stats: 54 lines in 2 files changed: 3 ins; 28 del; 23 mod 8324765: C2 SuperWord: remove dead code: SuperWord::insert_extracts Reviewed-by: kvn, chagedorn ------------- PR: https://git.openjdk.org/jdk/pull/17589 From rehn at openjdk.org Mon Jan 29 07:14:26 2024 From: rehn at openjdk.org (Robbin Ehn) Date: Mon, 29 Jan 2024 07:14:26 GMT Subject: RFR: 8324125: Improve class initialization barrier in TemplateTable::_new for RISC-V [v2] In-Reply-To: References: <_TyDNsY_znk2GR9l7rpALqDOuwMwj4hkuF5kKcyJBQg=.1d9f5448-ad18-49bf-9fde-a68e88626470@github.com> Message-ID: On Thu, 25 Jan 2024 15:40:51 GMT, Gui Cao wrote: >> Hi, This RISC-V Port implementation for https://github.com/openjdk/jdk/pull/17006, >> >> ### Testing: >> >> - [x] Run tier1-3 tests on qemu 8.1.0 with UseRVV (fastdebug) >> - [x] Run tier1-3 tests with SiFive unmatched (release) > > Gui Cao has updated the pull request incrementally with one additional commit since the last revision: > > Put the msg string on a separate line Thanks! ------------- Marked as reviewed by rehn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/17548#pullrequestreview-1847959545 From epeter at openjdk.org Mon Jan 29 07:21:33 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 29 Jan 2024 07:21:33 GMT Subject: RFR: 8324775: C2 SuperWord: refactor visited sets [v3] In-Reply-To: References: Message-ID: <-CgLh4Ec10jrTIKKSTazUKLbNDEVIUwWXhlBJWCY1dM=.da61d36a-8ad6-4068-a452-236bbaca9814@github.com> > Subtask of https://github.com/openjdk/jdk/pull/16620 > > Currently, the visited set is a "global" set that is reused all through SuperWord: > `_visited`, `_post_visited`, and also `_stk`. > > This makes other refactorings difficult. I am refactoring all related code to make the visited sets local, with use of ResourceMark. 
> > I am also refactoring the `independent` queries: from using recursive function calls to iterative. This is necessary, unless we want to pass the visited set around into the recursive calls (I think not!) > > At the same time, I wanted the code of `find_dependence` to become easier, and closer to `independent`, hence I re-defined it to `mutually_independent`, and had to slightly adapt also its usages. Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains four commits: - manual merge because lines were too close - comment improvement by Vladimir - more removal - 8324775 ------------- Changes: https://git.openjdk.org/jdk/pull/17594/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=17594&range=02 Stats: 129 lines in 2 files changed: 27 ins; 55 del; 47 mod Patch: https://git.openjdk.org/jdk/pull/17594.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17594/head:pull/17594 PR: https://git.openjdk.org/jdk/pull/17594 From epeter at openjdk.org Mon Jan 29 07:48:28 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 29 Jan 2024 07:48:28 GMT Subject: RFR: 8318446: C2: optimize stores into primitive arrays by combining values into larger store In-Reply-To: References: Message-ID: On Tue, 16 Jan 2024 23:08:37 GMT, Vladimir Kozlov wrote: >> This is a feature requested by @RogerRiggs and @cl4es . >> >> **Idea** >> >> Merging multiple consecutive small stores (e.g. 8 byte stores) into larger stores (e.g. one long store) can lead to speedup. Recently, @cl4es and @RogerRiggs had to review a few PRs where people would try to get speedups by using Unsafe (e.g. `Unsafe.putLongUnaligned`), or ByteArrayLittleEndian (e.g. `ByteArrayLittleEndian.setLong`). They have asked if we can do such an optimization in C2, rather than in the Java library code, or even user code. >> >> This patch here supports a few simple use-cases, like these: >> >> Merge consecutive array stores, with constants.
We can combine the separate constants into a larger constant: >> https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/test/hotspot/jtreg/compiler/c2/TestMergeStores.java#L383-L395 >> >> Merge consecutive array stores, with a variable that was split (using shifts). We can essentially undo the splitting (i.e. shifting and truncation), and directly store the variable: >> https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/test/hotspot/jtreg/compiler/c2/TestMergeStores.java#L444-L456 >> >> The idea is that this would allow the introduction of a very simple API, without any "heavy" dependencies (Unsafe or ByteArrayLittleEndian): >> >> https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/test/hotspot/jtreg/compiler/c2/TestMergeStores.java#L327-L338 >> https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/test/hotspot/jtreg/compiler/c2/TestMergeStores.java#L467-L472 >> >> **Details** >> >> This draft currently implements the optimization in an additional special IGVN phase: >> https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/src/hotspot/share/opto/compile.cpp#L2479-L2485 >> >> We first collect all `StoreB|C|I`, and put them in the IGVN worklist (see `Compile::gather_nodes_for_merge_stores`). During IGVN, we call `StoreNode::Ideal_merge_stores` at the end (after all other optimizations) of `StoreNode::Ideal`. We essentially try to establish a chain of mergable stores: >> >> https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/src/hotspot/share/opto/memnode.cpp#L2802-L2806 >> >> Mergable stores must have the same Opcode (implies they have the same element type and hence size). Further, mergable stores must have the same control (or be separated by only a RangeCheck). Further, they must either bot... > > About changes. May be you can use something similar to ClearArrayNode. 
Collect all stores into one node and corresponding Mach (machine) nodes will implement it using available instructions instead of C2 deciding the size of combined store. > > One drawback for these changes I see is that you may use a lot more registers to keep all values. > > For constants you need to keep in mind the order of memory (little or big endian). @vnkozlov exactly, it is some kind of straight-line vectorization, at least if we generalized the pattern. And I asked about that earlier: > @merykitty @cl4es @RogerRiggs @vnkozlov I wonder if you think that the approach of this PR is good, and if you have any suggestions about it? >Is a separate phase ok? >Is this PR in a sweet-spot that reaches the goals of the library-folks, but is not too complex? Would you prefer a more general solution, like a straight-line SLP algorithm, that can merge (even vectorize) any load / store sequences, even merge accesses with different element sizes and with gaps/padding? Currently, we only vectorize loops. But this patch here also optimizes straight-line code. Of course performance mostly matters in loops, but sometimes not everything gets inlined, and then a straight-line optimization is still helpful. It would be nice to include similar patterns into the auto-vectorizer. But it would require detecting patterns that cross vector-lanes, and we currently only do that for reductions. And for storing constants, we currently also only allow if all lanes use the same value, and it can be broadcast. I have been thinking about all these things, but I will have to see what is feasible, and I think they are a bit lower priority for now. TLDR: this RFE here deals with straight-line code, the vectorizer deals (so far) only with loops. So they are 2 optimizations that can stand on their own.
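For readers following along, here is a minimal Java sketch of the kind of straight-line store pattern discussed above: a `long` split by shifts into eight consecutive byte stores. The class and method names are hypothetical and are not taken from the PR or its test; whether a given JIT actually merges these stores depends on the patch and the platform. The sketch only shows that the byte-wise form is equivalent to a single little-endian 8-byte store.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

public class MergeStoresSketch {
    // Hypothetical helper: eight adjacent byte stores of one long, least
    // significant byte first. This is the shape a store-merging optimization
    // could, in principle, turn into one 8-byte store on a little-endian target.
    public static void putLongLE(byte[] a, int offset, long v) {
        a[offset + 0] = (byte) (v >>> 0);
        a[offset + 1] = (byte) (v >>> 8);
        a[offset + 2] = (byte) (v >>> 16);
        a[offset + 3] = (byte) (v >>> 24);
        a[offset + 4] = (byte) (v >>> 32);
        a[offset + 5] = (byte) (v >>> 40);
        a[offset + 6] = (byte) (v >>> 48);
        a[offset + 7] = (byte) (v >>> 56);
    }

    // Reference: one logical 8-byte little-endian store through ByteBuffer.
    public static void putLongReference(byte[] a, int offset, long v) {
        ByteBuffer.wrap(a).order(ByteOrder.LITTLE_ENDIAN).putLong(offset, v);
    }

    public static void main(String[] args) {
        byte[] x = new byte[16];
        byte[] y = new byte[16];
        long v = 0x0102030405060708L;
        putLongLE(x, 4, v);
        putLongReference(y, 4, v);
        // Both arrays hold the same bytes, so this prints "true".
        System.out.println(Arrays.equals(x, y));
    }
}
```

The byte order matters here, as noted above for constants: on a big-endian target the same eight stores would correspond to a byte-reversed value rather than a plain 8-byte store.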
------------- PR Comment: https://git.openjdk.org/jdk/pull/16245#issuecomment-1914131085 From chagedorn at openjdk.org Mon Jan 29 08:07:38 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 29 Jan 2024 08:07:38 GMT Subject: RFR: 8317572: C2 SuperWord: refactor/improve TraceSuperWord, replace VectorizeDebugOption with TraceAutoVectorization [v7] In-Reply-To: References: Message-ID: On Sat, 27 Jan 2024 05:39:51 GMT, Emanuel Peter wrote: >> Subtask of https://github.com/openjdk/jdk/pull/16620 >> >> I got approval to remove VectorizeDebugOption: [JDK-8320668](https://bugs.openjdk.org/browse/JDK-8320668) >> >> I want a more general flag for AutoVectorization, that can trace different components of AutoVectorization. >> It should be a CompileCommand, so that it can select which methods it traces for. >> >> TraceSuperWord should still look similar, and select a subset of the TraceAutoVectorization components (those for SuperWord), but still apply to all classes/methods. >> >> With more refactoring later in [JDK-8315361](https://bugs.openjdk.org/browse/JDK-8315361), this flag should become more usable and interpretable. Especially, the idea is that different components of the `VLoop / VLoopAnalyzer` can have tracing enabled / disabled. >> >> **How to use the flag:** >> Get "help", i.e. see all available tags: >> `./java -Xcomp -XX:CompileCommand=TraceAutoVectorization,*::*,help --version` >> >> See "rejections" (i.e. failures where we don't vectorize) and successes (using TraceNewVectors): >> `./java -Xcomp -XX:CompileCommand=TraceAutoVectorization,*::*,SW_REJECTIONS -XX:+TraceNewVectors --version` >> The results are currently underwhelming. I will have to track many more failures, and I will do that with the bigger refactoring, when I move around the code and require error code returning everywhere, and then I can use that error code for printing.
> > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > fix missing include Looks good, thanks for the updates! src/hotspot/share/opto/traceAutoVectorizationTag.hpp line 26: > 24: > 25: #ifndef SHARE_OPTO_TRACE_AUTO_VECTORIZATION_TAG_HPP > 26: #define SHARE_OPTO_TRACE_AUTO_VECTORIZATION_TAG_HPP I think for this define, you should keep `SHARE_OPTO_TRACEAUTOVECTORIZATIONTAG_HPP` to follow the convention of other files where we do not insert underlines in the filename. ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/17586#pullrequestreview-1848009572 PR Review Comment: https://git.openjdk.org/jdk/pull/17586#discussion_r1469188205 From shami.thoke at gmail.com Mon Jan 29 08:15:25 2024 From: shami.thoke at gmail.com (shami) Date: Mon, 29 Jan 2024 13:45:25 +0530 Subject: Difference between [jdk20] Thread.ensureMaterializedForStackWalk and Blackhole. Message-ID: Hello, I am trying to understand the JDK20 intrinsic - Thread.ensureMaterializedForStackWalk ( https://github.com/openjdk/jdk/pull/10952/files). It seems to be functionally equivalent to the *Blackhole.consume* intrinsic. Is there any subtle difference(s) between the two, or can one be implemented using the other? Thanks in advance. Shami.
From tholenstein at openjdk.org Mon Jan 29 08:39:44 2024 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Mon, 29 Jan 2024 08:39:44 GMT Subject: RFR: JDK-8210858: AArch64: remove Math.log intrinsic [v2] In-Reply-To: References: <7Zgt8VOxl48Niocueb70XwxmYIAIaEZUlxKGvPP6BnU=.e354f71c-d6c6-40b9-a93e-98ef7ba8009d@github.com> <-tZP1uJ_KakJpOgQT68dO5pqFOJ61YU-CV9-r2lzPz8=.894eac3e-ceaa-4908-8c67-9859caee6046@github.com> Message-ID: On Mon, 22 Jan 2024 12:43:35 GMT, Aleksey Shipilev wrote: >>> I think the discussion on merits of removing the StubRoutines can continue in the relevant RFE. >> >> So move that discussion to https://bugs.openjdk.org/browse/JDK-8324296 ? > >> > I think the discussion on merits of removing the StubRoutines can continue in the relevant RFE. >> So move that discussion to https://bugs.openjdk.org/browse/JDK-8324296 ? > > Yes! Thanks for the reviews @shipilev and @nick-arm ! ------------- PR Comment: https://git.openjdk.org/jdk/pull/17480#issuecomment-1914205972 From tholenstein at openjdk.org Mon Jan 29 08:39:45 2024 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Mon, 29 Jan 2024 08:39:45 GMT Subject: Integrated: JDK-8210858: AArch64: remove Math.log intrinsic In-Reply-To: <7Zgt8VOxl48Niocueb70XwxmYIAIaEZUlxKGvPP6BnU=.e354f71c-d6c6-40b9-a93e-98ef7ba8009d@github.com> References: <7Zgt8VOxl48Niocueb70XwxmYIAIaEZUlxKGvPP6BnU=.e354f71c-d6c6-40b9-a93e-98ef7ba8009d@github.com> Message-ID: On Thu, 18 Jan 2024 08:58:20 GMT, Tobias Holenstein wrote: > [JDK-8215133](https://bugs.openjdk.org/browse/JDK-8215133) disabled vmIntrinsics::_dlog. Remove it now > > ### Why remove > > The Java specification says: > > "The computed result must be within 1 ulp of the exact result. Results must be semi-monotonic...
whenever the mathematical function is non-decreasing, so is the floating-point approximation, likewise, whenever the mathematical function is non-increasing, so is the floating-point approximation" > > There is no proof of the monotonicity of this intrinsics at the moment. This pull request has now been integrated. Changeset: 422020c4 Author: Tobias Holenstein URL: https://git.openjdk.org/jdk/commit/422020c4d691f3ad4c7af4fc2c60e7ada66734e0 Stats: 400 lines in 4 files changed: 0 ins; 395 del; 5 mod 8210858: AArch64: remove Math.log intrinsic Reviewed-by: ngasson, shade ------------- PR: https://git.openjdk.org/jdk/pull/17480 From chagedorn at openjdk.org Mon Jan 29 08:43:24 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 29 Jan 2024 08:43:24 GMT Subject: RFR: 8324750: C2: rename Matcher methods using "superword" -> "autovectorization" [v2] In-Reply-To: References: Message-ID: On Sat, 27 Jan 2024 05:47:47 GMT, Emanuel Peter wrote: >> Subtask of https://github.com/openjdk/jdk/pull/16620 >> >> Matcher::superword_max_vector_size -> Matcher::max_vector_size_autovectorization >> >> Matcher::match_rule_supported_superword -> match_rule_supported_autovectorization >> >> This is to make the naming more general, since these methods can be used by any autovectorizer in the future. > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > underscore for Christian Thanks for the update, looks good! ------------- Marked as reviewed by chagedorn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/17583#pullrequestreview-1848100435 From epeter at openjdk.org Mon Jan 29 08:49:40 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 29 Jan 2024 08:49:40 GMT Subject: RFR: 8324775: C2 SuperWord: refactor visited sets [v2] In-Reply-To: References: Message-ID: On Sat, 27 Jan 2024 20:24:44 GMT, Vladimir Kozlov wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> comment improvement by Vladimir > > Good. Thanks @vnkozlov @chhagedorn for the reviews! ------------- PR Comment: https://git.openjdk.org/jdk/pull/17594#issuecomment-1914218190 From epeter at openjdk.org Mon Jan 29 08:49:41 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 29 Jan 2024 08:49:41 GMT Subject: Integrated: 8324775: C2 SuperWord: refactor visited sets In-Reply-To: References: Message-ID: On Fri, 26 Jan 2024 16:53:46 GMT, Emanuel Peter wrote: > Subtask of https://github.com/openjdk/jdk/pull/16620 > > Currently, the visited set is a "global" set that is reused all through SuperWord: > `_visited`, `_post_visited`, and also `_stk`. > > This makes other refactorings difficult. I am refactoring all related code to make the visited sets local, with use of ResourceMark. > > I am also refactoring the `independent` queries: from using recursive function calls to iterative. This is necessary, unless we want to pass the visited set around into the recursive calls (I think not!) > > At the same time, I wanted the code of `find_dependence` to become easier, and closer to `independent`, hence I re-defined it to `mutually_independent`, and had to slightly adapt also its usages. This pull request has now been integrated. 
Changeset: 6ad78ca8 Author: Emanuel Peter URL: https://git.openjdk.org/jdk/commit/6ad78ca8a5956d4ada6fd0bedebadddb5f6a0edc Stats: 129 lines in 2 files changed: 27 ins; 55 del; 47 mod 8324775: C2 SuperWord: refactor visited sets Reviewed-by: kvn, chagedorn ------------- PR: https://git.openjdk.org/jdk/pull/17594 From chagedorn at openjdk.org Mon Jan 29 08:50:27 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 29 Jan 2024 08:50:27 GMT Subject: RFR: 8324794: C2 SuperWord: do not ignore reductions in SuperWord::unrolling_analysis In-Reply-To: <4Dv9KCdW7B9Ms3a-MO-59PL-3qebNhNJejuuM6LCB0w=.7575de49-9d14-4c12-b95d-8aca28b07a05@github.com> References: <4Dv9KCdW7B9Ms3a-MO-59PL-3qebNhNJejuuM6LCB0w=.7575de49-9d14-4c12-b95d-8aca28b07a05@github.com> Message-ID: On Sat, 27 Jan 2024 06:12:40 GMT, Emanuel Peter wrote: > Subtask of https://github.com/openjdk/jdk/pull/16620 > > Ignoring reductions in unrolling_analysis is unnecessary, and it adds unnecessary dependency of unrolling_analysis on reduction-analysis. That dependency needs to be removed for further refactoring. That looks reasonable. ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/17604#pullrequestreview-1848112822 From epeter at openjdk.org Mon Jan 29 08:52:09 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 29 Jan 2024 08:52:09 GMT Subject: RFR: 8317572: C2 SuperWord: refactor/improve TraceSuperWord, replace VectorizeDebugOption with TraceAutoVectorization [v8] In-Reply-To: References: Message-ID: > Subtask of https://github.com/openjdk/jdk/pull/16620 > > I got approval to remove VectorizeDebugOption: [JDK-8320668](https://bugs.openjdk.org/browse/JDK-8320668) > > I want a more general flag for AutoVectorization, that can trace different components of AutoVectorization. > It should be a CompileCommand, so that it can select which methods it traces for. 
> > TraceSuperWord should still look similar, and select a subset of the TraceAutoVectorization components (those for SuperWord), but still apply to all classes/methods. > > With more refactoring later in [JDK-8315361](https://bugs.openjdk.org/browse/JDK-8315361), this flag should become more usable and interpretable. Especially, the idea is that different components of the `VLoop / VLoopAnalyzer` can have tracing enabled / disabled. > > **How to use the flag:** > Get "help", i.e. see all available tags: > `./java -Xcomp -XX:CompileCommand=TraceAutoVectorization,*::*,help --version` > > See "rejections" (i.e. failures where we don't vectorize) and successes (using TraceNewVectors): > `./java -Xcomp -XX:CompileCommand=TraceAutoVectorization,*::*,SW_REJECTIONS -XX:+TraceNewVectors --version` > The results are currently underwhelming. I will have to track many more failures, and I will do that with the bigger refactoring, when I move around the code and require error code returning everywhere, and then I can use that error code for printing. Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 11 additional commits since the last revision: - Merge branch 'master' into JDK-8317572 - adjust hpp guards for Christian - fix missing include - add PRODUCT and COMPILER2 ifdefs for Vladimir - fix a test - move code to StringUtils::CommaSeparatedStringIterator - more for Christian - Apply suggestions from code review Co-authored-by: Christian Hagedorn - a bit more - reordering some things - ...
and 1 more: https://git.openjdk.org/jdk/compare/e6b2c16b...bf5e8263 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17586/files - new: https://git.openjdk.org/jdk/pull/17586/files/28e0893f..bf5e8263 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17586&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17586&range=06-07 Stats: 3114 lines in 181 files changed: 1899 ins; 758 del; 457 mod Patch: https://git.openjdk.org/jdk/pull/17586.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17586/head:pull/17586 PR: https://git.openjdk.org/jdk/pull/17586 From epeter at openjdk.org Mon Jan 29 08:52:09 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 29 Jan 2024 08:52:09 GMT Subject: RFR: 8317572: C2 SuperWord: refactor/improve TraceSuperWord, replace VectorizeDebugOption with TraceAutoVectorization [v7] In-Reply-To: References: Message-ID: On Sat, 27 Jan 2024 05:39:51 GMT, Emanuel Peter wrote: >> Subtask of https://github.com/openjdk/jdk/pull/16620 >> >> I got approval to remove VectorizeDebugOption: [JDK-8320668](https://bugs.openjdk.org/browse/JDK-8320668) >> >> I want a more general flag for AutoVectorization, that can trace different components of AutoVectorization. >> It should be a CompileCommand, so that it can select which methods it traces for. >> >> TraceSuperWord should still look similar, and select a subset of the TraceAutoVectorization components (those for SuperWord), but still apply to all classes/methods. >> >> With more refactoring later in [JDK-8315361](https://bugs.openjdk.org/browse/JDK-8315361), this flag should become more usable and interpretable. Especially, the idea is that different components of the `VLoop / VLoopAnalyzer` can have tracing enabled / disabled. >> >> **How to use the flag:** >> Get "help", i.e. see all available tags: >> `./java -Xcomp -XX:CompileCommand=TraceAutoVectorization,*::*,help --version` >> >> See "rejections" (i.e. 
failures where we don't vectorize) and successes (using TraceNewVectors): >> `./java -Xcomp -XX:CompileCommand=TraceAutoVectorization,*::*,SW_REJECTIONS -XX:+TraceNewVectors --version` >> The results are currently underwhelming. I will have to track many more failures, and I will do that with the bigger refactoring, when I move around the code and require error code returning everywhere, and then I can use that error code for printing. > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > fix missing include src/hotspot/share/opto/traceAutoVectorizationTag.hpp line 26: > 24: > 25: #ifndef SHARE_OPTO_TRACE_AUTO_VECTORIZATION_TAG_HPP > 26: #define SHARE_OPTO_TRACE_AUTO_VECTORIZATION_TAG_HPP Suggestion: #ifndef SHARE_OPTO_TRACEAUTOVECTORIZATIONTAG_HPP #define SHARE_OPTO_TRACEAUTOVECTORIZATIONTAG_HPP src/hotspot/share/opto/traceAutoVectorizationTag.hpp line 165: > 163: }; > 164: > 165: #endif // SHARE_OPTO_TRACE_AUTO_VECTORIZATION_TAG_HPP Suggestion: #endif // SHARE_OPTO_TRACEAUTOVECTORIZATIONTAG_HPP ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17586#discussion_r1469249434 PR Review Comment: https://git.openjdk.org/jdk/pull/17586#discussion_r1469249654 From epeter at openjdk.org Mon Jan 29 08:53:40 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 29 Jan 2024 08:53:40 GMT Subject: RFR: 8324750: C2: rename Matcher methods using "superword" -> "autovectorization" [v2] In-Reply-To: References: Message-ID: On Mon, 29 Jan 2024 08:40:42 GMT, Christian Hagedorn wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> underscore for Christian > > Thanks for the update, looks good! Thanks for the reviews @chhagedorn @vnkozlov !
------------- PR Comment: https://git.openjdk.org/jdk/pull/17583#issuecomment-1914225498 From epeter at openjdk.org Mon Jan 29 08:53:41 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 29 Jan 2024 08:53:41 GMT Subject: Integrated: 8324750: C2: rename Matcher methods using "superword" -> "autovectorization" In-Reply-To: References: Message-ID: On Fri, 26 Jan 2024 09:53:53 GMT, Emanuel Peter wrote: > Subtask of https://github.com/openjdk/jdk/pull/16620 > > Matcher::superword_max_vector_size -> Matcher::max_vector_size_autovectorization > > Matcher::match_rule_supported_superword -> match_rule_supported_autovectorization > > This is to make the naming more general, since these methods can be used by any autovectorizer in the future. This pull request has now been integrated. Changeset: f0bae793 Author: Emanuel Peter URL: https://git.openjdk.org/jdk/commit/f0bae7939a61a79f3e07de97451c433e91742069 Stats: 34 lines in 12 files changed: 0 ins; 0 del; 34 mod 8324750: C2: rename Matcher methods using "superword" -> "autovectorization" Reviewed-by: kvn, chagedorn ------------- PR: https://git.openjdk.org/jdk/pull/17583 From dlunden at openjdk.org Mon Jan 29 09:10:54 2024 From: dlunden at openjdk.org (Daniel =?UTF-8?B?THVuZMOpbg==?=) Date: Mon, 29 Jan 2024 09:10:54 GMT Subject: RFR: 8322996: BoxLockNode creation fails with assert(reg < CHUNK_SIZE) failed: sanity [v8] In-Reply-To: References: Message-ID: <3IFFwKEbNZ0aaAzNw-6xsuZcE6eBW8yj4YkEa0U7VmA=.9bd3d7f3-5930-4610-ae91-a1ba4d4b2714@github.com> > This changeset fixes an issue where deeply nested synchronized statements triggered an assert in C2. > > Changes: > - Bail out on compilation when we create a `BoxLockNode` with a slot index that cannot fit in a `RegMask`. 
This is similar to how we handle the case when we do not have space to represent arguments in [`opto/matcher.cpp`](https://github.com/openjdk/jdk/blob/58b01dce054c50bcb5a28aad4c1b574acaa90f6d/src/hotspot/share/opto/matcher.cpp#L314-L318) > - Generalize `RegMask::can_represent` to take an additional and optional size argument to facilitate reuse. The default size value, 1, corresponds to the previous functionality. Rewrite `can_represent_arg` to directly call `can_represent(reg, SlotsPerVecZ)`. > - Add a regression test. > > Testing: > - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/7671871191) > - tier1 to tier5 on windows-x64, linux-x64, linux-aarch64, macosx-x64, and macosx-aarch64. > - The new regression test in all tier1 to tier10 contexts on windows-x64, linux-x64, linux-aarch64, macosx-x64, and macosx-aarch64. Daniel Lundén has updated the pull request incrementally with one additional commit since the last revision: Fix last copyright ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17370/files - new: https://git.openjdk.org/jdk/pull/17370/files/ebe23a23..8100a3e5 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17370&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17370&range=06-07 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/17370.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17370/head:pull/17370 PR: https://git.openjdk.org/jdk/pull/17370 From dlunden at openjdk.org Mon Jan 29 09:10:54 2024 From: dlunden at openjdk.org (Daniel =?UTF-8?B?THVuZMOpbg==?=) Date: Mon, 29 Jan 2024 09:10:54 GMT Subject: RFR: 8322996: BoxLockNode creation fails with assert(reg < CHUNK_SIZE) failed: sanity [v4] In-Reply-To: References: Message-ID: On Wed, 24 Jan 2024 13:55:56 GMT, Roberto Castañeda Lozano wrote: >>> The fix itself looks good to me.
Would it make sense, for better coverage, to add a couple of additional test cases that exercise the boundaries of the condition that is tested? E.g. one with one `synchronized` statement less than the current one and one with one `synchronized` statement more. >> >> I have experimented with such test cases (various edge cases) and as a result found a related (but separate) issue from this one. I was planning to add these additional tests for that separate issue, to not introduce unnecessary test failures before that fix is integrated. Maybe it is better to add the additional tests directly as part of this changeset instead? > >> I have experimented with such test cases (various edge cases) and as a result found a related (but separate) issue from this one. I was planning to add these additional tests for that separate issue, to not introduce unnecessary test failures before that fix is integrated. Maybe it is better to add the additional tests directly as part of this changeset instead? > > If the additional tests trigger failures after this fix is applied, I would suggest including them as part of the fix to the separate issue. Thanks @robcasloz and @vnkozlov. I've now rerun tests and the PR is ready for integration. Please sponsor! 
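For the boundary test cases discussed above (one `synchronized` statement fewer or more than the current test), the test sources could be generated instead of written by hand. The following is only a minimal sketch under stated assumptions: `NestedSyncGenerator` and its methods are hypothetical names and are not part of this PR, and the nesting depth at which the bailout triggers is not stated in this thread, so it is left as a parameter.

```java
// Hypothetical helper (not part of the PR): emits Java source for a class whose
// single method contains `depth` nested synchronized statements.
public final class NestedSyncGenerator {

    static String generate(String className, int depth) {
        StringBuilder sb = new StringBuilder();
        sb.append("public class ").append(className).append(" {\n");
        sb.append("    static void test() {\n");
        // Open `depth` nested synchronized statements, indenting one level per nest.
        for (int i = 0; i < depth; i++) {
            sb.append("    ".repeat(2 + i))
              .append("synchronized (").append(className).append(".class) {\n");
        }
        // Close them again, innermost first.
        for (int i = depth - 1; i >= 0; i--) {
            sb.append("    ".repeat(2 + i)).append("}\n");
        }
        sb.append("    }\n");
        sb.append("}\n");
        return sb.toString();
    }

    // Counts non-overlapping occurrences of `needle` in `text`.
    static int countOccurrences(String text, String needle) {
        int count = 0;
        for (int at = text.indexOf(needle); at >= 0;
             at = text.indexOf(needle, at + needle.length())) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // Depth defaults to a small value; pass the depth under test as the first argument.
        int depth = args.length > 0 ? Integer.parseInt(args[0]) : 3;
        System.out.print(generate("NestedSync", depth));
    }
}
```

Generating the sources for depths N-1, N, and N+1 from one helper would keep the edge-case tests in sync if the limit changes; whether that is preferable to checked-in test files is a judgment call for the follow-up issue.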
------------- PR Comment: https://git.openjdk.org/jdk/pull/17370#issuecomment-1914254957 From chagedorn at openjdk.org Mon Jan 29 09:16:35 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 29 Jan 2024 09:16:35 GMT Subject: RFR: 8324213: C1: There is no need for Canonicalizer to handle IfOp [v2] In-Reply-To: <5PJm9P2Eyap_zvmmVIGpbuyMAY9M4BMJUOouWKndftc=.f1924da6-a1e2-453c-81c2-0dbb81935b9f@github.com> References: <5PJm9P2Eyap_zvmmVIGpbuyMAY9M4BMJUOouWKndftc=.f1924da6-a1e2-453c-81c2-0dbb81935b9f@github.com> Message-ID: <2jKwZuyGvqW3yiK3iLXgTdu8r9Z6eV3U7Ptkmnt649M=.17bcf185-aff1-4eb7-82bf-8ae1ad7ee453@github.com> On Fri, 19 Jan 2024 23:37:36 GMT, Denghui Dong wrote: >> IfOp will not be created when building the graph, so there is no need for Canonicalizer to handle IfOp. > > Denghui Dong has updated the pull request incrementally with one additional commit since the last revision: > > update Marked as reviewed by chagedorn (Reviewer). src/hotspot/share/c1/c1_Canonicalizer.cpp line 472: > 470: > 471: void Canonicalizer::do_IfOp(IfOp* x) { > 472: ShouldNotReachHere(); Looks good, but maybe add a comment here why we cannot visit `IfOp` here (i.e. that `IfOp` is not created by the `GraphBuilder` but only later when eliminating conditional expressions with `CE_Eliminator`). ------------- PR Review: https://git.openjdk.org/jdk/pull/17499#pullrequestreview-1848163154 PR Review Comment: https://git.openjdk.org/jdk/pull/17499#discussion_r1469280880 From dlunden at openjdk.org Mon Jan 29 09:17:42 2024 From: dlunden at openjdk.org (Daniel =?UTF-8?B?THVuZMOpbg==?=) Date: Mon, 29 Jan 2024 09:17:42 GMT Subject: Integrated: 8322996: BoxLockNode creation fails with assert(reg < CHUNK_SIZE) failed: sanity In-Reply-To: References: Message-ID: On Thu, 11 Jan 2024 10:19:12 GMT, Daniel Lundén wrote: > This changeset fixes an issue where deeply nested synchronized statements triggered an assert in C2.
> > Changes: > - Bail out on compilation when we create a `BoxLockNode` with a slot index that cannot fit in a `RegMask`. This is similar to how we handle the case when we do not have space to represent arguments in [`opto/matcher.cpp`](https://github.com/openjdk/jdk/blob/58b01dce054c50bcb5a28aad4c1b574acaa90f6d/src/hotspot/share/opto/matcher.cpp#L314-L318) > - Generalize `RegMask::can_represent` to take an additional and optional size argument to facilitate reuse. The default size value, 1, corresponds to the previous functionality. Rewrite `can_represent_arg` to directly call `can_represent(reg, SlotsPerVecZ)`. > - Add a regression test. > > Testing: > - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/7671871191) > - tier1 to tier5 on windows-x64, linux-x64, linux-aarch64, macosx-x64, and macosx-aarch64. > - The new regression test in all tier1 to tier10 contexts on windows-x64, linux-x64, linux-aarch64, macosx-x64, and macosx-aarch64. This pull request has now been integrated. Changeset: 69586e7b Author: Daniel Lundén Committer: Roberto Castañeda Lozano URL: https://git.openjdk.org/jdk/commit/69586e7bdffe1a840c3a86e6ec83568de24c6fe5 Stats: 266 lines in 6 files changed: 252 ins; 1 del; 13 mod 8322996: BoxLockNode creation fails with assert(reg < CHUNK_SIZE) failed: sanity Reviewed-by: rcastanedalo, kvn ------------- PR: https://git.openjdk.org/jdk/pull/17370 From aph-open at littlepinkcloud.com Mon Jan 29 09:19:52 2024 From: aph-open at littlepinkcloud.com (Andrew Haley) Date: Mon, 29 Jan 2024 09:19:52 +0000 Subject: RFR: 8324655: Identify integer minimum and maximum patterns created with if statements In-Reply-To: References: Message-ID: On 1/26/24 09:55, Aleksey Shipilev wrote: > On Thu, 25 Jan 2024 18:59:23 GMT, Jasmine Karthikeyan wrote: > >> Ah true, I hadn't considered that- do you think it makes sense to only do the transform if the if statement isn't highly predictable?
> > Yeah, I think if this is effectively translating branches to cmovs, it should be gated by cmov conversion heuristics somehow. Not sure how to do this cleanly, given the choice for cmov-s for min/max is done only later in matching rules. I believe the performance of branch predictors has improved so much that cmov is of little benefit in most cases. Even when we have recorded 50/50 branching for true/false, we still don't know much about how well a branch will be predicted. Having said that, max and min require both arguments to be fully evaluated, so it's not quite such a big deal. From chagedorn at openjdk.org Mon Jan 29 09:21:35 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 29 Jan 2024 09:21:35 GMT Subject: RFR: 8324667: fold Parse::seems_stable_comparison() [v2] In-Reply-To: References: Message-ID: On Thu, 25 Jan 2024 19:05:50 GMT, Joshua Cao wrote: >> The function has a long comment block that seems irrelevant since https://github.com/openjdk/jdk/commit/8bd4b5624c6ece31d965259aadc290a24d44423a. We can just fold away this method. It only has one caller. >> >> >> passes GHA > > Joshua Cao has updated the pull request incrementally with two additional commits since the last revision: > > - Update copyright for parse.hpp > - Remove seems_stable_comparison() from header and remove copyright Looks good! ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/17573#pullrequestreview-1848173890 From chagedorn at openjdk.org Mon Jan 29 09:23:35 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 29 Jan 2024 09:23:35 GMT Subject: RFR: 8324236: compiler/ciReplay/TestInliningProtectionDomain.java failed with RuntimeException: should only dump inline information for ... 
expected true, was false In-Reply-To: References: Message-ID: On Fri, 26 Jan 2024 21:42:27 GMT, Christian Hagedorn wrote: > The test failed when trying to match the compile id of the interesting method by looking for "` > > But it first found the earlier line: > > > > which unfortunately also matched "` > Thanks, > Christian Thanks Vladimir for your review! ------------- PR Comment: https://git.openjdk.org/jdk/pull/17598#issuecomment-1914278059 From chagedorn at openjdk.org Mon Jan 29 09:23:36 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 29 Jan 2024 09:23:36 GMT Subject: Integrated: 8324236: compiler/ciReplay/TestInliningProtectionDomain.java failed with RuntimeException: should only dump inline information for ... expected true, was false In-Reply-To: References: Message-ID: On Fri, 26 Jan 2024 21:42:27 GMT, Christian Hagedorn wrote: > The test failed when trying to match the compile id of the interesting method by looking for "` > > But it first found the earlier line: > > > > which unfortunately also matched "` > Thanks, > Christian This pull request has now been integrated. Changeset: 72ba8178 Author: Christian Hagedorn URL: https://git.openjdk.org/jdk/commit/72ba8178a8271d4a04a0b789f28b23414b8989ed Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod 8324236: compiler/ciReplay/TestInliningProtectionDomain.java failed with RuntimeException: should only dump inline information for ... 
expected true, was false Reviewed-by: kvn ------------- PR: https://git.openjdk.org/jdk/pull/17598 From rkennke at openjdk.org Mon Jan 29 09:30:02 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Mon, 29 Jan 2024 09:30:02 GMT Subject: RFR: 8324734: Relax too-strict assert(VM_Version::supports_evex()) in Assembler::locate_operand() [v2] In-Reply-To: <776ui1DnJfcgz89q2aPAeOyhOYKalPhAE1bMR28wK6I=.19728df4-3710-4503-a726-8ce217501f67@github.com> References: <776ui1DnJfcgz89q2aPAeOyhOYKalPhAE1bMR28wK6I=.19728df4-3710-4503-a726-8ce217501f67@github.com> Message-ID: > Details see bug report. The gist is that HotSpot downgrades to UseAVX=2 on some processors, and reports supports_evex() == false, but the instruction decoder can still encounter EVEX instructions when (e.g.) hitting a SIGBUS in memset() - which does have EVEX instructions. > > Testing: > - [x] runtime/Unsafe/InternalErrorTest.java > - [x] tier1 Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: Distinguish between CPU and HotSpot features for supports_evex() ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17590/files - new: https://git.openjdk.org/jdk/pull/17590/files/17880550..f3929385 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17590&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17590&range=00-01 Stats: 14 lines in 5 files changed: 11 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/17590.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17590/head:pull/17590 PR: https://git.openjdk.org/jdk/pull/17590 From rkennke at openjdk.org Mon Jan 29 09:30:02 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Mon, 29 Jan 2024 09:30:02 GMT Subject: RFR: 8324734: Relax too-strict assert(VM_Version::supports_evex()) in Assembler::locate_operand() [v2] In-Reply-To: References: <776ui1DnJfcgz89q2aPAeOyhOYKalPhAE1bMR28wK6I=.19728df4-3710-4503-a726-8ce217501f67@github.com> Message-ID: On Fri, 26 Jan 
2024 21:00:02 GMT, Vladimir Kozlov wrote: > > VM_Version could distinguish between CPU features that are supported by the CPU > > We can start with just EVEX check. It is not big change: > > ``` > $ git diff > diff --git a/src/hotspot/cpu/x86/vm_version_x86.cpp b/src/hotspot/cpu/x86/vm_version_x86.cpp > index df1ea6edd30..8b4ca442b5a 100644 > --- a/src/hotspot/cpu/x86/vm_version_x86.cpp > +++ b/src/hotspot/cpu/x86/vm_version_x86.cpp > @@ -809,7 +809,8 @@ void VM_Version::get_processor_features() { > _stepping = cpu_stepping(); > > if (cpu_family() > 4) { // it supports CPUID > - _features = feature_flags(); > + _features = feature_flags(); // It can be changed by VM flags > + _cpu_features = _features; // Preserve features > // Logical processors are only available on P4s and above, > // and only if hyperthreading is available. > _logical_processors_per_package = logical_processor_count(); > diff --git a/src/hotspot/cpu/x86/vm_version_x86.hpp b/src/hotspot/cpu/x86/vm_version_x86.hpp > index e521a6ee3bc..de86ce51541 100644 > --- a/src/hotspot/cpu/x86/vm_version_x86.hpp > +++ b/src/hotspot/cpu/x86/vm_version_x86.hpp > @@ -640,7 +640,7 @@ class VM_Version : public Abstract_VM_Version { > } > > // > - // Feature identification > + // Feature identification which can be affected by VM flags > // > static bool supports_cpuid() { return _features != 0; } > static bool supports_cmov() { return (_features & CPU_CMOV) != 0; } > @@ -703,6 +703,11 @@ class VM_Version : public Abstract_VM_Version { > static bool supports_cet_ss() { return (_features & CPU_CET_SS) != 0; } > static bool supports_cet_ibt() { return (_features & CPU_CET_IBT) != 0; } > > + // > + // Feature identification not affected by VM flags > + // > + static bool cpu_supports_evex() { return (_cpu_features & CPU_AVX512F) != 0; } > + > // Intel features > static bool is_intel_family_core() { return is_intel() && > extended_cpu_family() == CPU_FAMILY_INTEL_CORE; } > diff --git 
a/src/hotspot/share/runtime/abstract_vm_version.hpp b/src/hotspot/share/runtime/abstract_vm_version.hpp > index d8ffca8de81..05675cc683a 100644 > --- a/src/hotspot/share/runtime/abstract_vm_version.hpp > +++ b/src/hotspot/share/runtime/abstract_vm_version.hpp > @@ -54,10 +54,13 @@ class Abstract_VM_Version: AllStatic { > static const char* _s_vm_release; > static const char* _s_internal_vm_info_string; > > - // CPU feature flags. > + // CPU feature flags which can be restricted by VM flags. > static uint64_t _features; > static const char* _features_string; > > + // CPU feature flags not affected by VM flags. > + static uint64_t _cpu_features; > + > // These are set by machine-dependent initializations > #ifndef SUPPORTS_NATIVE_CX8 > static bool _supports_cx8; > ``` Nice, thank you! That works (with some minor modification). I've changed the PR from removing the assert to relaxing it to CPU feature check. ------------- PR Comment: https://git.openjdk.org/jdk/pull/17590#issuecomment-1914284967 From chagedorn at openjdk.org Mon Jan 29 09:34:32 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 29 Jan 2024 09:34:32 GMT Subject: RFR: 8317572: C2 SuperWord: refactor/improve TraceSuperWord, replace VectorizeDebugOption with TraceAutoVectorization [v8] In-Reply-To: References: Message-ID: On Mon, 29 Jan 2024 08:52:09 GMT, Emanuel Peter wrote: >> Subtask of https://github.com/openjdk/jdk/pull/16620 >> >> I got approval to remove VectorizeDebugOption: [JDK-8320668](https://bugs.openjdk.org/browse/JDK-8320668) >> >> I want a more general flag for AutoVectorization, that can trace different components of AutoVectorization. >> It should be a CompileCommand, so that it can select which methods it traces for. >> >> TraceSuperWord should still look similar, and select a subset of the TraceAutoVectorization components (those for SuperWord), but still apply to all classes/methods. 
>> >> With more refactoring later in [JDK-8315361](https://bugs.openjdk.org/browse/JDK-8315361), this flag should become more usable and interpretable. Especially, the idea is that different components of the `VLoop / VLoopAnalyzer` can have tracing enabled / disabled. >> >> **How to use the flag:** >> Get "help", i.e. see all available tags: >> `./java -Xcomp -XX:CompileCommand=TraceAutoVectorization,*::*,help --version` >> >> See "rejections" (i.e. failures where we don't vectorize) and successes (using TraceNewVectors): >> `./java -Xcomp -XX:CompileCommand=TraceAutoVectorization,*::*,SW_REJECTIONS -XX:+TraceNewVectors --version` >> The results are currently underwhelming. I will have to track many more failures, and I will do that with the bigger refactoring, when I move around the code and require error code returning everywhere, and then I can use that error code for printing. > > Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 11 additional commits since the last revision: > > - Merge branch 'master' into JDK-8317572 > - adjust hpp guards for Christian > - fix missing include > - add PRODUCT and COMPILER2 ifdefs for Vladimir > - fix a test > - move code to StringUtils::CommaSeparatedStringIterator > - more for Christian > - Apply suggestions from code review > > Co-authored-by: Christian Hagedorn > - a bit more > - reordering some things > - ... and 1 more: https://git.openjdk.org/jdk/compare/c98d57d5...bf5e8263 Marked as reviewed by chagedorn (Reviewer).
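The flag usage quoted above traces every method via `*::*`. To look at a single loop instead, a small target with one simple counted loop can help. This is only a sketch: `VectorAddDemo` and its methods are hypothetical names introduced for illustration, whether the loop is actually vectorized depends on the platform and JVM build, and the flag may not be available in every build (the commit list above mentions PRODUCT and COMPILER2 ifdefs).

```java
// Hypothetical tracing target (not from the PR): one simple loop over int arrays.
public final class VectorAddDemo {

    // Element-wise addition; a candidate loop for auto-vectorization.
    static void add(int[] a, int[] b, int[] out) {
        for (int i = 0; i < out.length; i++) {
            out[i] = a[i] + b[i];
        }
    }

    // Fills the inputs, calls add() repeatedly so it gets compiled,
    // and returns the sum of the result array as a sanity check.
    static long run(int n, int iterations) {
        int[] a = new int[n];
        int[] b = new int[n];
        int[] out = new int[n];
        for (int i = 0; i < n; i++) {
            a[i] = i;
            b[i] = 2 * i;
        }
        for (int it = 0; it < iterations; it++) {
            add(a, b, out);
        }
        long sum = 0;
        for (int v : out) {
            sum += v;
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(run(1024, 20_000));
    }
}
```

Following the command shape quoted above, this could be run as `./java -XX:CompileCommand=TraceAutoVectorization,VectorAddDemo::add,SW_REJECTIONS -XX:+TraceNewVectors VectorAddDemo`. The method pattern `VectorAddDemo::add` is an assumption modeled on the `*::*` pattern used in the original commands.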
------------- PR Review: https://git.openjdk.org/jdk/pull/17586#pullrequestreview-1848203310 From dlunden at openjdk.org Mon Jan 29 09:38:38 2024 From: dlunden at openjdk.org (Daniel =?UTF-8?B?THVuZMOpbg==?=) Date: Mon, 29 Jan 2024 09:38:38 GMT Subject: RFR: 8291809: Convert compiler/c2/cr7200264/TestSSE2IntVect.java to IR verification test [v2] In-Reply-To: References: Message-ID: On Thu, 25 Jan 2024 14:34:27 GMT, Emanuel Peter wrote: >> Thanks @eme64. I've addressed all comments now; please have a look again. > > @dlunde Given the findings here: https://github.com/openjdk/jdk/pull/17428#discussion_r1466460916 > I think you should add a IR rule on every test. > And for the ones that do not currently vectorize, please add a negative IR rule, so that we can detect when that changes. > For example when we implement a feature, then we can properly fix up the IR rule. Thanks @eme64, @robcasloz, and @chhagedorn. On aarch64, `test_divc` and `test_divc_n` do not vectorize. Therefore, I had to change to negative rules for that case. I suggest we investigate why it does not vectorize in a separate issue (if necessary). Due to the above change, I'll wait for a final approval before integrating. ------------- PR Comment: https://git.openjdk.org/jdk/pull/17428#issuecomment-1914304491 From thartmann at openjdk.org Mon Jan 29 09:42:35 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 29 Jan 2024 09:42:35 GMT Subject: RFR: JDK-8322854: Incorrect rematerialization of scalar replaced objects in C2 In-Reply-To: References: Message-ID: <5_Ho9X_CPJ351Mdk282czVGvKzn_Ao7NWhySybZhcHA=.d5ad2514-dcca-42f2-a756-1b51ed9698fa@github.com> On Wed, 24 Jan 2024 22:40:59 GMT, Cesar Soares Lucas wrote: > Current implementation of `PhaseMacroExpand::value_from_mem` returns `return _igvn.zerocon(ft);` when it hits a sentinel while searching for a memory operation on a given slice. One of the sentinels is the memory input of the allocate node origin of the memory slice. 
Therefore, `value_from_mem` may return `zerocon(ft)` if `sfpt_mem` is the same memory edge used by the Allocate node origin of the memory slice being traversed. > > The scalar replacement implementation uses `value_from_mem` during creation of metadata describing scalar-replaced objects (see `PhaseMacroExpand::create_scalarized_object_description`). The `create_scalarized_object_description` method is also used as part of the RAM optimization implementation. The RAM optimization targets Phi nodes, and therefore a memory graph loop created by a _memory phi_ node can be seen as part of the transformation. See image below: > > > > This pattern doesn't show up when scalarizing objects that don't participate in allocation merges. > > To fix the issue I changed the code in `value_from_mem` so that, instead of using the _input_ memory edge of the Allocate as a stop condition, it now uses the projection memory edge of the Allocate. > > Tested locally on windows, mac and linux x86_64 with JTREG tier1-3 and didn't observe any regression.
I executed some quick testing and I see failures with - compiler/gcbarriers/TestArrayCopyWithLargeObjectAlignment.java - compiler/loopstripmining/TestLoadOnBackedgeWithPrec.java - compiler/membars/TestMemBarAcquire.java 668 Phi === 49 669 670 [[ 667 818 817 816 ]] #rawptr:BotPTR !jvms: TestLoadOnBackedgeWithPrec::j @ bci:159 (line 55) 668 Phi === 49 669 670 [[ 667 818 817 816 ]] #rawptr:BotPTR !jvms: TestLoadOnBackedgeWithPrec::j @ bci:159 (line 55) 41 safePoint_poll_tls === 47 0 819 0 0 42 1421 820 1423 616 0 46 45 44 |1645 [[ 43 39 ]] !jvms: TestLoadOnBackedgeWithPrec::j @ bci:190 (line 46) # # A fatal error has been detected by the Java Runtime Environment: # # Internal Error (/opt/mach5/mesos/work_dir/slaves/0db9c48f-6638-40d0-9a4b-bd9cc7533eb8-S9853/frameworks/1735e8a2-a1db-478c-8104-60c8b0af87dd-0196/executors/2cd88c15-c79b-46eb-a505-4fbf2c345f82/runs/27421909-81f2-433f-811e-a8e03bb02478/workspace/open/src/hotspot/share/opto/buildOopMap.cpp:365), pid=978899, tid=978928 # assert(false) failed: there should be an oop in OopMap instead of a live raw oop at safepoint With `-ea -esa -XX:CompileThreshold=100 -XX:+UnlockExperimentalVMOptions -server -XX:-TieredCompilation` on AArch64. ------------- PR Comment: https://git.openjdk.org/jdk/pull/17562#issuecomment-1914312689 From ddong at openjdk.org Mon Jan 29 09:44:44 2024 From: ddong at openjdk.org (Denghui Dong) Date: Mon, 29 Jan 2024 09:44:44 GMT Subject: RFR: 8324213: C1: There is no need for Canonicalizer to handle IfOp [v3] In-Reply-To: References: Message-ID: > IfOp will not be created when building the graph, so there is no need for Canonicalizer to handle IfOp. 
Denghui Dong has updated the pull request incrementally with two additional commits since the last revision: - typo - add comment ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17499/files - new: https://git.openjdk.org/jdk/pull/17499/files/e156cd0c..d7ceda32 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17499&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17499&range=01-02 Stats: 4 lines in 1 file changed: 4 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/17499.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17499/head:pull/17499 PR: https://git.openjdk.org/jdk/pull/17499 From ddong at openjdk.org Mon Jan 29 09:44:47 2024 From: ddong at openjdk.org (Denghui Dong) Date: Mon, 29 Jan 2024 09:44:47 GMT Subject: RFR: 8324213: C1: There is no need for Canonicalizer to handle IfOp [v2] In-Reply-To: <2jKwZuyGvqW3yiK3iLXgTdu8r9Z6eV3U7Ptkmnt649M=.17bcf185-aff1-4eb7-82bf-8ae1ad7ee453@github.com> References: <5PJm9P2Eyap_zvmmVIGpbuyMAY9M4BMJUOouWKndftc=.f1924da6-a1e2-453c-81c2-0dbb81935b9f@github.com> <2jKwZuyGvqW3yiK3iLXgTdu8r9Z6eV3U7Ptkmnt649M=.17bcf185-aff1-4eb7-82bf-8ae1ad7ee453@github.com> Message-ID: On Mon, 29 Jan 2024 09:13:12 GMT, Christian Hagedorn wrote: >> Denghui Dong has updated the pull request incrementally with one additional commit since the last revision: >> >> update > > src/hotspot/share/c1/c1_Canonicalizer.cpp line 472: > >> 470: >> 471: void Canonicalizer::do_IfOp(IfOp* x) { >> 472: ShouldNotReachHere(); > > Looks good, but maybe add a comment here why we cannot visit `IfOp` here (i.e. that `IfOp` is not created by the `GraphBuilder` but only later when eliminating conditional expressions with `CE_Eliminator`). Updated. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17499#discussion_r1469318739 From ddong at openjdk.org Mon Jan 29 09:49:46 2024 From: ddong at openjdk.org (Denghui Dong) Date: Mon, 29 Jan 2024 09:49:46 GMT Subject: Integrated: 8324213: C1: There is no need for Canonicalizer to handle IfOp In-Reply-To: References: Message-ID: On Fri, 19 Jan 2024 15:24:12 GMT, Denghui Dong wrote: > IfOp will not be created when building the graph, so there is no need for Canonicalizer to handle IfOp. This pull request has now been integrated. Changeset: 7a300b63 Author: Denghui Dong URL: https://git.openjdk.org/jdk/commit/7a300b63b5ca22dfe3e831e641f7a11b9c719b30 Stats: 5 lines in 1 file changed: 2 ins; 0 del; 3 mod 8324213: C1: There is no need for Canonicalizer to handle IfOp Reviewed-by: dlong, chagedorn ------------- PR: https://git.openjdk.org/jdk/pull/17499 From ddong at openjdk.org Mon Jan 29 09:49:43 2024 From: ddong at openjdk.org (Denghui Dong) Date: Mon, 29 Jan 2024 09:49:43 GMT Subject: RFR: 8324213: C1: There is no need for Canonicalizer to handle IfOp [v3] In-Reply-To: References: Message-ID: On Mon, 29 Jan 2024 09:44:44 GMT, Denghui Dong wrote: >> IfOp will not be created when building the graph, so there is no need for Canonicalizer to handle IfOp. > > Denghui Dong has updated the pull request incrementally with two additional commits since the last revision: > > - typo > - add comment Thanks for the review. ------------- PR Comment: https://git.openjdk.org/jdk/pull/17499#issuecomment-1914324541 From chagedorn at openjdk.org Mon Jan 29 09:49:43 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 29 Jan 2024 09:49:43 GMT Subject: RFR: 8324213: C1: There is no need for Canonicalizer to handle IfOp [v3] In-Reply-To: References: Message-ID: On Mon, 29 Jan 2024 09:44:44 GMT, Denghui Dong wrote: >> IfOp will not be created when building the graph, so there is no need for Canonicalizer to handle IfOp. 
> > Denghui Dong has updated the pull request incrementally with two additional commits since the last revision: > > - typo > - add comment Marked as reviewed by chagedorn (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/17499#pullrequestreview-1848231496 From chagedorn at openjdk.org Mon Jan 29 09:49:45 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 29 Jan 2024 09:49:45 GMT Subject: RFR: 8324213: C1: There is no need for Canonicalizer to handle IfOp [v2] In-Reply-To: References: <5PJm9P2Eyap_zvmmVIGpbuyMAY9M4BMJUOouWKndftc=.f1924da6-a1e2-453c-81c2-0dbb81935b9f@github.com> <2jKwZuyGvqW3yiK3iLXgTdu8r9Z6eV3U7Ptkmnt649M=.17bcf185-aff1-4eb7-82bf-8ae1ad7ee453@github.com> Message-ID: <_RIo8uPDoIZkKfw0CZK5e84gVwwzK5yq6RcIGeoyWJQ=.ed3d1ce7-b1ac-4c51-9cab-7a9f4e964ee5@github.com> On Mon, 29 Jan 2024 09:41:28 GMT, Denghui Dong wrote: >> src/hotspot/share/c1/c1_Canonicalizer.cpp line 472: >> >>> 470: >>> 471: void Canonicalizer::do_IfOp(IfOp* x) { >>> 472: ShouldNotReachHere(); >> >> Looks good, but maybe add a comment here why we cannot visit `IfOp` here (i.e. that `IfOp` is not created by the `GraphBuilder` but only later when eliminating conditional expressions with `CE_Eliminator`). > > Updated. Thanks! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17499#discussion_r1469323080 From rcastanedalo at openjdk.org Mon Jan 29 09:54:37 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Mon, 29 Jan 2024 09:54:37 GMT Subject: RFR: 8291809: Convert compiler/c2/cr7200264/TestSSE2IntVect.java to IR verification test [v7] In-Reply-To: References: Message-ID: On Fri, 26 Jan 2024 18:28:02 GMT, Daniel Lundén wrote: >> This changeset translates the tests in `compiler/c2/cr7200264/` to use the IR verification framework. >> >> The proposed translation, to the extent possible, preserves the semantics of the original test.
A major difference is that the IR checks are now local (for every `test_*` method) instead of global. The execution time of the new test is comparable to the old test. >> >> Testing: >> - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/7671921417) >> - Ran the new translated tests within all tier1 through tier10 contexts on windows-x64, linux-x64, linux-aarch64, macosx-x64, and macosx-aarch64. >> - Tested that manually adding `-XX:LoopUnrollLimit=0` to the test framework flags caused the translated tests to fail. Note: it is, however, no longer possible to break the test by passing `-XX:LoopUnrollLimit=0` on the command line. > > Daniel Lundén has updated the pull request incrementally with one additional commit since the last revision: > > Change preconditions for test_divc and test_divc_n Marked as reviewed by rcastanedalo (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/17428#pullrequestreview-1848255488 From thartmann at openjdk.org Mon Jan 29 10:06:26 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 29 Jan 2024 10:06:26 GMT Subject: RFR: 8323795: jcmd Compiler.codecache should print total size of code cache [v4] In-Reply-To: References: Message-ID: On Mon, 22 Jan 2024 09:43:52 GMT, Yi Yang wrote: >> CodeHeap 'non-profiled nmethods': size=118592Kb used=29Kb max_used=29Kb free=118562Kb >> bounds [0x00007fbe84622000, 0x00007fbe84892000, 0x00007fbe8b9f2000] >> CodeHeap 'profiled nmethods': size=118588Kb used=80Kb max_used=80Kb free=118507Kb >> bounds [0x00007fbe7c9f2000, 0x00007fbe7cc62000, 0x00007fbe83dc1000] >> CodeHeap 'non-nmethods': size=8580Kb used=1258Kb max_used=1834Kb free=7321Kb >> bounds [0x00007fbe83dc1000, 0x00007fbe84031000, 0x00007fbe84622000] >> total_blobs=474 nmethods=87 adapters=293 >> compilation: enabled >> stopped_count=0, restarted_count=0 >> full_count=0 >> >> >> It's better to accumulate the totals for size/used/free, for example >> >> -SegmentedCodeCache >> CodeCache: size=245760Kb
used=1366Kb max_used=1943Kb free=244393Kb >> bounds [0x00007fdcc89f2000, 0x00007fdcc8c62000, 0x00007fdcd79f2000] >> total_blobs=474, nmethods=87, adapters=293 >> stopped_count=0, restarted_count=0, full_count=0 >> compilation=enabled >> >> >> >> +SegmentedCodeCache >> CodeHeap 'non-profiled nmethods': size=118592Kb used=29Kb max_used=29Kb free=118562Kb >> bounds [0x00007f89c8622000, 0x00007f89c8892000, 0x00007f89cf9f2000] >> CodeHeap 'profiled nmethods': size=118588Kb used=80Kb max_used=80Kb free=118507Kb >> bounds [0x00007f89c09f2000, 0x00007f89c0c62000, 0x00007f89c7dc1000] >> CodeHeap 'non-nmethods': size=8580Kb used=1258Kb max_used=1834Kb free=7321Kb >> bounds [0x00007f89c7dc1000, 0x00007f89c8031000, 0x00007f89c8622000] >> CodeCache: size=245760Kb, used=1367Kb, max_used=1943Kb, free=244390Kb >> total_blobs=474, nmethods=87, adapters=293 >> stopped_count=0, restarted_count=0, full_count=0 >> compilation=enabled > > Yi Yang has updated the pull request incrementally with one additional commit since the last revision: > > whitespace Thanks, that looks good to me. I'll run testing and report back once it passed. src/hotspot/share/code/codeCache.cpp line 1801: > 1799: "disabled (not enough contiguous free space left)", > 1800: CompileBroker::get_total_compiler_stopped_count(), > 1801: CompileBroker::get_total_compiler_restarted_count()); The indentation should be fixed. ------------- Marked as reviewed by thartmann (Reviewer). 
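The `CodeCache:` totals quoted above for `+SegmentedCodeCache` can be checked by summing the three per-heap rows. The following is only a sketch of that arithmetic using the numbers from the quoted output; the class and method names are hypothetical and are not part of the patch or of HotSpot.

```java
public class CodeCacheTotals {
    // One row per code heap, in Kb: {size, used, max_used, free}.
    // Values are copied from the +SegmentedCodeCache example quoted in this thread.
    public static final long[][] HEAPS = {
        {118592, 29, 29, 118562},   // 'non-profiled nmethods'
        {118588, 80, 80, 118507},   // 'profiled nmethods'
        {8580, 1258, 1834, 7321},   // 'non-nmethods'
    };

    // Column-wise sum over all heaps (hypothetical helper, not HotSpot code).
    public static long[] totals(long[][] heaps) {
        long[] sum = new long[4];
        for (long[] heap : heaps) {
            for (int i = 0; i < sum.length; i++) {
                sum[i] += heap[i];
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        long[] t = totals(HEAPS);
        System.out.printf("CodeCache: size=%dKb, used=%dKb, max_used=%dKb, free=%dKb%n",
                          t[0], t[1], t[2], t[3]);
    }
}
```

With these inputs the sums are size=245760, used=1367, max_used=1943 and free=244390, which matches the `CodeCache:` line in the quoted example. Note that `max_used` here is simply the sum of the per-heap maxima.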
PR Review: https://git.openjdk.org/jdk/pull/17445#pullrequestreview-1848281793 PR Review Comment: https://git.openjdk.org/jdk/pull/17445#discussion_r1469352315 From thartmann at openjdk.org Mon Jan 29 10:07:46 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 29 Jan 2024 10:07:46 GMT Subject: RFR: 8320237 C2: late inlining of method handle invoke causes duplicate lines in PrintInlining output [v3] In-Reply-To: References: Message-ID: On Mon, 22 Jan 2024 16:28:02 GMT, Kangcheng Xu wrote: >> This PR resolves [JDK-8320237](https://bugs.openjdk.org/browse/JDK-8320237) >> >> The original behavior produces both a failure and success message upon late inlining which is confusing. The patch removes the failure message if inlining was successful. Huge thanks to @rwestrel for reporting and working out a solution. >> >> Unit test `test/hotspot/jtreg/compiler/inlining/TestDuplicatedLateInliningOutput.java` is added and passing. > > Kangcheng Xu has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains seven additional commits since the last revision: > > - add -XX:-BackgroundCompilation flag > - Merge branch 'master' into JDK-8320237 > - fix VM crashes > - update test summary, requirements, and VM flags > - Merge branch 'master' into JDK-8320237 > - make regex whitespace consistent > > and to trigger GHA > - 8320237: C2: late inlining of method handle invoke causes duplicate lines in PrintInlining output Okay, I'll re-run testing and report back once it passed. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/17147#issuecomment-1914357522 From thartmann at openjdk.org Mon Jan 29 10:09:25 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 29 Jan 2024 10:09:25 GMT Subject: RFR: 8324717: Remove HotSpotJVMCICompilerFactory In-Reply-To: References: Message-ID: On Thu, 25 Jan 2024 14:42:19 GMT, Doug Simon wrote: > There has been no active use of `jdk.vm.ci.hotspot.HotSpotJVMCICompilerFactory.CompilationLevelAdjustment` since [JDK-8219403](https://bugs.openjdk.org/browse/JDK-8219403) effectively [disabled](https://github.com/openjdk/jdk/commit/61f35bf898d6a0f4e7b6e514821b40efd87396dc#diff-4d3a3b7e7e12e1d5b4cf3e4677d9e0de5e9df3bbf1bbfa0d8d43d12098d67dc4R302-R306) it. Since `HotSpotJVMCICompilerFactory` exists solely for `CompilationLevelAdjustment` related logic, this PR removes the whole `HotSpotJVMCICompilerFactory` class. Looks good to me. ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/17570#pullrequestreview-1848291555 From epeter at openjdk.org Mon Jan 29 10:16:49 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 29 Jan 2024 10:16:49 GMT Subject: RFR: 8317572: C2 SuperWord: refactor/improve TraceSuperWord, replace VectorizeDebugOption with TraceAutoVectorization [v8] In-Reply-To: References: Message-ID: On Mon, 29 Jan 2024 09:31:41 GMT, Christian Hagedorn wrote: >> Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains 11 additional commits since the last revision: >> >> - Merge branch 'master' into JDK-8317572 >> - adjust hpp guards for Christian >> - fix missing include >> - add PRODUCT and COMPILER2 ifdefs for Vladimir >> - fix a test >> - move code to StringUtils::CommaSeparatedStringIterator >> - more for Christian >> - Apply suggestions from code review >> >> Co-authored-by: Christian Hagedorn >> - a bit more >> - reordering some things >> - ... and 1 more: https://git.openjdk.org/jdk/compare/3cbf181f...bf5e8263 > > Marked as reviewed by chagedorn (Reviewer). @chhagedorn @vnkozlov thanks for the reviews and helpful suggestions! ------------- PR Comment: https://git.openjdk.org/jdk/pull/17586#issuecomment-1914371458 From epeter at openjdk.org Mon Jan 29 10:16:50 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 29 Jan 2024 10:16:50 GMT Subject: Integrated: 8317572: C2 SuperWord: refactor/improve TraceSuperWord, replace VectorizeDebugOption with TraceAutoVectorization In-Reply-To: References: Message-ID: <4vRiwFtuQPFVuzuNT-uAMW_wY-Pn0QYhsAVYg3hM7_o=.45119068-42d2-49ee-8fa7-9bc2f1963c1c@github.com> On Fri, 26 Jan 2024 12:49:50 GMT, Emanuel Peter wrote: > Subtask of https://github.com/openjdk/jdk/pull/16620 > > I got approval to remove VectorizeDebugOption: [JDK-8320668](https://bugs.openjdk.org/browse/JDK-8320668) > > I want a more general flag for AutoVectorization, that can trace different components of AutoVectorization. > It should be a CompileCommand, so that it can select which methods it traces for. > > TraceSuperWord should still look similar, and select a subset of the TraceAutoVectorization components (those for SuperWord), but still apply to all classes/methods. > > With more refactoring later in [JDK-8315361](https://bugs.openjdk.org/browse/JDK-8315361), this flag should become more usable and interpretable. Especially, the idea is that different components of the `VLoop / VLoopAnalyzer` can have tracing enabled / disabled. 
> > **How to use the flag:** > Get "help", i.e. see all available tags: > `./java -Xcomp -XX:CompileCommand=TraceAutoVectorization,*::*,help --version` > > See "rejections" (i.e. failures where we don't vectorize) and successes (using TraceNewVectors): > `./java -Xcomp -XX:CompileCommand=TraceAutoVectorization,*::*,SW_REJECTIONS -XX:+TraceNewVectors --version` > The results are currently underwhelming. I will have to track many more failures, and I will do that with the bigger refactoring, when I move around the code and require error code returning everywhere, and then I can use that error code for printing. This pull request has now been integrated. Changeset: 3066d49c Author: Emanuel Peter URL: https://git.openjdk.org/jdk/commit/3066d49cc1910bb9ed01558582fdeb2385c484c3 Stats: 546 lines in 14 files changed: 402 ins; 81 del; 63 mod 8317572: C2 SuperWord: refactor/improve TraceSuperWord, replace VectorizeDebugOption with TraceAutoVectorization Reviewed-by: chagedorn, kvn ------------- PR: https://git.openjdk.org/jdk/pull/17586 From epeter at openjdk.org Mon Jan 29 10:20:40 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 29 Jan 2024 10:20:40 GMT Subject: RFR: 8291809: Convert compiler/c2/cr7200264/TestSSE2IntVect.java to IR verification test [v7] In-Reply-To: References: Message-ID: On Fri, 26 Jan 2024 18:28:02 GMT, Daniel Lundén wrote: >> This changeset translates the tests in `compiler/c2/cr7200264/` to use the IR verification framework. >> >> The proposed translation, to the extent possible, preserves the semantics of the original test. A major difference is that the IR checks are now local (for every `test_*` method) instead of global. The execution time of the new test is comparable to the old test. >> >> Testing: >> - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/7671921417) >> - Ran the new translated tests within all tier1 through tier10 contexts on windows-x64, linux-x64, linux-aarch64, macosx-x64, and macosx-aarch64.
>> - Tested that manually adding `-XX:LoopUnrollLimit=0` to the test framework flags caused the translated tests to fail. Note: it is, however, no longer possible to break the test by passing `-XX:LoopUnrollLimit=0` on the command line. > > Daniel Lundén has updated the pull request incrementally with one additional commit since the last revision: > > Change preconditions for test_divc and test_divc_n test/hotspot/jtreg/compiler/c2/cr7200264/TestIntVect.java line 547: > 545: IRNode.VECTOR_SIZE + "min(max_int, max_long)", "> 0" }, > 546: applyIfCPUFeature = {"sse2", "true"}, > 547: applyIfAnd = {"UseAVX", ">= 2", "UseSSE", ">= 4"}) Looks like you should just require `avx2`, instead of `sse2`. Then you can drop the `UseAVX` and `UseSSE` flag checks. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17428#discussion_r1469371322 From thartmann at openjdk.org Mon Jan 29 10:22:36 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 29 Jan 2024 10:22:36 GMT Subject: RFR: JDK-8317299: safepoint scalarization doesn't keep track of the depth of the JVM state In-Reply-To: References: Message-ID: On Fri, 19 Jan 2024 16:31:25 GMT, Damon Fenacci wrote: > # Issue > > The origin of the problem is tied to the fact that, when C2 optimizes vector boxes, it performs safepoint object scalarization before late inlining. > This can lead to situations in which scalarization adds scalarized values to the JVM state and late inlining of further methods adds further JVM state entries on top for each inlined method. > With the example of the reported bug (_TestIntrinsicBailOut.java_) we get to a situation like this: > > ...
> bc: JVMS depth=6 loc=20 stk=23 arg=23 mon=23 scalar=23 end=23 mondepth=0 sp=0 bci=1 reexecute=false method=virtual jobject jdk.incubator.vector.ByteVector.rearrangeTemplate(jobject, jobject) > bc: JVMS depth=7 loc=23 stk=27 arg=27 mon=27 scalar=27 end=27 mondepth=0 sp=0 bci=36 reexecute=false method=virtual jobject jdk.incubator.vector.AbstractShuffle.checkIndexes() > bc: JVMS depth=8 loc=27 stk=28 arg=28 mon=28 scalar=28 end=28 mondepth=0 sp=0 bci=1 reexecute=false method=virtual jobject jdk.incubator.vector.AbstractShuffle.reorder() > bc: JVMS depth=9 loc=28 stk=29 arg=29 mon=29 scalar=29 end=31 mondepth=0 sp=0 bci=1 reexecute=false method=virtual jobject jdk.internal.vm.vector.VectorSupport$VectorPayload.getPayload() > bc: JVMS depth=10 loc=31 stk=32 arg=32 mon=32 scalar=32 end=32 mondepth=0 sp=0 bci=3 reexecute=false method=static jobject jdk.internal.vm.vector.VectorSupport.maybeRebox(jobject) > bc: JVMS depth=11 loc=32 stk=33 arg=33 mon=33 scalar=33 end=33 mondepth=0 sp=0 bci=1 reexecute=false method=virtual void jdk.internal.misc.Unsafe.loadFence() > > `JVMS depth=9` shows 2 scalars but 2 further inlines added 2 more JVM states (with no scalars). > > The corresponding node looks like this: > image > > To keep track of its scalarized inputs, `SafePointScalarObjectNode` keeps a field `_first_index`, which is supposed to be "relative to the last (youngest) jvms->_scloff"... > https://github.com/openjdk/jdk/blob/c5e72450966ad50d57a8d22e9d634bfcb319aee9/src/hotspot/share/opto/callnode.hpp#L509-L511 > but if there are late inlined methods, this field is going to be relative to the JVM state at the depth before inlining happened (e.g. depth=9 in the example) and not relative to the youngest depth. > > # Solution > > In order to keep track of the right depth a `_depth` field is added to `SafePointScalarObjectNode`, which refers to the depth of the JVM state the `_first_index` field refers to. The method `uint first_index(JVMState*... The fix looks good to me. 
Good catch, Vladimir. I completely forgot about [JDK-8276112](https://bugs.openjdk.org/browse/JDK-8276112) but from re-reading my old analysis in https://github.com/openjdk/jdk/pull/6333, it's most likely the exact same issue. src/hotspot/share/opto/callnode.hpp line 511: > 509: uint _first_index; // First input edge relative index of a SafePoint node where > 510: // states of the scalarized object fields are collected. > 511: uint _depth; // Depth of the JVM state the first index field refers to Suggestion: uint _depth; // Depth of the JVM state the _first_index field refers to ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/17500#pullrequestreview-1848317383 PR Comment: https://git.openjdk.org/jdk/pull/17500#issuecomment-1914378760 PR Review Comment: https://git.openjdk.org/jdk/pull/17500#discussion_r1469373053 From epeter at openjdk.org Mon Jan 29 10:24:40 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 29 Jan 2024 10:24:40 GMT Subject: RFR: 8291809: Convert compiler/c2/cr7200264/TestSSE2IntVect.java to IR verification test [v7] In-Reply-To: References: Message-ID: On Mon, 29 Jan 2024 10:17:31 GMT, Emanuel Peter wrote: >> Daniel Lundén has updated the pull request incrementally with one additional commit since the last revision: >> >> Change preconditions for test_divc and test_divc_n > > test/hotspot/jtreg/compiler/c2/cr7200264/TestIntVect.java line 547: > >> 545: IRNode.VECTOR_SIZE + "min(max_int, max_long)", "> 0" }, >> 546: applyIfCPUFeature = {"sse2", "true"}, >> 547: applyIfAnd = {"UseAVX", ">= 2", "UseSSE", ">= 4"}) > > Looks like you should just require `avx2`, instead of `sse2`. Then you can drop the `UseAVX` and `UseSSE` flag checks. If you set a lower AVX/SSE level, then the CPU features are automatically removed. And if you have a CPU that does not have the relevant features, the AVX/SSE flags are automatically lowered.
But in IR tests, we generally try to rely on the features as much as possible, and not the flags. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17428#discussion_r1469377081 From thartmann at openjdk.org Mon Jan 29 10:28:37 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 29 Jan 2024 10:28:37 GMT Subject: RFR: 8323795: jcmd Compiler.codecache should print total size of code cache [v4] In-Reply-To: References: Message-ID: <3-P8wERy7_aJBqGgw17RlkBb2uoDk60iCNsu7qAybPE=.d1d97a6c-dea8-4a67-b265-92ad96c2ff7d@github.com> On Mon, 22 Jan 2024 09:43:52 GMT, Yi Yang wrote: >> CodeHeap 'non-profiled nmethods': size=118592Kb used=29Kb max_used=29Kb free=118562Kb >> bounds [0x00007fbe84622000, 0x00007fbe84892000, 0x00007fbe8b9f2000] >> CodeHeap 'profiled nmethods': size=118588Kb used=80Kb max_used=80Kb free=118507Kb >> bounds [0x00007fbe7c9f2000, 0x00007fbe7cc62000, 0x00007fbe83dc1000] >> CodeHeap 'non-nmethods': size=8580Kb used=1258Kb max_used=1834Kb free=7321Kb >> bounds [0x00007fbe83dc1000, 0x00007fbe84031000, 0x00007fbe84622000] >> total_blobs=474 nmethods=87 adapters=293 >> compilation: enabled >> stopped_count=0, restarted_count=0 >> full_count=0 >> >> >> It's better to accumulate the totals for size/used/free, for example >> >> -SegmentedCodeCache >> CodeCache: size=245760Kb used=1366Kb max_used=1943Kb free=244393Kb >> bounds [0x00007fdcc89f2000, 0x00007fdcc8c62000, 0x00007fdcd79f2000] >> total_blobs=474, nmethods=87, adapters=293 >> stopped_count=0, restarted_count=0, full_count=0 >> compilation=enabled >> >> >> >> +SegmentedCodeCache >> CodeHeap 'non-profiled nmethods': size=118592Kb used=29Kb max_used=29Kb free=118562Kb >> bounds [0x00007f89c8622000, 0x00007f89c8892000, 0x00007f89cf9f2000] >> CodeHeap 'profiled nmethods': size=118588Kb used=80Kb max_used=80Kb free=118507Kb >> bounds [0x00007f89c09f2000, 0x00007f89c0c62000, 0x00007f89c7dc1000] >> CodeHeap 'non-nmethods': size=8580Kb used=1258Kb max_used=1834Kb free=7321Kb >> bounds
[0x00007f89c7dc1000, 0x00007f89c8031000, 0x00007f89c8622000] >> CodeCache: size=245760Kb, used=1367Kb, max_used=1943Kb, free=244390Kb >> total_blobs=474, nmethods=87, adapters=293 >> stopped_count=0, restarted_count=0, full_count=0 >> compilation=enabled > > Yi Yang has updated the pull request incrementally with one additional commit since the last revision: > > whitespace `/test/hotspot/jtreg/vmTestbase/vm/compiler/CodeCacheInfo/Test.java fails` stdout: [CodeHeap 'non-profiled nmethods': size=120032Kb used=3Kb max_used=3Kb free=120028Kb bounds [0x0000000113a84000, 0x0000000113cf4000, 0x000000011afbc000] CodeHeap 'profiled nmethods': size=120016Kb used=14Kb max_used=14Kb free=120001Kb bounds [0x000000010bfbc000, 0x000000010c22c000, 0x00000001134f0000] CodeHeap 'non-nmethods': size=5712Kb used=1752Kb max_used=1752Kb free=3959Kb bounds [0x00000001134f0000, 0x0000000113760000, 0x0000000113a84000] CodeCache: size=245760Kb, used=1769Kb, max_used=1769Kb, free=243988Kb total_blobs=748, nmethods=15, adapters=651, full_count=0 Compilation: enabled, stopped_count=0, restarted_count=0 ]; stderr: [java version "23-internal" 2024-09-17 Java(TM) SE Runtime Environment (fastdebug build 23-internal-2024-01-29-1002351.tobias.hartmann.jdk2) Java HotSpot(TM) 64-Bit Server VM (fastdebug build 23-internal-2024-01-29-1002351.tobias.hartmann.jdk2, mixed mode, sharing) ] exitValue = 0 java.lang.RuntimeException: '^(CodeHeap '[^']+': size=\\d+Kb used=\\d+Kb max_used=\\d+Kb free=\\d+Kb\\n bounds \[0x[0-9a-f]+, 0x[0-9a-f]+, 0x[0-9a-f]+\]\\n)+ total_blobs=\\d+ nmethods=\\d+ adapters=\\d+\\n compilation: enabled\\n' missing from stdout at jdk.test.lib.process.OutputAnalyzer.stdoutShouldMatch(OutputAnalyzer.java:389) at vm.compiler.CodeCacheInfo.Test.main(Test.java:78) at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103) at java.base/java.lang.reflect.Method.invoke(Method.java:580) at 
com.sun.javatest.regtest.agent.MainWrapper$MainTask.run(MainWrapper.java:138) at java.base/java.lang.Thread.run(Thread.java:1575) ------------- Changes requested by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/17445#pullrequestreview-1848331584 From dlunden at openjdk.org Mon Jan 29 10:30:39 2024 From: dlunden at openjdk.org (Daniel Lundén) Date: Mon, 29 Jan 2024 10:30:39 GMT Subject: RFR: 8291809: Convert compiler/c2/cr7200264/TestSSE2IntVect.java to IR verification test [v7] In-Reply-To: References: Message-ID: On Mon, 29 Jan 2024 10:21:59 GMT, Emanuel Peter wrote: >> test/hotspot/jtreg/compiler/c2/cr7200264/TestIntVect.java line 547: >> >>> 545: IRNode.VECTOR_SIZE + "min(max_int, max_long)", "> 0" }, >>> 546: applyIfCPUFeature = {"sse2", "true"}, >>> 547: applyIfAnd = {"UseAVX", ">= 2", "UseSSE", ">= 4"}) >> >> Looks like you should just require `avx2`, instead of `sse2`. Then you can drop the `UseAVX` and `UseSSE` flag checks. > > If you set a lower AVX/SSE level, then the CPU features are automatically removed. And if you have a CPU that does not have the relevant features, the AVX/SSE flags are automatically lowered. But in IR tests, we generally try to rely on the features as much as possible, and not the flags. OK, thanks, good to know. I'll make the update. @eme64: Do you want to create a new issue to investigate why it does not vectorize on aarch64? Or maybe that's expected? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17428#discussion_r1469384442 From thartmann at openjdk.org Mon Jan 29 12:03:40 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 29 Jan 2024 12:03:40 GMT Subject: RFR: 8318446: C2: optimize stores into primitive arrays by combining values into larger store [v4] In-Reply-To: References: Message-ID: On Fri, 26 Jan 2024 09:35:50 GMT, Emanuel Peter wrote: >> This is a feature requested by @RogerRiggs and @cl4es.
>> >> **Idea** >> >> Merging multiple consecutive small stores (e.g. 8 byte stores) into larger stores (e.g. one long store) can lead to speedup. Recently, @cl4es and @RogerRiggs had to review a few PR's where people would try to get speedups by using Unsafe (e.g. `Unsafe.putLongUnaligned`), or ByteArrayLittleEndian (e.g. `ByteArrayLittleEndian.setLong`). They have asked if we can do such an optimization in C2, rather than in the Java library code, or even user code. >> >> This patch here supports a few simple use-cases, like these: >> >> Merge consecutive array stores, with constants. We can combine the separate constants into a larger constant: >> https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/test/hotspot/jtreg/compiler/c2/TestMergeStores.java#L383-L395 >> >> Merge consecutive array stores, with a variable that was split (using shifts). We can essentially undo the splitting (i.e. shifting and truncation), and directly store the variable: >> https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/test/hotspot/jtreg/compiler/c2/TestMergeStores.java#L444-L456 >> >> The idea is that this would allow the introduction of a very simple API, without any "heavy" dependencies (Unsafe or ByteArrayLittleEndian): >> >> https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/test/hotspot/jtreg/compiler/c2/TestMergeStores.java#L327-L338 >> https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/test/hotspot/jtreg/compiler/c2/TestMergeStores.java#L467-L472 >> >> **Details** >> >> This draft currently implements the optimization in an additional special IGVN phase: >> https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/src/hotspot/share/opto/compile.cpp#L2479-L2485 >> >> We first collect all `StoreB|C|I`, and put them in the IGVN worklist (see `Compile::gather_nodes_for_merge_stores`). 
During IGVN, we call `StoreNode::Ideal_merge_stores` at the end (after all other optimizations) of `StoreNode::Ideal`. We essentially try to establish a chain of mergeable stores: >> >> https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/src/hotspot/share/opto/memnode.cpp#L2802-L2806 >> >> Mergeable stores must have the same Opcode (implies they have the same element type and hence size). Further, mergeable stores must have the same control (or be separated by only a RangeCheck). Further, they must either bot... > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > Add diagnostic flag MergeStores Great work, Emanuel. I think this is a well encapsulated optimization for a supposedly common code pattern requested by core libraries folks. I agree with Vladimir, that it would be nice to support this as part of the autovectorizer but that is probably not going to happen anytime soon. Until then, going with this separate phase would allow us to add support (and tests) for additional code patterns if requests come in and potentially move this to the autovectorizer later. src/hotspot/share/opto/memnode.cpp line 2766: > 2764: if (phase->C->merge_stores_phase()) { > 2765: Node* progress = Ideal_merge_stores(phase); > 2766: if(progress != nullptr) { return progress; } Suggestion: if (progress != nullptr) { return progress; } src/hotspot/share/opto/memnode.cpp line 2772: > 2770: } > 2771: > 2772: // Link together multiple stores (B/S/C/I) into alonger one. Suggestion: // Link together multiple stores (B/S/C/I) into a longer one. src/hotspot/share/opto/memnode.cpp line 3014: > 3012: // AddP(AddP(AddP(AddP(base, o2), o2), o1), con) > 3013: // > 3014: // Two adresses are adjacent, if they share a base and all offset (o1, o2, ...) Suggestion: // Two addresses are adjacent, if they share a base and all offsets (o1, o2, ...)
src/hotspot/share/opto/phaseX.cpp line 2282: > 2280: tty->print_cr("Set at i = %d", i); > 2281: n->dump(); > 2282: assert( igvn->hash_find(this) != this, "Need to remove from hash before changing edges" ); Suggestion: assert(false, "Need to remove from hash before changing edges"); ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/16245#pullrequestreview-1848441479 PR Review Comment: https://git.openjdk.org/jdk/pull/16245#discussion_r1469450595 PR Review Comment: https://git.openjdk.org/jdk/pull/16245#discussion_r1469466833 PR Review Comment: https://git.openjdk.org/jdk/pull/16245#discussion_r1469472461 PR Review Comment: https://git.openjdk.org/jdk/pull/16245#discussion_r1469458332 From epeter at openjdk.org Mon Jan 29 12:12:50 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 29 Jan 2024 12:12:50 GMT Subject: RFR: 8318446: C2: optimize stores into primitive arrays by combining values into larger store [v5] In-Reply-To: References: Message-ID: > This is a feature requested by @RogerRiggs and @cl4es. > > **Idea** > > Merging multiple consecutive small stores (e.g. 8 byte stores) into larger stores (e.g. one long store) can lead to speedup. Recently, @cl4es and @RogerRiggs had to review a few PR's where people would try to get speedups by using Unsafe (e.g. `Unsafe.putLongUnaligned`), or ByteArrayLittleEndian (e.g. `ByteArrayLittleEndian.setLong`). They have asked if we can do such an optimization in C2, rather than in the Java library code, or even user code. > > This patch here supports a few simple use-cases, like these: > > Merge consecutive array stores, with constants. We can combine the separate constants into a larger constant: > https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/test/hotspot/jtreg/compiler/c2/TestMergeStores.java#L383-L395 > > Merge consecutive array stores, with a variable that was split (using shifts). We can essentially undo the splitting (i.e.
shifting and truncation), and directly store the variable: > https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/test/hotspot/jtreg/compiler/c2/TestMergeStores.java#L444-L456 > > The idea is that this would allow the introduction of a very simple API, without any "heavy" dependencies (Unsafe or ByteArrayLittleEndian): > > https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/test/hotspot/jtreg/compiler/c2/TestMergeStores.java#L327-L338 > https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/test/hotspot/jtreg/compiler/c2/TestMergeStores.java#L467-L472 > > **Details** > > This draft currently implements the optimization in an additional special IGVN phase: > https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/src/hotspot/share/opto/compile.cpp#L2479-L2485 > > We first collect all `StoreB|C|I`, and put them in the IGVN worklist (see `Compile::gather_nodes_for_merge_stores`). During IGVN, we call `StoreNode::Ideal_merge_stores` at the end (after all other optimizations) of `StoreNode::Ideal`. We essentially try to establish a chain of mergeable stores: > > https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/src/hotspot/share/opto/memnode.cpp#L2802-L2806 > > Mergeable stores must have the same Opcode (implies they have the same element type and hence size). Further, mergeable stores must have the same control (or be separated by only a RangeCheck). Further, they must either both store constants, or adjacent segments of a larger value ...
Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: Suggestions by Tobias Co-authored-by: Tobias Hartmann ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16245/files - new: https://git.openjdk.org/jdk/pull/16245/files/83290c57..85c9aa34 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16245&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16245&range=03-04 Stats: 4 lines in 2 files changed: 0 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/16245.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16245/head:pull/16245 PR: https://git.openjdk.org/jdk/pull/16245 From epeter at openjdk.org Mon Jan 29 12:12:51 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 29 Jan 2024 12:12:51 GMT Subject: RFR: 8318446: C2: optimize stores into primitive arrays by combining values into larger store [v4] In-Reply-To: <2s8lhnt7FIJ1EuMpSIZ_zu9H6kMnzUQf2gCgdPUjUjQ=.b2eae978-fbc3-4cfc-a54f-483a3b517a09@github.com> References: <2s8lhnt7FIJ1EuMpSIZ_zu9H6kMnzUQf2gCgdPUjUjQ=.b2eae978-fbc3-4cfc-a54f-483a3b517a09@github.com> Message-ID: <7ACZVZcNV5WZ-Klf8ZNAupu4myqa0N9AFQUvA-VWA0A=.804498d0-1c54-4448-aa55-2b06d68f4609@github.com> On Fri, 26 Jan 2024 17:19:39 GMT, Vladimir Kozlov wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> Add diagnostic flag MergeStores > > src/hotspot/share/opto/c2_globals.hpp line 362: > >> 360: notproduct(bool, TraceMergeStores, false, \ >> 361: "Trace creation of merged stores") \ >> 362: \ > > The flag should be `develop` since it is under `#ifdef ASSERT`. Ok, will do! 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16245#discussion_r1469495844 From dlunden at openjdk.org Mon Jan 29 12:14:50 2024 From: dlunden at openjdk.org (Daniel =?UTF-8?B?THVuZMOpbg==?=) Date: Mon, 29 Jan 2024 12:14:50 GMT Subject: RFR: 8291809: Convert compiler/c2/cr7200264/TestSSE2IntVect.java to IR verification test [v8] In-Reply-To: References: Message-ID: > This changeset translates the tests in `compiler/c2/cr7200264/` to use the IR verification framework. > > The proposed translation, to the extent possible, preserves the semantics of the original test. A major difference is that the IR checks are now local (for every `test_*` method) instead of global. The execution time of the new test is comparable to the old test. > > Testing: > - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/7671921417) > - Ran the new translated tests within all tier1 through tier10 contexts on windows-x64, linux-x64, linux-aarch64, macosx-x64, and macosx-aarch64. > - Tested that manually adding `-XX:LoopUnrollLimit=0` to the test framework flags caused the translated tests to fail. Note: it is, however, no longer possible to break the test by passing `-XX:LoopUnrollLimit=0` on the command line. 
Daniel Lund?n has updated the pull request incrementally with one additional commit since the last revision: Change to avx2 CPU feature check ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17428/files - new: https://git.openjdk.org/jdk/pull/17428/files/66f5ad44..4ab64279 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17428&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17428&range=06-07 Stats: 4 lines in 1 file changed: 0 ins; 2 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/17428.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17428/head:pull/17428 PR: https://git.openjdk.org/jdk/pull/17428 From epeter at openjdk.org Mon Jan 29 12:23:11 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 29 Jan 2024 12:23:11 GMT Subject: RFR: 8318446: C2: optimize stores into primitive arrays by combining values into larger store [v6] In-Reply-To: References: Message-ID: > This is a feature requiested by @RogerRiggs and @cl4es . > > **Idea** > > Merging multiple consecutive small stores (e.g. 8 byte stores) into larger stores (e.g. one long store) can lead to speedup. Recently, @cl4es and @RogerRiggs had to review a few PR's where people would try to get speedups by using Unsafe (e.g. `Unsafe.putLongUnaligned`), or ByteArrayLittleEndian (e.g. `ByteArrayLittleEndian.setLong`). They have asked if we can do such an optimization in C2, rather than in the Java library code, or even user code. > > This patch here supports a few simple use-cases, like these: > > Merge consecutive array stores, with constants. We can combine the separate constants into a larger constant: > https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/test/hotspot/jtreg/compiler/c2/TestMergeStores.java#L383-L395 > > Merge consecutive array stores, with a variable that was split (using shifts). We can essentially undo the splitting (i.e. 
shifting and truncation), and directly store the variable: > https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/test/hotspot/jtreg/compiler/c2/TestMergeStores.java#L444-L456 > > The idea is that this would allow the introduction of a very simple API, without any "heavy" dependencies (Unsafe or ByteArrayLittleEndian): > > https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/test/hotspot/jtreg/compiler/c2/TestMergeStores.java#L327-L338 > https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/test/hotspot/jtreg/compiler/c2/TestMergeStores.java#L467-L472 > > **Details** > > This draft currently implements the optimization in an additional special IGVN phase: > https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/src/hotspot/share/opto/compile.cpp#L2479-L2485 > > We first collect all `StoreB|C|I`, and put them in the IGVN worklist (see `Compile::gather_nodes_for_merge_stores`). During IGVN, we call `StoreNode::Ideal_merge_stores` at the end (after all other optimizations) of `StoreNode::Ideal`. We essentially try to establish a chain of mergable stores: > > https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/src/hotspot/share/opto/memnode.cpp#L2802-L2806 > > Mergable stores must have the same Opcode (implies they have the same element type and hence size). Further, mergable stores must have the same control (or be separated by only a RangeCheck). Further, they must either both store constants, or adjacent segments of a larger value ... 
Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: made trace flag develop ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16245/files - new: https://git.openjdk.org/jdk/pull/16245/files/85c9aa34..8822fb6c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16245&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16245&range=04-05 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/16245.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16245/head:pull/16245 PR: https://git.openjdk.org/jdk/pull/16245 From epeter at openjdk.org Mon Jan 29 12:26:34 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 29 Jan 2024 12:26:34 GMT Subject: RFR: 8319690: [AArch64] C2 compilation hits offset_ok_for_immed: assert "c2 compiler bug" In-Reply-To: References: <16J-lJ2AceGTVcRWBcP15yKcwO-1IA1XsngyOuNjf7k=.0776f081-ae2c-4279-87cf-d909806c2bc4@github.com> <4KRCqYxn02wMjYDuN3_HbYWxc9BDtYMNd40bNrJ4K8w=.5362dda4-8d30-48fd-8915-909eb15a6023@github.com> Message-ID: On Sat, 6 Jan 2024 17:44:04 GMT, Andrew Haley wrote: >>> After this change, `immIOffset` and `immLOffset` appear to be obsolete. >> >> Removed them in the new commit. Thanks! > >> @fg1417 what is the state on this? >> >> The example here may look like a strange edge-case. But with my plans this may become more common: [JDK-8318446](https://bugs.openjdk.org/browse/JDK-8318446) C2: implement StoreNode::Ideal_merge_stores >> >> I will merge stores, which can then look like unaligned stores of a larger type. This bug here blocks my progress. I'm not in a hurry, just curious if there is now a plan how to proceed here ;) > > The problem with this PR is that the code is way too complex for such a simple problem. The port is correct as it is, in the release build. > > The only problem is an assertion. We could simply remove that assertion, but if it were me I'd fix the problem properly. 
Both @dean-long and I have suggested ways to improve this patch with less code. If @fg1417 decides to drop this PR I'll fix it. @theRealAph do you want to fix this? Otherwise I'll just push a PR removing the assert, since it blocks this: https://github.com/openjdk/jdk/pull/16245 ------------- PR Comment: https://git.openjdk.org/jdk/pull/16991#issuecomment-1914586958 From roland at openjdk.org Mon Jan 29 12:32:29 2024 From: roland at openjdk.org (Roland Westrelin) Date: Mon, 29 Jan 2024 12:32:29 GMT Subject: RFR: 8318446: C2: optimize stores into primitive arrays by combining values into larger store [v6] In-Reply-To: References: Message-ID: On Fri, 19 Jan 2024 13:15:28 GMT, Emanuel Peter wrote: > 2. Should I make a fresh pass over the whole graph like in `gather_nodes_for_merge_stores`, or rather have a list that collects the store nodes during igvn, and that I can just readily pick up here. Just like these lists: > https://github.com/openjdk/jdk/pull/16966/files#diff-f076857d7da81f56709da3de1511b1105727032186cde4d02c678667761f46eaR445-R451 Why not simply enqueue stores for post loop opts processing by igvn? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16245#discussion_r1469521646 From epeter at openjdk.org Mon Jan 29 12:38:31 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 29 Jan 2024 12:38:31 GMT Subject: RFR: 8318446: C2: optimize stores into primitive arrays by combining values into larger store In-Reply-To: References: Message-ID: <2jq2r9dd2L4A_I0TKWXTejxHkIVYJFd9VsiuCduNiuQ=.ab766352-e5c3-4b99-94c1-7295b76d1744@github.com> On Fri, 19 Jan 2024 14:09:28 GMT, Roland Westrelin wrote: >> This is a feature requiested by @RogerRiggs and @cl4es . >> >> **Idea** >> >> Merging multiple consecutive small stores (e.g. 8 byte stores) into larger stores (e.g. one long store) can lead to speedup. Recently, @cl4es and @RogerRiggs had to review a few PR's where people would try to get speedups by using Unsafe (e.g. 
`Unsafe.putLongUnaligned`), or ByteArrayLittleEndian (e.g. `ByteArrayLittleEndian.setLong`). They have asked if we can do such an optimization in C2, rather than in the Java library code, or even user code. >> >> This patch here supports a few simple use-cases, like these: >> >> Merge consecutive array stores, with constants. We can combine the separate constants into a larger constant: >> https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/test/hotspot/jtreg/compiler/c2/TestMergeStores.java#L383-L395 >> >> Merge consecutive array stores, with a variable that was split (using shifts). We can essentially undo the splitting (i.e. shifting and truncation), and directly store the variable: >> https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/test/hotspot/jtreg/compiler/c2/TestMergeStores.java#L444-L456 >> >> The idea is that this would allow the introduction of a very simple API, without any "heavy" dependencies (Unsafe or ByteArrayLittleEndian): >> >> https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/test/hotspot/jtreg/compiler/c2/TestMergeStores.java#L327-L338 >> https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/test/hotspot/jtreg/compiler/c2/TestMergeStores.java#L467-L472 >> >> **Details** >> >> This draft currently implements the optimization in an additional special IGVN phase: >> https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/src/hotspot/share/opto/compile.cpp#L2479-L2485 >> >> We first collect all `StoreB|C|I`, and put them in the IGVN worklist (see `Compile::gather_nodes_for_merge_stores`). During IGVN, we call `StoreNode::Ideal_merge_stores` at the end (after all other optimizations) of `StoreNode::Ideal`. 
We essentially try to establish a chain of mergable stores: >> >> https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/src/hotspot/share/opto/memnode.cpp#L2802-L2806 >> >> Mergable stores must have the same Opcode (implies they have the same element type and hence size). Further, mergable stores must have the same control (or be separated by only a RangeCheck). Further, they must either bot... > > Why not provide new internal API points and intrinsics? The benefits would be: > - less complexity on the c2 side (and less bugs) > - much easier for someone writing java code to check that the optimization triggers (check the PrintInlining output that the intrinsic shows up vs check the final assembly code) > - clear contract between the java libraries and the VM as to what optimizes under what conditions > > If I was the user for this I would be worried, that: > - it's hard for me to check it's doing what I expect > - even if it does initially, changes to the java code (maybe by other people less familiar with this transformation) could break the optimization. If there's a call to some specific API, at least people changing the code know special attention is necessary and that as long as the new API points are used, the optimization is guaranteed. @rwestrel > Why not simply enqueue stores for post loop opts processing by igvn? That does not work. I need to do the processing post-post-loop-opts. And not during post-loop-opts. Because during post-loop-opts, some CastII nodes fold away, and that simplifies the pointers, and makes it easier to find adjacent memops. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/16245#issuecomment-1914608495 From roland at openjdk.org Mon Jan 29 13:43:40 2024 From: roland at openjdk.org (Roland Westrelin) Date: Mon, 29 Jan 2024 13:43:40 GMT Subject: RFR: 8320649: C2: Optimize scoped values [v4] In-Reply-To: References: Message-ID: On Wed, 17 Jan 2024 11:18:36 GMT, Emanuel Peter wrote: >> Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: >> >> merge fix > > src/hotspot/share/opto/callGenerator.cpp line 1031: > >> 1029: CallStaticJavaNode* get_first_iff_unc = get_first_iff_failure->is_uncommon_trap_proj(Deoptimization::Reason_none); >> 1030: if (get_first_iff_unc != nullptr) { >> 1031: // first cache check never hits, keep only the second. > > I'm struggling to understand: > We still have an unc-trap for the first. So we never failed so far, right? So we always found it in the cache, or am I wrong? > We are not removing this unc-trap though, right? The `ScopedValue.get()` code probes 2 cache locations. If, when pattern matching the `get()` subgraph: - we only find a single if that probes the cache, then, according to profile data, there was always a hit at the first cache location. - we find 2 ifs, then the first and second locations were probed. If the first if's other branch is to an uncommon trap, then that location never saw a cache hit. In that case, when the `ScopedValueGetHitsInCacheNode` is expanded, only code to probe the second location is added back to the IR.
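To make the "two cache locations" idea above concrete, here is a rough Java sketch of a lookup that probes two slots before falling back to a slow path. It is a simplified, hypothetical model and not the actual `ScopedValue` implementation: the class name, the slot functions, the cache size and the fill policy are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

public class TwoProbeCacheSketch {
    static final int SLOTS = 16;                                  // hypothetical size, power of two
    private final Object[] cache = new Object[2 * SLOTS];         // key at 2*i, value at 2*i + 1
    private final Map<Object, Object> bindings = new HashMap<>(); // stand-in for the slow lookup
    public int slowLookups = 0;

    public void bind(Object key, Object value) {
        bindings.put(key, value);
    }

    // Two slot choices derived from the same hash (made-up functions).
    static int firstSlot(Object key)  { return key.hashCode() & (SLOTS - 1); }
    static int secondSlot(Object key) { return (key.hashCode() >>> 4) & (SLOTS - 1); }

    public Object get(Object key) {
        int i = 2 * firstSlot(key);
        if (cache[i] == key) {
            return cache[i + 1];      // hit at the first location
        }
        int j = 2 * secondSlot(key);
        if (cache[j] == key) {
            return cache[j + 1];      // hit at the second location
        }
        return slowGet(key);          // miss at both locations
    }

    private Object slowGet(Object key) {
        slowLookups++;
        Object value = bindings.get(key);
        int i = 2 * firstSlot(key);
        // Assumed policy: use the first location if it is free, else the second.
        int target = (cache[i] == null) ? i : 2 * secondSlot(key);
        cache[target] = key;
        cache[target + 1] = value;
        return value;
    }

    public static void main(String[] args) {
        TwoProbeCacheSketch c = new TwoProbeCacheSketch();
        Object k = new Object();
        c.bind(k, "v");
        System.out.println(c.get(k));      // slow path, fills the cache
        System.out.println(c.get(k));      // served by a cache probe
        System.out.println(c.slowLookups); // prints 1
    }
}
```

In this model, the two `if` statements in `get` correspond to the two ifs the pattern matching looks for; if profiling showed the first one never hitting, only the second probe would need to be kept.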
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1469607133 From epeter at openjdk.org Mon Jan 29 14:05:48 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 29 Jan 2024 14:05:48 GMT Subject: RFR: 8291809: Convert compiler/c2/cr7200264/TestSSE2IntVect.java to IR verification test [v8] In-Reply-To: References: Message-ID: <7Y7nMSNp71wg4-5O7Kx5y4oisBqF0yjGrHRmRxPi6cA=.48c1a7ae-f6a8-4bc2-9a15-17c72dfe9db0@github.com> On Mon, 29 Jan 2024 12:14:50 GMT, Daniel Lundén wrote: >> This changeset translates the tests in `compiler/c2/cr7200264/` to use the IR verification framework. >> >> The proposed translation, to the extent possible, preserves the semantics of the original test. A major difference is that the IR checks are now local (for every `test_*` method) instead of global. The execution time of the new test is comparable to the old test. >> >> Testing: >> - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/7671921417) >> - Ran the new translated tests within all tier1 through tier10 contexts on windows-x64, linux-x64, linux-aarch64, macosx-x64, and macosx-aarch64. >> - Tested that manually adding `-XX:LoopUnrollLimit=0` to the test framework flags caused the translated tests to fail. Note: it is, however, no longer possible to break the test by passing `-XX:LoopUnrollLimit=0` on the command line.
> > Daniel Lundén has updated the pull request incrementally with one additional commit since the last revision: > > Change to avx2 CPU feature check

I looked into it by writing this test:

    public class Test {
        static int RANGE = 10_000;

        public static void main(String[] args) {
            int[] a = new int[RANGE];
            int[] b = new int[RANGE];
            for (int i = 0; i < 10_000; i++) {
                test1(a, b);
                test2(a, b, i % 200 - 100);
            }
        }

        static void test1(int[] a, int[] b) {
            for (int i = 0; i < a.length; i++) {
                a[i] = b[i] / 15;
            }
        }

        static void test2(int[] a, int[] b, int s) {
            for (int i = 0; i < a.length; i++) {
                a[i] = b[i] / 7;
            }
        }
    }

And running this command: `./java -XX:CompileCommand=compileonly,Test::test1 -XX:+TraceSuperWord -XX:+TraceLoopOpts -XX:+TraceNewVectors Test.java`

In the logs, I see that it attempts to vectorize, creating packs like this:

    ...
    Pack: 7
     align: 0 678 RShiftI === _ 679 153 [[ 671 ]] !orig=561,154 !jvms: Test::test1 @ bci:15 (line 15)
     align: 4 667 RShiftI === _ 668 153 [[ 660 ]] !orig=154 !jvms: Test::test1 @ bci:15 (line 15)
     align: 8 561 RShiftI === _ 562 153 [[ 554 ]] !orig=154 !jvms: Test::test1 @ bci:15 (line 15)
     align: 12 154 RShiftI === _ 251 153 [[ 155 ]] !jvms: Test::test1 @ bci:15 (line 15)
    Pack: 8
     align: 0 676 MulL === _ 677 144 [[ 675 ]] !orig=559,146 !jvms: Test::test1 @ bci:15 (line 15)
     align: 8 665 MulL === _ 666 144 [[ 664 ]] !orig=146 !jvms: Test::test1 @ bci:15 (line 15)
    ...

But then, I also see:

    Unimplemented
    559 MulL === _ 560 144 [[ 558 ]] !orig=146 !jvms: Test::test1 @ bci:15 (line 15)

And in `src/hotspot/cpu/aarch64/aarch64_vector.ad`, I see this:

    bool Matcher::match_rule_supported_auto_vectorization(int opcode, int vlen, BasicType bt) {
      if (UseSVE == 0) {
        // These operations are not profitable to be vectorized on NEON, because no direct
        // NEON instructions support them. But the match rule support for them is profitable for
        // Vector API intrinsics.
        if ((opcode == Op_VectorCastD2X && bt == T_INT) ||
            (opcode == Op_VectorCastL2X && bt == T_FLOAT) ||
            (opcode == Op_CountLeadingZerosV && bt == T_LONG) ||
            (opcode == Op_CountTrailingZerosV && bt == T_LONG) ||
            // The vector implementation of Op_AddReductionVD/F is for the Vector API only.
            // It is not suitable for auto-vectorization because it does not add the elements
            // in the same order as sequential code, and FP addition is non-associative.
            opcode == Op_AddReductionVD || opcode == Op_AddReductionVF ||
            opcode == Op_MulReductionVD || opcode == Op_MulReductionVF ||
            opcode == Op_MulVL) {
          return false;
        }
      }
      return match_rule_supported_vector(opcode, vlen, bt);
    }

**Conclusion**

The int-division is implemented using a `MulVL`, and that is not implemented in `asimd`, and so the vectorization fails.

------------- PR Comment: https://git.openjdk.org/jdk/pull/17428#issuecomment-1914758844 From shade at openjdk.org Mon Jan 29 14:24:49 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 29 Jan 2024 14:24:49 GMT Subject: RFR: 8318446: C2: optimize stores into primitive arrays by combining values into larger store [v6] In-Reply-To: References: Message-ID: On Mon, 29 Jan 2024 12:23:11 GMT, Emanuel Peter wrote: >> This is a feature requested by @RogerRiggs and @cl4es . >> >> **Idea** >> >> Merging multiple consecutive small stores (e.g. 8 byte stores) into larger stores (e.g. one long store) can lead to speedup. Recently, @cl4es and @RogerRiggs had to review a few PR's where people would try to get speedups by using Unsafe (e.g. `Unsafe.putLongUnaligned`), or ByteArrayLittleEndian (e.g. `ByteArrayLittleEndian.setLong`). They have asked if we can do such an optimization in C2, rather than in the Java library code, or even user code. >> >> This patch here supports a few simple use-cases, like these: >> >> Merge consecutive array stores, with constants.
We can combine the separate constants into a larger constant: >> https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/test/hotspot/jtreg/compiler/c2/TestMergeStores.java#L383-L395 >> >> Merge consecutive array stores, with a variable that was split (using shifts). We can essentially undo the splitting (i.e. shifting and truncation), and directly store the variable: >> https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/test/hotspot/jtreg/compiler/c2/TestMergeStores.java#L444-L456 >> >> The idea is that this would allow the introduction of a very simple API, without any "heavy" dependencies (Unsafe or ByteArrayLittleEndian): >> >> https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/test/hotspot/jtreg/compiler/c2/TestMergeStores.java#L327-L338 >> https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/test/hotspot/jtreg/compiler/c2/TestMergeStores.java#L467-L472 >> >> **Details** >> >> This draft currently implements the optimization in an additional special IGVN phase: >> https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/src/hotspot/share/opto/compile.cpp#L2479-L2485 >> >> We first collect all `StoreB|C|I`, and put them in the IGVN worklist (see `Compile::gather_nodes_for_merge_stores`). During IGVN, we call `StoreNode::Ideal_merge_stores` at the end (after all other optimizations) of `StoreNode::Ideal`. We essentially try to establish a chain of mergable stores: >> >> https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/src/hotspot/share/opto/memnode.cpp#L2802-L2806 >> >> Mergable stores must have the same Opcode (implies they have the same element type and hence size). Further, mergable stores must have the same control (or be separated by only a RangeCheck). Further, they must either bot... 
> > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > made trace flag develop I only skimmed through the code, so maybe it is already handled. Let me ask for clarity anyway: does it combine the series of naturally aligned stores into the bulk -- possibly misaligned! -- store? Because it would have performance implications, and depending on how hardware treats the misaligned stores, a correctness problem too. E.g., if we allow transforming `storeB(&(A+1), V1); storeB(&(A+2), V2);` -> `storeC(&(A+1), combine(V1, V2))`, then `storeC` might not be aligned. The platforms are allowed to throw the VM under `SIGBUS` when that store is executed. `Unsafe.putXUnaligned` was done to avoid this trouble, which only intrinsifies when `UseUnalignedAccesses` is `true`, and maybe has more safeguards that I don't remember off-hand. You do that safely in initializing stores in constructors, since we are guaranteed no one reads the object yet -- I think C2 already does some of that coalescing in `InitializeNode::coalesce_subword_stores`. Should these two coalescing steps share some code?
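The alignment concern above can be sketched with a small Java check. This is only an illustration under stated assumptions: the class and method names are made up, the byte-array base offset of 16 is an assumed value (the real one depends on the JVM and its flags), and objects are assumed to start at 8-byte aligned addresses, so alignment can be judged from the offset within the object.

```java
public class MergedStoreAlignmentSketch {
    // Assumed offset of element 0 inside a byte[]; hypothetical, JVM-dependent.
    static final int ASSUMED_BYTE_ARRAY_BASE = 16;

    // True if a merged store of mergedSize bytes that starts at byte index
    // 'index' would be naturally aligned under the assumptions above.
    static boolean isNaturallyAligned(int index, int mergedSize) {
        return (ASSUMED_BYTE_ARRAY_BASE + index) % mergedSize == 0;
    }

    public static void main(String[] args) {
        // Byte stores at A+1 and A+2 merged into one 2-byte store at A+1.
        System.out.println(isNaturallyAligned(1, 2)); // prints false
        // Byte stores at A+2 and A+3 merged into one 2-byte store at A+2.
        System.out.println(isNaturallyAligned(2, 2)); // prints true
        // Eight byte stores starting at A+0 or at A+4, merged into one 8-byte store.
        System.out.println(isNaturallyAligned(0, 8)); // prints true
        System.out.println(isNaturallyAligned(4, 8)); // prints false
    }
}
```

The first case is the `storeB`/`storeC` example from the message: each byte store is trivially aligned, but the merged 2-byte store starts at an odd offset.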
------------- PR Comment: https://git.openjdk.org/jdk/pull/16245#issuecomment-1914792398 From roland at openjdk.org Mon Jan 29 14:32:47 2024 From: roland at openjdk.org (Roland Westrelin) Date: Mon, 29 Jan 2024 14:32:47 GMT Subject: RFR: 8320649: C2: Optimize scoped values [v4] In-Reply-To: References: Message-ID: <11LEhzD6xnypaHh2nmO98ORHTUZnJaYeuIPbk5cJGP0=.8e7a319c-7f96-4269-b763-89f4461f2933@github.com> On Wed, 17 Jan 2024 14:17:58 GMT, Emanuel Peter wrote: >> Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: >> >> merge fix > > src/hotspot/share/opto/callGenerator.cpp line 1155: > >> 1153: CallProjections slow_projs; >> 1154: slow_call->extract_projections(&slow_projs, false); >> 1155: Node* fallthrough = slow_projs.fallthrough_catchproj->clone(); > > Why does that have to be cloned? Control out of the slow call is redirected to new the `region_fast_slow`. It's one way to do that. The other would be to iterate over all uses of the projection, replace the edge leading to the projection with top and then add the projection as input to `region_fast_slow`. That doesn't seem simpler. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1469674991 From roland at openjdk.org Mon Jan 29 14:41:29 2024 From: roland at openjdk.org (Roland Westrelin) Date: Mon, 29 Jan 2024 14:41:29 GMT Subject: RFR: 8318446: C2: optimize stores into primitive arrays by combining values into larger store In-Reply-To: References: Message-ID: On Fri, 19 Jan 2024 14:09:28 GMT, Roland Westrelin wrote: >> This is a feature requiested by @RogerRiggs and @cl4es . >> >> **Idea** >> >> Merging multiple consecutive small stores (e.g. 8 byte stores) into larger stores (e.g. one long store) can lead to speedup. Recently, @cl4es and @RogerRiggs had to review a few PR's where people would try to get speedups by using Unsafe (e.g. `Unsafe.putLongUnaligned`), or ByteArrayLittleEndian (e.g. 
`ByteArrayLittleEndian.setLong`). They have asked if we can do such an optimization in C2, rather than in the Java library code, or even user code. >> >> This patch here supports a few simple use-cases, like these: >> >> Merge consecutive array stores, with constants. We can combine the separate constants into a larger constant: >> https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/test/hotspot/jtreg/compiler/c2/TestMergeStores.java#L383-L395 >> >> Merge consecutive array stores, with a variable that was split (using shifts). We can essentially undo the splitting (i.e. shifting and truncation), and directly store the variable: >> https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/test/hotspot/jtreg/compiler/c2/TestMergeStores.java#L444-L456 >> >> The idea is that this would allow the introduction of a very simple API, without any "heavy" dependencies (Unsafe or ByteArrayLittleEndian): >> >> https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/test/hotspot/jtreg/compiler/c2/TestMergeStores.java#L327-L338 >> https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/test/hotspot/jtreg/compiler/c2/TestMergeStores.java#L467-L472 >> >> **Details** >> >> This draft currently implements the optimization in an additional special IGVN phase: >> https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/src/hotspot/share/opto/compile.cpp#L2479-L2485 >> >> We first collect all `StoreB|C|I`, and put them in the IGVN worklist (see `Compile::gather_nodes_for_merge_stores`). During IGVN, we call `StoreNode::Ideal_merge_stores` at the end (after all other optimizations) of `StoreNode::Ideal`. 
We essentially try to establish a chain of mergable stores: >> >> https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/src/hotspot/share/opto/memnode.cpp#L2802-L2806 >> >> Mergable stores must have the same Opcode (implies they have the same element type and hence size). Further, mergable stores must have the same control (or be separated by only a RangeCheck). Further, they must either bot... > > Why not provide new internal API points and intrinsics? The benefits would be: > - less complexity on the c2 side (and less bugs) > - much easier for someone writing java code to check that the optimization triggers (check the PrintInlining output that the intrinsic shows up vs check the final assembly code) > - clear contract between the java libraries and the VM as to what optimizes under what conditions > > If I was the user for this I would be worried, that: > - it's hard for me to check it's doing what I expect > - even if it does initially, changes to the java code (maybe by other people less familiar with this transformation) could break the optimization. If there's a call to some specific API, at least people changing the code know special attention is necessary and that as long as the new API points are used, the optimization is guaranteed. > @rwestrel > > > Why not simply enqueue stores for post loop opts processing by igvn? > > That does not work. I need to do the processing post-post-loop-opts. And not during post-loop-opts. Because during post-loop-opts, some CastII nodes fold away, and that simplifies the pointers, and makes it easier to find adjacent memops. Range check castIIs? With JDK-8324517, I will possibly propose they stay in until the end of compilation (it used to cause performance to regress so I have to take a close look). 
------------- PR Comment: https://git.openjdk.org/jdk/pull/16245#issuecomment-1914827679 From varadam at openjdk.org Mon Jan 29 15:07:25 2024 From: varadam at openjdk.org (Varada M) Date: Mon, 29 Jan 2024 15:07:25 GMT Subject: RFR: JDK-8322648: Improve class initialization barrier in TemplateTable::_new for PPC [v2] In-Reply-To: References: Message-ID: On Mon, 29 Jan 2024 06:49:27 GMT, Varada M wrote: >> src/hotspot/cpu/ppc/templateTable_ppc_64.cpp line 3808: >> >>> 3806: // Make sure klass is initialized. >>> 3807: assert(VM_Version::supports_fast_class_init_checks(), "Optimization requires support for fast class initialization checks"); >>> 3808: __ clinit_barrier(Rcpool, R16_thread, nullptr /*L_fast_path*/, &Lslow_case); >> >> `Rcpool` is the wrong parameter. You need to use `RinstanceKlass`. > > Thank you for the correction @TheRealMDoerr . I have applied the code change. Tier1 test is running. I will push once it is done. build successful and no failures in tier1 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17518#discussion_r1469729878 From varadam at openjdk.org Mon Jan 29 15:07:23 2024 From: varadam at openjdk.org (Varada M) Date: Mon, 29 Jan 2024 15:07:23 GMT Subject: RFR: JDK-8322648: Improve class initialization barrier in TemplateTable::_new for PPC [v3] In-Reply-To: References: Message-ID: > ppc port implementation of https://github.com/openjdk/jdk/pull/17006 > > Fastdebug and Release : build and tier1 testing successful. 
> > JBS Issue : [JDK-8322648](https://bugs.openjdk.org/browse/JDK-8322648) Varada M has updated the pull request incrementally with one additional commit since the last revision: 8322648: Improve class initialization barrier in TemplateTable::_new for PPC ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17518/files - new: https://git.openjdk.org/jdk/pull/17518/files/2fb4cc08..e85eb148 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17518&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17518&range=01-02 Stats: 3 lines in 1 file changed: 0 ins; 1 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/17518.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17518/head:pull/17518 PR: https://git.openjdk.org/jdk/pull/17518 From dlunden at openjdk.org Mon Jan 29 15:02:00 2024 From: dlunden at openjdk.org (Daniel =?UTF-8?B?THVuZMOpbg==?=) Date: Mon, 29 Jan 2024 15:02:00 GMT Subject: RFR: 8291809: Convert compiler/c2/cr7200264/TestSSE2IntVect.java to IR verification test [v9] In-Reply-To: References: Message-ID: > This changeset translates the tests in `compiler/c2/cr7200264/` to use the IR verification framework. > > The proposed translation, to the extent possible, preserves the semantics of the original test. A major difference is that the IR checks are now local (for every `test_*` method) instead of global. The execution time of the new test is comparable to the old test. > > Testing: > - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/7671921417) > - Ran the new translated tests within all tier1 through tier10 contexts on windows-x64, linux-x64, linux-aarch64, macosx-x64, and macosx-aarch64. > - Tested that manually adding `-XX:LoopUnrollLimit=0` to the test framework flags caused the translated tests to fail. Note: it is, however, no longer possible to break the test by passing `-XX:LoopUnrollLimit=0` on the command line. 
Daniel Lundén has updated the pull request incrementally with one additional commit since the last revision: Update aarch64 rules for test_divc and test_divc_n ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17428/files - new: https://git.openjdk.org/jdk/pull/17428/files/4ab64279..22004dd5 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17428&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17428&range=07-08 Stats: 16 lines in 1 file changed: 0 ins; 2 del; 14 mod Patch: https://git.openjdk.org/jdk/pull/17428.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17428/head:pull/17428 PR: https://git.openjdk.org/jdk/pull/17428 From mdoerr at openjdk.org Mon Jan 29 15:27:47 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Mon, 29 Jan 2024 15:27:47 GMT Subject: RFR: JDK-8322648: Improve class initialization barrier in TemplateTable::_new for PPC [v3] In-Reply-To: References: Message-ID: <_GYvUiajjwpy4OKskAmfqQCYfkMM8f80rR6qdcHNTfc=.3301ff20-052c-4666-b262-79b9c32ecd1d@github.com> On Mon, 29 Jan 2024 15:07:23 GMT, Varada M wrote: >> ppc port implementation of https://github.com/openjdk/jdk/pull/17006 >> >> Fastdebug and Release : build and tier1 testing successful. >> >> JBS Issue : [JDK-8322648](https://bugs.openjdk.org/browse/JDK-8322648) > > Varada M has updated the pull request incrementally with one additional commit since the last revision: > > 8322648: Improve class initialization barrier in TemplateTable::_new for PPC Seems like you have missed my previous comment: https://github.com/openjdk/jdk/pull/17518#pullrequestreview-1847895107 I don't think it's correct without that. And please remove the comment "// get instance_size.". It doesn't tell anything which is not obvious. We should definitely measure performance with -Xint. Maybe even "time java -Xint -version" shows a difference between correct and incorrect code.
------------- PR Comment: https://git.openjdk.org/jdk/pull/17518#issuecomment-1914925878 From varadam at openjdk.org Mon Jan 29 15:39:25 2024 From: varadam at openjdk.org (Varada M) Date: Mon, 29 Jan 2024 15:39:25 GMT Subject: RFR: JDK-8322648: Improve class initialization barrier in TemplateTable::_new for PPC [v3] In-Reply-To: References: Message-ID: On Mon, 29 Jan 2024 15:07:23 GMT, Varada M wrote: >> ppc port implementation of https://github.com/openjdk/jdk/pull/17006 >> >> Fastdebug and Release : build and tier1 testing successful. >> >> JBS Issue : [JDK-8322648](https://bugs.openjdk.org/browse/JDK-8322648) > > Varada M has updated the pull request incrementally with one additional commit since the last revision: > > 8322648: Improve class initialization barrier in TemplateTable::_new for PPC > Seems like you have missed my previous comment: [#17518 (review)](https://github.com/openjdk/jdk/pull/17518#pullrequestreview-1847895107) I don't think it's correct without that. And please remove the comment "// get instance_size.". It doesn't tell anything which is not obvious. We should definitely measure performance with -Xint. Maybe even "time java -Xint -version" shows a difference between correct and incorrect code. I see a minor difference in real time and CPU time. Without the code change : real 0m0.714s user 0m0.381s sys 0m0.026s With the code change : real 0m0.711s user 0m0.380s sys 0m0.026s ------------- PR Comment: https://git.openjdk.org/jdk/pull/17518#issuecomment-1914951250 From mdoerr at openjdk.org Mon Jan 29 15:43:45 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Mon, 29 Jan 2024 15:43:45 GMT Subject: RFR: JDK-8322648: Improve class initialization barrier in TemplateTable::_new for PPC [v3] In-Reply-To: References: Message-ID: On Mon, 29 Jan 2024 15:07:23 GMT, Varada M wrote: >> ppc port implementation of https://github.com/openjdk/jdk/pull/17006 >> >> Fastdebug and Release : build and tier1 testing successful.
>> >> JBS Issue : [JDK-8322648](https://bugs.openjdk.org/browse/JDK-8322648) > > Varada M has updated the pull request incrementally with one additional commit since the last revision: > > 8322648: Improve class initialization barrier in TemplateTable::_new for PPC Ok, the difference is probably below noise. Maybe we can try jvm98 with -Xint. I think the patch should look like this: diff --git a/src/hotspot/cpu/ppc/templateTable_ppc_64.cpp b/src/hotspot/cpu/ppc/templateTable_ppc_64.cpp index 84ecfc4f934..74bea6ac9ac 100644 --- a/src/hotspot/cpu/ppc/templateTable_ppc_64.cpp +++ b/src/hotspot/cpu/ppc/templateTable_ppc_64.cpp @@ -3803,16 +3803,15 @@ void TemplateTable::_new() { __ sldi(Roffset, Rindex, LogBytesPerWord); __ load_resolved_klass_at_offset(Rcpool, Roffset, RinstanceKlass); - // Make sure klass is fully initialized and get instance_size. - __ lbz(Rscratch, in_bytes(InstanceKlass::init_state_offset()), RinstanceKlass); + // Make sure klass is initialized. + assert(VM_Version::supports_fast_class_init_checks(), "Optimization requires support for fast class initialization checks"); + __ clinit_barrier(RinstanceKlass, R16_thread, nullptr /*L_fast_path*/, &Lslow_case); + __ lwz(Rinstance_size, in_bytes(Klass::layout_helper_offset()), RinstanceKlass); - __ cmpdi(CCR1, Rscratch, InstanceKlass::fully_initialized); // Make sure klass does not have has_finalizer, or is abstract, or interface or java/lang/Class. __ andi_(R0, Rinstance_size, Klass::_lh_instance_slow_path_bit); // slow path bit equals 0? - - __ crnand(CCR0, Assembler::equal, CCR1, Assembler::equal); // slow path bit set or not fully initialized? 
- __ beq(CCR0, Lslow_case); + __ bne(CCR0, Lslow_case); // -------------------------------------------------------------------------- // Fast case: ------------- PR Comment: https://git.openjdk.org/jdk/pull/17518#issuecomment-1914963023 From dlunden at openjdk.org Mon Jan 29 15:55:46 2024 From: dlunden at openjdk.org (Daniel =?UTF-8?B?THVuZMOpbg==?=) Date: Mon, 29 Jan 2024 15:55:46 GMT Subject: RFR: 8291809: Convert compiler/c2/cr7200264/TestSSE2IntVect.java to IR verification test [v8] In-Reply-To: <7Y7nMSNp71wg4-5O7Kx5y4oisBqF0yjGrHRmRxPi6cA=.48c1a7ae-f6a8-4bc2-9a15-17c72dfe9db0@github.com> References: <7Y7nMSNp71wg4-5O7Kx5y4oisBqF0yjGrHRmRxPi6cA=.48c1a7ae-f6a8-4bc2-9a15-17c72dfe9db0@github.com> Message-ID: On Mon, 29 Jan 2024 14:03:11 GMT, Emanuel Peter wrote: >> Daniel Lundén has updated the pull request incrementally with one additional commit since the last revision: >> >> Change to avx2 CPU feature check > > I looked into the failure with `asimd` on aarch64, by writing this test: > > > public class Test { > static int RANGE = 10_000; > > public static void main(String[] args) { > int[] a = new int[RANGE]; > int[] b = new int[RANGE]; > for (int i = 0; i < 10_000; i++) { > test1(a, b); > test2(a, b, i % 200 - 100); > } > } > > static void test1(int[] a, int[] b) { > for (int i = 0; i < a.length; i++) { > a[i] = b[i] / 15; > } > } > > static void test2(int[] a, int[] b, int s) { > for (int i = 0; i < a.length; i++) { > a[i] = b[i] / 7; > } > } > } > > > And running this command: > `./java -XX:CompileCommand=compileonly,Test::test1 -XX:+TraceSuperWord -XX:+TraceLoopOpts -XX:+TraceNewVectors Test.java` > > In the logs, I see it attempts to vectorize, creating packs like this: > > ...
> Pack: 7 > align: 0 678 RShiftI === _ 679 153 [[ 671 ]] !orig=561,154 !jvms: Test::test1 @ bci:15 (line 15) > align: 4 667 RShiftI === _ 668 153 [[ 660 ]] !orig=154 !jvms: Test::test1 @ bci:15 (line 15) > align: 8 561 RShiftI === _ 562 153 [[ 554 ]] !orig=154 !jvms: Test::test1 @ bci:15 (line 15) > align: 12 154 RShiftI === _ 251 153 [[ 155 ]] !jvms: Test::test1 @ bci:15 (line 15) > Pack: 8 > align: 0 676 MulL === _ 677 144 [[ 675 ]] !orig=559,146 !jvms: Test::test1 @ bci:15 (line 15) > align: 8 665 MulL === _ 666 144 [[ 664 ]] !orig=146 !jvms: Test::test1 @ bci:15 (line 15) > ... > > > But then, I also see: > > Unimplemented > 559 MulL === _ 560 144 [[ 558 ]] !orig=146 !jvms: Test::test1 @ bci:15 (line 15) > > > And in `src/hotspot/cpu/aarch64/aarch64_vector.ad`, I see this: > > bool Matcher::match_rule_supported_auto_vectorization(int opcode, int vlen, BasicType bt) { > if (UseSVE == 0) { > // These operations are not profitable to be vectorized on NEON, because no direct > // NEON instructions support them. But the match rule support for them is profitable for > // Vector API intrinsics. > if ((opcode == Op_VectorCastD2X && bt == T_INT) || > (opcode == Op_VectorCastL2X && bt == T_FLOAT) || > (opcode == Op_CountLeadingZerosV && bt == T_LONG) || > (opcode == Op_CountTrailingZerosV && bt == T_LONG) || > // The vector implementation of Op_AddReductionVD/F is for the Vector API only. > // It is not suitable for auto-vectorization because it does not add the elements > // in the same order as sequential code, and FP addition is ... Thanks @eme64, updated now. I'll rerun the tests before integrating. @pfustc @fg1417: @robcasloz recommended that I ask you to check that the `sve` IR checks do not fail for `test_divc` and `test_divc_n` in this changeset. Do you have machines on which you can check this? 
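A note on the `MulL` and `RShiftI` nodes in the packs quoted above: C2 typically strength-reduces integer division by a constant into a widening multiply by a precomputed "magic" constant followed by shifts, which would explain why a long multiply shows up for `b[i] / 15`. The following is a minimal Java sketch of that arithmetic only, not C2's actual node sequence. The class and method names are hypothetical, and the formula as written is only valid for positive divisors that are not powers of two.

```java
public class MagicDivide {
    // Hypothetical helper: computes x / d for a positive d that is not a power of two,
    // using a 64-bit multiply and shifts instead of a divide instruction.
    static int divideByConstant(int x, int d) {
        if (d <= 0 || (d & (d - 1)) == 0) {
            throw new IllegalArgumentException("d must be positive and not a power of two");
        }
        int k = 32 - Integer.numberOfLeadingZeros(d - 1); // smallest k with 2^k >= d
        int shift = 31 + k;
        long magic = ((1L << shift) + d - 1) / d;          // ceil(2^shift / d), always below 2^32
        int q = (int) (((long) x * magic) >> shift);       // floor(x * magic / 2^shift)
        return q + (x >>> 31);                             // add 1 for negative x to round toward zero
    }

    public static void main(String[] args) {
        int[] samples = {Integer.MIN_VALUE, -31, -15, -1, 0, 1, 14, 15, 16, 1000, Integer.MAX_VALUE};
        for (int x : samples) {
            // Compare against the plain division used in test1 (/ 15) and test2 (/ 7).
            if (divideByConstant(x, 15) != x / 15 || divideByConstant(x, 7) != x / 7) {
                throw new AssertionError("mismatch at x = " + x);
            }
        }
        System.out.println("all samples match");
    }
}
```

The multiply has to be done in 64 bits because the product of a 32-bit value and the magic constant does not fit in an `int`, which is consistent with the long multiply being the node reported as unimplemented for vectorization on NEON.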
------------- PR Comment: https://git.openjdk.org/jdk/pull/17428#issuecomment-1914986863 From aph at openjdk.org Mon Jan 29 15:59:52 2024 From: aph at openjdk.org (Andrew Haley) Date: Mon, 29 Jan 2024 15:59:52 GMT Subject: RFR: 8319690: [AArch64] C2 compilation hits offset_ok_for_immed: assert "c2 compiler bug" In-Reply-To: References: <16J-lJ2AceGTVcRWBcP15yKcwO-1IA1XsngyOuNjf7k=.0776f081-ae2c-4279-87cf-d909806c2bc4@github.com> <4KRCqYxn02wMjYDuN3_HbYWxc9BDtYMNd40bNrJ4K8w=.5362dda4-8d30-48fd-8915-909eb15a6023@github.com> Message-ID: <1q-CrjRvuqEglCph_AVbq3tt-My841n7iCHYp8zTaAU=.6d30fbd0-5121-419a-913e-aed371eee0ca@github.com> On Sat, 6 Jan 2024 17:44:04 GMT, Andrew Haley wrote: >>> After this change, `immIOffset` and `immLOffset` appear to be obsolete. >> >> Removed them in the new commit. Thanks! > >> @fg1417 what is the state on this? >> >> The example here may look like a strange edge-case. But with my plans this may become more common: [JDK-8318446](https://bugs.openjdk.org/browse/JDK-8318446) C2: implement StoreNode::Ideal_merge_stores >> >> I will merge stores, which can then look like unaligned stores of a larger type. This bug here blocks my progress. I'm not in a hurry, just curious if there is now a plan how to proceed here ;) > > The problem with this PR is that the code is way too complex for such a simple problem. The port is correct as it is, in the release build. > > The only problem is an assertion. We could simply remove that assertion, but if it were me I'd fix the problem properly. Both @dean-long and I have suggested ways to improve this patch with less code. If @fg1417 decides to drop this PR I'll fix it. > @theRealAph do you want to fix this? Otherwise I'll just push a PR removing the assert, since it blocks this: #16245 I've just started working on it. If you like, push a PR removing the assert, and then my fix will re-insert it.
------------- PR Comment: https://git.openjdk.org/jdk/pull/16991#issuecomment-1915006526 From never at openjdk.org Mon Jan 29 17:33:25 2024 From: never at openjdk.org (Tom Rodriguez) Date: Mon, 29 Jan 2024 17:33:25 GMT Subject: RFR: 8324717: Remove HotSpotJVMCICompilerFactory In-Reply-To: References: Message-ID: On Thu, 25 Jan 2024 14:42:19 GMT, Doug Simon wrote: > There has been no active use of `jdk.vm.ci.hotspot.HotSpotJVMCICompilerFactory.CompilationLevelAdjustment` since [JDK-8219403](https://bugs.openjdk.org/browse/JDK-8219403) effectively [disabled](https://github.com/openjdk/jdk/commit/61f35bf898d6a0f4e7b6e514821b40efd87396dc#diff-4d3a3b7e7e12e1d5b4cf3e4677d9e0de5e9df3bbf1bbfa0d8d43d12098d67dc4R302-R306) it. Since `HotSpotJVMCICompilerFactory` exists solely for `CompilationLevelAdjustment` related logic, this PR removes the whole `HotSpotJVMCICompilerFactory` class. Marked as reviewed by never (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/17570#pullrequestreview-1849324351 From kvn at openjdk.org Mon Jan 29 17:44:33 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 29 Jan 2024 17:44:33 GMT Subject: RFR: 8324794: C2 SuperWord: do not ignore reductions in SuperWord::unrolling_analysis In-Reply-To: References: <4Dv9KCdW7B9Ms3a-MO-59PL-3qebNhNJejuuM6LCB0w=.7575de49-9d14-4c12-b95d-8aca28b07a05@github.com> <-f9_1s2hXm20N3DABo4Hq1dyKzAxRHq3kJ7I_D-PONo=.5566f4f5-e975-4bee-b274-c7a136672c1b@github.com> Message-ID: On Sun, 28 Jan 2024 09:44:14 GMT, Emanuel Peter wrote: > I can only speculate: maybe the idea was that reductions are not profitable, unless there are other nodes, like stores and loads. So if we only find reductions, then we would not adjust the unrolling, since we are not expecting vectorization anyway. Again: only speculation. You reviewed the code in 2015, maybe you still remember the reason ;) I looked and there was no discussion about that during review. 
Originally it was not SuperWord analysis - it only looked for arithmetic Phi nodes in the loop. Last year we changed it: [1be80a44](https://github.com/openjdk/jdk/commit/1be80a4445cf74adc9b2cd5bf262a897f9ede74f) I think the check simplifies the `unrolling_analysis` code, since we skip nodes we already know about. ------------- PR Comment: https://git.openjdk.org/jdk/pull/17604#issuecomment-1915247383 From kvn at openjdk.org Mon Jan 29 18:12:41 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 29 Jan 2024 18:12:41 GMT Subject: RFR: 8324734: Relax too-strict assert(VM_Version::supports_evex()) in Assembler::locate_operand() [v2] In-Reply-To: References: <776ui1DnJfcgz89q2aPAeOyhOYKalPhAE1bMR28wK6I=.19728df4-3710-4503-a726-8ce217501f67@github.com> Message-ID: <4wo7OWrXKsNvFcYTzEezwPZS-JIQ2bkehiLujSxEjcw=.605312a9-3c3b-4cb4-955e-c8d63c0308c1@github.com> On Mon, 29 Jan 2024 09:30:02 GMT, Roman Kennke wrote: >> Details see bug report. The gist is that HotSpot downgrades to UseAVX=2 on some processors, and reports supports_evex() == false, but the instruction decoder can still encounter EVEX instructions when (e.g.) hitting a SIGBUS in memset() - which does have EVEX instructions. >> >> Testing: >> - [x] runtime/Unsafe/InternalErrorTest.java >> - [x] tier1 > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Distinguish between CPU and HotSpot features for supports_evex() Good. I will submit our testing before approval. And this needs a second review.
------------- PR Review: https://git.openjdk.org/jdk/pull/17590#pullrequestreview-1849438969 From shade at openjdk.org Mon Jan 29 18:38:37 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 29 Jan 2024 18:38:37 GMT Subject: RFR: 8324734: Relax too-strict assert(VM_Version::supports_evex()) in Assembler::locate_operand() [v2] In-Reply-To: References: <776ui1DnJfcgz89q2aPAeOyhOYKalPhAE1bMR28wK6I=.19728df4-3710-4503-a726-8ce217501f67@github.com> Message-ID: On Mon, 29 Jan 2024 09:30:02 GMT, Roman Kennke wrote: >> Details see bug report. The gist is that HotSpot downgrades to UseAVX=2 on some processors, and reports supports_evex() == false, but the instruction decoder can still encounter EVEX instructions when (e.g.) hitting a SIGBUS in memset() - which does have EVEX instructions. >> >> Testing: >> - [x] runtime/Unsafe/InternalErrorTest.java >> - [x] tier1 > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Distinguish between CPU and HotSpot features for supports_evex() This looks fine, with nits. src/hotspot/cpu/x86/vm_version_x86.cpp line 812: > 810: > 811: if (cpu_family() > 4) { // it supports CPUID > 812: _features = feature_flags(); // It can be changed by VM flags Suggestion: _features = feature_flags(); // These can be changed by VM settings src/hotspot/cpu/x86/vm_version_x86.hpp line 643: > 641: > 642: // > 643: // Feature identification which can be affected by VM flags Suggestion: // Feature identification which can be affected by VM settings src/hotspot/cpu/x86/vm_version_x86.hpp line 709: > 707: // Feature identification not affected by VM flags > 708: // > 709: static bool cpu_supports_evex() { return (_cpu_features & CPU_AVX512F) != 0; } Indenting is a bit off here. I think the opening brace should be aligned with the others on the top. 
src/hotspot/share/runtime/abstract_vm_version.hpp line 57: > 55: static const char* _s_internal_vm_info_string; > 56: > 57: // CPU feature flags which can be restricted by VM flags. Suggestion: // CPU feature flags, can be affected by VM settings. src/hotspot/share/runtime/abstract_vm_version.hpp line 61: > 59: static const char* _features_string; > 60: > 61: // CPU feature flags not affected by VM flags. Suggestion: // Original CPU feature flags, not affected by VM settings. ------------- Marked as reviewed by shade (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/17590#pullrequestreview-1849488534 PR Review Comment: https://git.openjdk.org/jdk/pull/17590#discussion_r1470022549 PR Review Comment: https://git.openjdk.org/jdk/pull/17590#discussion_r1470032016 PR Review Comment: https://git.openjdk.org/jdk/pull/17590#discussion_r1470021986 PR Review Comment: https://git.openjdk.org/jdk/pull/17590#discussion_r1470032487 PR Review Comment: https://git.openjdk.org/jdk/pull/17590#discussion_r1470033274 From rkennke at openjdk.org Mon Jan 29 18:45:20 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Mon, 29 Jan 2024 18:45:20 GMT Subject: RFR: 8324734: Relax too-strict assert(VM_Version::supports_evex()) in Assembler::locate_operand() [v3] In-Reply-To: <776ui1DnJfcgz89q2aPAeOyhOYKalPhAE1bMR28wK6I=.19728df4-3710-4503-a726-8ce217501f67@github.com> References: <776ui1DnJfcgz89q2aPAeOyhOYKalPhAE1bMR28wK6I=.19728df4-3710-4503-a726-8ce217501f67@github.com> Message-ID: > Details see bug report. The gist is that HotSpot downgrades to UseAVX=2 on some processors, and reports supports_evex() == false, but the instruction decoder can still encounter EVEX instructions when (e.g.) hitting a SIGBUS in memset() - which does have EVEX instructions. 
> > Testing: > - [x] runtime/Unsafe/InternalErrorTest.java > - [x] tier1 Roman Kennke has updated the pull request incrementally with five additional commits since the last revision: - Fix intendation - Update src/hotspot/share/runtime/abstract_vm_version.hpp Co-authored-by: Aleksey Shipilëv - Update src/hotspot/share/runtime/abstract_vm_version.hpp Co-authored-by: Aleksey Shipilëv - Update src/hotspot/cpu/x86/vm_version_x86.hpp Co-authored-by: Aleksey Shipilëv - Update src/hotspot/cpu/x86/vm_version_x86.cpp Co-authored-by: Aleksey Shipilëv ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17590/files - new: https://git.openjdk.org/jdk/pull/17590/files/f3929385..856cbfa2 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17590&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17590&range=01-02 Stats: 5 lines in 3 files changed: 0 ins; 0 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/17590.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17590/head:pull/17590 PR: https://git.openjdk.org/jdk/pull/17590 From rkennke at openjdk.org Mon Jan 29 18:45:21 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Mon, 29 Jan 2024 18:45:21 GMT Subject: RFR: 8324734: Relax too-strict assert(VM_Version::supports_evex()) in Assembler::locate_operand() [v2] In-Reply-To: References: <776ui1DnJfcgz89q2aPAeOyhOYKalPhAE1bMR28wK6I=.19728df4-3710-4503-a726-8ce217501f67@github.com> Message-ID: On Mon, 29 Jan 2024 18:35:49 GMT, Aleksey Shipilev wrote: > This looks fine, with nits. Thanks for the review! I fixed all the mentioned issues.
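The distinction this PR draws between `supports_evex()` and `cpu_supports_evex()` can be modelled in a few lines. The sketch below is a Java illustration of the idea only, not HotSpot code: the class name, the bit assignments, and the `restrictToAvx2` method are hypothetical, and only the two query names echo the functions quoted in the review.

```java
public class FeatureFlagsModel {
    // Hypothetical bit assignments, for illustration only.
    static final long AVX2 = 1L << 0;
    static final long AVX512F = 1L << 1;

    private final long cpuFeatures; // as detected from the hardware; never modified
    private long features;          // what the VM chooses to use; may be restricted

    FeatureFlagsModel(long detected) {
        this.cpuFeatures = detected;
        this.features = detected;
    }

    // Models a VM setting such as UseAVX=2 removing AVX-512 from the usable set.
    void restrictToAvx2() {
        features &= ~AVX512F;
    }

    // Will the VM itself generate EVEX-encoded instructions?
    boolean supportsEvex() {
        return (features & AVX512F) != 0;
    }

    // Can EVEX-encoded instructions show up at all on this CPU, e.g. in library code?
    boolean cpuSupportsEvex() {
        return (cpuFeatures & AVX512F) != 0;
    }

    public static void main(String[] args) {
        FeatureFlagsModel f = new FeatureFlagsModel(AVX2 | AVX512F);
        f.restrictToAvx2();
        System.out.println("VM uses EVEX:      " + f.supportsEvex());
        System.out.println("CPU supports EVEX: " + f.cpuSupportsEvex());
    }
}
```

In this model, a decoder that may encounter instructions from code the VM did not generate (the `memset()` case from the bug report) would consult the second query rather than the first.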
------------- PR Comment: https://git.openjdk.org/jdk/pull/17590#issuecomment-1915343877 From cslucas at openjdk.org Mon Jan 29 18:48:26 2024 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Mon, 29 Jan 2024 18:48:26 GMT Subject: RFR: JDK-8322854: Incorrect rematerialization of scalar replaced objects in C2 In-Reply-To: References: Message-ID: On Sat, 27 Jan 2024 21:07:22 GMT, Vladimir Kozlov wrote: >> Current implementation of `PhaseMacroExpand::value_from_mem` returns `return _igvn.zerocon(ft);` when it hits a sentinel while searching for a memory operation on a given slice. One of the sentinels is the memory input of the allocate node origin of the memory slice. Therefore, `value_from_mem` may return `zerocon(ft)` if `sfpt_mem` is the same memory edge used by the Allocate node origin of the memory slice being traversed. >> >> The scalar replacement implementation uses `value_from_mem` during creation of metadata describing scalar-replaced objects (see `PhaseMacroExpand::create_scalarized_object_description`). The `create_scalarized_object_description` method is also used as part of the RAM optimization implementation. The RAM optimization targets Phi nodes and therefore a memory graph loop created by a _memory phi_ node can be seen as part of the transformation. See image below: >> >> >> >> This pattern doesn't show up when scalarizing objects that don't participate in allocation merges. >> >> To fix the issue I changed the code in `value_from_mem` so that, instead of using the _input_ memory edge of the Allocate as a stop condition, it now uses the projection memory edge of the Allocate. >> >> Tested locally on windows, mac and linux x86_64 with JTREG tier1-3 and didn't observe any regression. > > @JohnTortugo do I understand correctly that we have a loop and the Phi node we are processing is memory input to Allocation? > > If I recall correctly, the only way we get to `alloc->in(Mem)` is if there is no `Initialize` node (there are such cases).
In such a case `Allocation` may not have a memory out projection. > > Why does your case see `alloc->in(Mem)`? > > What other `Phi` node's edge points to? > > I am concerned that if you use the projection memory edge of the Allocate you may miss/skip it during the search and start searching an unrelated path. @vnkozlov - Thank you for letting me know about those edge cases. I'll investigate what happens in those situations. I created this Gist to demonstrate the problem: https://gist.github.com/JohnTortugo/2e6f183b0bf1e465dc871246b410ef4c @TobiHartmann - I'll try and reproduce these failures locally. ------------- PR Comment: https://git.openjdk.org/jdk/pull/17562#issuecomment-1915348659 From dnsimon at openjdk.org Mon Jan 29 19:11:44 2024 From: dnsimon at openjdk.org (Doug Simon) Date: Mon, 29 Jan 2024 19:11:44 GMT Subject: RFR: 8324717: Remove HotSpotJVMCICompilerFactory In-Reply-To: References: Message-ID: On Thu, 25 Jan 2024 14:42:19 GMT, Doug Simon wrote: > There has been no active use of `jdk.vm.ci.hotspot.HotSpotJVMCICompilerFactory.CompilationLevelAdjustment` since [JDK-8219403](https://bugs.openjdk.org/browse/JDK-8219403) effectively [disabled](https://github.com/openjdk/jdk/commit/61f35bf898d6a0f4e7b6e514821b40efd87396dc#diff-4d3a3b7e7e12e1d5b4cf3e4677d9e0de5e9df3bbf1bbfa0d8d43d12098d67dc4R302-R306) it. Since `HotSpotJVMCICompilerFactory` exists solely for `CompilationLevelAdjustment` related logic, this PR removes the whole `HotSpotJVMCICompilerFactory` class. Thanks for the reviews.
------------- PR Comment: https://git.openjdk.org/jdk/pull/17570#issuecomment-1915383889 From dnsimon at openjdk.org Mon Jan 29 19:15:43 2024 From: dnsimon at openjdk.org (Doug Simon) Date: Mon, 29 Jan 2024 19:15:43 GMT Subject: Integrated: 8324717: Remove HotSpotJVMCICompilerFactory In-Reply-To: References: Message-ID: On Thu, 25 Jan 2024 14:42:19 GMT, Doug Simon wrote: > There has been no active use of `jdk.vm.ci.hotspot.HotSpotJVMCICompilerFactory.CompilationLevelAdjustment` since [JDK-8219403](https://bugs.openjdk.org/browse/JDK-8219403) effectively [disabled](https://github.com/openjdk/jdk/commit/61f35bf898d6a0f4e7b6e514821b40efd87396dc#diff-4d3a3b7e7e12e1d5b4cf3e4677d9e0de5e9df3bbf1bbfa0d8d43d12098d67dc4R302-R306) it. Since `HotSpotJVMCICompilerFactory` exists solely for `CompilationLevelAdjustment` related logic, this PR removes the whole `HotSpotJVMCICompilerFactory` class. This pull request has now been integrated. Changeset: fb07bbe7 Author: Doug Simon URL: https://git.openjdk.org/jdk/commit/fb07bbe7b2a97b914596ff42105fd867a0916a7a Stats: 100 lines in 2 files changed: 0 ins; 100 del; 0 mod 8324717: Remove HotSpotJVMCICompilerFactory Reviewed-by: thartmann, never ------------- PR: https://git.openjdk.org/jdk/pull/17570 From kvn at openjdk.org Mon Jan 29 19:51:31 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 29 Jan 2024 19:51:31 GMT Subject: RFR: JDK-8322854: Incorrect rematerialization of scalar replaced objects in C2 In-Reply-To: References: Message-ID: On Sat, 27 Jan 2024 21:07:22 GMT, Vladimir Kozlov wrote: >> Current implementation of `PhaseMacroExpand::value_from_mem` returns `return _igvn.zerocon(ft);` when it hits a sentinel while searching for a memory operation on a given slice. One of the sentinels is the memory input of the allocate node origin of the memory slice. Therefore, `value_from_mem` may return `zerocon(ft)` if `sfpt_mem` is the same memory edge used by the Allocate node origin of the memory slice being traversed.
>> >> The scalar replacement implementation uses `value_from_mem` during creation of metadata describing scalar-replaced objects (see `PhaseMacroExpand::create_scalarized_object_description`). The `create_scalarized_object_description` method is also used as part of the RAM optimization implementation. The RAM optimization targets Phi nodes and therefore a memory graph loop created by a _memory phi_ node can be seen as part of the transformation. See image below: >> >> >> >> This pattern doesn't show up when scalarizing objects that don't participate in allocation merges. >> >> To fix the issue I changed the code in `value_from_mem` so that, instead of using the _input_ memory edge of the Allocate as a stop condition, it now uses the projection memory edge of the Allocate. >> >> Tested locally on windows, mac and linux x86_64 with JTREG tier1-3 and didn't observe any regression. > > @JohnTortugo do I understand correctly that we have a loop and the Phi node we are processing is memory input to Allocation? > > If I recall correctly, the only way we get to `alloc->in(Mem)` is if there is no `Initialize` node (there are such cases). In such a case `Allocation` may not have a memory out projection. > > Why does your case see `alloc->in(Mem)`? > > What other `Phi` node's edge points to? > > I am concerned that if you use the projection memory edge of the Allocate you may miss/skip it during the search and start searching an unrelated path. > @vnkozlov - Thank you for letting me know about those edge cases. I'll investigate what happens in those situations. > I created this Gist to demonstrate the problem: https://gist.github.com/JohnTortugo/2e6f183b0bf1e465dc871246b410ef4c @JohnTortugo - Thank you for the demonstration. Now I understand the issue. Yes, your suggestion is reasonable but you need to watch out for a missing allocation memory projection - you should not use `nullptr` as a sentinel. Maybe add an assert and run testing to see if we hit it.
------------- PR Comment: https://git.openjdk.org/jdk/pull/17562#issuecomment-1915443950 From xliu at openjdk.org Mon Jan 29 19:57:42 2024 From: xliu at openjdk.org (Xin Liu) Date: Mon, 29 Jan 2024 19:57:42 GMT Subject: RFR: 8324667: fold Parse::seems_stable_comparison() [v2] In-Reply-To: References: Message-ID: On Thu, 25 Jan 2024 19:05:50 GMT, Joshua Cao wrote: >> The function has a long comment block that seems irrelevant since https://github.com/openjdk/jdk/commit/8bd4b5624c6ece31d965259aadc290a24d44423a. We can just fold away this method. It only has one caller. >> >> >> passes GHA > > Joshua Cao has updated the pull request incrementally with two additional commits since the last revision: > > - Update copyright for parse.hpp > - Remove seems_stable_comparison() from header and remove copyright LGTM. I am not a reviewer. ------------- Marked as reviewed by xliu (Committer). PR Review: https://git.openjdk.org/jdk/pull/17573#pullrequestreview-1849637781 From duke at openjdk.org Mon Jan 29 19:57:43 2024 From: duke at openjdk.org (Joshua Cao) Date: Mon, 29 Jan 2024 19:57:43 GMT Subject: Integrated: 8324667: fold Parse::seems_stable_comparison() In-Reply-To: References: Message-ID: On Thu, 25 Jan 2024 17:57:55 GMT, Joshua Cao wrote: > The function has a long comment block that seems irrelevant since https://github.com/openjdk/jdk/commit/8bd4b5624c6ece31d965259aadc290a24d44423a. We can just fold away this method. It only has one caller. > > > passes GHA This pull request has now been integrated. 
Changeset: 84deeb6c Author: Joshua Cao Committer: Xin Liu URL: https://git.openjdk.org/jdk/commit/84deeb6cd58884bd794da88e4d5a6c873286383b Stats: 19 lines in 2 files changed: 1 ins; 16 del; 2 mod 8324667: fold Parse::seems_stable_comparison() Reviewed-by: jkarthikeyan, chagedorn, xliu ------------- PR: https://git.openjdk.org/jdk/pull/17573 From dlong at openjdk.org Mon Jan 29 21:46:32 2024 From: dlong at openjdk.org (Dean Long) Date: Mon, 29 Jan 2024 21:46:32 GMT Subject: RFR: 8324630: C1: Canonicalizer::do_LookupSwitch doesn't break the loop when the successor is found [v3] In-Reply-To: References: <-3LeFaCHrhS593HcrdEaKfE0LMdbQkg1hP6y3iLJ2Bc=.c96dd8de-ef94-4757-b5c7-83309074e036@github.com> Message-ID: On Sat, 27 Jan 2024 03:20:52 GMT, Denghui Dong wrote: >> src/hotspot/share/c1/c1_Canonicalizer.cpp line 848: >> >>> 846: int high = x->length() - 1; >>> 847: while (low <= high) { >>> 848: int mid = low + ((high - low) >> 1); >> >> Isn't this the same as >> `int mid = (low + high) >> 1;` > > `low + ((high - low) >> 1)` can avoid integer overflow (it seems unlikely to happen though). Good point. Integer overflow is a real concern. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17553#discussion_r1470256667 From dlong at openjdk.org Mon Jan 29 21:46:32 2024 From: dlong at openjdk.org (Dean Long) Date: Mon, 29 Jan 2024 21:46:32 GMT Subject: RFR: 8324630: C1: Canonicalizer::do_LookupSwitch doesn't break the loop when the successor is found [v3] In-Reply-To: <-3LeFaCHrhS593HcrdEaKfE0LMdbQkg1hP6y3iLJ2Bc=.c96dd8de-ef94-4757-b5c7-83309074e036@github.com> References: <-3LeFaCHrhS593HcrdEaKfE0LMdbQkg1hP6y3iLJ2Bc=.c96dd8de-ef94-4757-b5c7-83309074e036@github.com> Message-ID: <2amvFgNsI51KVNgfG7lWO-1YAVj5jmqrIqQrae2wm3c=.25e5b2cc-ac13-47e2-9128-868b93fa94d6@github.com> On Fri, 26 Jan 2024 22:47:46 GMT, Denghui Dong wrote: >> Hi, >> >> Please review the small change that breaks the loop in Canonicalizer::do_LookupSwitch if the successor is found. 
>> >> The keys of LookupSwitch are sorted, so there is no need to continue the loop once matched. >> >> Thanks. > > Denghui Dong has updated the pull request incrementally with one additional commit since the last revision: > > binary search ------------- Marked as reviewed by dlong (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/17553#pullrequestreview-1849830309 From duke at openjdk.org Mon Jan 29 22:39:45 2024 From: duke at openjdk.org (Joshua Cao) Date: Mon, 29 Jan 2024 22:39:45 GMT Subject: RFR: 8275202: C2: optimize out more redundant conditions In-Reply-To: <978cgwy3Nb_x7yU6jZz0f6zhTBZfphstisAkBf1Vktc=.283d06eb-4f79-40cf-b8dd-a9c230e59902@github.com> References: <978cgwy3Nb_x7yU6jZz0f6zhTBZfphstisAkBf1Vktc=.283d06eb-4f79-40cf-b8dd-a9c230e59902@github.com> Message-ID: On Wed, 21 Jun 2023 12:47:26 GMT, Roland Westrelin wrote: > This change adds a new loop opts pass to optimize redundant conditions > such as the second one in: > > > if (i < 10) { > if (i < 42) { > > > In the branch of the first if, the type of i can be narrowed down to > [min_jint, 9] which can then be used to constant fold the second > condition. > > The compiler already keeps track of type[n] for every node in the > current compilation unit. That's not sufficient to optimize the > snippet above though because the type of i can only be narrowed in > some sections of the control flow (that is a subset of all > controls). The solution is to build a new table that tracks the type > of n at every control c > > > type'[n, root] = type[n] // initialized from igvn's type table > type'[n, c] = type'[n, idom(c)] > > > This pass iterates over the CFG looking for conditions such as: > > > if (i < 10) { > > > that allow narrowing the type of i and updates the type' table > accordingly. > > At a region r: > > > type'[n, r] = meet(type'[n, r->in(1)], type'[n, r->in(2)]...) > > > For a Phi phi at a region r: > > > type'[phi, r] = meet(type'[phi->in(1), r->in(1)], type'[phi->in(2), r->in(2)]...)
> > > Once a type is narrowed, uses are enqueued and their types are > computed by calling the Value() methods. If a use's type is narrowed, > it's recorded at c in the type' table. Value() methods retrieve types > from the type table, not the type' table. To address that issue while > leaving Value() methods unchanged, before calling Value() at c, the > type table is updated so: > > > type[n] = type'[n, c] > > > An exception is for Phi::Value which needs to retrieve the type of > nodes at various controls: there, a new type(Node* n, Node* c) > method is used. > > For most n and c, type'[n, c] is likely the same as type[n], the type > recorded in the global igvn table (that is, there shouldn't be many > nodes at only a few controls for which we can narrow the type down). As > a consequence, the type'[n, c] table is implemented with: > > - At c, narrowed down types are stored in a GrowableArray. Each entry > records the previous type at idom(c) and the narrowed down type at > c. > > - The GrowableArray of type updates is recorded in a hash table > indexed by c. If there's no update at c, there's no entry in the > hash table. > > This pass operates in 2 steps: > > - it first iterates over the graph looking for conditions that narrow > the types of some nodes and propagates type updates to uses until a > fixed point. > > - it transforms the graph so newly found constant nodes are folded. > > > The new pass is run on every loop opts. There are a couple rea... Would it make sense to handle this case in `IfNode::ideal()`? `IfNode::fold_compares` already handles more complicated cases of integer comparisons. We can match `cmp(i, constant)`, build integer range data, and eliminate IfNodes based on the data. It won't be as powerful as the fixed-point optimization that is proposed in this PR, but it is sufficient to cover the case mentioned in the JBS issue and would have less compile-time overhead.
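The range-based folding discussed above can be sketched in standalone C++. This is only an illustration under stated assumptions, not HotSpot code: `IntRange`, `Fold`, `narrow_less_than` and `fold_less_than` are hypothetical names and do not exist in C2.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical closed interval [lo, hi] of possible int values; not a C2 type.
struct IntRange {
  int32_t lo;
  int32_t hi;
};

// Hypothetical result of trying to decide a comparison from a range.
enum class Fold { Unknown, AlwaysTrue, AlwaysFalse };

// Narrow r to the values of i that can reach the true branch of `if (i < c)`.
// Returns false if that branch is unreachable, in which case r is left unchanged.
bool narrow_less_than(IntRange& r, int32_t c) {
  assert(r.lo <= r.hi);
  if (c == std::numeric_limits<int32_t>::min()) {
    return false;  // nothing is smaller than the minimum int; also avoids c - 1 overflow
  }
  int32_t hi = std::min(r.hi, static_cast<int32_t>(c - 1));
  if (r.lo > hi) {
    return false;
  }
  r.hi = hi;
  return true;
}

// Decide `i < c` from the range of i alone, if possible.
Fold fold_less_than(IntRange r, int32_t c) {
  assert(r.lo <= r.hi);
  if (r.hi < c) return Fold::AlwaysTrue;    // every possible value is below c
  if (r.lo >= c) return Fold::AlwaysFalse;  // no possible value is below c
  return Fold::Unknown;
}
```

For the snippet from the PR description, narrowing the full int range with `i < 10` gives `[min_jint, 9]`, and `fold_less_than` on that range with 42 reports that the inner condition is always true.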
------------- PR Comment: https://git.openjdk.org/jdk/pull/14586#issuecomment-1915695767 From kvn at openjdk.org Mon Jan 29 23:18:41 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 29 Jan 2024 23:18:41 GMT Subject: RFR: 8324734: Relax too-strict assert(VM_Version::supports_evex()) in Assembler::locate_operand() [v3] In-Reply-To: References: <776ui1DnJfcgz89q2aPAeOyhOYKalPhAE1bMR28wK6I=.19728df4-3710-4503-a726-8ce217501f67@github.com> Message-ID: <06wfyDbZvrl0vopNE33C_iFFpw8QGAQ-05OyfpES_aw=.7018500c-1e9b-4f5e-873d-bbdea4aa82f7@github.com> On Mon, 29 Jan 2024 18:45:20 GMT, Roman Kennke wrote: >> Details see bug report. The gist is that HotSpot downgrades to UseAVX=2 on some processors, and reports supports_evex() == false, but the instruction decoder can still encounter EVEX instructions when (e.g.) hitting a SIGBUS in memset() - which does have EVEX instructions. >> >> Testing: >> - [x] runtime/Unsafe/InternalErrorTest.java >> - [x] tier1 > > Roman Kennke has updated the pull request incrementally with five additional commits since the last revision: > > - Fix intendation > - Update src/hotspot/share/runtime/abstract_vm_version.hpp > > Co-authored-by: Aleksey Shipilëv > - Update src/hotspot/share/runtime/abstract_vm_version.hpp > > Co-authored-by: Aleksey Shipilëv > - Update src/hotspot/cpu/x86/vm_version_x86.hpp > > Co-authored-by: Aleksey Shipilëv > - Update src/hotspot/cpu/x86/vm_version_x86.cpp > > Co-authored-by: Aleksey Shipilëv My testing passed. ------------- Marked as reviewed by kvn (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/17590#pullrequestreview-1849949984 From gcao at openjdk.org Tue Jan 30 02:10:42 2024 From: gcao at openjdk.org (Gui Cao) Date: Tue, 30 Jan 2024 02:10:42 GMT Subject: Integrated: 8324125: Improve class initialization barrier in TemplateTable::_new for RISC-V In-Reply-To: <_TyDNsY_znk2GR9l7rpALqDOuwMwj4hkuF5kKcyJBQg=.1d9f5448-ad18-49bf-9fde-a68e88626470@github.com> References: <_TyDNsY_znk2GR9l7rpALqDOuwMwj4hkuF5kKcyJBQg=.1d9f5448-ad18-49bf-9fde-a68e88626470@github.com> Message-ID: <576x9FIWsCiIK2f0aHzaDCZ2BboLrpcEWgUkRYkuLSg=.56096ea7-9235-491a-a3e5-370c42082903@github.com> On Wed, 24 Jan 2024 09:16:09 GMT, Gui Cao wrote: > Hi, This RISC-V Port implementation for https://github.com/openjdk/jdk/pull/17006, > > ### Testing: > > - [x] Run tier1-3 tests on qemu 8.1.0 with UseRVV (fastdebug) > - [x] Run tier1-3 tests with SiFive unmatched (release) This pull request has now been integrated. Changeset: a1d65eb6 Author: Gui Cao Committer: Fei Yang URL: https://git.openjdk.org/jdk/commit/a1d65eb6d87ff9019a9a92a775213be2a8b60fd1 Stats: 5 lines in 1 file changed: 0 ins; 1 del; 4 mod 8324125: Improve class initialization barrier in TemplateTable::_new for RISC-V Reviewed-by: fyang, rehn ------------- PR: https://git.openjdk.org/jdk/pull/17548 From yyang at openjdk.org Tue Jan 30 06:49:57 2024 From: yyang at openjdk.org (Yi Yang) Date: Tue, 30 Jan 2024 06:49:57 GMT Subject: RFR: 8323795: jcmd Compiler.codecache should print total size of code cache [v5] In-Reply-To: References: Message-ID: > CodeHeap 'non-profiled nmethods': size=118592Kb used=29Kb max_used=29Kb free=118562Kb > bounds [0x00007fbe84622000, 0x00007fbe84892000, 0x00007fbe8b9f2000] > CodeHeap 'profiled nmethods': size=118588Kb used=80Kb max_used=80Kb free=118507Kb > bounds [0x00007fbe7c9f2000, 0x00007fbe7cc62000, 0x00007fbe83dc1000] > CodeHeap 'non-nmethods': size=8580Kb used=1258Kb max_used=1834Kb free=7321Kb > bounds [0x00007fbe83dc1000, 0x00007fbe84031000, 
0x00007fbe84622000] > total_blobs=474 nmethods=87 adapters=293 > compilation: enabled > stopped_count=0, restarted_count=0 > full_count=0 > > > It's better to accumulate the total size of used/free/size, for example > > -SegmentedCodeCache > CodeCache: size=245760Kb used=1366Kb max_used=1943Kb free=244393Kb > bounds [0x00007fdcc89f2000, 0x00007fdcc8c62000, 0x00007fdcd79f2000] > total_blobs=474, nmethods=87, adapters=293 > stopped_count=0, restarted_count=0, full_count=0 > compilation=enabled > > > > +SegmentedCodeCache > CodeHeap 'non-profiled nmethods': size=118592Kb used=29Kb max_used=29Kb free=118562Kb > bounds [0x00007f89c8622000, 0x00007f89c8892000, 0x00007f89cf9f2000] > CodeHeap 'profiled nmethods': size=118588Kb used=80Kb max_used=80Kb free=118507Kb > bounds [0x00007f89c09f2000, 0x00007f89c0c62000, 0x00007f89c7dc1000] > CodeHeap 'non-nmethods': size=8580Kb used=1258Kb max_used=1834Kb free=7321Kb > bounds [0x00007f89c7dc1000, 0x00007f89c8031000, 0x00007f89c8622000] > CodeCache: size=245760Kb, used=1367Kb, max_used=1943Kb, free=244390Kb > total_blobs=474, nmethods=87, adapters=293 > stopped_count=0, restarted_count=0, full_count=0 > compilation=enabled Yi Yang has updated the pull request incrementally with one additional commit since the last revision: fix ident && vmTestbase test ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17445/files - new: https://git.openjdk.org/jdk/pull/17445/files/e9ccc76d..5f86187d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17445&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17445&range=03-04 Stats: 15 lines in 3 files changed: 1 ins; 0 del; 14 mod Patch: https://git.openjdk.org/jdk/pull/17445.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17445/head:pull/17445 PR: https://git.openjdk.org/jdk/pull/17445 From epeter at openjdk.org Tue Jan 30 08:14:42 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 30 Jan 2024 08:14:42 GMT Subject: RFR: 8291809: Convert
compiler/c2/cr7200264/TestSSE2IntVect.java to IR verification test [v9] In-Reply-To: References: Message-ID: On Mon, 29 Jan 2024 15:02:00 GMT, Daniel Lundén wrote: >> This changeset translates the tests in `compiler/c2/cr7200264/` to use the IR verification framework. >> >> The proposed translation, to the extent possible, preserves the semantics of the original test. A major difference is that the IR checks are now local (for every `test_*` method) instead of global. The execution time of the new test is comparable to the old test. >> >> Testing: >> - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/7671921417) >> - Ran the new translated tests within all tier1 through tier10 contexts on windows-x64, linux-x64, linux-aarch64, macosx-x64, and macosx-aarch64. >> - Tested that manually adding `-XX:LoopUnrollLimit=0` to the test framework flags caused the translated tests to fail. Note: it is, however, no longer possible to break the test by passing `-XX:LoopUnrollLimit=0` on the command line. > > Daniel Lundén has updated the pull request incrementally with one additional commit since the last revision: > > Update aarch64 rules for test_divc and test_divc_n Thanks for the updates! ------------- Marked as reviewed by epeter (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/17428#pullrequestreview-1850467776 From shade at openjdk.org Tue Jan 30 09:00:32 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 30 Jan 2024 09:00:32 GMT Subject: RFR: 8324734: Relax too-strict assert(VM_Version::supports_evex()) in Assembler::locate_operand() [v3] In-Reply-To: References: <776ui1DnJfcgz89q2aPAeOyhOYKalPhAE1bMR28wK6I=.19728df4-3710-4503-a726-8ce217501f67@github.com> Message-ID: On Mon, 29 Jan 2024 18:45:20 GMT, Roman Kennke wrote: >> Details see bug report. The gist is that HotSpot downgrades to UseAVX=2 on some processors, and reports supports_evex() == false, but the instruction decoder can still encounter EVEX instructions when (e.g.)
hitting a SIGBUS in memset() - which does have EVEX instructions. >> >> Testing: >> - [x] runtime/Unsafe/InternalErrorTest.java >> - [x] tier1 > > Roman Kennke has updated the pull request incrementally with five additional commits since the last revision: > > - Fix intendation > - Update src/hotspot/share/runtime/abstract_vm_version.hpp > > Co-authored-by: Aleksey Shipilëv > - Update src/hotspot/share/runtime/abstract_vm_version.hpp > > Co-authored-by: Aleksey Shipilëv > - Update src/hotspot/cpu/x86/vm_version_x86.hpp > > Co-authored-by: Aleksey Shipilëv > - Update src/hotspot/cpu/x86/vm_version_x86.cpp > > Co-authored-by: Aleksey Shipilëv Looks fine to me. ------------- Marked as reviewed by shade (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/17590#pullrequestreview-1850556132 From roland at openjdk.org Tue Jan 30 09:24:24 2024 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 30 Jan 2024 09:24:24 GMT Subject: RFR: 8320649: C2: Optimize scoped values [v4] In-Reply-To: References: Message-ID: <42IjtaFnhJy9RiLE_-v4y6AZ6R3vUFJtQ0UzeLaI79I=.e48124c6-a0e2-48f1-be81-d37f8c1f7388@github.com> On Wed, 17 Jan 2024 15:13:37 GMT, Emanuel Peter wrote: >> Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: >> >> merge fix > > src/hotspot/share/opto/intrinsicnode.cpp line 388: > >> 386: assert(in(0)->in(0)->in(1)->is_Bool(), ""); >> 387: assert(in(0)->in(0)->in(1)->in(1)->Opcode() == Op_ScopedValueGetHitsInCache, ""); >> 388: assert(in(0)->in(0)->in(1)->in(1) == in(1), ""); > > Why not use your beautiful enum for addressing the inputs? It's verifying that the ScopedValueGetLoadFromCache is indeed guarded by a ScopedValueGetHitsInCache. No use of the beautiful enum here. > src/hotspot/share/opto/node.cpp line 977: > >> 975: } >> 976: >> 977: Node* Node::find_unique_out_with(int opcode) const { > > Random idea: > Would it not be nice if this method automatically casted the node to that node-class?
> Suggestions: > - using templates: give the class name and the opcode. A bit annoying to use > - using macros: give it the node-type name: i.e. `Add` for `AddNode`. The macro then uses the template, filling in `AddNode` and `Op_Add`. What do you think? Yes, it would, but that's out of scope for this PR. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1470830173 PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1470834145 From mdoerr at openjdk.org Tue Jan 30 09:27:40 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Tue, 30 Jan 2024 09:27:40 GMT Subject: RFR: JDK-8322648: Improve class initialization barrier in TemplateTable::_new for PPC [v3] In-Reply-To: References: Message-ID: On Mon, 29 Jan 2024 15:07:23 GMT, Varada M wrote: >> ppc port implementation of https://github.com/openjdk/jdk/pull/17006 >> >> Fastdebug and Release : build and tier1 testing successful. >> >> JBS Issue : [JDK-8322648](https://bugs.openjdk.org/browse/JDK-8322648) > > Varada M has updated the pull request incrementally with one additional commit since the last revision: > > 8322648: Improve class initialization barrier in TemplateTable::_new for PPC I can run it, but please update your PR first (as shown above). CCR1 is used uninitialized in your current version. ------------- PR Comment: https://git.openjdk.org/jdk/pull/17518#issuecomment-1916408729 From varadam at openjdk.org Tue Jan 30 09:37:47 2024 From: varadam at openjdk.org (Varada M) Date: Tue, 30 Jan 2024 09:37:47 GMT Subject: RFR: JDK-8322648: Improve class initialization barrier in TemplateTable::_new for PPC [v3] In-Reply-To: References: Message-ID: On Mon, 29 Jan 2024 15:07:23 GMT, Varada M wrote: >> ppc port implementation of https://github.com/openjdk/jdk/pull/17006 >> >> Fastdebug and Release : build and tier1 testing successful.
>> >> JBS Issue : [JDK-8322648](https://bugs.openjdk.org/browse/JDK-8322648) > > Varada M has updated the pull request incrementally with one additional commit since the last revision: > > 8322648: Improve class initialization barrier in TemplateTable::_new for PPC Thank you Martin. I have applied the changes ------------- PR Comment: https://git.openjdk.org/jdk/pull/17518#issuecomment-1916428792 From varadam at openjdk.org Tue Jan 30 09:37:46 2024 From: varadam at openjdk.org (Varada M) Date: Tue, 30 Jan 2024 09:37:46 GMT Subject: RFR: JDK-8322648: Improve class initialization barrier in TemplateTable::_new for PPC [v4] In-Reply-To: References: Message-ID: > ppc port implementation of https://github.com/openjdk/jdk/pull/17006 > > Fastdebug and Release : build and tier1 testing successful. > > JBS Issue : [JDK-8322648](https://bugs.openjdk.org/browse/JDK-8322648) Varada M has updated the pull request incrementally with one additional commit since the last revision: 8322648: Improve class initialization barrier in TemplateTable::_new for PPC ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17518/files - new: https://git.openjdk.org/jdk/pull/17518/files/e85eb148..01d0af3d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17518&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17518&range=02-03 Stats: 4 lines in 1 file changed: 0 ins; 3 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/17518.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17518/head:pull/17518 PR: https://git.openjdk.org/jdk/pull/17518 From mdoerr at openjdk.org Tue Jan 30 10:32:44 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Tue, 30 Jan 2024 10:32:44 GMT Subject: RFR: JDK-8322648: Improve class initialization barrier in TemplateTable::_new for PPC [v4] In-Reply-To: References: Message-ID: <-jybh7O1qjbwGMypTnB8cpfPrubNVDcvgdUzurJDpNA=.afdbb937-a10c-4a99-98ea-95ba4ae9b5a9@github.com> On Tue, 30 Jan 2024 09:37:46 GMT, Varada M wrote: >> ppc port 
implementation of https://github.com/openjdk/jdk/pull/17006 >> >> Fastdebug and Release : build and tier1 testing successful. >> >> JBS Issue : [JDK-8322648](https://bugs.openjdk.org/browse/JDK-8322648) > > Varada M has updated the pull request incrementally with one additional commit since the last revision: > > 8322648: Improve class initialization barrier in TemplateTable::_new for PPC Thanks for the update! JVM98 performance results with -Xint -XX:-ProfileInterpreter look good (no regression). Some sub-benchmarks seem to be a very little faster, but there's no big difference. ------------- Marked as reviewed by mdoerr (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/17518#pullrequestreview-1850824938 From varadam at openjdk.org Tue Jan 30 10:47:33 2024 From: varadam at openjdk.org (Varada M) Date: Tue, 30 Jan 2024 10:47:33 GMT Subject: RFR: JDK-8322648: Improve class initialization barrier in TemplateTable::_new for PPC [v4] In-Reply-To: References: Message-ID: <9QV2Fcv4jqyjD9T0Ok19bzuRD6kPTboYlm4_c_ObGF4=.5085dc5f-20c2-44fd-b029-6d2525f13a7c@github.com> On Tue, 30 Jan 2024 09:37:46 GMT, Varada M wrote: >> ppc port implementation of https://github.com/openjdk/jdk/pull/17006 >> >> Fastdebug and Release : build and tier1 testing successful. 
>> >> JBS Issue : [JDK-8322648](https://bugs.openjdk.org/browse/JDK-8322648) > > Varada M has updated the pull request incrementally with one additional commit since the last revision: > > 8322648: Improve class initialization barrier in TemplateTable::_new for PPC Thank you Martin for running the benchmark and reviewing the code ------------- PR Comment: https://git.openjdk.org/jdk/pull/17518#issuecomment-1916566803 From rkennke at openjdk.org Tue Jan 30 10:49:25 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Tue, 30 Jan 2024 10:49:25 GMT Subject: RFR: 8324734: Relax too-strict assert(VM_Version::supports_evex()) in Assembler::locate_operand() [v3] In-Reply-To: References: <776ui1DnJfcgz89q2aPAeOyhOYKalPhAE1bMR28wK6I=.19728df4-3710-4503-a726-8ce217501f67@github.com> Message-ID: On Mon, 29 Jan 2024 18:45:20 GMT, Roman Kennke wrote: >> Details see bug report. The gist is that HotSpot downgrades to UseAVX=2 on some processors, and reports supports_evex() == false, but the instruction decoder can still encounter EVEX instructions when (e.g.) hitting a SIGBUS in memset() - which does have EVEX instructions. >> >> Testing: >> - [x] runtime/Unsafe/InternalErrorTest.java >> - [x] tier1 > > Roman Kennke has updated the pull request incrementally with five additional commits since the last revision: > > - Fix intendation > - Update src/hotspot/share/runtime/abstract_vm_version.hpp > > Co-authored-by: Aleksey Shipilëv > - Update src/hotspot/share/runtime/abstract_vm_version.hpp > > Co-authored-by: Aleksey Shipilëv > - Update src/hotspot/cpu/x86/vm_version_x86.hpp > > Co-authored-by: Aleksey Shipilëv > - Update src/hotspot/cpu/x86/vm_version_x86.cpp > > Co-authored-by: Aleksey Shipilëv Thanks, Vladimir and Aleksey!
------------- PR Comment: https://git.openjdk.org/jdk/pull/17590#issuecomment-1916570404 From thartmann at openjdk.org Tue Jan 30 10:51:45 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 30 Jan 2024 10:51:45 GMT Subject: RFR: 8320237 C2: late inlining of method handle invoke causes duplicate lines in PrintInlining output [v3] In-Reply-To: References: Message-ID: On Mon, 22 Jan 2024 16:28:02 GMT, Kangcheng Xu wrote: >> This PR resolves [JDK-8320237](https://bugs.openjdk.org/browse/JDK-8320237) >> >> The original behavior produces both a failure and success message upon late inlining which is confusing. The patch removes the failure message if inlining was successful. Huge thanks to @rwestrel for reporting and working out a solution. >> >> Unit test `test/hotspot/jtreg/compiler/inlining/TestDuplicatedLateInliningOutput.java` is added and passing. > > Kangcheng Xu has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains seven additional commits since the last revision: > > - add -XX:-BackgroundCompilation flag > - Merge branch 'master' into JDK-8320237 > - fix VM crashes > - update test summary, requirements, and VM flags > - Merge branch 'master' into JDK-8320237 > - make regex whitespace consistent > > and to trigger GHA > - 8320237: C2: late inlining of method handle invoke causes duplicate lines in PrintInlining output All tests passed. ------------- Marked as reviewed by thartmann (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/17147#pullrequestreview-1850867320 From roland at openjdk.org Tue Jan 30 12:16:30 2024 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 30 Jan 2024 12:16:30 GMT Subject: RFR: 8320649: C2: Optimize scoped values [v4] In-Reply-To: References: Message-ID: On Thu, 18 Jan 2024 08:53:18 GMT, Emanuel Peter wrote: >> Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: >> >> merge fix > > src/hotspot/share/opto/loopPredicate.cpp line 1648: > >> 1646: tty->print("Predicate invariant if: %d ", new_predicate_iff->_idx); >> 1647: loop->dump_head(); >> 1648: } else if (TraceLoopOpts) { > > Why not have them as separate ifs? What if someone enables both, will they not miss a line? That's the code pattern used elsewhere in `PhaseIdealLoop::loop_predication_impl_helper()`. > src/hotspot/share/opto/loopnode.cpp line 4717: > >> 4715: assert(!_igvn.delay_transform(), ""); >> 4716: _igvn.set_delay_transform(true); >> 4717: for (uint i = _scoped_value_get_nodes.size(); i > 0; i--) { > > Suggestion: > > for (uint i = _scoped_value_get_nodes.size()-1; i >= 0; i--) { An unsigned value is always `>= 0`. Wouldn't your suggestion turn the loop into an infinite loop? > src/hotspot/share/opto/subnode.hpp line 340: > >> 338: >> 339: Node* mem() const { >> 340: return in(Memory); > > Why not verify that this is a `MemNode`? MemNodes are not the only ones carrying memory state (projection for a call, membar etc). 
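To illustrate the point above about unsigned loop counters, here is a small standalone sketch. It is not HotSpot code: `reversed_copy` and the use of `std::vector` are assumptions made for illustration only. It shows the `i > 0; i--` form with `i - 1` as the index, which terminates, whereas a condition of `i >= 0` is always true for an unsigned type.

```cpp
#include <cassert>
#include <vector>

// Copies the elements of `in` from last to first using an unsigned counter.
// The loop tests `i > 0` and indexes with `i - 1`. With `i >= 0` the condition
// could never become false, because decrementing an unsigned 0 wraps around
// to the largest representable value instead of going negative.
std::vector<int> reversed_copy(const std::vector<int>& in) {
  std::vector<int> out;
  out.reserve(in.size());
  for (unsigned i = static_cast<unsigned>(in.size()); i > 0; i--) {
    out.push_back(in[i - 1]);
  }
  assert(out.size() == in.size());
  return out;
}
```

An empty input is handled without a special case, since the counter starts at 0 and the body never runs.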
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1471104465 PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1471106908 PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1470836111 From epeter at openjdk.org Tue Jan 30 12:16:25 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 30 Jan 2024 12:16:25 GMT Subject: RFR: 8324794: C2 SuperWord: do not ignore reductions in SuperWord::unrolling_analysis In-Reply-To: References: <4Dv9KCdW7B9Ms3a-MO-59PL-3qebNhNJejuuM6LCB0w=.7575de49-9d14-4c12-b95d-8aca28b07a05@github.com> <-f9_1s2hXm20N3DABo4Hq1dyKzAxRHq3kJ7I_D-PONo=.5566f4f5-e975-4bee-b274-c7a136672c1b@github.com> Message-ID: On Mon, 29 Jan 2024 17:41:28 GMT, Vladimir Kozlov wrote: >> @vnkozlov Yes, exactly, the call to `is_marked_reduction`. Other than that, unrolling_analysis could be static, and does not need any information from SuperWord. I'd like to split off unrolling_analysis, and so I'll have to remove the call to `is_marked_reduction`. >> >> It seems like this was in from the beginning, when Michael Berg added the unrolling_analysis with https://github.com/openjdk/jdk/commit/7c7b91845f94d13b8fed7911be7f933cf0df28d4 >> >> I can see no reason stated in the RFE or the code itself. >> >> I can only speculate: maybe the idea was that reductions are not profitable, unless there are other nodes, like stores and loads. So if we only find reductions, then we would not adjust the unrolling, since we are not expecting vectorization anyway. Again: only speculation. You reviewed the code in 2015, maybe you still remember the reason ;) >> >> FYI: only reductions may in the not too distant future become vectorizable in a profitable way, so I think removing this is good anyway. > >> I can only speculate: maybe the idea was that reductions are not profitable, unless there are other nodes, like stores and loads.
So if we only find reductions, then we would not adjust the unrolling, since we are not expecting vectorization anyway. Again: only speculation. You reviewed the code in 2015, maybe you still remember the reason ;) > > I looked and there was no discussion about that during review. Originally it was not SuperWord analysis - it only looked for arithmetic Phi nodes in the loop. Last year we changed it: [1be80a44](https://github.com/openjdk/jdk/commit/1be80a4445cf74adc9b2cd5bf262a897f9ede74f) > I think the check simplifies the `unrolling_analysis` code, since we skip nodes we already know about. @vnkozlov this change here is in preparation for this next RFE: https://github.com/openjdk/jdk/pull/17624 ------------- PR Comment: https://git.openjdk.org/jdk/pull/17604#issuecomment-1916711979 From epeter at openjdk.org Tue Jan 30 12:18:49 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 30 Jan 2024 12:18:49 GMT Subject: RFR: 8324890: C2 SuperWord: refactor out VLoop, make unrolling_analysis static, remove init/reset mechanism Message-ID: Subtask of https://github.com/openjdk/jdk/pull/16620 1. Move out the shared code between `SuperWord::SLP_extract` (where we do vectorization) and `SuperWord::unrolling_analysis`, and move it to a new class `VLoop`. This allows us to decouple `unrolling_analysis` from the SuperWord object, and we can make it static. 2. So far, SuperWord was reused for all loops in a compilation, and then "reset" (with `SuperWord::init`) for every loop. This is a bit of a nasty pattern. I now make a new `VLoop` and a new `SuperWord` object per loop. 3. Since we now make more `SuperWord` objects, we allocate the internal data structures more often. Therefore, I now pre-allocate/reserve sufficient space on initialization. Side-note about https://github.com/openjdk/jdk/pull/17604: I would like to remove the use of `SuperWord::is_marked_reduction` from `SuperWord::unrolling_analysis`. For starters: it is not clear what it was ever good for.
Second: it requires us to do reduction marking/analysis before `unrolling_analysis`, and hence makes the reduction marking shared between `unrolling_analysis` and vectorization. I could move the reduction marking to `VLoop` now. But the `_loop_reductions` set would have to be put on an arena, and I would like to avoid creating an arena for the `unrolling_analysis`. Plus, it would just be nicer code to have reduction analysis together with body analysis, type analysis, etc., all of them only in `SLP_extract`. ------------- Commit messages: - _vtrace is moved to VLoop - comment update - cosmetics - rename in preconditions - remove loop_transform_helper - fix small bug - preallocate memory - more refactoring - moved mark_reductions - rm init, and revert some other stuff - ... and 1 more: https://git.openjdk.org/jdk/compare/fe0eec7e...06d83797 Changes: https://git.openjdk.org/jdk/pull/17624/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=17624&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8324890 Stats: 625 lines in 9 files changed: 255 ins; 199 del; 171 mod Patch: https://git.openjdk.org/jdk/pull/17624.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17624/head:pull/17624 PR: https://git.openjdk.org/jdk/pull/17624 From epeter at openjdk.org Tue Jan 30 12:18:51 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 30 Jan 2024 12:18:51 GMT Subject: RFR: 8324890: C2 SuperWord: refactor out VLoop, make unrolling_analysis static, remove init/reset mechanism In-Reply-To: References: Message-ID: On Tue, 30 Jan 2024 06:14:45 GMT, Emanuel Peter wrote: > Subtask of https://github.com/openjdk/jdk/pull/16620 > > 1. Move out the shared code between `SuperWord::SLP_extract` (where we do vectorization) and `SuperWord::unrolling_analysis`, and move it to a new class `VLoop`. This allows us to decouple `unrolling_analysis` from the SuperWord object, and we can make it static. > 2.
So far, SuperWord was reused for all loops in a compilation, and then "reset" (with `SuperWord::init`) for every loop. This is a bit of a nasty pattern. I now make a new `VLoop` and a new `SuperWord` object per loop. > 3. Since we now make more `SuperWord` objects, we allocate the internal data structures more often. Therefore, I now pre-allocate/reserve sufficient space on initialization. > > Side-note about https://github.com/openjdk/jdk/pull/17604: > I would like to remove the use of `SuperWord::is_marked_reduction` from `SuperWord::unrolling_analysis`. For starters: it is not clear what it was ever good for. Second: it requires us to do reduction marking/analysis before `unrolling_analysis`, and hence makes the reduction marking shared between `unrolling_analysis` and vectorization. I could move the reduction marking to `VLoop` now. But the `_loop_reductions` set would have to be put on an arena, and I would like to avoid creating an arena for the `unrolling_analysis`. Plus, it would just be nicer code to have reduction analysis together with body analysis, type analysis, etc., all of them only in `SLP_extract`. src/hotspot/share/opto/loopTransform.cpp line 1106: > 1104: VLoop vloop(this, true); > 1105: if (vloop.check_preconditions()) { > 1106: SuperWord::unrolling_analysis(vloop, _local_loop_unroll_factor); Note: I made `unrolling_analysis` static, and only pass in `vloop`, not all the info in the `SuperWord` object. src/hotspot/share/opto/loopnode.cpp line 4867: > 4865: // Auto-vectorize main-loop > 4866: if (C->do_superword() && C->has_loops() && !C->major_progress()) { > 4867: ResourceArea autovectorization_arena; Note: this allows us to free up all the space used by `SuperWord`'s internal data structures between every processed loop. Previously, all internal data structures were on the `phase->C->comp_arena()`.
src/hotspot/share/opto/loopnode.cpp line 5988: > 5986: _pre_loop_end = pre_loop_end; > 5987: } > 5988: Note: This should have never been cached in the node itself, but only during autovectorization. I moved it now into `VLoop`, which I also pass into `VPointer`, which has to access the pre-loop for independence checks. src/hotspot/share/opto/loopnode.hpp line 235: > 233: > 234: // Cached CountedLoopEndNode of pre loop for main loops > 235: CountedLoopEndNode* _pre_loop_end; Note: this makes the node smaller, and does not cache something that may be invalid later. It was used only during SuperWord. Looks like a bad pattern. src/hotspot/share/opto/loopopts.cpp line 4231: > 4229: } > 4230: > 4231: // This counted main-loop either failed preconditions, the analyzer Suggestion: // This counted main-loop either failed preconditions, src/hotspot/share/opto/superword.cpp line 59: > 57: _do_vector_loop(phase()->C->do_vector_loop()), // whether to do vectorization/simd style > 58: _num_work_vecs(0), // amount of vector work we have > 59: _num_reductions(0) // amount of reduction work we have Note: Before this change, we used to only create SuperWord once, and use it for all loops in the compilation. Now that I ripped out the "init" method, and avoid reusing SuperWord this way, we want to make sure we do not re-allocate too much. For some data structures I now pre-allocate memory for the maximum size they may ever reach. This is to avoid re-allocation when they grow. src/hotspot/share/opto/superword.cpp line 67: > 65: CountedLoopNode* cl = vloop.cl(); > 66: Node* cl_exit = vloop.cl_exit(); > 67: PhaseIdealLoop* phase = vloop.phase(); Note: Made it static, and instead of the SuperWord object, we now only have access to the VLoop object. 
src/hotspot/share/opto/superword.cpp line 75: > 73: > 74: //------------------------------transform_loop--------------------------- > 75: bool SuperWord::transform_loop(IdealLoopTree* lpt, bool do_optimization) { Note: code moved to `VLoop::check_preconditions_helper` src/hotspot/share/opto/superword.cpp line 98: > 96: if (SuperWordReductions) { > 97: mark_reductions(); > 98: } Note: I now would like to move reduction marking to **after** precondition checking. Hence, I moved it to `SLP_extract`. src/hotspot/share/opto/superword.cpp line 149: > 147: } > 148: > 149: init(); // initialize data structures Note: this is the end of the "preconditions", and we used to set `_early_exit = false` inside `init()` src/hotspot/share/opto/superword.cpp line 180: > 178: Node* n = lpt()->_body.at(i); > 179: if (n == cl->incr() || > 180: is_marked_reduction(n) || This is the annoying reduction, which would require reduction marking for `unrolling_analysis`. I'm hoping to remove it here: https://github.com/openjdk/jdk/pull/17604 src/hotspot/share/opto/superword.cpp line 414: > 412: } > 413: > 414: const char* SuperWord::transform_loop_helper() { TODO remove the helper, we can do that later src/hotspot/share/opto/superword.cpp line 469: > 467: if (SuperWordReductions) { > 468: mark_reductions(); > 469: } Note: it would be really nice to move reduction marking to all other analysis, like bb construction and velt computation, etc. src/hotspot/share/opto/superword.cpp line 2323: > 2321: // modified. We bail out, and retry without SuperWord. > 2322: bool SuperWord::output() { > 2323: assert(!_packset.is_empty(), "packset must not be empty"); revert this! src/hotspot/share/opto/superword.cpp line 3786: > 3784: _align_to_ref = nullptr; > 3785: _race_possible = 0; > 3786: _early_return = false; remove init! 
src/hotspot/share/opto/superword.hpp line 325: > 323: CountedLoopNode* _lp; // Current CountedLoopNode > 324: VectorSet _loop_reductions; // Reduction nodes in the current loop > 325: Node* _bb; // Current basic block Note: always the same as `cl` src/hotspot/share/opto/vectorization.cpp line 57: > 55: } > 56: > 57: const char* VLoop::check_preconditions_helper() { Note: replaces most code from the old `SuperWord::transform_loop` src/hotspot/share/opto/vectorization.hpp line 101: > 99: assert(head != nullptr, "must find head"); > 100: return head; > 101: }; Note: before this patch, these two cache-accessors were in the `CounterLoopEndNode`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17624#discussion_r1471075782 PR Review Comment: https://git.openjdk.org/jdk/pull/17624#discussion_r1471079385 PR Review Comment: https://git.openjdk.org/jdk/pull/17624#discussion_r1470844038 PR Review Comment: https://git.openjdk.org/jdk/pull/17624#discussion_r1470848311 PR Review Comment: https://git.openjdk.org/jdk/pull/17624#discussion_r1471081941 PR Review Comment: https://git.openjdk.org/jdk/pull/17624#discussion_r1470881385 PR Review Comment: https://git.openjdk.org/jdk/pull/17624#discussion_r1470949551 PR Review Comment: https://git.openjdk.org/jdk/pull/17624#discussion_r1470795774 PR Review Comment: https://git.openjdk.org/jdk/pull/17624#discussion_r1470798480 PR Review Comment: https://git.openjdk.org/jdk/pull/17624#discussion_r1470796778 PR Review Comment: https://git.openjdk.org/jdk/pull/17624#discussion_r1470644463 PR Review Comment: https://git.openjdk.org/jdk/pull/17624#discussion_r1470805244 PR Review Comment: https://git.openjdk.org/jdk/pull/17624#discussion_r1470968130 PR Review Comment: https://git.openjdk.org/jdk/pull/17624#discussion_r1470644015 PR Review Comment: https://git.openjdk.org/jdk/pull/17624#discussion_r1470643573 PR Review Comment: https://git.openjdk.org/jdk/pull/17624#discussion_r1471010024 PR Review Comment: 
https://git.openjdk.org/jdk/pull/17624#discussion_r1470795387
PR Review Comment: https://git.openjdk.org/jdk/pull/17624#discussion_r1471072996

From epeter at openjdk.org Tue Jan 30 12:18:51 2024
From: epeter at openjdk.org (Emanuel Peter)
Date: Tue, 30 Jan 2024 12:18:51 GMT
Subject: RFR: 8324890: C2 SuperWord: refactor out VLoop, make unrolling_analysis static, remove init/reset mechanism
In-Reply-To:
References:
Message-ID:

On Tue, 30 Jan 2024 08:55:24 GMT, Emanuel Peter wrote:

>> Subtask of https://github.com/openjdk/jdk/pull/16620
>>
>> 1. Move out the shared code between `SuperWord::SLP_extract` (where we do vectorization) and `SuperWord::unrolling_analysis`, and move it to a new class `VLoop`. This allows us to decouple `unrolling_analysis` from the SuperWord object, and we can make it static.
>> 2. So far, SuperWord was reused for all loops in a compilation, and then "reset" (with `SuperWord::init`) for every loop. This is a bit of a nasty pattern. I now make a new `VLoop` and a new `SuperWord` object per loop.
>> 3. Since we now make more `SuperWord` objects, we allocate the internal data structures more often. Therefore, I now pre-allocate/reserve sufficient space on initialization.
>>
>> Side-note about https://github.com/openjdk/jdk/pull/17604:
>> I would like to remove the use of `SuperWord::is_marked_reduction` from `SuperWord::unrolling_analysis`. For starters: it is not clear what it was ever good for. Second: it requires us to do reduction marking/analysis before `unrolling_analysis`, and hence makes the reduction marking shared between `unrolling_analysis` and vectorization. I could move the reduction marking to `VLoop` now. But the `_loop_reductions` set would have to be put on an arena, and I would like to avoid creating an arena for the `unrolling_analysis`. Plus, it would just be nicer code to have reduction analysis together with body analysis, type analysis, etc., and all of them only in `SLP_extract`.
>
> src/hotspot/share/opto/superword.cpp line 75:
>
>> 73:
>> 74: //------------------------------transform_loop---------------------------
>> 75: bool SuperWord::transform_loop(IdealLoopTree* lpt, bool do_optimization) {
>
> Note: code moved to `VLoop::check_preconditions_helper`

Note: all the `do_optimization` parts are not part of preconditions, and hence they are kept in the new `transform_loop`

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/17624#discussion_r1470803554

From epeter at openjdk.org Tue Jan 30 12:59:45 2024
From: epeter at openjdk.org (Emanuel Peter)
Date: Tue, 30 Jan 2024 12:59:45 GMT
Subject: RFR: 8320649: C2: Optimize scoped values [v4]
In-Reply-To:
References:
Message-ID:

On Tue, 30 Jan 2024 12:13:47 GMT, Roland Westrelin wrote:

>> src/hotspot/share/opto/loopnode.cpp line 4717:
>>
>>> 4715: assert(!_igvn.delay_transform(), "");
>>> 4716: _igvn.set_delay_transform(true);
>>> 4717: for (uint i = _scoped_value_get_nodes.size(); i > 0; i--) {
>>
>> Suggestion:
>>
>> for (uint i = _scoped_value_get_nodes.size()-1; i >= 0; i--) {
>
> An unsigned value is always `>= 0`. Wouldn't your suggestion turn the loop into an infinite loop?

And does it need to be a `uint`? It seems you are really using `i-1`, and never `i` directly. It would be nicer to name the thing you are actually using.
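
The unsigned-counter point in the exchange above can be sketched in a few lines of standalone C++. This is only an illustration, not code from the PR: the helper names are hypothetical, and a plain `unsigned` stands in for HotSpot's `uint`.

```cpp
#include <cassert>
#include <vector>

// The loop shape from the patch: the counter runs from size down to 1 and the
// body uses i - 1, so the comparison `i > 0` terminates for an unsigned counter.
std::vector<unsigned> visit_reverse_original(unsigned size) {
  std::vector<unsigned> order;
  for (unsigned i = size; i > 0; i--) {
    order.push_back(i - 1);
  }
  return order;
}

// One possible way to "name the thing you are actually using": test the old
// value and decrement in the condition, so the body sees the index itself.
std::vector<unsigned> visit_reverse_named(unsigned size) {
  std::vector<unsigned> order;
  for (unsigned idx = size; idx-- > 0; ) {
    order.push_back(idx);
  }
  return order;
}

// Why `i >= 0` cannot work as an exit test: decrementing an unsigned zero
// wraps around to the largest value instead of becoming negative.
unsigned decrement_zero() {
  unsigned i = 0;
  i--;
  return i;
}
```

Both loops visit the indices size-1 down to 0 and do nothing for an empty list. The same wrap-around also affects the start value `size() - 1` of the suggested form when the list is empty.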
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1471162405

From rkennke at openjdk.org Tue Jan 30 13:28:43 2024
From: rkennke at openjdk.org (Roman Kennke)
Date: Tue, 30 Jan 2024 13:28:43 GMT
Subject: Integrated: 8324734: Relax too-strict assert(VM_Version::supports_evex()) in Assembler::locate_operand()
In-Reply-To: <776ui1DnJfcgz89q2aPAeOyhOYKalPhAE1bMR28wK6I=.19728df4-3710-4503-a726-8ce217501f67@github.com>
References: <776ui1DnJfcgz89q2aPAeOyhOYKalPhAE1bMR28wK6I=.19728df4-3710-4503-a726-8ce217501f67@github.com>
Message-ID: <1voFOLDYsJ3F1_nKKgfArFCcN5eqlrtOupJ9RUtqNHA=.55fe1687-4b05-4d8b-891d-d942267beafa@github.com>

On Fri, 26 Jan 2024 15:18:39 GMT, Roman Kennke wrote:

> For details, see the bug report. The gist is that HotSpot downgrades to UseAVX=2 on some processors, and reports supports_evex() == false, but the instruction decoder can still encounter EVEX instructions when (e.g.) hitting a SIGBUS in memset() - which does have EVEX instructions.
>
> Testing:
> - [x] runtime/Unsafe/InternalErrorTest.java
> - [x] tier1

This pull request has now been integrated.

Changeset: f0024f58
Author: Roman Kennke
URL: https://git.openjdk.org/jdk/commit/f0024f585dcc1d8afe5808bf626efd8f514da070
Stats: 14 lines in 5 files changed: 10 ins; 0 del; 4 mod

8324734: Relax too-strict assert(VM_Version::supports_evex()) in Assembler::locate_operand()

Co-authored-by: Vladimir Kozlov
Reviewed-by: kvn, shade

-------------

PR: https://git.openjdk.org/jdk/pull/17590

From roland at openjdk.org Tue Jan 30 16:16:58 2024
From: roland at openjdk.org (Roland Westrelin)
Date: Tue, 30 Jan 2024 16:16:58 GMT
Subject: RFR: 8275202: C2: optimize out more redundant conditions
In-Reply-To:
References: <978cgwy3Nb_x7yU6jZz0f6zhTBZfphstisAkBf1Vktc=.283d06eb-4f79-40cf-b8dd-a9c230e59902@github.com>
Message-ID:

On Mon, 29 Jan 2024 22:36:27 GMT, Joshua Cao wrote:

> It won't be as powerful as the fixed point optimization that is proposed in this PR, but it is sufficient to cover the case mentioned in the JBS issue and would have less compile time overhead.

I think it would do a lot less, so in my opinion it's worth pursuing the fixed point optimization. The main reason it has such an overhead in the current patch is that it's run too often (every time there's a loop opts pass). There are corner cases that were hard to handle otherwise (and cause the compiler to crash). I've been working on a new version of the patch where the pass is run only once or twice (and found ways to handle the corner cases that were problematic). It's not ready yet.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/14586#issuecomment-1917133085

From roland at openjdk.org Tue Jan 30 16:17:00 2024
From: roland at openjdk.org (Roland Westrelin)
Date: Tue, 30 Jan 2024 16:17:00 GMT
Subject: RFR: 8320649: C2: Optimize scoped values [v4]
In-Reply-To:
References:
Message-ID:

On Thu, 18 Jan 2024 10:40:46 GMT, Emanuel Peter wrote:

>> Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision:
>>
>> merge fix
>
> src/hotspot/share/opto/loopnode.cpp line 4767:
>
>> 4765: Node* second_index = get_from_cache->index2();
>> 4766:
>> 4767: if (first_index == C->top() && second_index == C->top()) {
>
> could this not be done during igvn?

What happens here is that the cache was always seen to be null so no code to probe the cache was added to the IR. When that happens, the optimizations still apply (i.e. there could be a dominated `ScopedValue.get()` that can be replaced by this one or a dominating one that can replace this one). This should also be very uncommon. So the nodes that are put in place to enable optimizations should be left in until expansion to not miss optimization opportunities.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/16966#discussion_r1471360274

From epeter at openjdk.org Tue Jan 30 16:17:41 2024
From: epeter at openjdk.org (Emanuel Peter)
Date: Tue, 30 Jan 2024 16:17:41 GMT
Subject: RFR: 8318446: C2: optimize stores into primitive arrays by combining values into larger store [v7]
In-Reply-To:
References:
Message-ID:

> This is a feature requested by @RogerRiggs and @cl4es.
>
> **Idea**
>
> Merging multiple consecutive small stores (e.g. 8 byte stores) into larger stores (e.g. one long store) can lead to speedup. Recently, @cl4es and @RogerRiggs had to review a few PRs where people would try to get speedups by using Unsafe (e.g. `Unsafe.putLongUnaligned`), or ByteArrayLittleEndian (e.g. `ByteArrayLittleEndian.setLong`).
> They have asked if we can do such an optimization in C2, rather than in the Java library code, or even user code.
>
> This patch here supports a few simple use-cases, like these:
>
> Merge consecutive array stores, with constants. We can combine the separate constants into a larger constant:
> https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/test/hotspot/jtreg/compiler/c2/TestMergeStores.java#L383-L395
>
> Merge consecutive array stores, with a variable that was split (using shifts). We can essentially undo the splitting (i.e. shifting and truncation), and directly store the variable:
> https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/test/hotspot/jtreg/compiler/c2/TestMergeStores.java#L444-L456
>
> The idea is that this would allow the introduction of a very simple API, without any "heavy" dependencies (Unsafe or ByteArrayLittleEndian):
>
> https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/test/hotspot/jtreg/compiler/c2/TestMergeStores.java#L327-L338
> https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/test/hotspot/jtreg/compiler/c2/TestMergeStores.java#L467-L472
>
> **Details**
>
> This draft currently implements the optimization in an additional special IGVN phase:
> https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/src/hotspot/share/opto/compile.cpp#L2479-L2485
>
> We first collect all `StoreB|C|I`, and put them in the IGVN worklist (see `Compile::gather_nodes_for_merge_stores`). During IGVN, we call `StoreNode::Ideal_merge_stores` at the end (after all other optimizations) of `StoreNode::Ideal`. We essentially try to establish a chain of mergeable stores:
>
> https://github.com/openjdk/jdk/blob/adca9e220822884d95d73c7f070adeee2632130d/src/hotspot/share/opto/memnode.cpp#L2802-L2806
>
> Mergeable stores must have the same Opcode (implies they have the same element type and hence size).
> Further, mergeable stores must have the same control (or be separated by only a RangeCheck). Further, they must either both store constants, or adjacent segments of a larger value ...

Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision:

  add UseUnalignedAccesses for shipilev

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/16245/files
  - new: https://git.openjdk.org/jdk/pull/16245/files/8822fb6c..2a660e01

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=16245&range=06
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16245&range=05-06

  Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/16245.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/16245/head:pull/16245

PR: https://git.openjdk.org/jdk/pull/16245

From epeter at openjdk.org Tue Jan 30 16:17:44 2024
From: epeter at openjdk.org (Emanuel Peter)
Date: Tue, 30 Jan 2024 16:17:44 GMT
Subject: RFR: 8318446: C2: optimize stores into primitive arrays by combining values into larger store [v6]
In-Reply-To:
References:
Message-ID:

On Mon, 29 Jan 2024 14:21:01 GMT, Aleksey Shipilev wrote:

>> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision:
>>
>> made trace flag develop
>
> I only skimmed through the code, so maybe it is already handled.
>
> Let me ask for clarity anyway: does it combine the series of naturally aligned stores into the bulk -- possibly misaligned! -- store? Because it would have performance implications, and depending on how hardware treats the misaligned stores, the correctness problem too. E.g., if we allow transforming `storeB(&(A+1), V1); storeB(&(A+2), V2);` -> `storeC(&(A+1), combine(V1, V2))`, then `storeC` might not be aligned. The platforms are allowed to throw the VM under `SIGBUS` when that store is executed.
> `Unsafe.putXUnaligned` was done to avoid this trouble, which only intrinsifies when `UseUnalignedAccesses` is `true`, and maybe has more safeguards that I don't remember off-hand.
>
> Note that C2 already does some of the similar store tiling in `InitializeNode::coalesce_subword_stores` for initializing stores -- should these two coalescing phases share some code?

@shipilev You are right, I need to guard the optimization with `UseUnalignedAccesses`. Just added it. Thank you! Probably my tests would have run into the `SIGBUS` you mentioned.

About `InitializeNode::coalesce_subword_stores`: It only works on raw-stores, which write fields before the initialization of an object. It only works with constants. Hence, the pattern is quite different. Merging the two would be a lot of work. Too much for me for now. But maybe one day we can cover all these cases in a single optimization that merges/coalesces all sorts of loads and stores, and essentially vectorizes any straight-line code, at least for loads and stores.

For now, I just wanted to add the feature that @cl4es and @RogerRiggs were specifically asking for, which is merging array stores for constants and variables (using shift to split).

@rwestrel Ok. Well in that case I might have to make a more intelligent pointer-analysis, and parse past `ConvI2L` and `CastII` nodes.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/16245#issuecomment-1917040449

From epeter at openjdk.org Tue Jan 30 16:17:46 2024
From: epeter at openjdk.org (Emanuel Peter)
Date: Tue, 30 Jan 2024 16:17:46 GMT
Subject: RFR: 8318446: C2: optimize stores into primitive arrays by combining values into larger store [v6]
In-Reply-To:
References:
Message-ID:

On Tue, 30 Jan 2024 14:43:41 GMT, Roland Westrelin wrote:

>> Ok. Well in that case I might have to make a more intelligent pointer-analysis, and parse past ConvI2L and CastII nodes.
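
For readers following the 8318446 thread, here is a minimal standalone C++ sketch of the store-merging idea and of the alignment concern discussed in it. It is not the C2 implementation: all function names are hypothetical, the equivalence shown holds only on a little-endian host, and `memcpy` is used for the merged store precisely so that the sketch stays well-defined at a misaligned offset (the situation that `UseUnalignedAccesses` guards in the patch).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Eight separate byte stores of a value that was split with shifts,
// lowest byte first (little-endian layout).
void store_bytewise(std::uint8_t* a, std::size_t offset, std::uint64_t v) {
  for (int i = 0; i < 8; i++) {
    a[offset + i] = static_cast<std::uint8_t>(v >> (8 * i));
  }
}

// One merged 8-byte store. memcpy keeps this well-defined even when
// a + offset is not 8-byte aligned.
void store_merged(std::uint8_t* a, std::size_t offset, std::uint64_t v) {
  std::memcpy(a + offset, &v, sizeof(v));
}

// The merged store matches the byte-wise stores only on little-endian hosts.
bool host_is_little_endian() {
  const std::uint16_t probe = 1;
  std::uint8_t first_byte;
  std::memcpy(&first_byte, &probe, 1);
  return first_byte == 1;
}

// True if both variants leave the buffer in the same state.
bool merged_matches_bytewise(std::size_t offset, std::uint64_t v) {
  std::uint8_t a[32] = {0};
  std::uint8_t b[32] = {0};
  assert(offset + 8 <= sizeof(a));
  store_bytewise(a, offset, v);
  store_merged(b, offset, v);
  return std::memcmp(a, b, sizeof(a)) == 0;
}
```

Calling `merged_matches_bytewise(1, ...)` exercises the misaligned case from the `storeB`/`storeC` example quoted in the thread: the eight byte stores are each naturally aligned, while the merged store starts at an odd offset.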
>
> Do you still need a traversal of the graph to find the Stores or can you enqueue them for post loop opts then?

Maybe I can do it with the post-loops enqueue. But of course at that point the pointers are just about to be transformed, at least the way it works today. They may be in any state that has the ConvI2L and CastII etc nodes, or it may have some or none of them. But maybe all of that can be done.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/16245#issuecomment-1917052226

From roland at openjdk.org Tue Jan 30 16:17:45 2024
From: roland at openjdk.org (Roland Westrelin)
Date: Tue, 30 Jan 2024 16:17:45 GMT
Subject: RFR: 8318446: C2: optimize stores into primitive arrays by combining values into larger store [v6]
In-Reply-To:
References:
Message-ID:

On Tue, 30 Jan 2024 14:41:19 GMT, Emanuel Peter wrote:

> Ok. Well in that case I might have to make a more intelligent pointer-analysis, and parse past ConvI2L and CastII nodes.

Do you still need a traversal of the graph to find the Stores or can you enqueue them for post loop opts then?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/16245#issuecomment-1917045747

From ddong at openjdk.org Tue Jan 30 16:23:44 2024
From: ddong at openjdk.org (Denghui Dong)
Date: Tue, 30 Jan 2024 16:23:44 GMT
Subject: RFR: 8324974: JFR: EventCompilerPhase should be created as UNTIMED
Message-ID: <9-1NlOfWoZujMeO_WZOvQNDZETRelZrdWFu8m2RdAiM=.ee2704ec-7b09-4fcb-933a-8ec571243888@github.com>

Hi,

Please help review this fix.

CompilerEvent::PhaseEvent::post will set the _start_time of EventCompilerPhase, so EventCompilerPhase should be created as UNTIMED.

Thanks.

-------------

Commit messages:
 - 8324974: JFR: EventCompilerPhase should be created as UNTIMED

Changes: https://git.openjdk.org/jdk/pull/17632/files
 Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=17632&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8324974
  Stats: 3 lines in 2 files changed: 0 ins; 0 del; 3 mod
  Patch: https://git.openjdk.org/jdk/pull/17632.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/17632/head:pull/17632

PR: https://git.openjdk.org/jdk/pull/17632

From duke at openjdk.org Tue Jan 30 16:45:25 2024
From: duke at openjdk.org (Yuri Gaevsky)
Date: Tue, 30 Jan 2024 16:45:25 GMT
Subject: RFR: 8322174: RISC-V: C2 VectorizedHashCode RVV Version [v2]
In-Reply-To:
References: <_XgH4VuPRDJLnIXlkoNtfdDIvvujW-zf1o3UeYlCrn8=.cb21e8dd-6160-4245-a169-4db548776aec@github.com>
Message-ID: <-BozG265CkpO9U1kgyog_37ezPgkxUgGj_XFGgiMaWI=.d2203393-0fe7-42a4-ab47-43d39a8c5240@github.com>

On Tue, 30 Jan 2024 16:35:14 GMT, Yuri Gaevsky wrote:

>> Hi, I don't quite understand why there is a need to change LMUL from `m4` to `m2` if we are switching to use the stripmining approach. The tail calculation should normally share the code for `VEC_LOOP`, which also means we need to use some vector mask instructions to filter out the active elements for each loop iteration especially the iteration for handing the tail elements. And the vl returned by `vsetvli` tells us the number of elements which could be processed in parallel for one certain iteration ([1] is one example). I am not sure if you are trying this way. Do you have more details or code changes to share? Thanks.
>>
>> [1] https://github.com/riscv/riscv-v-spec/blob/master/v-spec.adoc#example-stripmine-sew
>
> I used m4->m2 change to process 8 elements in the tail with vector instructions after main vector loop. IIUC, the m4->m2 change in runtime is very costly, so I've created another patch with the same goal but **without** m4->m2 change:
>
> void C2_MacroAssembler::arrays_hashcode_v(Register ary, Register cnt, Register result,
>                                           Register tmp1, Register tmp2, Register tmp3,
>                                           Register tmp4, Register tmp5, Register tmp6,
>                                           BasicType eltype)
> {
>   ...
>   const int nof_vec_elems = MaxVectorSize;
>   const int hof_vec_elems = nof_vec_elems >> 1;
>   const int elsize_bytes = arrays_hashcode_elsize(eltype);
>   const int elsize_shift = exact_log2(elsize_bytes);
>   const int vec_step_bytes = nof_vec_elems << elsize_shift;
>   const int half_vec_step_bytes = vec_step_bytes >> 1;
>   const address adr_pows31 = StubRoutines::riscv::arrays_hashcode_powers_of_31()
>                            + sizeof(jint);
>
>   ...
>
>   const Register chunks = tmp1;
>   const Register chunks_end = chunks;
>   const Register pows31 = tmp2;
>   const Register powmax = tmp3;
>
>   const VectorRegister v_coeffs = v4;
>   const VectorRegister v_src = v8;
>   const VectorRegister v_sum = v12;
>   const VectorRegister v_powmax = v16;
>   const VectorRegister v_result = v20;
>   const VectorRegister v_tmp = v24;
>   const VectorRegister v_zred = v28;
>
>   Label DONE, TAIL, TAIL_LOOP, PRE_TAIL, SAVE_VRESULT, WIDE_TAIL, VEC_LOOP;
>
>   // result has a value initially
>
>   beqz(cnt, DONE);
>
>   andi(chunks, cnt, ~(hof_vec_elems-1));
>   beqz(chunks, TAIL);
>
>   // load pre-calculated powers of 31
>   la(pows31, ExternalAddress(adr_pows31));
>   mv(t1, nof_vec_elems);
>   vsetvli(t0, t1, Assembler::e32, Assembler::m4);
>   vle32_v(v_coeffs, pows31);
>   // clear vector registers used in intermediate calculations
>   vmv_v_i(v_sum, 0);
>   vmv_v_i(v_powmax, 0);
>   vmv_v_i(v_result, 0);
>   // set initial values
>   vmv_s_x(v_result, result);
>   vmv_s_x(v_zred, x0);
>
>   andi(chunks, cnt, ~(nof_vec_elems-1));
>   beqz(chunks, WIDE_TAIL);
>
>   subw(cnt, cnt, chunks);
>   slli(chunks_end, chunks, elsize_shift);
>   add(chunks_end, ary, chunks_end);
>   // get value of 31^^nof_vec_elems
>   lw(powmax, Address(pows31, -1 * sizeof(jint)));
>   vmv_s_x(v_powmax, powmax);
>
>   bind(VEC_LOOP);
>   // result = result * 31^^(hof_vec_elems) + v_src[0] * 31^^(hof_vec_elems-1)
>   //        + ... + v_src[hof_vec_elems-1] * 31^^(0)
>   vmul_vv(v_result, v_result, v...

Of course, any ideas for improving the code are very welcome.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/17413#discussion_r1471587439

From roland at openjdk.org Tue Jan 30 17:13:38 2024
From: roland at openjdk.org (Roland Westrelin)
Date: Tue, 30 Jan 2024 17:13:38 GMT
Subject: RFR: 8323274: C2: array load may float above range check
Message-ID:

This PR includes 5 test cases in which an array load floats above its range check and the resulting compiled code can be made to segfault by passing an out of bound index to the test method. Each test case takes advantage of a different transformation to make the array load happen too early:

For instance, with TestArrayAccessAboveRCAfterSplitIf:

    if (k == 2) {
        v = array1[i];
        array = array1;
        if (l == m) {
        }
    } else {
        v = array2[i];
        array = array2;
    }
    v += array[i]; // range check + array load

The range check is split through phi:

    if (k == 2) {
        v = array1[i];
        array = array1;
        if (l == m) {
        }
        // range check here
    } else {
        v = array2[i];
        array = array2;
        // range check here
    }
    v += array[i]; // array load

Then an identical dominating range check is found:

    if (k == 2) {
        v = array1[i]; // range check here
        array = array1;
        if (l == m) {
        }
    } else {
        v = array2[i]; // range check here
        array = array2;
    }
    v += array[i]; // array load

Then a branch dies:

    v = array1[i]; // range check here
    array = array1;
    if (l == m) {
    }
    v += array[i]; // array load

The array load is dependent on the `if (l == m) {` condition. An identical dominating condition is then found which causes the control dependent array load to float above the range check.

Something similar can be triggered with:

- TestArrayAccessAboveRCAfterPartialPeeling: sometimes, during partial peeling a load is assigned the loop head as control so something gets in between the range check and an array load and steps similar to the above can cause the array load to float above its range check.

- TestArrayAccessAboveRCAfterUnswitching: cloning a loop body adds regions on exits of the loop and nodes that only have uses out of the loop can end up control dependent on one of the regions. In the test case, unswitching is what causes the cloning to happen. Again, similar steps as above make the array load float above its range check. I suppose similar bugs could be triggered with other loop transformations that rely on loop body cloning.

TestArrayAccessAboveRCAfterSinking is a bit different in that it can change the control of an array load to be the projection of some arbitrary test. That test can then be replaced by a dominating one, causing the array load to float.

Finally, in TestArrayAccessAboveRCForArrayCopyLoad, an array copy is converted to a series of loads/stores that's guarded by a test for `srcPos < dstPos`. A dominating identical test exists so an array load floats above the runtime checks that guarantee the arraycopy is legal.

In all cases, the fix I propose is similar to 8319793: mark the array access nodes pinned when the transformation happens. This might be overly conservative in some cases. I intend to address some of that with: 8324976 (C2: allow array loads known to be within bounds to float) which would set a load's control to null in the cases when it is known to be within bounds.

I've also been working on a verification pass to catch these issues. I intend to propose it later.

-------------

Commit messages:
 - whitespaces
 - tests & fixes

Changes: https://git.openjdk.org/jdk/pull/17635/files
 Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=17635&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8323274
  Stats: 663 lines in 9 files changed: 659 ins; 0 del; 4 mod
  Patch: https://git.openjdk.org/jdk/pull/17635.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/17635/head:pull/17635

PR: https://git.openjdk.org/jdk/pull/17635

From duke at openjdk.org Tue Jan 30 16:37:49 2024
From: duke at openjdk.org (Yuri Gaevsky)
Date: Tue, 30 Jan 2024 16:37:49 GMT
Subject: RFR: 8322174: RISC-V: C2 VectorizedHashCode RVV Version [v2]
In-Reply-To:
References: <_XgH4VuPRDJLnIXlkoNtfdDIvvujW-zf1o3UeYlCrn8=.cb21e8dd-6160-4245-a169-4db548776aec@github.com>
Message-ID:

On Fri, 26 Jan 2024 12:48:23 GMT, Fei Yang wrote:

>> Thank you for your comments, @RealFYang. I have tried to use vector instructions (m4 ==> m2) for the tail calculations but that makes the performance numbers only worse. :-(
>>
>> I've made additional measurements with more granularity:
>>
>>                                         [ -XX:-UseRVV ]   [ -XX:+UseRVV ]
>> ArraysHashCode.multiints  10  avgt  30  12.460 ± 0.155    13.836 ± 0.054  ns/op
>> ArraysHashCode.multiints  11  avgt  30  14.541 ± 0.140    14.613 ± 0.084  ns/op
>> ArraysHashCode.multiints  12  avgt  30  15.097 ± 0.052    15.517 ± 0.097  ns/op
>> ArraysHashCode.multiints  13  avgt  30  13.632 ± 0.137    14.486 ± 0.181  ns/op
>> ArraysHashCode.multiints  14  avgt  30  15.771 ± 0.108    16.153 ± 0.092  ns/op
>> ArraysHashCode.multiints  15  avgt  30  14.726 ± 0.088    15.930 ± 0.077  ns/op
>> ArraysHashCode.multiints  16  avgt  30  15.533 ± 0.067    15.496 ± 0.083  ns/op
>> ArraysHashCode.multiints  17  avgt  30  15.875 ± 0.173    16.878 ± 0.172  ns/op
>> ArraysHashCode.multiints  18  avgt  30  15.740 ± 0.114    16.465 ± 0.089  ns/op
>> ArraysHashCode.multiints  19  avgt  30  17.252 ± 0.051    17.628 ± 0.155  ns/op
>> ArraysHashCode.multiints  20  avgt  30  20.193 ± 0.282    19.039 ± 0.441  ns/op
>> ArraysHashCode.multiints  25  avgt  30  20.209 ± 0.070    20.513 ± 0.071  ns/op
>> ArraysHashCode.multiints  30  avgt  30  23.157 ± 0.068    23.290 ± 0.165  ns/op
>> ArraysHashCode.multiints  35  avgt  30  28.671 ± 0.116    26.198 ± 0.127  ns/op <---
>> ArraysHashCode.multiints  40  avgt  30  30.992 ± 0.068    27.342 ± 0.072  ns/op
>> ArraysHashCode.multiints  45  avgt  30  39.408 ± 1.428    32.170 ± 0.230  ns/op
>> ArraysHashCode.multiints  50  avgt  30  41.976 ± 0.442    33.103 ± 0.090  ns/op
>> ArraysHashCode.multiints  55  avgt  30  45.379 ± 0.236    35.899 ± 0.692  ns/op
>> ArraysHashCode.multiints  60  avgt  30  48.615 ± 0.249    35.709 ± 0.477  ns/op
>> ArraysHashCode.multiints  65  avgt  30  51.455 ± 0.213    38.275 ± 0.266  ns/op
>> ArraysHashCode.multiints  70  avgt  30  54.032 ± 0.324    37.985 ± 0.264  ns/op
>> ArraysHashCode.multiints  75  avgt  30  56.759 ± 0.164    39.446 ± 0.425  ns/op
>> ArraysHashCode.multiints  80  avgt  30  61.334 ± 0.267    41.521 ± 0.310  ns/op
>> ArraysHashCode.multiints  85  avgt  30  66.177 ± 0.299    44.136 ± 0.407  ns/op
>> ArraysHashCode.multiints  90  avgt  30  67.444 ± 0.282    42.909 ± 0.275  ns/op
>> ArraysHashCode.multiints  95  avgt  30  77....
>
> Hi, I don't quite understand why there is a need to change LMUL from `m4` to `m2` if we are switching to use the stripmining approach. The tail calculation should normally share the code for `VEC_LOOP`, which also means we need to use some vector mask instructions to filter out the active elements for each loop iteration especially the iteration for handing the tail elements. And the vl returned by `vsetvli` tells us the number of elements which could be processed in parallel for one certain iteration ([1] is one example). I am not sure if you are trying this way. Do you have more details or code changes to share? Thanks.
>
> [1] https://github.com/riscv/riscv-v-spec/blob/master/v-spec.adoc#example-stripmine-sew

I used m4->m2 change to process 8 elements in the tail with vector instructions after main vector loop.
IIUC, the m4->m2 change in runtime is very costly, so I've created another patch with the same goal but **without** m4->m2 change:

    void C2_MacroAssembler::arrays_hashcode_v(Register ary, Register cnt, Register result,
                                              Register tmp1, Register tmp2, Register tmp3,
                                              Register tmp4, Register tmp5, Register tmp6,
                                              BasicType eltype)
    {
      ...
      const int nof_vec_elems = MaxVectorSize;
      const int hof_vec_elems = nof_vec_elems >> 1;
      const int elsize_bytes = arrays_hashcode_elsize(eltype);
      const int elsize_shift = exact_log2(elsize_bytes);
      const int vec_step_bytes = nof_vec_elems << elsize_shift;
      const int half_vec_step_bytes = vec_step_bytes >> 1;
      const address adr_pows31 = StubRoutines::riscv::arrays_hashcode_powers_of_31()
                               + sizeof(jint);

      ...

      const Register chunks = tmp1;
      const Register chunks_end = chunks;
      const Register pows31 = tmp2;
      const Register powmax = tmp3;

      const VectorRegister v_coeffs = v4;
      const VectorRegister v_src = v8;
      const VectorRegister v_sum = v12;
      const VectorRegister v_powmax = v16;
      const VectorRegister v_result = v20;
      const VectorRegister v_tmp = v24;
      const VectorRegister v_zred = v28;

      Label DONE, TAIL, TAIL_LOOP, PRE_TAIL, SAVE_VRESULT, WIDE_TAIL, VEC_LOOP;

      // result has a value initially

      beqz(cnt, DONE);

      andi(chunks, cnt, ~(hof_vec_elems-1));
      beqz(chunks, TAIL);

      // load pre-calculated powers of 31
      la(pows31, ExternalAddress(adr_pows31));
      mv(t1, nof_vec_elems);
      vsetvli(t0, t1, Assembler::e32, Assembler::m4);
      vle32_v(v_coeffs, pows31);
      // clear vector registers used in intermediate calculations
      vmv_v_i(v_sum, 0);
      vmv_v_i(v_powmax, 0);
      vmv_v_i(v_result, 0);
      // set initial values
      vmv_s_x(v_result, result);
      vmv_s_x(v_zred, x0);

      andi(chunks, cnt, ~(nof_vec_elems-1));
      beqz(chunks, WIDE_TAIL);

      subw(cnt, cnt, chunks);
      slli(chunks_end, chunks, elsize_shift);
      add(chunks_end, ary, chunks_end);
      // get value of 31^^nof_vec_elems
      lw(powmax, Address(pows31, -1 * sizeof(jint)));
      vmv_s_x(v_powmax, powmax);

      bind(VEC_LOOP);
      // result = result * 31^^(hof_vec_elems) + v_src[0] * 31^^(hof_vec_elems-1)
      //        + ... + v_src[hof_vec_elems-1] * 31^^(0)
      vmul_vv(v_result, v_result, v_powmax);
      arrays_hashcode_vec_elload(v_src, v_tmp, ary, eltype);
      vmul_vv(v_src, v_src, v_coeffs);
      vredsum_vs(v_sum, v_src, v_zred);
      vadd_vv(v_result, v_result, v_sum);
      addi(ary, ary, vec_step_bytes); // bump array pointer
      bne(ary, chunks_end, VEC_LOOP); // reached the end of chunks?
      beqz(cnt, SAVE_VRESULT);

      bind(WIDE_TAIL);
      andi(chunks, cnt, ~(hof_vec_elems-1));
      beqz(chunks, PRE_TAIL);

      mv(t1, hof_vec_elems);
      subw(cnt, cnt, t1);
      vslidedown_vx(v_coeffs, v_coeffs, t1);
      // get value of 31^^hof_vec_elems
      lw(powmax, Address(pows31, sizeof(jint)*(hof_vec_elems - 1)));
      vmv_s_x(v_powmax, powmax);
      vsetvli(t0, t1, Assembler::e32, Assembler::m4);
      // result = result * 31^^(hof_vec_elems) + v_src[0] * 31^^(hof_vec_elems-1)
      //        + ... + v_src[hof_vec_elems-1] * 31^^(0)
      vmul_vv(v_result, v_result, v_powmax);
      arrays_hashcode_vec_elload(v_src, v_tmp, ary, eltype);
      vmul_vv(v_src, v_src, v_coeffs);
      vredsum_vs(v_sum, v_src, v_zred);
      vadd_vv(v_result, v_result, v_sum);
      beqz(cnt, SAVE_VRESULT);
      addi(ary, ary, half_vec_step_bytes); // bump array pointer

      bind(PRE_TAIL);
      vmv_x_s(result, v_result);

      bind(TAIL);
      slli(chunks_end, cnt, elsize_shift);
      add(chunks_end, ary, chunks_end);

      bind(TAIL_LOOP);
      arrays_hashcode_elload(t0, Address(ary), eltype);
      slli(t1, result, 5); // optimize 31 * result
      subw(result, t1, result); // with result<<5 - result
      addw(result, result, t0);
      addi(ary, ary, elsize_bytes);
      bne(ary, chunks_end, TAIL_LOOP);
      j(DONE);

      bind(SAVE_VRESULT);
      vmv_x_s(result, v_result);

      bind(DONE);
      ...
    }

and got the following numbers:

    [ -XX:+UseVectorizedHashCodeIntrinsic -XX:-UseRVV ]
    Benchmark                  (size)  Mode  Cnt   Score   Error  Units
    ArraysHashCode.multibytes       8  avgt   10  11.020 ± 0.225  ns/op
    ArraysHashCode.multibytes       9  avgt   10  12.578 ± 0.117  ns/op
    ArraysHashCode.multibytes      16  avgt   10  15.505 ± 0.273  ns/op
    ArraysHashCode.multibytes      17  avgt   10  16.603 ± 0.164  ns/op
    ArraysHashCode.multibytes      24  avgt   10  21.005 ± 0.271  ns/op
    ArraysHashCode.multibytes      25  avgt   10  21.428 ± 0.227  ns/op
    ArraysHashCode.multibytes      32  avgt   10  27.985 ± 0.356  ns/op
    ArraysHashCode.multibytes      33  avgt   10  29.669 ± 0.145  ns/op
    ArraysHashCode.multibytes      48  avgt   10  37.575 ± 0.318  ns/op
    ArraysHashCode.multibytes      49  avgt   10  40.121 ± 0.229  ns/op
    ArraysHashCode.multibytes      56  avgt   10  48.637 ± 0.274  ns/op
    ArraysHashCode.multibytes      57  avgt   10  45.931 ± 0.305  ns/op
    ArraysHashCode.multibytes      64  avgt   10  48.362 ± 0.315  ns/op
    ArraysHashCode.multibytes      65  avgt   10  52.228 ± 0.320  ns/op
    ArraysHashCode.multibytes      72  avgt   10  49.523 ± 0.287  ns/op
    ArraysHashCode.multibytes      73  avgt   10  54.788 ± 0.437  ns/op
    ArraysHashCode.multibytes      80  avgt   10  62.087 ± 0.289  ns/op
    ArraysHashCode.multibytes      81  avgt   10  62.570 ± 0.211  ns/op

    [ -XX:+UseVectorizedHashCodeIntrinsic -XX:+UseRVV ]
    Benchmark                  (size)  Mode  Cnt   Score   Error  Units
    ArraysHashCode.multibytes       8  avgt   10  15.700 ± 0.181  ns/op
    ArraysHashCode.multibytes       9  avgt   10  20.743 ± 0.419  ns/op
    ArraysHashCode.multibytes      16  avgt   10  30.189 ± 0.301  ns/op
    ArraysHashCode.multibytes      17  avgt   10  32.639 ± 0.601  ns/op
    ArraysHashCode.multibytes      24  avgt   10  36.358 ± 0.628  ns/op
    ArraysHashCode.multibytes      25  avgt   10  34.486 ± 0.563  ns/op
    ArraysHashCode.multibytes      32  avgt   10  42.667 ± 0.473  ns/op
    ArraysHashCode.multibytes      33  avgt   10  44.858 ± 0.413  ns/op
    ArraysHashCode.multibytes      48  avgt   10  47.132 ± 0.443  ns/op
    ArraysHashCode.multibytes      49  avgt   10  51.528 ± 0.519  ns/op
    ArraysHashCode.multibytes      56  avgt   10  52.133 ± 0.225  ns/op
    ArraysHashCode.multibytes      57  avgt   10  48.549 ± 0.411  ns/op
    ArraysHashCode.multibytes      64  avgt   10  57.399 ± 0.546  ns/op
    ArraysHashCode.multibytes      65  avgt   10  57.680 ± 0.158  ns/op
    ArraysHashCode.multibytes      72  avgt   10  50.890 ± 0.327  ns/op
    ArraysHashCode.multibytes      73  avgt   10  54.338 ± 0.378  ns/op
    ArraysHashCode.multibytes      80  avgt   10  59.218 ± 0.301  ns/op
    ArraysHashCode.multibytes      81  avgt   10  63.889 ± 0.344  ns/op

As you can see, the numbers are **worse** even in cases when scalar code is not used at all, i.e. for lengths 16, 24, 32, 48, 56, 64, etc.

It seems possible to change the code to not contain any scalar code, e.g. use the **vslidedown** instruction to move pre-calculated powers of 31 in v_coeffs accordingly, and perform:

    vmul_vv(v_result, v_result, v_powmax);
    arrays_hashcode_vec_elload(v_src, v_tmp, ary, eltype);
    vmul_vv(v_src, v_src, v_coeffs);
    vredsum_vs(v_sum, v_src, v_zred);
    vadd_vv(v_result, v_result, v_sum);

for all remaining elements. However, as I pointed out above in the notes about lengths 24/36/..., that is unlikely to change the numbers.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/17413#discussion_r1471569685

From kvn at openjdk.org Tue Jan 30 17:29:34 2024
From: kvn at openjdk.org (Vladimir Kozlov)
Date: Tue, 30 Jan 2024 17:29:34 GMT
Subject: RFR: 8323795: jcmd Compiler.codecache should print total size of code cache [v5]
In-Reply-To:
References:
Message-ID:

On Tue, 30 Jan 2024 06:49:57 GMT, Yi Yang wrote:

>> CodeHeap 'non-profiled nmethods': size=118592Kb used=29Kb max_used=29Kb free=118562Kb
>> bounds [0x00007fbe84622000, 0x00007fbe84892000, 0x00007fbe8b9f2000]
>> CodeHeap 'profiled nmethods': size=118588Kb used=80Kb max_used=80Kb free=118507Kb
>> bounds [0x00007fbe7c9f2000, 0x00007fbe7cc62000, 0x00007fbe83dc1000]
>> CodeHeap 'non-nmethods': size=8580Kb used=1258Kb max_used=1834Kb free=7321Kb
>> bounds [0x00007fbe83dc1000, 0x00007fbe84031000, 0x00007fbe84622000]
>> total_blobs=474 nmethods=87 adapters=293
>> compilation: enabled
>> stopped_count=0, restarted_count=0
>> full_count=0
>>
>>
>> It's better to accumulate total size of used/free/size, for example
>>
>> -SegmentedCodeCache
>> CodeCache: size=245760Kb used=1366Kb max_used=1943Kb free=244393Kb
>> bounds [0x00007fdcc89f2000, 0x00007fdcc8c62000, 0x00007fdcd79f2000]
>> total_blobs=474, nmethods=87, adapters=293
>> stopped_count=0, restarted_count=0, full_count=0
>> compilation=enabled
>> >> >> >> +SegmentedCodeCache >> CodeHeap 'non-profiled nmethods': size=118592Kb used=29Kb max_used=29Kb free=118562Kb >> bounds [0x00007f89c8622000, 0x00007f89c8892000, 0x00007f89cf9f2000] >> CodeHeap 'profiled nmethods': size=118588Kb used=80Kb max_used=80Kb free=118507Kb >> bounds [0x00007f89c09f2000, 0x00007f89c0c62000, 0x00007f89c7dc1000] >> CodeHeap 'non-nmethods': size=8580Kb used=1258Kb max_used=1834Kb free=7321Kb >> bounds [0x00007f89c7dc1000, 0x00007f89c8031000, 0x00007f89c8622000] >> CodeCache: size=245760Kb, used=1367Kb, max_used=1943Kb, free=244390Kb >> total_blobs=474, nmethods=87, adapters=293 >> stopped_count=0, restarted_count=0, full_count=0 >> compilation=enabled > > Yi Yang has updated the pull request incrementally with one additional commit since the last revision: > > fix ident && vmTestbase test The output looks fine to me now. I approve but it needs new round of testing. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/17445#pullrequestreview-1851931838 From kvn at openjdk.org Tue Jan 30 17:48:27 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 30 Jan 2024 17:48:27 GMT Subject: RFR: 8324794: C2 SuperWord: do not ignore reductions in SuperWord::unrolling_analysis In-Reply-To: <4Dv9KCdW7B9Ms3a-MO-59PL-3qebNhNJejuuM6LCB0w=.7575de49-9d14-4c12-b95d-8aca28b07a05@github.com> References: <4Dv9KCdW7B9Ms3a-MO-59PL-3qebNhNJejuuM6LCB0w=.7575de49-9d14-4c12-b95d-8aca28b07a05@github.com> Message-ID: On Sat, 27 Jan 2024 06:12:40 GMT, Emanuel Peter wrote: > Subtask of https://github.com/openjdk/jdk/pull/16620 > > Ignoring reductions in unrolling_analysis is unnecessary, and it adds unnecessary dependency of unrolling_analysis on reduction-analysis. That dependency needs to be removed for further refactoring. After offline discussion I approve these changes. ------------- Marked as reviewed by kvn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/17604#pullrequestreview-1851967542

From epeter at openjdk.org Tue Jan 30 20:11:11 2024
From: epeter at openjdk.org (Emanuel Peter)
Date: Tue, 30 Jan 2024 20:11:11 GMT
Subject: RFR: 8324794: C2 SuperWord: do not ignore reductions in SuperWord::unrolling_analysis
In-Reply-To:
References: <4Dv9KCdW7B9Ms3a-MO-59PL-3qebNhNJejuuM6LCB0w=.7575de49-9d14-4c12-b95d-8aca28b07a05@github.com> <-f9_1s2hXm20N3DABo4Hq1dyKzAxRHq3kJ7I_D-PONo=.5566f4f5-e975-4bee-b274-c7a136672c1b@github.com>
Message-ID:

On Mon, 29 Jan 2024 17:41:28 GMT, Vladimir Kozlov wrote:

>> @vnkozlov Yes, exactly, the call to `is_marked_reduction`. Other than that, unrolling_analysis could be static, and does not need any information from SuperWord. I'd like to split off unrolling_analysis, and so I'll have to remove the call to `is_marked_reduction`.
>>
>> It seems like this was in from the beginning, when Michael Berg added the unrolling_analysis with https://github.com/openjdk/jdk/commit/7c7b91845f94d13b8fed7911be7f933cf0df28d4
>>
>> I can see no reason stated in the RFE or the code itself.
>>
>> I can only speculate: maybe the idea was that reductions are not profitable, unless there are other nodes, like stores and loads. So if we only find reductions, then we would not adjust the unrolling, since we are not expecting vectorization anyway. Again: only speculation. You reviewed the code in 2015, maybe you still remember the reason ;)
>>
>> FYI: only reductions may in the not too distant future become vectorizable in a profitable way, so I think removing this is good anyway.
>
>> I can only speculate: maybe the idea was that reductions are not profitable, unless there are other nodes, like stores and loads. So if we only find reductions, then we would not adjust the unrolling, since we are not expecting vectorization anyway. Again: only speculation. You reviewed the code in 2015, maybe you still remember the reason ;)
>
> I looked and there was no discussion about that during review. Originally it was not SuperWord analysis - it only looked for arithmetic Phi nodes in the loop. Last year we changed it: [1be80a44](https://github.com/openjdk/jdk/commit/1be80a4445cf74adc9b2cd5bf262a897f9ede74f)
> I think the check simplifies the `unrolling_analysis` code, since we skip nodes we already know about.

Thanks @vnkozlov and @chhagedorn for the reviews!

-------------

PR Comment: https://git.openjdk.org/jdk/pull/17604#issuecomment-1917804192

From epeter at openjdk.org Tue Jan 30 20:18:35 2024
From: epeter at openjdk.org (Emanuel Peter)
Date: Tue, 30 Jan 2024 20:18:35 GMT
Subject: Integrated: 8324794: C2 SuperWord: do not ignore reductions in SuperWord::unrolling_analysis
In-Reply-To: <4Dv9KCdW7B9Ms3a-MO-59PL-3qebNhNJejuuM6LCB0w=.7575de49-9d14-4c12-b95d-8aca28b07a05@github.com>
References: <4Dv9KCdW7B9Ms3a-MO-59PL-3qebNhNJejuuM6LCB0w=.7575de49-9d14-4c12-b95d-8aca28b07a05@github.com>
Message-ID:

On Sat, 27 Jan 2024 06:12:40 GMT, Emanuel Peter wrote:

> Subtask of https://github.com/openjdk/jdk/pull/16620
>
> Ignoring reductions in unrolling_analysis is unnecessary, and it adds an unnecessary dependency of unrolling_analysis on reduction-analysis. That dependency needs to be removed for further refactoring.

This pull request has now been integrated.
Changeset: 11e28bd6
Author:    Emanuel Peter
URL:       https://git.openjdk.org/jdk/commit/11e28bd61968700956d2155a77688459fd7c028f
Stats:     14 lines in 2 files changed: 11 ins; 1 del; 2 mod

8324794: C2 SuperWord: do not ignore reductions in SuperWord::unrolling_analysis

Reviewed-by: chagedorn, kvn

-------------

PR: https://git.openjdk.org/jdk/pull/17604

From epeter at openjdk.org Tue Jan 30 20:28:07 2024
From: epeter at openjdk.org (Emanuel Peter)
Date: Tue, 30 Jan 2024 20:28:07 GMT
Subject: RFR: 8324890: C2 SuperWord: refactor out VLoop, make unrolling_analysis static, remove init/reset mechanism [v2]
In-Reply-To:
References:
Message-ID: <-9DcGbMC-CBa7P7woP7Pws2deh3qYRNXpWBJGioypjg=.d1e4c6d2-1a71-4c4e-b29b-67db3d0a1209@github.com>

> Subtask of https://github.com/openjdk/jdk/pull/16620
>
> 1. Move out the shared code between `SuperWord::SLP_extract` (where we do vectorization) and `SuperWord::unrolling_analysis`, and move it to a new class `VLoop`. This allows us to decouple `unrolling_analysis` from the SuperWord object, and we can make it static.
> 2. So far, SuperWord was reused for all loops in a compilation, and then "reset" (with `SuperWord::init`) for every loop. This is a bit of a nasty pattern. I now make a new `VLoop` and a new `SuperWord` object per loop.
> 3. Since we now make more `SuperWord` objects, we allocate the internal data structures more often. Therefore, I now pre-allocate/reserve sufficient space on initialization.
>
> Side-note about https://github.com/openjdk/jdk/pull/17604:
> I would like to remove the use of `SuperWord::is_marked_reduction` from `SuperWord::unrolling_analysis`. For starters: it is not clear what it was ever good for. Second: it requires us to do reduction marking/analysis before `unrolling_analysis`, and hence makes the reduction marking shared between `unrolling_analysis` and vectorization. I could move the reduction marking to `VLoop` now. But the `_loop_reductions` set would have to be put on an arena, and I would like to avoid creating an arena for the `unrolling_analysis`. Plus, it would just be nicer code to have reduction analysis together with body analysis, type analysis, etc., and all of them only in `SLP_extract`.

Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 12 additional commits since the last revision:

 - Merge branch 'master' into JDK-8324890
 - _vtrace is moved to VLoop
 - comment update
 - cosmetics
 - rename in preconditions
 - remove loop_transform_helper
 - fix small bug
 - preallocate memory
 - more refactoring
 - moved mark_reductions
 - ... and 2 more: https://git.openjdk.org/jdk/compare/c7ce673d...25e3710e

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/17624/files
  - new: https://git.openjdk.org/jdk/pull/17624/files/06d83797..25e3710e

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=17624&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17624&range=00-01

  Stats: 9215 lines in 858 files changed: 218 ins; 158 del; 8839 mod
  Patch: https://git.openjdk.org/jdk/pull/17624.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/17624/head:pull/17624

PR: https://git.openjdk.org/jdk/pull/17624

From dean.long at oracle.com Wed Jan 31 00:54:41 2024
From: dean.long at oracle.com (dean.long at oracle.com)
Date: Tue, 30 Jan 2024 16:54:41 -0800
Subject: Difference between [jdk20] Thread.ensureMaterializedForStackWalk and Blackhole.
In-Reply-To:
References:
Message-ID:

My understanding is that they are similar, but disable different optimizations. A blackhole is to disable dead-code elimination, while ensureMaterializedForStackWalk is to disable scalarization.

dl

On 1/29/24 12:15 AM, shami wrote:
> Hello,
>
> I am trying to understand the JDK20 intrinsic -
> Thread.ensureMaterializedForStackWalk
> (https://github.com/openjdk/jdk/pull/10952/files).
>
> It seems to be functionally equivalent to the /Blackhole.consume/
> intrinsic.
>
> Is there any subtle difference(s) between the two, or can one be
> implemented using the other?
>
> Thanks in advance.
> Shami.

From amitkumar at openjdk.org Wed Jan 31 04:45:06 2024
From: amitkumar at openjdk.org (Amit Kumar)
Date: Wed, 31 Jan 2024 04:45:06 GMT
Subject: RFR: 8322649: Improve class initialization barrier in TemplateTable::_new for S390 [v2]
In-Reply-To:
References:
Message-ID:

On Wed, 24 Jan 2024 13:50:19 GMT, Martin Doerr wrote:

>> Amit Kumar has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains two commits:
>>
>>  - Merge master
>>  - s390 port
>
> LGTM.

@TheRealMDoerr @RealLucy Thanks for the reviews.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/17481#issuecomment-1918368922

From amitkumar at openjdk.org Wed Jan 31 04:45:07 2024
From: amitkumar at openjdk.org (Amit Kumar)
Date: Wed, 31 Jan 2024 04:45:07 GMT
Subject: Integrated: 8322649: Improve class initialization barrier in TemplateTable::_new for S390
In-Reply-To:
References:
Message-ID:

On Thu, 18 Jan 2024 09:49:37 GMT, Amit Kumar wrote:

> s390 Port implementation for https://github.com/openjdk/jdk/pull/17006,
>
> Testing:
> Build: fastdebug + release
> Test: Tier1 {fastdebug}

This pull request has now been integrated.

Changeset: 83b3c9b3
Author:    Amit Kumar
URL:       https://git.openjdk.org/jdk/commit/83b3c9b3eeda33bd5de9b1affb39fb1a8a674e48
Stats:     11 lines in 1 file changed: 0 ins; 5 del; 6 mod

8322649: Improve class initialization barrier in TemplateTable::_new for S390

Reviewed-by: mdoerr, lucy

-------------

PR: https://git.openjdk.org/jdk/pull/17481

From mdoerr at openjdk.org Wed Jan 31 06:15:06 2024
From: mdoerr at openjdk.org (Martin Doerr)
Date: Wed, 31 Jan 2024 06:15:06 GMT
Subject: RFR: JDK-8322648: Improve class initialization barrier in TemplateTable::_new for PPC [v4]
In-Reply-To:
References:
Message-ID:

On Tue, 30 Jan 2024 09:37:46 GMT, Varada M wrote:

>> ppc port implementation of https://github.com/openjdk/jdk/pull/17006
>>
>> Fastdebug and Release : build and tier1 testing successful.
>>
>> JBS Issue : [JDK-8322648](https://bugs.openjdk.org/browse/JDK-8322648)
>
> Varada M has updated the pull request incrementally with one additional commit since the last revision:
>
>   8322648: Improve class initialization barrier in TemplateTable::_new for PPC

We usually wait for a 2nd review, but this PR is tiny and we have run a lot of tests. So, let's ship it!

-------------

PR Comment: https://git.openjdk.org/jdk/pull/17518#issuecomment-1918452205

From varadam at openjdk.org Wed Jan 31 06:15:07 2024
From: varadam at openjdk.org (Varada M)
Date: Wed, 31 Jan 2024 06:15:07 GMT
Subject: Integrated: JDK-8322648: Improve class initialization barrier in TemplateTable::_new for PPC
In-Reply-To:
References:
Message-ID:

On Mon, 22 Jan 2024 12:26:51 GMT, Varada M wrote:

> ppc port implementation of https://github.com/openjdk/jdk/pull/17006
>
> Fastdebug and Release : build and tier1 testing successful.
>
> JBS Issue : [JDK-8322648](https://bugs.openjdk.org/browse/JDK-8322648)

This pull request has now been integrated.
Changeset: f7121de4
Author:    Varada M
Committer: Martin Doerr
URL:       https://git.openjdk.org/jdk/commit/f7121de4a080c222e2bbf2468be94950db78530a
Stats:     10 lines in 1 file changed: 2 ins; 3 del; 5 mod

8322648: Improve class initialization barrier in TemplateTable::_new for PPC

Reviewed-by: mdoerr

-------------

PR: https://git.openjdk.org/jdk/pull/17518

From dfenacci at openjdk.org Wed Jan 31 09:30:14 2024
From: dfenacci at openjdk.org (Damon Fenacci)
Date: Wed, 31 Jan 2024 09:30:14 GMT
Subject: RFR: JDK-8317299: safepoint scalarization doesn't keep track of the depth of the JVM state [v2]
In-Reply-To:
References:
Message-ID:

> # Issue
>
> The origin of the problem is tied to the fact that, when C2 optimizes vector boxes, it performs safepoint object scalarization before late inlining.
> This can lead to situations in which scalarization adds scalarized values to the JVM state and late inlining of further methods adds further JVM state entries on top for each inlined method.
> With the example of the reported bug (_TestIntrinsicBailOut.java_) we get to a situation like this:
>
> ...
> bc: JVMS depth=6 loc=20 stk=23 arg=23 mon=23 scalar=23 end=23 mondepth=0 sp=0 bci=1 reexecute=false method=virtual jobject jdk.incubator.vector.ByteVector.rearrangeTemplate(jobject, jobject)
> bc: JVMS depth=7 loc=23 stk=27 arg=27 mon=27 scalar=27 end=27 mondepth=0 sp=0 bci=36 reexecute=false method=virtual jobject jdk.incubator.vector.AbstractShuffle.checkIndexes()
> bc: JVMS depth=8 loc=27 stk=28 arg=28 mon=28 scalar=28 end=28 mondepth=0 sp=0 bci=1 reexecute=false method=virtual jobject jdk.incubator.vector.AbstractShuffle.reorder()
> bc: JVMS depth=9 loc=28 stk=29 arg=29 mon=29 scalar=29 end=31 mondepth=0 sp=0 bci=1 reexecute=false method=virtual jobject jdk.internal.vm.vector.VectorSupport$VectorPayload.getPayload()
> bc: JVMS depth=10 loc=31 stk=32 arg=32 mon=32 scalar=32 end=32 mondepth=0 sp=0 bci=3 reexecute=false method=static jobject jdk.internal.vm.vector.VectorSupport.maybeRebox(jobject)
> bc: JVMS depth=11 loc=32 stk=33 arg=33 mon=33 scalar=33 end=33 mondepth=0 sp=0 bci=1 reexecute=false method=virtual void jdk.internal.misc.Unsafe.loadFence()
>
> `JVMS depth=9` shows 2 scalars but 2 further inlines added 2 more JVM states (with no scalars).
>
> The corresponding node looks like this:
> [image]
>
> To keep track of its scalarized inputs, `SafePointScalarObjectNode` keeps a field `_first_index`, which is supposed to be "relative to the last (youngest) jvms->_scloff"...
> https://github.com/openjdk/jdk/blob/c5e72450966ad50d57a8d22e9d634bfcb319aee9/src/hotspot/share/opto/callnode.hpp#L509-L511
> but if there are late inlined methods, this field is going to be relative to the JVM state at the depth before inlining happened (e.g. depth=9 in the example) and not relative to the youngest depth.
>
> # Solution
>
> In order to keep track of the right depth a `_depth` field is added to `SafePointScalarObjectNode`, which refers to the depth of the JVM state the `_first_index` field refers to. The method `uint first_index(JVMState*...

Damon Fenacci has updated the pull request incrementally with one additional commit since the last revision:

  Update src/hotspot/share/opto/callnode.hpp

  Co-authored-by: Tobias Hartmann

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/17500/files
  - new: https://git.openjdk.org/jdk/pull/17500/files/c5f693eb..c6caf715

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=17500&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17500&range=00-01

  Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/17500.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/17500/head:pull/17500

PR: https://git.openjdk.org/jdk/pull/17500

From mli at openjdk.org Wed Jan 31 10:14:11 2024
From: mli at openjdk.org (Hamlin Li)
Date: Wed, 31 Jan 2024 10:14:11 GMT
Subject: RFR: JDK-8325037: x86: enable and fix hotspot/jtreg/compiler/vectorization/TestRoundVectFloat.java
Message-ID:

Hi,
Can you help to review this simple patch to fix test TestRoundVectFloat.java?
Thanks!

FYI: This test is not actually tested, need to fix the test applying filter and IR matching rule.

-------------

Commit messages:
 - fix copyright
 - Initial commit

Changes: https://git.openjdk.org/jdk/pull/17649/files
 Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=17649&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8325037
  Stats: 3 lines in 1 file changed: 0 ins; 1 del; 2 mod
  Patch: https://git.openjdk.org/jdk/pull/17649.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/17649/head:pull/17649

PR: https://git.openjdk.org/jdk/pull/17649

From aph at openjdk.org Wed Jan 31 11:24:07 2024
From: aph at openjdk.org (Andrew Haley)
Date: Wed, 31 Jan 2024 11:24:07 GMT
Subject: RFR: 8319690: [AArch64] C2 compilation hits offset_ok_for_immed: assert "c2 compiler bug" [v2]
In-Reply-To:
References: <16J-lJ2AceGTVcRWBcP15yKcwO-1IA1XsngyOuNjf7k=.0776f081-ae2c-4279-87cf-d909806c2bc4@github.com>
Message-ID:

On Thu, 7 Dec 2023 06:42:49 GMT, Fei Gao wrote:

>> On LP64 systems, if the heap can be moved into low virtual address space (below 4GB) and the heap size is smaller than the interesting threshold of 4 GB, we can use unscaled decoding pattern for narrow klass decoding. It means that a generic field reference can be decoded by:
>>
>> cast<64> (32-bit compressed reference) + field_offset
>>
>>
>> When the `field_offset` is an immediate, on aarch64 platform, the unscaled decoding pattern can match perfectly with a direct addressing mode, i.e., `base_plus_offset`, supported by `LDR/STR` instructions. But for certain data width, not all immediates can be encoded in the instruction field of `LDR/STR` [[1]](https://github.com/openjdk/jdk/blob/8db7bad992a0f31de9c7e00c2657c18670539102/src/hotspot/cpu/aarch64/assembler_aarch64.inline.hpp#L33). The ranges are different as data widths vary.
>>
>> For example, when we try to load a value of long type at offset of `1030`, the address expression is `(AddP (DecodeN base) 1030)`. Before the patch, the expression was matching with `operand indOffIN()`. But, for 64-bit `LDR/STR`, signed immediate byte offset must be in the range -256 to 255 or positive immediate byte offset must be a multiple of 8 in the range 0 to 32760 [[2]](https://developer.arm.com/documentation/ddi0602/2023-09/Base-Instructions/LDR--immediate---Load-Register--immediate--?lang=en). `1030` can't be encoded in the instruction field. So, after matching, when we do checking for instruction encoding, the assertion would fail.
>>
>> In this patch, we're going to filter out invalid immediates when deciding if current addressing mode can be matched as `base_plus_offset`. We introduce `indOffIN4/indOffLN4` and `indOffIN8/indOffLN8` for 32-bit data type and 64-bit data type separately in the patch. E.g., for `memory4`, we remove the generic `indOffIN/indOffLN`, which matches wrong unscaled immediate range, and replace them with `indOffIN4/indOffLN4` instead.
>>
>> Since 8-bit and 16-bit `LDR/STR` instructions also support the unscaled decoding pattern, we add the addressing mode in the lists of `memory1` and `memory2` by introducing `indOffIN1/indOffLN1` and `indOffIN2/indOffLN2`.
>>
>> We also remove unused operands `indOffI/indOffl/indOffIN/indOffLN` to avoid misuse.
>>
>> Tier 1-3 passed on aarch64.
>
> Fei Gao has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision:
>
>  - Remove unused immIOffset/immLOffset
>  - Merge branch 'master' into fg8319690
>  - 8319690: [AArch64] C2 compilation hits offset_ok_for_immed: assert "c2 compiler bug"
>
>    On LP64 systems, if the heap can be moved into low virtual
>    address space (below 4GB) and the heap size is smaller than the
>    interesting threshold of 4 GB, we can use unscaled decoding
>    pattern for narrow klass decoding. It means that a generic field
>    reference can be decoded by:
>    ```
>    cast<64> (32-bit compressed reference) + field_offset
>    ```
>
>    When the `field_offset` is an immediate, on aarch64 platform, the
>    unscaled decoding pattern can match perfectly with a direct
>    addressing mode, i.e., `base_plus_offset`, supported by LDR/STR
>    instructions. But for certain data width, not all immediates can
>    be encoded in the instruction field of LDR/STR[1]. The ranges are
>    different as data widths vary.
>
>    For example, when we try to load a value of long type at offset of
>    `1030`, the address expression is `(AddP (DecodeN base) 1030)`.
>    Before the patch, the expression was matching with
>    `operand indOffIN()`. But, for 64-bit LDR/STR, signed immediate
>    byte offset must be in the range -256 to 255 or positive immediate
>    byte offset must be a multiple of 8 in the range 0 to 32760[2].
>    `1030` can't be encoded in the instruction field. So, after
>    matching, when we do checking for instruction encoding, the
>    assertion would fail.
>
>    In this patch, we're going to filter out invalid immediates
>    when deciding if current addressing mode can be matched as
>    `base_plus_offset`. We introduce `indOffIN4/indOffLN4` and
>    `indOffIN8/indOffLN8` for 32-bit data type and 64-bit data
>    type separately in the patch. E.g., for `memory4`, we remove
>    the generic `indOffIN/indOffLN`, which matches wrong unscaled
>    immediate range, and replace them with `indOffIN4/indOffLN4`
>    instead.
>
>    Since 8-bit and 16-bit LDR/STR instructions also support the
>    unscaled decoding pattern, we add the addressing mode in the
>    lists of `memory1` and `memory2` by introducing
>    `indOffIN1/indOffLN1` and `indOffIN2/indOffLN2`.
>
>    We also remove unused operands `indOffI/indOffl/indOffIN/indOffLN`
>    to avoid misuse.
>
>    ...

I'm withdrawing my objection to this PR. I've had a good look at some alternatives, and none of them are any better.
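
For readers following the encoding rule quoted above, here is a minimal sketch of the check it describes. It is not the HotSpot code: `offset_fits_ldr_str` is a hypothetical name, and extending the quoted 64-bit rule to other access widths through `size_shift` (log2 of the access width in bytes) is an assumption based on the usual A64 scaled 12-bit immediate form.

```cpp
#include <cstdint>

// Hypothetical helper (not HotSpot's offset_ok_for_immed).
// Returns true if `offset` can be encoded as the immediate of an A64
// load/store whose access width is (1 << size_shift) bytes.
constexpr bool offset_fits_ldr_str(std::int64_t offset, unsigned size_shift) {
  return (offset >= -256 && offset <= 255)                          // unscaled signed 9-bit form
      || (offset >= 0
          && (offset & ((std::int64_t{1} << size_shift) - 1)) == 0  // multiple of the access width
          && (offset >> size_shift) <= 4095);                       // scaled unsigned 12-bit form
}

// The example from the description: 1030 is not a multiple of 8,
// so it cannot be the immediate of a 64-bit access.
static_assert(!offset_fits_ldr_str(1030, 3), "1030 is rejected for 8-byte accesses");
static_assert(offset_fits_ldr_str(1032, 3), "1032 is a multiple of 8 below 32760");
static_assert(offset_fits_ldr_str(32760, 3), "32760 is the largest scaled 8-byte offset");
static_assert(offset_fits_ldr_str(1030, 1), "1030 is fine for a 2-byte access");
```

Under these assumptions the same offset is acceptable for one access width and not for another, which is why a single generic operand cannot describe the valid range for every width.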
I've reluctantly concluded that, given the design of C2, there's no better way to fix it. My apologies to @fg1417. You were right.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/16991#issuecomment-1918910080

From thartmann at openjdk.org Wed Jan 31 11:40:05 2024
From: thartmann at openjdk.org (Tobias Hartmann)
Date: Wed, 31 Jan 2024 11:40:05 GMT
Subject: RFR: JDK-8325037: x86: enable and fix hotspot/jtreg/compiler/vectorization/TestRoundVectFloat.java
In-Reply-To:
References:
Message-ID:

On Wed, 31 Jan 2024 10:09:16 GMT, Hamlin Li wrote:

> Hi,
> Can you help to review this simple patch to fix test TestRoundVectFloat.java?
> Thanks!
>
> FYI: This test is not actually tested, need to fix the test applying filter and IR matching rule.

Changes requested by thartmann (Reviewer).

test/hotspot/jtreg/compiler/vectorization/TestRoundVectFloat.java line 2:

> 1: /*
> 2:  * Copyright (c) 2022, 2023, Oracle and/or its affiliates. All rights reserved.

Suggestion:

 * Copyright (c) 2022, 2024, Oracle and/or its affiliates. All rights reserved.

test/hotspot/jtreg/compiler/vectorization/TestRoundVectFloat.java line 52:

> 50:
> 51:     @Test
> 52:     @IR(applyIf = {"UseAVX", " > 1"}, counts = {IRNode.ROUND_VF , " > 0 "})

Isn't that node only available if `UseAVX >= 2`?
https://github.com/openjdk/jdk/blob/f0bae7939a61a79f3e07de97451c433e91742069/src/hotspot/cpu/x86/x86.ad#L1501-L1504

-------------

PR Review: https://git.openjdk.org/jdk/pull/17649#pullrequestreview-1853522702
PR Review Comment: https://git.openjdk.org/jdk/pull/17649#discussion_r1472695949
PR Review Comment: https://git.openjdk.org/jdk/pull/17649#discussion_r1472698786

From thartmann at openjdk.org Wed Jan 31 12:58:02 2024
From: thartmann at openjdk.org (Tobias Hartmann)
Date: Wed, 31 Jan 2024 12:58:02 GMT
Subject: RFR: JDK-8325037: x86: enable and fix hotspot/jtreg/compiler/vectorization/TestRoundVectFloat.java
In-Reply-To:
References:
Message-ID:

On Wed, 31 Jan 2024 11:37:03 GMT, Tobias Hartmann wrote:

>> Hi,
>> Can you help to review this simple patch to fix test TestRoundVectFloat.java?
>> Thanks!
>>
>> FYI: This test is not actually tested, need to fix the test applying filter and IR matching rule.
>
> test/hotspot/jtreg/compiler/vectorization/TestRoundVectFloat.java line 52:
>
>> 50:
>> 51:     @Test
>> 52:     @IR(applyIf = {"UseAVX", " > 1"}, counts = {IRNode.ROUND_VF , " > 0 "})
>
> Isn't that node only available if `UseAVX >= 2`?
> https://github.com/openjdk/jdk/blob/f0bae7939a61a79f3e07de97451c433e91742069/src/hotspot/cpu/x86/x86.ad#L1501-L1504

Sorry, misread the code. Looks good.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/17649#discussion_r1472782987

From thartmann at openjdk.org Wed Jan 31 12:58:01 2024
From: thartmann at openjdk.org (Tobias Hartmann)
Date: Wed, 31 Jan 2024 12:58:01 GMT
Subject: RFR: JDK-8325037: x86: enable and fix hotspot/jtreg/compiler/vectorization/TestRoundVectFloat.java
In-Reply-To:
References:
Message-ID:

On Wed, 31 Jan 2024 10:09:16 GMT, Hamlin Li wrote:

> Hi,
> Can you help to review this simple patch to fix test TestRoundVectFloat.java?
> Thanks!
>
> FYI: This test is not actually tested, need to fix the test applying filter and IR matching rule.

Marked as reviewed by thartmann (Reviewer).

I'll run this through our testing and report back once it has passed.

-------------

PR Review: https://git.openjdk.org/jdk/pull/17649#pullrequestreview-1853663877
PR Comment: https://git.openjdk.org/jdk/pull/17649#issuecomment-1919052000

From jwaters at openjdk.org Wed Jan 31 13:16:11 2024
From: jwaters at openjdk.org (Julian Waters)
Date: Wed, 31 Jan 2024 13:16:11 GMT
Subject: RFR: 8325049: stubGenerator_ppc.cpp should use alignas
Message-ID: <-1NTkXo4SNw3kF6RYJEZ2os_tVfyUZVJqSPfEg4CnH0=.131a45fd-fd9c-4f1b-9818-b00439699d6d@github.com>

Please review a trivial change to make stubGenerator_ppc.cpp use the well-defined alignas instead of the aligned attribute. This was the only site I could find that does not use ATTRIBUTE_ALIGNED and instead independently defines the aligned attribute on its own, so I swapped it to use alignas instead.

-------------

Commit messages:
 - 8325049

Changes: https://git.openjdk.org/jdk/pull/17652/files
 Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=17652&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8325049
  Stats: 4 lines in 1 file changed: 0 ins; 2 del; 2 mod
  Patch: https://git.openjdk.org/jdk/pull/17652.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/17652/head:pull/17652

PR: https://git.openjdk.org/jdk/pull/17652

From ddong at openjdk.org Wed Jan 31 14:09:02 2024
From: ddong at openjdk.org (Denghui Dong)
Date: Wed, 31 Jan 2024 14:09:02 GMT
Subject: RFR: 8324974: JFR: EventCompilerPhase should be created as UNTIMED
In-Reply-To: <9-1NlOfWoZujMeO_WZOvQNDZETRelZrdWFu8m2RdAiM=.ee2704ec-7b09-4fcb-933a-8ec571243888@github.com>
References: <9-1NlOfWoZujMeO_WZOvQNDZETRelZrdWFu8m2RdAiM=.ee2704ec-7b09-4fcb-933a-8ec571243888@github.com>
Message-ID:

On Tue, 30 Jan 2024 15:15:24 GMT, Denghui Dong wrote:

> Hi,
>
> Please help review this fix.
>
> CompilerEvent::PhaseEvent::post will set the _start_time of EventCompilerPhase, so EventCompilerPhase should be created as UNTIMED.
>
> Thanks.

testing: jdk/jfr all passed. (release build)

-------------

PR Comment: https://git.openjdk.org/jdk/pull/17632#issuecomment-1919172870

From vlivanov at openjdk.org Wed Jan 31 14:16:03 2024
From: vlivanov at openjdk.org (Vladimir Ivanov)
Date: Wed, 31 Jan 2024 14:16:03 GMT
Subject: RFR: JDK-8317299: safepoint scalarization doesn't keep track of the depth of the JVM state [v2]
In-Reply-To:
References:
Message-ID:

On Wed, 31 Jan 2024 09:30:14 GMT, Damon Fenacci wrote:

>> # Issue
>>
>> The origin of the problem is tied to the fact that, when C2 optimizes vector boxes, it performs safepoint object scalarization before late inlining.
>> This can lead to situations in which scalarization adds scalarized values to the JVM state and late inlining of further methods adds further JVM state entries on top for each inlined method.
>> With the example of the reported bug (_TestIntrinsicBailOut.java_) we get to a situation like this:
>>
>> ...
>> bc: JVMS depth=6 loc=20 stk=23 arg=23 mon=23 scalar=23 end=23 mondepth=0 sp=0 bci=1 reexecute=false method=virtual jobject jdk.incubator.vector.ByteVector.rearrangeTemplate(jobject, jobject)
>> bc: JVMS depth=7 loc=23 stk=27 arg=27 mon=27 scalar=27 end=27 mondepth=0 sp=0 bci=36 reexecute=false method=virtual jobject jdk.incubator.vector.AbstractShuffle.checkIndexes()
>> bc: JVMS depth=8 loc=27 stk=28 arg=28 mon=28 scalar=28 end=28 mondepth=0 sp=0 bci=1 reexecute=false method=virtual jobject jdk.incubator.vector.AbstractShuffle.reorder()
>> bc: JVMS depth=9 loc=28 stk=29 arg=29 mon=29 scalar=29 end=31 mondepth=0 sp=0 bci=1 reexecute=false method=virtual jobject jdk.internal.vm.vector.VectorSupport$VectorPayload.getPayload()
>> bc: JVMS depth=10 loc=31 stk=32 arg=32 mon=32 scalar=32 end=32 mondepth=0 sp=0 bci=3 reexecute=false method=static jobject jdk.internal.vm.vector.VectorSupport.maybeRebox(jobject)
>> bc: JVMS depth=11 loc=32 stk=33 arg=33 mon=33 scalar=33 end=33 mondepth=0 sp=0 bci=1 reexecute=false method=virtual void jdk.internal.misc.Unsafe.loadFence()
>>
>> `JVMS depth=9` shows 2 scalars but 2 further inlines added 2 more JVM states (with no scalars).
>>
>> The corresponding node looks like this:
>> [image]
>>
>> To keep track of its scalarized inputs, `SafePointScalarObjectNode` keeps a field `_first_index`, which is supposed to be "relative to the last (youngest) jvms->_scloff"...
>> https://github.com/openjdk/jdk/blob/c5e72450966ad50d57a8d22e9d634bfcb319aee9/src/hotspot/share/opto/callnode.hpp#L509-L511
>> but if there are late inlined methods, this field is going to be relative to the JVM state at the depth before inlining happened (e.g. depth=9 in the example) and not relative to the youngest depth.
>>
>> # Solution
>>
>> In order to keep track of the right depth a `_depth` field is added to `SafePointScalarObjectNode`, which refers to the depth of the JVM state the `_first_index` fie...
>
> Damon Fenacci has updated the pull request incrementally with one additional commit since the last revision:
>
>   Update src/hotspot/share/opto/callnode.hpp
>
>   Co-authored-by: Tobias Hartmann

Looks good.

-------------

Marked as reviewed by vlivanov (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/17500#pullrequestreview-1853836738

From egahlin at openjdk.org Wed Jan 31 15:38:03 2024
From: egahlin at openjdk.org (Erik Gahlin)
Date: Wed, 31 Jan 2024 15:38:03 GMT
Subject: RFR: 8324974: JFR: EventCompilerPhase should be created as UNTIMED
In-Reply-To: <9-1NlOfWoZujMeO_WZOvQNDZETRelZrdWFu8m2RdAiM=.ee2704ec-7b09-4fcb-933a-8ec571243888@github.com>
References: <9-1NlOfWoZujMeO_WZOvQNDZETRelZrdWFu8m2RdAiM=.ee2704ec-7b09-4fcb-933a-8ec571243888@github.com>
Message-ID:

On Tue, 30 Jan 2024 15:15:24 GMT, Denghui Dong wrote:

> Hi,
>
> Please help review this fix.
>
> CompilerEvent::PhaseEvent::post will set the _start_time of EventCompilerPhase, so EventCompilerPhase should be created as UNTIMED.
>
> Thanks.

Marked as reviewed by egahlin (Reviewer).
------------- PR Review: https://git.openjdk.org/jdk/pull/17632#pullrequestreview-1854115950 From kvn at openjdk.org Wed Jan 31 16:50:05 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 31 Jan 2024 16:50:05 GMT Subject: RFR: JDK-8325037: x86: enable and fix hotspot/jtreg/compiler/vectorization/TestRoundVectFloat.java In-Reply-To: References: Message-ID: <3NMZJwvyy-5yZkLOvsDeVQT_dqz8ykcIZ3zxaOBuPDI=.9a0c4788-a3b7-4ba4-9525-e71aa5d6f154@github.com> On Wed, 31 Jan 2024 10:09:16 GMT, Hamlin Li wrote: > Hi, > Can you help to review this simple patch to fix test TestRoundVectFloat.java? > Thanks! > > FYI: This test is not actually tested, need to fix the test applying filter and IR matching rule. Good. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/17649#pullrequestreview-1854304157 From coleenp at openjdk.org Wed Jan 31 17:02:10 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Wed, 31 Jan 2024 17:02:10 GMT Subject: RFR: 8324679: Replace NULL with nullptr in HotSpot .ad files Message-ID: This is a straightforward mechanical (sed) replacement of NULL with nullptr. One string was adjusted to not say nullptr ptr. nullptr makes sense in the comments so I didn't change them. 
Tested with tier1 Oracle platforms and build test with linux-x64-zero,linux-x64-zero-debug,linux-s390x-open,linux-arm32-debug,linux-ppc64le-debug,linux-riscv64-debug

-------------

Commit messages:
 - 8324679: Replace NULL with nullptr in HotSpot .ad files

Changes: https://git.openjdk.org/jdk/pull/17658/files
 Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=17658&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8324679
  Stats: 327 lines in 9 files changed: 0 ins; 0 del; 327 mod
  Patch: https://git.openjdk.org/jdk/pull/17658.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/17658/head:pull/17658

PR: https://git.openjdk.org/jdk/pull/17658

From epeter at openjdk.org Wed Jan 31 17:13:30 2024
From: epeter at openjdk.org (Emanuel Peter)
Date: Wed, 31 Jan 2024 17:13:30 GMT
Subject: RFR: 8325064: C2 SuperWord: refactor construct_bb
Message-ID:

Subtask of https://github.com/openjdk/jdk/pull/16620

The goal is to further disentangle different "components" in Superword.

In this refactoring, I disentangle the `bb`, `reduction` and `memory_slice` "components" which were all intertwined in `construct_bb`.

1. Move memory slice code -> `analyze_memory_slices`.
2. Remove reduction checking code -> simply use the `is_marked_reduction_loop` condition outside.
3. `_data_entry`: was used for non-CFG nodes in the loop that have no input node that is also inside the loop. But that actually never happens! I removed that array, and replaced the code with verification.
------------- Commit messages: - small improvements - 8325064 Changes: https://git.openjdk.org/jdk/pull/17657/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=17657&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8325064 Stats: 182 lines in 2 files changed: 70 ins; 84 del; 28 mod Patch: https://git.openjdk.org/jdk/pull/17657.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17657/head:pull/17657 PR: https://git.openjdk.org/jdk/pull/17657 From epeter at openjdk.org Wed Jan 31 17:13:31 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 31 Jan 2024 17:13:31 GMT Subject: RFR: 8325064: C2 SuperWord: refactor construct_bb In-Reply-To: References: Message-ID: On Wed, 31 Jan 2024 16:53:59 GMT, Emanuel Peter wrote: > Subtask of https://github.com/openjdk/jdk/pull/16620 > > The goal is to further disentangle different "components" in Superword. > > In this refactoring, I disentangle the `bb`, `reduction` and `memory_slice` "components" which were all intertwined in `construct_bb`. > > 1. Move memory slice code -> `analyze_memory_slices`. > 2. Remove reduction checking code -> simply use the `is_marked_reduction_loop` condition outside. > 3. `_data_entry`: was used for non-CFG nodes in the loop that have no input node that is also inside the loop. But that actually never happens! I removed that array, and replaced the code with verification. src/hotspot/share/opto/superword.cpp line 535: > 533: #endif > 534: return false; > 535: } Note: this condition used to be at the end of `SuperWord::construct_bb`. Now it makes more sense to do this outside. src/hotspot/share/opto/superword.cpp line 549: > 547: > 548: // Ensure extra info is allocated. > 549: initialize_node_info(); Note: used to be `initialize_bb` inside `construct_bb`. Corrected the name and moved it out. 
src/hotspot/share/opto/superword.cpp line 949: > 947: } > 948: } > 949: #endif Note: both of these methods are refactored out of `construct_bb` src/hotspot/share/opto/superword.cpp line 2988: > 2986: assert(n != entry, "can't be entry"); > 2987: _data_entry.push(n); > 2988: } Note: `found` is always true, it turns out. I added an assert. And this means I can also remove `_data_entry`, as we now never push anything to it. src/hotspot/share/opto/superword.cpp line 2994: > 2992: > 2993: // Find memory slices (head and tail) > 2994: for (DUIterator_Fast imax, i = lp()->fast_outs(imax); i < imax; i++) { Note: moved the memory slice code to `analyze_memory_slices` src/hotspot/share/opto/superword.cpp line 3040: > 3038: // Don't go around backedge > 3039: (!use->is_Phi() || n == entry)) { > 3040: if (is_marked_reduction(use)) { Note: I would like to remove this reduction code from the basic block code. It separates the different "components". Instead, we can just check `is_marked_reduction_loop` outside, which checks if we have any marked reduction. The only difference is that we don't do the implemented check. But most platforms implement the reductions, and we check that again later in `SuperWord::implemented` anyway. 
src/hotspot/share/opto/superword.cpp line 3094:

> 3092: #endif
> 3093: assert(rpo_idx == -1 && bb_ct == _block.length(), "all block members found");
> 3094: return (_mem_slice_head.length() > 0) || (reduction_uses > 0) || (_data_entry.length() > 0);

Note: Moved this condition out:
if (!is_marked_reduction_loop() && _mem_slice_head.is_empty()) {

src/hotspot/share/opto/superword.cpp line 3123:

> 3121: }
> 3122: }
> 3123:

Note: no longer used, and it was already unused before this change.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/17657#discussion_r1473158354
PR Review Comment: https://git.openjdk.org/jdk/pull/17657#discussion_r1473159421
PR Review Comment: https://git.openjdk.org/jdk/pull/17657#discussion_r1473160065
PR Review Comment: https://git.openjdk.org/jdk/pull/17657#discussion_r1473163417
PR Review Comment: https://git.openjdk.org/jdk/pull/17657#discussion_r1473162067
PR Review Comment: https://git.openjdk.org/jdk/pull/17657#discussion_r1473167670
PR Review Comment: https://git.openjdk.org/jdk/pull/17657#discussion_r1473168907
PR Review Comment: https://git.openjdk.org/jdk/pull/17657#discussion_r1473169382

From qamai at openjdk.org Wed Jan 31 17:32:42 2024
From: qamai at openjdk.org (Quan Anh Mai)
Date: Wed, 31 Jan 2024 17:32:42 GMT
Subject: RFR: 8282365: Consolidate and improve division by constant idealizations [v47]
In-Reply-To:
References:
Message-ID:

> This patch implements idealisation for unsigned divisions to change a division by a constant into a series of multiplication and shift. I also change the idealisation of `DivI` to get a more efficient series when the magic constant overflows an int32.
>
> In general, the idea behind a signed division transformation is that, for a positive constant `d`, we need to find constants `c` and `m` so that:
>
>     floor(x / d) = floor(x * c / 2**m)      for 0 < x < 2**(N - 1)      (1)
>     ceil(x / d)  = floor(x * c / 2**m) + 1  for -2**(N - 1) <= x < 0    (2)
>
> The implementation in the original book takes into account that the machine may not be able to perform the full multiplication `x * c`, so the constant may overflow and the dividend has to be added back, as in the `DivLNode::Ideal` cases. However, for int32 division, `x * c` cannot overflow an int64. As a result, it is always feasible to just calculate the product and shift the result.
>
> For unsigned division, the situation is somewhat trickier because the condition needs to be twice as strong (conditions (1) and (2) above are mostly the same). As a result, the magic constant `c` calculated with the method presented in Hacker's Delight by Henry S. Warren, Jr. may overflow a uintN. For int division, we can rely on the theorem devised by Arch D. Robison in N-Bit Unsigned Division Via N-Bit Multiply-Add, which states that there exists either:
>
>     c1 in uint32 and m1, such that floor(x / d) = floor(x * c1 / 2**m1)       for 0 < x < 2**32    (3)
>     c2 in uint32 and m2, such that floor(x / d) = floor((x + 1) * c2 / 2**m2) for 0 < x < 2**32    (4)
>
> which means that either `x * c1` never overflows a uint64 or `(x + 1) * c2` never overflows a uint64, so we can perform a full multiplication.
>
> For longs, there is no way to do a full multiplication, so we do some basic transformations to obtain a computable formula. I have written the details as comments in the overflow case.
>
> More tests are added to cover the possible patterns.
>
> Please take a look and review. Thank you very much.
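
As a rough illustration of conditions (1) and (2) for N = 32, here is a minimal Java sketch. It assumes the textbook choice `m = 31 + ceil(log2(d))` and `c = floor(2**m / d) + 1`, which is not necessarily how the patch derives its constants. The class and method names are hypothetical, and `d >= 2` is assumed so that `x * c` always fits in a signed 64-bit value.

```java
// Hypothetical helper, not part of the patch: signed int32 division by a
// positive constant d >= 2 using one 64-bit multiplication and one shift.
final class SignedMagicDivision {
    // m = 31 + ceil(log2(d)), valid for d >= 2.
    static int shift(int d) {
        int ceilLog2 = 32 - Integer.numberOfLeadingZeros(d - 1);
        return 31 + ceilLog2;
    }

    // c = floor(2**m / d) + 1; for d >= 2 this stays below 2**32.
    static long magic(int d) {
        return (1L << shift(d)) / d + 1;
    }

    // Returns x / d, truncated toward zero like Java's int division.
    static int divide(int x, int d) {
        if (d < 2) {
            throw new IllegalArgumentException("this sketch assumes d >= 2");
        }
        long product = x * magic(d);         // |x * c| < 2**63, so no overflow
        long floored = product >> shift(d);  // arithmetic shift = floor(x * c / 2**m)
        // (1): for non-negative x the floor is already the quotient.
        // (2): for negative x the quotient is ceil(x / d) = floor + 1.
        return (int) (x < 0 ? floored + 1 : floored);
    }
}
```

For example, `SignedMagicDivision.divide(-7, 3)` computes `floor(-7 * 2863311531 / 2**33) + 1`, which is `-3 + 1 = -2`, matching `-7 / 3` in Java.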
Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: further clarify variable meanings ------------- Changes: - all: https://git.openjdk.org/jdk/pull/9947/files - new: https://git.openjdk.org/jdk/pull/9947/files/1400de7b..26e5c6e7 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=9947&range=46 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=9947&range=45-46 Stats: 8 lines in 1 file changed: 4 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/9947.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/9947/head:pull/9947 PR: https://git.openjdk.org/jdk/pull/9947 From qamai at openjdk.org Wed Jan 31 17:32:42 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Wed, 31 Jan 2024 17:32:42 GMT Subject: RFR: 8282365: Consolidate and improve division by constant idealizations [v46] In-Reply-To: References: Message-ID: On Fri, 26 Jan 2024 17:19:01 GMT, Raffaello Giulietti wrote: >> Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: >> >> just be simple > > src/hotspot/share/opto/divconstants.cpp line 55: > >> 53: // >> 54: // ceil(x / d) = floor(x * c / m) + 1 for every integer x in [-N, 0) >> 55: // > > For the record, the domain for non-negative dividends can be extended to `[0, v + d)`, which is usually larger than `[0, N]`, since `v <= N < v + d`. > Similarly, the domain for negative dividends can be extended to `(-(v + d), 0)`. Indeed, however if we go from a given `N` then the solution would be the same. > src/hotspot/share/opto/divconstants.cpp line 126: > >> 124: // >> 125: // c * d - rc = 2**s with 0 < rc <= d >> 126: // qv * v + rv = 2**s with 0 <= rv < v > > To clarify the roles of these quantities, I suggest to extend the comment a bit, like so > > // Let r = m - floor(m / d) * d, that is, let r be the remainder of the indicated floor division. > // Then > // c = floor(m / d) + 1, rc = d - r. 
> // Further > // qv = floor(m / v), rv = m - floor(m / v) * d, that is, qv and rv are the quotient, > // resp., the remainder of the floor division. Fixed that. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/9947#discussion_r1473196294 PR Review Comment: https://git.openjdk.org/jdk/pull/9947#discussion_r1473195266 From rgiulietti at openjdk.org Wed Jan 31 18:09:16 2024 From: rgiulietti at openjdk.org (Raffaello Giulietti) Date: Wed, 31 Jan 2024 18:09:16 GMT Subject: RFR: 8282365: Consolidate and improve division by constant idealizations [v46] In-Reply-To: References: Message-ID: On Wed, 31 Jan 2024 17:29:23 GMT, Quan Anh Mai wrote: >> src/hotspot/share/opto/divconstants.cpp line 55: >> >>> 53: // >>> 54: // ceil(x / d) = floor(x * c / m) + 1 for every integer x in [-N, 0) >>> 55: // >> >> For the record, the domain for non-negative dividends can be extended to `[0, v + d)`, which is usually larger than `[0, N]`, since `v <= N < v + d`. >> Similarly, the domain for negative dividends can be extended to `(-(v + d), 0)`. > > Indeed, however if we go from a given `N` then the solution would be the same. Nothing would change in the algorithm. I just wanted to point it out to have a record here (and in the mailing list). To the reviewers of the less math-oriented aspects: I'm pretty confident that this algorithm is correct, so I would approve it in isolation if this were possible ;-) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/9947#discussion_r1473254522 From kvn at openjdk.org Wed Jan 31 19:03:08 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 31 Jan 2024 19:03:08 GMT Subject: RFR: 8324679: Replace NULL with nullptr in HotSpot .ad files In-Reply-To: References: Message-ID: On Wed, 31 Jan 2024 16:57:46 GMT, Coleen Phillimore wrote: > This is a straightforward mechanical (sed) replacement of NULL with nullptr. One string was adjusted to not say nullptr ptr. nullptr makes sense in the comments so I didn't change them. 
>
> Tested with tier1 Oracle platforms and build test with linux-x64-zero,linux-x64-zero-debug,linux-s390x-open,linux-arm32-debug,linux-ppc64le-debug,linux-riscv64-debug

1. All .ad files have the following comment:

// NULL Pointer Immediate

The change to `nullptr Pointer` does not look good. Maybe keep `NULL` or use `Null`.

2. Some format texts were changed to `null ptr`. I suggest using `null pointer`.

3. If I remember correctly, you should not modify `aarch64.ad` but `aarch64_ad.m4` instead. But I will let @theRealAph comment on that. I see that changes were pushed directly into `aarch64.ad` without modifying `aarch64_ad.m4`.

src/hotspot/cpu/aarch64/aarch64.ad line 4666:

> 4664: %}
> 4665:
> 4666: // nullptr Pointer Immediate

Should we keep NULL here?

src/hotspot/cpu/aarch64/aarch64.ad line 4798:

> 4796: %}
> 4797:
> 4798: // Narrow nullptr Pointer Immediate

And here

src/hotspot/cpu/arm/arm.ad line 1990:

> 1988: %}
> 1989:
> 1990: // nullptr Pointer Immediate

And here

src/hotspot/cpu/ppc/ppc.ad line 4113:

> 4111: %}
> 4112:
> 4113: // nullptr Pointer Immediate

And here

src/hotspot/cpu/ppc/ppc.ad line 5957:

> 5955: %}
> 5956:
> 5957: // Load nullptr as compressed oop.
And here src/hotspot/cpu/riscv/riscv.ad line 2897: > 2895: %} > 2896: > 2897: // nullptr Pointer Immediate here src/hotspot/cpu/riscv/riscv.ad line 3018: > 3016: %} > 3017: > 3018: // Narrow nullptr Pointer Immediate here src/hotspot/cpu/riscv/riscv.ad line 4896: > 4894: > 4895: ins_cost(ALU_COST); > 4896: format %{ "mv $dst, $con\t# null ptr, #@loadConP0" %} May be "null pointer" src/hotspot/cpu/riscv/riscv.ad line 4947: > 4945: > 4946: ins_cost(ALU_COST); > 4947: format %{ "mv $dst, $con\t# compressed null ptr, #@loadConN0" %} "null pointer" src/hotspot/cpu/s390/s390.ad line 2969: > 2967: %} > 2968: > 2969: // Narrow nullptr Pointer Immediate Keep `NULL` src/hotspot/cpu/s390/s390.ad line 4317: > 4315: effect(KILL cr); > 4316: size(4); > 4317: format %{ "XGR $dst,$dst\t # null ptr" %} "null pointer" src/hotspot/cpu/x86/x86_32.ad line 3399: > 3397: %} > 3398: > 3399: // nullptr Pointer Immediate Keep `NULL`. src/hotspot/cpu/x86/x86_64.ad line 2182: > 2180: %} > 2181: > 2182: // nullptr Pointer Immediate here src/hotspot/cpu/x86/x86_64.ad line 2210: > 2208: %} > 2209: > 2210: // nullptr Pointer Immediate here src/hotspot/cpu/x86/x86_64.ad line 4905: > 4903: match(Set dst src); > 4904: effect(KILL cr); > 4905: format %{ "xorq $dst, $src\t# compressed null ptr" %} "null pointer" src/hotspot/cpu/x86/x86_64.ad line 5172: > 5170: %} > 5171: > 5172: // Store nullptr Pointer, mark word, or other simple pointer constant. 
Keep `NULL` ------------- PR Review: https://git.openjdk.org/jdk/pull/17658#pullrequestreview-1854569457 PR Review Comment: https://git.openjdk.org/jdk/pull/17658#discussion_r1473305574 PR Review Comment: https://git.openjdk.org/jdk/pull/17658#discussion_r1473305925 PR Review Comment: https://git.openjdk.org/jdk/pull/17658#discussion_r1473307646 PR Review Comment: https://git.openjdk.org/jdk/pull/17658#discussion_r1473308878 PR Review Comment: https://git.openjdk.org/jdk/pull/17658#discussion_r1473309341 PR Review Comment: https://git.openjdk.org/jdk/pull/17658#discussion_r1473311416 PR Review Comment: https://git.openjdk.org/jdk/pull/17658#discussion_r1473311577 PR Review Comment: https://git.openjdk.org/jdk/pull/17658#discussion_r1473313154 PR Review Comment: https://git.openjdk.org/jdk/pull/17658#discussion_r1473313503 PR Review Comment: https://git.openjdk.org/jdk/pull/17658#discussion_r1473314528 PR Review Comment: https://git.openjdk.org/jdk/pull/17658#discussion_r1473315184 PR Review Comment: https://git.openjdk.org/jdk/pull/17658#discussion_r1473316133 PR Review Comment: https://git.openjdk.org/jdk/pull/17658#discussion_r1473316373 PR Review Comment: https://git.openjdk.org/jdk/pull/17658#discussion_r1473316623 PR Review Comment: https://git.openjdk.org/jdk/pull/17658#discussion_r1473316977 PR Review Comment: https://git.openjdk.org/jdk/pull/17658#discussion_r1473317758 From kbarrett at openjdk.org Wed Jan 31 19:20:17 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Wed, 31 Jan 2024 19:20:17 GMT Subject: RFR: 8282365: Consolidate and improve division by constant idealizations [v47] In-Reply-To: References: Message-ID: On Wed, 31 Jan 2024 17:32:42 GMT, Quan Anh Mai wrote: >> This patch implements idealisation for unsigned divisions to change a division by a constant into a series of multiplication and shift. I also change the idealisation of `DivI` to get a more efficient series when the magic constant overflows an int32. 
>>
>> In general, the idea behind a signed division transformation is that, for a positive constant `d`, we need to find constants `c` and `m` so that:
>>
>>     floor(x / d) = floor(x * c / 2**m)      for 0 < x < 2**(N - 1)      (1)
>>     ceil(x / d)  = floor(x * c / 2**m) + 1  for -2**(N - 1) <= x < 0    (2)
>>
>> The implementation in the original book takes into account that the machine may not be able to perform the full multiplication `x * c`, so the constant may overflow and the dividend has to be added back, as in the `DivLNode::Ideal` cases. However, for int32 division, `x * c` cannot overflow an int64. As a result, it is always feasible to just calculate the product and shift the result.
>>
>> For unsigned division, the situation is somewhat trickier because the condition needs to be twice as strong (conditions (1) and (2) above are mostly the same). As a result, the magic constant `c` calculated with the method presented in Hacker's Delight by Henry S. Warren, Jr. may overflow a uintN. For int division, we can rely on the theorem devised by Arch D. Robison in N-Bit Unsigned Division Via N-Bit Multiply-Add, which states that there exists either:
>>
>>     c1 in uint32 and m1, such that floor(x / d) = floor(x * c1 / 2**m1)       for 0 < x < 2**32    (3)
>>     c2 in uint32 and m2, such that floor(x / d) = floor((x + 1) * c2 / 2**m2) for 0 < x < 2**32    (4)
>>
>> which means that either `x * c1` never overflows a uint64 or `(x + 1) * c2` never overflows a uint64, so we can perform a full multiplication.
>>
>> For longs, there is no way to do a full multiplication, so we do some basic transformations to obtain a computable formula. I have written the details as comments in the overflow case.
>>
>> More tests are added to cover the possible patterns.
>>
>> Please take a look and review. Thank you very much.
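
To make the two unsigned cases (3) and (4) quoted above concrete, here is a small Java sketch with hand-picked constants for `d = 3` and `d = 7`. The class, method names and the specific constants are illustrative assumptions and are not taken from the patch; the patch computes its constants for arbitrary divisors.

```java
// Hypothetical illustration, not code from the patch: unsigned int32 division
// by the constants 3 and 7 with magic constants that fit in a uint32.
final class UnsignedMagicDivision {
    // Case (3) with d = 3: c1 = ceil(2**33 / 3) = 0xAAAAAAAB, m1 = 33.
    static int divideBy3(int x) {
        long ux = Integer.toUnsignedLong(x);
        // ux * c1 is below 2**64; >>> treats the 64-bit product as unsigned.
        return (int) ((ux * 0xAAAAAAABL) >>> 33);
    }

    // Case (4) with d = 7: c2 = floor(2**34 / 7) = 0x92492492, m2 = 34.
    // The usual round-up constant for 7 does not fit in 32 bits, so the
    // dividend is incremented and a rounded-down constant is used instead.
    static int divideBy7(int x) {
        long ux = Integer.toUnsignedLong(x);
        // (ux + 1) * c2 is below 2**64, so the wrapped long holds the exact bits.
        return (int) (((ux + 1) * 0x92492492L) >>> 34);
    }
}
```

Both methods interpret `x` as an unsigned 32-bit value, so their results can be compared against `Integer.divideUnsigned(x, 3)` and `Integer.divideUnsigned(x, 7)`.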
> > Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: > > further clarify variable meanings As mentioned previously, I'm not reviewing the core magic number stuff, just the surrounding structure. That all looks okay to me now. ------------- Marked as reviewed by kbarrett (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/9947#pullrequestreview-1854641163 From coleenp at openjdk.org Wed Jan 31 20:16:02 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Wed, 31 Jan 2024 20:16:02 GMT Subject: RFR: 8324679: Replace NULL with nullptr in HotSpot .ad files In-Reply-To: References: Message-ID: On Wed, 31 Jan 2024 18:43:20 GMT, Vladimir Kozlov wrote: >> This is a straightforward mechanical (sed) replacement of NULL with nullptr. One string was adjusted to not say nullptr ptr. nullptr makes sense in the comments so I didn't change them. >> >> Tested with tier1 Oracle platforms and build test with linux-x64-zero,linux-x64-zero-debug,linux-s390x-open,linux-arm32-debug,linux-ppc64le-debug,linux-riscv64-debug > > src/hotspot/cpu/aarch64/aarch64.ad line 4666: > >> 4664: %} >> 4665: >> 4666: // nullptr Pointer Immediate > > Should we keep NULL here? How about Null Pointer for these? > src/hotspot/cpu/x86/x86_64.ad line 5172: > >> 5170: %} >> 5171: >> 5172: // Store nullptr Pointer, mark word, or other simple pointer constant. > > Keep `NULL` I made this one Null too since we don't want to find NULL in the code because it'll look like this change backtracked. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17658#discussion_r1473409334 PR Review Comment: https://git.openjdk.org/jdk/pull/17658#discussion_r1473410427 From coleenp at openjdk.org Wed Jan 31 20:22:13 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Wed, 31 Jan 2024 20:22:13 GMT Subject: RFR: 8324679: Replace NULL with nullptr in HotSpot .ad files [v2] In-Reply-To: References: Message-ID: <1RwoULTx_sT-A7nFhScpHc9jpGgInpjamzkp6PzEl4k=.aeb8ca27-9471-4841-9e90-6bd0feeee8a8@github.com> > This is a straightforward mechanical (sed) replacement of NULL with nullptr. One string was adjusted to not say nullptr ptr. nullptr makes sense in the comments so I didn't change them. > > Tested with tier1 Oracle platforms and build test with linux-x64-zero,linux-x64-zero-debug,linux-s390x-open,linux-arm32-debug,linux-ppc64le-debug,linux-riscv64-debug Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: Fix comments to Null Pointer and strings to null pointer. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17658/files - new: https://git.openjdk.org/jdk/pull/17658/files/7475cfe8..019b6027 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17658&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17658&range=00-01 Stats: 23 lines in 8 files changed: 0 ins; 0 del; 23 mod Patch: https://git.openjdk.org/jdk/pull/17658.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17658/head:pull/17658 PR: https://git.openjdk.org/jdk/pull/17658 From coleenp at openjdk.org Wed Jan 31 20:22:13 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Wed, 31 Jan 2024 20:22:13 GMT Subject: RFR: 8324679: Replace NULL with nullptr in HotSpot .ad files In-Reply-To: References: Message-ID: On Wed, 31 Jan 2024 16:57:46 GMT, Coleen Phillimore wrote: > This is a straightforward mechanical (sed) replacement of NULL with nullptr. One string was adjusted to not say nullptr ptr. 
nullptr makes sense in the comments so I didn't change them. > > Tested with tier1 Oracle platforms and build test with linux-x64-zero,linux-x64-zero-debug,linux-s390x-open,linux-arm32-debug,linux-ppc64le-debug,linux-riscv64-debug Thanks for looking at this Vladimir. I made these changes. I don't know about this aarch64_ad.m4. It didn't have any NULL in it so not sure how it would be used to generate aarch64.ad. We can wait for Andrew to answer. For any other suggested changes, you can use a button like "suggest change" and I'll just click on it. ------------- PR Comment: https://git.openjdk.org/jdk/pull/17658#issuecomment-1919881862 From kvn at openjdk.org Wed Jan 31 21:45:02 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 31 Jan 2024 21:45:02 GMT Subject: RFR: 8324679: Replace NULL with nullptr in HotSpot .ad files In-Reply-To: References: Message-ID: On Wed, 31 Jan 2024 20:18:46 GMT, Coleen Phillimore wrote: > button like "suggest change" Where is it? I don't see such button in GitHub GUI. ------------- PR Comment: https://git.openjdk.org/jdk/pull/17658#issuecomment-1920020020 From kvn at openjdk.org Wed Jan 31 21:45:02 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 31 Jan 2024 21:45:02 GMT Subject: RFR: 8324679: Replace NULL with nullptr in HotSpot .ad files In-Reply-To: References: Message-ID: On Wed, 31 Jan 2024 21:41:16 GMT, Vladimir Kozlov wrote: > > button like "suggest change" > > Where is it? I don't see such button in GitHub GUI. Found it, not obvious. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/17658#issuecomment-1920021900 From kvn at openjdk.org Wed Jan 31 21:45:01 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 31 Jan 2024 21:45:01 GMT Subject: RFR: 8324679: Replace NULL with nullptr in HotSpot .ad files [v2] In-Reply-To: <1RwoULTx_sT-A7nFhScpHc9jpGgInpjamzkp6PzEl4k=.aeb8ca27-9471-4841-9e90-6bd0feeee8a8@github.com> References: <1RwoULTx_sT-A7nFhScpHc9jpGgInpjamzkp6PzEl4k=.aeb8ca27-9471-4841-9e90-6bd0feeee8a8@github.com> Message-ID: On Wed, 31 Jan 2024 20:22:13 GMT, Coleen Phillimore wrote: >> This is a straightforward mechanical (sed) replacement of NULL with nullptr. One string was adjusted to not say nullptr ptr. nullptr makes sense in the comments so I didn't change them. >> >> Tested with tier1 Oracle platforms and build test with linux-x64-zero,linux-x64-zero-debug,linux-s390x-open,linux-arm32-debug,linux-ppc64le-debug,linux-riscv64-debug > > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > Fix comments to Null Pointer and strings to null pointer. Update looks good. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/17658#pullrequestreview-1854893114 From sviswanathan at openjdk.org Wed Jan 31 21:57:06 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Wed, 31 Jan 2024 21:57:06 GMT Subject: RFR: 8318650: Optimized subword gather for x86 targets. [v11] In-Reply-To: References: Message-ID: On Sun, 21 Jan 2024 06:55:43 GMT, Jatin Bhateja wrote: >> Hi All, >> >> This patch optimizes sub-word gather operation for x86 targets with AVX2 and AVX512 features. 
>> >> Following is the summary of changes:- >> >> 1) Intrinsify sub-word gather using hybrid algorithm which initially partially unrolls scalar loop to accumulates values from gather indices into a quadword(64bit) slice followed by vector permutation to place the slice into appropriate vector lanes, it prevents code bloating and generates compact JIT sequence. This coupled with savings from expansive array allocation in existing java implementation translates into significant performance of 1.5-10x gains with included micro on Intel Atom family CPUs and with JVM option UseAVX=2. >> >> ![image](https://github.com/openjdk/jdk/assets/59989778/e25ba4ad-6a61-42fa-9566-452f741a9c6d) >> >> >> 2) For AVX512 targets algorithm uses integral gather instructions to load values from normalized indices which are multiple of integer size, followed by shuffling and packing exact sub-word values from integral lanes. >> >> 3) Patch was also compared against modified java fallback implementation by replacing temporary array allocation with zero initialized vector and a scalar loops which inserts gathered values into vector. But, vector insert operation in higher vector lanes is a three step process which first extracts the upper vector 128 bit lane, updates it with gather subword value and then inserts the lane back to its original position. This makes inserts into higher order lanes costly w.r.t to proposed solution. In addition generated JIT code for modified fallback implementation was very bulky. This may impact in-lining decisions into caller contexts. >> >> Kindly review and share your feedback. >> >> Best Regards, >> Jatin > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Review comments resolutions. src/hotspot/cpu/x86/assembler_x86.cpp line 3297: > 3295: } > 3296: > 3297: // Move Unaligned EVEX enabled Vector (programmable : 8,16,32,64) The (programmable : 8,16,32,64) part of the comment could be removed. 
This is not something special here for this instruction. Or at the minimum we should say "programmable vector length".

src/hotspot/cpu/x86/assembler_x86.cpp line 13587:

> 13585: }
> 13586:
> 13587: void Assembler::bt(Register dst, Register src) {

We could name it btq() as it is a 64-bit instruction.

src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 1574:

> 1572: void C2_MacroAssembler::vpackI2X(BasicType elem_bt, XMMRegister dst,
> 1573: XMMRegister ones, XMMRegister xtmp,
> 1574: int vlen_enc) {

Are the ones and xtmp arguments unused in vpackI2X?

src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 1603:

> 1601: bitlevel_offset_shift = 3;
> 1602: nomlarized_index_shift = 2;
> 1603: }

It takes a lot of effort to understand the logic here. It would be good to have a comment here saying that we gather 32-bit aligned elements first and then extract the required subword from them.

src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 1613:

> 1611: vpand(xtmp, idx_vec, xtmp, vlen_enc);
> 1612: // Load double words from normalized indices.
> 1613: evpgatherdd(dst, gmask, Address(base, xtmp, scale), vlen_enc);

Could we not do this directly here:
evpgatherdd(dst, gmask, Address(base, idx_vec, scale), vlen_enc);
Then we don't need lines 1609-1611, nor lines 1616-1621.

src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 1613:

> 1611: vpand(xtmp, idx_vec, xtmp, vlen_enc);
> 1612: // Load double words from normalized indices.
> 1613: evpgatherdd(dst, gmask, Address(base, xtmp, scale), vlen_enc);

Another question: it looks to me like we could read beyond the allocated memory for the array here. For example, consider the following case:
* It is a byte gather
* The byte source array is of size 41, i.e. only indices 0-40 are valid
* The gather index is 40

Then, as part of evpgatherdd, we would be reading the bytes at offsets 40-43 of the source array.

src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 1622:

> 1620: // 16 bits(for short)/8 bits(for byte) of each double word lane.
> 1621: vpsrlvd(dst, dst, xtmp, vlen_enc); > 1622: // Pack double word vector into short vector. Pack double word vector into short/byte vector. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16354#discussion_r1473298053 PR Review Comment: https://git.openjdk.org/jdk/pull/16354#discussion_r1473305137 PR Review Comment: https://git.openjdk.org/jdk/pull/16354#discussion_r1473311064 PR Review Comment: https://git.openjdk.org/jdk/pull/16354#discussion_r1473484078 PR Review Comment: https://git.openjdk.org/jdk/pull/16354#discussion_r1473486672 PR Review Comment: https://git.openjdk.org/jdk/pull/16354#discussion_r1473488519 PR Review Comment: https://git.openjdk.org/jdk/pull/16354#discussion_r1473464504 From coleenp at openjdk.org Wed Jan 31 23:09:23 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Wed, 31 Jan 2024 23:09:23 GMT Subject: RFR: 8324679: Replace NULL with nullptr in HotSpot .ad files [v3] In-Reply-To: References: Message-ID: <8cMoyt6XiA_uA4UTRo7xUraWzfNhGGMg5PXwprKVZls=.721c8717-1c5d-44ff-96ce-1e04b4db9b3d@github.com> > This is a straightforward mechanical (sed) replacement of NULL with nullptr. One string was adjusted to not say nullptr ptr. nullptr makes sense in the comments so I didn't change them. 
> > Tested with tier1 Oracle platforms and build test with linux-x64-zero,linux-x64-zero-debug,linux-s390x-open,linux-arm32-debug,linux-ppc64le-debug,linux-riscv64-debug Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: Update src/hotspot/cpu/arm/arm.ad ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17658/files - new: https://git.openjdk.org/jdk/pull/17658/files/019b6027..4024e5ff Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17658&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17658&range=01-02 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/17658.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17658/head:pull/17658 PR: https://git.openjdk.org/jdk/pull/17658 From coleenp at openjdk.org Wed Jan 31 23:09:23 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Wed, 31 Jan 2024 23:09:23 GMT Subject: RFR: 8324679: Replace NULL with nullptr in HotSpot .ad files [v2] In-Reply-To: <1RwoULTx_sT-A7nFhScpHc9jpGgInpjamzkp6PzEl4k=.aeb8ca27-9471-4841-9e90-6bd0feeee8a8@github.com> References: <1RwoULTx_sT-A7nFhScpHc9jpGgInpjamzkp6PzEl4k=.aeb8ca27-9471-4841-9e90-6bd0feeee8a8@github.com> Message-ID: On Wed, 31 Jan 2024 20:22:13 GMT, Coleen Phillimore wrote: >> This is a straightforward mechanical (sed) replacement of NULL with nullptr. One string was adjusted to not say nullptr ptr. nullptr makes sense in the comments so I didn't change them. >> >> Tested with tier1 Oracle platforms and build test with linux-x64-zero,linux-x64-zero-debug,linux-s390x-open,linux-arm32-debug,linux-ppc64le-debug,linux-riscv64-debug > > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > Fix comments to Null Pointer and strings to null pointer. 
src/hotspot/cpu/arm/arm.ad line 807: > 805: #ifndef PRODUCT > 806: void MachSpillCopyNode::format( PhaseRegAlloc *ra_, outputStream *st ) const { > 807: implementation( nullptr, ra_, false, st ); Suggestion: implementation(nullptr, ra_, false, st ); ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17658#discussion_r1473590768 From coleenp at openjdk.org Wed Jan 31 23:09:23 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Wed, 31 Jan 2024 23:09:23 GMT Subject: RFR: 8324679: Replace NULL with nullptr in HotSpot .ad files [v2] In-Reply-To: References: <1RwoULTx_sT-A7nFhScpHc9jpGgInpjamzkp6PzEl4k=.aeb8ca27-9471-4841-9e90-6bd0feeee8a8@github.com> Message-ID: On Wed, 31 Jan 2024 23:04:47 GMT, Coleen Phillimore wrote: >> Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix comments to Null Pointer and strings to null pointer. > > src/hotspot/cpu/arm/arm.ad line 807: > >> 805: #ifndef PRODUCT >> 806: void MachSpillCopyNode::format( PhaseRegAlloc *ra_, outputStream *st ) const { >> 807: implementation( nullptr, ra_, false, st ); > > Suggestion: > > implementation(nullptr, ra_, false, st ); It's the square with the +- in it to the right of "Preview". 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17658#discussion_r1473591237 From coleenp at openjdk.org Wed Jan 31 23:32:02 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Wed, 31 Jan 2024 23:32:02 GMT Subject: RFR: 8324679: Replace NULL with nullptr in HotSpot .ad files [v2] In-Reply-To: References: <1RwoULTx_sT-A7nFhScpHc9jpGgInpjamzkp6PzEl4k=.aeb8ca27-9471-4841-9e90-6bd0feeee8a8@github.com> Message-ID: On Wed, 31 Jan 2024 23:05:35 GMT, Coleen Phillimore wrote: >> src/hotspot/cpu/arm/arm.ad line 807: >> >>> 805: #ifndef PRODUCT >>> 806: void MachSpillCopyNode::format( PhaseRegAlloc *ra_, outputStream *st ) const { >>> 807: implementation( nullptr, ra_, false, st ); >> >> Suggestion: >> >> implementation(nullptr, ra_, false, st ); > > It's the square with the +- in it to the right of "Preview". The only trick with this button is that if you commit the suggestion, you have to pull it back to your local repo in case you want to make more changes there. Also if you use the suggestion button, you become co-author, which is a bit odd. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17658#discussion_r1473611222 From sviswanathan at openjdk.org Wed Jan 31 23:56:04 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Wed, 31 Jan 2024 23:56:04 GMT Subject: RFR: 8318650: Optimized subword gather for x86 targets. [v11] In-Reply-To: References: Message-ID: On Wed, 31 Jan 2024 21:31:21 GMT, Sandhya Viswanathan wrote: >> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: >> >> Review comments resolutions. > > src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 1613: > >> 1611: vpand(xtmp, idx_vec, xtmp, vlen_enc); >> 1612: // Load double words from normalized indices. >> 1613: evpgatherdd(dst, gmask, Address(base, xtmp, scale), vlen_enc); > > Another question, looks to me that we could read beyond the allocated memory for the array here. e.g. 
consider the following case: > * It is a byte gather > * The byte source array is of size 41, i.e. only indices 0-40 are valid > * The gather index is 40 > > Then as part of evpgatherdd we would be reading bytes at 40-43 offset from source array. I guess the fact that Java objects are padded to 8-byte alignment, together with the alignment done at lines 1609-1611 and 1616-1621, somehow takes care of this. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16354#discussion_r1473627981
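For readers following the out-of-bounds question above, here is a minimal scalar sketch of one lane of a byte gather built on an aligned 32-bit load. It is not the HotSpot code: the helper names (load_dword_le, gather_byte_via_dword, last_byte_read, round_up_8) are hypothetical, the shifts by 2 and 3 mirror the nomlarized_index_shift and bitlevel_offset_shift values quoted in the review, and the sketch assumes the byte payload starts at a 4-byte aligned offset and that the storage is padded to a multiple of 8 bytes.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: emulates a little-endian 32-bit load, as on x86.
inline uint32_t load_dword_le(const uint8_t* p) {
  return static_cast<uint32_t>(p[0]) |
         (static_cast<uint32_t>(p[1]) << 8) |
         (static_cast<uint32_t>(p[2]) << 16) |
         (static_cast<uint32_t>(p[3]) << 24);
}

// Hypothetical helper: one lane of a byte gather done through a dword load.
// Assumption: the storage behind `base` is readable up to the next multiple
// of 4 bytes past `idx`.
inline uint8_t gather_byte_via_dword(const uint8_t* base, uint32_t idx) {
  uint32_t dword_index = idx >> 2;         // normalized (dword) index
  uint32_t bit_offset  = (idx & 3u) << 3;  // bit position of the byte in the dword
  uint32_t dword = load_dword_le(base + 4 * static_cast<size_t>(dword_index));
  return static_cast<uint8_t>(dword >> bit_offset);  // shift right, keep low 8 bits
}

// Offset of the last byte touched by the aligned dword load for `idx`.
inline size_t last_byte_read(uint32_t idx) {
  return (static_cast<size_t>(idx >> 2) << 2) + 3;
}

// Rounds `n` up to a multiple of 8, modelling 8-byte padding (an assumption
// about the object layout, not something this sketch verifies).
inline size_t round_up_8(size_t n) {
  return (n + 7) & ~static_cast<size_t>(7);
}
```

Under these assumptions, gather index 40 on a 41-element array loads the bytes at offsets 40-43, which is past the logical length but still below round_up_8(41) = 48. Whether that holds for the real array layout depends on the array base offset and the object alignment, which this sketch does not model.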