From jbhateja at openjdk.org Mon Jan 1 14:36:06 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Mon, 1 Jan 2024 14:36:06 GMT Subject: RFR: 8318650: Optimized subword gather for x86 targets. [v10] In-Reply-To: References: Message-ID: > Hi All, > > This patch optimizes the sub-word gather operation for x86 targets with AVX2 and AVX512 features. > > Following is the summary of changes: > > 1) Intrinsify sub-word gather using a hybrid algorithm which first partially unrolls a scalar loop to accumulate values from gather indices into a quadword (64-bit) slice, followed by a vector permutation to place the slice into the appropriate vector lanes. This prevents code bloat and generates a compact JIT sequence. Coupled with the savings from avoiding the expensive array allocation in the existing Java implementation, this translates into significant performance gains of 1.5-10x with the included micro on Intel Atom family CPUs and with JVM option UseAVX=2. > > ![image](https://github.com/openjdk/jdk/assets/59989778/e25ba4ad-6a61-42fa-9566-452f741a9c6d) > > > 2) For AVX512 targets the algorithm uses integral gather instructions to load values from normalized indices which are multiples of the integer size, followed by shuffling and packing the exact sub-word values from the integral lanes. > > 3) The patch was also compared against a modified Java fallback implementation which replaces the temporary array allocation with a zero-initialized vector and a scalar loop that inserts gathered values into the vector. However, a vector insert into the higher vector lanes is a three-step process which first extracts the upper 128-bit vector lane, updates it with the gathered sub-word value and then inserts the lane back at its original position. This makes inserts into higher-order lanes costly w.r.t. the proposed solution. In addition, the generated JIT code for the modified fallback implementation was very bulky. This may impact inlining decisions into caller contexts. > > Kindly review and share your feedback. 
> > Best Regards, > Jatin Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 12 commits: - Accelerating masked sub-word gathers for AVX2 targets, this gives additional 1.5-4x speedups over existing implementation. - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8318650 - Removing JDK-8321648 related changes. - Refined AVX3 implementation with integral gather. - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8318650 - Fix incorrect comment - Review comments resolutions. - Review comments resolutions. - Review comments resolutions. - Restricting masked sub-word gather to AVX512 target to align with integral gather support. - ... and 2 more: https://git.openjdk.org/jdk/compare/518ec971...de47076e ------------- Changes: https://git.openjdk.org/jdk/pull/16354/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=16354&range=09 Stats: 1421 lines in 32 files changed: 1373 ins; 20 del; 28 mod Patch: https://git.openjdk.org/jdk/pull/16354.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16354/head:pull/16354 PR: https://git.openjdk.org/jdk/pull/16354 From rehn at openjdk.org Tue Jan 2 06:56:46 2024 From: rehn at openjdk.org (Robbin Ehn) Date: Tue, 2 Jan 2024 06:56:46 GMT Subject: RFR: 8322790: RISC-V: Tune costs for shuffles with no conversion In-Reply-To: References: Message-ID: On Sat, 30 Dec 2023 20:07:13 GMT, Ilya Gavrilin wrote: > Hi all, please review this small change to RISC-V nodes insertion costs. > Now we have several nodes which provide shuffles without conversion: https://github.com/openjdk/jdk/blob/32d80e2caf6063b58128bd5f3dc87b276f3bd0cb/src/hotspot/cpu/riscv/riscv.ad#L8525-L8741 > On most RISC-V cpu`s we prefer reg<->reg operations, because they are faster, but now stack<->reg operations used (for details about reasons, please, visit connected jbs issue). 
> After changing insertion costs reg<->reg operations selected, and we can see performance improvements for benchmarks, which use such shuffles (tested on thead C910 board): > | Benchmark | Upstream build (ops/ms) | Patched build (ops/ms) | difference (%) | > |:-----------------------------------:|:-----------------------:|:----------------------:|:--------------:| > | MathBench.doubleToRawLongBitsDouble | 30935.139 | 32171.761 | +4.00 | > | StrictMathBench.ceilDouble | 24682.810 | 29782.050 | +20.66 | > | StrictMathBench.cosDouble | 6948.309 | 6938.276 | -0.14 | > | StrictMathBench.expDouble | 6816.143 | 7211.021 | +5.79 | > | StrictMathBench.floorDouble | 30699.630 | 34189.509 | +11.37 | > | StrictMathBench.maxDouble | 35157.355 | 34675.191 | -1.37 | > | StrictMathBench.minDouble | 35192.135 | 35183.015 | -0.03 | > | StrictMathBench.sinDouble | 6698.405 | 6721.809 | +0.35 | > > New benchmark for changed nodes: > > --- a/test/micro/org/openjdk/bench/java/lang/MathBench.java > +++ b/test/micro/org/openjdk/bench/java/lang/MathBench.java > @@ -540,4 +540,11 @@ public class MathBench { > return Math.ulp(float7); > } > > + @Benchmark > + public long doubleToRawLongBitsDouble() { > + double dbl162Dot5 = double81 * 2.0d + double0Dot5; > + double dbl3 = double2 + double1; > + return Double.doubleToRawLongBits(dbl162Dot5) + Double.doubleToRawLongBits(dbl3); > + } > + Thanks, seems reasonable to me. ------------- Marked as reviewed by rehn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/17206#pullrequestreview-1800006019 From kbarrett at openjdk.org Tue Jan 2 07:27:58 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Tue, 2 Jan 2024 07:27:58 GMT Subject: RFR: 8322816: RISC-V: Incorrect guarantee in patch_vtype Message-ID: Please review this change that fixes a test for a guarantee. This also removes a -Wparentheses warning when those are enabled (which is how the problem was discovered). 
The problem is that operator precedence groups the sub-expressions differently than intended. The fix is to override the operator precedence by adding parentheses to achieve the intended grouping. Testing: Local (linux-x64) cross-build for linux-riscv with this change plus -Wparentheses enabled and other changes to allow that to work. Requesting someone from the riscv porters to properly test this. ------------- Commit messages: - fix subexpression grouping in patch_vtype guarantee Changes: https://git.openjdk.org/jdk/pull/17215/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=17215&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8322816 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/17215.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17215/head:pull/17215 PR: https://git.openjdk.org/jdk/pull/17215 From fyang at openjdk.org Tue Jan 2 09:01:47 2024 From: fyang at openjdk.org (Fei Yang) Date: Tue, 2 Jan 2024 09:01:47 GMT Subject: RFR: 8322816: RISC-V: Incorrect guarantee in patch_vtype In-Reply-To: References: Message-ID: <2Dii1vRWzgzjyBGHrx_tH28SlYMBWUZ0h2mO7D5so_4=.817b5636-aa8a-4c6d-807a-29b1855a59a7@github.com> On Tue, 2 Jan 2024 07:23:56 GMT, Kim Barrett wrote: > Please review this change that fixes a test for a guarantee. This also > removes a -Wparentheses warning when those are enabled (which is how the > problem was discovered). > > The problem is that operator precedence groups the sub-expressions differently > than intended. The fix is to override the operator precedence by adding > parentheses to achieve the intended grouping. > > Testing: Local (linux-x64) cross-build for linux-riscv with this change plus > -Wparentheses enabled and other changes to allow that to work. > > Requesting someone from the riscv porters to properly test this. 
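A minimal sketch of the precedence issue described above, assuming the pre-fix condition had the shape `vlmul | vsew | vta | vma == 0` (the exact original expression is not quoted in this thread). In C++ `==` binds tighter than `|`, so without parentheses only `vma` is compared against zero. The function names below are hypothetical and are not part of `assembler_riscv.hpp`.

```cpp
#include <cstdint>

// How the unparenthesized form `vlmul | vsew | vta | vma == 0` is parsed:
// `vma == 0` is evaluated first, then OR-ed with the other fields.
// (Assumed shape of the old condition; written with explicit parentheses
// here so the sketch itself compiles without a -Wparentheses warning.)
constexpr bool parsed_without_parens(std::uint32_t vlmul, std::uint32_t vsew,
                                     std::uint32_t vta, std::uint32_t vma) {
  return (vlmul | vsew | vta | static_cast<std::uint32_t>(vma == 0)) != 0;
}

// The intended grouping: all four fields must be zero.
constexpr bool intended(std::uint32_t vlmul, std::uint32_t vsew,
                        std::uint32_t vta, std::uint32_t vma) {
  return (vlmul | vsew | vta | vma) == 0;
}

// The intended check accepts only the all-zero case.
static_assert(intended(0, 0, 0, 0), "all fields zero passes");
static_assert(!intended(1, 0, 0, 0), "non-zero vlmul is rejected");

// The mis-grouped check accepts a non-zero field as long as vma is zero,
// and more generally whenever any of the first three fields is non-zero.
static_assert(parsed_without_parens(1, 0, 0, 0), "non-zero vlmul slips through");
static_assert(parsed_without_parens(0, 0, 1, 1), "non-zero vta slips through");
static_assert(!parsed_without_parens(0, 0, 0, 1), "fails only for this shape");
```

Under that assumption the two forms disagree on almost every input with a non-zero field, which is consistent with the description of the check as testing something other than what was intended.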
src/hotspot/cpu/riscv/assembler_riscv.hpp line 1160: > 1158: #define patch_vtype(hsb, lsb, vlmul, vsew, vta, vma, vill) \ > 1159: if (vill == 1) { \ > 1160: guarantee((vlmul | vsew | vta | vma) == 0, \ I see the `vill` parameter is always false in current code, which means this guarantee never gets executed. And I don't think we would make use of the `vill` field of vtype in future. So I personally prefer to remove this guarantee and its enclosing if block for now. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17215#discussion_r1439251060 From stefank at openjdk.org Tue Jan 2 09:28:14 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Tue, 2 Jan 2024 09:28:14 GMT Subject: RFR: 8282365: Consolidate and improve division by constant idealizations [v42] In-Reply-To: References: Message-ID: On Mon, 25 Dec 2023 17:57:28 GMT, Quan Anh Mai wrote: >> This patch implements idealisation for unsigned divisions to change a division by a constant into a series of multiplication and shift. I also change the idealisation of `DivI` to get a more efficient series when the magic constant overflows an int32. >> >> In general, the idea behind a signed division transformation is that for a positive constant `d`, we would need to find constants `c` and `m` so that: >> >> floor(x / d) = floor(x * c / 2**m) for 0 < x < 2**(N - 1) (1) >> ceil(x / d) = floor(x * c / 2**m) + 1 for -2**(N - 1) <= x < 0 (2) >> >> The implementation in the original book takes into consideration that the machine may not be able to perform the full multiplication `x * c`, so the constant overflows and we need to add back the dividend as in `DivLNode::Ideal` cases. However, for int32 division, `x * c` cannot overflow an int64. As a result, it is always feasible to just calculate the product and shift the result. >> >> For unsigned division, the situation is somewhat trickier because the condition needs to be twice as strong (the condition (1) and (2) above are mostly the same). 
This results in the magic constant `c` calculated based on the method presented in Hacker's Delight by Henry S. Warren, Jr. may overflow an uintN. For int division, we can depend on the theorem devised by Arch D. Robison in N-Bit Unsigned Division Via N-Bit Multiply-Add, which states that there exists either: >> >> c1 in uint32 and m1, such that floor(x / d) = floor(x * c1 / 2**m1) for 0 < x < 2**32 (3) >> c2 in uint32 and m2, such that floor(x / d) = floor((x + 1) * c2 / 2**m2) for 0 < x < 2**32 (4) >> >> which means that either `x * c1` never overflows an uint64 or `(x + 1) * c2` never overflows an uint64. And we can perform a full multiplication. >> >> For longs, there is no way to do a full multiplication so we do some basic transformations to achieve a computable formula. The details I have written as comments in the overflow case. >> >> More tests are added to cover the possible patterns. >> >> Please take a look and have some reviews. Thank you very much. > > Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: > > power of 2 I'm not reviewing the patch itself, but I'd like to request some tweaks to the include blocks in the HotSpot code. src/hotspot/share/opto/divconstants.cpp line 27: > 25: #include "precompiled.hpp" > 26: #include "utilities/powerOfTwo.hpp" > 27: #include Please add a blank line between the HotSpot includes and the system includes. src/hotspot/share/opto/divnode.cpp line 27: > 25: #include "precompiled.hpp" > 26: #include > 27: #include These includes should be moved. src/hotspot/share/opto/divnode.cpp line 42: > 40: #include "utilities/powerOfTwo.hpp" > 41: > 42: Revert this stray addition of a blank line. test/hotspot/gtest/opto/test_constant_division.cpp line 29: > 27: #include "runtime/os.hpp" > 28: #include "utilities/growableArray.hpp" > 29: #include Move include. 
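For readers following the quoted description, here is a small illustrative sketch of the multiply-and-shift idea in formulas (1) and (2). It is not the code from the patch: the function names are hypothetical, and the constants are the usual textbook ones for `d = 3` (unsigned) and `d = 7` (signed), chosen by hand rather than computed by the patch's algorithm. The signed variant assumes `>>` on a negative `int64_t` is an arithmetic shift, which C++20 guarantees and which GCC and Clang already do for earlier standards.

```cpp
#include <cstdint>

// Unsigned case: floor(x / 3) == floor(x * c / 2**33) with c = ceil(2**33 / 3).
// Both x and c are below 2**32, so the full product fits in a uint64.
constexpr std::uint32_t udiv3(std::uint32_t x) {
  return static_cast<std::uint32_t>(
      (static_cast<std::uint64_t>(x) * 2863311531ULL) >> 33);
}

// Signed case: c = ceil(2**34 / 7) = 2454267027 does not fit in an int32,
// but |x| <= 2**31 and c < 2**32, so x * c cannot overflow an int64.
// Formula (1) covers x >= 0; formula (2) adds 1 for negative x.
constexpr std::int32_t sdiv7(std::int32_t x) {
  return static_cast<std::int32_t>(
      ((static_cast<std::int64_t>(x) * 2454267027LL) >> 34) + (x < 0 ? 1 : 0));
}

// Spot checks against the built-in (truncating) division.
static_assert(udiv3(0u) == 0u / 3u, "udiv3(0)");
static_assert(udiv3(2u) == 2u / 3u, "udiv3(2)");
static_assert(udiv3(3u) == 3u / 3u, "udiv3(3)");
static_assert(udiv3(4294967294u) == 4294967294u / 3u, "udiv3(2**32 - 2)");
static_assert(udiv3(4294967295u) == 4294967295u / 3u, "udiv3(2**32 - 1)");

static_assert(sdiv7(0) == 0 / 7, "sdiv7(0)");
static_assert(sdiv7(6) == 6 / 7, "sdiv7(6)");
static_assert(sdiv7(7) == 7 / 7, "sdiv7(7)");
static_assert(sdiv7(-6) == -6 / 7, "sdiv7(-6)");
static_assert(sdiv7(-7) == -7 / 7, "sdiv7(-7)");
static_assert(sdiv7(INT32_MAX) == INT32_MAX / 7, "sdiv7(INT32_MAX)");
static_assert(sdiv7(INT32_MIN) == INT32_MIN / 7, "sdiv7(INT32_MIN)");
```

The divisor 7 is used for the signed case because its magic constant exceeds the int32 range, which is the situation the quoted description handles by doing the multiplication in 64 bits instead of adding the dividend back.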
------------- PR Review: https://git.openjdk.org/jdk/pull/9947#pullrequestreview-1800139023 PR Review Comment: https://git.openjdk.org/jdk/pull/9947#discussion_r1439270103 PR Review Comment: https://git.openjdk.org/jdk/pull/9947#discussion_r1439270557 PR Review Comment: https://git.openjdk.org/jdk/pull/9947#discussion_r1439270384 PR Review Comment: https://git.openjdk.org/jdk/pull/9947#discussion_r1439271034 From fyang at openjdk.org Tue Jan 2 10:59:46 2024 From: fyang at openjdk.org (Fei Yang) Date: Tue, 2 Jan 2024 10:59:46 GMT Subject: RFR: 8322790: RISC-V: Tune costs for shuffles with no conversion In-Reply-To: References: Message-ID: On Sat, 30 Dec 2023 20:07:13 GMT, Ilya Gavrilin wrote: > Hi all, please review this small change to RISC-V nodes insertion costs. > Now we have several nodes which provide shuffles without conversion: https://github.com/openjdk/jdk/blob/32d80e2caf6063b58128bd5f3dc87b276f3bd0cb/src/hotspot/cpu/riscv/riscv.ad#L8525-L8741 > On most RISC-V cpu`s we prefer reg<->reg operations, because they are faster, but now stack<->reg operations used (for details about reasons, please, visit connected jbs issue). 
> After changing insertion costs reg<->reg operations selected, and we can see performance improvements for benchmarks, which use such shuffles (tested on thead C910 board): > | Benchmark | Upstream build (ops/ms) | Patched build (ops/ms) | difference (%) | > |:-----------------------------------:|:-----------------------:|:----------------------:|:--------------:| > | MathBench.doubleToRawLongBitsDouble | 30935.139 | 32171.761 | +4.00 | > | StrictMathBench.ceilDouble | 24682.810 | 29782.050 | +20.66 | > | StrictMathBench.cosDouble | 6948.309 | 6938.276 | -0.14 | > | StrictMathBench.expDouble | 6816.143 | 7211.021 | +5.79 | > | StrictMathBench.floorDouble | 30699.630 | 34189.509 | +11.37 | > | StrictMathBench.maxDouble | 35157.355 | 34675.191 | -1.37 | > | StrictMathBench.minDouble | 35192.135 | 35183.015 | -0.03 | > | StrictMathBench.sinDouble | 6698.405 | 6721.809 | +0.35 | > > New benchmark for changed nodes: > > --- a/test/micro/org/openjdk/bench/java/lang/MathBench.java > +++ b/test/micro/org/openjdk/bench/java/lang/MathBench.java > @@ -540,4 +540,11 @@ public class MathBench { > return Math.ulp(float7); > } > > + @Benchmark > + public long doubleToRawLongBitsDouble() { > + double dbl162Dot5 = double81 * 2.0d + double0Dot5; > + double dbl3 = double2 + double1; > + return Double.doubleToRawLongBits(dbl162Dot5) + Double.doubleToRawLongBits(dbl3); > + } > + src/hotspot/cpu/riscv/riscv.ad line 8534: > 8532: effect(DEF dst, USE src); > 8533: > 8534: ins_cost(ALU_COST + LOAD_COST); Adding an extra cost of `ALU_COST` for these load/store nodes looks a bit strange to me. I suppose only lowering the cost for those fmv.x.w/fmv.w.x/fmv.x.d/fmv.d.x nodes will do? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17206#discussion_r1439341249 From vkempik at openjdk.org Tue Jan 2 10:59:47 2024 From: vkempik at openjdk.org (Vladimir Kempik) Date: Tue, 2 Jan 2024 10:59:47 GMT Subject: RFR: 8322790: RISC-V: Tune costs for shuffles with no conversion In-Reply-To: References: Message-ID: On Tue, 2 Jan 2024 10:55:23 GMT, Fei Yang wrote: >> Hi all, please review this small change to RISC-V nodes insertion costs. >> Now we have several nodes which provide shuffles without conversion: https://github.com/openjdk/jdk/blob/32d80e2caf6063b58128bd5f3dc87b276f3bd0cb/src/hotspot/cpu/riscv/riscv.ad#L8525-L8741 >> On most RISC-V cpu`s we prefer reg<->reg operations, because they are faster, but now stack<->reg operations used (for details about reasons, please, visit connected jbs issue). >> After changing insertion costs reg<->reg operations selected, and we can see performance improvements for benchmarks, which use such shuffles (tested on thead C910 board): >> | Benchmark | Upstream build (ops/ms) | Patched build (ops/ms) | difference (%) | >> |:-----------------------------------:|:-----------------------:|:----------------------:|:--------------:| >> | MathBench.doubleToRawLongBitsDouble | 30935.139 | 32171.761 | +4.00 | >> | StrictMathBench.ceilDouble | 24682.810 | 29782.050 | +20.66 | >> | StrictMathBench.cosDouble | 6948.309 | 6938.276 | -0.14 | >> | StrictMathBench.expDouble | 6816.143 | 7211.021 | +5.79 | >> | StrictMathBench.floorDouble | 30699.630 | 34189.509 | +11.37 | >> | StrictMathBench.maxDouble | 35157.355 | 34675.191 | -1.37 | >> | StrictMathBench.minDouble | 35192.135 | 35183.015 | -0.03 | >> | StrictMathBench.sinDouble | 6698.405 | 6721.809 | +0.35 | >> >> New benchmark for changed nodes: >> >> --- a/test/micro/org/openjdk/bench/java/lang/MathBench.java >> +++ b/test/micro/org/openjdk/bench/java/lang/MathBench.java >> @@ -540,4 +540,11 @@ public class MathBench { >> return Math.ulp(float7); >> } >> >> + 
@Benchmark >> + public long doubleToRawLongBitsDouble() { >> + double dbl162Dot5 = double81 * 2.0d + double0Dot5; >> + double dbl3 = double2 + double1; >> + return Double.doubleToRawLongBits(dbl162Dot5) + Double.doubleToRawLongBits(dbl3); >> + } >> + > > src/hotspot/cpu/riscv/riscv.ad line 8534: > >> 8532: effect(DEF dst, USE src); >> 8533: >> 8534: ins_cost(ALU_COST + LOAD_COST); > > Adding an extra cost of `ALU_COST` for these load/store nodes looks a bit strange to me. I suppose only lowering the cost for those fmv.x.w/fmv.w.x/fmv.x.d/fmv.d.x nodes will do? those nodes need to go below 100 which then starts looking ugly ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17206#discussion_r1439342747 From davleopo at openjdk.org Tue Jan 2 13:37:19 2024 From: davleopo at openjdk.org (David Leopoldseder) Date: Tue, 2 Jan 2024 13:37:19 GMT Subject: RFR: 8322636: [JVMCI] HotSpotSpeculationLog can be inconsistent across a single compile [v2] In-Reply-To: References: Message-ID: > This PR fixes a subtle inconsistency in `HotSpotSpeculationLog` . > > Normal uses of `HotSpotSpeculationLog` work by using a `SpeculationReason` and asking the speculation log via `maySpeculate` if the speculation can be performed, i.e., if it failed before for the given method. An example for this can be seen in Graal https://github.com/oracle/graal/blob/master/compiler/src/jdk.graal.compiler/src/jdk/graal/compiler/nodes/loop/CountedLoopInfo.java#L591C15-L591C15 > The implicit assumption is that the speculation log, `HotSpotSpeculationLog` in particular collects failed speculations at the beginning of a compile and then stays consistent during the compile. Why is that? - Because if there are new failed speculations added to the failed speculations during the compile - the compiler would speculate again on those in an inconsistent way. E.g. 
at the beginning of a compile a certain speculation has not failed yet and the compiler thinks it can do optimization xyz using a speculation - later during the compilation process it consults the speculation log but gets a different answer. All those inconsistent speculations that already failed will anyway later fail code installation in jvmci (they will throw a bailout during `HotSpotCodeCacheProvider#installCode` https://github.com/openjdk/jdk/blob/master/src/jdk.internal.vm.ci/share/classes/jdk/vm/ci/hotspot/HotSpotSpeculationLog.java#L192 ). Thus, we should at least return a consistent result during a compile. > The problem for consistency here, that also makes troubles on the graal side, is that `maySpeculate` itself can collect failed speculations if there have not been any previously, i.e., `failedSpeculations == null`. > In order to make the speculation log consistent across an entire JVMCI compile this PR removes the collection of failed speculations in `maySpeculate`. David Leopoldseder has updated the pull request incrementally with one additional commit since the last revision: 8322636: [JVMCI] HotSpotSpeculationLog add javadoc to maySpeculate ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17183/files - new: https://git.openjdk.org/jdk/pull/17183/files/810e42ad..ef026267 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17183&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17183&range=00-01 Stats: 6 lines in 1 file changed: 6 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/17183.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17183/head:pull/17183 PR: https://git.openjdk.org/jdk/pull/17183 From davleopo at openjdk.org Tue Jan 2 13:37:19 2024 From: davleopo at openjdk.org (David Leopoldseder) Date: Tue, 2 Jan 2024 13:37:19 GMT Subject: RFR: 8322636: [JVMCI] HotSpotSpeculationLog can be inconsistent across a single compile [v2] In-Reply-To: References: Message-ID: 
<6t_FVenXE1jnRPPqWfNGakr8O-SvV7urhzgUdodieU4=.221b8912-0de0-4f17-875b-a778429310ba@github.com> On Sat, 23 Dec 2023 04:17:24 GMT, Doug Simon wrote: > I think it's worth updating the javadoc for maySpeculate to clarify that it returns consistent results for any given speculation for the lifetime of a SpeculationLog object. @dougxc done - please check ------------- PR Comment: https://git.openjdk.org/jdk/pull/17183#issuecomment-1874031598 From never at openjdk.org Tue Jan 2 18:48:49 2024 From: never at openjdk.org (Tom Rodriguez) Date: Tue, 2 Jan 2024 18:48:49 GMT Subject: RFR: 8322636: [JVMCI] HotSpotSpeculationLog can be inconsistent across a single compile [v2] In-Reply-To: References: Message-ID: On Tue, 2 Jan 2024 13:37:19 GMT, David Leopoldseder wrote: >> This PR fixes a subtle inconsistency in `HotSpotSpeculationLog` . >> >> Normal uses of `HotSpotSpeculationLog` work by using a `SpeculationReason` and asking the speculation log via `maySpeculate` if the speculation can be performed, i.e., if it failed before for the given method. An example for this can be seen in Graal https://github.com/oracle/graal/blob/master/compiler/src/jdk.graal.compiler/src/jdk/graal/compiler/nodes/loop/CountedLoopInfo.java#L591C15-L591C15 >> The implicit assumption is that the speculation log, `HotSpotSpeculationLog` in particular collects failed speculations at the beginning of a compile and then stays consistent during the compile. Why is that? - Because if there are new failed speculations added to the failed speculations during the compile - the compiler would speculate again on those in an inconsistent way. E.g. at the beginning of a compile a certain speculation has not failed yet and the compiler thinks it can do optimization xyz using a speculation - later during the compilation process it consults the speculation log but gets a different answer. 
All those inconsistent speculations that already failed will anyway later fail code installation in jvmci (they will throw a bailout during `HotSpotCodeCacheProvider#installCode` https://github.com/openjdk/jdk/blob/master/src/jdk.internal.vm.ci/share/classes/jdk/vm/ci/hotspot/HotSpotSpeculationLog.java#L192 ). Thus, we should at least return a consistent result during a compile. >> The problem for consistency here, that also makes troubles on the graal side, is that `maySpeculate` itself can collect failed speculations if there have not been any previously, i.e., `failedSpeculations == null`. >> In order to make the speculation log consistent across an entire JVMCI compile this PR removes the collection of failed speculations in `maySpeculate`. > > David Leopoldseder has updated the pull request incrementally with one additional commit since the last revision: > > 8322636: [JVMCI] HotSpotSpeculationLog add javadoc to maySpeculate More specifically it only validates against the speculations that failed before the last call to collectFailedSpeculations which must always be called explicitly. And we should point out somewhere that installCode will call collectFailedSpeculations before installation and revalidate the current set of speculations, bailing out if any were violated during compilation. This doesn't seem to be documented anywhere. ------------- PR Comment: https://git.openjdk.org/jdk/pull/17183#issuecomment-1874408643 From kvn at openjdk.org Tue Jan 2 20:08:37 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 2 Jan 2024 20:08:37 GMT Subject: RFR: 8322758: Eliminate -Wparentheses warnings in C2 code In-Reply-To: References: Message-ID: On Fri, 29 Dec 2023 02:01:08 GMT, Kim Barrett wrote: > Please review this change to eliminate some -Wparentheses warnings. In most > cases, this involved simply adding a few parentheses to make some implicit > operator precedence explicit. 
> > In PhaseIdealLoop::rc_predicate, I also added a comment describing the test > being performed, since it didn't seem obvious even with the additional > parentheses. > > Testing: mach5 tier1 > > Also ran mach5 tier1 with these changes in conjunction enabling -Wparentheses > and other changes needed to make that work. Looks good to me. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/17199#pullrequestreview-1800947505 From kvn at openjdk.org Tue Jan 2 20:16:47 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 2 Jan 2024 20:16:47 GMT Subject: RFR: 8322759: Eliminate -Wparentheses warnings in compiler code In-Reply-To: <496tGkQ1KUCrW1IHOETyvhqopkNYjsEoupxjo0Ze3Wg=.0223f494-fd34-4e4a-a31a-5030603f2113@github.com> References: <496tGkQ1KUCrW1IHOETyvhqopkNYjsEoupxjo0Ze3Wg=.0223f494-fd34-4e4a-a31a-5030603f2113@github.com> Message-ID: <2WJkEZqCHKmE27ORwdudo3QC0JLzBxShw6HBBJ8k2qE=.4f172823-b930-418a-924d-578342d2c991@github.com> On Fri, 29 Dec 2023 03:33:11 GMT, Kim Barrett wrote: > Please review this change to eliminate some -Wparentheses warnings. This > involved simply adding a few parentheses to make some implicit operator > precedence explicit. > > This change addresses non-C2 parts of the compiler component. > > Testing: mach5 tier1 > > Also ran mach5 tier1 with these changes in conjunction enabling -Wparentheses > and other changes needed to make that work. Looks good. src/hotspot/share/compiler/compilerDefinitions.inline.hpp line 2: > 1: /* > 2: * Copyright (c) 2016, 2023, Oracle and/or its affiliates. All rights reserved. 2024 ------------- Marked as reviewed by kvn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/17200#pullrequestreview-1800955588 PR Review Comment: https://git.openjdk.org/jdk/pull/17200#discussion_r1439783672 From kvn at openjdk.org Tue Jan 2 20:19:38 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 2 Jan 2024 20:19:38 GMT Subject: RFR: 8322781: C1: Debug build crash in GraphBuilder::vmap() when print stats In-Reply-To: References: Message-ID: On Fri, 29 Dec 2023 15:02:21 GMT, Denghui Dong wrote: > Hi, > > Could I have a review of this fix patch that fixes a crash problem in the debug build when -XX:+PrintValueNumbering -XX:+Verbose -XX:-UseLocalValueNumbering > > Thanks Looks good. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/17205#pullrequestreview-1800958430 From kvn at openjdk.org Tue Jan 2 20:19:39 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 2 Jan 2024 20:19:39 GMT Subject: RFR: 8322779: C1: Remove the unused counter 'totalInstructionNodes' In-Reply-To: References: Message-ID: <0CPSYAgq79WDpVp9zYhNzExp-5jafLmEdLaD-tAXBNA=.a0e2eaac-ea7b-41b1-adf9-4caf3c7d2298@github.com> On Fri, 29 Dec 2023 14:30:59 GMT, Denghui Dong wrote: > Hi, > > Could I have a review of this small cleanup patch that removes the unused counter 'totalInstructionNodes'. JDK-8058968 refactored the Compiler time traces and deleted the only place that read the counter. > > Thanks Good. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/17204#pullrequestreview-1800958993 From kbarrett at openjdk.org Tue Jan 2 22:27:01 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Tue, 2 Jan 2024 22:27:01 GMT Subject: RFR: 8322758: Eliminate -Wparentheses warnings in C2 code [v2] In-Reply-To: References: Message-ID: > Please review this change to eliminate some -Wparentheses warnings. In most > cases, this involved simply adding a few parentheses to make some implicit > operator precedence explicit. 
> > In PhaseIdealLoop::rc_predicate, I also added a comment describing the test > being performed, since it didn't seem obvious even with the additional > parentheses. > > Testing: mach5 tier1 > > Also ran mach5 tier1 with these changes in conjunction enabling -Wparentheses > and other changes needed to make that work. Kim Barrett has updated the pull request incrementally with one additional commit since the last revision: update copyrights for 2024 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17199/files - new: https://git.openjdk.org/jdk/pull/17199/files/8acc005e..abacbe0e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17199&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17199&range=00-01 Stats: 9 lines in 9 files changed: 0 ins; 0 del; 9 mod Patch: https://git.openjdk.org/jdk/pull/17199.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17199/head:pull/17199 PR: https://git.openjdk.org/jdk/pull/17199 From kbarrett at openjdk.org Tue Jan 2 22:27:02 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Tue, 2 Jan 2024 22:27:02 GMT Subject: RFR: 8322758: Eliminate -Wparentheses warnings in C2 code [v2] In-Reply-To: References: Message-ID: <6dFRm7UiXi5ef2W0MRLvZ3wT20zYPMBGCWd2c_OXDdM=.7603d68e-c2bc-4372-b78a-a1e4c43cb37b@github.com> On Fri, 29 Dec 2023 18:21:26 GMT, Andrew Haley wrote: >> Kim Barrett has updated the pull request incrementally with one additional commit since the last revision: >> >> update copyrights for 2024 > > Marked as reviewed by aph (Reviewer). Thanks for reviews @theRealAph and @vnkozlov . > src/hotspot/share/opto/loopPredicate.cpp line 801: > >> 799: const TypeInt* idx_type = TypeInt::INT; >> 800: // same signs and upper, or different signs and not upper. >> 801: if (((stride > 0) == (scale > 0)) == upper) { > > This is rather l33t code, but I guess it's OK with the comment. 
This > Suggestion: > > _Bool same_signs = (stride > 0) == (scale > 0); > if ((same_signs & upper) > || (!same_signs && !upper)) { > > generates slightly more code with GCC -O2. I'd be happy with either. I agree it's a little odd, but I don't feel strongly about it, so leaving it as is. ------------- PR Comment: https://git.openjdk.org/jdk/pull/17199#issuecomment-1874637742 PR Review Comment: https://git.openjdk.org/jdk/pull/17199#discussion_r1439914558 From kbarrett at openjdk.org Tue Jan 2 22:36:23 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Tue, 2 Jan 2024 22:36:23 GMT Subject: RFR: 8322758: Eliminate -Wparentheses warnings in C2 code [v3] In-Reply-To: References: Message-ID: > Please review this change to eliminate some -Wparentheses warnings. In most > cases, this involved simply adding a few parentheses to make some implicit > operator precedence explicit. > > In PhaseIdealLoop::rc_predicate, I also added a comment describing the test > being performed, since it didn't seem obvious even with the additional > parentheses. > > Testing: mach5 tier1 > > Also ran mach5 tier1 with these changes in conjunction enabling -Wparentheses > and other changes needed to make that work. Kim Barrett has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains three additional commits since the last revision: - Merge branch 'master' into c2-wparentheses - update copyrights for 2024 - fix -Wparentheses warnings in C2 code ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17199/files - new: https://git.openjdk.org/jdk/pull/17199/files/abacbe0e..2ad3798d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17199&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17199&range=01-02 Stats: 863 lines in 58 files changed: 610 ins; 44 del; 209 mod Patch: https://git.openjdk.org/jdk/pull/17199.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17199/head:pull/17199 PR: https://git.openjdk.org/jdk/pull/17199 From kbarrett at openjdk.org Tue Jan 2 22:36:24 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Tue, 2 Jan 2024 22:36:24 GMT Subject: Integrated: 8322758: Eliminate -Wparentheses warnings in C2 code In-Reply-To: References: Message-ID: On Fri, 29 Dec 2023 02:01:08 GMT, Kim Barrett wrote: > Please review this change to eliminate some -Wparentheses warnings. In most > cases, this involved simply adding a few parentheses to make some implicit > operator precedence explicit. > > In PhaseIdealLoop::rc_predicate, I also added a comment describing the test > being performed, since it didn't seem obvious even with the additional > parentheses. > > Testing: mach5 tier1 > > Also ran mach5 tier1 with these changes in conjunction enabling -Wparentheses > and other changes needed to make that work. This pull request has now been integrated. 
Changeset: 122bc777
Author: Kim Barrett
URL: https://git.openjdk.org/jdk/commit/122bc7770e1487cc754e17b9356217009bd6b13e
Stats: 27 lines in 9 files changed: 2 ins; 0 del; 25 mod

8322758: Eliminate -Wparentheses warnings in C2 code

Reviewed-by: aph, kvn

-------------

PR: https://git.openjdk.org/jdk/pull/17199

From kbarrett at openjdk.org Wed Jan 3 00:12:47 2024
From: kbarrett at openjdk.org (Kim Barrett)
Date: Wed, 3 Jan 2024 00:12:47 GMT
Subject: RFR: 8322816: RISC-V: Incorrect guarantee in patch_vtype
In-Reply-To: <2Dii1vRWzgzjyBGHrx_tH28SlYMBWUZ0h2mO7D5so_4=.817b5636-aa8a-4c6d-807a-29b1855a59a7@github.com>
References: <2Dii1vRWzgzjyBGHrx_tH28SlYMBWUZ0h2mO7D5so_4=.817b5636-aa8a-4c6d-807a-29b1855a59a7@github.com>
Message-ID:

On Tue, 2 Jan 2024 08:56:08 GMT, Fei Yang wrote:

>> Please review this change that fixes a test for a guarantee. This also
>> removes a -Wparentheses warning when those are enabled (which is how the
>> problem was discovered).
>>
>> The problem is that operator precedence groups the sub-expressions differently
>> than intended. The fix is to override the operator precedence by adding
>> parentheses to achieve the intended grouping.
>>
>> Testing: Local (linux-x64) cross-build for linux-riscv with this change plus
>> -Wparentheses enabled and other changes to allow that to work.
>>
>> Requesting someone from the riscv porters to properly test this.
>
> src/hotspot/cpu/riscv/assembler_riscv.hpp line 1160:
>
>> 1158: #define patch_vtype(hsb, lsb, vlmul, vsew, vta, vma, vill) \
>> 1159:     if (vill == 1) { \
>> 1160:       guarantee((vlmul | vsew | vta | vma) == 0, \
>
> I see the `vill` parameter is always false in current code, which means this guarantee never gets executed. And I don't think we would make use of the `vill` field of vtype in future. So I personally prefer to remove this guarantee and its enclosing if block for now.

Rather than removing the guarantee, wouldn't it be better to guarantee/assert `vill == 0`?
Although looking at uses, that argument is a bool, so it should be `guarantee(!vill, ...)`.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/17215#discussion_r1439970844

From fyang at openjdk.org Wed Jan 3 02:01:49 2024
From: fyang at openjdk.org (Fei Yang)
Date: Wed, 3 Jan 2024 02:01:49 GMT
Subject: RFR: 8322816: RISC-V: Incorrect guarantee in patch_vtype
In-Reply-To:
References: <2Dii1vRWzgzjyBGHrx_tH28SlYMBWUZ0h2mO7D5so_4=.817b5636-aa8a-4c6d-807a-29b1855a59a7@github.com>
Message-ID:

On Wed, 3 Jan 2024 00:10:25 GMT, Kim Barrett wrote:

>> src/hotspot/cpu/riscv/assembler_riscv.hpp line 1160:
>>
>>> 1158: #define patch_vtype(hsb, lsb, vlmul, vsew, vta, vma, vill) \
>>> 1159:     if (vill == 1) { \
>>> 1160:       guarantee((vlmul | vsew | vta | vma) == 0, \
>>
>> I see the `vill` parameter is always false in current code, which means this guarantee never gets executed. And I don't think we would make use of the `vill` field of vtype in future. So I personally prefer to remove this guarantee and its enclosing if block for now.
>
> Rather than removing the guarantee, wouldn't it be better to guarantee/assert `vill == 0`?
> Although looking at uses, that argument is a bool, so it should be `guarantee(!vill, ...)`.

Hi, Yes, that's better. Maybe: `guarantee(!vill, "should be");`

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/17215#discussion_r1440005337

From kbarrett at openjdk.org Wed Jan 3 05:15:55 2024
From: kbarrett at openjdk.org (Kim Barrett)
Date: Wed, 3 Jan 2024 05:15:55 GMT
Subject: RFR: 8322879: Eliminate -Wparentheses warnings in x86-32 code
Message-ID:

Please review this trivial change to eliminate a -Wparentheses warning.
This involved simply adding parentheses to make the implicit operator
precedence explicit.

Testing: Locally (linux-x64) cross-compiled for linux-x86. Also ran GHA with
-Wparentheses enabled along with this and other changes needed to make that
work.
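
The precedence pitfall behind these -Wparentheses fixes, and behind the patch_vtype guarantee discussed above, can be sketched in a few lines of C++. This is only an illustration under assumptions: the thread shows the fixed line, so the unparenthesized form below is a guess at the shape of the pre-fix expression, and the function and parameter names are hypothetical rather than taken from assembler_riscv.hpp.

```cpp
#include <cassert>

// Hypothetical stand-ins for the macro arguments; names and values are
// illustrative only and do not come from the HotSpot sources.

#if defined(__GNUC__)
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wparentheses"  // keep the demo warning-free
#endif
// `==` binds tighter than `|`, so this parses as a | b | c | (d == 0).
inline bool without_parens(unsigned a, unsigned b, unsigned c, unsigned d) {
  return a | b | c | d == 0;
}
#if defined(__GNUC__)
#pragma GCC diagnostic pop
#endif

// Intended grouping: true only when all four fields are zero.
inline bool with_parens(unsigned a, unsigned b, unsigned c, unsigned d) {
  return (a | b | c | d) == 0;
}
```

With `a = 1` and the other fields zero, `without_parens` still evaluates to true while `with_parens` is false, so a check written in the first form would pass in a case where it was meant to fail.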
------------- Commit messages: - fix -Wparentheses warnings in x86-32 code Changes: https://git.openjdk.org/jdk/pull/17237/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=17237&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8322879 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/17237.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17237/head:pull/17237 PR: https://git.openjdk.org/jdk/pull/17237 From fyang at openjdk.org Wed Jan 3 05:24:46 2024 From: fyang at openjdk.org (Fei Yang) Date: Wed, 3 Jan 2024 05:24:46 GMT Subject: RFR: 8322790: RISC-V: Tune costs for shuffles with no conversion In-Reply-To: References: Message-ID: On Tue, 2 Jan 2024 10:57:22 GMT, Vladimir Kempik wrote: >> src/hotspot/cpu/riscv/riscv.ad line 8534: >> >>> 8532: effect(DEF dst, USE src); >>> 8533: >>> 8534: ins_cost(ALU_COST + LOAD_COST); >> >> Adding an extra cost of `ALU_COST` for these load/store nodes looks a bit strange to me. I suppose only lowering the cost for those fmv.x.w/fmv.w.x/fmv.x.d/fmv.d.x nodes will do? > > those nodes need to go below 100 which then starts looking ugly Seems that the performance gain is still there (tested on lichee-pi-4a board) when reverting part of the changes. I haven't checked the JIT code though. 
Try this addon change: [addon-change.diff.txt](https://github.com/openjdk/jdk/files/13815870/addon-change.diff.txt) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17206#discussion_r1440083334 From thartmann at openjdk.org Wed Jan 3 06:41:46 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 3 Jan 2024 06:41:46 GMT Subject: RFR: 8322781: C1: Debug build crash in GraphBuilder::vmap() when print stats In-Reply-To: References: Message-ID: On Fri, 29 Dec 2023 15:02:21 GMT, Denghui Dong wrote: > Hi, > > Could I have a review of this fix patch that fixes a crash problem in the debug build when -XX:+PrintValueNumbering -XX:+Verbose -XX:-UseLocalValueNumbering > > Thanks Please add a test case to `test/hotspot/jtreg/compiler/arguments/TestC1Globals.java`. Thanks! ------------- Changes requested by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/17205#pullrequestreview-1801440815 From kvn at openjdk.org Wed Jan 3 07:21:37 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 3 Jan 2024 07:21:37 GMT Subject: RFR: 8322879: Eliminate -Wparentheses warnings in x86-32 code In-Reply-To: References: Message-ID: On Wed, 3 Jan 2024 05:10:50 GMT, Kim Barrett wrote: > Please review this trivial change to eliminate a -Wparentheses warning. > This involved simply adding parentheses to make the implicit operator > precedence explicit. > > Testing: Locally (linux-x64) cross-compiled for linux-x86. Also ran GHA with > -Wparentheses enabled along with this and other changes needed to make that > work. Trivial. ------------- Marked as reviewed by kvn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/17237#pullrequestreview-1801480220 From ddong at openjdk.org Wed Jan 3 07:34:00 2024 From: ddong at openjdk.org (Denghui Dong) Date: Wed, 3 Jan 2024 07:34:00 GMT Subject: RFR: 8322781: C1: Debug build crash in GraphBuilder::vmap() when print stats [v2] In-Reply-To: References: Message-ID: <6uLy4L6t2o_KFfe5CNlXg8boNYERM9hryaHCTGou16I=.4988281c-10b7-4e08-8fc2-abe14ce4938d@github.com> > Hi, > > Could I have a review of this fix patch that fixes a crash problem in the debug build when -XX:+PrintValueNumbering -XX:+Verbose -XX:-UseLocalValueNumbering > > Thanks Denghui Dong has updated the pull request incrementally with one additional commit since the last revision: add test ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17205/files - new: https://git.openjdk.org/jdk/pull/17205/files/49f90f41..3d5280ce Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17205&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17205&range=00-01 Stats: 11 lines in 1 file changed: 11 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/17205.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17205/head:pull/17205 PR: https://git.openjdk.org/jdk/pull/17205 From thartmann at openjdk.org Wed Jan 3 07:44:46 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 3 Jan 2024 07:44:46 GMT Subject: RFR: 8322779: C1: Remove the unused counter 'totalInstructionNodes' In-Reply-To: References: Message-ID: On Fri, 29 Dec 2023 14:30:59 GMT, Denghui Dong wrote: > Hi, > > Could I have a review of this small cleanup patch that removes the unused counter 'totalInstructionNodes'. JDK-8058968 refactored the Compiler time traces and deleted the only place that read the counter. > > Thanks Looks good. ------------- Marked as reviewed by thartmann (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/17204#pullrequestreview-1801501711 From ddong at openjdk.org Wed Jan 3 07:50:01 2024 From: ddong at openjdk.org (Denghui Dong) Date: Wed, 3 Jan 2024 07:50:01 GMT Subject: RFR: 8322781: C1: Debug build crash in GraphBuilder::vmap() when print stats [v3] In-Reply-To: References: Message-ID: > Hi, > > Could I have a review of this fix patch that fixes a crash problem in the debug build when -XX:+PrintValueNumbering -XX:+Verbose -XX:-UseLocalValueNumbering > > Thanks Denghui Dong has updated the pull request incrementally with two additional commits since the last revision: - update - update ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17205/files - new: https://git.openjdk.org/jdk/pull/17205/files/3d5280ce..3408bc02 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17205&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17205&range=01-02 Stats: 3 lines in 1 file changed: 1 ins; 1 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/17205.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17205/head:pull/17205 PR: https://git.openjdk.org/jdk/pull/17205 From ddong at openjdk.org Wed Jan 3 07:53:47 2024 From: ddong at openjdk.org (Denghui Dong) Date: Wed, 3 Jan 2024 07:53:47 GMT Subject: RFR: 8322781: C1: Debug build crash in GraphBuilder::vmap() when print stats [v3] In-Reply-To: References: Message-ID: On Wed, 3 Jan 2024 06:38:46 GMT, Tobias Hartmann wrote: > Please add a test case to `test/hotspot/jtreg/compiler/arguments/TestC1Globals.java`. Thanks! Added and verified in my Linux env. 
-------------

PR Comment: https://git.openjdk.org/jdk/pull/17205#issuecomment-1874966579

From davleopo at openjdk.org Wed Jan 3 08:52:48 2024
From: davleopo at openjdk.org (David Leopoldseder)
Date: Wed, 3 Jan 2024 08:52:48 GMT
Subject: RFR: 8322636: [JVMCI] HotSpotSpeculationLog can be inconsistent across a single compile [v2]
In-Reply-To:
References:
Message-ID:

On Tue, 2 Jan 2024 18:45:41 GMT, Tom Rodriguez wrote:

>> David Leopoldseder has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   8322636: [JVMCI] HotSpotSpeculationLog add javadoc to maySpeculate
>
> More specifically it only validates against the speculations that failed before the last call to collectFailedSpeculations which must always be called explicitly. And we should point out somewhere that installCode will call collectFailedSpeculations before installation and revalidate the current set of speculations, bailing out if any were violated during compilation. This doesn't seem to be documented anywhere.

@tkrodriguez where do you want to put it? I'd suggest adding some additional javadoc to maySpeculate so we end up with something like

    /**
     * @return {@code true} if the given speculation can be performed, i.e., it never failed so far, otherwise
     * return {@code false}. Note, that this method returns consistent results for any given speculation for the
     * entire lifetime of the enclosing SpeculationLog object. This means that speculations failed during a
     * compilation will not be updated. Validation of speculations only considers those failed since the last
     * call to {@link #collectFailedSpeculations()}.
     *
     * Users of {@link SpeculationLog} must explicitly call {@link #collectFailedSpeculations()} to collect
     * failed speculations. This should be done before starting a compile.
     *
     * Code installation performs a revalidation of the current set of speculations. If this fails, i.e. since the
     * start of the compile new speculations failed, the compilation is aborted with a bailout. This is done in
     * {@link #getFlattenedSpeculations(boolean)}.
     */

-------------

PR Comment: https://git.openjdk.org/jdk/pull/17183#issuecomment-1875022576

From epeter at openjdk.org Wed Jan 3 09:01:31 2024
From: epeter at openjdk.org (Emanuel Peter)
Date: Wed, 3 Jan 2024 09:01:31 GMT
Subject: RFR: 8310190: C2 SuperWord: AlignVector is broken, generates misaligned packs [v58]
In-Reply-To:
References:
Message-ID:

> I want to push this in JDK23.
> After this fix here, I'm doing [this refactoring](https://github.com/openjdk/jdk/pull/16620).
>
> To calm your nerves: most of the changes are in auto-generated tests, and tests in general.
>
> **Context**
>
> `-XX:+AlignVector` ensures that SuperWord only creates LoadVector and StoreVector that can be memory aligned. This is achieved by iterating in the pre-loop until we reach the alignment boundary, then we can start the main loop properly aligned. However, this is not possible in all cases, sometimes some memory accesses cannot be guaranteed to be aligned, and we need to reject vectorization (at least partially, for some of the packs).
>
> Alignment is split into two tasks:
> - Alignment Correctness Checks: only relevant if `-XX:+AlignVector`. Need to reject vectorization if alignment is not possible. We must check if the address of the vector load/store is aligned with (divisible by) `ObjectAlignmentInBytes`.
> - Alignment by adjusting pre-loop limit: alignment is desirable even if `-XX:-AlignVector`. We would like to align the vectors with their vector width.
>
> **Problem**
>
> I have recently found a bug with our AlignVector [JDK-8310190](https://bugs.openjdk.org/browse/JDK-8310190).
> In that bug, we perform a misaligned memory vector access, which results in a `SIGBUS` on an ARM32 machine.
> Thanks @fg1417 for confirming this!
> Hence, we need to fix the alignment correctness checks.
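
The idea of iterating the pre-loop until an access reaches the alignment boundary, and the fact that this is sometimes impossible, can be illustrated with a small brute-force sketch. It is not C2's algorithm: the function name and parameters are hypothetical, and it assumes non-negative byte offsets measured from a base that is itself aligned.

```cpp
#include <cassert>
#include <cstdint>

// Returns the smallest number of pre-loop iterations k >= 0 such that
// (start_offset + k * stride_bytes) is a multiple of align_bytes,
// or -1 if no such k exists. All arguments are in bytes.
// Hypothetical helper for illustration; not a HotSpot function.
inline int pre_loop_iterations(std::int64_t start_offset,
                               std::int64_t stride_bytes,
                               std::int64_t align_bytes) {
  assert(start_offset >= 0 && stride_bytes > 0 && align_bytes > 0);
  // The residues of k * stride_bytes modulo align_bytes repeat with a
  // period of at most align_bytes, so checking k in [0, align_bytes)
  // covers every reachable residue.
  for (std::int64_t k = 0; k < align_bytes; ++k) {
    if ((start_offset + k * stride_bytes) % align_bytes == 0) {
      return static_cast<int>(k);
    }
  }
  return -1;
}
```

For instance, an access 6 bytes past the base that advances 2 bytes per iteration reaches an 8-byte boundary after one pre-loop iteration, whereas the same access advancing 16 bytes per iteration never does, which is the kind of case where vectorization of that access would have to be rejected.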
>
> While working on this task, I also found some bugs in the "alignment by adjusting pre-loop limit": there were cases where it did not align the vectors correctly.
>
> **Problem Details**
>
> Reproducer:
>
>
>     static void test(short[] a, short[] b, short mask) {
>         for (int i = 0; i < RANGE; i+=8) {
>             // Problematic for AlignVector
>             b[i+0] = (short)(a[i+0] & mask); // best_memref, align 0
>
>             b[i+3] = (short)(a[i+3] & mask); // pack at offset 6 bytes
>             b[i+4] = (short)(a[i+4] & mask);
>             b[i+5] = (short)(a[i+5] & mask);
>             b[i+6] = (short)(a[i+6] & mask);
>         }
>     }
>
>
> During `SuperWord::find_adjacent_refs` we used to check if the references are expected to be aligned. For that, we look at each "group" of references (eg all `LoadS`) and take the reference with the lowest offset. For that chosen reference, we check if it is alignable. If yes, we accept all references of that group, if no we reject all.
>
> This is problematic as shown in this example. We have references at index offset `0, 3, 4, 5, 6`, and by...

Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 124 commits:

 - Merge branch 'JDK-8311586' of https://github.com/eme64/jdk into JDK-8311586
 - Apply suggestions from code review by Christian

   Co-authored-by: Christian Hagedorn
 - fix copyright year 2024
 - Merge branch 'master' into JDK-8311586
 - more comments in SuperWord::adjust_pre_loop_limit_to_align_main_loop_vectors
 - comments about modulo positive / negative values
 - Apply suggestions from code review from Christian

   Co-authored-by: Christian Hagedorn
 - more small fixes by Christian
 - fix for yesterday's reviews by Christian
 - improve case analysis empty / constrained / trivial
 - ...
and 114 more: https://git.openjdk.org/jdk/compare/06dd7353...d01a0cd9 ------------- Changes: https://git.openjdk.org/jdk/pull/14785/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=14785&range=57 Stats: 8883 lines in 23 files changed: 7561 ins; 363 del; 959 mod Patch: https://git.openjdk.org/jdk/pull/14785.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14785/head:pull/14785 PR: https://git.openjdk.org/jdk/pull/14785 From shade at openjdk.org Wed Jan 3 11:38:47 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 3 Jan 2024 11:38:47 GMT Subject: RFR: 8322879: Eliminate -Wparentheses warnings in x86-32 code In-Reply-To: References: Message-ID: <1UfzIj3lfDKsWO6bURA4Fz-txwAaefwzEniMHpfcnTs=.90caaa75-f975-440d-a8b6-a9f602400e99@github.com> On Wed, 3 Jan 2024 05:10:50 GMT, Kim Barrett wrote: > Please review this trivial change to eliminate a -Wparentheses warning. > This involved simply adding parentheses to make the implicit operator > precedence explicit. > > Testing: Locally (linux-x64) cross-compiled for linux-x86. Also ran GHA with > -Wparentheses enabled along with this and other changes needed to make that > work. Marked as reviewed by shade (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/17237#pullrequestreview-1801831066 From shade at openjdk.org Wed Jan 3 12:02:37 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 3 Jan 2024 12:02:37 GMT Subject: RFR: 8322781: C1: Debug build crash in GraphBuilder::vmap() when print stats [v3] In-Reply-To: References: Message-ID: On Wed, 3 Jan 2024 07:50:01 GMT, Denghui Dong wrote: >> Hi, >> >> Could I have a review of this fix patch that fixes a crash problem in the debug build when -XX:+PrintValueNumbering -XX:+Verbose -XX:-UseLocalValueNumbering >> >> Thanks > > Denghui Dong has updated the pull request incrementally with two additional commits since the last revision: > > - update > - update Looks fine. ------------- Marked as reviewed by shade (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/17205#pullrequestreview-1801869703

From shade at openjdk.org Wed Jan 3 12:12:48 2024
From: shade at openjdk.org (Aleksey Shipilev)
Date: Wed, 3 Jan 2024 12:12:48 GMT
Subject: RFR: 8322759: Eliminate -Wparentheses warnings in compiler code
In-Reply-To: <496tGkQ1KUCrW1IHOETyvhqopkNYjsEoupxjo0Ze3Wg=.0223f494-fd34-4e4a-a31a-5030603f2113@github.com>
References: <496tGkQ1KUCrW1IHOETyvhqopkNYjsEoupxjo0Ze3Wg=.0223f494-fd34-4e4a-a31a-5030603f2113@github.com>
Message-ID:

On Fri, 29 Dec 2023 03:33:11 GMT, Kim Barrett wrote:

> Please review this change to eliminate some -Wparentheses warnings. This
> involved simply adding a few parentheses to make some implicit operator
> precedence explicit.
>
> This change addresses non-C2 parts of the compiler component.
>
> Testing: mach5 tier1
>
> Also ran mach5 tier1 with these changes in conjunction enabling -Wparentheses
> and other changes needed to make that work.

src/hotspot/share/compiler/compilerDefinitions.inline.hpp line 60:

> 58:
> 59: inline bool CompilerConfig::is_c1_or_interpreter_only_no_jvmci() {
> 60:   assert((is_jvmci_compiler() && is_jvmci()) || !is_jvmci_compiler(), "JVMCI compiler implies enabled JVMCI");

This looks like simply:

    assert(!is_jvmci_compiler() || is_jvmci(), "JVMCI compiler implies enabled JVMCI");

src/hotspot/share/compiler/compilerDefinitions.inline.hpp line 117:

> 115: // Tiered is basically C1 & (C2 | JVMCI) minus all the odd cases with restrictions.
> 116: inline bool CompilerConfig::is_tiered() {
> 117:   assert((is_c1_simple_only() && is_c1_only()) || !is_c1_simple_only(), "c1 simple mode must imply c1-only mode");

Ditto,

    assert(!is_c1_simple_only() || is_c1_only(), "c1 simple mode must imply c1-only mode");

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/17200#discussion_r1440379521
PR Review Comment: https://git.openjdk.org/jdk/pull/17200#discussion_r1440381032

From shade at openjdk.org Wed Jan 3 12:25:49 2024
From: shade at openjdk.org (Aleksey Shipilev)
Date: Wed, 3 Jan 2024 12:25:49 GMT
Subject: RFR: 8322694: C1: Handle Constant and IfOp in NullCheckEliminator
In-Reply-To: <1vvyuwLRjlWItKwCyighjCSM5SNbO4CSEE59hQtCU24=.b4783e52-328a-4ce3-8c92-7b736cea7546@github.com>
References: <1vvyuwLRjlWItKwCyighjCSM5SNbO4CSEE59hQtCU24=.b4783e52-328a-4ce3-8c92-7b736cea7546@github.com>
Message-ID:

On Mon, 25 Dec 2023 15:43:52 GMT, Denghui Dong wrote:

> This patch added the support for Constant and IfOp in NullCheckEliminator to eliminate more null checks.
>
> testing: tier1-4 in progress

Nice corner case!

src/hotspot/share/c1/c1_Optimizer.cpp line 888:

> 886:   mark_visitable(instr);
> 887:   if (instr->is_pinned() || instr->can_trap() || (instr->as_NullCheck() != nullptr)
> 888:       || (instr->as_Constant() != nullptr && instr->as_Constant()->type()->is_object())) {

Is this just `instr->as_ObjectConstant() != nullptr`?

src/hotspot/share/c1/c1_Optimizer.cpp line 1206:

> 1204: void NullCheckEliminator::handle_Constant(Constant *x) {
> 1205:   ObjectType* ot = x->type()->as_ObjectType();
> 1206:   if (ot && ot->is_loaded()) {

Hotspot style guide insists we avoid implicit bool conversions. Check `ot != nullptr` explicitly.

src/hotspot/share/c1/c1_Optimizer.cpp line 1208:

> 1206:   if (ot && ot->is_loaded()) {
> 1207:     ObjectConstant* oc = ot->as_ObjectConstant();
> 1208:     if (!oc || !oc->value()->is_null_object()) {

Ditto, check `oc == nullptr`.
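
As an aside on the two assert rewrites suggested in the 8322759 review above: `(a && b) || !a` and `!a || b` both encode "a implies b", which can be confirmed by checking all four input combinations. The sketch below uses hypothetical helper names and plain booleans in place of the `CompilerConfig` predicates.

```cpp
#include <cassert>

// Shape of the asserts as quoted from the patch: (a && b) || !a
inline bool patch_form(bool a, bool b) { return (a && b) || !a; }

// Shape suggested in the review: !a || b, i.e. "a implies b"
inline bool review_form(bool a, bool b) { return !a || b; }

// Compares both forms on all four input combinations.
inline bool forms_agree() {
  for (int a = 0; a < 2; ++a) {
    for (int b = 0; b < 2; ++b) {
      if (patch_form(a != 0, b != 0) != review_form(a != 0, b != 0)) {
        return false;
      }
    }
  }
  return true;
}
```

The only input for which either form is false is `a == true, b == false`, which matches the assert messages ("JVMCI compiler implies enabled JVMCI", "c1 simple mode must imply c1-only mode").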
Now, the fact that `as_ObjectConstant` returns `nullptr` means this is not an _object constant_, but some other constant, right? I think this is similar to what other places in C1 do, so while awkward, this looks okay. ------------- PR Review: https://git.openjdk.org/jdk/pull/17191#pullrequestreview-1801885674 PR Review Comment: https://git.openjdk.org/jdk/pull/17191#discussion_r1440392602 PR Review Comment: https://git.openjdk.org/jdk/pull/17191#discussion_r1440383548 PR Review Comment: https://git.openjdk.org/jdk/pull/17191#discussion_r1440391525 From ddong at openjdk.org Wed Jan 3 13:11:44 2024 From: ddong at openjdk.org (Denghui Dong) Date: Wed, 3 Jan 2024 13:11:44 GMT Subject: Integrated: 8322779: C1: Remove the unused counter 'totalInstructionNodes' In-Reply-To: References: Message-ID: On Fri, 29 Dec 2023 14:30:59 GMT, Denghui Dong wrote: > Hi, > > Could I have a review of this small cleanup patch that removes the unused counter 'totalInstructionNodes'. JDK-8058968 refactored the Compiler time traces and deleted the only place that read the counter. > > Thanks This pull request has now been integrated. 
Changeset: 539da248
Author: Denghui Dong
URL: https://git.openjdk.org/jdk/commit/539da24863bc47b977ee86c584af2332426993a7
Stats: 2 lines in 1 file changed: 0 ins; 2 del; 0 mod

8322779: C1: Remove the unused counter 'totalInstructionNodes'

Reviewed-by: kvn, thartmann

-------------

PR: https://git.openjdk.org/jdk/pull/17204

From ddong at openjdk.org Wed Jan 3 13:37:21 2024
From: ddong at openjdk.org (Denghui Dong)
Date: Wed, 3 Jan 2024 13:37:21 GMT
Subject: RFR: 8322694: C1: Handle Constant and IfOp in NullCheckEliminator [v2]
In-Reply-To:
References: <1vvyuwLRjlWItKwCyighjCSM5SNbO4CSEE59hQtCU24=.b4783e52-328a-4ce3-8c92-7b736cea7546@github.com>
Message-ID:

On Wed, 3 Jan 2024 12:22:43 GMT, Aleksey Shipilev wrote:

>> Denghui Dong has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   update
>
> src/hotspot/share/c1/c1_Optimizer.cpp line 888:
>
>> 886:   mark_visitable(instr);
>> 887:   if (instr->is_pinned() || instr->can_trap() || (instr->as_NullCheck() != nullptr)
>> 888:       || (instr->as_Constant() != nullptr && instr->as_Constant()->type()->is_object())) {
>
> Is this just `instr->as_ObjectConstant() != nullptr`?

Do you mean `instr->type()->as_ObjectConstant() != nullptr`? But we should include other Constants (e.g.
`ArrayConstant`, `InstanceConstant`), and those classes don't implement `as_ObjectConstant`

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/17191#discussion_r1440457024

From ddong at openjdk.org Wed Jan 3 13:37:21 2024
From: ddong at openjdk.org (Denghui Dong)
Date: Wed, 3 Jan 2024 13:37:21 GMT
Subject: RFR: 8322694: C1: Handle Constant and IfOp in NullCheckEliminator [v2]
In-Reply-To: <1vvyuwLRjlWItKwCyighjCSM5SNbO4CSEE59hQtCU24=.b4783e52-328a-4ce3-8c92-7b736cea7546@github.com>
References: <1vvyuwLRjlWItKwCyighjCSM5SNbO4CSEE59hQtCU24=.b4783e52-328a-4ce3-8c92-7b736cea7546@github.com>
Message-ID:

> This patch added the support for Constant and IfOp in NullCheckEliminator to eliminate more null checks.
>
> testing: tier1-4 in progress

Denghui Dong has updated the pull request incrementally with one additional commit since the last revision:

  update

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/17191/files
  - new: https://git.openjdk.org/jdk/pull/17191/files/fe1f54a9..68952fcc

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=17191&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17191&range=00-01

  Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod
  Patch: https://git.openjdk.org/jdk/pull/17191.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/17191/head:pull/17191

PR: https://git.openjdk.org/jdk/pull/17191

From ddong at openjdk.org Wed Jan 3 13:41:40 2024
From: ddong at openjdk.org (Denghui Dong)
Date: Wed, 3 Jan 2024 13:41:40 GMT
Subject: RFR: 8322694: C1: Handle Constant and IfOp in NullCheckEliminator [v2]
In-Reply-To:
References: <1vvyuwLRjlWItKwCyighjCSM5SNbO4CSEE59hQtCU24=.b4783e52-328a-4ce3-8c92-7b736cea7546@github.com>
Message-ID:

On Wed, 3 Jan 2024 12:12:24 GMT, Aleksey Shipilev wrote:

>> Denghui Dong has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   update
>
> src/hotspot/share/c1/c1_Optimizer.cpp line 1206:
>
>> 1204: void
NullCheckEliminator::handle_Constant(Constant *x) { >> 1205: ObjectType* ot = x->type()->as_ObjectType(); >> 1206: if (ot && ot->is_loaded()) { > > Hotspot style guide insists we avoid implicit bool conversions. Check `ot != nullptr` explicitly. fixed. > src/hotspot/share/c1/c1_Optimizer.cpp line 1208: > >> 1206: if (ot && ot->is_loaded()) { >> 1207: ObjectConstant* oc = ot->as_ObjectConstant(); >> 1208: if (!oc || !oc->value()->is_null_object()) { > > Ditto, check `oc == nullptr`. > > Now, the fact that `as_ObjectConstant` returns `nullptr` means this is not an _object constant_, but some other constant, right? I think this is similar to what other places in C1 do, so while awkward, this looks okay. fixed. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17191#discussion_r1440462233 PR Review Comment: https://git.openjdk.org/jdk/pull/17191#discussion_r1440462604 From roland at openjdk.org Wed Jan 3 14:11:50 2024 From: roland at openjdk.org (Roland Westrelin) Date: Wed, 3 Jan 2024 14:11:50 GMT Subject: RFR: 8319793: C2 compilation fails with "Bad graph detected in build_loop_late" after JDK-8279888 [v8] In-Reply-To: References: Message-ID: On Fri, 15 Dec 2023 14:32:57 GMT, Roland Westrelin wrote: >> Range check smearing and range check predication make an array access >> dependent on 2 (or more in the case of RC smearing) conditions. As a >> consequence, if a range check can be eliminated because there's an >> identical dominating range check, the control dependent nodes that >> could float and become dependent on the dominating range check cannot >> be allowed to float because there's a risk that they would then bypass >> one of the checks that make the access legal. >> >> `IfNode::dominated_by()` and `PhaseIdealLoop::dominated_by()` have >> logic to prevent this: nodes that are control dependent on a range >> check or predicate are not allowed to float. This is however not >> sufficient as demonstrated by the test cases. 
>> >> In `TestArrayAccessAboveRCAfterSmearingOrPredication.testRangeCheckSmearing()`: >> >> >> v += array[i]; >> if (flag2) { >> if (flag3) { >> field = 0x42; >> } >> } >> if (flagField == 1) { >> v += array[i]; >> } >> >> >> The range check for the second `array[i]` load is replaced by the >> dominating range check for the first `array[i]` but because the second >> `array[i]` load could really be dependent on multiple range checks (in >> case smearing happened which is not the case here), c2 doesn't allow >> the second `array[i]` to float when the second range check is >> removed. The second `array[i]` is then control dependent on: >> >> >> if (flagField == 1) { >> >> >> which is next found to be dominated by the same test: >> >> >> if (flag == 1) { >> >> >> and is removed. However nothing in `dominated_by()` treats node >> dependent on tests that are not range check or predicates >> specially. So the second `array[i]` is allowed to float and become >> dependent on: >> >> >> if (flag == 1) { >> >> >> which is above the range check for that access. The test method in its >> last invocation is passed an index for the array access that's widely >> out of range. The array load happens before the range check and >> crashes the VM. `testLoopPredication()` is a similar test where array >> loads become dependent on predicates and end up above range checks. >> >> `TestArrayAccessCastIIAboveRC.java` is the test case from the bug >> where for similar reasons a range check `CastII` ends up above its >> range check, becomes top because its input becomes some integer that >> conflicts with its... > > Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: > > review Anyone else for the review of this? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/16886#issuecomment-1875430778 From shade at openjdk.org Wed Jan 3 14:48:48 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 3 Jan 2024 14:48:48 GMT Subject: RFR: 8319793: C2 compilation fails with "Bad graph detected in build_loop_late" after JDK-8279888 [v8] In-Reply-To: References: Message-ID: On Fri, 15 Dec 2023 14:32:57 GMT, Roland Westrelin wrote: >> Range check smearing and range check predication make an array access >> dependent on 2 (or more in the case of RC smearing) conditions. As a >> consequence, if a range check can be eliminated because there's an >> identical dominating range check, the control dependent nodes that >> could float and become dependent on the dominating range check cannot >> be allowed to float because there's a risk that they would then bypass >> one of the checks that make the access legal. >> >> `IfNode::dominated_by()` and `PhaseIdealLoop::dominated_by()` have >> logic to prevent this: nodes that are control dependent on a range >> check or predicate are not allowed to float. This is however not >> sufficient as demonstrated by the test cases. >> >> In `TestArrayAccessAboveRCAfterSmearingOrPredication.testRangeCheckSmearing()`: >> >> >> v += array[i]; >> if (flag2) { >> if (flag3) { >> field = 0x42; >> } >> } >> if (flagField == 1) { >> v += array[i]; >> } >> >> >> The range check for the second `array[i]` load is replaced by the >> dominating range check for the first `array[i]` but because the second >> `array[i]` load could really be dependent on multiple range checks (in >> case smearing happened which is not the case here), c2 doesn't allow >> the second `array[i]` to float when the second range check is >> removed. The second `array[i]` is then control dependent on: >> >> >> if (flagField == 1) { >> >> >> which is next found to be dominated by the same test: >> >> >> if (flag == 1) { >> >> >> and is removed. 
However nothing in `dominated_by()` treats node >> dependent on tests that are not range check or predicates >> specially. So the second `array[i]` is allowed to float and become >> dependent on: >> >> >> if (flag == 1) { >> >> >> which is above the range check for that access. The test method in its >> last invocation is passed an index for the array access that's widely >> out of range. The array load happens before the range check and >> crashes the VM. `testLoopPredication()` is a similar test where array >> loads become dependent on predicates and end up above range checks. >> >> `TestArrayAccessCastIIAboveRC.java` is the test case from the bug >> where for similar reasons a range check `CastII` ends up above its >> range check, becomes top because its input becomes some integer that >> conflicts with its... > > Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: > > review There are a couple of GHA failures, and those are probably resolved in current master. It would be helpful if you can pull from current master and get a clean run. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16886#issuecomment-1875487678 From roland at openjdk.org Wed Jan 3 15:53:04 2024 From: roland at openjdk.org (Roland Westrelin) Date: Wed, 3 Jan 2024 15:53:04 GMT Subject: RFR: 8319793: C2 compilation fails with "Bad graph detected in build_loop_late" after JDK-8279888 [v9] In-Reply-To: References: Message-ID: > Range check smearing and range check predication make an array access > dependent on 2 (or more in the case of RC smearing) conditions. As a > consequence, if a range check can be eliminated because there's an > identical dominating range check, the control dependent nodes that > could float and become dependent on the dominating range check cannot > be allowed to float because there's a risk that they would then bypass > one of the checks that make the access legal. 
> > `IfNode::dominated_by()` and `PhaseIdealLoop::dominated_by()` have > logic to prevent this: nodes that are control dependent on a range > check or predicate are not allowed to float. This is however not > sufficient as demonstrated by the test cases. > > In `TestArrayAccessAboveRCAfterSmearingOrPredication.testRangeCheckSmearing()`: > > > v += array[i]; > if (flag2) { > if (flag3) { > field = 0x42; > } > } > if (flagField == 1) { > v += array[i]; > } > > > The range check for the second `array[i]` load is replaced by the > dominating range check for the first `array[i]` but because the second > `array[i]` load could really be dependent on multiple range checks (in > case smearing happened which is not the case here), c2 doesn't allow > the second `array[i]` to float when the second range check is > removed. The second `array[i]` is then control dependent on: > > > if (flagField == 1) { > > > which is next found to be dominated by the same test: > > > if (flag == 1) { > > > and is removed. However nothing in `dominated_by()` treats node > dependent on tests that are not range check or predicates > specially. So the second `array[i]` is allowed to float and become > dependent on: > > > if (flag == 1) { > > > which is above the range check for that access. The test method in its > last invocation is passed an index for the array access that's widely > out of range. The array load happens before the range check and > crashes the VM. `testLoopPredication()` is a similar test where array > loads become dependent on predicates and end up above range checks. > > `TestArrayAccessCastIIAboveRC.java` is the test case from the bug > where for similar reasons a range check `CastII` ends up above its > range check, becomes top because its input becomes some integer that > conflicts with its type (but there's no condition to catch it). The > graph becomes broken and c2 crashes. > > Logic in the `dominated_by()` methods ... 
Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 11 commits: - Merge branch 'master' into JDK-8319793 - review - Revert "Update src/hotspot/share/opto/castnode.hpp" This reverts commit 356c91cca911ed486f9f87f3eff53ce21e1e3ec9. - Revert "Update src/hotspot/share/opto/memnode.hpp" This reverts commit bdb731ea562f314f44d327f7243ef5cf9ad40b2e. - review - Update src/hotspot/share/opto/memnode.hpp Co-authored-by: Christian Hagedorn - Update src/hotspot/share/opto/castnode.hpp Co-authored-by: Christian Hagedorn - Update src/hotspot/share/opto/castnode.cpp Co-authored-by: Christian Hagedorn - Update src/hotspot/share/opto/ifnode.cpp Co-authored-by: Christian Hagedorn - Merge branch 'master' into JDK-8319793 - ... and 1 more: https://git.openjdk.org/jdk/compare/15519285...dbe3c4c1 ------------- Changes: https://git.openjdk.org/jdk/pull/16886/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=16886&range=08 Stats: 367 lines in 14 files changed: 309 ins; 27 del; 31 mod Patch: https://git.openjdk.org/jdk/pull/16886.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16886/head:pull/16886 PR: https://git.openjdk.org/jdk/pull/16886 From mli at openjdk.org Wed Jan 3 16:12:06 2024 From: mli at openjdk.org (Hamlin Li) Date: Wed, 3 Jan 2024 16:12:06 GMT Subject: RFR: 8322959: vectorapi: get wrong argument for `limit` in indexPartiallyInUpperRange intrinsic Message-ID: Hi, Can you review this simple fix for indexPartiallyInUpperRange intrinsic? Thanks. 
------------- Commit messages: - Initial commit Changes: https://git.openjdk.org/jdk/pull/17247/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=17247&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8322959 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/17247.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17247/head:pull/17247 PR: https://git.openjdk.org/jdk/pull/17247 From sviswanathan at openjdk.org Wed Jan 3 17:12:49 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Wed, 3 Jan 2024 17:12:49 GMT Subject: RFR: 8322959: vectorapi: get wrong argument for `limit` in indexPartiallyInUpperRange intrinsic In-Reply-To: References: Message-ID: On Wed, 3 Jan 2024 16:05:48 GMT, Hamlin Li wrote: > Hi, > Can you review this simple fix for indexPartiallyInUpperRange intrinsic? > Thanks. src/hotspot/share/opto/vectorIntrinsics.cpp line 3151: > 3149: > 3150: Node* offset = argument(3); > 3151: Node* limit = argument(4); The offset is of long type so will take 2 spots (3 and 4) of argument. So limit will be argument(5). The original code (limit = argument(5)) looks correct to me. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17247#discussion_r1440691288 From roland at openjdk.org Wed Jan 3 17:17:54 2024 From: roland at openjdk.org (Roland Westrelin) Date: Wed, 3 Jan 2024 17:17:54 GMT Subject: RFR: 8320649: C2: Optimize scoped values [v3] In-Reply-To: References: Message-ID: > This change implements C2 optimizations for calls to > ScopedValue.get(). Indeed, in: > > > v1 = scopedValue.get(); > ... > v2 = scopedValue.get(); > > > `v2` can be replaced by `v1` and the second call to `get()` can be > optimized out. That's true whatever is between the 2 calls unless a > new mapping for `scopedValue` is created in between (when that happens > no optimizations is performed for the method being compiled). 
Hoisting > a `get()` call out of loop for a loop invariant `scopedValue` should > also be legal in most cases. > > `ScopedValue.get()` is implemented in java code as a 2 step process. A > cache is attached to the current thread object. If the `ScopedValue` > object is in the cache then the result from `get()` is read from > there. Otherwise a slow call is performed that also inserts the > mapping in the cache. The cache itself is lazily allocated. One > `ScopedValue` can be hashed to 2 different indexes in the cache. On a > cache probe, both indexes are checked. As a consequence, the process > of probing the cache is a multi step process (check if the cache is > present, check first index, check second index if first index > failed). If the cache is populated early on, then when the method that > calls `ScopedValue.get()` is compiled, profile reports the slow path > as never taken and only the read from the cache is compiled. > > To perform the optimizations, I added 3 new node types to C2: > > - the pair > ScopedValueGetHitsInCacheNode/ScopedValueGetLoadFromCacheNode for > the cache probe > > - a cfg node ScopedValueGetResultNode to help locate the result of the > `get()` call in the IR graph. > > In pseudo code, once the nodes are inserted, the code of a `get()` is: > > > hits_in_the_cache = ScopedValueGetHitsInCache(scopedValue) > if (hits_in_the_cache) { > res = ScopedValueGetLoadFromCache(hits_in_the_cache); > } else { > res = ..; //slow call possibly inlined. Subgraph can be arbitray complex > } > res = ScopedValueGetResult(res) > > > In the snippet: > > > v1 = scopedValue.get(); > ... > v2 = scopedValue.get(); > > > Replacing `v2` by `v1` is then done by starting from the > `ScopedValueGetResult` node for the second `get()` and looking for a > dominating `ScopedValueGetResult` for the same `ScopedValue` > object. When one is found, it is used as a replacement. 
Eliminating > the second `get()` call is achieved by making > `ScopedValueGetHitsInCache` always successful if there's a dominating > `ScopedValueGetResult` and replacing its companion > `ScopedValueGetLoadFromCache` by the dominating > `ScopedValueGetResult`. > > Hoisting a `g... Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains four commits: - Merge branch 'master' into JDK-8320649 - test failures - white spaces + bug id in test - test & fix ------------- Changes: https://git.openjdk.org/jdk/pull/16966/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=16966&range=02 Stats: 2037 lines in 33 files changed: 2007 ins; 1 del; 29 mod Patch: https://git.openjdk.org/jdk/pull/16966.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16966/head:pull/16966 PR: https://git.openjdk.org/jdk/pull/16966 From shade at openjdk.org Wed Jan 3 17:26:43 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 3 Jan 2024 17:26:43 GMT Subject: RFR: 8320128: Clean up Parse constructor for OSR [v2] In-Reply-To: References: <0mxhhLR9DDdpiBhbXWPxCmMzo_OI9DYiIX36xGhxC_c=.1eed99b9-2dc3-4403-b3c1-22b6ebed6079@github.com> Message-ID: On Tue, 21 Nov 2023 06:00:29 GMT, Xin Liu wrote: >> There's a special case for the constructor of Parse. If current compilation is OSR and it is handling the top-level method(depth() == 1), then >> >> 1. _tf = C->tf(); >> 2. _entry_bci = C->entry_bci(); >> 3. _flow = method()->get_osr_flow_analysis(_entry_bci); >> >> We don't need to assign those member data twice. We can also factor out _flow->failing() for the special case and normal cases. >> >> It's worth mentioning that we can't save ciTypeFlow computation because >> get_osr_flow_analysis(_entry_bci) actually needs get_flow_analysis(method()). > > Xin Liu has updated the pull request incrementally with one additional commit since the last revision: > > Update according to reviewer's feedback. This looks reasonable. 
I have a few cosmetic comments/suggestions. src/hotspot/share/opto/parse1.cpp line 513: > 511: tty->print("OSR @%d ", _entry_bci); > 512: } > 513: tty->print_cr("type flow bailout: %s", _flow->failure_reason()); Not sure if we want to keep the single `print_cr` for log atomicity reasons. I think this would be good too: if (is_osr_parse()) { tty->print_cr("OSR @%d type flow bailout: %s", _entry_bci, _flow->failure_reason()); } else { tty->print_cr("type flow bailout: %s", _flow->failure_reason()); } src/hotspot/share/opto/parse1.cpp line 529: > 527: } > 528: > 529: #ifdef ASSERT I think the goal for this `#ifdef` block is to eliminate even the `if (depth() == 1)` in product builds. Yes, most of the code is dead, but it is safer not to rely on it. Leave it as is. ------------- Marked as reviewed by shade (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/16669#pullrequestreview-1802720853 PR Review Comment: https://git.openjdk.org/jdk/pull/16669#discussion_r1440702469 PR Review Comment: https://git.openjdk.org/jdk/pull/16669#discussion_r1440685800 From kvn at openjdk.org Wed Jan 3 18:00:39 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 3 Jan 2024 18:00:39 GMT Subject: RFR: 8322959: vectorapi: get wrong argument for `limit` in indexPartiallyInUpperRange intrinsic In-Reply-To: References: Message-ID: On Wed, 3 Jan 2024 17:10:07 GMT, Sandhya Viswanathan wrote: >> Hi, >> Can you review this simple fix for indexPartiallyInUpperRange intrinsic? >> Thanks. > > src/hotspot/share/opto/vectorIntrinsics.cpp line 3151: > >> 3149: >> 3150: Node* offset = argument(3); >> 3151: Node* limit = argument(4); > > The offset is of long type so will take 2 spots (3 and 4) of argument. So limit will be argument(5). The original code (limit = argument(5)) looks correct to me. @sviswa7 is right. @Hamlin-Li do you have a test case where the value is wrong?
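A minimal sketch of the slot counting described in the review comment above, for readers following the thread. It assumes a hypothetical parameter list of three one-slot arguments followed by `long offset, long limit`; the class `ArgSlots` and the helper `firstSlot` are illustrative names and are not part of HotSpot.

```java
import java.util.List;

public class ArgSlots {
    // Returns the first argument slot occupied by the parameter at `index`,
    // counting long and double as two slots and everything else as one.
    static int firstSlot(List<Class<?>> params, int index) {
        int slot = 0;
        for (int i = 0; i < index; i++) {
            Class<?> p = params.get(i);
            slot += (p == long.class || p == double.class) ? 2 : 1;
        }
        return slot;
    }

    public static void main(String[] args) {
        // Hypothetical signature: (Object, Object, int, long offset, long limit)
        List<Class<?>> params =
            List.of(Object.class, Object.class, int.class, long.class, long.class);
        System.out.println("offset -> argument(" + firstSlot(params, 3) + ")");
        System.out.println("limit  -> argument(" + firstSlot(params, 4) + ")");
    }
}
```

Under that assumed signature, `offset` starts at slot 3 and occupies slots 3 and 4, so `limit` starts at slot 5, which is the reasoning given for keeping `argument(5)`.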
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17247#discussion_r1440743534 From kvn at openjdk.org Wed Jan 3 18:39:02 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 3 Jan 2024 18:39:02 GMT Subject: RFR: JDK-8321599 Data loss in AVX3 Base64 decoding [v6] In-Reply-To: References: Message-ID: On Wed, 20 Dec 2023 16:28:03 GMT, Scott Gibbons wrote: >> Fix for looking for padding characters within the encoded string. Was not adding start offset to length, so was looking at potentially freed or uninitialized memory. >> >> Tested tier1 and with the test case supplied with the JBS issue. >> >> The problem will only occur when all of the following are true: >> 1. The source offset of the string to be decoded is != 0. >> 2. The characters at the beginning of the string (minus the offset) plus the string length mod 64 are either "=" or "==". >> 3. The string is >= 32 characters. >> 4. The string is not MIME encoded. >> >> If any of these conditions are not met, the decode works as expected. This was due to omitting the source offset of the string when checking for padding characters. > > Scott Gibbons has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 10 additional commits since the last revision: > > - Merge branch 'openjdk:master' into Base64-fix > - Updated copyright year > - Updated copyright year > - Revert code size change - wa for an experiment only. > - Added some comments to the test > - Merge branch 'openjdk:master' into Base64-fix > - Merge branch 'Base64-fix' of https://github.com/asgibbons/jdk into Base64-fix > - Merge branch 'openjdk:master' into Base64-fix > - Added tests for proper length and padding checks > - Fix for JDK-8321599 Looks reasonable. Please, update copyright year to 2024 in source file and test. ------------- Marked as reviewed by kvn (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/17039#pullrequestreview-1802869126 From kvn at openjdk.org Wed Jan 3 19:45:44 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 3 Jan 2024 19:45:44 GMT Subject: RFR: 8310190: C2 SuperWord: AlignVector is broken, generates misaligned packs [v58] In-Reply-To: References: Message-ID: On Wed, 3 Jan 2024 09:01:31 GMT, Emanuel Peter wrote: >> I want to push this in JDK23. >> After this fix here, I'm doing [this refactoring](https://github.com/openjdk/jdk/pull/16620). >> >> To calm your nerves: most of the changes are in auto-generated tests, and tests in general. >> >> **Context** >> >> `-XX:+AlignVector` ensures that SuperWord only creates LoadVector and StoreVector that can be memory aligned. This is achieved by iterating in the pre-loop until we reach the alignment boundary, then we can start the main loop properly aligned. However, this is not possible in all cases, sometimes some memory accesses cannot be guaranteed to be aligned, and we need to reject vectorization (at least partially, for some of the packs). >> >> Alignment is split into two tasks: >> - Alignment Correctness Checks: only relevant if `-XX:+AlignVector`. Need to reject vectorization if alignment is not possible. We must check if the address of the vector load/store is aligned with (divisible by) `ObjectAlignmentInBytes`. >> - Alignment by adjusting pre-loop limit: alignment is desirable even if `-XX:-AlignVector`. We would like to align the vectors with their vector width. >> >> **Problem** >> >> I have recently found a bug with our AlignVector [JDK-8310190](https://bugs.openjdk.org/browse/JDK-8310190). >> In that bug, we perform a misaligned memory vector access, which results in a `SIGBUS` on an ARM32 machine. >> Thanks @fg1417 for confirming this! >> Hence, we need to fix the alignment correctness checks. 
>> >> While working on this task, I also found some bugs in the "alignment by adjusting pre-loop limit": there were cases where it did not align the vectors correctly. >> >> **Problem Details** >> >> Reproducer: >> >> >> static void test(short[] a, short[] b, short mask) { >> for (int i = 0; i < RANGE; i+=8) { >> // Problematic for AlignVector >> b[i+0] = (short)(a[i+0] & mask); // best_memref, align 0 >> >> b[i+3] = (short)(a[i+3] & mask); // pack at offset 6 bytes >> b[i+4] = (short)(a[i+4] & mask); >> b[i+5] = (short)(a[i+5] & mask); >> b[i+6] = (short)(a[i+6] & mask); >> } >> } >> >> >> During `SuperWord::find_adjacent_refs` we used to check if the references are expected to be aligned. For that, we look at each "group" of references (eg all `LoadS`) and take the reference with the lowest offset. For that chosen reference, we check if it is alignable. If yes, we accept all references of that group, if no we reject all. >> >> This is problemati... > > Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 124 commits: > > - Merge branch 'JDK-8311586' of https://github.com/eme64/jdk into JDK-8311586 > - Apply suggestions from code review by Christian > > Co-authored-by: Christian Hagedorn > - fix copyright year 2024 > - Merge branch 'master' into JDK-8311586 > - more comments in SuperWord::adjust_pre_loop_limit_to_align_main_loop_vectors > - comments about modulo positive / negative values > - Apply suggestions from code review from Christian > > Co-authored-by: Christian Hagedorn > - more small fixes by Christian > - fix for yesterday's reviews by Christian > - improve case analysis empty / constrained / trivial > - ... and 114 more: https://git.openjdk.org/jdk/compare/06dd7353...d01a0cd9 Few comments. 
src/hotspot/share/opto/chaitin.cpp line 1795: > 1793: // See if already computed; if so return it > 1794: if( derived_base_map[derived->_idx] ) > 1795: return derived_base_map[derived->_idx]; Please fix code style for these lines since you are touching this code. Spacing and missing {}. src/hotspot/share/opto/chaitin.cpp line 1797: > 1795: return derived_base_map[derived->_idx]; > 1796: > 1797: if (derived->is_Mach() && derived->as_Mach()->ideal_Opcode() == Op_VerifyVectorAlignment) { Missing #ifdef ASSERT src/hotspot/share/opto/compile.cpp line 1059: > 1057: > 1058: if (AllowVectorizeOnDemand) { > 1059: if (has_method() && _directive->VectorizeOption) { This seems no related. Please explain it. src/hotspot/share/opto/compile.cpp line 3713: > 3711: // to ObjectAlignmentInBytes. Hence, even if multiple arrays are accessed in > 3712: // a loop we can expect at least the following alignment: > 3713: jlong guaranteed_alignment = MIN2(vector_width, (jlong)ObjectAlignmentInBytes); This is more relaxed check than the actual alignment required. As I understand it is because it checks only base address of array and not actually memory address to which vector instruction is accessed (which is (base,index,offset)). It is useful but does not guarantee correct alignment of vector access instructions. Consider using `lea` instruction on x86 to load memory address into register and check it. 
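To illustrate the point in the comment above (an aligned array base does not by itself make the accessed address aligned), here is a small arithmetic sketch. All concrete numbers are assumptions chosen for illustration: an 8-byte guaranteed alignment, a base address of 0x1000, a 16-byte array header, and 2-byte `short` elements. `AlignCheck` and its methods are hypothetical names, not HotSpot code.

```java
public class AlignCheck {
    // True if address is a multiple of alignment (alignment must be a power of two).
    static boolean isAligned(long address, long alignment) {
        return (address & (alignment - 1)) == 0;
    }

    // Address actually touched by the access: base + index * scale + offset.
    static long effectiveAddress(long base, long index, long scale, long offset) {
        return base + index * scale + offset;
    }

    public static void main(String[] args) {
        long alignment = 8;      // assumed min(vector_width, ObjectAlignmentInBytes)
        long base = 0x1000;      // assumed array base address, 8-byte aligned
        long headerOffset = 16;  // assumed array header size in bytes
        long scale = 2;          // size of a short element in bytes

        // Access starting at element i+3 with i == 0, i.e. 6 bytes into the data.
        long address = effectiveAddress(base, 3, scale, headerOffset);

        System.out.println("base aligned:   " + isAligned(base, alignment));
        System.out.println("access aligned: " + isAligned(address, alignment));
    }
}
```

With these assumed values the base passes the check while the effective address (0x1016) does not, which is why checking the full `(base, index, offset)` address is the stronger verification.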
src/hotspot/share/opto/machnode.cpp line 360: > 358: } > 359: > 360: if (base != nullptr && base->is_Mach() && base->as_Mach()->ideal_Opcode() == Op_VerifyVectorAlignment) { Missing #ifdef ASSERT src/hotspot/share/opto/superword.cpp line 674: > 672: "packset empty or we find the alignment reference"); > 673: > 674: if (TraceSuperWord) { Missing #ifndef PRODUCT src/hotspot/share/opto/superword.cpp line 1605: > 1603: compress_packset(); > 1604: > 1605: if (TraceSuperWord) { Missing #ifndef PRODUCT ------------- PR Review: https://git.openjdk.org/jdk/pull/14785#pullrequestreview-1802885677 PR Review Comment: https://git.openjdk.org/jdk/pull/14785#discussion_r1440813017 PR Review Comment: https://git.openjdk.org/jdk/pull/14785#discussion_r1440791887 PR Review Comment: https://git.openjdk.org/jdk/pull/14785#discussion_r1440792545 PR Review Comment: https://git.openjdk.org/jdk/pull/14785#discussion_r1440864022 PR Review Comment: https://git.openjdk.org/jdk/pull/14785#discussion_r1440796064 PR Review Comment: https://git.openjdk.org/jdk/pull/14785#discussion_r1440826791 PR Review Comment: https://git.openjdk.org/jdk/pull/14785#discussion_r1440828355 From sgibbons at openjdk.org Wed Jan 3 19:51:02 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Wed, 3 Jan 2024 19:51:02 GMT Subject: RFR: JDK-8321599 Data loss in AVX3 Base64 decoding [v7] In-Reply-To: References: Message-ID: > Fix for looking for padding characters within the encoded string. Was not adding start offset to length, so was looking at potentially freed or uninitialized memory. > > Tested teir1 and with testcase supplied with JBS issue. > > The problem will only occur when all of the following are true: > 1. The source offset of the string to be decoded is != 0. > 2. The characters at the beginning of the string (minus the offset) plus the string length mod 64 are either "=" or "==". > 3. The string is >= 32 characters. > 4. The string is not MIME encoded. 
> > If any of these conditions are not met, the decode works as expected. This was due to omitting the source offset of the string when checking for padding characters. Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: Fixed copyrights ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17039/files - new: https://git.openjdk.org/jdk/pull/17039/files/ba60ac59..5f0e0d59 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17039&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17039&range=05-06 Stats: 2 lines in 2 files changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/17039.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17039/head:pull/17039 PR: https://git.openjdk.org/jdk/pull/17039 From kvn at openjdk.org Wed Jan 3 19:53:39 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 3 Jan 2024 19:53:39 GMT Subject: RFR: 8310190: C2 SuperWord: AlignVector is broken, generates misaligned packs [v58] In-Reply-To: References: Message-ID: On Wed, 3 Jan 2024 19:41:57 GMT, Vladimir Kozlov wrote: >> Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 124 commits: >> >> - Merge branch 'JDK-8311586' of https://github.com/eme64/jdk into JDK-8311586 >> - Apply suggestions from code review by Christian >> >> Co-authored-by: Christian Hagedorn >> - fix copyright year 2024 >> - Merge branch 'master' into JDK-8311586 >> - more comments in SuperWord::adjust_pre_loop_limit_to_align_main_loop_vectors >> - comments about modulo positive / negative values >> - Apply suggestions from code review from Christian >> >> Co-authored-by: Christian Hagedorn >> - more small fixes by Christian >> - fix for yesterday's reviews by Christian >> - improve case analysis empty / constrained / trivial >> - ... 
and 114 more: https://git.openjdk.org/jdk/compare/06dd7353...d01a0cd9 > > src/hotspot/share/opto/compile.cpp line 3713: > >> 3711: // to ObjectAlignmentInBytes. Hence, even if multiple arrays are accessed in >> 3712: // a loop we can expect at least the following alignment: >> 3713: jlong guaranteed_alignment = MIN2(vector_width, (jlong)ObjectAlignmentInBytes); > > This is more relaxed check than the actual alignment required. As I understand it is because it checks only base address of array and not actually memory address to which vector instruction is accessed (which is (base,index,offset)). > It is useful but does not guarantee correct alignment of vector access instructions. > > Consider using `lea` instruction on x86 to load memory address into register and check it. May be hack only `loadV` and `storeV` instructions in .ad file to use `lea` and do the check. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14785#discussion_r1440873324 From xliu at openjdk.org Wed Jan 3 20:04:17 2024 From: xliu at openjdk.org (Xin Liu) Date: Wed, 3 Jan 2024 20:04:17 GMT Subject: RFR: 8320128: Clean up Parse constructor for OSR [v3] In-Reply-To: <0mxhhLR9DDdpiBhbXWPxCmMzo_OI9DYiIX36xGhxC_c=.1eed99b9-2dc3-4403-b3c1-22b6ebed6079@github.com> References: <0mxhhLR9DDdpiBhbXWPxCmMzo_OI9DYiIX36xGhxC_c=.1eed99b9-2dc3-4403-b3c1-22b6ebed6079@github.com> Message-ID: > There's a special case for the constructor of Parse. If current compilation is OSR and it is handling the top-level method(depth() == 1), then > > 1. _tf = C->tf(); > 2. _entry_bci = C->entry_bci(); > 3. _flow = method()->get_osr_flow_analysis(_entry_bci); > > We don't need to assign those member data twice. We can also factor out _flow->failing() for the special case and normal cases. > > It's worth mentioning that we can't save ciTypeFlow computation because > get_osr_flow_analysis(_entry_bci) actually needs get_flow_analysis(method()). 
Xin Liu has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: - Use atomic logline and resume #ifdef ASSERT. - Merge branch 'master' into JDK-8320128 - Update according to reviewer's feedback. - 8320128: Clean up Parse constructor for OSR ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16669/files - new: https://git.openjdk.org/jdk/pull/16669/files/1f7c956c..ec89638c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16669&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16669&range=01-02 Stats: 789833 lines in 4137 files changed: 177560 ins; 537711 del; 74562 mod Patch: https://git.openjdk.org/jdk/pull/16669.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16669/head:pull/16669 PR: https://git.openjdk.org/jdk/pull/16669 From xliu at openjdk.org Wed Jan 3 20:14:31 2024 From: xliu at openjdk.org (Xin Liu) Date: Wed, 3 Jan 2024 20:14:31 GMT Subject: RFR: 8320128: Clean up Parse constructor for OSR [v3] In-Reply-To: References: <0mxhhLR9DDdpiBhbXWPxCmMzo_OI9DYiIX36xGhxC_c=.1eed99b9-2dc3-4403-b3c1-22b6ebed6079@github.com> Message-ID: On Wed, 3 Jan 2024 17:03:56 GMT, Aleksey Shipilev wrote: >> Xin Liu has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: >> >> - Use atomic logline and resume #ifdef ASSERT. >> - Merge branch 'master' into JDK-8320128 >> - Update according to reviewer's feedback. >> - 8320128: Clean up Parse constructor for OSR > > src/hotspot/share/opto/parse1.cpp line 529: > >> 527: } >> 528: >> 529: #ifdef ASSERT > > I think the goal for this `#ifdef` block is to eliminate even the `if (depth() == 1)` in product builds. 
Yes, most of the code is dead, but it is safer not to rely on it. Leave it as is. I tried to improve readability by reducing macros. okay. I bring it back. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16669#discussion_r1440895697 From kbarrett at openjdk.org Wed Jan 3 20:16:28 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Wed, 3 Jan 2024 20:16:28 GMT Subject: RFR: 8322879: Eliminate -Wparentheses warnings in x86-32 code In-Reply-To: References: Message-ID: On Wed, 3 Jan 2024 07:19:18 GMT, Vladimir Kozlov wrote: >> Please review this trivial change to eliminate a -Wparentheses warning. >> This involved simply adding parentheses to make the implicit operator >> precedence explicit. >> >> Testing: Locally (linux-x64) cross-compiled for linux-x86. Also ran GHA with >> -Wparentheses enabled along with this and other changes needed to make that >> work. > > Trivial. Thanks for reviews, @vnkozlov and @shipilev . ------------- PR Comment: https://git.openjdk.org/jdk/pull/17237#issuecomment-1875913222 From kbarrett at openjdk.org Wed Jan 3 20:16:29 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Wed, 3 Jan 2024 20:16:29 GMT Subject: Integrated: 8322879: Eliminate -Wparentheses warnings in x86-32 code In-Reply-To: References: Message-ID: On Wed, 3 Jan 2024 05:10:50 GMT, Kim Barrett wrote: > Please review this trivial change to eliminate a -Wparentheses warning. > This involved simply adding parentheses to make the implicit operator > precedence explicit. > > Testing: Locally (linux-x64) cross-compiled for linux-x86. Also ran GHA with > -Wparentheses enabled along with this and other changes needed to make that > work. This pull request has now been integrated. 
Changeset: 30a0c61d Author: Kim Barrett URL: https://git.openjdk.org/jdk/commit/30a0c61de080a0cc52ec163095fe0f02f324474e Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod 8322879: Eliminate -Wparentheses warnings in x86-32 code Reviewed-by: kvn, shade ------------- PR: https://git.openjdk.org/jdk/pull/17237 From kvn at openjdk.org Wed Jan 3 20:32:23 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 3 Jan 2024 20:32:23 GMT Subject: RFR: JDK-8321599 Data loss in AVX3 Base64 decoding [v7] In-Reply-To: References: Message-ID: On Wed, 3 Jan 2024 19:51:02 GMT, Scott Gibbons wrote: >> Fix for looking for padding characters within the encoded string. Was not adding start offset to length, so was looking at potentially freed or uninitialized memory. >> >> Tested tier1 and with the test case supplied with the JBS issue. >> >> The problem will only occur when all of the following are true: >> 1. The source offset of the string to be decoded is != 0. >> 2. The characters at the beginning of the string (minus the offset) plus the string length mod 64 are either "=" or "==". >> 3. The string is >= 32 characters. >> 4. The string is not MIME encoded. >> >> If any of these conditions are not met, the decode works as expected. This was due to omitting the source offset of the string when checking for padding characters. > > Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: > > Fixed copyrights Good. ------------- PR Comment: https://git.openjdk.org/jdk/pull/17039#issuecomment-1875932205 From never at openjdk.org Wed Jan 3 20:42:21 2024 From: never at openjdk.org (Tom Rodriguez) Date: Wed, 3 Jan 2024 20:42:21 GMT Subject: RFR: 8322636: [JVMCI] HotSpotSpeculationLog can be inconsistent across a single compile [v2] In-Reply-To: References: Message-ID: On Tue, 2 Jan 2024 13:37:19 GMT, David Leopoldseder wrote: >> This PR fixes a subtle inconsistency in `HotSpotSpeculationLog`.
>> >> Normal uses of `HotSpotSpeculationLog` work by using a `SpeculationReason` and asking the speculation log via `maySpeculate` if the speculation can be performed, i.e., whether it has not already failed for the given method. An example for this can be seen in Graal https://github.com/oracle/graal/blob/master/compiler/src/jdk.graal.compiler/src/jdk/graal/compiler/nodes/loop/CountedLoopInfo.java#L591C15-L591C15 >> The implicit assumption is that the speculation log, `HotSpotSpeculationLog` in particular, collects failed speculations at the beginning of a compile and then stays consistent during the compile. Why is that? - Because if new failed speculations are added to the failed speculations during the compile - the compiler would speculate again on those in an inconsistent way. E.g. at the beginning of a compile a certain speculation has not failed yet and the compiler thinks it can do optimization xyz using a speculation - later during the compilation process it consults the speculation log but gets a different answer. All those inconsistent speculations that already failed will anyway later fail code installation in JVMCI (they will throw a bailout during `HotSpotCodeCacheProvider#installCode` https://github.com/openjdk/jdk/blob/master/src/jdk.internal.vm.ci/share/classes/jdk/vm/ci/hotspot/HotSpotSpeculationLog.java#L192 ). Thus, we should at least return a consistent result during a compile. >> The problem for consistency here, which also causes trouble on the Graal side, is that `maySpeculate` itself can collect failed speculations if there have not been any previously, i.e., `failedSpeculations == null`. >> In order to make the speculation log consistent across an entire JVMCI compile this PR removes the collection of failed speculations in `maySpeculate`.
> > David Leopoldseder has updated the pull request incrementally with one additional commit since the last revision: > > 8322636: [JVMCI] HotSpotSpeculationLog add javadoc to maySpeculate So I looked more closely the HotSpot and substrate implementations and I'm not sure we can currently align the implementation and the javadoc. In the HotSpot world, HotSpotSpeculationLog is a compiler local object that reads data from the real speculation data that's kept in the MDO. This means that it has full control over when collectFailedSpeculations is called. SubstrateSpeculationLog is the actual log so if two threads are operating on the same log then one of them could see the effects of a call to collectFailedSpeculations by the other thread. Maybe in practice 2 threads never do this because it would mean they are compiling the same root method but it doesn't seem guaranteed. installCode on substrate also doesn't perform the speculation log check that HotSpot does. So maybe we punt on javadoc updates for now. ------------- PR Comment: https://git.openjdk.org/jdk/pull/17183#issuecomment-1875942561 From sgibbons at openjdk.org Wed Jan 3 21:18:15 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Wed, 3 Jan 2024 21:18:15 GMT Subject: RFR: JDK-8321599 Data loss in AVX3 Base64 decoding [v8] In-Reply-To: References: Message-ID: > Fix for looking for padding characters within the encoded string. Was not adding start offset to length, so was looking at potentially freed or uninitialized memory. > > Tested teir1 and with testcase supplied with JBS issue. > > The problem will only occur when all of the following are true: > 1. The source offset of the string to be decoded is != 0. > 2. The characters at the beginning of the string (minus the offset) plus the string length mod 64 are either "=" or "==". > 3. The string is >= 32 characters. > 4. The string is not MIME encoded. > > If any of these conditions are not met, the decode works as expected. 
This was due to omitting the source offset of the string when checking for padding characters. Scott Gibbons has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 12 additional commits since the last revision: - Merge branch 'openjdk:master' into Base64-fix - Fixed copyrights - Merge branch 'openjdk:master' into Base64-fix - Updated copyright year - Updated copyright year - Revert code size change - wa for an experiment only. - Added some comments to the test - Merge branch 'openjdk:master' into Base64-fix - Merge branch 'Base64-fix' of https://github.com/asgibbons/jdk into Base64-fix - Merge branch 'openjdk:master' into Base64-fix - ... and 2 more: https://git.openjdk.org/jdk/compare/919ef219...dbccc16e ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17039/files - new: https://git.openjdk.org/jdk/pull/17039/files/5f0e0d59..dbccc16e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17039&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17039&range=06-07 Stats: 2960 lines in 243 files changed: 1658 ins; 531 del; 771 mod Patch: https://git.openjdk.org/jdk/pull/17039.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17039/head:pull/17039 PR: https://git.openjdk.org/jdk/pull/17039 From duke at openjdk.org Wed Jan 3 21:18:48 2024 From: duke at openjdk.org (Eric Murphy) Date: Wed, 3 Jan 2024 21:18:48 GMT Subject: RFR: 8309130: x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays) [v42] In-Reply-To: References: <3wco3meaBNwjfDWtVvkkoRfgG7-Wu1XZJTfJFduX5LE=.adbcd599-bcab-45a8-896f-cd2c65510352@github.com> <_MGkyOjyeyCIOE_HpYGCpzN3zN6bJEtaMGo_3T66e7M=.446e6122-c301-4dd9-9704-b72606275f4c@github.com> Message-ID: On Sun, 15 Oct 2023 07:40:06 GMT, himichael wrote: > sing a physical machine, I am using a virtual machine, this virtual machine supports the AVX512 
instruction set. > How do I open libsimdsort ? @himichael Did you ever resolve your issue? I am using JDK22 from SDKMan and have the same errors: java -Xlog:library [0.013s][info][library] Loaded library libjsvml.so, handle 0x00007fc9a40229b0 [0.024s][info][library] Failed to find JNI_OnLoad_nio in library with handle 0x00007fca6b2dc220 [0.024s][info][library] Loaded library /home/eric/.sdkman/candidates/java/22.ea.29-open/lib/libnio.so, handle 0x00007fca6419b9b0 [0.024s][info][library] Found JNI_OnLoad in library with handle 0x00007fca6419b9b0 [0.024s][info][library] Found Java_sun_nio_fs_UnixNativeDispatcher_init in library with handle 0x00007fca6419b9b0 [0.024s][info][library] Found Java_sun_nio_fs_UnixNativeDispatcher_getcwd in library with handle 0x00007fca6419b9b0 [0.025s][info][library] Failed to find JNI_OnLoad_jimage in library with handle 0x00007fca6b2dc220 [0.025s][info][library] Loaded library /home/eric/.sdkman/candidates/java/22.ea.29-open/lib/libjimage.so, handle 0x00007fca64006380 [0.025s][info][library] Failed to find JNI_OnLoad in library with handle 0x00007fca64006380 [0.025s][info][library] Failed to find Java_jdk_internal_jimage_NativeImageBuffer_getNativeMap in library with handle 0x00007fca6419b9b0 [0.025s][info][library] Found Java_jdk_internal_jimage_NativeImageBuffer_getNativeMap in library with handle 0x00007fca64006380 ------------- PR Comment: https://git.openjdk.org/jdk/pull/14227#issuecomment-1875978761 From duke at openjdk.org Wed Jan 3 22:21:07 2024 From: duke at openjdk.org (Joshua Cao) Date: Wed, 3 Jan 2024 22:21:07 GMT Subject: RFR: 8322976: Remove reference to transform_no_reclaim Message-ID: Passes hotspot:tier1 locally ------------- Commit messages: - 8322976: Remove reference to transform_no_reclaim Changes: https://git.openjdk.org/jdk/pull/17255/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=17255&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8322976 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: 
https://git.openjdk.org/jdk/pull/17255.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17255/head:pull/17255 PR: https://git.openjdk.org/jdk/pull/17255 From duke at openjdk.org Wed Jan 3 22:21:07 2024 From: duke at openjdk.org (Joshua Cao) Date: Wed, 3 Jan 2024 22:21:07 GMT Subject: RFR: 8322976: Remove reference to transform_no_reclaim In-Reply-To: References: Message-ID: On Wed, 3 Jan 2024 22:12:03 GMT, Joshua Cao wrote: > Passes hotspot:tier1 locally It's worth considering a hard cap here. For example, calling `apply_ideal` at most eight times might be sufficient for almost all cases. ------------- PR Comment: https://git.openjdk.org/jdk/pull/17255#issuecomment-1876041941 From sgibbons at openjdk.org Thu Jan 4 01:39:39 2024 From: sgibbons at openjdk.org (Scott Gibbons) Date: Thu, 4 Jan 2024 01:39:39 GMT Subject: Integrated: JDK-8321599 Data loss in AVX3 Base64 decoding In-Reply-To: References: Message-ID: On Fri, 8 Dec 2023 20:56:52 GMT, Scott Gibbons wrote: > Fix for looking for padding characters within the encoded string. Was not adding start offset to length, so was looking at potentially freed or uninitialized memory. > > Tested tier1 and with testcase supplied with JBS issue. > > The problem will only occur when all of the following are true: > 1. The source offset of the string to be decoded is != 0. > 2. The characters at the beginning of the string (minus the offset) plus the string length mod 64 are either "=" or "==". > 3. The string is >= 32 characters. > 4. The string is not MIME encoded. > > If any of these conditions are not met, the decode works as expected. This was due to omitting the source offset of the string when checking for padding characters. This pull request has now been integrated.
Changeset: 13c11487 Author: Scott Gibbons Committer: Sandhya Viswanathan URL: https://git.openjdk.org/jdk/commit/13c11487f7126a370d9ce8e62f661ea83eedefe6 Stats: 124 lines in 2 files changed: 121 ins; 0 del; 3 mod 8321599: Data loss in AVX3 Base64 decoding Reviewed-by: sviswanathan, kvn ------------- PR: https://git.openjdk.org/jdk/pull/17039 From sviswanathan at openjdk.org Thu Jan 4 01:45:28 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Thu, 4 Jan 2024 01:45:28 GMT Subject: RFR: JDK-8321599 Data loss in AVX3 Base64 decoding [v7] In-Reply-To: References: Message-ID: On Wed, 3 Jan 2024 20:29:56 GMT, Vladimir Kozlov wrote: >> Scott Gibbons has updated the pull request incrementally with one additional commit since the last revision: >> >> Fixed copyrights > > Good. Thanks a lot @vnkozlov. ------------- PR Comment: https://git.openjdk.org/jdk/pull/17039#issuecomment-1876195513 From jbhateja at openjdk.org Thu Jan 4 05:33:35 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 4 Jan 2024 05:33:35 GMT Subject: RFR: 8322768: Optimize non-subword vector compress and expand APIs for AVX2 target. Message-ID: <_udOCEVG86x9V_WvYqFTaYnvmXdiZ7LzqxzR-D_ygYs=.db3ed37e-a8fa-413b-83c6-a785aba072ff@github.com> Hi, Patch optimizes non-subword vector compress and expand APIs for x86 AVX2 only targets. Upcoming E-core Xeons (Sierra Forest) and Hybrid CPUs only support AVX2 instruction set. These are very frequently used operation in columnar database filter operation. Implementation uses a lookup table to record permute indices. Table index is computed using mask argument of compress/expand operation. Following are the performance number of JMH micro included with the patch. 
System : Intel(R) Xeon(R) Platinum 8480+ (Sapphire Rapids)

Baseline:
Benchmark                                   (size)   Mode  Cnt     Score  Error   Units
ColumnFilterBenchmark.filterDoubleColumn      1024  thrpt    2   142.767         ops/ms
ColumnFilterBenchmark.filterDoubleColumn      2047  thrpt    2    71.436         ops/ms
ColumnFilterBenchmark.filterDoubleColumn      4096  thrpt    2    35.992         ops/ms
ColumnFilterBenchmark.filterFloatColumn       1024  thrpt    2   182.151         ops/ms
ColumnFilterBenchmark.filterFloatColumn       2047  thrpt    2    91.096         ops/ms
ColumnFilterBenchmark.filterFloatColumn       4096  thrpt    2    44.757         ops/ms
ColumnFilterBenchmark.filterIntColumn         1024  thrpt    2   184.099         ops/ms
ColumnFilterBenchmark.filterIntColumn         2047  thrpt    2    91.981         ops/ms
ColumnFilterBenchmark.filterIntColumn         4096  thrpt    2    45.170         ops/ms
ColumnFilterBenchmark.filterLongColumn        1024  thrpt    2   148.017         ops/ms
ColumnFilterBenchmark.filterLongColumn        2047  thrpt    2    73.516         ops/ms
ColumnFilterBenchmark.filterLongColumn        4096  thrpt    2    36.844         ops/ms

Withopt:
Benchmark                                   (size)   Mode  Cnt     Score  Error   Units
ColumnFilterBenchmark.filterDoubleColumn      1024  thrpt    2  2051.707         ops/ms
ColumnFilterBenchmark.filterDoubleColumn      2047  thrpt    2   914.072         ops/ms
ColumnFilterBenchmark.filterDoubleColumn      4096  thrpt    2   489.898         ops/ms
ColumnFilterBenchmark.filterFloatColumn       1024  thrpt    2  5324.195         ops/ms
ColumnFilterBenchmark.filterFloatColumn       2047  thrpt    2  2587.229         ops/ms
ColumnFilterBenchmark.filterFloatColumn       4096  thrpt    2  1278.665         ops/ms
ColumnFilterBenchmark.filterIntColumn         1024  thrpt    2  4149.384         ops/ms
ColumnFilterBenchmark.filterIntColumn         2047  thrpt    2  1791.170         ops/ms
ColumnFilterBenchmark.filterIntColumn         4096  thrpt    2   974.888         ops/ms
ColumnFilterBenchmark.filterLongColumn        1024  thrpt    2  1128.281         ops/ms
ColumnFilterBenchmark.filterLongColumn        2047  thrpt    2   686.334         ops/ms
ColumnFilterBenchmark.filterLongColumn        4096  thrpt    2   337.170         ops/ms

Kindly review and share your feedback.

Best Regards,
Jatin

-------------

Commit messages:
 - 8322768: Optimize non-subword vector compress and expand APIs for AVX2 target.
Changes: https://git.openjdk.org/jdk/pull/17261/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=17261&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8322768 Stats: 336 lines in 10 files changed: 323 ins; 8 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/17261.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17261/head:pull/17261 PR: https://git.openjdk.org/jdk/pull/17261 From jbhateja at openjdk.org Thu Jan 4 05:39:01 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 4 Jan 2024 05:39:01 GMT Subject: RFR: 8322768: Optimize non-subword vector compress and expand APIs for AVX2 target. [v2] In-Reply-To: <_udOCEVG86x9V_WvYqFTaYnvmXdiZ7LzqxzR-D_ygYs=.db3ed37e-a8fa-413b-83c6-a785aba072ff@github.com> References: <_udOCEVG86x9V_WvYqFTaYnvmXdiZ7LzqxzR-D_ygYs=.db3ed37e-a8fa-413b-83c6-a785aba072ff@github.com> Message-ID: > Hi, > > Patch optimizes non-subword vector compress and expand APIs for x86 AVX2 only targets. > Upcoming E-core Xeons (Sierra Forest) and Hybrid CPUs only support AVX2 instruction set. > These are very frequently used operation in columnar database filter operation. > > Implementation uses a lookup table to record permute indices. Table index is computed using > mask argument of compress/expand operation. > > Following are the performance number of JMH micro included with the patch. 
> > > System : Intel(R) Xeon(R) Platinum 8480+ (Sapphire Rapids) > > Baseline: > Benchmark (size) Mode Cnt Score Error Units > ColumnFilterBenchmark.filterDoubleColumn 1024 thrpt 2 142.767 ops/ms > ColumnFilterBenchmark.filterDoubleColumn 2047 thrpt 2 71.436 ops/ms > ColumnFilterBenchmark.filterDoubleColumn 4096 thrpt 2 35.992 ops/ms > ColumnFilterBenchmark.filterFloatColumn 1024 thrpt 2 182.151 ops/ms > ColumnFilterBenchmark.filterFloatColumn 2047 thrpt 2 91.096 ops/ms > ColumnFilterBenchmark.filterFloatColumn 4096 thrpt 2 44.757 ops/ms > ColumnFilterBenchmark.filterIntColumn 1024 thrpt 2 184.099 ops/ms > ColumnFilterBenchmark.filterIntColumn 2047 thrpt 2 91.981 ops/ms > ColumnFilterBenchmark.filterIntColumn 4096 thrpt 2 45.170 ops/ms > ColumnFilterBenchmark.filterLongColumn 1024 thrpt 2 148.017 ops/ms > ColumnFilterBenchmark.filterLongColumn 2047 thrpt 2 73.516 ops/ms > ColumnFilterBenchmark.filterLongColumn 4096 thrpt 2 36.844 ops/ms > > Withopt: > Benchmark (size) Mode Cnt Score Error Units > ColumnFilterBenchmark.filterDoubleColumn 1024 thrpt 2 2051.707 ops/ms > ColumnFilterBenchmark.filterDoubleColumn 2047 thrpt 2 914.072 ops/ms > ColumnFilterBenchmark.filterDoubleColumn 4096 thrpt 2 489.898 ops/ms > ColumnFilterBenchmark.filterFloatColumn 1024 thrpt 2 5324.195 ops/ms > ColumnFilterBenchmark.filterFloatColumn 2047 thrpt 2 2587.229 ops/ms > ColumnFilterBenchmark.filterFloatColumn 4096 thrpt 2 1278.665 ops/ms > ColumnFilterBenchmark.filterIntColumn 1024 thrpt 2 4149.384 ops/ms > ColumnFilterBenchmark.filterIntColumn 2047 thrpt 2 1791.170 ops/ms > ColumnFilterBenchmark.filterIntColumn ... Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Updating copyright year of modified files. 
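As a rough illustration of the lookup-table idea in the summary above (permute indices recorded per mask value, table indexed by the mask), here is a scalar Java model. It is only a sketch under stated assumptions: the class name, `LANES = 8` (eight int lanes of a 256-bit vector), and the table layout are hypothetical and are not taken from the patch, which emits the table and the permute in the JIT backend.

```java
import java.util.Arrays;

public class CompressLookupSketch {
    static final int LANES = 8; // assumption: eight int lanes in a 256-bit vector

    // PERMUTE_TABLE[mask] lists the source lanes selected by 'mask', in ascending order.
    static final int[][] PERMUTE_TABLE = buildTable();

    static int[][] buildTable() {
        int[][] table = new int[1 << LANES][LANES];
        for (int mask = 0; mask < table.length; mask++) {
            int out = 0;
            for (int lane = 0; lane < LANES; lane++) {
                if ((mask & (1 << lane)) != 0) {
                    table[mask][out++] = lane;
                }
            }
        }
        return table;
    }

    // Scalar model of compress: selected lanes are packed to the front, the rest are zero.
    static int[] compress(int[] src, int mask) {
        int[] perm = PERMUTE_TABLE[mask];      // table index is the mask itself
        int selected = Integer.bitCount(mask);
        int[] dst = new int[LANES];
        for (int i = 0; i < selected; i++) {
            dst[i] = src[perm[i]];
        }
        return dst;
    }

    public static void main(String[] args) {
        int[] src = {10, 11, 12, 13, 14, 15, 16, 17};
        // Lanes 0, 2, 5 and 7 are selected.
        System.out.println(Arrays.toString(compress(src, 0b10100101)));
    }
}
```

In a vectorized version the inner loop of `compress` would presumably become a single permute that takes the table row as its index vector; expand would use the inverse mapping.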
------------- Changes: - all: https://git.openjdk.org/jdk/pull/17261/files - new: https://git.openjdk.org/jdk/pull/17261/files/3f2b6105..6bd9b0ad Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17261&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17261&range=00-01 Stats: 6 lines in 6 files changed: 0 ins; 0 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/17261.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17261/head:pull/17261 PR: https://git.openjdk.org/jdk/pull/17261 From thartmann at openjdk.org Thu Jan 4 06:19:39 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 4 Jan 2024 06:19:39 GMT Subject: [jdk22] RFR: 8321599: Data loss in AVX3 Base64 decoding Message-ID: Hi all, This pull request contains a backport of commit [13c11487](https://github.com/openjdk/jdk/commit/13c11487f7126a370d9ce8e62f661ea83eedefe6) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository. The commit being backported was authored by Scott Gibbons on 4 Jan 2024 and was reviewed by Sandhya Viswanathan and Vladimir Kozlov. Thanks! ------------- Commit messages: - Backport 13c11487f7126a370d9ce8e62f661ea83eedefe6 Changes: https://git.openjdk.org/jdk22/pull/28/files Webrev: https://webrevs.openjdk.org/?repo=jdk22&pr=28&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8321599 Stats: 124 lines in 2 files changed: 121 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk22/pull/28.diff Fetch: git fetch https://git.openjdk.org/jdk22.git pull/28/head:pull/28 PR: https://git.openjdk.org/jdk22/pull/28 From epeter at openjdk.org Thu Jan 4 07:00:51 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 4 Jan 2024 07:00:51 GMT Subject: RFR: 8310190: C2 SuperWord: AlignVector is broken, generates misaligned packs [v58] In-Reply-To: References: Message-ID: On Wed, 3 Jan 2024 19:50:49 GMT, Vladimir Kozlov wrote: >> src/hotspot/share/opto/compile.cpp line 3713: >> >>> 3711: // to ObjectAlignmentInBytes. 
Hence, even if multiple arrays are accessed in >>> 3712: // a loop we can expect at least the following alignment: >>> 3713: jlong guaranteed_alignment = MIN2(vector_width, (jlong)ObjectAlignmentInBytes); >> >> This is more relaxed check than the actual alignment required. As I understand it is because it checks only base address of array and not actually memory address to which vector instruction is accessed (which is (base,index,offset)). >> It is useful but does not guarantee correct alignment of vector access instructions. >> >> Consider using `lea` instruction on x86 to load memory address into register and check it. > > May be hack only `loadV` and `storeV` instructions in .ad file to use `lea` and do the check. I don't understand this comment. The `LoadVector` and `StoreVector` both have a `MemNode::Address` input, which I think is the memory address. The address itself usually consists of `AddP` nodes, which do the (base, index, offset) computation. These nodes later can be folded into the load/store itself, or be computed with a `lea`. I simply take the address value, check it for alignment and pass it on to the load/store.
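For readers following along, the check on the full (base, index, offset) address can be modelled in plain Java as below. This is a minimal sketch, not the HotSpot code: the class and method names are hypothetical, and the base address, the 16-byte offset and the 4-byte scale are example values chosen for an int array.

```java
public class AlignmentCheckSketch {
    // Effective address of an element: base + constant offset + index * element size.
    static long effectiveAddress(long base, long offset, long index, int scale) {
        return base + offset + index * scale;
    }

    // True if 'address' is a multiple of 'alignment' (alignment must be a power of two).
    static boolean isAligned(long address, int alignment) {
        return (address & (alignment - 1)) == 0;
    }

    public static void main(String[] args) {
        long base = 0x1000;                            // hypothetical 8-byte aligned array base
        long addr = effectiveAddress(base, 16, 4, 4);  // element 4 of an int array
        System.out.println(isAligned(addr, 8));        // 0x1020 is 8-byte aligned
        System.out.println(isAligned(addr + 4, 8));    // one int further is not
    }
}
```

The point is that the mask test is applied to the final address, so a misaligned index or offset is caught even when the array base itself is aligned.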
Take this example:

public class Test {
    static int RANGE = 1024*64;

    public static void main(String[] strArr) {
        int a[] = new int[RANGE];
        test0(a);
    }

    static void test0(int[] a) {
        for (int i = 0; i < RANGE; i++) {
            a[i]++;
        }
    }
}

`./java -XX:CompileCommand=compileonly,Test::test* -XX:+TraceSuperWord -Xcomp -XX:+PrintIdeal -XX:+AlignVector -XX:+VerifyAlignVector -XX:CompileCommand=print,Test::test* Test.java`

This looks like the main loop:

;; B22: # out( B22 B23 ) <- in( B21 B22 ) Loop( B22-B22 inner post of N743) Freq: 4.49988
0x00007f83c8bb2f68: lea 0x10(%rbp,%rbx,4),%r10      ;*iaload {reexecute=0 rethrow=0 return_oop=0}
                                                    ; - Test::test0@12 (line 11)
0x00007f83c8bb2f6d: mov %r10,%r8
0x00007f83c8bb2f70: test $0x7,%r8b
0x00007f83c8bb2f74: je 0x00007f83c8bb2f8a
0x00007f83c8bb2f76: movabs $0x7f83d77e2fc8,%rdi     ; {external_word}
0x00007f83c8bb2f80: and $0xfffffffffffffff0,%rsp
0x00007f83c8bb2f84: callq 0x00007f83d71a0162        ; {runtime_call MacroAssembler::debug64(char*, long, long*)}
0x00007f83c8bb2f89: hlt
0x00007f83c8bb2f8a: test $0x7,%r10b
0x00007f83c8bb2f8e: je 0x00007f83c8bb2fa4
0x00007f83c8bb2f90: movabs $0x7f83d77e2fc8,%rdi     ; {external_word}
0x00007f83c8bb2f9a: and $0xfffffffffffffff0,%rsp
0x00007f83c8bb2f9e: callq 0x00007f83d71a0162        ; {runtime_call MacroAssembler::debug64(char*, long, long*)}
0x00007f83c8bb2fa3: hlt
0x00007f83c8bb2fa4: vpaddd (%r10),%zmm5,%zmm0
0x00007f83c8bb2faa: vmovdqu32 %zmm0,(%r8)           ;*iastore {reexecute=0 rethrow=0 return_oop=0}
                                                    ; - Test::test0@15 (line 11)
0x00007f83c8bb2fb0: add $0x10,%ebx                  ;*iinc {reexecute=0 rethrow=0 return_oop=0}
                                                    ; - Test::test0@16 (line 10)
0x00007f83c8bb2fb3: cmp %r11d,%ebx
0x00007f83c8bb2fb6: jl 0x00007f83c8bb2f68           ;*if_icmpge {reexecute=0 rethrow=0 return_oop=0}
                                                    ; - Test::test0@6 (line 10)

What I see: `lea` computes address, stores to register `r10`. Move value to `r8`, do alignment check `test $0x7,%r8b`, which checks for 8 byte alignment.
We do the same check again with `r10b`, since we use the same address for load and store. And then we directly load/store with those register values: vpaddd (%r10),%zmm5,%zmm0 vmovdqu32 %zmm0,(%r8) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14785#discussion_r1441398603 From epeter at openjdk.org Thu Jan 4 07:00:48 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 4 Jan 2024 07:00:48 GMT Subject: RFR: 8310190: C2 SuperWord: AlignVector is broken, generates misaligned packs [v59] In-Reply-To: References: Message-ID: > I want to push this in JDK23. > After this fix here, I'm doing [this refactoring](https://github.com/openjdk/jdk/pull/16620). > > To calm your nerves: most of the changes are in auto-generated tests, and tests in general. > > **Context** > > `-XX:+AlignVector` ensures that SuperWord only creates LoadVector and StoreVector that can be memory aligned. This is achieved by iterating in the pre-loop until we reach the alignment boundary, then we can start the main loop properly aligned. However, this is not possible in all cases, sometimes some memory accesses cannot be guaranteed to be aligned, and we need to reject vectorization (at least partially, for some of the packs). > > Alignment is split into two tasks: > - Alignment Correctness Checks: only relevant if `-XX:+AlignVector`. Need to reject vectorization if alignment is not possible. We must check if the address of the vector load/store is aligned with (divisible by) `ObjectAlignmentInBytes`. > - Alignment by adjusting pre-loop limit: alignment is desirable even if `-XX:-AlignVector`. We would like to align the vectors with their vector width. > > **Problem** > > I have recently found a bug with our AlignVector [JDK-8310190](https://bugs.openjdk.org/browse/JDK-8310190). > In that bug, we perform a misaligned memory vector access, which results in a `SIGBUS` on an ARM32 machine. > Thanks @fg1417 for confirming this! 
> Hence, we need to fix the alignment correctness checks. > > While working on this task, I also found some bugs in the "alignment by adjusting pre-loop limit": there were cases where it did not align the vectors correctly. > > **Problem Details** > > Reproducer: > > > static void test(short[] a, short[] b, short mask) { > for (int i = 0; i < RANGE; i+=8) { > // Problematic for AlignVector > b[i+0] = (short)(a[i+0] & mask); // best_memref, align 0 > > b[i+3] = (short)(a[i+3] & mask); // pack at offset 6 bytes > b[i+4] = (short)(a[i+4] & mask); > b[i+5] = (short)(a[i+5] & mask); > b[i+6] = (short)(a[i+6] & mask); > } > } > > > During `SuperWord::find_adjacent_refs` we used to check if the references are expected to be aligned. For that, we look at each "group" of references (eg all `LoadS`) and take the reference with the lowest offset. For that chosen reference, we check if it is alignable. If yes, we accept all references of that group, if no we reject all. > > This is problematic as shown in this example. We have references at index offset `0, 3, 4, 5, 6`, and by... 
Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: some minor changes for Vladimir ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14785/files - new: https://git.openjdk.org/jdk/pull/14785/files/d01a0cd9..aef48ab4 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14785&range=58 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14785&range=57-58 Stats: 11 lines in 3 files changed: 9 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/14785.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14785/head:pull/14785 PR: https://git.openjdk.org/jdk/pull/14785 From chagedorn at openjdk.org Thu Jan 4 07:24:25 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 4 Jan 2024 07:24:25 GMT Subject: [jdk22] RFR: 8321599: Data loss in AVX3 Base64 decoding In-Reply-To: References: Message-ID: On Thu, 4 Jan 2024 06:13:11 GMT, Tobias Hartmann wrote: > Hi all, > > This pull request contains a backport of commit [13c11487](https://github.com/openjdk/jdk/commit/13c11487f7126a370d9ce8e62f661ea83eedefe6) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository. > > The commit being backported was authored by Scott Gibbons on 4 Jan 2024 and was reviewed by Sandhya Viswanathan and Vladimir Kozlov. > > Thanks! Looks good! ------------- Marked as reviewed by chagedorn (Reviewer). 
PR Review: https://git.openjdk.org/jdk22/pull/28#pullrequestreview-1803615430 From thartmann at openjdk.org Thu Jan 4 07:52:21 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 4 Jan 2024 07:52:21 GMT Subject: [jdk22] RFR: 8321599: Data loss in AVX3 Base64 decoding In-Reply-To: References: Message-ID: <32lBHdvCnLiTa1NAYA40iq4Cq8YrZSBhLkkHr8qOgvY=.16f829c4-bc5e-4f76-adc3-2f54441c7a01@github.com> On Thu, 4 Jan 2024 06:13:11 GMT, Tobias Hartmann wrote: > Hi all, > > This pull request contains a backport of commit [13c11487](https://github.com/openjdk/jdk/commit/13c11487f7126a370d9ce8e62f661ea83eedefe6) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository. > > The commit being backported was authored by Scott Gibbons on 4 Jan 2024 and was reviewed by Sandhya Viswanathan and Vladimir Kozlov. > > Thanks! Thanks, Christian! ------------- PR Comment: https://git.openjdk.org/jdk22/pull/28#issuecomment-1876649448 From roland at openjdk.org Thu Jan 4 08:12:22 2024 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 4 Jan 2024 08:12:22 GMT Subject: RFR: 8320649: C2: Optimize scoped values [v4] In-Reply-To: References: Message-ID: > This change implements C2 optimizations for calls to > ScopedValue.get(). Indeed, in: > > > v1 = scopedValue.get(); > ... > v2 = scopedValue.get(); > > > `v2` can be replaced by `v1` and the second call to `get()` can be > optimized out. That's true whatever is between the 2 calls unless a > new mapping for `scopedValue` is created in between (when that happens > no optimizations is performed for the method being compiled). Hoisting > a `get()` call out of loop for a loop invariant `scopedValue` should > also be legal in most cases. > > `ScopedValue.get()` is implemented in java code as a 2 step process. A > cache is attached to the current thread object. If the `ScopedValue` > object is in the cache then the result from `get()` is read from > there. 
Otherwise a slow call is performed that also inserts the > mapping in the cache. The cache itself is lazily allocated. One > `ScopedValue` can be hashed to 2 different indexes in the cache. On a > cache probe, both indexes are checked. As a consequence, the process > of probing the cache is a multi step process (check if the cache is > present, check first index, check second index if first index > failed). If the cache is populated early on, then when the method that > calls `ScopedValue.get()` is compiled, profile reports the slow path > as never taken and only the read from the cache is compiled. > > To perform the optimizations, I added 3 new node types to C2: > > - the pair > ScopedValueGetHitsInCacheNode/ScopedValueGetLoadFromCacheNode for > the cache probe > > - a cfg node ScopedValueGetResultNode to help locate the result of the > `get()` call in the IR graph. > > In pseudo code, once the nodes are inserted, the code of a `get()` is: > > > hits_in_the_cache = ScopedValueGetHitsInCache(scopedValue) > if (hits_in_the_cache) { > res = ScopedValueGetLoadFromCache(hits_in_the_cache); > } else { > res = ..; //slow call possibly inlined. Subgraph can be arbitray complex > } > res = ScopedValueGetResult(res) > > > In the snippet: > > > v1 = scopedValue.get(); > ... > v2 = scopedValue.get(); > > > Replacing `v2` by `v1` is then done by starting from the > `ScopedValueGetResult` node for the second `get()` and looking for a > dominating `ScopedValueGetResult` for the same `ScopedValue` > object. When one is found, it is used as a replacement. Eliminating > the second `get()` call is achieved by making > `ScopedValueGetHitsInCache` always successful if there's a dominating > `ScopedValueGetResult` and replacing its companion > `ScopedValueGetLoadFromCache` by the dominating > `ScopedValueGetResult`. > > Hoisting a `g... 
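The two-step lookup described in the quoted summary (lazily allocated per-thread cache, two candidate indexes per key, slow call on a miss that also fills the cache) can be pictured with the simplified Java model below. It is a sketch under stated assumptions only: the class, the slot count, the index functions and the `HashMap` standing in for the real bindings are all hypothetical and do not reflect the actual `ScopedValue` implementation or the new C2 nodes.

```java
import java.util.HashMap;
import java.util.Map;

public class TwoProbeCacheSketch {
    static final int SLOTS = 16; // assumption: power-of-two number of cache slots

    // Lazily allocated cache laid out as [key0, value0, key1, value1, ...].
    private Object[] cache;
    // Stand-in for the real bindings that the slow path searches.
    private final Map<Object, Object> bindings = new HashMap<>();
    int slowCalls = 0;

    void bind(Object key, Object value) {
        bindings.put(key, value);
    }

    static int firstIndex(Object key) {
        return System.identityHashCode(key) & (SLOTS - 1);
    }

    static int secondIndex(Object key) {
        return (System.identityHashCode(key) >>> 16) & (SLOTS - 1);
    }

    Object get(Object key) {
        if (cache != null) {                        // step 1: is the cache present?
            int i = firstIndex(key);
            if (cache[2 * i] == key) {              // step 2: probe the first index
                return cache[2 * i + 1];
            }
            int j = secondIndex(key);
            if (cache[2 * j] == key) {              // step 3: probe the second index
                return cache[2 * j + 1];
            }
        }
        return slowGet(key);                        // miss: slow call
    }

    private Object slowGet(Object key) {
        slowCalls++;
        Object value = bindings.get(key);
        if (cache == null) {
            cache = new Object[2 * SLOTS];
        }
        int i = firstIndex(key);                    // the slow path also inserts the mapping
        cache[2 * i] = key;
        cache[2 * i + 1] = value;
        return value;
    }
}
```

With this model a second `get()` for the same key is served from the cache without another slow call, which is the property the dominating-result replacement relies on.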
Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: merge fix ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16966/files - new: https://git.openjdk.org/jdk/pull/16966/files/c7d1fe84..28fa7f74 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16966&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16966&range=02-03 Stats: 2 lines in 1 file changed: 0 ins; 1 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/16966.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16966/head:pull/16966 PR: https://git.openjdk.org/jdk/pull/16966 From roland at openjdk.org Thu Jan 4 08:08:24 2024 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 4 Jan 2024 08:08:24 GMT Subject: RFR: 8319793: C2 compilation fails with "Bad graph detected in build_loop_late" after JDK-8279888 [v8] In-Reply-To: References: Message-ID: On Wed, 3 Jan 2024 14:46:10 GMT, Aleksey Shipilev wrote: > There are a couple of GHA failures, and those are probably resolved in current master. It would be helpful if you can pull from current master and get a clean run. Done. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16886#issuecomment-1876674519 From epeter at openjdk.org Thu Jan 4 08:18:38 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 4 Jan 2024 08:18:38 GMT Subject: RFR: 8310190: C2 SuperWord: AlignVector is broken, generates misaligned packs [v58] In-Reply-To: References: Message-ID: <9zUqzeJrnPyjjEC0_F9Z5OWzHLqnTN_5c1bzOiK-LqA=.050c0992-7f7f-48f3-b2da-ec81aafa41a6@github.com> On Wed, 3 Jan 2024 18:38:16 GMT, Vladimir Kozlov wrote: >> Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 124 commits: >> >> - Merge branch 'JDK-8311586' of https://github.com/eme64/jdk into JDK-8311586 >> - Apply suggestions from code review by Christian >> >> Co-authored-by: Christian Hagedorn >> - fix copyright year 2024 >> - Merge branch 'master' into JDK-8311586 >> - more comments in SuperWord::adjust_pre_loop_limit_to_align_main_loop_vectors >> - comments about modulo positive / negative values >> - Apply suggestions from code review from Christian >> >> Co-authored-by: Christian Hagedorn >> - more small fixes by Christian >> - fix for yesterday's reviews by Christian >> - improve case analysis empty / constrained / trivial >> - ... and 114 more: https://git.openjdk.org/jdk/compare/06dd7353...d01a0cd9 > > src/hotspot/share/opto/compile.cpp line 1059: > >> 1057: >> 1058: if (AllowVectorizeOnDemand) { >> 1059: if (has_method() && _directive->VectorizeOption) { > > This seems no related. Please explain it. This is my justification in the PR description: > Other Details > > I made VectorizeDebugOption a debug print only flag now. Before this fix, it also had the same effect as VectorizeOption (which ensures that only nodes from the same original pre-unrolling node are packed, preventing hand-unrolled code to be vectorized but enabling some edge cases to be vectorized that would not otherwise vectorize). > > I added is_trace_align_vector with bit 128, since 64 was recently used for is_trace_loop_reverse, removed with [JDK-8309204](https://bugs.openjdk.org/browse/JDK-8309204). > > I plan to refactor VectorizeDebugOption soon, as it now has a few subflags / bits that are not used. I may also refactor how TraceSuperWord works in general. Filed [JDK-8317572](https://bugs.openjdk.org/browse/JDK-8317572). 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14785#discussion_r1441455977 From epeter at openjdk.org Thu Jan 4 08:28:36 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 4 Jan 2024 08:28:36 GMT Subject: RFR: 8310190: C2 SuperWord: AlignVector is broken, generates misaligned packs [v58] In-Reply-To: References: Message-ID: On Thu, 4 Jan 2024 06:55:46 GMT, Emanuel Peter wrote: >> May be hack only `loadV` and `storeV` instructions in .ad file to use `lea` and do the check. > > I don't understand this comment. > The `LoadVector` and `StoreVector` have both a `MemNode::Address` input, which I think it the memory address. > The address itself usually consists of `AddP` nodes, which do the (base, index, offset) computation. These nodes later can be folded into the load/store itself, or be computed with a `lea`. > I simply take the address value, check it for alignment and pass it on to the load/store. > > Take this example: > > public class Test { > static int RANGE = 1024*64; > > public static void main(String[] strArr) { > int a[] = new int[RANGE]; > test0(a); > } > > static void test0(int[] a) { > for (int i = 0; i < RANGE; i++) { > a[i]++; > } > } > } > > > `./java -XX:CompileCommand=compileonly,Test::test* -XX:+TraceSuperWord -Xcomp -XX:+PrintIdeal -XX:+AlignVector -XX:+VerifyAlignVector -XX:CompileCommand=print,Test::test* Test.java` > > This looks like the main loop: > > ;; B22: # out( B22 B23 ) <- in( B21 B22 ) Loop( B22-B22 inner post of N743) Freq: 4.49988 > 0x00007f83c8bb2f68: lea 0x10(%rbp,%rbx,4),%r10 ;*iaload {reexecute=0 rethrow=0 return_oop=0} > ; - Test::test0 at 12 (line 11) > 0x00007f83c8bb2f6d: mov %r10,%r8 > 0x00007f83c8bb2f70: test $0x7,%r8b > 0x00007f83c8bb2f74: je 0x00007f83c8bb2f8a > 0x00007f83c8bb2f76: movabs $0x7f83d77e2fc8,%rdi ; {external_word} > 0x00007f83c8bb2f80: and $0xfffffffffffffff0,%rsp > 0x00007f83c8bb2f84: callq 0x00007f83d71a0162 ; {runtime_call MacroAssembler::debug64(char*, long, long*)} > 
0x00007f83c8bb2f89: hlt > 0x00007f83c8bb2f8a: test $0x7,%r10b > 0x00007f83c8bb2f8e: je 0x00007f83c8bb2fa4 > 0x00007f83c8bb2f90: movabs $0x7f83d77e2fc8,%rdi ; {external_word} > 0x00007f83c8bb2f9a: and $0xfffffffffffffff0,%rsp > 0x00007f83c8bb2f9e: callq 0x00007f83d71a0162 ; {runtime_call MacroAssembler::debug64(char*, long, long*)} > 0x00007f83c8bb2fa3: hlt > 0x00007f83c8bb2fa4: vpaddd (%r10),%zmm5,%zmm0 > 0x00007f83c8bb2faa: vmovdqu32 %zmm0,(%r8) ;*iastore {reexecute=0 rethrow=0 return_oop=0} > ; - Test::test0 at 15 (line 11) > 0x00007f83c8bb2fb0: add $0x10,%ebx ;*iinc {reexecute=0 rethrow=0 return_oop=0} > ; - Test::test0 at 16 (line 10) > 0x00007f83c8bb2fb3: cmp %r11d... And without `-XX:+VerifyAlignVector` ;; B20: # out( B20 B21 ) <- in( B19 B20 ) Loop( B20-B20 inner post of N743) Freq: 4.49988 0x00007ff22cbb2924: vpaddd 0x10(%rbx,%r13,4),%zmm0,%zmm1 0x00007ff22cbb292f: vmovdqu32 %zmm1,0x10(%rbx,%r13,4) ;*iastore {reexecute=0 rethrow=0 return_oop=0} ; - Test::test0 at 15 (line 11) 0x00007ff22cbb293a: add $0x10,%r13d ;*iinc {reexecute=0 rethrow=0 return_oop=0} ; - Test::test0 at 16 (line 10) 0x00007ff22cbb293e: cmp %r10d,%r13d 0x00007ff22cbb2941: jl 0x00007ff22cbb2924 ;*if_icmpge {reexecute=0 rethrow=0 return_oop=0} ; - Test::test0 at 6 (line 10) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14785#discussion_r1441463566 From shade at openjdk.org Thu Jan 4 08:51:29 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 4 Jan 2024 08:51:29 GMT Subject: RFR: 8320128: Clean up Parse constructor for OSR [v3] In-Reply-To: References: <0mxhhLR9DDdpiBhbXWPxCmMzo_OI9DYiIX36xGhxC_c=.1eed99b9-2dc3-4403-b3c1-22b6ebed6079@github.com> Message-ID: <2T5HC8Dvq7RUNkNeYMgFWk5niXTiPnE2k4RDlE3BJZs=.1ec18356-631f-4028-af37-f2ad0d8ec05c@github.com> On Wed, 3 Jan 2024 20:04:17 GMT, Xin Liu wrote: >> There's a special case for the constructor of Parse. If current compilation is OSR and it is handling the top-level method(depth() == 1), then >> >> 1.
_tf = C->tf(); >> 2. _entry_bci = C->entry_bci(); >> 3. _flow = method()->get_osr_flow_analysis(_entry_bci); >> >> We don't need to assign those member data twice. We can also factor out _flow->failing() for the special case and normal cases. >> >> It's worth mentioning that we can't save ciTypeFlow computation because >> get_osr_flow_analysis(_entry_bci) actually needs get_flow_analysis(method()). > > Xin Liu has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: > > - Use atomic logline and resume #ifdef ASSERT. > - Merge branch 'master' into JDK-8320128 > - Update according to reviewer's feedback. > - 8320128: Clean up Parse constructor for OSR src/hotspot/share/opto/parse1.cpp line 511: > 509: if (PrintOpto && (Verbose || WizardMode)) { > 510: if (is_osr_parse()) { > 511: tty->print("OSR @%d type flow bailout: %s", _entry_bci, _flow->failure_reason()); Should be `print_cr`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16669#discussion_r1441483574 From shade at openjdk.org Thu Jan 4 08:47:22 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 4 Jan 2024 08:47:22 GMT Subject: RFR: 8322694: C1: Handle Constant and IfOp in NullCheckEliminator [v2] In-Reply-To: References: <1vvyuwLRjlWItKwCyighjCSM5SNbO4CSEE59hQtCU24=.b4783e52-328a-4ce3-8c92-7b736cea7546@github.com> Message-ID: On Wed, 3 Jan 2024 13:32:32 GMT, Denghui Dong wrote: >> src/hotspot/share/c1/c1_Optimizer.cpp line 888: >> >>> 886: mark_visitable(instr); >>> 887: if (instr->is_pinned() || instr->can_trap() || (instr->as_NullCheck() != nullptr) >>> 888: || (instr->as_Constant() != nullptr && instr->as_Constant()->type()->is_object())) { >> >> Is this just `instr->as_ObjectConstant() != nullptr`? > > Do you mean `insr->type()->as_ObjectConstant() != nullptr`? 
> But we should include other Constants (e.g. `ArrayConstant`, `InstanceConstant`), and those classes don't implement `as_ObjectConstant` Ah, OK then. Yes, I thought ObjectConstant includes ArrayConstant and InstanceConstant. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17191#discussion_r1441479817 From davleopo at openjdk.org Thu Jan 4 08:56:24 2024 From: davleopo at openjdk.org (David Leopoldseder) Date: Thu, 4 Jan 2024 08:56:24 GMT Subject: RFR: 8322636: [JVMCI] HotSpotSpeculationLog can be inconsistent across a single compile [v2] In-Reply-To: References: Message-ID: <-1NI3HcMGiPeVKGOxV7AYi9Zd_hVjO7OEhLOIebDCxc=.d51be9ba-3353-471a-82d5-fdbe6bf74271@github.com> On Tue, 2 Jan 2024 13:37:19 GMT, David Leopoldseder wrote: >> This PR fixes a subtle inconsistency in `HotSpotSpeculationLog` . >> >> Normal uses of `HotSpotSpeculationLog` work by using a `SpeculationReason` and asking the speculation log via `maySpeculate` if the speculation can be performed, i.e., if it failed before for the given method. An example for this can be seen in Graal https://github.com/oracle/graal/blob/master/compiler/src/jdk.graal.compiler/src/jdk/graal/compiler/nodes/loop/CountedLoopInfo.java#L591C15-L591C15 >> The implicit assumption is that the speculation log, `HotSpotSpeculationLog` in particular collects failed speculations at the beginning of a compile and then stays consistent during the compile. Why is that? - Because if there are new failed speculations added to the failed speculations during the compile - the compiler would speculate again on those in an inconsistent way. E.g. at the beginning of a compile a certain speculation has not failed yet and the compiler thinks it can do optimization xyz using a speculation - later during the compilation process it consults the speculation log but gets a different answer. 
All those inconsistent speculations that already failed will later fail code installation in JVMCI anyway (they will throw a bailout during `HotSpotCodeCacheProvider#installCode` https://github.com/openjdk/jdk/blob/master/src/jdk.internal.vm.ci/share/classes/jdk/vm/ci/hotspot/HotSpotSpeculationLog.java#L192 ). Thus, we should at least return a consistent result during a compile. >> The problem for consistency here, which also causes trouble on the Graal side, is that `maySpeculate` itself can collect failed speculations if there have not been any previously, i.e., `failedSpeculations == null`. >> In order to make the speculation log consistent across an entire JVMCI compile this PR removes the collection of failed speculations in `maySpeculate`. > > David Leopoldseder has updated the pull request incrementally with one additional commit since the last revision: > > 8322636: [JVMCI] HotSpotSpeculationLog add javadoc to maySpeculate I did not consider the substrate runtime compilation use case - that may actually lead to the same inconsistency error we have seen here. Probably not relevant now, but if it ever pops up we will need to relax the invariant on the Graal side. Regarding doc changes - what is our final call now? (a) drop all new doc again or (b) keep (whatever) form of the new doc I added? - It's HotSpot-specific, so not wrong. ------------- PR Comment: https://git.openjdk.org/jdk/pull/17183#issuecomment-1876726238 From mli at openjdk.org Thu Jan 4 09:17:28 2024 From: mli at openjdk.org (Hamlin Li) Date: Thu, 4 Jan 2024 09:17:28 GMT Subject: RFR: 8322959: vectorapi: get wrong argument for `limit` in indexPartiallyInUpperRange intrinsic In-Reply-To: References: Message-ID: On Wed, 3 Jan 2024 17:10:07 GMT, Sandhya Viswanathan wrote: >> Hi, >> Can you review this simple fix for indexPartiallyInUpperRange intrinsic? >> Thanks. 
> > src/hotspot/share/opto/vectorIntrinsics.cpp line 3151: > >> 3149: >> 3150: Node* offset = argument(3); >> 3151: Node* limit = argument(4); > > The offset is of long type, so it takes two argument slots (3 and 4). So limit will be argument(5). The original code (limit = argument(5)) looks correct to me. @sviswa7 Oh, thanks for correcting me, I did not realise this. @vnkozlov I did run the tests `test/jdk/jdk/incubator/vector` and `test/hotspot/jtreg/compiler/vectorapi/` after applying the patch. I thought they passed because this intrinsic is not covered yet, but it seems I was wrong. I will close this PR and the bug later. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17247#discussion_r1441509274 From mli at openjdk.org Thu Jan 4 09:17:29 2024 From: mli at openjdk.org (Hamlin Li) Date: Thu, 4 Jan 2024 09:17:29 GMT Subject: Withdrawn: 8322959: vectorapi: get wrong argument for `limit` in indexPartiallyInUpperRange intrinsic In-Reply-To: References: Message-ID: On Wed, 3 Jan 2024 16:05:48 GMT, Hamlin Li wrote: > Hi, > Can you review this simple fix for indexPartiallyInUpperRange intrinsic? > Thanks. This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.org/jdk/pull/17247 From thartmann at openjdk.org Thu Jan 4 09:19:21 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 4 Jan 2024 09:19:21 GMT Subject: [jdk22] Integrated: 8321599: Data loss in AVX3 Base64 decoding In-Reply-To: References: Message-ID: On Thu, 4 Jan 2024 06:13:11 GMT, Tobias Hartmann wrote: > Hi all, > > This pull request contains a backport of commit [13c11487](https://github.com/openjdk/jdk/commit/13c11487f7126a370d9ce8e62f661ea83eedefe6) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository. > > The commit being backported was authored by Scott Gibbons on 4 Jan 2024 and was reviewed by Sandhya Viswanathan and Vladimir Kozlov. > > Thanks! This pull request has now been integrated. 
Changeset: b8c88a3e Author: Tobias Hartmann URL: https://git.openjdk.org/jdk22/commit/b8c88a3e9129bd2f976a8c7631d754fed0765324 Stats: 124 lines in 2 files changed: 121 ins; 0 del; 3 mod 8321599: Data loss in AVX3 Base64 decoding Reviewed-by: chagedorn Backport-of: 13c11487f7126a370d9ce8e62f661ea83eedefe6 ------------- PR: https://git.openjdk.org/jdk22/pull/28 From epeter at openjdk.org Thu Jan 4 10:42:37 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 4 Jan 2024 10:42:37 GMT Subject: RFR: 8310190: C2 SuperWord: AlignVector is broken, generates misaligned packs [v58] In-Reply-To: <9zUqzeJrnPyjjEC0_F9Z5OWzHLqnTN_5c1bzOiK-LqA=.050c0992-7f7f-48f3-b2da-ec81aafa41a6@github.com> References: <9zUqzeJrnPyjjEC0_F9Z5OWzHLqnTN_5c1bzOiK-LqA=.050c0992-7f7f-48f3-b2da-ec81aafa41a6@github.com> Message-ID: On Thu, 4 Jan 2024 08:15:58 GMT, Emanuel Peter wrote: >> src/hotspot/share/opto/compile.cpp line 1059: >> >>> 1057: >>> 1058: if (AllowVectorizeOnDemand) { >>> 1059: if (has_method() && _directive->VectorizeOption) { >> >> This seems no related. Please explain it. > > This is my justification in the PR description: > >> Other Details >> >> I made VectorizeDebugOption a debug print only flag now. Before this fix, it also had the same effect as VectorizeOption (which ensures that only nodes from the same original pre-unrolling node are packed, preventing hand-unrolled code to be vectorized but enabling some edge cases to be vectorized that would not otherwise vectorize). >> >> I added is_trace_align_vector with bit 128, since 64 was recently used for is_trace_loop_reverse, removed with [JDK-8309204](https://bugs.openjdk.org/browse/JDK-8309204). >> >> I plan to refactor VectorizeDebugOption soon, as it now has a few subflags / bits that are not used. I may also refactor how TraceSuperWord works in general. Filed [JDK-8317572](https://bugs.openjdk.org/browse/JDK-8317572). If you really want, then I can not touch `VectorizeDebugOption` at all, i.e. 
not activate `is_trace_align_vector` with that flag, but instead simply use `TraceSuperWord` (that might be a little verbose though). I already have the CSR for [JDK-8315361](https://bugs.openjdk.org/browse/JDK-8315361), so that I can remove `VectorizeDebugOption`. This has 2 effects: 1. remove the product effect of `VectorizeDebugOption`, which is the same effect as enabling `VectorizeOption`. 2. introduce a more general auto-vectorization tracing flag that allows more fine-grained control for debug printing. My idea here was to simply add the alignment tracing to `VectorizeDebugOption`. But currently one cannot enable that tracing without having the side-effects that also `VectorizeOption` has. Hence, I already now remove that product-side effect. @vnkozlov what do you think? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14785#discussion_r1441592772 From shade at openjdk.org Thu Jan 4 10:49:22 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 4 Jan 2024 10:49:22 GMT Subject: RFR: 8322976: Remove reference to transform_no_reclaim In-Reply-To: References: Message-ID: On Wed, 3 Jan 2024 22:12:03 GMT, Joshua Cao wrote: > Passes hotspot:tier1 locally Looks good and trivial. ------------- Marked as reviewed by shade (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/17255#pullrequestreview-1803913477 From epeter at openjdk.org Thu Jan 4 12:38:22 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 4 Jan 2024 12:38:22 GMT Subject: RFR: 8322735: C2: minor improvements of bubble sort used in SuperWord::packset_sort In-Reply-To: References: Message-ID: On Mon, 25 Dec 2023 15:34:55 GMT, Denghui Dong wrote: > A minor improvement could be made for bubble sort in SuperWord::packset_sort to reduce the comparison count in bad cases. > > See https://en.wikipedia.org/wiki/Bubble_sort @D-D-H can you explain what this improves? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/17190#issuecomment-1877028706 From thartmann at openjdk.org Thu Jan 4 12:44:36 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 4 Jan 2024 12:44:36 GMT Subject: RFR: 8310844: [AArch64] C1 compilation fails because monitor offset in OSR buffer is too large for immediate Message-ID: <6vNa0ZYwKL0HXjHjWTyIe6kB5hza1aaXlJbxM4qh8-w=.edc55dc9-08ff-4861-9038-d4b150bcf4f7@github.com> [JDK-8287349](https://bugs.openjdk.org/browse/JDK-8287349) changed the code in `LIR_Assembler::osr_entry()` to use a single `ldp` instruction instead of two `ldr` instructions to load the monitor lock and object from the OSR state. This is not correct because the `ldp` instruction only supports a [7-bit signed immediate value](https://developer.arm.com/documentation/den0024/a/The-A64-instruction-set/Memory-access-instructions/Accessing-multiple-memory-locations). If the offset is larger, for example due to a large number of locals as in `TestLargeMonitorOffset::test`, we hit the `Field too big for insn` guarantee. I suggest to revert [JDK-8287349](https://bugs.openjdk.org/browse/JDK-8287349). I also found two unrelated bugs when working on the reproducer: [JDK-8322992](https://bugs.openjdk.org/browse/JDK-8322992) (javac) and [JDK-8322996](https://bugs.openjdk.org/browse/JDK-8322996) (C2). 
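For illustration, the immediate limit described above can be checked with a small sketch. This is not HotSpot code: the class and method names are hypothetical, and it assumes the 64-bit register form of `ldp`, where the 7-bit signed immediate is scaled by 8, so only byte offsets from -512 to 504 in steps of 8 can be encoded directly.

```java
// Illustrative sketch only; names are hypothetical and not taken from HotSpot.
class LdpOffsetRange {
    // Returns true if byteOffset can be encoded as the scaled, signed 7-bit
    // immediate of an A64 `ldp` that loads two 64-bit registers.
    static boolean fitsLdpImm7(long byteOffset) {
        final int scale = 8;                 // immediate counts 8-byte units
        if (byteOffset % scale != 0) {
            return false;                    // must be a multiple of the access size
        }
        long imm = byteOffset / scale;
        return imm >= -64 && imm <= 63;      // signed 7-bit range
    }

    public static void main(String[] args) {
        // A small OSR buffer offset fits; a large one (many locals) does not.
        System.out.println("offset 16   fits: " + fitsLdpImm7(16));
        System.out.println("offset 4096 fits: " + fitsLdpImm7(4096));
    }
}
```

An offset that fails this check would need either separate `ldr` instructions or an address computed into a scratch register first.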
Thanks, Tobias ------------- Commit messages: - 8310844: [AArch64] C1 compilation fails because monitor offset in OSR buffer is too large for immediate Changes: https://git.openjdk.org/jdk/pull/17266/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=17266&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8310844 Stats: 150 lines in 2 files changed: 147 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/17266.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17266/head:pull/17266 PR: https://git.openjdk.org/jdk/pull/17266 From thartmann at openjdk.org Thu Jan 4 13:09:22 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 4 Jan 2024 13:09:22 GMT Subject: RFR: 8322781: C1: Debug build crash in GraphBuilder::vmap() when print stats [v3] In-Reply-To: References: Message-ID: On Wed, 3 Jan 2024 07:50:01 GMT, Denghui Dong wrote: >> Hi, >> >> Could I have a review of this fix patch that fixes a crash problem in the debug build when -XX:+PrintValueNumbering -XX:+Verbose -XX:-UseLocalValueNumbering >> >> Thanks > > Denghui Dong has updated the pull request incrementally with two additional commits since the last revision: > > - update > - update Thanks, looks good to me. ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/17205#pullrequestreview-1804124535 From thartmann at openjdk.org Thu Jan 4 13:07:25 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 4 Jan 2024 13:07:25 GMT Subject: RFR: 8322976: Remove reference to transform_no_reclaim In-Reply-To: References: Message-ID: On Wed, 3 Jan 2024 22:12:03 GMT, Joshua Cao wrote: > Passes hotspot:tier1 locally Looks good. ------------- Marked as reviewed by thartmann (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/17255#pullrequestreview-1804119363 From ddong at openjdk.org Thu Jan 4 13:14:34 2024 From: ddong at openjdk.org (Denghui Dong) Date: Thu, 4 Jan 2024 13:14:34 GMT Subject: RFR: 8322781: C1: Debug build crash in GraphBuilder::vmap() when print stats [v3] In-Reply-To: References: Message-ID: On Wed, 3 Jan 2024 07:50:01 GMT, Denghui Dong wrote: >> Hi, >> >> Could I have a review of this fix patch that fixes a crash problem in the debug build when -XX:+PrintValueNumbering -XX:+Verbose -XX:-UseLocalValueNumbering >> >> Thanks > > Denghui Dong has updated the pull request incrementally with two additional commits since the last revision: > > - update > - update Thanks for the review. ------------- PR Comment: https://git.openjdk.org/jdk/pull/17205#issuecomment-1877073238 From ddong at openjdk.org Thu Jan 4 13:14:35 2024 From: ddong at openjdk.org (Denghui Dong) Date: Thu, 4 Jan 2024 13:14:35 GMT Subject: Integrated: 8322781: C1: Debug build crash in GraphBuilder::vmap() when print stats In-Reply-To: References: Message-ID: On Fri, 29 Dec 2023 15:02:21 GMT, Denghui Dong wrote: > Hi, > > Could I have a review of this fix patch that fixes a crash problem in the debug build when -XX:+PrintValueNumbering -XX:+Verbose -XX:-UseLocalValueNumbering > > Thanks This pull request has now been integrated. Changeset: 27d5f5c2 Author: Denghui Dong URL: https://git.openjdk.org/jdk/commit/27d5f5c237910bc3d2df62367d2e0a83c1132885 Stats: 14 lines in 2 files changed: 13 ins; 0 del; 1 mod 8322781: C1: Debug build crash in GraphBuilder::vmap() when print stats Reviewed-by: kvn, thartmann, shade ------------- PR: https://git.openjdk.org/jdk/pull/17205 From epeter at openjdk.org Thu Jan 4 13:45:26 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 4 Jan 2024 13:45:26 GMT Subject: RFR: 8322768: Optimize non-subword vector compress and expand APIs for AVX2 target. 
[v2] In-Reply-To: References: <_udOCEVG86x9V_WvYqFTaYnvmXdiZ7LzqxzR-D_ygYs=.db3ed37e-a8fa-413b-83c6-a785aba072ff@github.com> Message-ID: On Thu, 4 Jan 2024 05:39:01 GMT, Jatin Bhateja wrote: >> Hi, >> >> Patch optimizes non-subword vector compress and expand APIs for x86 AVX2 only targets. >> Upcoming E-core Xeons (Sierra Forest) and Hybrid CPUs only support AVX2 instruction set. >> These are very frequently used APIs in columnar database filter operation. >> >> Implementation uses a lookup table to record permute indices. Table index is computed using >> mask argument of compress/expand operation. >> >> Following are the performance number of JMH micro included with the patch. >> >> >> System : Intel(R) Xeon(R) Platinum 8480+ (Sapphire Rapids) >> >> Baseline: >> Benchmark (size) Mode Cnt Score Error Units >> ColumnFilterBenchmark.filterDoubleColumn 1024 thrpt 2 142.767 ops/ms >> ColumnFilterBenchmark.filterDoubleColumn 2047 thrpt 2 71.436 ops/ms >> ColumnFilterBenchmark.filterDoubleColumn 4096 thrpt 2 35.992 ops/ms >> ColumnFilterBenchmark.filterFloatColumn 1024 thrpt 2 182.151 ops/ms >> ColumnFilterBenchmark.filterFloatColumn 2047 thrpt 2 91.096 ops/ms >> ColumnFilterBenchmark.filterFloatColumn 4096 thrpt 2 44.757 ops/ms >> ColumnFilterBenchmark.filterIntColumn 1024 thrpt 2 184.099 ops/ms >> ColumnFilterBenchmark.filterIntColumn 2047 thrpt 2 91.981 ops/ms >> ColumnFilterBenchmark.filterIntColumn 4096 thrpt 2 45.170 ops/ms >> ColumnFilterBenchmark.filterLongColumn 1024 thrpt 2 148.017 ops/ms >> ColumnFilterBenchmark.filterLongColumn 2047 thrpt 2 73.516 ops/ms >> ColumnFilterBenchmark.filterLongColumn 4096 thrpt 2 36.844 ops/ms >> >> Withopt: >> Benchmark (size) Mode Cnt Score Error Units >> ColumnFilterBenchmark.filterDoubleColumn 1024 thrpt 2 2051.707 ops/ms >> ColumnFilterBenchmark.filterDoubleColumn 2047 thrpt 2 914.072 ops/ms >> ColumnFilterBenchmark.filterDoubleColumn 4096 thrpt 2 489.898 ops/ms >> ColumnFilterBenchmark.filterFloatColumn 1024 thrpt 2 5324.195 
ops/ms >> ColumnFilterBenchmark.filterFloatColumn 2047 thrpt 2 2587.229 ops/ms >> ColumnFilterBenchmark.filterFloatColumn 4096 thrpt 2 1278.665 ops/ms >> ColumnFilterBenchmark.filterIntColumn 1024 thrpt 2 4149.384 ops/ms >> ColumnFilterBenchmark.filterIntColumn 2047 thrpt ... > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Updating copyright year of modified files. @jatin-bhateja this looks like a great improvement! I have a few questions and requests below. FYI, this feels very inspiring. I'm dreaming of a day where we could do this filtering in the auto-vectorizer directly. src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 5303: > 5301: // Blend the results with zero vector using permute vector as mask, its > 5302: // non-participating lanes holds a -1 value. > 5303: vblendvps(dst, dst, xtmp, permv, vec_enc); would you mind adding a few more comments to explain what happens here? I would also really appreciate more expressive register/variable names. I think you are basically converting the `mask` to a permutation `permv`, by a lookup in the table. Then you permute the `src` and blend it with a -1 vector, so that the unused (high) lanes are -1. xtmp -> min_one rtmp -> table_index rscratch -> table_adr src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 5307: > 5305: assert(bt == T_LONG || bt == T_DOUBLE, ""); > 5306: vmovmskpd(rtmp, mask, vec_enc); > 5307: shlq(rtmp, 5); Might this need to be 6? If I understand right, then you want to have a 64bit stride, hence 2^6, right? If that is correct, then this did not show in your tests, and you need a regression test anyway. 
src/hotspot/cpu/x86/c2_MacroAssembler_x86.hpp line 488: > 486: KRegister ktmp1, int vec_enc); > 487: > 488: Please remove this unnecessary empty line. src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 957: > 955: __ align(CodeEntryAlignment); > 956: StubCodeMark mark(this, "StubRoutines", stub_name); > 957: address start = __ pc(); Could you please add some comments here explaining why you are filling the data like this? Presumably, you are emitting 32 bits and 64 bits respectively, right? So the cells have different sizes, correct? test/micro/org/openjdk/bench/jdk/incubator/vector/ColumnFilterBenchmark.java line 76: > 74: longinCol = new long[size]; > 75: longoutCol = new long[size]; > 76: lpivot = size / 2; I'd be interested to see what happens if you move the "density" of accepted elements up or down. Would simple branch prediction be faster if the density is low enough, i.e. if we take almost no elements? Though maybe that is not a compiler problem but a user problem? test/micro/org/openjdk/bench/jdk/incubator/vector/ColumnFilterBenchmark.java line 94: > 92: IntVector vec = IntVector.fromArray(ispecies, intinCol, i); > 93: VectorMask pred = vec.compare(VectorOperators.GT, ipivot); > 94: vec.compress(pred).intoArray(intoutCol, j); Could there be equivalent `expand` tests? ------------- Changes requested by epeter (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/17261#pullrequestreview-1804121213 PR Review Comment: https://git.openjdk.org/jdk/pull/17261#discussion_r1441749005 PR Review Comment: https://git.openjdk.org/jdk/pull/17261#discussion_r1441761312 PR Review Comment: https://git.openjdk.org/jdk/pull/17261#discussion_r1441724949 PR Review Comment: https://git.openjdk.org/jdk/pull/17261#discussion_r1441759984 PR Review Comment: https://git.openjdk.org/jdk/pull/17261#discussion_r1441753158 PR Review Comment: https://git.openjdk.org/jdk/pull/17261#discussion_r1441729256 From epeter at openjdk.org Thu Jan 4 13:45:27 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 4 Jan 2024 13:45:27 GMT Subject: RFR: 8322768: Optimize non-subword vector compress and expand APIs for AVX2 target. [v2] In-Reply-To: References: <_udOCEVG86x9V_WvYqFTaYnvmXdiZ7LzqxzR-D_ygYs=.db3ed37e-a8fa-413b-83c6-a785aba072ff@github.com> Message-ID: On Thu, 4 Jan 2024 13:09:30 GMT, Emanuel Peter wrote: >> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: >> >> Updating copyright year of modified files. > > test/micro/org/openjdk/bench/jdk/incubator/vector/ColumnFilterBenchmark.java line 94: > >> 92: IntVector vec = IntVector.fromArray(ispecies, intinCol, i); >> 93: VectorMask pred = vec.compare(VectorOperators.GT, ipivot); >> 94: vec.compress(pred).intoArray(intoutCol, j); > > Could there be equivalent `expand` tests? And what about some result verification? Or is there another test that does that? 
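On the question of result verification, one possible approach is to compare the vector results against a scalar model. The following is a minimal sketch under stated assumptions: it follows the documented Vector API semantics (compress packs the selected lanes to the front, expand scatters the leading lanes into the selected positions, all other output lanes are zero). The class `CompressReference` and its methods are hypothetical and not part of the PR.

```java
// Hypothetical scalar reference model; not part of the PR or the Vector API.
class CompressReference {
    // Selected lanes are packed to the front in order; remaining lanes are zero.
    static int[] compress(int[] lanes, boolean[] mask) {
        int[] out = new int[lanes.length];
        int j = 0;
        for (int i = 0; i < lanes.length; i++) {
            if (mask[i]) {
                out[j++] = lanes[i];
            }
        }
        return out;
    }

    // Leading lanes are placed into the selected positions; remaining lanes are zero.
    static int[] expand(int[] lanes, boolean[] mask) {
        int[] out = new int[lanes.length];
        int j = 0;
        for (int i = 0; i < lanes.length; i++) {
            if (mask[i]) {
                out[i] = lanes[j++];
            }
        }
        return out;
    }

    public static void main(String[] args) {
        int[] lanes = {5, 1, 7, 2};
        boolean[] mask = {true, false, true, false};
        System.out.println(java.util.Arrays.toString(compress(lanes, mask)));
        System.out.println(java.util.Arrays.toString(expand(compress(lanes, mask), mask)));
    }
}
```

A test could run the same input through `vec.compress(pred)` and this model and compare the arrays lane by lane.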
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17261#discussion_r1441750595 From aph at openjdk.org Thu Jan 4 14:11:25 2024 From: aph at openjdk.org (Andrew Haley) Date: Thu, 4 Jan 2024 14:11:25 GMT Subject: RFR: 8310844: [AArch64] C1 compilation fails because monitor offset in OSR buffer is too large for immediate In-Reply-To: <6vNa0ZYwKL0HXjHjWTyIe6kB5hza1aaXlJbxM4qh8-w=.edc55dc9-08ff-4861-9038-d4b150bcf4f7@github.com> References: <6vNa0ZYwKL0HXjHjWTyIe6kB5hza1aaXlJbxM4qh8-w=.edc55dc9-08ff-4861-9038-d4b150bcf4f7@github.com> Message-ID: On Thu, 4 Jan 2024 12:39:18 GMT, Tobias Hartmann wrote: > [JDK-8287349](https://bugs.openjdk.org/browse/JDK-8287349) changed the code in `LIR_Assembler::osr_entry()` to use a single `ldp` instruction instead of two `ldr` instructions to load the monitor lock and object from the OSR state. This is not correct because the `ldp` instruction only supports a [7-bit signed immediate value](https://developer.arm.com/documentation/den0024/a/The-A64-instruction-set/Memory-access-instructions/Accessing-multiple-memory-locations). If the offset is larger, for example due to a large number of locals as in `TestLargeMonitorOffset::test`, we hit the `Field too big for insn` guarantee. > > I suggest to revert [JDK-8287349](https://bugs.openjdk.org/browse/JDK-8287349). > > I also found two unrelated bugs when working on the reproducer: [JDK-8322992](https://bugs.openjdk.org/browse/JDK-8322992) (javac) and [JDK-8322996](https://bugs.openjdk.org/browse/JDK-8322996) (C2). > > Thanks, > Tobias src/hotspot/cpu/aarch64/c1_LIRAssembler_aarch64.cpp line 289: > 287: __ ldr(r19, Address(OSR_buf, slot_offset + 1*BytesPerWord)); > 288: __ str(r19, frame_map()->address_for_monitor_object(i)); > 289: } The macro assembler automagically fuses `ldr` pairs. 
It'd be better to fix this with: --- a/src/hotspot/cpu/aarch64/c1_LIRAssembler_aarch64.cpp +++ b/src/hotspot/cpu/aarch64/c1_LIRAssembler_aarch64.cpp @@ -282,7 +282,8 @@ void LIR_Assembler::osr_entry() { __ bind(L); } #endif - __ ldp(r19, r20, Address(OSR_buf, slot_offset)); + __ ldr(r19, Address(OSR_buf, slot_offset)); + __ ldr(r20, Address(OSR_buf, slot_offset + BytesPerWord)); __ str(r19, frame_map()->address_for_monitor_lock(i)); __ str(r20, frame_map()->address_for_monitor_object(i)); } ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17266#discussion_r1441789599 From thartmann at openjdk.org Thu Jan 4 14:20:39 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 4 Jan 2024 14:20:39 GMT Subject: RFR: 8310844: [AArch64] C1 compilation fails because monitor offset in OSR buffer is too large for immediate [v2] In-Reply-To: <6vNa0ZYwKL0HXjHjWTyIe6kB5hza1aaXlJbxM4qh8-w=.edc55dc9-08ff-4861-9038-d4b150bcf4f7@github.com> References: <6vNa0ZYwKL0HXjHjWTyIe6kB5hza1aaXlJbxM4qh8-w=.edc55dc9-08ff-4861-9038-d4b150bcf4f7@github.com> Message-ID: <2Hnw1VDu1Ml2XB-VfUV8l-t8dtrMzPNh3RyNPfV-tgA=.fc400df2-9529-4bdd-8e0c-fb49ca4f0b48@github.com> > [JDK-8287349](https://bugs.openjdk.org/browse/JDK-8287349) changed the code in `LIR_Assembler::osr_entry()` to use a single `ldp` instruction instead of two `ldr` instructions to load the monitor lock and object from the OSR state. This is not correct because the `ldp` instruction only supports a [7-bit signed immediate value](https://developer.arm.com/documentation/den0024/a/The-A64-instruction-set/Memory-access-instructions/Accessing-multiple-memory-locations). If the offset is larger, for example due to a large number of locals as in `TestLargeMonitorOffset::test`, we hit the `Field too big for insn` guarantee. > > I suggest to revert [JDK-8287349](https://bugs.openjdk.org/browse/JDK-8287349). 
> > I also found two unrelated bugs when working on the reproducer: [JDK-8322992](https://bugs.openjdk.org/browse/JDK-8322992) (javac) and [JDK-8322996](https://bugs.openjdk.org/browse/JDK-8322996) (C2). > > Thanks, > Tobias Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision: Adjusted according to review ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17266/files - new: https://git.openjdk.org/jdk/pull/17266/files/de6684fd..f888a56d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17266&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17266&range=00-01 Stats: 4 lines in 1 file changed: 1 ins; 1 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/17266.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17266/head:pull/17266 PR: https://git.openjdk.org/jdk/pull/17266 From thartmann at openjdk.org Thu Jan 4 14:20:41 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 4 Jan 2024 14:20:41 GMT Subject: RFR: 8310844: [AArch64] C1 compilation fails because monitor offset in OSR buffer is too large for immediate [v2] In-Reply-To: References: <6vNa0ZYwKL0HXjHjWTyIe6kB5hza1aaXlJbxM4qh8-w=.edc55dc9-08ff-4861-9038-d4b150bcf4f7@github.com> Message-ID: On Thu, 4 Jan 2024 14:08:51 GMT, Andrew Haley wrote: >> Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision: >> >> Adjusted according to review > > src/hotspot/cpu/aarch64/c1_LIRAssembler_aarch64.cpp line 289: > >> 287: __ ldr(r19, Address(OSR_buf, slot_offset + 1*BytesPerWord)); >> 288: __ str(r19, frame_map()->address_for_monitor_object(i)); >> 289: } > > The macro assembler automagically fuses `ldr` pairs. 
It'd be better to fix this with: > > > --- a/src/hotspot/cpu/aarch64/c1_LIRAssembler_aarch64.cpp > +++ b/src/hotspot/cpu/aarch64/c1_LIRAssembler_aarch64.cpp > @@ -282,7 +282,8 @@ void LIR_Assembler::osr_entry() { > __ bind(L); > } > #endif > - __ ldp(r19, r20, Address(OSR_buf, slot_offset)); > + __ ldr(r19, Address(OSR_buf, slot_offset)); > + __ ldr(r20, Address(OSR_buf, slot_offset + BytesPerWord)); > __ str(r19, frame_map()->address_for_monitor_lock(i)); > __ str(r20, frame_map()->address_for_monitor_object(i)); > } Thanks for the review. I adjusted the fix accordingly. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17266#discussion_r1441798599 From aph at openjdk.org Thu Jan 4 15:36:22 2024 From: aph at openjdk.org (Andrew Haley) Date: Thu, 4 Jan 2024 15:36:22 GMT Subject: RFR: 8310844: [AArch64] C1 compilation fails because monitor offset in OSR buffer is too large for immediate [v2] In-Reply-To: <2Hnw1VDu1Ml2XB-VfUV8l-t8dtrMzPNh3RyNPfV-tgA=.fc400df2-9529-4bdd-8e0c-fb49ca4f0b48@github.com> References: <6vNa0ZYwKL0HXjHjWTyIe6kB5hza1aaXlJbxM4qh8-w=.edc55dc9-08ff-4861-9038-d4b150bcf4f7@github.com> <2Hnw1VDu1Ml2XB-VfUV8l-t8dtrMzPNh3RyNPfV-tgA=.fc400df2-9529-4bdd-8e0c-fb49ca4f0b48@github.com> Message-ID: On Thu, 4 Jan 2024 14:20:39 GMT, Tobias Hartmann wrote: >> [JDK-8287349](https://bugs.openjdk.org/browse/JDK-8287349) changed the code in `LIR_Assembler::osr_entry()` to use a single `ldp` instruction instead of two `ldr` instructions to load the monitor lock and object from the OSR state. This is not correct because the `ldp` instruction only supports a [7-bit signed immediate value](https://developer.arm.com/documentation/den0024/a/The-A64-instruction-set/Memory-access-instructions/Accessing-multiple-memory-locations). If the offset is larger, for example due to a large number of locals as in `TestLargeMonitorOffset::test`, we hit the `Field too big for insn` guarantee. 
>> >> I suggest to revert [JDK-8287349](https://bugs.openjdk.org/browse/JDK-8287349). >> >> I also found two unrelated bugs when working on the reproducer: [JDK-8322992](https://bugs.openjdk.org/browse/JDK-8322992) (javac) and [JDK-8322996](https://bugs.openjdk.org/browse/JDK-8322996) (C2). >> >> Thanks, >> Tobias > > Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision: > > Adjusted according to review Marked as reviewed by aph (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/17266#pullrequestreview-1804415472 From thartmann at openjdk.org Thu Jan 4 15:44:21 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 4 Jan 2024 15:44:21 GMT Subject: RFR: 8310844: [AArch64] C1 compilation fails because monitor offset in OSR buffer is too large for immediate [v2] In-Reply-To: <2Hnw1VDu1Ml2XB-VfUV8l-t8dtrMzPNh3RyNPfV-tgA=.fc400df2-9529-4bdd-8e0c-fb49ca4f0b48@github.com> References: <6vNa0ZYwKL0HXjHjWTyIe6kB5hza1aaXlJbxM4qh8-w=.edc55dc9-08ff-4861-9038-d4b150bcf4f7@github.com> <2Hnw1VDu1Ml2XB-VfUV8l-t8dtrMzPNh3RyNPfV-tgA=.fc400df2-9529-4bdd-8e0c-fb49ca4f0b48@github.com> Message-ID: On Thu, 4 Jan 2024 14:20:39 GMT, Tobias Hartmann wrote: >> [JDK-8287349](https://bugs.openjdk.org/browse/JDK-8287349) changed the code in `LIR_Assembler::osr_entry()` to use a single `ldp` instruction instead of two `ldr` instructions to load the monitor lock and object from the OSR state. This is not correct because the `ldp` instruction only supports a [7-bit signed immediate value](https://developer.arm.com/documentation/den0024/a/The-A64-instruction-set/Memory-access-instructions/Accessing-multiple-memory-locations). If the offset is larger, for example due to a large number of locals as in `TestLargeMonitorOffset::test`, we hit the `Field too big for insn` guarantee. >> >> I suggest to revert [JDK-8287349](https://bugs.openjdk.org/browse/JDK-8287349). 
>> >> I also found two unrelated bugs when working on the reproducer: [JDK-8322992](https://bugs.openjdk.org/browse/JDK-8322992) (javac) and [JDK-8322996](https://bugs.openjdk.org/browse/JDK-8322996) (C2). >> >> Thanks, >> Tobias > > Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision: > > Adjusted according to review Thanks for the review, Andrew. ------------- PR Comment: https://git.openjdk.org/jdk/pull/17266#issuecomment-1877311832 From aph at openjdk.org Thu Jan 4 15:51:21 2024 From: aph at openjdk.org (Andrew Haley) Date: Thu, 4 Jan 2024 15:51:21 GMT Subject: RFR: 8310844: [AArch64] C1 compilation fails because monitor offset in OSR buffer is too large for immediate [v2] In-Reply-To: <2Hnw1VDu1Ml2XB-VfUV8l-t8dtrMzPNh3RyNPfV-tgA=.fc400df2-9529-4bdd-8e0c-fb49ca4f0b48@github.com> References: <6vNa0ZYwKL0HXjHjWTyIe6kB5hza1aaXlJbxM4qh8-w=.edc55dc9-08ff-4861-9038-d4b150bcf4f7@github.com> <2Hnw1VDu1Ml2XB-VfUV8l-t8dtrMzPNh3RyNPfV-tgA=.fc400df2-9529-4bdd-8e0c-fb49ca4f0b48@github.com> Message-ID: <1bw-NMQZIKilTwHkcwrOxVeSIYYPI2WEHiCIRnYvFEc=.3da3a815-2fad-4619-ac08-399324ca7e63@github.com> On Thu, 4 Jan 2024 14:20:39 GMT, Tobias Hartmann wrote: >> [JDK-8287349](https://bugs.openjdk.org/browse/JDK-8287349) changed the code in `LIR_Assembler::osr_entry()` to use a single `ldp` instruction instead of two `ldr` instructions to load the monitor lock and object from the OSR state. This is not correct because the `ldp` instruction only supports a [7-bit signed immediate value](https://developer.arm.com/documentation/den0024/a/The-A64-instruction-set/Memory-access-instructions/Accessing-multiple-memory-locations). If the offset is larger, for example due to a large number of locals as in `TestLargeMonitorOffset::test`, we hit the `Field too big for insn` guarantee. >> >> I suggest to revert [JDK-8287349](https://bugs.openjdk.org/browse/JDK-8287349). 
>> >> I also found two unrelated bugs when working on the reproducer: [JDK-8322992](https://bugs.openjdk.org/browse/JDK-8322992) (javac) and [JDK-8322996](https://bugs.openjdk.org/browse/JDK-8322996) (C2). >> >> Thanks, >> Tobias > > Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision: > > Adjusted according to review I looked through the history and I see this bug is my fault, and your fix will have to be back ported to all releases. Argh! Thanks for fixing it. ------------- PR Comment: https://git.openjdk.org/jdk/pull/17266#issuecomment-1877322652 From ddong at openjdk.org Thu Jan 4 15:54:22 2024 From: ddong at openjdk.org (Denghui Dong) Date: Thu, 4 Jan 2024 15:54:22 GMT Subject: RFR: 8322735: C2: minor improvements of bubble sort used in SuperWord::packset_sort In-Reply-To: References: Message-ID: On Thu, 4 Jan 2024 12:36:01 GMT, Emanuel Peter wrote: > And a more fundamental question: Why do we need this improvement? Do you see any timing bottleneck and improvement? And what is faster: bubbling up or down? > > And do you know why we sort at all in `extend_packlist` and why we do it again and again? Sorry, I don't know the theory or implementation of `superword`. (I hope to grasp it someday...) I just found it when browsing the code. This change is trivial; if you think it's unnecessary, I'm fine with closing it. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/17190#issuecomment-1877328019 From thartmann at openjdk.org Thu Jan 4 15:56:21 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 4 Jan 2024 15:56:21 GMT Subject: RFR: 8310844: [AArch64] C1 compilation fails because monitor offset in OSR buffer is too large for immediate [v2] In-Reply-To: <2Hnw1VDu1Ml2XB-VfUV8l-t8dtrMzPNh3RyNPfV-tgA=.fc400df2-9529-4bdd-8e0c-fb49ca4f0b48@github.com> References: <6vNa0ZYwKL0HXjHjWTyIe6kB5hza1aaXlJbxM4qh8-w=.edc55dc9-08ff-4861-9038-d4b150bcf4f7@github.com> <2Hnw1VDu1Ml2XB-VfUV8l-t8dtrMzPNh3RyNPfV-tgA=.fc400df2-9529-4bdd-8e0c-fb49ca4f0b48@github.com> Message-ID: On Thu, 4 Jan 2024 14:20:39 GMT, Tobias Hartmann wrote: >> [JDK-8287349](https://bugs.openjdk.org/browse/JDK-8287349) changed the code in `LIR_Assembler::osr_entry()` to use a single `ldp` instruction instead of two `ldr` instructions to load the monitor lock and object from the OSR state. This is not correct because the `ldp` instruction only supports a [7-bit signed immediate value](https://developer.arm.com/documentation/den0024/a/The-A64-instruction-set/Memory-access-instructions/Accessing-multiple-memory-locations). If the offset is larger, for example due to a large number of locals as in `TestLargeMonitorOffset::test`, we hit the `Field too big for insn` guarantee. >> >> I suggest to revert [JDK-8287349](https://bugs.openjdk.org/browse/JDK-8287349). >> >> I also found two unrelated bugs when working on the reproducer: [JDK-8322992](https://bugs.openjdk.org/browse/JDK-8322992) (javac) and [JDK-8322996](https://bugs.openjdk.org/browse/JDK-8322996) (C2). >> >> Thanks, >> Tobias > > Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision: > > Adjusted according to review But, as I mentioned in the description, it's a regression from [JDK-8287349](https://bugs.openjdk.org/browse/JDK-8287349), right? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/17266#issuecomment-1877332118 From aph at openjdk.org Thu Jan 4 16:08:22 2024 From: aph at openjdk.org (Andrew Haley) Date: Thu, 4 Jan 2024 16:08:22 GMT Subject: RFR: 8310844: [AArch64] C1 compilation fails because monitor offset in OSR buffer is too large for immediate [v2] In-Reply-To: References: <6vNa0ZYwKL0HXjHjWTyIe6kB5hza1aaXlJbxM4qh8-w=.edc55dc9-08ff-4861-9038-d4b150bcf4f7@github.com> <2Hnw1VDu1Ml2XB-VfUV8l-t8dtrMzPNh3RyNPfV-tgA=.fc400df2-9529-4bdd-8e0c-fb49ca4f0b48@github.com> Message-ID: On Thu, 4 Jan 2024 15:53:55 GMT, Tobias Hartmann wrote: > But, as I mentioned in the description, it's a regression from [JDK-8287349](https://bugs.openjdk.org/browse/JDK-8287349), right? Yeah, that's true. A "trivial performance fix," as was said at the time. Memo to myself: there are no trivial performance fixes. ------------- PR Comment: https://git.openjdk.org/jdk/pull/17266#issuecomment-1877354409 From adinn at openjdk.org Thu Jan 4 16:17:21 2024 From: adinn at openjdk.org (Andrew Dinn) Date: Thu, 4 Jan 2024 16:17:21 GMT Subject: RFR: 8310844: [AArch64] C1 compilation fails because monitor offset in OSR buffer is too large for immediate [v2] In-Reply-To: References: <6vNa0ZYwKL0HXjHjWTyIe6kB5hza1aaXlJbxM4qh8-w=.edc55dc9-08ff-4861-9038-d4b150bcf4f7@github.com> Message-ID: On Thu, 4 Jan 2024 14:17:25 GMT, Tobias Hartmann wrote: >> src/hotspot/cpu/aarch64/c1_LIRAssembler_aarch64.cpp line 289: >> >>> 287: __ ldr(r19, Address(OSR_buf, slot_offset + 1*BytesPerWord)); >>> 288: __ str(r19, frame_map()->address_for_monitor_object(i)); >>> 289: } >> >> The macro assembler automagically fuses `ldr` pairs. 
It'd be better to fix this with:
>> 
>> 
>> --- a/src/hotspot/cpu/aarch64/c1_LIRAssembler_aarch64.cpp
>> +++ b/src/hotspot/cpu/aarch64/c1_LIRAssembler_aarch64.cpp
>> @@ -282,7 +282,8 @@ void LIR_Assembler::osr_entry() {
>>  __ bind(L);
>>  }
>>  #endif
>> - __ ldp(r19, r20, Address(OSR_buf, slot_offset));
>> + __ ldr(r19, Address(OSR_buf, slot_offset));
>> + __ ldr(r20, Address(OSR_buf, slot_offset + BytesPerWord));
>>  __ str(r19, frame_map()->address_for_monitor_lock(i));
>>  __ str(r20, frame_map()->address_for_monitor_object(i));
>>  }
> 
> Thanks for the review. I adjusted the fix accordingly.

I'm not sure why the recommended adjustment is needed. The macro assembler does fuse pairs of adjacent ldr instructions into an ldp, but only when the sizes match and the offsets fit into the requisite number of bits.

So, if the two ldr instructions are generated next to each other, the macro assembler should only convert to ldp *where appropriate*. Am I missing something here?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/17266#discussion_r1441959403

From adinn at openjdk.org Thu Jan 4 16:24:22 2024
From: adinn at openjdk.org (Andrew Dinn)
Date: Thu, 4 Jan 2024 16:24:22 GMT
Subject: RFR: 8310844: [AArch64] C1 compilation fails because monitor offset in OSR buffer is too large for immediate [v2]
In-Reply-To: 
References: <6vNa0ZYwKL0HXjHjWTyIe6kB5hza1aaXlJbxM4qh8-w=.edc55dc9-08ff-4861-9038-d4b150bcf4f7@github.com>
Message-ID: 

On Thu, 4 Jan 2024 16:14:41 GMT, Andrew Dinn wrote:

>> Thanks for the review. I adjusted the fix accordingly.
> 
> I'm not sure why the recommended adjustment is needed. The macro assembler does fuse pairs of adjacent ldr instructions into an ldp, but only when the sizes match and the offsets fit into the requisite number of bits.
> 
> So, if the two ldr instructions are generated next to each other, the macro assembler should only convert to ldp *where appropriate*. Am I missing something here?

Doh, sorry - I misread Andrew's proposed code!
Ignore the noise.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/17266#discussion_r1441978216

From aph at openjdk.org Thu Jan 4 16:24:25 2024
From: aph at openjdk.org (Andrew Haley)
Date: Thu, 4 Jan 2024 16:24:25 GMT
Subject: RFR: 8310844: [AArch64] C1 compilation fails because monitor offset in OSR buffer is too large for immediate [v2]
In-Reply-To: 
References: <6vNa0ZYwKL0HXjHjWTyIe6kB5hza1aaXlJbxM4qh8-w=.edc55dc9-08ff-4861-9038-d4b150bcf4f7@github.com>
Message-ID: 

On Thu, 4 Jan 2024 14:17:25 GMT, Tobias Hartmann wrote:

>> src/hotspot/cpu/aarch64/c1_LIRAssembler_aarch64.cpp line 289:
>> 
>>> 287: __ ldr(r19, Address(OSR_buf, slot_offset + 1*BytesPerWord));
>>> 288: __ str(r19, frame_map()->address_for_monitor_object(i));
>>> 289: }
>> 
>> The macro assembler automagically fuses `ldr` pairs. It'd be better to fix this with:
>> 
>> 
>> --- a/src/hotspot/cpu/aarch64/c1_LIRAssembler_aarch64.cpp
>> +++ b/src/hotspot/cpu/aarch64/c1_LIRAssembler_aarch64.cpp
>> @@ -282,7 +282,8 @@ void LIR_Assembler::osr_entry() {
>>  __ bind(L);
>>  }
>>  #endif
>> - __ ldp(r19, r20, Address(OSR_buf, slot_offset));
>> + __ ldr(r19, Address(OSR_buf, slot_offset));
>> + __ ldr(r20, Address(OSR_buf, slot_offset + BytesPerWord));
>>  __ str(r19, frame_map()->address_for_monitor_lock(i));
>>  __ str(r20, frame_map()->address_for_monitor_object(i));
>>  }
> 
> Thanks for the review. I adjusted the fix accordingly.

Yes, the problem @TobiHartmann is fixing is that we currently use `ldp`, but in very rare cases `ldp` can't reach, so the fix we need is to change one `ldp` to two `ldr`s. In almost all cases, the macro assembler will merge the `ldr`s.
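
For readers following the encoding argument, here is a minimal sketch of the range check that decides whether `ldp` "can reach". It is not HotSpot code: the helper names `fits_ldp_imm7_x` and `fits_ldr_uimm12_x` are hypothetical, and the sketch only assumes the A64 encodings for 64-bit registers (a signed 7-bit immediate scaled by 8 for `ldp`, an unsigned 12-bit immediate scaled by 8 for `ldr`).

```cpp
#include <cstdint>

// Hypothetical helper, for illustration only (not part of HotSpot).
// A 64-bit A64 LDP with a signed offset encodes a 7-bit signed immediate
// scaled by the 8-byte register size: multiples of 8 in [-512, 504].
constexpr bool fits_ldp_imm7_x(int64_t byte_offset) {
  constexpr int64_t scale = 8;                 // bytes per 64-bit register
  if (byte_offset % scale != 0) return false;  // must be 8-byte aligned
  const int64_t scaled = byte_offset / scale;
  return scaled >= -64 && scaled <= 63;        // signed 7-bit range
}

// Hypothetical helper, for illustration only (not part of HotSpot).
// A 64-bit A64 LDR with an unsigned offset encodes a 12-bit unsigned
// immediate scaled by 8: multiples of 8 in [0, 32760].
constexpr bool fits_ldr_uimm12_x(int64_t byte_offset) {
  constexpr int64_t scale = 8;
  if (byte_offset < 0 || byte_offset % scale != 0) return false;
  return byte_offset / scale <= 4095;          // unsigned 12-bit range
}

// A slot offset of 504 still fits LDP; 512 no longer does, but LDR copes.
static_assert(fits_ldp_imm7_x(504), "last offset reachable by LDP");
static_assert(!fits_ldp_imm7_x(512), "first offset LDP cannot encode");
static_assert(fits_ldr_uimm12_x(512), "a single LDR still reaches it");
```

Under these assumptions, a pair of `ldr`s works for a much larger range of offsets than one `ldp`, which is consistent with emitting two `ldr`s and leaving any merging to the macro assembler.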
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17266#discussion_r1441983810 From epeter at openjdk.org Thu Jan 4 16:25:36 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 4 Jan 2024 16:25:36 GMT Subject: RFR: 8319793: C2 compilation fails with "Bad graph detected in build_loop_late" after JDK-8279888 [v9] In-Reply-To: References: Message-ID: On Wed, 3 Jan 2024 15:53:04 GMT, Roland Westrelin wrote: >> Range check smearing and range check predication make an array access >> dependent on 2 (or more in the case of RC smearing) conditions. As a >> consequence, if a range check can be eliminated because there's an >> identical dominating range check, the control dependent nodes that >> could float and become dependent on the dominating range check cannot >> be allowed to float because there's a risk that they would then bypass >> one of the checks that make the access legal. >> >> `IfNode::dominated_by()` and `PhaseIdealLoop::dominated_by()` have >> logic to prevent this: nodes that are control dependent on a range >> check or predicate are not allowed to float. This is however not >> sufficient as demonstrated by the test cases. >> >> In `TestArrayAccessAboveRCAfterSmearingOrPredication.testRangeCheckSmearing()`: >> >> >> v += array[i]; >> if (flag2) { >> if (flag3) { >> field = 0x42; >> } >> } >> if (flagField == 1) { >> v += array[i]; >> } >> >> >> The range check for the second `array[i]` load is replaced by the >> dominating range check for the first `array[i]` but because the second >> `array[i]` load could really be dependent on multiple range checks (in >> case smearing happened which is not the case here), c2 doesn't allow >> the second `array[i]` to float when the second range check is >> removed. The second `array[i]` is then control dependent on: >> >> >> if (flagField == 1) { >> >> >> which is next found to be dominated by the same test: >> >> >> if (flag == 1) { >> >> >> and is removed. 
However nothing in `dominated_by()` treats node >> dependent on tests that are not range check or predicates >> specially. So the second `array[i]` is allowed to float and become >> dependent on: >> >> >> if (flag == 1) { >> >> >> which is above the range check for that access. The test method in its >> last invocation is passed an index for the array access that's widely >> out of range. The array load happens before the range check and >> crashes the VM. `testLoopPredication()` is a similar test where array >> loads become dependent on predicates and end up above range checks. >> >> `TestArrayAccessCastIIAboveRC.java` is the test case from the bug >> where for similar reasons a range check `CastII` ends up above its >> range check, becomes top because its input becomes some integer that >> conflicts with its... > > Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 11 commits: > > - Merge branch 'master' into JDK-8319793 > - review > - Revert "Update src/hotspot/share/opto/castnode.hpp" > > This reverts commit 356c91cca911ed486f9f87f3eff53ce21e1e3ec9. > - Revert "Update src/hotspot/share/opto/memnode.hpp" > > This reverts commit bdb731ea562f314f44d327f7243ef5cf9ad40b2e. > - review > - Update src/hotspot/share/opto/memnode.hpp > > Co-authored-by: Christian Hagedorn > - Update src/hotspot/share/opto/castnode.hpp > > Co-authored-by: Christian Hagedorn > - Update src/hotspot/share/opto/castnode.cpp > > Co-authored-by: Christian Hagedorn > - Update src/hotspot/share/opto/ifnode.cpp > > Co-authored-by: Christian Hagedorn > - Merge branch 'master' into JDK-8319793 > - ... and 1 more: https://git.openjdk.org/jdk/compare/15519285...dbe3c4c1 @rwestrel thanks for all the work! Generally I'm very happy with the approach. I mostly left suggestions for better comments and improved naming. 
src/hotspot/share/opto/ifnode.cpp line 573:

> 571: // that these Loads/Casts do not float above any of the dominating checks (even when the lowest dominating check is
> 572: // later replaced by yet another dominating check), we need to pin them at the lowest dominating check.
> 573: proj->pin_array_loads(igvn);

`pin_array_loads` suggests we only care about `Load`. But the comment suggests otherwise.
I would also appreciate it if the comment said why there are now multiple dependencies. Actually, the problem is that we **would** have multiple dependencies, but we only have one dependency input we can set, hence forgetting about the others. Pinning makes sure that there is no bypassing of dependencies, right?

src/hotspot/share/opto/ifnode.cpp line 1501:

> 1499: 
> 1500: //------------------------------dominated_by-----------------------------------
> 1501: Node* IfNode::dominated_by(Node* prev_dom, PhaseIterGVN* igvn, bool range_check_smearing) {

I suggest that you replace `range_check_smearing` with `pin_dependencies` or similar. Basically, say what it will do in this method, rather than what the use case is. Then add a comment above describing the use case, and more comments in the case where you call it with `true`, because the range-check-smearing is not happening here but outside.

src/hotspot/share/opto/ifnode.cpp line 1517:

> 1515: prev_dom = idom;
> 1516: }
> 1517: 

Can you say what exactly this did, and why it is safe to remove?

src/hotspot/share/opto/ifnode.cpp line 1541:

> 1539: // control dependent nodes end up at the lowest/nearest dominating check in the graph. To ensure that these
> 1540: // Loads/Casts do not float above any of the dominating checks (even when the lowest dominating check is later
> 1541: // replaced by yet another dominating check), we need to pin them at the lowest dominating check.

I like this comment. A picture would be a really nice addition.

RC[0] -> true
...
RC[6] -> false
RC[0] -> true
...
RC[3] -> false
ctrl dependent node x, assuming array[3] is safe.

x is first dependent on RC[3], which is now smeared to RC[0] (the lower one) and RC[6]. Now we discover that the lower RC[0] is dominated by the upper one, and skip RC[6]. Now x is only dependent on the upper RC[0], which is true, and does not first check RC[6], which it should check.

I suggest you move all of this to where the range-check-smearing happens.

src/hotspot/share/opto/ifnode.cpp line 1805:

> 1803: --i;
> 1804: }
> 1805: }

This logic looks like it would not just pin array loads, but really any node that has `depends_only_on_test`. That could also be `CastII` or even other nodes like `LoadKlass`, right? If that is true, you should rename this method to something more precise.

src/hotspot/share/opto/ifnode.cpp line 1958:

> 1956: return nullptr;
> 1957: }
> 1958: 

Why is it ok to delay this to post-loop-opts? Does it not prevent some CFG from being eliminated or moved out of loops? It would be nice to have a short justification comment.

src/hotspot/share/opto/loopopts.cpp line 308:

> 306: // IGVN worklist for later cleanup. Move control-dependent data Nodes on the
> 307: // live path up to the dominating control.
> 308: void PhaseIdealLoop::dominated_by(IfProjNode* prevdom, IfNode* iff, bool flip, bool range_check_predicate) {

Can we also rename `range_check_predicate` -> `must_pin_dependencies`, so that it says what it does? And then add a comment to say that it is on when we are doing range check predication, and hence the eliminated RC lay between two predicates, and hence has a dependency on both.

src/hotspot/share/opto/loopopts.cpp line 356:

> 354: _igvn.replace_input_of(cd, 0, prevdom);
> 355: if (range_check_predicate) {
> 356: // Loads and range check Cast nodes that are control dependent on this range check (that is about to be removed)

Here we should now be talking about range check predicates, and not just range checks, right?
src/hotspot/share/opto/loopopts.cpp line 361:

> 359: return; // Let IGVN transformation change control dependence.
> 360: }
> 361: 

Why is it ok to remove this bailout?

src/hotspot/share/opto/memnode.cpp line 851:

> 849: return !Type::cmp( _type, ((LoadNode&)n)._type ) &&
> 850: _control_dependency == ((LoadNode&)n)._control_dependency &&
> 851: _mo == ((LoadNode&)n)._mo;

This might look nicer if you cast `n` once -> `load` and then use that.

src/hotspot/share/opto/node.hpp line 1140:

> 1138: virtual Node* pin_for_array_access() const {
> 1139: return nullptr;
> 1140: }

Can you please add a comment saying what this method is for?

-------------

Changes requested by epeter (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/16886#pullrequestreview-1804393994
PR Review Comment: https://git.openjdk.org/jdk/pull/16886#discussion_r1441946602
PR Review Comment: https://git.openjdk.org/jdk/pull/16886#discussion_r1441920140
PR Review Comment: https://git.openjdk.org/jdk/pull/16886#discussion_r1441949862
PR Review Comment: https://git.openjdk.org/jdk/pull/16886#discussion_r1441930213
PR Review Comment: https://git.openjdk.org/jdk/pull/16886#discussion_r1441935806
PR Review Comment: https://git.openjdk.org/jdk/pull/16886#discussion_r1441953152
PR Review Comment: https://git.openjdk.org/jdk/pull/16886#discussion_r1441963806
PR Review Comment: https://git.openjdk.org/jdk/pull/16886#discussion_r1441959206
PR Review Comment: https://git.openjdk.org/jdk/pull/16886#discussion_r1441972024
PR Review Comment: https://git.openjdk.org/jdk/pull/16886#discussion_r1441979826
PR Review Comment: https://git.openjdk.org/jdk/pull/16886#discussion_r1441895016

From epeter at openjdk.org Thu Jan 4 16:25:37 2024
From: epeter at openjdk.org (Emanuel Peter)
Date: Thu, 4 Jan 2024 16:25:37 GMT
Subject: RFR: 8319793: C2 compilation fails with "Bad graph detected in build_loop_late" after JDK-8279888 [v9]
In-Reply-To: 
References: 
Message-ID: 

On Thu, 4 Jan 2024 16:09:05 GMT, Emanuel Peter wrote:

>> Roland Westrelin
has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 11 commits: >> >> - Merge branch 'master' into JDK-8319793 >> - review >> - Revert "Update src/hotspot/share/opto/castnode.hpp" >> >> This reverts commit 356c91cca911ed486f9f87f3eff53ce21e1e3ec9. >> - Revert "Update src/hotspot/share/opto/memnode.hpp" >> >> This reverts commit bdb731ea562f314f44d327f7243ef5cf9ad40b2e. >> - review >> - Update src/hotspot/share/opto/memnode.hpp >> >> Co-authored-by: Christian Hagedorn >> - Update src/hotspot/share/opto/castnode.hpp >> >> Co-authored-by: Christian Hagedorn >> - Update src/hotspot/share/opto/castnode.cpp >> >> Co-authored-by: Christian Hagedorn >> - Update src/hotspot/share/opto/ifnode.cpp >> >> Co-authored-by: Christian Hagedorn >> - Merge branch 'master' into JDK-8319793 >> - ... and 1 more: https://git.openjdk.org/jdk/compare/15519285...dbe3c4c1 > > src/hotspot/share/opto/ifnode.cpp line 1958: > >> 1956: return nullptr; >> 1957: } >> 1958: > > Why is it ok to delay this to post-loop-opts? Does it not prevent moving some CFG from being eliminated out of loops? Would be nice to have a little justification comment. Ah. Does this mean that if there are multiple RangeCheck in a loop, where some could be smeared, these are not smeared, and then we have more RangeChecks to eliminate out of the loop? Maybe in the end this all comes down to the same anyway. Just wondering. > src/hotspot/share/opto/node.hpp line 1140: > >> 1138: virtual Node* pin_for_array_access() const { >> 1139: return nullptr; >> 1140: } > > Can you please add a comment, what this method is for? Effectively, you want to replace some nodes, such as `Load` and `CastII` into pinned nodes, which have `StrongDependency` or `UnknownControl`. In either case, this means that we will not allow these to float any more. Generally, I'm not really happy with the name of `UnknownControl`. Sounds like the control is unknown. 
In what sense is it unknown, after all we have a control and want the Load to be pinned to it...? Maybe then we could rename `pin_for_array_access` -> `make_pinned`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16886#discussion_r1441954551 PR Review Comment: https://git.openjdk.org/jdk/pull/16886#discussion_r1441907080 From kvn at openjdk.org Thu Jan 4 16:39:38 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 4 Jan 2024 16:39:38 GMT Subject: RFR: 8310190: C2 SuperWord: AlignVector is broken, generates misaligned packs [v58] In-Reply-To: References: <9zUqzeJrnPyjjEC0_F9Z5OWzHLqnTN_5c1bzOiK-LqA=.050c0992-7f7f-48f3-b2da-ec81aafa41a6@github.com> Message-ID: <1oTqtU5lm0As9tKfnWuGNh2sHXfQimLdMCzV2g1D2ho=.6a5a20f7-3ab3-43fb-b640-fa043131fef8@github.com> On Thu, 4 Jan 2024 10:39:11 GMT, Emanuel Peter wrote: >> This is my justification in the PR description: >> >>> Other Details >>> >>> I made VectorizeDebugOption a debug print only flag now. Before this fix, it also had the same effect as VectorizeOption (which ensures that only nodes from the same original pre-unrolling node are packed, preventing hand-unrolled code to be vectorized but enabling some edge cases to be vectorized that would not otherwise vectorize). >>> >>> I added is_trace_align_vector with bit 128, since 64 was recently used for is_trace_loop_reverse, removed with [JDK-8309204](https://bugs.openjdk.org/browse/JDK-8309204). >>> >>> I plan to refactor VectorizeDebugOption soon, as it now has a few subflags / bits that are not used. I may also refactor how TraceSuperWord works in general. Filed [JDK-8317572](https://bugs.openjdk.org/browse/JDK-8317572). > > If you really want, then I can not touch `VectorizeDebugOption` at all, i.e. not activate `is_trace_align_vector` with that flag, but instead simply use `TraceSuperWord` (that might be a little verbose though). 
> 
> I already have the CSR for [JDK-8315361](https://bugs.openjdk.org/browse/JDK-8315361), so that I can remove `VectorizeDebugOption`. This has 2 effects:
> 1. remove the product effect of `VectorizeDebugOption`, which is the same effect as enabling `VectorizeOption`.
> 2. introduce a more general auto-vectorization tracing flag that allows more fine-grained control for debug printing.
> 
> My idea here was to simply add the alignment tracing to `VectorizeDebugOption`. But currently one cannot enable that tracing without having the side-effects that also `VectorizeOption` has. Hence, I already now remove that product-side effect.
> 
> @vnkozlov what do you think?

I missed that in your long description ;^)

I agree with your suggestion. The option was indeed strange: mixing prints with effects on code.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/14785#discussion_r1442000827

From kvn at openjdk.org Thu Jan 4 16:47:35 2024
From: kvn at openjdk.org (Vladimir Kozlov)
Date: Thu, 4 Jan 2024 16:47:35 GMT
Subject: RFR: 8310190: C2 SuperWord: AlignVector is broken, generates misaligned packs [v58]
In-Reply-To: 
References: 
Message-ID: <9qbGMjWjQfwA5r6XimWz_GZUdQEgo6GEzY-uYwIoP4s=.9c68a4f4-a82b-4760-a428-3b5025eda61b@github.com>

On Thu, 4 Jan 2024 08:25:25 GMT, Emanuel Peter wrote:

>> I don't understand this comment.
>> The `LoadVector` and `StoreVector` both have a `MemNode::Address` input, which I think is the memory address.
>> The address itself usually consists of `AddP` nodes, which do the (base, index, offset) computation. These nodes later can be folded into the load/store itself, or be computed with a `lea`.
>> I simply take the address value, check it for alignment and pass it on to the load/store.
>> >> Take this example: >> >> public class Test { >> static int RANGE = 1024*64; >> >> public static void main(String[] strArr) { >> int a[] = new int[RANGE]; >> test0(a); >> } >> >> static void test0(int[] a) { >> for (int i = 0; i < RANGE; i++) { >> a[i]++; >> } >> } >> } >> >> >> `./java -XX:CompileCommand=compileonly,Test::test* -XX:+TraceSuperWord -Xcomp -XX:+PrintIdeal -XX:+AlignVector -XX:+VerifyAlignVector -XX:CompileCommand=print,Test::test* Test.java` >> >> This looks like the main loop: >> >> ;; B22: # out( B22 B23 ) <- in( B21 B22 ) Loop( B22-B22 inner post of N743) Freq: 4.49988 >> 0x00007f83c8bb2f68: lea 0x10(%rbp,%rbx,4),%r10 ;*iaload {reexecute=0 rethrow=0 return_oop=0} >> ; - Test::test0 at 12 (line 11) >> 0x00007f83c8bb2f6d: mov %r10,%r8 >> 0x00007f83c8bb2f70: test $0x7,%r8b >> 0x00007f83c8bb2f74: je 0x00007f83c8bb2f8a >> 0x00007f83c8bb2f76: movabs $0x7f83d77e2fc8,%rdi ; {external_word} >> 0x00007f83c8bb2f80: and $0xfffffffffffffff0,%rsp >> 0x00007f83c8bb2f84: callq 0x00007f83d71a0162 ; {runtime_call MacroAssembler::debug64(char*, long, long*)} >> 0x00007f83c8bb2f89: hlt >> 0x00007f83c8bb2f8a: test $0x7,%r10b >> 0x00007f83c8bb2f8e: je 0x00007f83c8bb2fa4 >> 0x00007f83c8bb2f90: movabs $0x7f83d77e2fc8,%rdi ; {external_word} >> 0x00007f83c8bb2f9a: and $0xfffffffffffffff0,%rsp >> 0x00007f83c8bb2f9e: callq 0x00007f83d71a0162 ; {runtime_call MacroAssembler::debug64(char*, long, long*)} >> 0x00007f83c8bb2fa3: hlt >> 0x00007f83c8bb2fa4: vpaddd (%r10),%zmm5,%zmm0 >> 0x00007f83c8bb2faa: vmovdqu32 %zmm0,(%r8) ;*iastore {reexecute=0 rethrow=0 return_oop=0} >> ; - Test::test0 at 15 (line 11) >> 0x00007f83c8bb2fb0: add $0x10,%ebx ;*iinc {reexecute=0 rethrow=0 return_oop=0} >> ... 
> > And without `-XX:-VerifyAlignVector` > > > ;; B20: # out( B20 B21 ) <- in( B19 B20 ) Loop( B20-B20 inner post of N743) Freq: 4.49988 > 0x00007ff22cbb2924: vpaddd 0x10(%rbx,%r13,4),%zmm0,%zmm1 > 0x00007ff22cbb292f: vmovdqu32 %zmm1,0x10(%rbx,%r13,4) ;*iastore {reexecute=0 rethrow=0 return_oop=0} > ; - Test::test0 at 15 (line 11) > 0x00007ff22cbb293a: add $0x10,%r13d ;*iinc {reexecute=0 rethrow=0 return_oop=0} > ; - Test::test0 at 16 (line 10) > 0x00007ff22cbb293e: cmp %r10d,%r13d > 0x00007ff22cbb2941: jl 0x00007ff22cbb2924 ;*if_icmpge {reexecute=0 rethrow=0 return_oop=0} > ; - Test::test0 at 6 (line 10) Can you show assembler code for simple load and store instructions (move data from one array to another)? My concern is that LoadV and StoreV are defined only with `memory` input: instruct loadV(vec dst, memory mem) %{ match(Set dst (LoadVector mem)); I would assume it will be embedded memory only. But C2 may be smart enough to generate `lea` if it sees not AddP node. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14785#discussion_r1442017423 From kvn at openjdk.org Thu Jan 4 16:54:37 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 4 Jan 2024 16:54:37 GMT Subject: RFR: 8310190: C2 SuperWord: AlignVector is broken, generates misaligned packs [v58] In-Reply-To: <9qbGMjWjQfwA5r6XimWz_GZUdQEgo6GEzY-uYwIoP4s=.9c68a4f4-a82b-4760-a428-3b5025eda61b@github.com> References: <9qbGMjWjQfwA5r6XimWz_GZUdQEgo6GEzY-uYwIoP4s=.9c68a4f4-a82b-4760-a428-3b5025eda61b@github.com> Message-ID: <8FqVnqkZB8_FsKx6kJfkJN8Oi90W6hAeBNl8uQy_QwE=.7d6061b5-4a3d-4d3f-aaa4-1c30151d066d@github.com> On Thu, 4 Jan 2024 16:45:11 GMT, Vladimir Kozlov wrote: >> And without `-XX:-VerifyAlignVector` >> >> >> ;; B20: # out( B20 B21 ) <- in( B19 B20 ) Loop( B20-B20 inner post of N743) Freq: 4.49988 >> 0x00007ff22cbb2924: vpaddd 0x10(%rbx,%r13,4),%zmm0,%zmm1 >> 0x00007ff22cbb292f: vmovdqu32 %zmm1,0x10(%rbx,%r13,4) ;*iastore {reexecute=0 rethrow=0 return_oop=0} >> ; - Test::test0 
at 15 (line 11) >> 0x00007ff22cbb293a: add $0x10,%r13d ;*iinc {reexecute=0 rethrow=0 return_oop=0} >> ; - Test::test0 at 16 (line 10) >> 0x00007ff22cbb293e: cmp %r10d,%r13d >> 0x00007ff22cbb2941: jl 0x00007ff22cbb2924 ;*if_icmpge {reexecute=0 rethrow=0 return_oop=0} >> ; - Test::test0 at 6 (line 10) > > Can you show assembler code for simple load and store instructions (move data from one array to another)? > My concern is that LoadV and StoreV are defined only with `memory` input: > > instruct loadV(vec dst, memory mem) %{ > match(Set dst (LoadVector mem)); > > I would assume it will be embedded memory only. But C2 may be smart enough to generate `lea` if it sees not AddP node. Also why your assembler example have tested alignment twice for the same address? May be because the same array's element for load and store?: 0x00007f83c8bb2f6d: mov %r10,%r8 0x00007f83c8bb2f70: test $0x7,%r8b 0x00007f83c8bb2f74: je 0x00007f83c8bb2f8a ... 0x00007f83c8bb2f8a: test $0x7,%r10b 0x00007f83c8bb2f8e: je 0x00007f83c8bb2fa4 No need to optimize I think since it is only for debugging. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14785#discussion_r1442025069 From duke at openjdk.org Thu Jan 4 16:57:31 2024 From: duke at openjdk.org (Joshua Cao) Date: Thu, 4 Jan 2024 16:57:31 GMT Subject: Integrated: 8322976: Remove reference to transform_no_reclaim In-Reply-To: References: Message-ID: On Wed, 3 Jan 2024 22:12:03 GMT, Joshua Cao wrote: > Passes hotspot:tier1 locally This pull request has now been integrated. 
Changeset: ade40741 Author: Joshua Cao Committer: Xin Liu URL: https://git.openjdk.org/jdk/commit/ade40741cab0b5e4d8519a55ebcd51e386999f5d Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod 8322976: Remove reference to transform_no_reclaim Reviewed-by: shade, thartmann ------------- PR: https://git.openjdk.org/jdk/pull/17255 From xliu at openjdk.org Thu Jan 4 17:06:38 2024 From: xliu at openjdk.org (Xin Liu) Date: Thu, 4 Jan 2024 17:06:38 GMT Subject: RFR: 8320128: Clean up Parse constructor for OSR [v4] In-Reply-To: <0mxhhLR9DDdpiBhbXWPxCmMzo_OI9DYiIX36xGhxC_c=.1eed99b9-2dc3-4403-b3c1-22b6ebed6079@github.com> References: <0mxhhLR9DDdpiBhbXWPxCmMzo_OI9DYiIX36xGhxC_c=.1eed99b9-2dc3-4403-b3c1-22b6ebed6079@github.com> Message-ID: <-QQZANSu7PVpZ3CUX7kbp00f_xkhjiy7la3Z3gXutKA=.a1c938e4-12e0-44b8-a2dd-7cc0c0c9f09f@github.com> > There's a special case for the constructor of Parse. If current compilation is OSR and it is handling the top-level method(depth() == 1), then > > 1. _tf = C->tf(); > 2. _entry_bci = C->entry_bci(); > 3. _flow = method()->get_osr_flow_analysis(_entry_bci); > > We don't need to assign those member data twice. We can also factor out _flow->failing() for the special case and normal cases. > > It's worth mentioning that we can't save ciTypeFlow computation because > get_osr_flow_analysis(_entry_bci) actually needs get_flow_analysis(method()). Xin Liu has updated the pull request incrementally with one additional commit since the last revision: Use print_cr for the log message. 
------------- Changes: - all: https://git.openjdk.org/jdk/pull/16669/files - new: https://git.openjdk.org/jdk/pull/16669/files/ec89638c..1e566c97 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16669&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16669&range=02-03 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/16669.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16669/head:pull/16669 PR: https://git.openjdk.org/jdk/pull/16669 From xliu at openjdk.org Thu Jan 4 17:06:45 2024 From: xliu at openjdk.org (Xin Liu) Date: Thu, 4 Jan 2024 17:06:45 GMT Subject: RFR: 8320128: Clean up Parse constructor for OSR [v3] In-Reply-To: <2T5HC8Dvq7RUNkNeYMgFWk5niXTiPnE2k4RDlE3BJZs=.1ec18356-631f-4028-af37-f2ad0d8ec05c@github.com> References: <0mxhhLR9DDdpiBhbXWPxCmMzo_OI9DYiIX36xGhxC_c=.1eed99b9-2dc3-4403-b3c1-22b6ebed6079@github.com> <2T5HC8Dvq7RUNkNeYMgFWk5niXTiPnE2k4RDlE3BJZs=.1ec18356-631f-4028-af37-f2ad0d8ec05c@github.com> Message-ID: On Thu, 4 Jan 2024 08:48:59 GMT, Aleksey Shipilev wrote: >> Xin Liu has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: >> >> - Use atomic logline and resume #ifdef ASSERT. >> - Merge branch 'master' into JDK-8320128 >> - Update according to reviewer's feedback. >> - 8320128: Clean up Parse constructor for OSR > > src/hotspot/share/opto/parse1.cpp line 511: > >> 509: if (PrintOpto && (Verbose || WizardMode)) { >> 510: if (is_osr_parse()) { >> 511: tty->print("OSR @%d type flow bailout: %s", _entry_bci, _flow->failure_reason()); > > Should be `print_cr`. Sorry, I would have discovered this by myself. updated! 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16669#discussion_r1442038930 From never at openjdk.org Thu Jan 4 17:13:23 2024 From: never at openjdk.org (Tom Rodriguez) Date: Thu, 4 Jan 2024 17:13:23 GMT Subject: RFR: 8322636: [JVMCI] HotSpotSpeculationLog can be inconsistent across a single compile [v2] In-Reply-To: References: Message-ID: On Tue, 2 Jan 2024 13:37:19 GMT, David Leopoldseder wrote: >> This PR fixes a subtle inconsistency in `HotSpotSpeculationLog`. >> >> Normal uses of `HotSpotSpeculationLog` work by using a `SpeculationReason` and asking the speculation log via `maySpeculate` if the speculation can be performed, i.e., if it failed before for the given method. An example for this can be seen in Graal https://github.com/oracle/graal/blob/master/compiler/src/jdk.graal.compiler/src/jdk/graal/compiler/nodes/loop/CountedLoopInfo.java#L591C15-L591C15 >> The implicit assumption is that the speculation log, `HotSpotSpeculationLog` in particular collects failed speculations at the beginning of a compile and then stays consistent during the compile. Why is that? - Because if there are new failed speculations added to the failed speculations during the compile - the compiler would speculate again on those in an inconsistent way. E.g. at the beginning of a compile a certain speculation has not failed yet and the compiler thinks it can do optimization xyz using a speculation - later during the compilation process it consults the speculation log but gets a different answer. All those inconsistent speculations that already failed will anyway later fail code installation in jvmci (they will throw a bailout during `HotSpotCodeCacheProvider#installCode` https://github.com/openjdk/jdk/blob/master/src/jdk.internal.vm.ci/share/classes/jdk/vm/ci/hotspot/HotSpotSpeculationLog.java#L192 ). Thus, we should at least return a consistent result during a compile.
>> The problem for consistency here, that also makes troubles on the graal side, is that `maySpeculate` itself can collect failed speculations if there have not been any previously, i.e., `failedSpeculations == null`. >> In order to make the speculation log consistent across an entire JVMCI compile this PR removes the collection of failed speculations in `maySpeculate`. > > David Leopoldseder has updated the pull request incrementally with one additional commit since the last revision: > > 8322636: [JVMCI] HotSpotSpeculationLog add javadoc to maySpeculate I think for now lets just stick with your updates. It does seem like the substrate runtime compilation case is potentially exposed to the original problem but we should address that separately. ------------- Marked as reviewed by never (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/17183#pullrequestreview-1804639539 From shade at openjdk.org Thu Jan 4 17:19:28 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 4 Jan 2024 17:19:28 GMT Subject: RFR: 8320128: Clean up Parse constructor for OSR [v4] In-Reply-To: <-QQZANSu7PVpZ3CUX7kbp00f_xkhjiy7la3Z3gXutKA=.a1c938e4-12e0-44b8-a2dd-7cc0c0c9f09f@github.com> References: <0mxhhLR9DDdpiBhbXWPxCmMzo_OI9DYiIX36xGhxC_c=.1eed99b9-2dc3-4403-b3c1-22b6ebed6079@github.com> <-QQZANSu7PVpZ3CUX7kbp00f_xkhjiy7la3Z3gXutKA=.a1c938e4-12e0-44b8-a2dd-7cc0c0c9f09f@github.com> Message-ID: <1HiTgoB9oAtaZk5ubbAw6fW5c6c4XAgEjIV0xAtOc5Q=.08d9d0f2-0d76-4fbe-bc2a-4c80037aad43@github.com> On Thu, 4 Jan 2024 17:06:38 GMT, Xin Liu wrote: >> There's a special case for the constructor of Parse. If current compilation is OSR and it is handling the top-level method(depth() == 1), then >> >> 1. _tf = C->tf(); >> 2. _entry_bci = C->entry_bci(); >> 3. _flow = method()->get_osr_flow_analysis(_entry_bci); >> >> We don't need to assign those member data twice. We can also factor out _flow->failing() for the special case and normal cases. 
>> >> It's worth mentioning that we can't save ciTypeFlow computation because >> get_osr_flow_analysis(_entry_bci) actually needs get_flow_analysis(method()). > > Xin Liu has updated the pull request incrementally with one additional commit since the last revision: > > Use print_cr for the log message. src/hotspot/share/opto/parse1.cpp line 414: > 412: if (PrintCompilation || PrintOpto) { > 413: // Make sure I have an inline tree, so I can print messages about it. > 414: InlineTree::find_subtree_from_root(C->ilt(), caller, parse_method); Reading this again, you sure that we don't need `caller->caller()` on `is_osr_parse()` path? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16669#discussion_r1442052577 From xliu at openjdk.org Thu Jan 4 20:13:25 2024 From: xliu at openjdk.org (Xin Liu) Date: Thu, 4 Jan 2024 20:13:25 GMT Subject: RFR: 8320128: Clean up Parse constructor for OSR [v4] In-Reply-To: <1HiTgoB9oAtaZk5ubbAw6fW5c6c4XAgEjIV0xAtOc5Q=.08d9d0f2-0d76-4fbe-bc2a-4c80037aad43@github.com> References: <0mxhhLR9DDdpiBhbXWPxCmMzo_OI9DYiIX36xGhxC_c=.1eed99b9-2dc3-4403-b3c1-22b6ebed6079@github.com> <-QQZANSu7PVpZ3CUX7kbp00f_xkhjiy7la3Z3gXutKA=.a1c938e4-12e0-44b8-a2dd-7cc0c0c9f09f@github.com> <1HiTgoB9oAtaZk5ubbAw6fW5c6c4XAgEjIV0xAtOc5Q=.08d9d0f2-0d76-4fbe-bc2a-4c80037aad43@github.com> Message-ID: <4O0GWJlH3hqZoS74u-cG95rPRNZkzhTj-uINcKxHXNk=.aeca474c-bae3-4216-b968-92233a949c83@github.com> On Thu, 4 Jan 2024 17:16:20 GMT, Aleksey Shipilev wrote: >> Xin Liu has updated the pull request incrementally with one additional commit since the last revision: >> >> Use print_cr for the log message. > > src/hotspot/share/opto/parse1.cpp line 414: > >> 412: if (PrintCompilation || PrintOpto) { >> 413: // Make sure I have an inline tree, so I can print messages about it. >> 414: InlineTree::find_subtree_from_root(C->ilt(), caller, parse_method); > > Reading this again, you sure that we don't need `caller->caller()` on `is_osr_parse()` path? 
First of all, `is_osr_parse()` was false at line 415 because `_entry_bci` was assigned `InvocationEntryBci` right before. That's why I use *caller* directly. Even if we considered building an InlineTree for OSR, I don't think `caller->caller()` is correct. I explain this in item 2 here: https://github.com/openjdk/jdk/pull/16669#issuecomment-1820258714 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16669#discussion_r1442207206 From shade at openjdk.org Thu Jan 4 20:19:26 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 4 Jan 2024 20:19:26 GMT Subject: RFR: 8320128: Clean up Parse constructor for OSR [v4] In-Reply-To: <-QQZANSu7PVpZ3CUX7kbp00f_xkhjiy7la3Z3gXutKA=.a1c938e4-12e0-44b8-a2dd-7cc0c0c9f09f@github.com> References: <0mxhhLR9DDdpiBhbXWPxCmMzo_OI9DYiIX36xGhxC_c=.1eed99b9-2dc3-4403-b3c1-22b6ebed6079@github.com> <-QQZANSu7PVpZ3CUX7kbp00f_xkhjiy7la3Z3gXutKA=.a1c938e4-12e0-44b8-a2dd-7cc0c0c9f09f@github.com> Message-ID: On Thu, 4 Jan 2024 17:06:38 GMT, Xin Liu wrote: >> There's a special case for the constructor of Parse. If current compilation is OSR and it is handling the top-level method(depth() == 1), then >> >> 1. _tf = C->tf(); >> 2. _entry_bci = C->entry_bci(); >> 3. _flow = method()->get_osr_flow_analysis(_entry_bci); >> >> We don't need to assign those member data twice. We can also factor out _flow->failing() for the special case and normal cases. >> >> It's worth mentioning that we can't save ciTypeFlow computation because >> get_osr_flow_analysis(_entry_bci) actually needs get_flow_analysis(method()). > > Xin Liu has updated the pull request incrementally with one additional commit since the last revision: > > Use print_cr for the log message. What testing was done here? I suggest at least `tier{1,2,3}` to capture surprises.
------------- PR Comment: https://git.openjdk.org/jdk/pull/16669#issuecomment-1877705645 From shade at openjdk.org Thu Jan 4 20:19:28 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 4 Jan 2024 20:19:28 GMT Subject: RFR: 8320128: Clean up Parse constructor for OSR [v4] In-Reply-To: <4O0GWJlH3hqZoS74u-cG95rPRNZkzhTj-uINcKxHXNk=.aeca474c-bae3-4216-b968-92233a949c83@github.com> References: <0mxhhLR9DDdpiBhbXWPxCmMzo_OI9DYiIX36xGhxC_c=.1eed99b9-2dc3-4403-b3c1-22b6ebed6079@github.com> <-QQZANSu7PVpZ3CUX7kbp00f_xkhjiy7la3Z3gXutKA=.a1c938e4-12e0-44b8-a2dd-7cc0c0c9f09f@github.com> <1HiTgoB9oAtaZk5ubbAw6fW5c6c4XAgEjIV0xAtOc5Q=.08d9d0f2-0d76-4fbe-bc2a-4c80037aad43@github.com> <4O0GWJlH3hqZoS74u-cG95rPRNZkzhTj-uINcKxHXNk=.aeca474c-bae3-4216-b968-92233a949c83@github.com> Message-ID: On Thu, 4 Jan 2024 20:10:56 GMT, Xin Liu wrote: >> src/hotspot/share/opto/parse1.cpp line 414: >> >>> 412: if (PrintCompilation || PrintOpto) { >>> 413: // Make sure I have an inline tree, so I can print messages about it. >>> 414: InlineTree::find_subtree_from_root(C->ilt(), caller, parse_method); >> >> Reading this again, you sure that we don't need `caller->caller()` on `is_osr_parse()` path? > > first of all, is_osr_parse() was false at line 415 because _entry_bci was assigned to InvocationEntryBci right before. That's why I use *caller* directly. > > Even we consider to build InlineTree for OSR, I don't think caller->caller() is correct. > I explain this in item 2 here. > https://github.com/openjdk/jdk/pull/16669#issuecomment-1820258714 Ah OK, trippy... All good then. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16669#discussion_r1442211363 From dlong at openjdk.org Fri Jan 5 01:01:26 2024 From: dlong at openjdk.org (Dean Long) Date: Fri, 5 Jan 2024 01:01:26 GMT Subject: RFR: 8320649: C2: Optimize scoped values [v4] In-Reply-To: References: Message-ID: On Thu, 4 Jan 2024 08:12:22 GMT, Roland Westrelin wrote: >> This change implements C2 optimizations for calls to >> ScopedValue.get(). Indeed, in: >> >> >> v1 = scopedValue.get(); >> ... >> v2 = scopedValue.get(); >> >> >> `v2` can be replaced by `v1` and the second call to `get()` can be >> optimized out. That's true whatever is between the 2 calls unless a >> new mapping for `scopedValue` is created in between (when that happens >> no optimizations is performed for the method being compiled). Hoisting >> a `get()` call out of loop for a loop invariant `scopedValue` should >> also be legal in most cases. >> >> `ScopedValue.get()` is implemented in java code as a 2 step process. A >> cache is attached to the current thread object. If the `ScopedValue` >> object is in the cache then the result from `get()` is read from >> there. Otherwise a slow call is performed that also inserts the >> mapping in the cache. The cache itself is lazily allocated. One >> `ScopedValue` can be hashed to 2 different indexes in the cache. On a >> cache probe, both indexes are checked. As a consequence, the process >> of probing the cache is a multi step process (check if the cache is >> present, check first index, check second index if first index >> failed). If the cache is populated early on, then when the method that >> calls `ScopedValue.get()` is compiled, profile reports the slow path >> as never taken and only the read from the cache is compiled. 
>> >> To perform the optimizations, I added 3 new node types to C2: >> >> - the pair >> ScopedValueGetHitsInCacheNode/ScopedValueGetLoadFromCacheNode for >> the cache probe >> >> - a cfg node ScopedValueGetResultNode to help locate the result of the >> `get()` call in the IR graph. >> >> In pseudo code, once the nodes are inserted, the code of a `get()` is: >> >> >> hits_in_the_cache = ScopedValueGetHitsInCache(scopedValue) >> if (hits_in_the_cache) { >> res = ScopedValueGetLoadFromCache(hits_in_the_cache); >> } else { >> res = ..; //slow call possibly inlined. Subgraph can be arbitrarily complex >> } >> res = ScopedValueGetResult(res) >> >> >> In the snippet: >> >> >> v1 = scopedValue.get(); >> ... >> v2 = scopedValue.get(); >> >> >> Replacing `v2` by `v1` is then done by starting from the >> `ScopedValueGetResult` node for the second `get()` and looking for a >> dominating `ScopedValueGetResult` for the same `ScopedValue` >> object. When one is found, it is used as a replacement. Eliminating >> the second `get()` call is achieved by making >> `ScopedValueGetHitsInCache` always successful if there's a dominating >> `Scoped... > > Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: > > merge fix I'm not a C2 expert, so my high-level comments might not all make sense, but here goes. - My first reaction was why does this need to be so complicated? Can't we treat the slow path and cache implementation as a black box, and just common up the high-level get()? In my mind the ScopedValue get should be similar to a read from a "condy" dynamic constant. - The reason for breaking up the operations into individual nodes seems to be because of the profiling information. So I'm wondering how much this helps, given the added complexity. - I would expect multiple get() calls in the same method or in loops to be rare and/or programmer errors.
The real advantage seems to be when the binding and the get() can be connected from caller to callee through inlining. - Are we able to optimize a get() on a constant/final ScopedValue into a simple array load at a constant offset? - Needing to do things like treat ScopedValueGetHitsInCache as always successful gives me a bad feeling for some reason, and seems unnecessary if we did more at a higher (macro?) level rather than eagerly expanding the high-level operation into individual nodes. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16966#issuecomment-1877968950 From jbhateja at openjdk.org Fri Jan 5 07:08:35 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 5 Jan 2024 07:08:35 GMT Subject: RFR: 8322768: Optimize non-subword vector compress and expand APIs for AVX2 target. [v3] In-Reply-To: <_udOCEVG86x9V_WvYqFTaYnvmXdiZ7LzqxzR-D_ygYs=.db3ed37e-a8fa-413b-83c6-a785aba072ff@github.com> References: <_udOCEVG86x9V_WvYqFTaYnvmXdiZ7LzqxzR-D_ygYs=.db3ed37e-a8fa-413b-83c6-a785aba072ff@github.com> Message-ID: > Hi, > > Patch optimizes non-subword vector compress and expand APIs for x86 AVX2 only targets. > Upcoming E-core Xeons (Sierra Forest) and Hybrid CPUs only support AVX2 instruction set. > These are very frequently used APIs in columnar database filter operation. > > Implementation uses a lookup table to record permute indices. Table index is computed using > mask argument of compress/expand operation. > > Following are the performance numbers of the JMH micro included with the patch.
> > > System : Intel(R) Xeon(R) Platinum 8480+ (Sapphire Rapids) > > Baseline: > Benchmark (size) Mode Cnt Score Error Units > ColumnFilterBenchmark.filterDoubleColumn 1024 thrpt 2 142.767 ops/ms > ColumnFilterBenchmark.filterDoubleColumn 2047 thrpt 2 71.436 ops/ms > ColumnFilterBenchmark.filterDoubleColumn 4096 thrpt 2 35.992 ops/ms > ColumnFilterBenchmark.filterFloatColumn 1024 thrpt 2 182.151 ops/ms > ColumnFilterBenchmark.filterFloatColumn 2047 thrpt 2 91.096 ops/ms > ColumnFilterBenchmark.filterFloatColumn 4096 thrpt 2 44.757 ops/ms > ColumnFilterBenchmark.filterIntColumn 1024 thrpt 2 184.099 ops/ms > ColumnFilterBenchmark.filterIntColumn 2047 thrpt 2 91.981 ops/ms > ColumnFilterBenchmark.filterIntColumn 4096 thrpt 2 45.170 ops/ms > ColumnFilterBenchmark.filterLongColumn 1024 thrpt 2 148.017 ops/ms > ColumnFilterBenchmark.filterLongColumn 2047 thrpt 2 73.516 ops/ms > ColumnFilterBenchmark.filterLongColumn 4096 thrpt 2 36.844 ops/ms > > Withopt: > Benchmark (size) Mode Cnt Score Error Units > ColumnFilterBenchmark.filterDoubleColumn 1024 thrpt 2 2051.707 ops/ms > ColumnFilterBenchmark.filterDoubleColumn 2047 thrpt 2 914.072 ops/ms > ColumnFilterBenchmark.filterDoubleColumn 4096 thrpt 2 489.898 ops/ms > ColumnFilterBenchmark.filterFloatColumn 1024 thrpt 2 5324.195 ops/ms > ColumnFilterBenchmark.filterFloatColumn 2047 thrpt 2 2587.229 ops/ms > ColumnFilterBenchmark.filterFloatColumn 4096 thrpt 2 1278.665 ops/ms > ColumnFilterBenchmark.filterIntColumn 1024 thrpt 2 4149.384 ops/ms > ColumnFilterBenchmark.filterIntColumn 2047 thrpt 2 1791.170 ops/ms > ColumnFilterBenchmark.filterIntColumn 4096... Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Review comments resolutions. 
------------- Changes: - all: https://git.openjdk.org/jdk/pull/17261/files - new: https://git.openjdk.org/jdk/pull/17261/files/6bd9b0ad..ea0aa0b4 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17261&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17261&range=01-02 Stats: 49 lines in 4 files changed: 44 ins; 1 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/17261.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17261/head:pull/17261 PR: https://git.openjdk.org/jdk/pull/17261 From jbhateja at openjdk.org Fri Jan 5 07:08:37 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 5 Jan 2024 07:08:37 GMT Subject: RFR: 8322768: Optimize non-subword vector compress and expand APIs for AVX2 target. [v2] In-Reply-To: References: <_udOCEVG86x9V_WvYqFTaYnvmXdiZ7LzqxzR-D_ygYs=.db3ed37e-a8fa-413b-83c6-a785aba072ff@github.com> Message-ID: On Thu, 4 Jan 2024 13:41:40 GMT, Emanuel Peter wrote: >> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: >> >> Updating copyright year of modified files. > > src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 5307: > >> 5305: assert(bt == T_LONG || bt == T_DOUBLE, ""); >> 5306: vmovmskpd(rtmp, mask, vec_enc); >> 5307: shlq(rtmp, 5); > > Might this need to be 6? If I understand right, then you want to have a 64bit stride, hence 2^6, right? > If that is correct, then this did not show in your tests, and you need a regression test anyway. This computes the byte offset from start of the table, both integer and long permute table have same row sizes, 8 int elements vs 4 long elements. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17261#discussion_r1442555037 From jbhateja at openjdk.org Fri Jan 5 07:08:39 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 5 Jan 2024 07:08:39 GMT Subject: RFR: 8322768: Optimize non-subword vector compress and expand APIs for AVX2 target. 
[v2] In-Reply-To: References: <_udOCEVG86x9V_WvYqFTaYnvmXdiZ7LzqxzR-D_ygYs=.db3ed37e-a8fa-413b-83c6-a785aba072ff@github.com> Message-ID: <2ykpNJfFJYotuyI59zfh966TIQYUF2Id6NR56zpq_Vw=.6b3b5411-e086-45df-8666-2496ac013548@github.com> On Thu, 4 Jan 2024 13:30:24 GMT, Emanuel Peter wrote: >> test/micro/org/openjdk/bench/jdk/incubator/vector/ColumnFilterBenchmark.java line 94: >> >>> 92: IntVector vec = IntVector.fromArray(ispecies, intinCol, i); >>> 93: VectorMask pred = vec.compare(VectorOperators.GT, ipivot); >>> 94: vec.compress(pred).intoArray(intoutCol, j); >> >> Could there be equivalent `expand` tests? > > And what about some result verification? Or is there another test that does that? We do have extensive functional tests for compress/expand APIs in [test/jdk/jdk/incubator/vector](https://github.com/openjdk/jdk/tree/master/test/jdk/jdk/incubator/vector) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17261#discussion_r1442554968 From jbhateja at openjdk.org Fri Jan 5 07:08:40 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 5 Jan 2024 07:08:40 GMT Subject: RFR: 8322768: Optimize non-subword vector compress and expand APIs for AVX2 target. [v2] In-Reply-To: <2ykpNJfFJYotuyI59zfh966TIQYUF2Id6NR56zpq_Vw=.6b3b5411-e086-45df-8666-2496ac013548@github.com> References: <_udOCEVG86x9V_WvYqFTaYnvmXdiZ7LzqxzR-D_ygYs=.db3ed37e-a8fa-413b-83c6-a785aba072ff@github.com> <2ykpNJfFJYotuyI59zfh966TIQYUF2Id6NR56zpq_Vw=.6b3b5411-e086-45df-8666-2496ac013548@github.com> Message-ID: On Fri, 5 Jan 2024 07:03:26 GMT, Jatin Bhateja wrote: >> And what about some result verification? Or is there another test that does that? > > We do have extensive functional tests for compress/expand APIs in [test/jdk/jdk/incubator/vector](https://github.com/openjdk/jdk/tree/master/test/jdk/jdk/incubator/vector) > Could there be equivalent `expand` tests? 
Here are the performance numbers for existing [VectorAPI JMH micros.](https://github.com/openjdk/panama-vector/tree/vectorIntrinsics/test/micro/org/openjdk/bench/jdk/incubator/vector/operation) ![image](https://github.com/openjdk/jdk/assets/59989778/4b260814-3d3c-4e9b-b81a-61492ea48cce) ![image](https://github.com/openjdk/jdk/assets/59989778/50048281-ad50-44f6-a875-308e02537be2) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17261#discussion_r1442556253 From jbhateja at openjdk.org Fri Jan 5 07:11:23 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 5 Jan 2024 07:11:23 GMT Subject: RFR: 8322768: Optimize non-subword vector compress and expand APIs for AVX2 target. [v2] In-Reply-To: References: <_udOCEVG86x9V_WvYqFTaYnvmXdiZ7LzqxzR-D_ygYs=.db3ed37e-a8fa-413b-83c6-a785aba072ff@github.com> Message-ID: <_guczAND7qope6gMYcZVaolzJE0FnlRfhm9RsgFS5eY=.15982e8f-229f-4d8d-a184-06a62288775a@github.com> On Thu, 4 Jan 2024 13:33:08 GMT, Emanuel Peter wrote: >> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: >> >> Updating copyright year of modified files. > > test/micro/org/openjdk/bench/jdk/incubator/vector/ColumnFilterBenchmark.java line 76: > >> 74: longinCol = new long[size]; >> 75: longoutCol = new long[size]; >> 76: lpivot = size / 2; > > I'd be interested to see what happens if you move up or down the "density" of elements that you accept. Would the simple branch prediction be faster if the density is low enough, i.e. we take almost no elements? > > Though maybe that is not a compiler problem but a user problem? Included fuzzy filter micro with varying mask density.
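As a scalar reference for the table-driven compress discussed in this thread, here is a minimal sketch of a mask-indexed permute table for 8 int lanes, together with the row-size arithmetic behind the shift by 5. The class name `CompressTableSketch`, its fields, and the table layout are hypothetical and chosen only for illustration; they are not taken from the patch, which emits this logic as AVX2 instructions.

```java
// Scalar model of a mask-indexed permute table for an 8-lane int compress.
// Assumption: this layout is illustrative only and is not copied from the patch.
final class CompressTableSketch {
    static final int INT_LANES = 8;   // 256-bit vector of 4-byte ints
    static final int LONG_LANES = 4;  // 256-bit vector of 8-byte longs

    // Both row sizes are 32 bytes, so one shift (<< 5) yields the byte offset for either table.
    static final int INT_ROW_BYTES = INT_LANES * Integer.BYTES;
    static final int LONG_ROW_BYTES = LONG_LANES * Long.BYTES;

    // One row per 8-bit mask; entry i names the source lane that feeds output lane i.
    static final int[][] PERMUTE = new int[1 << INT_LANES][INT_LANES];

    static {
        for (int mask = 0; mask < PERMUTE.length; mask++) {
            int out = 0;
            for (int lane = 0; lane < INT_LANES; lane++) {
                if ((mask & (1 << lane)) != 0) {
                    PERMUTE[mask][out++] = lane;
                }
            }
            // Unused tail entries stay 0; compress() leaves those output lanes zero.
        }
    }

    // Byte offset of the row for a given mask, mirroring the shift by 5 in the review.
    static int rowByteOffset(int mask) {
        return mask << 5;
    }

    // Packs the lanes selected by mask into the low lanes of the result; the rest are zero.
    static int[] compress(int[] src, int mask) {
        int m = mask & 0xFF;
        int[] row = PERMUTE[m];
        int[] dst = new int[INT_LANES];
        int selected = Integer.bitCount(m);
        for (int i = 0; i < selected; i++) {
            dst[i] = src[row[i]];
        }
        return dst;
    }

    public static void main(String[] args) {
        int[] src = {10, 20, 30, 40, 50, 60, 70, 80};
        System.out.println(java.util.Arrays.toString(compress(src, 0b10100101)));
    }
}
```

A table lookup like this does the same amount of work regardless of how many mask bits are set, which is why the density question above is relevant mainly to the branchy scalar alternative.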
![image](https://github.com/openjdk/jdk/assets/59989778/a6af21cc-36c0-4503-aeb3-e66b862da2e1) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17261#discussion_r1442557565 From thartmann at openjdk.org Fri Jan 5 07:16:23 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Fri, 5 Jan 2024 07:16:23 GMT Subject: RFR: 8310844: [AArch64] C1 compilation fails because monitor offset in OSR buffer is too large for immediate [v2] In-Reply-To: References: <6vNa0ZYwKL0HXjHjWTyIe6kB5hza1aaXlJbxM4qh8-w=.edc55dc9-08ff-4861-9038-d4b150bcf4f7@github.com> <2Hnw1VDu1Ml2XB-VfUV8l-t8dtrMzPNh3RyNPfV-tgA=.fc400df2-9529-4bdd-8e0c-fb49ca4f0b48@github.com> Message-ID: On Thu, 4 Jan 2024 16:05:49 GMT, Andrew Haley wrote: > Memo to myself: there are no trivial performance fixes I'll copy that memo, it did look harmless at the time. ------------- PR Comment: https://git.openjdk.org/jdk/pull/17266#issuecomment-1878231823 From thartmann at openjdk.org Fri Jan 5 07:16:24 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Fri, 5 Jan 2024 07:16:24 GMT Subject: RFR: 8310844: [AArch64] C1 compilation fails because monitor offset in OSR buffer is too large for immediate [v2] In-Reply-To: References: <6vNa0ZYwKL0HXjHjWTyIe6kB5hza1aaXlJbxM4qh8-w=.edc55dc9-08ff-4861-9038-d4b150bcf4f7@github.com> Message-ID: On Thu, 4 Jan 2024 16:19:22 GMT, Andrew Dinn wrote: >> I'm not sure why the recommended adjustment is needed. The macro assembler does fuse pairs of adjacent ldr instructions into an ldp but only when the sizes match and the offsets fit into the requisite number of bits. >> >> So, if the two ldr instructions are generated next to each other the macro assembler should only convert to ldp *where appropriate*. Am I missing something here? > > Doh, sorry - I misread Andrew's proposed code! Ignore the noise. Thanks for looking at this @adinn. Right, the macro assembler merge magic is nice, I didn't know about it.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17266#discussion_r1442560775 From epeter at openjdk.org Fri Jan 5 08:25:25 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 5 Jan 2024 08:25:25 GMT Subject: RFR: 8322735: C2: minor improvements of bubble sort used in SuperWord::packset_sort In-Reply-To: References: Message-ID: On Mon, 25 Dec 2023 15:34:55 GMT, Denghui Dong wrote: > A minor improvement could be made for bubble sort in SuperWord::packset_sort to reduce the comparison count in bad cases. > > See https://en.wikipedia.org/wiki/Bubble_sort Ok. This change is fine with me. Thanks for taking the time to look into this :) I was just curious what was your motivation. I may completely redo this code once I remove the alignment constraints (here used for sorting), but that will have to be decided in a few months. Please do the renaming, and then I can run testing and give you my approval. ------------- PR Comment: https://git.openjdk.org/jdk/pull/17190#issuecomment-1878293104 From epeter at openjdk.org Fri Jan 5 08:28:37 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 5 Jan 2024 08:28:37 GMT Subject: RFR: 8310190: C2 SuperWord: AlignVector is broken, generates misaligned packs [v58] In-Reply-To: <1oTqtU5lm0As9tKfnWuGNh2sHXfQimLdMCzV2g1D2ho=.6a5a20f7-3ab3-43fb-b640-fa043131fef8@github.com> References: <9zUqzeJrnPyjjEC0_F9Z5OWzHLqnTN_5c1bzOiK-LqA=.050c0992-7f7f-48f3-b2da-ec81aafa41a6@github.com> <1oTqtU5lm0As9tKfnWuGNh2sHXfQimLdMCzV2g1D2ho=.6a5a20f7-3ab3-43fb-b640-fa043131fef8@github.com> Message-ID: On Thu, 4 Jan 2024 16:36:19 GMT, Vladimir Kozlov wrote: >> If you really want, then I can not touch `VectorizeDebugOption` at all, i.e. not activate `is_trace_align_vector` with that flag, but instead simply use `TraceSuperWord` (that might be a little verbose though). >> >> I already have the CSR for [JDK-8315361](https://bugs.openjdk.org/browse/JDK-8315361), so that I can remove `VectorizeDebugOption`. 
This has 2 effects: >> 1. remove the product effect of `VectorizeDebugOption`, which is the same effect as enabling `VectorizeOption`. >> 2. introduce a more general auto-vectorization tracing flag that allows more fine-grained control for debug printing. >> >> My idea here was to simply add the alignment tracing to `VectorizeDebugOption`. But currently one cannot enable that tracing without having the side-effects that also `VectorizeOption` has. Hence, I already now remove that product-side effect. >> >> @vnkozlov what do you think? > > I missed that in your long description ;^) > I agree with your suggestion. The option was indeed strange: mixing prints with effects on code. Ok, great, I will leave it then. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14785#discussion_r1442607787 From epeter at openjdk.org Fri Jan 5 08:38:36 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 5 Jan 2024 08:38:36 GMT Subject: RFR: 8310190: C2 SuperWord: AlignVector is broken, generates misaligned packs [v58] In-Reply-To: <8FqVnqkZB8_FsKx6kJfkJN8Oi90W6hAeBNl8uQy_QwE=.7d6061b5-4a3d-4d3f-aaa4-1c30151d066d@github.com> References: <9qbGMjWjQfwA5r6XimWz_GZUdQEgo6GEzY-uYwIoP4s=.9c68a4f4-a82b-4760-a428-3b5025eda61b@github.com> <8FqVnqkZB8_FsKx6kJfkJN8Oi90W6hAeBNl8uQy_QwE=.7d6061b5-4a3d-4d3f-aaa4-1c30151d066d@github.com> Message-ID: On Thu, 4 Jan 2024 16:51:19 GMT, Vladimir Kozlov wrote: >> Can you show assembler code for simple load and store instructions (move data from one array to another)? >> My concern is that LoadV and StoreV are defined only with `memory` input: >> >> instruct loadV(vec dst, memory mem) %{ >> match(Set dst (LoadVector mem)); >> >> I would assume it will be embedded memory only. But C2 may be smart enough to generate `lea` if it sees not AddP node. > > Also why your assembler example have tested alignment twice for the same address?
May be because the same array's element for load and store?: > > 0x00007f83c8bb2f6d: mov %r10,%r8 > 0x00007f83c8bb2f70: test $0x7,%r8b > 0x00007f83c8bb2f74: je 0x00007f83c8bb2f8a > ... > 0x00007f83c8bb2f8a: test $0x7,%r10b > 0x00007f83c8bb2f8e: je 0x00007f83c8bb2fa4 > > No need to optimize I think since it is only for debugging. @vnkozlov > Can you show assembler code for simple load and store instructions (move data from one array to another)? Here the example with simple load -> store with two different arrays: public class Test { static int RANGE = 1024*64; public static void main(String[] strArr) { int a[] = new int[RANGE]; int b[] = new int[RANGE]; test0(a, b); } static void test0(int[] a, int[] b) { for (int i = 0; i < RANGE; i++) { a[i] = b[i]; } } } With `-XX:+VerifyAlignVector`: `./java -XX:CompileCommand=compileonly,Test::test* -XX:+TraceSuperWord -Xcomp -XX:+PrintIdeal -XX:+AlignVector -XX:+VerifyAlignVector -XX:CompileCommand=print,Test::test* Test.java` ;; B32: # out( B32 B33 ) <- in( B31 B32 ) Loop( B32-B32 inner post of N1028) Freq: 4.49976 0x00007fbef8bb31ec: movslq %ebx,%r10 0x00007fbef8bb31ef: shl $0x2,%r10 ;*iaload {reexecute=0 rethrow=0 return_oop=0} ; - Test::test0 at 13 (line 12) 0x00007fbef8bb31f3: lea 0x10(%r13,%r10,1),%r8 0x00007fbef8bb31f8: lea 0x10(%r11,%r10,1),%r10 0x00007fbef8bb31fd: test $0x7,%r8b 0x00007fbef8bb3201: je 0x00007fbef8bb3217 0x00007fbef8bb3203: movabs $0x7fbf08c15fc8,%rdi ; {external_word} 0x00007fbef8bb320d: and $0xfffffffffffffff0,%rsp 0x00007fbef8bb3211: callq 0x00007fbf085d3162 ; {runtime_call MacroAssembler::debug64(char*, long, long*)} 0x00007fbef8bb3216: hlt 0x00007fbef8bb3217: vmovdqu32 (%r8),%zmm0 0x00007fbef8bb321d: test $0x7,%r10b 0x00007fbef8bb3221: je 0x00007fbef8bb3237 0x00007fbef8bb3223: movabs $0x7fbf08c15fc8,%rdi ; {external_word} 0x00007fbef8bb322d: and $0xfffffffffffffff0,%rsp 0x00007fbef8bb3231: callq 0x00007fbf085d3162 ; {runtime_call MacroAssembler::debug64(char*, long, long*)} 0x00007fbef8bb3236: 
hlt 0x00007fbef8bb3237: vmovdqu32 %zmm0,(%r10) ;*iastore {reexecute=0 rethrow=0 return_oop=0} ; - Test::test0 at 14 (line 12) 0x00007fbef8bb323d: add $0x10,%ebx ;*iinc {reexecute=0 rethrow=0 return_oop=0} ; - Test::test0 at 15 (line 11) 0x00007fbef8bb3240: cmp %r9d,%ebx 0x00007fbef8bb3243: jl 0x00007fbef8bb31ec ;*if_icmpge {reexecute=0 rethrow=0 return_oop=0} ; - Test::test0 at 6 (line 11) With `-XX:-VerifyAlignVector`: `./java -XX:CompileCommand=compileonly,Test::test* -XX:+TraceSuperWord -Xcomp -XX:+PrintIdeal -XX:+AlignVector -XX:-VerifyAlignVector -XX:CompileCommand=print,Test::test* Test.java` ;; B30: # out( B30 B31 ) <- in( B29 B30 ) Loop( B30-B30 inner post of N1028) Freq: 4.49976 0x00007f90e4bb2ab8: vmovdqu32 0x10(%rbx,%r13,4),%zmm0 0x00007f90e4bb2ac3: vmovdqu32 %zmm0,0x10(%rcx,%r13,4) ;*iastore {reexecute=0 rethrow=0 return_oop=0} ; - Test::test0 at 14 (line 12) 0x00007f90e4bb2ace: add $0x10,%r13d ;*iinc {reexecute=0 rethrow=0 return_oop=0} ; - Test::test0 at 15 (line 11) 0x00007f90e4bb2ad2: cmp %r11d,%r13d 0x00007f90e4bb2ad5: jl 0x00007f90e4bb2ab8 ;*if_icmpge {reexecute=0 rethrow=0 return_oop=0} ; - Test::test0 at 6 (line 11) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14785#discussion_r1442616543 From epeter at openjdk.org Fri Jan 5 08:51:37 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 5 Jan 2024 08:51:37 GMT Subject: RFR: 8310190: C2 SuperWord: AlignVector is broken, generates misaligned packs [v58] In-Reply-To: References: <9qbGMjWjQfwA5r6XimWz_GZUdQEgo6GEzY-uYwIoP4s=.9c68a4f4-a82b-4760-a428-3b5025eda61b@github.com> <8FqVnqkZB8_FsKx6kJfkJN8Oi90W6hAeBNl8uQy_QwE=.7d6061b5-4a3d-4d3f-aaa4-1c30151d066d@github.com> Message-ID: On Fri, 5 Jan 2024 08:35:46 GMT, Emanuel Peter wrote: >> Also why your assembler example have tested alignment twice for the same address? 
May be because the same array's element for load and store?: >> >> 0x00007f83c8bb2f6d: mov %r10,%r8 >> 0x00007f83c8bb2f70: test $0x7,%r8b >> 0x00007f83c8bb2f74: je 0x00007f83c8bb2f8a >> ... >> 0x00007f83c8bb2f8a: test $0x7,%r10b >> 0x00007f83c8bb2f8e: je 0x00007f83c8bb2fa4 >> >> No need to optimize I think since it is only for debugging. > > @vnkozlov >> Can you show assembler code for simple load and store instructions (move data from one array to another)? > > Here the example with simple load -> store with two different arrays: > > public class Test { > static int RANGE = 1024*64; > > public static void main(String[] strArr) { > int a[] = new int[RANGE]; > int b[] = new int[RANGE]; > test0(a, b); > } > > static void test0(int[] a, int[] b) { > for (int i = 0; i < RANGE; i++) { > a[i] = b[i]; > } > } > } > > > With `-XX:+VerifyAlignVector`: > `./java -XX:CompileCommand=compileonly,Test::test* -XX:+TraceSuperWord -Xcomp -XX:+PrintIdeal -XX:+AlignVector -XX:+VerifyAlignVector -XX:CompileCommand=print,Test::test* Test.java` > > > ;; B32: # out( B32 B33 ) <- in( B31 B32 ) Loop( B32-B32 inner post of N1028) Freq: 4.49976 > 0x00007fbef8bb31ec: movslq %ebx,%r10 > 0x00007fbef8bb31ef: shl $0x2,%r10 ;*iaload {reexecute=0 rethrow=0 return_oop=0} > ; - Test::test0 at 13 (line 12) > 0x00007fbef8bb31f3: lea 0x10(%r13,%r10,1),%r8 > 0x00007fbef8bb31f8: lea 0x10(%r11,%r10,1),%r10 > 0x00007fbef8bb31fd: test $0x7,%r8b > 0x00007fbef8bb3201: je 0x00007fbef8bb3217 > 0x00007fbef8bb3203: movabs $0x7fbf08c15fc8,%rdi ; {external_word} > 0x00007fbef8bb320d: and $0xfffffffffffffff0,%rsp > 0x00007fbef8bb3211: callq 0x00007fbf085d3162 ; {runtime_call MacroAssembler::debug64(char*, long, long*)} > 0x00007fbef8bb3216: hlt > 0x00007fbef8bb3217: vmovdqu32 (%r8),%zmm0 > 0x00007fbef8bb321d: test $0x7,%r10b > 0x00007fbef8bb3221: je 0x00007fbef8bb3237 > 0x00007fbef8bb3223: movabs $0x7fbf08c15fc8,%rdi ; {external_word} > 0x00007fbef8bb322d: and $0xfffffffffffffff0,%rsp > 0x00007fbef8bb3231: callq 
0x00007fbf085d3162 ; {runtime_call MacroAssembler::debug64(char*, long, long*)} > 0x00007fbef8bb3236: hlt > 0x00007fbef8bb3237: vmovdqu32 %zmm0,(%r10) ;*iastore {reexecute=0 rethrow=0 return_oop=0} > ; - Test::test0 at 14 (line 12) > 0x00007fbef8bb323d: add $0x10,%ebx ;*iinc {reexecute=0 rethrow=0 return_oop=0} > ; - Test::test0 at 15 (line 11) > 0x00007fbef8bb3240: cmp %r9d,%ebx > 0x00007fbef8bb3243: jl 0x00007fbef8bb31ec ;*if_icmpge {reexecute=0 rethrow... > My concern is that LoadV and StoreV are defined only with memory input // Indirect Memory Operand operand indirect(any_RegP reg) %{ constraint(ALLOC_IN_RC(ptr_reg)); match(reg); format %{ "[$reg]" %} interface(MEMORY_INTER) %{ base($reg); index(0x4); scale(0x0); disp(0x0); %} %} opclass memory(indirect, indOffset8, indOffset32, indIndexOffset, indIndex, indIndexScale, indPosIndexScale, indIndexScaleOffset, indPosIndexOffset, indPosIndexScaleOffset, indCompressedOopOffset, indirectNarrow, indOffset8Narrow, indOffset32Narrow, indIndexOffsetNarrow, indIndexNarrow, indIndexScaleNarrow, indIndexScaleOffsetNarrow, indPosIndexOffsetNarrow, indPosIndexScaleOffsetNarrow); It seems that `memory` summarizes many different patterns. One of them is the `indirect` one, which simply loads the address from a register. In our case this address was computed by a `lea`, then used in the alignment verification, and then passed on as `memory` to load / store. 
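For readers following the listing, the check that `-XX:+VerifyAlignVector` places before each vector load and store (`test $0x7,%r8b` followed by `je`) is a mask test on the low bits of the address that the `lea` computed. A minimal Java sketch of the same arithmetic is given here; the class and method names are made up for illustration and are not HotSpot code, and the addresses are arbitrary values rather than data from a real run.

```java
public class AlignCheckSketch {
    // Hypothetical helper: true if 'address' is a multiple of 'alignment'.
    // 'alignment' is assumed to be a power of two, as in the 8-byte check
    // (mask 0x7) visible in the disassembly.
    public static boolean isAligned(long address, int alignment) {
        return (address & (alignment - 1)) == 0;
    }

    public static void main(String[] args) {
        long base = 0x00007fbef8bb3200L; // arbitrary example address
        System.out.println(isAligned(base + 0x10, 8)); // prints true
        System.out.println(isAligned(base + 0x14, 8)); // prints false
    }
}
```

In the generated code the failing case does not return a boolean but falls through to the `debug64` runtime call and `hlt`, which is why the check is emitted once per LoadV / StoreV.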
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14785#discussion_r1442625027 From epeter at openjdk.org Fri Jan 5 08:51:38 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 5 Jan 2024 08:51:38 GMT Subject: RFR: 8310190: C2 SuperWord: AlignVector is broken, generates misaligned packs [v58] In-Reply-To: References: <9qbGMjWjQfwA5r6XimWz_GZUdQEgo6GEzY-uYwIoP4s=.9c68a4f4-a82b-4760-a428-3b5025eda61b@github.com> <8FqVnqkZB8_FsKx6kJfkJN8Oi90W6hAeBNl8uQy_QwE=.7d6061b5-4a3d-4d3f-aaa4-1c30151d066d@github.com> Message-ID: On Fri, 5 Jan 2024 08:47:09 GMT, Emanuel Peter wrote: >> @vnkozlov >>> Can you show assembler code for simple load and store instructions (move data from one array to another)? >> >> Here the example with simple load -> store with two different arrays: >> >> public class Test { >> static int RANGE = 1024*64; >> >> public static void main(String[] strArr) { >> int a[] = new int[RANGE]; >> int b[] = new int[RANGE]; >> test0(a, b); >> } >> >> static void test0(int[] a, int[] b) { >> for (int i = 0; i < RANGE; i++) { >> a[i] = b[i]; >> } >> } >> } >> >> >> With `-XX:+VerifyAlignVector`: >> `./java -XX:CompileCommand=compileonly,Test::test* -XX:+TraceSuperWord -Xcomp -XX:+PrintIdeal -XX:+AlignVector -XX:+VerifyAlignVector -XX:CompileCommand=print,Test::test* Test.java` >> >> >> ;; B32: # out( B32 B33 ) <- in( B31 B32 ) Loop( B32-B32 inner post of N1028) Freq: 4.49976 >> 0x00007fbef8bb31ec: movslq %ebx,%r10 >> 0x00007fbef8bb31ef: shl $0x2,%r10 ;*iaload {reexecute=0 rethrow=0 return_oop=0} >> ; - Test::test0 at 13 (line 12) >> 0x00007fbef8bb31f3: lea 0x10(%r13,%r10,1),%r8 >> 0x00007fbef8bb31f8: lea 0x10(%r11,%r10,1),%r10 >> 0x00007fbef8bb31fd: test $0x7,%r8b >> 0x00007fbef8bb3201: je 0x00007fbef8bb3217 >> 0x00007fbef8bb3203: movabs $0x7fbf08c15fc8,%rdi ; {external_word} >> 0x00007fbef8bb320d: and $0xfffffffffffffff0,%rsp >> 0x00007fbef8bb3211: callq 0x00007fbf085d3162 ; {runtime_call MacroAssembler::debug64(char*, long, long*)} >> 
0x00007fbef8bb3216: hlt >> 0x00007fbef8bb3217: vmovdqu32 (%r8),%zmm0 >> 0x00007fbef8bb321d: test $0x7,%r10b >> 0x00007fbef8bb3221: je 0x00007fbef8bb3237 >> 0x00007fbef8bb3223: movabs $0x7fbf08c15fc8,%rdi ; {external_word} >> 0x00007fbef8bb322d: and $0xfffffffffffffff0,%rsp >> 0x00007fbef8bb3231: callq 0x00007fbf085d3162 ; {runtime_call MacroAssembler::debug64(char*, long, long*)} >> 0x00007fbef8bb3236: hlt >> 0x00007fbef8bb3237: vmovdqu32 %zmm0,(%r10) ;*iastore {reexecute=0 rethrow=0 return_oop=0} >> ; - Test::test0 at 14 (line 12) >> 0x00007fbef8bb323d: add $0x10,%ebx ;*iinc {reexecute=0 rethrow=0 return_oop=0} >> ; - Test::test0 at 15 (line 11) >> 0x00007fbef8bb3240: cmp %r9... > >> My concern is that LoadV and StoreV are defined only with memory input > > > // Indirect Memory Operand > operand indirect(any_RegP reg) > %{ > constraint(ALLOC_IN_RC(ptr_reg)); > match(reg); > > format %{ "[$reg]" %} > interface(MEMORY_INTER) %{ > base($reg); > index(0x4); > scale(0x0); > disp(0x0); > %} > %} > > > > opclass memory(indirect, indOffset8, indOffset32, indIndexOffset, indIndex, > indIndexScale, indPosIndexScale, indIndexScaleOffset, indPosIndexOffset, indPosIndexScaleOffset, > indCompressedOopOffset, > indirectNarrow, indOffset8Narrow, indOffset32Narrow, > indIndexOffsetNarrow, indIndexNarrow, indIndexScaleNarrow, > indIndexScaleOffsetNarrow, indPosIndexOffsetNarrow, indPosIndexScaleOffsetNarrow); > > It seems that `memory` summarizes many different patterns. One of them is the `indirect` one, which simply loads the address from a register. In our case this address was computed by a `lea`, then used in the alignment verification, and then passed on as `memory` to load / store. > Also why your assembler example have tested alignment twice for the same address? May be because the same array's element for load and store? Yes, exactly. I emit verification for every loadV / storeV. 
And since it is debug only, and only with the extra flag `-XX:+VerifyAlignVector` I thought optimizing is not necessary. And it seems you agree with that :) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14785#discussion_r1442626289 From chagedorn at openjdk.org Fri Jan 5 08:52:28 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 5 Jan 2024 08:52:28 GMT Subject: RFR: 8319793: C2 compilation fails with "Bad graph detected in build_loop_late" after JDK-8279888 [v9] In-Reply-To: References: Message-ID: <0467mugqEs3f6AxCG9krY6cAXXuqRJKptnwEaLJ0gIA=.ce7c02be-1d51-470f-b309-380f1de83f30@github.com> On Wed, 3 Jan 2024 15:53:04 GMT, Roland Westrelin wrote: >> Range check smearing and range check predication make an array access >> dependent on 2 (or more in the case of RC smearing) conditions. As a >> consequence, if a range check can be eliminated because there's an >> identical dominating range check, the control dependent nodes that >> could float and become dependent on the dominating range check cannot >> be allowed to float because there's a risk that they would then bypass >> one of the checks that make the access legal. >> >> `IfNode::dominated_by()` and `PhaseIdealLoop::dominated_by()` have >> logic to prevent this: nodes that are control dependent on a range >> check or predicate are not allowed to float. This is however not >> sufficient as demonstrated by the test cases. 
>> >> In `TestArrayAccessAboveRCAfterSmearingOrPredication.testRangeCheckSmearing()`: >> >> >> v += array[i]; >> if (flag2) { >> if (flag3) { >> field = 0x42; >> } >> } >> if (flagField == 1) { >> v += array[i]; >> } >> >> >> The range check for the second `array[i]` load is replaced by the >> dominating range check for the first `array[i]` but because the second >> `array[i]` load could really be dependent on multiple range checks (in >> case smearing happened which is not the case here), c2 doesn't allow >> the second `array[i]` to float when the second range check is >> removed. The second `array[i]` is then control dependent on: >> >> >> if (flagField == 1) { >> >> >> which is next found to be dominated by the same test: >> >> >> if (flag == 1) { >> >> >> and is removed. However nothing in `dominated_by()` treats node >> dependent on tests that are not range check or predicates >> specially. So the second `array[i]` is allowed to float and become >> dependent on: >> >> >> if (flag == 1) { >> >> >> which is above the range check for that access. The test method in its >> last invocation is passed an index for the array access that's widely >> out of range. The array load happens before the range check and >> crashes the VM. `testLoopPredication()` is a similar test where array >> loads become dependent on predicates and end up above range checks. >> >> `TestArrayAccessCastIIAboveRC.java` is the test case from the bug >> where for similar reasons a range check `CastII` ends up above its >> range check, becomes top because its input becomes some integer that >> conflicts with its... > > Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 11 commits: > > - Merge branch 'master' into JDK-8319793 > - review > - Revert "Update src/hotspot/share/opto/castnode.hpp" > > This reverts commit 356c91cca911ed486f9f87f3eff53ce21e1e3ec9. 
> - Revert "Update src/hotspot/share/opto/memnode.hpp" > > This reverts commit bdb731ea562f314f44d327f7243ef5cf9ad40b2e. > - review > - Update src/hotspot/share/opto/memnode.hpp > > Co-authored-by: Christian Hagedorn > - Update src/hotspot/share/opto/castnode.hpp > > Co-authored-by: Christian Hagedorn > - Update src/hotspot/share/opto/castnode.cpp > > Co-authored-by: Christian Hagedorn > - Update src/hotspot/share/opto/ifnode.cpp > > Co-authored-by: Christian Hagedorn > - Merge branch 'master' into JDK-8319793 > - ... and 1 more: https://git.openjdk.org/jdk/compare/15519285...dbe3c4c1 src/hotspot/share/opto/loopopts.cpp line 345: > 343: > 344: if (dp == nullptr) > 345: return; Since we bail out above if `iff->outcnt() != 2` (can it even be that we have an `If` at this point which does not have 2 out projections?) this bailout seems redundant. Looks like it was only added due to a parfait report with https://github.com/openjdk/jdk/commit/25c4a7fccdbdaa9da0a7aa5e04e80966138fe42c. Maybe we can remove that as well and change `proj_out_or_null()` back to `proj_out()` (not sure though if parfait will then report this again). But could also be done separately. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16886#discussion_r1442626740 From ddong at openjdk.org Fri Jan 5 08:57:33 2024 From: ddong at openjdk.org (Denghui Dong) Date: Fri, 5 Jan 2024 08:57:33 GMT Subject: RFR: 8322735: C2: minor improvements of bubble sort used in SuperWord::packset_sort [v2] In-Reply-To: References: Message-ID: <1W_HcoiwEnXLvtqprc7L8mghr0BRH3a_ITn6Cerzb_c=.f6c90e24-7d41-418d-bc50-41bff396663b@github.com> > A minor improvement could be made for bubble sort in SuperWord::packset_sort to reduce the comparison count in bad cases. 
> > See https://en.wikipedia.org/wiki/Bubble_sort Denghui Dong has updated the pull request incrementally with one additional commit since the last revision: update ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17190/files - new: https://git.openjdk.org/jdk/pull/17190/files/7d64bd8d..ba53ed56 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17190&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17190&range=00-01 Stats: 3 lines in 1 file changed: 0 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/17190.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17190/head:pull/17190 PR: https://git.openjdk.org/jdk/pull/17190 From ddong at openjdk.org Fri Jan 5 08:57:35 2024 From: ddong at openjdk.org (Denghui Dong) Date: Fri, 5 Jan 2024 08:57:35 GMT Subject: RFR: 8322735: C2: minor improvements of bubble sort used in SuperWord::packset_sort In-Reply-To: References: Message-ID: On Fri, 5 Jan 2024 08:22:23 GMT, Emanuel Peter wrote: > Please do the renaming, and then I can run testing and give you my approval. Updated. Thanks. ------------- PR Comment: https://git.openjdk.org/jdk/pull/17190#issuecomment-1878329186 From davleopo at openjdk.org Fri Jan 5 09:02:23 2024 From: davleopo at openjdk.org (David Leopoldseder) Date: Fri, 5 Jan 2024 09:02:23 GMT Subject: RFR: 8322636: [JVMCI] HotSpotSpeculationLog can be inconsistent across a single compile [v2] In-Reply-To: References: Message-ID: On Wed, 3 Jan 2024 20:39:21 GMT, Tom Rodriguez wrote: >> David Leopoldseder has updated the pull request incrementally with one additional commit since the last revision: >> >> 8322636: [JVMCI] HotSpotSpeculationLog add javadoc to maySpeculate > > So I looked more closely the HotSpot and substrate implementations and I'm not sure we can currently align the implementation and the javadoc. In the HotSpot world, HotSpotSpeculationLog is a compiler local object that reads data from the real speculation data that's kept in the MDO. 
This means that it has full control over when collectFailedSpeculations is called. SubstrateSpeculationLog is the actual log so if two threads are operating on the same log then one of them could see the effects of a call to collectFailedSpeculations by the other thread. Maybe in practice 2 threads never do this because it would mean they are compiling the same root method but it doesn't seem guaranteed. installCode on substrate also doesn't perform the speculation log check that HotSpot does. So maybe we punt on javadoc updates for now. @tkrodriguez please sponsor ------------- PR Comment: https://git.openjdk.org/jdk/pull/17183#issuecomment-1878335348 From roland at openjdk.org Fri Jan 5 09:17:27 2024 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 5 Jan 2024 09:17:27 GMT Subject: RFR: 8319793: C2 compilation fails with "Bad graph detected in build_loop_late" after JDK-8279888 [v9] In-Reply-To: References: Message-ID: <6tg6hV9e4ZDXHm5x22pLlKgs1hC6wLyIU6Jr14oJafY=.b20d08b1-1734-4136-a447-4c36aa92fb68@github.com> On Thu, 4 Jan 2024 15:32:21 GMT, Emanuel Peter wrote: > Generally, I'm not really happy with the name of `UnknownControl`. Sounds like the control is unknown. In what sense is it unknown, after all we have a control and want the Load to be pinned to it...? `UnknownControl` was not added by this change. > Maybe then we could rename `pin_for_array_access` -> `make_pinned`. But `make_pinned` seems to imply that it operates on any node type when it only does something for a subset of nodes (those used for array accesses). 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16886#discussion_r1442649362 From thartmann at openjdk.org Fri Jan 5 09:34:30 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Fri, 5 Jan 2024 09:34:30 GMT Subject: RFR: 8323012: C2 fails with fatal error: no reachable node should have no use Message-ID: Incorrect refactoring from [JDK-8322490](https://bugs.openjdk.org/browse/JDK-8322490), probably a copy-paste error, leads to a dead `CastPP` and might have other not yet observed effects. See JBS for details. I wasn't able to extract a regression test for this but it reliably reproduced with replay compilation. Thanks, Tobias ------------- Commit messages: - 8323012: C2 fails with fatal error: no reachable node should have no use Changes: https://git.openjdk.org/jdk/pull/17276/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=17276&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8323012 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/17276.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17276/head:pull/17276 PR: https://git.openjdk.org/jdk/pull/17276 From roland at openjdk.org Fri Jan 5 09:48:27 2024 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 5 Jan 2024 09:48:27 GMT Subject: RFR: 8319793: C2 compilation fails with "Bad graph detected in build_loop_late" after JDK-8279888 [v9] In-Reply-To: References: Message-ID: On Thu, 4 Jan 2024 16:03:16 GMT, Emanuel Peter wrote: > Actually, the problem is that we **would** have multiple dependency, but we only have one dependency input we can set, hence forgetting about the others. Pinning makes sure that there is no bypassing of dependencies, right? Right. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16886#discussion_r1442677055 From roland at openjdk.org Fri Jan 5 09:55:30 2024 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 5 Jan 2024 09:55:30 GMT Subject: RFR: 8319793: C2 compilation fails with "Bad graph detected in build_loop_late" after JDK-8279888 [v9] In-Reply-To: References: Message-ID: On Thu, 4 Jan 2024 16:10:24 GMT, Emanuel Peter wrote: >> src/hotspot/share/opto/ifnode.cpp line 1958: >> >>> 1956: return nullptr; >>> 1957: } >>> 1958: >> >> Why is it ok to delay this to post-loop-opts? Does it not prevent moving some CFG from being eliminated out of loops? Would be nice to have a little justification comment. > > Ah. Does this mean that if there are multiple RangeCheck in a loop, where some could be smeared, these are not smeared, and then we have more RangeChecks to eliminate out of the loop? Maybe in the end this all comes down to the same anyway. Just wondering. > Why is it ok to delay this to post-loop-opts? Does it not prevent moving some CFG from being eliminated out of loops? Would be nice to have a little justification comment. Maybe. With this fix, range check smearing requires pinning nodes. So running it early also has a drawback: it can cause nodes that would otherwise float to be pinned. The way I see it, range check smearing is a local optimization for cases where range checks can't be eliminated some other way so running it late should not make a difference. If the range check is in a loop and predication removes it then running RC smearing early doesn't make a difference. If the range check is part of a range check sequence that can only be optimized by RC smearing then having a longer range check sequence for the duration of loop opts probably makes no difference. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16886#discussion_r1442683646 From roland at openjdk.org Fri Jan 5 10:00:37 2024 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 5 Jan 2024 10:00:37 GMT Subject: RFR: 8319793: C2 compilation fails with "Bad graph detected in build_loop_late" after JDK-8279888 [v9] In-Reply-To: References: Message-ID: <6uOq3OJeUPIG2SMHYTKnIA-GHPTIQTobNmvCuKrFNUM=.3e37dadd-2982-423b-86bc-bed54366068a@github.com> On Thu, 4 Jan 2024 16:18:16 GMT, Emanuel Peter wrote: >> Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 11 commits: >> >> - Merge branch 'master' into JDK-8319793 >> - review >> - Revert "Update src/hotspot/share/opto/castnode.hpp" >> >> This reverts commit 356c91cca911ed486f9f87f3eff53ce21e1e3ec9. >> - Revert "Update src/hotspot/share/opto/memnode.hpp" >> >> This reverts commit bdb731ea562f314f44d327f7243ef5cf9ad40b2e. >> - review >> - Update src/hotspot/share/opto/memnode.hpp >> >> Co-authored-by: Christian Hagedorn >> - Update src/hotspot/share/opto/castnode.hpp >> >> Co-authored-by: Christian Hagedorn >> - Update src/hotspot/share/opto/castnode.cpp >> >> Co-authored-by: Christian Hagedorn >> - Update src/hotspot/share/opto/ifnode.cpp >> >> Co-authored-by: Christian Hagedorn >> - Merge branch 'master' into JDK-8319793 >> - ... and 1 more: https://git.openjdk.org/jdk/compare/15519285...dbe3c4c1 > > src/hotspot/share/opto/loopopts.cpp line 361: > >> 359: return; // Let IGVN transformation change control dependence. >> 360: } >> 361: > > Why is it ok to remove this bailout? It's: "IfNode::dominated_by() and PhaseIdealLoop::dominated_by() have logic to prevent this: nodes that are control dependent on a range check or predicate are not allowed to float." that I mentioned in the fix description. It's the way array access nodes are currently prevented from floating above the range checks they depend on.
It's flawed, replaced by pinning of the array access nodes in the patch. So this logic is no longer useful. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16886#discussion_r1442687996 From roland at openjdk.org Fri Jan 5 10:03:32 2024 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 5 Jan 2024 10:03:32 GMT Subject: RFR: 8319793: C2 compilation fails with "Bad graph detected in build_loop_late" after JDK-8279888 [v9] In-Reply-To: <0467mugqEs3f6AxCG9krY6cAXXuqRJKptnwEaLJ0gIA=.ce7c02be-1d51-470f-b309-380f1de83f30@github.com> References: <0467mugqEs3f6AxCG9krY6cAXXuqRJKptnwEaLJ0gIA=.ce7c02be-1d51-470f-b309-380f1de83f30@github.com> Message-ID: <01kR0e8YhFsKsSlClNnbE2A4IDAeJn1q2Xxs3gNxGcU=.0cad5e28-7f54-4f76-b386-56788c94e932@github.com> On Fri, 5 Jan 2024 08:49:12 GMT, Christian Hagedorn wrote: >> Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 11 commits: >> >> - Merge branch 'master' into JDK-8319793 >> - review >> - Revert "Update src/hotspot/share/opto/castnode.hpp" >> >> This reverts commit 356c91cca911ed486f9f87f3eff53ce21e1e3ec9. >> - Revert "Update src/hotspot/share/opto/memnode.hpp" >> >> This reverts commit bdb731ea562f314f44d327f7243ef5cf9ad40b2e. >> - review >> - Update src/hotspot/share/opto/memnode.hpp >> >> Co-authored-by: Christian Hagedorn >> - Update src/hotspot/share/opto/castnode.hpp >> >> Co-authored-by: Christian Hagedorn >> - Update src/hotspot/share/opto/castnode.cpp >> >> Co-authored-by: Christian Hagedorn >> - Update src/hotspot/share/opto/ifnode.cpp >> >> Co-authored-by: Christian Hagedorn >> - Merge branch 'master' into JDK-8319793 >> - ... 
and 1 more: https://git.openjdk.org/jdk/compare/15519285...dbe3c4c1 > > src/hotspot/share/opto/loopopts.cpp line 345: > >> 343: >> 344: if (dp == nullptr) >> 345: return; > > Since we bail out above if `iff->outcnt() != 2` (can it even be that we have an `If` at this point which does not have 2 out projections?) this bailout seems redundant. Looks like it was only added due to a parfait report with https://github.com/openjdk/jdk/commit/25c4a7fccdbdaa9da0a7aa5e04e80966138fe42c. Maybe we can remove that as well and change `proj_out_or_null()` back to `proj_out()` (not sure though if parfait will then report this again). But could also be done separately. Thanks for the details. Why would it have been necessary before but no longer necessary now? What is it that has changed so parfait would not complain? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16886#discussion_r1442690872 From epeter at openjdk.org Fri Jan 5 10:05:22 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 5 Jan 2024 10:05:22 GMT Subject: RFR: 8322768: Optimize non-subword vector compress and expand APIs for AVX2 target. [v3] In-Reply-To: References: <_udOCEVG86x9V_WvYqFTaYnvmXdiZ7LzqxzR-D_ygYs=.db3ed37e-a8fa-413b-83c6-a785aba072ff@github.com> Message-ID: On Fri, 5 Jan 2024 07:08:35 GMT, Jatin Bhateja wrote: >> Hi, >> >> Patch optimizes non-subword vector compress and expand APIs for x86 AVX2 only targets. >> Upcoming E-core Xeons (Sierra Forest) and Hybrid CPUs only support AVX2 instruction set. >> These are very frequently used APIs in columnar database filter operation. >> >> Implementation uses a lookup table to record permute indices. Table index is computed using >> mask argument of compress/expand operation. >> >> Following are the performance number of JMH micro included with the patch.
>> >> >> System : Intel(R) Xeon(R) Platinum 8480+ (Sapphire Rapids) >> >> Baseline: >> Benchmark (size) Mode Cnt Score Error Units >> ColumnFilterBenchmark.filterDoubleColumn 1024 thrpt 2 142.767 ops/ms >> ColumnFilterBenchmark.filterDoubleColumn 2047 thrpt 2 71.436 ops/ms >> ColumnFilterBenchmark.filterDoubleColumn 4096 thrpt 2 35.992 ops/ms >> ColumnFilterBenchmark.filterFloatColumn 1024 thrpt 2 182.151 ops/ms >> ColumnFilterBenchmark.filterFloatColumn 2047 thrpt 2 91.096 ops/ms >> ColumnFilterBenchmark.filterFloatColumn 4096 thrpt 2 44.757 ops/ms >> ColumnFilterBenchmark.filterIntColumn 1024 thrpt 2 184.099 ops/ms >> ColumnFilterBenchmark.filterIntColumn 2047 thrpt 2 91.981 ops/ms >> ColumnFilterBenchmark.filterIntColumn 4096 thrpt 2 45.170 ops/ms >> ColumnFilterBenchmark.filterLongColumn 1024 thrpt 2 148.017 ops/ms >> ColumnFilterBenchmark.filterLongColumn 2047 thrpt 2 73.516 ops/ms >> ColumnFilterBenchmark.filterLongColumn 4096 thrpt 2 36.844 ops/ms >> >> Withopt: >> Benchmark (size) Mode Cnt Score Error Units >> ColumnFilterBenchmark.filterDoubleColumn 1024 thrpt 2 2051.707 ops/ms >> ColumnFilterBenchmark.filterDoubleColumn 2047 thrpt 2 914.072 ops/ms >> ColumnFilterBenchmark.filterDoubleColumn 4096 thrpt 2 489.898 ops/ms >> ColumnFilterBenchmark.filterFloatColumn 1024 thrpt 2 5324.195 ops/ms >> ColumnFilterBenchmark.filterFloatColumn 2047 thrpt 2 2587.229 ops/ms >> ColumnFilterBenchmark.filterFloatColumn 4096 thrpt 2 1278.665 ops/ms >> ColumnFilterBenchmark.filterIntColumn 1024 thrpt 2 4149.384 ops/ms >> ColumnFilterBenchmark.filterIntColumn 2047 thrpt ... > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Review comments resolutions. Thanks for the updates! One more idea: Your AVX2 solution has a lot of cost for converting the mask to a permutation. Might it make sense to split this off into a separate vector-node, so that it can float out of a loop if the mask is invariant? 
src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 963: > 961: // or a -1 (default) value. > 962: for (int i = 0; i < 256; i++) { > 963: int tmp = i; why is `tmp` needed? Would it not be better to replace `i` with `mask` (i.e. the bit pattern that is then translated to a permutation)? src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 966: > 964: int ctr = 0; > 965: for (int j = 0; j < 8; j++) { > 966: if (tmp & (1 << j)) { Suggestion: if (mask & (1 << j)) { would be much more readable ------------- Changes requested by epeter (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/17261#pullrequestreview-1805616736 PR Review Comment: https://git.openjdk.org/jdk/pull/17261#discussion_r1442664755 PR Review Comment: https://git.openjdk.org/jdk/pull/17261#discussion_r1442668939 From epeter at openjdk.org Fri Jan 5 10:05:23 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 5 Jan 2024 10:05:23 GMT Subject: RFR: 8322768: Optimize non-subword vector compress and expand APIs for AVX2 target. [v2] In-Reply-To: References: <_udOCEVG86x9V_WvYqFTaYnvmXdiZ7LzqxzR-D_ygYs=.db3ed37e-a8fa-413b-83c6-a785aba072ff@github.com> Message-ID: On Fri, 5 Jan 2024 07:03:34 GMT, Jatin Bhateja wrote: >> src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 5307: >> >>> 5305: assert(bt == T_LONG || bt == T_DOUBLE, ""); >>> 5306: vmovmskpd(rtmp, mask, vec_enc); >>> 5307: shlq(rtmp, 5); >> >> Might this need to be 6? If I understand right, then you want to have a 64bit stride, hence 2^6, right? >> If that is correct, then this did not show in your tests, and you need a regression test anyway. > > This computes the byte offset from start of the table, both integer and long permute table have same row sizes, 8 int elements vs 4 long elements. Ah, I understand now. Maybe leave a comment for that? 
>> test/micro/org/openjdk/bench/jdk/incubator/vector/ColumnFilterBenchmark.java line 76: >> >>> 74: longinCol = new long[size]; >>> 75: longoutCol = new long[size]; >>> 76: lpivot = size / 2; >> >> I'd be interested to see what happens if you move up or down the "density" of elements that you accept. Would the simple branch prediction be faster if the density is low enough, i.e. we almost take no element. >> >> Though maybe that is not compiler problem but a user-problem? > > Included fuzzy filter micro with varying mask density. > ![image](https://github.com/openjdk/jdk/assets/59989778/a6af21cc-36c0-4503-aeb3-e66b862da2e1) You are using `VectorMask pred = VectorMask.fromLong(ispecies, maskctr++);`. That basically systematically iterates over all masks, which is nice for a correctness test. But that would use different density inside one test run, right? The average over the loop is still at `50%`, correct? I was thinking more a run where the percentage over the whole loop is lower than maybe `1%`. That would get us to a point where maybe the branch prediction of non-vectorized code might be faster, what do you think? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17261#discussion_r1442670411 PR Review Comment: https://git.openjdk.org/jdk/pull/17261#discussion_r1442676633 From epeter at openjdk.org Fri Jan 5 10:05:24 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 5 Jan 2024 10:05:24 GMT Subject: RFR: 8322768: Optimize non-subword vector compress and expand APIs for AVX2 target. [v2] In-Reply-To: References: <_udOCEVG86x9V_WvYqFTaYnvmXdiZ7LzqxzR-D_ygYs=.db3ed37e-a8fa-413b-83c6-a785aba072ff@github.com> Message-ID: On Fri, 5 Jan 2024 09:37:55 GMT, Emanuel Peter wrote: >> This computes the byte offset from start of the table, both integer and long permute table have same row sizes, 8 int elements vs 4 long elements. > > Ah, I understand now. Maybe leave a comment for that? 
I would say something like this: Given a `mask`, we compute the index into the permutation table, and load the corresponding `permutation` (4 long elements). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17261#discussion_r1442688495 From epeter at openjdk.org Fri Jan 5 10:05:27 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 5 Jan 2024 10:05:27 GMT Subject: RFR: 8322768: Optimize non-subword vector compress and expand APIs for AVX2 target. [v2] In-Reply-To: References: <_udOCEVG86x9V_WvYqFTaYnvmXdiZ7LzqxzR-D_ygYs=.db3ed37e-a8fa-413b-83c6-a785aba072ff@github.com> Message-ID: On Thu, 4 Jan 2024 13:40:19 GMT, Emanuel Peter wrote: >> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: >> >> Updating copyright year of modified files. > > src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 957: > >> 955: __ align(CodeEntryAlignment); >> 956: StubCodeMark mark(this, "StubRoutines", stub_name); >> 957: address start = __ pc(); > > Could you please add some comments here why you are filling the data like this? > Presumably, you are emitting 32 bits and 64 bits respectively, right? So the cells have different size, correct? Thanks for the comment addition! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17261#discussion_r1442665042 From epeter at openjdk.org Fri Jan 5 10:05:28 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 5 Jan 2024 10:05:28 GMT Subject: RFR: 8322768: Optimize non-subword vector compress and expand APIs for AVX2 target. 
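As a rough illustration of the lookup table discussed in this thread, the following Java sketch builds a compress-style permute table for 8 int lanes and computes the byte offset of a row from a mask. It is a model under stated assumptions, not the HotSpot stub: only the first lines of the generator loop are quoted in the review, so the row layout used here (positions of the set mask bits packed to the front, remaining lanes filled with -1) is an assumption, and all class and method names are hypothetical.

```java
import java.util.Arrays;

public class PermuteTableSketch {
    static final int LANES = 8;                          // 8 int lanes in a 256-bit vector
    static final int ROW_BYTES = LANES * Integer.BYTES;  // 32 bytes per table row

    // Assumed layout: row 'mask' lists the positions of the set bits of 'mask'
    // in increasing order, padded with -1 for the unused lanes.
    public static int[][] buildCompressTable() {
        int[][] table = new int[1 << LANES][LANES];
        for (int mask = 0; mask < table.length; mask++) {
            Arrays.fill(table[mask], -1);
            int ctr = 0;
            for (int j = 0; j < LANES; j++) {
                if ((mask & (1 << j)) != 0) {
                    table[mask][ctr++] = j;
                }
            }
        }
        return table;
    }

    // Byte offset of a row from the start of the table; equal to mask << 5.
    public static int rowByteOffset(int mask) {
        return mask * ROW_BYTES;
    }

    public static void main(String[] args) {
        int[][] table = buildCompressTable();
        // Mask with bits 0, 2 and 5 set.
        System.out.println(Arrays.toString(table[0b00100101])); // prints [0, 2, 5, -1, -1, -1, -1, -1]
        System.out.println(rowByteOffset(3));                   // prints 96
    }
}
```

A row of 4 long elements also occupies 32 bytes, which is consistent with the remark that the same shift by 5 serves both the int and the long table.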
[v2] In-Reply-To: References: <_udOCEVG86x9V_WvYqFTaYnvmXdiZ7LzqxzR-D_ygYs=.db3ed37e-a8fa-413b-83c6-a785aba072ff@github.com> <2ykpNJfFJYotuyI59zfh966TIQYUF2Id6NR56zpq_Vw=.6b3b5411-e086-45df-8666-2496ac013548@github.com> Message-ID: <8arXva3XJTvJpbElEu8ubw6SF58TL2hVlAgoJFZ3_6s=.c6bd79f0-ecd1-4d26-8294-40f8e99bf59c@github.com> On Fri, 5 Jan 2024 07:05:51 GMT, Jatin Bhateja wrote: >> We do have extensive functional tests for compress/expand APIs in [test/jdk/jdk/incubator/vector](https://github.com/openjdk/jdk/tree/master/test/jdk/jdk/incubator/vector) > >> Could there be equivalent `expand` tests? > > Here are the performance number for existing [VectorAPI JMH micros.](https://github.com/openjdk/panama-vector/tree/vectorIntrinsics/test/micro/org/openjdk/bench/jdk/incubator/vector/operation) > > ![image](https://github.com/openjdk/jdk/assets/59989778/4b260814-3d3c-4e9b-b81a-61492ea48cce) > ![image](https://github.com/openjdk/jdk/assets/59989778/50048281-ad50-44f6-a875-308e02537be2) Ah, excellent. Thanks for the numbers! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17261#discussion_r1442673797 From epeter at openjdk.org Fri Jan 5 10:05:28 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 5 Jan 2024 10:05:28 GMT Subject: RFR: 8322768: Optimize non-subword vector compress and expand APIs for AVX2 target. [v2] In-Reply-To: References: <_udOCEVG86x9V_WvYqFTaYnvmXdiZ7LzqxzR-D_ygYs=.db3ed37e-a8fa-413b-83c6-a785aba072ff@github.com> Message-ID: On Fri, 5 Jan 2024 09:31:50 GMT, Emanuel Peter wrote: >> src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 957: >> >>> 955: __ align(CodeEntryAlignment); >>> 956: StubCodeMark mark(this, "StubRoutines", stub_name); >>> 957: address start = __ pc(); >> >> Could you please add some comments here why you are filling the data like this? >> Presumably, you are emitting 32 bits and 64 bits respectively, right? So the cells have different size, correct? > > Thanks for the comment addition! 
Improvement suggestion: For a vector with 8 ints, we get `2^8 = 256` many bit patterns for the mask. The table has a row for each `mask` value, consisting of 8 ints, which provide the valid permute index corresponding to set bit position in the `mask`, or a -1 (default) value. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17261#discussion_r1442668440 From epeter at openjdk.org Fri Jan 5 10:16:21 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 5 Jan 2024 10:16:21 GMT Subject: RFR: 8322735: C2: minor improvements of bubble sort used in SuperWord::packset_sort In-Reply-To: References: Message-ID: On Fri, 5 Jan 2024 08:54:36 GMT, Denghui Dong wrote: >> Ok. This change is fine with me. Thanks for taking the time to look into this :) >> >> I was just curious what was your motivation. I may completely redo this code once I remove the alignment constraints (here used for sorting), but that will have to be decided in a few months. >> >> Please do the renaming, and then I can run testing and give you my approval. > >> Please do the renaming, and then I can run testing and give you my approval. > > Updated. Thanks. @D-D-H nice. Testing running. ------------- PR Comment: https://git.openjdk.org/jdk/pull/17190#issuecomment-1878430341 From aph at openjdk.org Fri Jan 5 10:19:22 2024 From: aph at openjdk.org (Andrew Haley) Date: Fri, 5 Jan 2024 10:19:22 GMT Subject: RFR: 8320649: C2: Optimize scoped values [v4] In-Reply-To: References: Message-ID: On Thu, 4 Jan 2024 08:12:22 GMT, Roland Westrelin wrote: >> This change implements C2 optimizations for calls to >> ScopedValue.get(). Indeed, in: >> >> >> v1 = scopedValue.get(); >> ... >> v2 = scopedValue.get(); >> >> >> `v2` can be replaced by `v1` and the second call to `get()` can be >> optimized out. That's true whatever is between the 2 calls unless a >> new mapping for `scopedValue` is created in between (when that happens >> no optimizations is performed for the method being compiled). 
Hoisting >> a `get()` call out of loop for a loop invariant `scopedValue` should >> also be legal in most cases. >> >> `ScopedValue.get()` is implemented in java code as a 2 step process. A >> cache is attached to the current thread object. If the `ScopedValue` >> object is in the cache then the result from `get()` is read from >> there. Otherwise a slow call is performed that also inserts the >> mapping in the cache. The cache itself is lazily allocated. One >> `ScopedValue` can be hashed to 2 different indexes in the cache. On a >> cache probe, both indexes are checked. As a consequence, the process >> of probing the cache is a multi step process (check if the cache is >> present, check first index, check second index if first index >> failed). If the cache is populated early on, then when the method that >> calls `ScopedValue.get()` is compiled, profile reports the slow path >> as never taken and only the read from the cache is compiled. >> >> To perform the optimizations, I added 3 new node types to C2: >> >> - the pair >> ScopedValueGetHitsInCacheNode/ScopedValueGetLoadFromCacheNode for >> the cache probe >> >> - a cfg node ScopedValueGetResultNode to help locate the result of the >> `get()` call in the IR graph. >> >> In pseudo code, once the nodes are inserted, the code of a `get()` is: >> >> >> hits_in_the_cache = ScopedValueGetHitsInCache(scopedValue) >> if (hits_in_the_cache) { >> res = ScopedValueGetLoadFromCache(hits_in_the_cache); >> } else { >> res = ..; //slow call possibly inlined. Subgraph can be arbitray complex >> } >> res = ScopedValueGetResult(res) >> >> >> In the snippet: >> >> >> v1 = scopedValue.get(); >> ... >> v2 = scopedValue.get(); >> >> >> Replacing `v2` by `v1` is then done by starting from the >> `ScopedValueGetResult` node for the second `get()` and looking for a >> dominating `ScopedValueGetResult` for the same `ScopedValue` >> object. When one is found, it is used as a replacement. 
Eliminating >> the second `get()` call is achieved by making >> `ScopedValueGetHitsInCache` always successful if there's a dominating >> `Scoped... > > Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: > > merge fix A couple of answers: > I'm not a C2 expert, so my high-level comments might not all make sense, but here goes. > > * I would expect multiple get() calls in the same method or in loops to be rare and/or programmer errors. The real advantage seems to be when the binding and the get() can be connected from caller to callee through inlining. Binding and get() are usually separated by a long way. It's a common pattern to use get() inside a loop when a ScopedValue is used to hold a capability object which is private within a library context. > * Are we able to optimize a get() on a constant/final ScopedValue into a simple array load at a constant offset? Maybe I'm misunderstanding this question, but that's what the scoped value cache does. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16966#issuecomment-1878433272 From chagedorn at openjdk.org Fri Jan 5 10:37:30 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 5 Jan 2024 10:37:30 GMT Subject: RFR: 8319793: C2 compilation fails with "Bad graph detected in build_loop_late" after JDK-8279888 [v9] In-Reply-To: <01kR0e8YhFsKsSlClNnbE2A4IDAeJn1q2Xxs3gNxGcU=.0cad5e28-7f54-4f76-b386-56788c94e932@github.com> References: <0467mugqEs3f6AxCG9krY6cAXXuqRJKptnwEaLJ0gIA=.ce7c02be-1d51-470f-b309-380f1de83f30@github.com> <01kR0e8YhFsKsSlClNnbE2A4IDAeJn1q2Xxs3gNxGcU=.0cad5e28-7f54-4f76-b386-56788c94e932@github.com> Message-ID: On Fri, 5 Jan 2024 10:00:57 GMT, Roland Westrelin wrote: >> src/hotspot/share/opto/loopopts.cpp line 345: >> >>> 343: >>> 344: if (dp == nullptr) >>> 345: return; >> >> Since we bail out above if `iff->outcnt() != 2` (can it even be that we have an `If` at this point which does not have 2 out projections?) 
this bailout seems redundant. Looks like it was only added due to a parfait report with https://github.com/openjdk/jdk/commit/25c4a7fccdbdaa9da0a7aa5e04e80966138fe42c. Maybe we can remove that as well and change `proj_out_or_null()` back to `proj_out()` (not sure though if parfait will then report this again). But could also be done separately. > > Thanks for the details. Why would it have been necessary before but no longer necessary now? What is it that has changed so parfait would not complain? Unfortunately, the report details are no longer available today. I think the resolution back then should have been to treat it as a false positive, since it cannot happen that `dp` is null - even though parfait fails to prove that (it probably still cannot). I'm not exactly sure how parfait works and how we could ensure that it will not complain about this again but maybe adding an assert that `dp` is not null would help. Anyway, this should not block this PR and might be better handled separately. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16886#discussion_r1442720560 From chagedorn at openjdk.org Fri Jan 5 10:39:21 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 5 Jan 2024 10:39:21 GMT Subject: RFR: 8323012: C2 fails with fatal error: no reachable node should have no use In-Reply-To: References: Message-ID: On Fri, 5 Jan 2024 09:29:15 GMT, Tobias Hartmann wrote: > Incorrect refactoring from [JDK-8322490](https://bugs.openjdk.org/browse/JDK-8322490), probably a copy-paste error, leads to a dead `CastPP` and might have other not yet observed effects. See JBS for details. > > I wasn't able to extract a regression test for this but it reliably reproduced with replay compilation. > > Thanks, > Tobias Nice catch - that was hard to spot. Looks good and trivial. ------------- Marked as reviewed by chagedorn (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/17276#pullrequestreview-1805713175 From thartmann at openjdk.org Fri Jan 5 10:48:25 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Fri, 5 Jan 2024 10:48:25 GMT Subject: RFR: 8320310: CompiledMethod::has_monitors flag can be incorrect [v4] In-Reply-To: References: <4efExybeWDkEbcsckI1Qdz8kpYFqd-Rbmt7oiWz5qlo=.d8d38d0e-affa-48dc-b963-45f958041c4e@github.com> Message-ID: On Mon, 11 Dec 2023 18:38:55 GMT, Jorn Vernee wrote: >> Currently, the `CompiledMethod::has_monitors` flag is set when either a `monitorenter` is parsed by C1, and `monitorexit` is parsed by C1 or C2 during method compilation. However, not necessarily every bytecode of a method is parsed, which means that we could miss all `monitorenter`/`monitorexit` byte codes in a method, while it actually does use monitors. This can lead to situations where a thread holds a monitor, but `has_monitors` for all frames is set to `false`, leading to an assertion failure in 'freeze_internal' in continuationFreezeThaw.cpp: >> >> assert(monitors_on_stack(current) == ((current->held_monitor_count() - current->jni_monitor_count()) > 0), >> "Held monitor count and locks on stack invariant: " INT64_FORMAT " JNI: " INT64_FORMAT, (int64_t)current->held_monitor_count(), (int64_t)current->jni_monitor_count()); >> >> The proposed fix is to rely on `Method::has_monitor_bytecodes` to set the `has_monitors` flag when compiling, which is immune to issues where not all byte codes of a method are parsed during compilation. We can follow the pattern established for `has_reserved_stack_access`, which is similar. >> >> Note that this PR is based on: https://github.com/openjdk/jdk/pull/16416 which disables the assertion. The goal of this PR is to fix the issue, and then re-enable the assertion. 
>> >> Testing: Tier 1-4, `java/lang/Thread/virtual/stress/PinALot.java` > > Jorn Vernee has updated the pull request incrementally with one additional commit since the last revision: > > re-enable assert again Looks good to me. ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/16799#pullrequestreview-1805725501 From chagedorn at openjdk.org Fri Jan 5 10:58:41 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 5 Jan 2024 10:58:41 GMT Subject: RFR: 8310711: [IR Framework] Remove safepoint while printing handling [v2] In-Reply-To: References: Message-ID: <278BZfCuvI5xJSWh2PvZtONVaZ2QjxkWKj1NifCbFYE=.af0a47ab-63a6-47bd-953b-4b0756107227@github.com> > This clean-up PR removes the handling of the `` message in the IR framework. It is no longer required since we dump the output of `PrintIdeal` to the hotspot_pid file differently since [JDK-8306922](https://bugs.openjdk.org/browse/JDK-8306922). There is no interrupting `` message anymore. I removed the corresponding now unneeded code together with the previously added test case for it. > > Testing: tier1-4 > > Thanks, > Christian Christian Hagedorn has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains three additional commits since the last revision: - Update copyright year - Merge branch 'master' into JDK-8310711 - 8310711: [IR Framework] Remove safepoint while printing handling ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16921/files - new: https://git.openjdk.org/jdk/pull/16921/files/ed5ef1fd..38f00cc3 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16921&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16921&range=00-01 Stats: 121382 lines in 2594 files changed: 66365 ins; 45354 del; 9663 mod Patch: https://git.openjdk.org/jdk/pull/16921.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16921/head:pull/16921 PR: https://git.openjdk.org/jdk/pull/16921 From thartmann at openjdk.org Fri Jan 5 11:02:23 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Fri, 5 Jan 2024 11:02:23 GMT Subject: RFR: 8322077: Add Ideal transformation: (~a) | (~b) => ~(a & b) [v4] In-Reply-To: <5KKTYmY7dJ4nW0OEQ2UPuloIWOYc-pI9M8HRjoaRzw4=.f5eda6f9-c38c-4a2b-9690-cbb1791a2622@github.com> References: <5KKTYmY7dJ4nW0OEQ2UPuloIWOYc-pI9M8HRjoaRzw4=.f5eda6f9-c38c-4a2b-9690-cbb1791a2622@github.com> Message-ID: On Fri, 15 Dec 2023 23:35:56 GMT, Zhiqiang Zang wrote: >> Hello, >> >> (~a) | (~b) => ~(a & b) is a widely seen pattern, for example it is implemented for LLVM [here](https://github.com/llvm/llvm-project/blob/397f1ce9efb4eea1ee10fe4833f733b8c7abd878/llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp#L1617C28-L1617C28); however it is missing in current implementation of hotspot. This pull request adds this transformation and associated tests. >> >> Thanks. > > Zhiqiang Zang has updated the pull request incrementally with one additional commit since the last revision: > > untabify. Looks good to me. Please update the copyright dates. I submitted testing and will report back once it passed. 
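For readers following the thread, the bitwise identity behind this transformation can be checked in plain Java. This is only an illustrative sketch, not code from the patch: the class and method names (`DeMorganSketch`, `orOfNots`, `notOfAnd`) are made up here, and it says nothing about how the `Ideal` rewrite itself is implemented in C2.

```java
public class DeMorganSketch {
    // Shape before the transformation: two NOTs feeding an OR (three operations).
    static int orOfNots(int a, int b) {
        return (~a) | (~b);
    }

    // Shape after the transformation: one AND followed by one NOT (two operations).
    static int notOfAnd(int a, int b) {
        return ~(a & b);
    }

    public static void main(String[] args) {
        int[] samples = {0, 1, -1, 42, Integer.MIN_VALUE, Integer.MAX_VALUE};
        for (int a : samples) {
            for (int b : samples) {
                // De Morgan's law holds bit by bit, so both shapes must agree.
                if (orOfNots(a, b) != notOfAnd(a, b)) {
                    throw new AssertionError("mismatch for " + a + ", " + b);
                }
            }
        }
        System.out.println("identity holds for all sampled pairs");
    }
}
```

The rewritten shape needs one fewer operation, which is presumably why the pattern is worth recognizing.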
------------- PR Review: https://git.openjdk.org/jdk/pull/16334#pullrequestreview-1805744288 From thartmann at openjdk.org Fri Jan 5 11:07:22 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Fri, 5 Jan 2024 11:07:22 GMT Subject: RFR: 8322589: Add Ideal transformation: (~a) & (~b) => ~(a | b) In-Reply-To: <-pwjKEB97C-bM068JQN0PY1hl65IzcuQZfHzRoKu92g=.d6118773-20d2-46cd-9284-5168c9334bb5@github.com> References: <-pwjKEB97C-bM068JQN0PY1hl65IzcuQZfHzRoKu92g=.d6118773-20d2-46cd-9284-5168c9334bb5@github.com> Message-ID: On Tue, 24 Oct 2023 04:49:20 GMT, Zhiqiang Zang wrote: > Hello, > > `(~a) & (~b) => ~(a | b)` is a widely seen pattern, for example it is implemented for LLVM [here](https://github.com/llvm/llvm-project/blob/397f1ce9efb4eea1ee10fe4833f733b8c7abd878/llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp#L1616C28-L1616C28); however it is missing in current implementation of hotspot. This pull request adds this transformation and associated tests. > > Thanks. I think dedicated methods like you used in https://github.com/openjdk/jdk/pull/16334 would be good. Please also update the copyright dates. ------------- PR Review: https://git.openjdk.org/jdk/pull/16333#pullrequestreview-1805750056 From thartmann at openjdk.org Fri Jan 5 11:16:30 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Fri, 5 Jan 2024 11:16:30 GMT Subject: RFR: 8323012: C2 fails with fatal error: no reachable node should have no use In-Reply-To: References: Message-ID: On Fri, 5 Jan 2024 09:29:15 GMT, Tobias Hartmann wrote: > Incorrect refactoring from [JDK-8322490](https://bugs.openjdk.org/browse/JDK-8322490), probably a copy-paste error, leads to a dead `CastPP` and might have other not yet observed effects. See JBS for details. > > I wasn't able to extract a regression test for this but it reliably reproduced with replay compilation. > > Thanks, > Tobias Thanks Christian! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/17276#issuecomment-1878501253 From thartmann at openjdk.org Fri Jan 5 11:16:31 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Fri, 5 Jan 2024 11:16:31 GMT Subject: Integrated: 8323012: C2 fails with fatal error: no reachable node should have no use In-Reply-To: References: Message-ID: On Fri, 5 Jan 2024 09:29:15 GMT, Tobias Hartmann wrote: > Incorrect refactoring from [JDK-8322490](https://bugs.openjdk.org/browse/JDK-8322490), probably a copy-paste error, leads to a dead `CastPP` and might have other not yet observed effects. See JBS for details. > > I wasn't able to extract a regression test for this but it reliably reproduced with replay compilation. > > Thanks, > Tobias This pull request has now been integrated. Changeset: 78623c95 Author: Tobias Hartmann URL: https://git.openjdk.org/jdk/commit/78623c95f2a3954384963c4c761d2e4e5f4aefed Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod 8323012: C2 fails with fatal error: no reachable node should have no use Reviewed-by: chagedorn ------------- PR: https://git.openjdk.org/jdk/pull/17276 From thartmann at openjdk.org Fri Jan 5 11:17:25 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Fri, 5 Jan 2024 11:17:25 GMT Subject: RFR: 8320128: Clean up Parse constructor for OSR [v4] In-Reply-To: <-QQZANSu7PVpZ3CUX7kbp00f_xkhjiy7la3Z3gXutKA=.a1c938e4-12e0-44b8-a2dd-7cc0c0c9f09f@github.com> References: <0mxhhLR9DDdpiBhbXWPxCmMzo_OI9DYiIX36xGhxC_c=.1eed99b9-2dc3-4403-b3c1-22b6ebed6079@github.com> <-QQZANSu7PVpZ3CUX7kbp00f_xkhjiy7la3Z3gXutKA=.a1c938e4-12e0-44b8-a2dd-7cc0c0c9f09f@github.com> Message-ID: <5FSKfaW9bdkNn6Wr7MTr0-A3Zouqm8veKGyC9y11-vo=.b3aacd6e-492b-4e31-b04b-0de64a06cc9b@github.com> On Thu, 4 Jan 2024 17:06:38 GMT, Xin Liu wrote: >> There's a special case for the constructor of Parse. If current compilation is OSR and it is handling the top-level method(depth() == 1), then >> >> 1. _tf = C->tf(); >> 2. 
_entry_bci = C->entry_bci(); >> 3. _flow = method()->get_osr_flow_analysis(_entry_bci); >> >> We don't need to assign those member data twice. We can also factor out _flow->failing() for the special case and normal cases. >> >> It's worth mentioning that we can't save ciTypeFlow computation because >> get_osr_flow_analysis(_entry_bci) actually needs get_flow_analysis(method()). > > Xin Liu has updated the pull request incrementally with one additional commit since the last revision: > > Use print_cr for the log message. I performed some testing. Submitted it again and will report back once it passed. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16669#issuecomment-1878504674 From shade at openjdk.org Fri Jan 5 11:36:34 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Fri, 5 Jan 2024 11:36:34 GMT Subject: Withdrawn: 8321137: Relax ICStub alignment In-Reply-To: References: Message-ID: On Thu, 30 Nov 2023 18:31:24 GMT, Aleksey Shipilev wrote: > WIP, submitting for others to poke holes in it. > > Similarly to [JDK-8284578](https://bugs.openjdk.org/browse/JDK-8284578), we would like to handle `ICStub` alignment. Currently, the small stub that takes only 24 bytes of code is covered by 128 bytes on AArch64. This is due to the same thing fixed by [JDK-8284578](https://bugs.openjdk.org/browse/JDK-8284578) for interpreter codelets: aligning twice the `CodeEntryAlignment`. > > 128 bytes per `ICStub` means we deplete 10K `ICBuffer` with only 79 stubs. This actually happens multiple times even on a simple `HelloWorld.java` invocation that invokes some javac code, causing `ICBufferFull` safepoints. We can increase `ICBuffer` size, especially after [JDK-8314220](https://bugs.openjdk.org/browse/JDK-8314220), but we cannot do this without limits, since it eats up code cache. 
> > But if we assume that code entry alignment is not a strict requirement, and used to improve performance for frequently used code, then maybe we do not have to over-align the IC stub, given it is probably only used during IC transitions? It would significantly improve `ICStub` footprint and require smaller `ICBuffer`. > > Current patch affects ICStub size in different ways on different platforms, since current size is effectively 2x`CodeEntryAlignment`, and new size is cache line size: > - AArch64: 128 -> 64 bytes > - x86_64: 64 -> 64 bytes > - PPC64: 512 -> 128 bytes > - S390X: 128 -> 256 bytes (!) > - ARM: 32 -> 64 bytes (!) > - Zero: > > Additional testing: > - [x] Linux x86_64 server fastdebug `tier1 tier2 tier3` > - [x] Linux AArch64 server fastdebug `tier1 tier2 tier3` This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.org/jdk/pull/16911 From bulasevich at openjdk.org Fri Jan 5 11:37:44 2024 From: bulasevich at openjdk.org (Boris Ulasevich) Date: Fri, 5 Jan 2024 11:37:44 GMT Subject: RFR: 8322858: compiler/c2/aarch64/TestFarJump.java fails on AArch64 due to unexpected PrintAssembly output Message-ID: <2CaDCuUHVZ7kwRvW-aKzLtsSM0yJ5crh-DZ5QbHevRU=.28845f90-4c77-4875-b88d-8d5af411ea17@github.com> Test checks for the ADRP instruction in the [Exception Handler] section of the TestFarJump::main PrintAssembly output. The -XX:CompileOnly=TestFarJump::main option is not sufficient to restrict PrintAssembly output to a specific method. 
A number of method assemblies precede the output of the TestFarJump::main method: - java.lang.invoke.MethodHandle::linkToStatic(LLLLLLL)L - java.lang.invoke.MethodHandle::linkToStatic(LLLL)V - java.lang.invoke.MethodHandle::invokeBasic(LLLLLL)L - java.lang.invoke.MethodHandle::linkToSpecial(LLLLLLLL)L - java.lang.invoke.MethodHandle::linkToSpecial(LLLLL)V - java.lang.invoke.MethodHandle::invokeBasic(LLL)L - java.lang.invoke.MethodHandle::linkToSpecial(LLLLL)L - java.lang.invoke.MethodHandle::linkToStatic(LL)L - java.lang.invoke.MethodHandle::linkToSpecial(LL)V - java.lang.invoke.MethodHandle::invokeBasic()L - java.lang.invoke.MethodHandle::linkToSpecial(LL)L - java.lang.invoke.MethodHandle::linkToStatic(LLL)L - java.lang.invoke.MethodHandle::linkToStatic(LLLL)L - java.lang.invoke.MethodHandle::linkToSpecial(LLL)V - java.lang.invoke.MethodHandle::invokeBasic(L)L - java.lang.invoke.MethodHandle::linkToSpecial(LLL)L - java.lang.invoke.MethodHandle::linkToStatic(LLL)V - java.lang.invoke.MethodHandle::linkToStatic(LL)I - jdk.internal.vm.Continuation::enterSpecial - compiler.c2.aarch64.TestFarJump::main With this change, I use the -XX:CompileCommand=option,TestFarJump::main,bool,PrintAssembly,true option to restrict the PrintAssembly output to a specific method. Now the only [Exception Handler] in the listing is the section of TestFarJump::main method. 
------------- Commit messages: - 8322858: compiler/c2/aarch64/TestFarJump.java fails on AArch64 due to unexpected PrintAssembly output Changes: https://git.openjdk.org/jdk/pull/17278/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=17278&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8322858 Stats: 3 lines in 1 file changed: 0 ins; 1 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/17278.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17278/head:pull/17278 PR: https://git.openjdk.org/jdk/pull/17278 From roland at openjdk.org Fri Jan 5 12:47:22 2024 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 5 Jan 2024 12:47:22 GMT Subject: RFR: 8320649: C2: Optimize scoped values [v4] In-Reply-To: References: Message-ID: On Fri, 5 Jan 2024 00:59:03 GMT, Dean Long wrote: > I'm not a C2 expert, so my high-level comments might not all make sense, but here goes. > > * My first reaction was why does this need to be so complicated? That's a fair reaction. > Can't we treat the slow path and cache implementation as a black box, and just common up the high-level get()? In my mind the ScopedValue get should be similar to a read from a "condy" dynamic constant. Initially, I thought about delaying the inlining of `get()` methods and simply having a pass that looks for `get()` calls with the same inputs. I don't think that works well because the current late inlining framework can't delay inlining very late. We don't run loop opts before we're done with inlining, for instance. If we wanted to hoist a call out of a loop we would need loop opts. For instance, it's likely a call to `get()` depends on a null check that we would need to hoist first. The other thing about optimizing `get()` calls is that they are heavyweight nodes (a high level `get()` macro node would be very similar to a `get()` call node whichever way you look at it). We don't know how to hoist a call out of a loop.
A call acts as a barrier on the entire memory state and gets in the way of memory optimizations. If the profile reports the slow path to be never taken then the shape of the `get()` becomes lighter weight. It doesn't disrupt other optimizations. Probing the cache acts as a load + test which we know how to hoist from a loop. It felt to me that it would be fairly common for the slow path to not be needed and given the shape without the slow path is much easier to optimize, it was important to be able to expose early on if the slow path was there or not. > > * The reason for breaking up the operations into individual nodes seems to be because of the profiling information. So I'm wondering how much this helps, given the added complexity. The thing about `get()` is that in simple cases, it optimizes well because of profile data. A `get()` call once inlined can essentially be hoisted out of a loop if all goes well. It doesn't take much for simple optimizations on `get()` to not happen anymore. The goal of this patch is to bring consistency and have optimizations work well in all sorts of scenarios. But it would be hard to sell if the simple cases don't work as well as they do without the patch. And I believe that requires profile data. > > * I would expect multiple get() calls in the same method or in loops to be rare and/or programmer errors. The real advantage seems to be when the binding and the get() can be connected from caller to callee through inlining. Eliminating `get()` calls with the same inputs may not be common in Java code but that transformation is a building block for optimizations. Hoisting a `get()` out of a loop can be achieved by peeling one iteration and letting the `get()` from the loop body be removed because it's redundant with the one from the peeled iteration. Also, code that C2 optimizes once inlining has happened and dead paths have been trimmed doesn't necessarily look like the Java code the programmer wrote.
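The peeling argument can be pictured at the source level with a small hand-written sketch. It is not what C2 emits and does not use `ScopedValue` at all: the `IntSupplier` named `lookup` is a hypothetical stand-in for a loop-invariant `get()`, and the sketch assumes the lookup always returns the same value and has no side effects that matter.

```java
import java.util.function.IntSupplier;

public class PeelSketch {
    // Original shape: the loop-invariant lookup runs on every iteration.
    static long sumNaive(IntSupplier lookup, int[] data) {
        long sum = 0;
        for (int i = 0; i < data.length; i++) {
            sum += (long) lookup.getAsInt() * data[i];
        }
        return sum;
    }

    // Shape after peeling one iteration by hand: the lookup in the peeled
    // iteration dominates the loop body, so the lookup inside the loop is
    // redundant and is replaced by the value already obtained.
    static long sumPeeled(IntSupplier lookup, int[] data) {
        if (data.length == 0) {
            return 0; // zero iterations: no lookup happens, as in the original
        }
        int v = lookup.getAsInt();           // lookup of the peeled iteration
        long sum = (long) v * data[0];       // body of the peeled iteration
        for (int i = 1; i < data.length; i++) {
            sum += (long) v * data[i];       // redundant lookup removed
        }
        return sum;
    }
}
```

Both methods return the same sum under the stated assumption, but the second performs at most one lookup, which is the effect of hoisting it out of the loop.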
> > * Needing to do things like treat ScopedValueGetHitsInCache as always successful gives me a bad feeling for some reason, and seems unnecessary if we did more at a higher (macro?) level rather than eagerly expanding the high-level operation into individual nodes. I think my comments above cover that one. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16966#issuecomment-1878609794 From aph at openjdk.org Fri Jan 5 13:34:23 2024 From: aph at openjdk.org (Andrew Haley) Date: Fri, 5 Jan 2024 13:34:23 GMT Subject: RFR: 8322858: compiler/c2/aarch64/TestFarJump.java fails on AArch64 due to unexpected PrintAssembly output In-Reply-To: <2CaDCuUHVZ7kwRvW-aKzLtsSM0yJ5crh-DZ5QbHevRU=.28845f90-4c77-4875-b88d-8d5af411ea17@github.com> References: <2CaDCuUHVZ7kwRvW-aKzLtsSM0yJ5crh-DZ5QbHevRU=.28845f90-4c77-4875-b88d-8d5af411ea17@github.com> Message-ID: <93wsMJPSK3Sk7jSR4J8QbHq7T56rUZTP0Y1kHYrUc6U=.7621639c-39f4-4fb7-a6df-9bff8419e86a@github.com> On Fri, 5 Jan 2024 11:32:16 GMT, Boris Ulasevich wrote: > Test checks for the ADRP instruction in the [Exception Handler] section of the TestFarJump::main PrintAssembly output. > > The -XX:CompileOnly=TestFarJump::main option is not sufficient to restrict PrintAssembly output to a specific method.
A number of method assemblies precede the output of the TestFarJump::main method: > - java.lang.invoke.MethodHandle::linkToStatic(LLLLLLL)L > - java.lang.invoke.MethodHandle::linkToStatic(LLLL)V > - java.lang.invoke.MethodHandle::invokeBasic(LLLLLL)L > - java.lang.invoke.MethodHandle::linkToSpecial(LLLLLLLL)L > - java.lang.invoke.MethodHandle::linkToSpecial(LLLLL)V > - java.lang.invoke.MethodHandle::invokeBasic(LLL)L > - java.lang.invoke.MethodHandle::linkToSpecial(LLLLL)L > - java.lang.invoke.MethodHandle::linkToStatic(LL)L > - java.lang.invoke.MethodHandle::linkToSpecial(LL)V > - java.lang.invoke.MethodHandle::invokeBasic()L > - java.lang.invoke.MethodHandle::linkToSpecial(LL)L > - java.lang.invoke.MethodHandle::linkToStatic(LLL)L > - java.lang.invoke.MethodHandle::linkToStatic(LLLL)L > - java.lang.invoke.MethodHandle::linkToSpecial(LLL)V > - java.lang.invoke.MethodHandle::invokeBasic(L)L > - java.lang.invoke.MethodHandle::linkToSpecial(LLL)L > - java.lang.invoke.MethodHandle::linkToStatic(LLL)V > - java.lang.invoke.MethodHandle::linkToStatic(LL)I > - jdk.internal.vm.Continuation::enterSpecial > - compiler.c2.aarch64.TestFarJump::main > > With this change, I use the -XX:CompileCommand=option,TestFarJump::main,bool,PrintAssembly,true option to restrict the PrintAssembly output to a specific method. Now the only [Exception Handler] in the listing is the section of TestFarJump::main method. Marked as reviewed by aph (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/17278#pullrequestreview-1805971972 From chagedorn at openjdk.org Fri Jan 5 13:41:23 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 5 Jan 2024 13:41:23 GMT Subject: RFR: 8310844: [AArch64] C1 compilation fails because monitor offset in OSR buffer is too large for immediate [v2] In-Reply-To: <2Hnw1VDu1Ml2XB-VfUV8l-t8dtrMzPNh3RyNPfV-tgA=.fc400df2-9529-4bdd-8e0c-fb49ca4f0b48@github.com> References: <6vNa0ZYwKL0HXjHjWTyIe6kB5hza1aaXlJbxM4qh8-w=.edc55dc9-08ff-4861-9038-d4b150bcf4f7@github.com> <2Hnw1VDu1Ml2XB-VfUV8l-t8dtrMzPNh3RyNPfV-tgA=.fc400df2-9529-4bdd-8e0c-fb49ca4f0b48@github.com> Message-ID: On Thu, 4 Jan 2024 14:20:39 GMT, Tobias Hartmann wrote: >> [JDK-8287349](https://bugs.openjdk.org/browse/JDK-8287349) changed the code in `LIR_Assembler::osr_entry()` to use a single `ldp` instruction instead of two `ldr` instructions to load the monitor lock and object from the OSR state. This is not correct because the `ldp` instruction only supports a [7-bit signed immediate value](https://developer.arm.com/documentation/den0024/a/The-A64-instruction-set/Memory-access-instructions/Accessing-multiple-memory-locations). If the offset is larger, for example due to a large number of locals as in `TestLargeMonitorOffset::test`, we hit the `Field too big for insn` guarantee. >> >> I suggest to revert [JDK-8287349](https://bugs.openjdk.org/browse/JDK-8287349). >> >> I also found two unrelated bugs when working on the reproducer: [JDK-8322992](https://bugs.openjdk.org/browse/JDK-8322992) (javac) and [JDK-8322996](https://bugs.openjdk.org/browse/JDK-8322996) (C2). >> >> Thanks, >> Tobias > > Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision: > > Adjusted according to review Looks good. ------------- Marked as reviewed by chagedorn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/17266#pullrequestreview-1805986676 From thartmann at openjdk.org Fri Jan 5 13:51:36 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Fri, 5 Jan 2024 13:51:36 GMT Subject: RFR: 8310844: [AArch64] C1 compilation fails because monitor offset in OSR buffer is too large for immediate [v2] In-Reply-To: <2Hnw1VDu1Ml2XB-VfUV8l-t8dtrMzPNh3RyNPfV-tgA=.fc400df2-9529-4bdd-8e0c-fb49ca4f0b48@github.com> References: <6vNa0ZYwKL0HXjHjWTyIe6kB5hza1aaXlJbxM4qh8-w=.edc55dc9-08ff-4861-9038-d4b150bcf4f7@github.com> <2Hnw1VDu1Ml2XB-VfUV8l-t8dtrMzPNh3RyNPfV-tgA=.fc400df2-9529-4bdd-8e0c-fb49ca4f0b48@github.com> Message-ID: On Thu, 4 Jan 2024 14:20:39 GMT, Tobias Hartmann wrote: >> [JDK-8287349](https://bugs.openjdk.org/browse/JDK-8287349) changed the code in `LIR_Assembler::osr_entry()` to use a single `ldp` instruction instead of two `ldr` instructions to load the monitor lock and object from the OSR state. This is not correct because the `ldp` instruction only supports a [7-bit signed immediate value](https://developer.arm.com/documentation/den0024/a/The-A64-instruction-set/Memory-access-instructions/Accessing-multiple-memory-locations). If the offset is larger, for example due to a large number of locals as in `TestLargeMonitorOffset::test`, we hit the `Field too big for insn` guarantee. >> >> I suggest to revert [JDK-8287349](https://bugs.openjdk.org/browse/JDK-8287349). >> >> I also found two unrelated bugs when working on the reproducer: [JDK-8322992](https://bugs.openjdk.org/browse/JDK-8322992) (javac) and [JDK-8322996](https://bugs.openjdk.org/browse/JDK-8322996) (C2). >> >> Thanks, >> Tobias > > Tobias Hartmann has updated the pull request incrementally with one additional commit since the last revision: > > Adjusted according to review Thanks, Christian! 
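As a side note on the quoted `ldp` limitation, the reachable offset range can be worked out with a few lines of Java. This is a sketch resting on one assumption not spelled out in the thread: for a pair of 64-bit registers the 7-bit signed immediate is scaled by 8 bytes, which gives byte offsets from -512 to 504 in steps of 8. The class and method names (`LdpOffsetSketch`, `fitsLdpImm7`) are hypothetical and not taken from HotSpot.

```java
public class LdpOffsetSketch {
    // A 7-bit signed immediate covers -64..63.
    static final int IMM7_MIN = -64;
    static final int IMM7_MAX = 63;
    // Assumed scale for a pair of 64-bit registers: 8 bytes per unit.
    static final int SCALE = 8;

    // Returns true if byteOffset could be encoded directly under the assumptions above.
    static boolean fitsLdpImm7(long byteOffset) {
        if (byteOffset % SCALE != 0) {
            return false; // offset must be a multiple of the register size
        }
        long scaled = byteOffset / SCALE;
        return scaled >= IMM7_MIN && scaled <= IMM7_MAX;
    }

    public static void main(String[] args) {
        System.out.println(fitsLdpImm7(504)); // largest positive offset in range
        System.out.println(fitsLdpImm7(512)); // one step past the range
    }
}
```

Under that assumption, a method with enough locals can push the monitor slot in the OSR buffer beyond 504 bytes, which matches the failure described in the quoted text.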
------------- PR Comment: https://git.openjdk.org/jdk/pull/17266#issuecomment-1878684009 From thartmann at openjdk.org Fri Jan 5 13:51:37 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Fri, 5 Jan 2024 13:51:37 GMT Subject: Integrated: 8310844: [AArch64] C1 compilation fails because monitor offset in OSR buffer is too large for immediate In-Reply-To: <6vNa0ZYwKL0HXjHjWTyIe6kB5hza1aaXlJbxM4qh8-w=.edc55dc9-08ff-4861-9038-d4b150bcf4f7@github.com> References: <6vNa0ZYwKL0HXjHjWTyIe6kB5hza1aaXlJbxM4qh8-w=.edc55dc9-08ff-4861-9038-d4b150bcf4f7@github.com> Message-ID: <5L1hf9r4_BrbzI-pXsVoaB7MFhbNkuppE1N8Jp6lV8I=.d9660cd6-962b-4ec1-99d7-5b94ae67d88c@github.com> On Thu, 4 Jan 2024 12:39:18 GMT, Tobias Hartmann wrote: > [JDK-8287349](https://bugs.openjdk.org/browse/JDK-8287349) changed the code in `LIR_Assembler::osr_entry()` to use a single `ldp` instruction instead of two `ldr` instructions to load the monitor lock and object from the OSR state. This is not correct because the `ldp` instruction only supports a [7-bit signed immediate value](https://developer.arm.com/documentation/den0024/a/The-A64-instruction-set/Memory-access-instructions/Accessing-multiple-memory-locations). If the offset is larger, for example due to a large number of locals as in `TestLargeMonitorOffset::test`, we hit the `Field too big for insn` guarantee. > > I suggest to revert [JDK-8287349](https://bugs.openjdk.org/browse/JDK-8287349). > > I also found two unrelated bugs when working on the reproducer: [JDK-8322992](https://bugs.openjdk.org/browse/JDK-8322992) (javac) and [JDK-8322996](https://bugs.openjdk.org/browse/JDK-8322996) (C2). > > Thanks, > Tobias This pull request has now been integrated. 
Changeset: ade21a96 Author: Tobias Hartmann URL: https://git.openjdk.org/jdk/commit/ade21a965f8a5fc889cd48bba76fad507bdeddf5 Stats: 149 lines in 2 files changed: 147 ins; 0 del; 2 mod 8310844: [AArch64] C1 compilation fails because monitor offset in OSR buffer is too large for immediate Reviewed-by: aph, chagedorn ------------- PR: https://git.openjdk.org/jdk/pull/17266 From shade at openjdk.org Fri Jan 5 14:43:41 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Fri, 5 Jan 2024 14:43:41 GMT Subject: RFR: 8323065: Unneccesary CodeBlob lookup in CompiledIC::internal_set_ic_destination Message-ID: I was looking at hotpath for IC stub cleaning (happens at safepoint), and one obvious thing is that we look-up `CodeBlob` from `call->instruction_address()` only to assert that is compiled one. It used to be protected by `#ifdef ASSERT` before [JDK-8212681](https://bugs.openjdk.org/browse/JDK-8212681), and pulled from it to be used in Mutex: https://hg.openjdk.org/jdk/jdk/rev/d6dc479bcdd3#l15.62 ...which is now gone after [JDK-8214257](https://bugs.openjdk.org/browse/JDK-8214257). This fix reinstates the `ASSERT` block again. There are small improvements (~1..10us) for safepoint cleanup on small ad-hoc tests. But since this whole thing involves looking up things in code cache, it may cost quite a lot. 
------------- Commit messages: - Fix Changes: https://git.openjdk.org/jdk/pull/17281/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=17281&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8323065 Stats: 4 lines in 1 file changed: 3 ins; 1 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/17281.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17281/head:pull/17281 PR: https://git.openjdk.org/jdk/pull/17281 From epeter at openjdk.org Fri Jan 5 15:36:22 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 5 Jan 2024 15:36:22 GMT Subject: RFR: 8319690: [AArch64] C2 compilation hits offset_ok_for_immed: assert "c2 compiler bug" In-Reply-To: References: <16J-lJ2AceGTVcRWBcP15yKcwO-1IA1XsngyOuNjf7k=.0776f081-ae2c-4279-87cf-d909806c2bc4@github.com> <4KRCqYxn02wMjYDuN3_HbYWxc9BDtYMNd40bNrJ4K8w=.5362dda4-8d30-48fd-8915-909eb15a6023@github.com> Message-ID: <82i5GmtoNQdveJShSuWQa7dGHszWLCLVbsJNC6Mulx4=.bd5b27f5-098b-4350-abe1-98af81f0bb3e@github.com> On Thu, 7 Dec 2023 06:45:30 GMT, Fei Gao wrote: >> After this change, `immIOffset` and `immLOffset` appear to be obsolete. > >> After this change, `immIOffset` and `immLOffset` appear to be obsolete. > > Removed them in the new commit. Thanks! @fg1417 what is the state on this? The example here may look like a strange edge-case. But with my plans this may become more common: [JDK-8318446](https://bugs.openjdk.org/browse/JDK-8318446) C2: implement StoreNode::Ideal_merge_stores I will merge stores, which can then look like unaligned stores of a larger type. This bug here blocks my progress. 
I'm not in a hurry, just curious if there is now a plan how to proceed here ;) ------------- PR Comment: https://git.openjdk.org/jdk/pull/16991#issuecomment-1878857862 From epeter at openjdk.org Fri Jan 5 16:03:27 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 5 Jan 2024 16:03:27 GMT Subject: RFR: 8322589: Add Ideal transformation: (~a) & (~b) => ~(a | b) In-Reply-To: <-pwjKEB97C-bM068JQN0PY1hl65IzcuQZfHzRoKu92g=.d6118773-20d2-46cd-9284-5168c9334bb5@github.com> References: <-pwjKEB97C-bM068JQN0PY1hl65IzcuQZfHzRoKu92g=.d6118773-20d2-46cd-9284-5168c9334bb5@github.com> Message-ID: On Tue, 24 Oct 2023 04:49:20 GMT, Zhiqiang Zang wrote: > Hello, > > `(~a) & (~b) => ~(a | b)` is a widely seen pattern, for example it is implemented for LLVM [here](https://github.com/llvm/llvm-project/blob/397f1ce9efb4eea1ee10fe4833f733b8c7abd878/llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp#L1616C28-L1616C28); however it is missing in current implementation of hotspot. This pull request adds this transformation and associated tests. > > Thanks. Looks like a good idea :) I left a few suggestions below. src/hotspot/share/opto/mulnode.cpp line 617: > 615: && phase->type(in(1)->in(2)) == TypeInt::MINUS_1 > 616: && in(2)->Opcode() == Op_XorI > 617: && in(1)->in(2) == in(2)->in(2)) { minor code style issue: please take the `&&` to the end of the line. That is what I usually see. It also makes reading the lines easier, as they are aligned with the first line. src/hotspot/share/opto/mulnode.cpp line 618: > 616: && in(2)->Opcode() == Op_XorI > 617: && in(1)->in(2) == in(2)->in(2)) { > 618: return new XorINode(phase->transform(new OrINode(in(1)->in(1), in(2)->in(1))), in(1)->in(2)); The nesting of this line is difficult to read. I suggest you take multiple lines and name intermediate results with something helpful. 
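For readers following along, a small sketch of the identity behind this transformation, written at the Java level rather than as C2 node code. The quoted condition matches `XorI` nodes whose second input is `TypeInt::MINUS_1`, which corresponds to `~x` being the same as `x ^ -1`. The class and method names below are hypothetical and only illustrate the arithmetic.

```java
public final class DeMorganAndCheck {
    // Shape before the transformation: two NOTs feeding an AND.
    static int before(int a, int b) {
        return (~a) & (~b);
    }

    // Shape after the transformation: one OR feeding a single NOT.
    static int after(int a, int b) {
        return ~(a | b);
    }

    // Bitwise NOT expressed as XOR with -1, the form the quoted pattern match looks for.
    static int notViaXor(int x) {
        return x ^ -1;
    }

    static boolean holdsFor(int a, int b) {
        return before(a, b) == after(a, b)
                && notViaXor(a) == ~a
                && notViaXor(b) == ~b;
    }

    public static void main(String[] args) {
        int[] samples = {0, 1, -1, 42, 0x55555555, Integer.MIN_VALUE, Integer.MAX_VALUE};
        boolean all = true;
        for (int a : samples) {
            for (int b : samples) {
                all &= holdsFor(a, b);
            }
        }
        System.out.println("identity holds for all samples: " + all);
    }
}
```

The rewritten shape needs one XOR instead of two, which is where the saving comes from.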
test/hotspot/jtreg/compiler/c2/irTests/AndINodeIdealizationTests.java line 50: > 48: > 49: assertResult(0, 0); > 50: assertResult(a, a); Suggestion: assertResult(a, b); I assume you wanted this? Otherwise `b` is useless ;) test/hotspot/jtreg/compiler/c2/irTests/AndLNodeIdealizationTests.java line 50: > 48: > 49: assertResult(0, 0); > 50: assertResult(a, a); Suggestion: assertResult(a, b); ------------- Changes requested by epeter (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/16333#pullrequestreview-1806210975 PR Review Comment: https://git.openjdk.org/jdk/pull/16333#discussion_r1443035127 PR Review Comment: https://git.openjdk.org/jdk/pull/16333#discussion_r1443038716 PR Review Comment: https://git.openjdk.org/jdk/pull/16333#discussion_r1443043016 PR Review Comment: https://git.openjdk.org/jdk/pull/16333#discussion_r1443043862 From epeter at openjdk.org Fri Jan 5 16:03:29 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 5 Jan 2024 16:03:29 GMT Subject: RFR: 8322589: Add Ideal transformation: (~a) & (~b) => ~(a | b) In-Reply-To: References: <-pwjKEB97C-bM068JQN0PY1hl65IzcuQZfHzRoKu92g=.d6118773-20d2-46cd-9284-5168c9334bb5@github.com> Message-ID: <2Xcm4mZkW4mN9d8LBmLSkwEM3Hq4I0Vy8NEZz9HL70Y=.6b3f843b-e28c-4f12-b422-12e982d81f6c@github.com> On Fri, 5 Jan 2024 15:51:14 GMT, Emanuel Peter wrote: >> Hello, >> >> `(~a) & (~b) => ~(a | b)` is a widely seen pattern, for example it is implemented for LLVM [here](https://github.com/llvm/llvm-project/blob/397f1ce9efb4eea1ee10fe4833f733b8c7abd878/llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp#L1616C28-L1616C28); however it is missing in current implementation of hotspot. This pull request adds this transformation and associated tests. >> >> Thanks. 
> > src/hotspot/share/opto/mulnode.cpp line 617: > >> 615: && phase->type(in(1)->in(2)) == TypeInt::MINUS_1 >> 616: && in(2)->Opcode() == Op_XorI >> 617: && in(1)->in(2) == in(2)->in(2)) { > > minor code style issue: please take the `&&` to the end of the line. That is what I usually see. It also makes reading the lines easier, as they are aligned with the first line. Suggestion: && phase->type(in(2)->in(2)) == TypeInt::MINUS_1) { Could be nice for symmetry. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16333#discussion_r1443037674 From epeter at openjdk.org Fri Jan 5 16:25:27 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 5 Jan 2024 16:25:27 GMT Subject: RFR: 8322077: Add Ideal transformation: (~a) | (~b) => ~(a & b) [v4] In-Reply-To: <5KKTYmY7dJ4nW0OEQ2UPuloIWOYc-pI9M8HRjoaRzw4=.f5eda6f9-c38c-4a2b-9690-cbb1791a2622@github.com> References: <5KKTYmY7dJ4nW0OEQ2UPuloIWOYc-pI9M8HRjoaRzw4=.f5eda6f9-c38c-4a2b-9690-cbb1791a2622@github.com> Message-ID: On Fri, 15 Dec 2023 23:35:56 GMT, Zhiqiang Zang wrote: >> Hello, >> >> (~a) | (~b) => ~(a & b) is a widely seen pattern, for example it is implemented for LLVM [here](https://github.com/llvm/llvm-project/blob/397f1ce9efb4eea1ee10fe4833f733b8c7abd878/llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp#L1617C28-L1617C28); however it is missing in current implementation of hotspot. This pull request adds this transformation and associated tests. >> >> Thanks. > > Zhiqiang Zang has updated the pull request incrementally with one additional commit since the last revision: > > untabify. Looks like a good idea. Left a few comments. I would have merged this with https://github.com/openjdk/jdk/pull/16333, since it is essentially the symmetric case. But leave it separate now. It would be nice to have some shared tests, where both optimizations need to be combined. 
Like: `(~a | ~b) & (~c | ~d)` -> `~(a & b) & ~(c & d)` -> `~((a & b) | (c & d))` src/hotspot/share/opto/addnode.cpp line 787: > 785: } > 786: return nullptr; > 787: } If you are going to use this also for your changes in https://github.com/openjdk/jdk/pull/16333, then you probably want this to go into a shared file. src/hotspot/share/opto/addnode.cpp line 816: > 814: return make_not(phase, > 815: phase->transform(new AndINode(in(1)->in(1), in(2)->in(1))), > 816: T_INT); I'd put the `AndI` node on a separate line. Call it `add_a_b` or similar. Then you can transform on the next line. And then on a third line the `make_not`. ------------- Changes requested by epeter (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/16334#pullrequestreview-1806238724 PR Review Comment: https://git.openjdk.org/jdk/pull/16334#discussion_r1443055241 PR Review Comment: https://git.openjdk.org/jdk/pull/16334#discussion_r1443052473 From kvn at openjdk.org Fri Jan 5 18:45:29 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 5 Jan 2024 18:45:29 GMT Subject: RFR: 8323012: C2 fails with fatal error: no reachable node should have no use In-Reply-To: References: Message-ID: <5P6XCWehuGjMZUGRNFcu8Z7bPP6t95Z8PYYojMwWi5I=.56eae905-629e-4488-992d-41d5b9dd5f67@github.com> On Fri, 5 Jan 2024 09:29:15 GMT, Tobias Hartmann wrote: > Incorrect refactoring from [JDK-8322490](https://bugs.openjdk.org/browse/JDK-8322490), probably a copy-paste error, leads to a dead `CastPP` and might have other not yet observed effects. See JBS for details. > > I wasn't able to extract a regression test for this but it reliably reproduced with replay compilation. > > Thanks, > Tobias Good. 
------------- PR Review: https://git.openjdk.org/jdk/pull/17276#pullrequestreview-1806644365 From kvn at openjdk.org Fri Jan 5 18:55:39 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 5 Jan 2024 18:55:39 GMT Subject: RFR: 8310190: C2 SuperWord: AlignVector is broken, generates misaligned packs [v59] In-Reply-To: References: Message-ID: On Thu, 4 Jan 2024 07:00:48 GMT, Emanuel Peter wrote: >> I want to push this in JDK23. >> After this fix here, I'm doing [this refactoring](https://github.com/openjdk/jdk/pull/16620). >> >> To calm your nerves: most of the changes are in auto-generated tests, and tests in general. >> >> **Context** >> >> `-XX:+AlignVector` ensures that SuperWord only creates LoadVector and StoreVector that can be memory aligned. This is achieved by iterating in the pre-loop until we reach the alignment boundary, then we can start the main loop properly aligned. However, this is not possible in all cases, sometimes some memory accesses cannot be guaranteed to be aligned, and we need to reject vectorization (at least partially, for some of the packs). >> >> Alignment is split into two tasks: >> - Alignment Correctness Checks: only relevant if `-XX:+AlignVector`. Need to reject vectorization if alignment is not possible. We must check if the address of the vector load/store is aligned with (divisible by) `ObjectAlignmentInBytes`. >> - Alignment by adjusting pre-loop limit: alignment is desirable even if `-XX:-AlignVector`. We would like to align the vectors with their vector width. >> >> **Problem** >> >> I have recently found a bug with our AlignVector [JDK-8310190](https://bugs.openjdk.org/browse/JDK-8310190). >> In that bug, we perform a misaligned memory vector access, which results in a `SIGBUS` on an ARM32 machine. >> Thanks @fg1417 for confirming this! >> Hence, we need to fix the alignment correctness checks. 
>>
>> While working on this task, I also found some bugs in the "alignment by adjusting pre-loop limit": there were cases where it did not align the vectors correctly.
>>
>> **Problem Details**
>>
>> Reproducer:
>>
>>
>> static void test(short[] a, short[] b, short mask) {
>>     for (int i = 0; i < RANGE; i+=8) {
>>         // Problematic for AlignVector
>>         b[i+0] = (short)(a[i+0] & mask); // best_memref, align 0
>>
>>         b[i+3] = (short)(a[i+3] & mask); // pack at offset 6 bytes
>>         b[i+4] = (short)(a[i+4] & mask);
>>         b[i+5] = (short)(a[i+5] & mask);
>>         b[i+6] = (short)(a[i+6] & mask);
>>     }
>> }
>>
>>
>> During `SuperWord::find_adjacent_refs` we used to check if the references are expected to be aligned. For that, we look at each "group" of references (eg all `LoadS`) and take the reference with the lowest offset. For that chosen reference, we check if it is alignable. If yes, we accept all references of that group, if no we reject all.
>>
>> This is problemati...
>
> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision:
>
> some minor changes for Vladimir

Okay, so C2 is smart enough to use `lea` when needed. Good. No more questions.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/14785#issuecomment-1879123189

From davleopo at openjdk.org Fri Jan 5 19:03:37 2024 From: davleopo at openjdk.org (David Leopoldseder) Date: Fri, 5 Jan 2024 19:03:37 GMT Subject: Integrated: 8322636: [JVMCI] HotSpotSpeculationLog can be inconsistent across a single compile In-Reply-To: References: Message-ID: On Fri, 22 Dec 2023 09:55:16 GMT, David Leopoldseder wrote:

> This PR fixes a subtle inconsistency in `HotSpotSpeculationLog`.
>
> Normal uses of `HotSpotSpeculationLog` work by using a `SpeculationReason` and asking the speculation log via `maySpeculate` if the speculation can be performed, i.e., if it failed before for the given method.
An example for this can be seen in Graal https://github.com/oracle/graal/blob/master/compiler/src/jdk.graal.compiler/src/jdk/graal/compiler/nodes/loop/CountedLoopInfo.java#L591C15-L591C15 > The implicit assumption is that the speculation log, `HotSpotSpeculationLog` in particular collects failed speculations at the beginning of a compile and then stays consistent during the compile. Why is that? - Because if there are new failed speculations added to the failed speculations during the compile - the compiler would speculate again on those in an inconsistent way. E.g. at the beginning of a compile a certain speculation has not failed yet and the compiler thinks it can do optimization xyz using a speculation - later during the compilation process it consults the speculation log but gets a different answer. All those inconsistent speculations that already failed will anyway later fail code installation in jvmci (they will throw a bailout during `HotSpotCodeCacheProvider#installCode` https://github.com/openjdk/jdk/blob/master/src/jdk.internal.vm.ci/share/classes/jdk/vm/ci/hotspot/HotSpotSpeculationLog.java#L192 ). Thus, we should at least return a consistent result during a compile. > The problem for consistency here, that also makes troubles on the graal side, is that `maySpeculate` itself can collect failed speculations if there have not been any previously, i.e., `failedSpeculations == null`. > In order to make the speculation log consistent across an entire JVMCI compile this PR removes the collection of failed speculations in `maySpeculate`. This pull request has now been integrated. 
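The consistency argument in the description above can be sketched roughly as follows. This is a simplified, hypothetical model and not the JVMCI `HotSpotSpeculationLog` API: speculation reasons are plain strings, the runtime-side list of failed speculations is a set, and the snapshot is taken only by an explicit collect step, never lazily inside `maySpeculate`.

```java
import java.util.HashSet;
import java.util.Set;

public final class SnapshotSpeculationLogSketch {
    // Hypothetical stand-in for the runtime-side failed speculations; may grow at any time.
    private final Set<String> runtimeFailed = new HashSet<>();
    // Snapshot used by the compiler during one compile; null until explicitly collected.
    private Set<String> failedSnapshot;

    /** Simulates a deoptimization recording a failed speculation in the runtime. */
    public void recordFailure(String reason) {
        runtimeFailed.add(reason);
    }

    /** Called once at the start of a compile. */
    public void collectFailedSpeculations() {
        failedSnapshot = new HashSet<>(runtimeFailed);
    }

    /** Answers from the snapshot only, so repeated queries within a compile agree. */
    public boolean maySpeculate(String reason) {
        return failedSnapshot == null || !failedSnapshot.contains(reason);
    }

    public static void main(String[] args) {
        SnapshotSpeculationLogSketch log = new SnapshotSpeculationLogSketch();
        log.collectFailedSpeculations();                // compile starts
        boolean first = log.maySpeculate("loop-overflow");
        log.recordFailure("loop-overflow");             // failure arrives mid-compile
        boolean second = log.maySpeculate("loop-overflow");
        System.out.println("consistent within compile: " + (first == second));
    }
}
```

A failure recorded mid-compile only becomes visible after the next explicit collect, which is the behavior the description argues for.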
Changeset: 35a1b77d Author: David Leopoldseder Committer: Tom Rodriguez URL: https://git.openjdk.org/jdk/commit/35a1b77da541e4df3c4d1bab0825ea39e653808c Stats: 9 lines in 1 file changed: 6 ins; 3 del; 0 mod 8322636: [JVMCI] HotSpotSpeculationLog can be inconsistent across a single compile Reviewed-by: dnsimon, never ------------- PR: https://git.openjdk.org/jdk/pull/17183 From kvn at openjdk.org Fri Jan 5 19:05:38 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 5 Jan 2024 19:05:38 GMT Subject: RFR: 8310190: C2 SuperWord: AlignVector is broken, generates misaligned packs [v59] In-Reply-To: References: Message-ID: On Thu, 4 Jan 2024 07:00:48 GMT, Emanuel Peter wrote: >> I want to push this in JDK23. >> After this fix here, I'm doing [this refactoring](https://github.com/openjdk/jdk/pull/16620). >> >> To calm your nerves: most of the changes are in auto-generated tests, and tests in general. >> >> **Context** >> >> `-XX:+AlignVector` ensures that SuperWord only creates LoadVector and StoreVector that can be memory aligned. This is achieved by iterating in the pre-loop until we reach the alignment boundary, then we can start the main loop properly aligned. However, this is not possible in all cases, sometimes some memory accesses cannot be guaranteed to be aligned, and we need to reject vectorization (at least partially, for some of the packs). >> >> Alignment is split into two tasks: >> - Alignment Correctness Checks: only relevant if `-XX:+AlignVector`. Need to reject vectorization if alignment is not possible. We must check if the address of the vector load/store is aligned with (divisible by) `ObjectAlignmentInBytes`. >> - Alignment by adjusting pre-loop limit: alignment is desirable even if `-XX:-AlignVector`. We would like to align the vectors with their vector width. >> >> **Problem** >> >> I have recently found a bug with our AlignVector [JDK-8310190](https://bugs.openjdk.org/browse/JDK-8310190). 
>> In that bug, we perform a misaligned memory vector access, which results in a `SIGBUS` on an ARM32 machine.
>> Thanks @fg1417 for confirming this!
>> Hence, we need to fix the alignment correctness checks.
>>
>> While working on this task, I also found some bugs in the "alignment by adjusting pre-loop limit": there were cases where it did not align the vectors correctly.
>>
>> **Problem Details**
>>
>> Reproducer:
>>
>>
>> static void test(short[] a, short[] b, short mask) {
>>     for (int i = 0; i < RANGE; i+=8) {
>>         // Problematic for AlignVector
>>         b[i+0] = (short)(a[i+0] & mask); // best_memref, align 0
>>
>>         b[i+3] = (short)(a[i+3] & mask); // pack at offset 6 bytes
>>         b[i+4] = (short)(a[i+4] & mask);
>>         b[i+5] = (short)(a[i+5] & mask);
>>         b[i+6] = (short)(a[i+6] & mask);
>>     }
>> }
>>
>>
>> During `SuperWord::find_adjacent_refs` we used to check if the references are expected to be aligned. For that, we look at each "group" of references (eg all `LoadS`) and take the reference with the lowest offset. For that chosen reference, we check if it is alignable. If yes, we accept all references of that group, if no we reject all.
>>
>> This is problemati...
>
> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision:
>
> some minor changes for Vladimir

Looks good.

-------------

Marked as reviewed by kvn (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/14785#pullrequestreview-1806680693 From duke at openjdk.org Fri Jan 5 20:32:39 2024 From: duke at openjdk.org (Zhiqiang Zang) Date: Fri, 5 Jan 2024 20:32:39 GMT Subject: RFR: 8322589: Add Ideal transformation: (~a) & (~b) => ~(a | b) [v2] In-Reply-To: <-pwjKEB97C-bM068JQN0PY1hl65IzcuQZfHzRoKu92g=.d6118773-20d2-46cd-9284-5168c9334bb5@github.com> References: <-pwjKEB97C-bM068JQN0PY1hl65IzcuQZfHzRoKu92g=.d6118773-20d2-46cd-9284-5168c9334bb5@github.com> Message-ID: > Hello, > > `(~a) & (~b) => ~(a | b)` is a widely seen pattern, for example it is implemented for LLVM [here](https://github.com/llvm/llvm-project/blob/397f1ce9efb4eea1ee10fe4833f733b8c7abd878/llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp#L1616C28-L1616C28); however it is missing in current implementation of hotspot. This pull request adds this transformation and associated tests. > > Thanks. Zhiqiang Zang has updated the pull request incrementally with one additional commit since the last revision: update the copyright dates. 
------------- Changes: - all: https://git.openjdk.org/jdk/pull/16333/files - new: https://git.openjdk.org/jdk/pull/16333/files/cf2edb46..5072eb14 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16333&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16333&range=00-01 Stats: 4 lines in 4 files changed: 0 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/16333.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16333/head:pull/16333 PR: https://git.openjdk.org/jdk/pull/16333 From duke at openjdk.org Fri Jan 5 20:45:41 2024 From: duke at openjdk.org (Zhiqiang Zang) Date: Fri, 5 Jan 2024 20:45:41 GMT Subject: RFR: 8322589: Add Ideal transformation: (~a) & (~b) => ~(a | b) [v3] In-Reply-To: <-pwjKEB97C-bM068JQN0PY1hl65IzcuQZfHzRoKu92g=.d6118773-20d2-46cd-9284-5168c9334bb5@github.com> References: <-pwjKEB97C-bM068JQN0PY1hl65IzcuQZfHzRoKu92g=.d6118773-20d2-46cd-9284-5168c9334bb5@github.com> Message-ID: <8DbGA3jHE7obdtiXueTWKSwKbQ7-q66G09lcOmaAcu8=.9a4d4e9f-523e-444e-a740-ca23a12f45f2@github.com> > Hello, > > `(~a) & (~b) => ~(a | b)` is a widely seen pattern, for example it is implemented for LLVM [here](https://github.com/llvm/llvm-project/blob/397f1ce9efb4eea1ee10fe4833f733b8c7abd878/llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp#L1616C28-L1616C28); however it is missing in current implementation of hotspot. This pull request adds this transformation and associated tests. > > Thanks. Zhiqiang Zang has updated the pull request incrementally with one additional commit since the last revision: address comments. 
------------- Changes: - all: https://git.openjdk.org/jdk/pull/16333/files - new: https://git.openjdk.org/jdk/pull/16333/files/5072eb14..3b95720a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16333&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16333&range=01-02 Stats: 9 lines in 3 files changed: 2 ins; 0 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/16333.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16333/head:pull/16333 PR: https://git.openjdk.org/jdk/pull/16333 From xliu at openjdk.org Fri Jan 5 20:47:24 2024 From: xliu at openjdk.org (Xin Liu) Date: Fri, 5 Jan 2024 20:47:24 GMT Subject: RFR: 8320128: Clean up Parse constructor for OSR [v4] In-Reply-To: References: <0mxhhLR9DDdpiBhbXWPxCmMzo_OI9DYiIX36xGhxC_c=.1eed99b9-2dc3-4403-b3c1-22b6ebed6079@github.com> <-QQZANSu7PVpZ3CUX7kbp00f_xkhjiy7la3Z3gXutKA=.a1c938e4-12e0-44b8-a2dd-7cc0c0c9f09f@github.com> Message-ID: On Thu, 4 Jan 2024 20:16:48 GMT, Aleksey Shipilev wrote: >> Xin Liu has updated the pull request incrementally with one additional commit since the last revision: >> >> Use print_cr for the log message. > > What testing was done here? I suggest at least `tier{1,2,3}` to capture surprises. hi, @shipilev I ran tier1~3 yesterday. It only had 2 failures: 1. java/util/Base64/TestEncodingDecodingLength.java (I guess it's due to out of memory JDK-8295153) 2. 
sun/security/pkcs11/Provider/MultipleLogins.sh (unsupported OS: Linux-amd64-64, please initialize NSS library location, skipping test)

==============================
Test summary
==============================
   TEST                                 TOTAL  PASS  FAIL ERROR
   jtreg:test/hotspot/jtreg:tier1        2511  2511     0     0
>> jtreg:test/jdk:tier1                  2400  2399     1     0 <<
   jtreg:test/langtools:tier1            4458  4458     0     0
   jtreg:test/jaxp:tier1                    0     0     0     0
   jtreg:test/lib-test:tier1               32    32     0     0
   jtreg:test/hotspot/jtreg:tier2         742   742     0     0
>> jtreg:test/jdk:tier2                  4081  4080     1     0 <<
   jtreg:test/langtools:tier2              11    11     0     0
   jtreg:test/jaxp:tier2                  512   512     0     0
   jtreg:test/hotspot/jtreg:tier3         256   256     0     0
   jtreg:test/jdk:tier3                  1434  1434     0     0
   jtreg:test/langtools:tier3               0     0     0     0
   jtreg:test/jaxp:tier3                    0     0     0     0
==============================
TEST FAILURE

------------- PR Comment: https://git.openjdk.org/jdk/pull/16669#issuecomment-1879235918 From duke at openjdk.org Fri Jan 5 21:37:36 2024 From: duke at openjdk.org (Zhiqiang Zang) Date: Fri, 5 Jan 2024 21:37:36 GMT Subject: RFR: 8322589: Add Ideal transformation: (~a) & (~b) => ~(a | b) [v4] In-Reply-To: <-pwjKEB97C-bM068JQN0PY1hl65IzcuQZfHzRoKu92g=.d6118773-20d2-46cd-9284-5168c9334bb5@github.com> References: <-pwjKEB97C-bM068JQN0PY1hl65IzcuQZfHzRoKu92g=.d6118773-20d2-46cd-9284-5168c9334bb5@github.com> Message-ID: > Hello, > > `(~a) & (~b) => ~(a | b)` is a widely seen pattern, for example it is implemented for LLVM [here](https://github.com/llvm/llvm-project/blob/397f1ce9efb4eea1ee10fe4833f733b8c7abd878/llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp#L1616C28-L1616C28); however it is missing in current implementation of hotspot. This pull request adds this transformation and associated tests. > > Thanks. Zhiqiang Zang has updated the pull request incrementally with one additional commit since the last revision: use utility functions.
------------- Changes: - all: https://git.openjdk.org/jdk/pull/16333/files - new: https://git.openjdk.org/jdk/pull/16333/files/3b95720a..6ee5f182 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16333&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16333&range=02-03 Stats: 34 lines in 3 files changed: 23 ins; 4 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/16333.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16333/head:pull/16333 PR: https://git.openjdk.org/jdk/pull/16333 From duke at openjdk.org Fri Jan 5 21:43:39 2024 From: duke at openjdk.org (Zhiqiang Zang) Date: Fri, 5 Jan 2024 21:43:39 GMT Subject: RFR: 8322589: Add Ideal transformation: (~a) & (~b) => ~(a | b) [v5] In-Reply-To: <-pwjKEB97C-bM068JQN0PY1hl65IzcuQZfHzRoKu92g=.d6118773-20d2-46cd-9284-5168c9334bb5@github.com> References: <-pwjKEB97C-bM068JQN0PY1hl65IzcuQZfHzRoKu92g=.d6118773-20d2-46cd-9284-5168c9334bb5@github.com> Message-ID: > Hello, > > `(~a) & (~b) => ~(a | b)` is a widely seen pattern, for example it is implemented for LLVM [here](https://github.com/llvm/llvm-project/blob/397f1ce9efb4eea1ee10fe4833f733b8c7abd878/llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp#L1616C28-L1616C28); however it is missing in current implementation of hotspot. This pull request adds this transformation and associated tests. > > Thanks. Zhiqiang Zang has updated the pull request incrementally with one additional commit since the last revision: update the copyright dates. 
------------- Changes: - all: https://git.openjdk.org/jdk/pull/16333/files - new: https://git.openjdk.org/jdk/pull/16333/files/6ee5f182..ecb2098b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16333&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16333&range=03-04 Stats: 2 lines in 2 files changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/16333.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16333/head:pull/16333 PR: https://git.openjdk.org/jdk/pull/16333 From duke at openjdk.org Fri Jan 5 21:57:38 2024 From: duke at openjdk.org (Zhiqiang Zang) Date: Fri, 5 Jan 2024 21:57:38 GMT Subject: RFR: 8322077: Add Ideal transformation: (~a) | (~b) => ~(a & b) [v5] In-Reply-To: References: Message-ID: > Hello, > > (~a) | (~b) => ~(a & b) is a widely seen pattern, for example it is implemented for LLVM [here](https://github.com/llvm/llvm-project/blob/397f1ce9efb4eea1ee10fe4833f733b8c7abd878/llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp#L1617C28-L1617C28); however it is missing in current implementation of hotspot. This pull request adds this transformation and associated tests. > > Thanks. Zhiqiang Zang has updated the pull request incrementally with two additional commits since the last revision: - update the copyright dates. - address comments. 
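To make the OR variant and the combined case raised in review concrete, here is a minimal Java-level sketch. It checks `(~a) | (~b) == ~(a & b)` for `long` values and the chained rewrite `(~a | ~b) & (~c | ~d)` to `~((a & b) | (c & d))` for `int` values. The class and method names are hypothetical, and this is not the IR test from the pull request.

```java
public final class DeMorganOrCheck {
    // Shape before the OR transformation: two NOTs feeding an OR.
    static long orOfNots(long a, long b) {
        return (~a) | (~b);
    }

    // Shape after the OR transformation: one AND feeding a single NOT.
    static long notOfAnd(long a, long b) {
        return ~(a & b);
    }

    // Combined case: both transformations have to apply to reach combinedAfter.
    static int combinedBefore(int a, int b, int c, int d) {
        return (~a | ~b) & (~c | ~d);
    }

    static int combinedAfter(int a, int b, int c, int d) {
        return ~((a & b) | (c & d));
    }

    public static void main(String[] args) {
        long[] longs = {0L, 1L, -1L, 0x5555555555555555L, Long.MIN_VALUE, Long.MAX_VALUE};
        boolean orHolds = true;
        for (long a : longs) {
            for (long b : longs) {
                orHolds &= orOfNots(a, b) == notOfAnd(a, b);
            }
        }
        int[] ints = {0, 1, -1, 0x0F0F0F0F, Integer.MIN_VALUE, Integer.MAX_VALUE};
        boolean combinedHolds = true;
        for (int a : ints) {
            for (int b : ints) {
                for (int c : ints) {
                    for (int d : ints) {
                        combinedHolds &= combinedBefore(a, b, c, d) == combinedAfter(a, b, c, d);
                    }
                }
            }
        }
        System.out.println("or identity: " + orHolds + ", combined: " + combinedHolds);
    }
}
```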
------------- Changes: - all: https://git.openjdk.org/jdk/pull/16334/files - new: https://git.openjdk.org/jdk/pull/16334/files/8697e399..154c69e5 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16334&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16334&range=03-04 Stats: 54 lines in 5 files changed: 23 ins; 18 del; 13 mod Patch: https://git.openjdk.org/jdk/pull/16334.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16334/head:pull/16334 PR: https://git.openjdk.org/jdk/pull/16334 From duke at openjdk.org Fri Jan 5 22:18:22 2024 From: duke at openjdk.org (Joshua Cao) Date: Fri, 5 Jan 2024 22:18:22 GMT Subject: RFR: 8319850: PrintInlining should print which methods are late inlines In-Reply-To: References: Message-ID: <_UiWT_w5oDMw_UTviUeL7vmlicXvi9c983ARz4FXcYo=.ef453e4e-9992-4520-9f2f-53a5fbc94cb1@github.com> On Thu, 16 Nov 2023 12:07:10 GMT, Roland Westrelin wrote: >> I can see why its confusing. I reworded the JBS title and added more to the summary. >> >> >> This confused me when first starting looking at compilation units. I would see a method reported as inlined, but in the early compilation phases, I still see the method call. I was not aware of late inlines. I think it would be a nice enhancement for PrintInlining to report which methods are late inlined. >> >> >> Yes, `PrintInlining` reports late inlines, but I think it would be nice for it to explicitly state which inlines are late inlines. I want to print `late inline`. >> >>> There's an open bug to clean it up: https://bugs.openjdk.org/browse/JDK-8039555 FWIW, I gave it a try at some point but I couldn't find a better solution. >> >> I can echo this issue, the inlining code does feel a little messy. I hope this patch does not make it worse, I'd say it keeps the messiness the same. > >> Yes, `PrintInlining` reports late inlines, but I think it would be nice for it to explicitly state which inlines are late inlines. I want to print `late inline`. 
> > I get it now and that looks reasonable to me. What about method handle invokes and late inlining of virtual calls? For those 2, the call site is initially found to not be a candidate for inlining and only later the compiler finds that it can inline. Does your change cover those 2 cases? @rwestrel could you take a look? ------------- PR Comment: https://git.openjdk.org/jdk/pull/16595#issuecomment-1879318818 From dlong at openjdk.org Fri Jan 5 23:10:23 2024 From: dlong at openjdk.org (Dean Long) Date: Fri, 5 Jan 2024 23:10:23 GMT Subject: RFR: 8320649: C2: Optimize scoped values [v4] In-Reply-To: References: Message-ID: <_s5v6PDwZFV4oLrpaNVKf-hBoB73NjCw2r_uMzK5XlQ=.1bfc49db-6786-45ed-a2ca-8e719a910a6b@github.com> On Fri, 5 Jan 2024 12:45:02 GMT, Roland Westrelin wrote: >> I'm not a C2 expert, so my high-level comments might not all make sense, but here goes. >> - My first reaction was why does this need to be so complicated? Can't we treat the slow path and cache implementation as a black box, and just common up the high-level get()? In my mind the ScopedValue get should be similar to a read from a "condy" dynamic constant. >> - The reason for breaking up the operations into individual nodes seems to be because of the profiling information. So I'm wondering how much this helps, given the added complexity. >> - I would expect multiple get() calls in the same method or in loops to be rare and/or programmer errors. The real advantage seems to be when the binding and the get() can be connected from caller to callee through inlining. >> - Are we able to optimize a get() on a constant/final ScopedValue into a simple array load at a constant offset? >> - Needing to do things like treat ScopedValueGetHitsInCache as always successful gives me a bad feeling for some reason, and seems unnecessary if we did more at a higher (macro?) level rather than eagerly expanding the high-level operation into individual nodes.
>
>> I'm not a C2 expert, so my high-level comments might not all make sense, but here goes.
>>
>> * My first reaction was why does this need to be so complicated?
>
> That's a fair reaction.
>
>> Can't we treat the slow path and cache implementation as a black box, and just common up the high-level get()? In my mind the ScopedValue get should be similar to a read from a "condy" dynamic constant.
>
> Initially, I thought about delaying the inlining of `get()` methods and simply having a pass that looks for `get()` calls with the same inputs. I don't think that works well because the current late inlining framework can't delay inlining very late. We don't run loop opts before we're done with inlining, for instance. If we wanted to hoist a call out of a loop we would need loop opts. For instance, it's likely a call to `get()` depends on a null check that we would need to hoist first.
>
> The other thing about optimizing `get()` calls is that they are heavyweight nodes (a high-level `get()` macro node would be very similar to a `get()` call node whichever way you look at it). We don't know how to hoist a call out of a loop. A call acts as a barrier on the entire memory state and gets in the way of memory optimizations. If the profile reports the slow path to be never taken then the shape of the `get()` becomes lighter weight. It doesn't disrupt other optimizations. Probing the cache acts as a load + test which we know how to hoist from a loop.
>
> It felt to me that it would be fairly common for the slow path to not be needed and, given the shape without the slow path is much easier to optimize, it was important to be able to expose early on if the slow path was there or not.
>
>>
>> * The reason for breaking up the operations into individual nodes seems to be because of the profiling information. So I'm wondering how much this helps, given the added complexity.
>
> The thing about `get()` is that in simple cases, it optimizes well because of profile data.
A `get()` call once inlined can essentially be hoisted out of loop if all goes well. It doesn't take much for simple optimizations on `get()` to not happen anymore. The goal of this patch is to bring consistency and have optimizations work well in all sort of scenarios. But it would be hard to sell if the simple cases don't work as well as they do without the patch. And I believe that requires profile data. > >> >> * I would expect multiple get() calls in the same method or in loops to be rare and/or programmer errors. The re... Thanks @rwestrel, that helps. I have no objections to this change, but I don't understand C2 enough to do a deeper review. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16966#issuecomment-1879363796 From duke at openjdk.org Fri Jan 5 23:12:48 2024 From: duke at openjdk.org (Joshua Cao) Date: Fri, 5 Jan 2024 23:12:48 GMT Subject: RFR: 8323095: Expand TraceOptoParse block output abbreviations Message-ID: `e` -> `exception block` `lphd` -> `loop head` Also removing an unnecessary space. The successor ids have a space before them. 
Examples from `java -Xcomp -XX:+TraceOptoParse -version`: Parsing block #8 at bci [33,39), successors: 9 16(exception block) loop head ------------- Commit messages: - 8323095: Expand TraceOptoParse block output abbreviations Changes: https://git.openjdk.org/jdk/pull/17289/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=17289&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8323095 Stats: 3 lines in 1 file changed: 0 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/17289.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17289/head:pull/17289 PR: https://git.openjdk.org/jdk/pull/17289 From dlong at openjdk.org Fri Jan 5 23:13:21 2024 From: dlong at openjdk.org (Dean Long) Date: Fri, 5 Jan 2024 23:13:21 GMT Subject: RFR: 8320649: C2: Optimize scoped values [v4] In-Reply-To: References: Message-ID: On Fri, 5 Jan 2024 10:16:24 GMT, Andrew Haley wrote: > Maybe I'm misunderstanding this question, but that's what the scoped value cache does. @theRealAph I guess it boils down to whether the hash value can be treated as a compile-time constant, which seems possible because it's marked final. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16966#issuecomment-1879365710 From duke at openjdk.org Fri Jan 5 23:43:43 2024 From: duke at openjdk.org (Zhiqiang Zang) Date: Fri, 5 Jan 2024 23:43:43 GMT Subject: RFR: 8322077: Add Ideal transformation: (~a) | (~b) => ~(a & b) [v6] In-Reply-To: References: Message-ID: > Hello, > > (~a) | (~b) => ~(a & b) is a widely seen pattern, for example it is implemented for LLVM [here](https://github.com/llvm/llvm-project/blob/397f1ce9efb4eea1ee10fe4833f733b8c7abd878/llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp#L1617C28-L1617C28); however it is missing in current implementation of hotspot. This pull request adds this transformation and associated tests. > > Thanks. 
Zhiqiang Zang has updated the pull request incrementally with one additional commit since the last revision: remove unused code from tests. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16334/files - new: https://git.openjdk.org/jdk/pull/16334/files/154c69e5..f7e57ce4 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16334&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16334&range=04-05 Stats: 2 lines in 2 files changed: 0 ins; 2 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/16334.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16334/head:pull/16334 PR: https://git.openjdk.org/jdk/pull/16334 From dlong at openjdk.org Fri Jan 5 23:45:22 2024 From: dlong at openjdk.org (Dean Long) Date: Fri, 5 Jan 2024 23:45:22 GMT Subject: RFR: 8323065: Unneccesary CodeBlob lookup in CompiledIC::internal_set_ic_destination In-Reply-To: References: Message-ID: On Fri, 5 Jan 2024 14:37:16 GMT, Aleksey Shipilev wrote: > I was looking at hotpath for IC stub cleaning (happens at safepoint), and one obvious thing is that we look-up `CodeBlob` from `call->instruction_address()` only to assert that is compiled one. It used to be protected by `#ifdef ASSERT` before [JDK-8212681](https://bugs.openjdk.org/browse/JDK-8212681), and pulled from it to be used in Mutex in JDK 12: https://hg.openjdk.org/jdk/jdk/rev/d6dc479bcdd3#l15.62 And the Mutex was shortly gone after [JDK-8214257](https://bugs.openjdk.org/browse/JDK-8214257). So we are exposing this code to product binaries since JDK 12. > > This fix reinstates the `ASSERT` block again. There are small improvements (~1..10us) for safepoint cleanup on small ad-hoc tests in release builds on my Mac. But since this whole thing involves looking up things in code cache, it may cost quite a lot. Marked as reviewed by dlong (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/17281#pullrequestreview-1807085790 From duke at openjdk.org Sat Jan 6 00:20:45 2024 From: duke at openjdk.org (Zhiqiang Zang) Date: Sat, 6 Jan 2024 00:20:45 GMT Subject: RFR: 8322077: Add Ideal transformation: (~a) | (~b) => ~(a & b) [v7] In-Reply-To: References: Message-ID: > Hello, > > (~a) | (~b) => ~(a & b) is a widely seen pattern, for example it is implemented for LLVM [here](https://github.com/llvm/llvm-project/blob/397f1ce9efb4eea1ee10fe4833f733b8c7abd878/llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp#L1617C28-L1617C28); however it is missing in current implementation of hotspot. This pull request adds this transformation and associated tests. > > Thanks. Zhiqiang Zang has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains seven commits: - remove unused code from tests. - update the copyright dates. - address comments. - untabify. - use common helpful functions. - include bug id. - include new optimization and tests. 
------------- Changes: https://git.openjdk.org/jdk/pull/16334/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=16334&range=06 Stats: 151 lines in 3 files changed: 151 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/16334.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16334/head:pull/16334 PR: https://git.openjdk.org/jdk/pull/16334 From duke at openjdk.org Sat Jan 6 00:44:07 2024 From: duke at openjdk.org (Zhiqiang Zang) Date: Sat, 6 Jan 2024 00:44:07 GMT Subject: RFR: 8322077: Add Ideal transformation: (~a) | (~b) => ~(a & b) [v8] In-Reply-To: References: Message-ID: <7nWuqQFeSjlCr0JfuC9sxty9HNF37Ytrz6H6EWTApZg=.b9ef1659-5815-4540-828e-69f1f450afe0@github.com> > Hello, > > (~a) | (~b) => ~(a & b) is a widely seen pattern, for example it is implemented for LLVM [here](https://github.com/llvm/llvm-project/blob/397f1ce9efb4eea1ee10fe4833f733b8c7abd878/llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp#L1617C28-L1617C28); however it is missing in current implementation of hotspot. This pull request adds this transformation and associated tests. > > Thanks. Zhiqiang Zang has updated the pull request incrementally with one additional commit since the last revision: Add tests for using De Morgan's Law for both optimizations. 
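For readers following along, here is a minimal Java sketch of the two identities under discussion, `(~a) | (~b) == ~(a & b)` and `(~a) & (~b) == ~(a | b)`, plus a combined form where both rewrites have to cooperate. It is not taken from the PR or its tests: the class name `DeMorganSketch` and all method names are hypothetical, and it only checks the identities on a handful of sample values at the Java level rather than exercising C2's Ideal transformations.

```java
public class DeMorganSketch {
    // Left- and right-hand sides of (~a) | (~b) => ~(a & b)
    static int orOfNots(int a, int b) { return (~a) | (~b); }
    static int notOfAnd(int a, int b) { return ~(a & b); }

    // Left- and right-hand sides of (~a) & (~b) => ~(a | b)
    static int andOfNots(int a, int b) { return (~a) & (~b); }
    static int notOfOr(int a, int b) { return ~(a | b); }

    // Combined form: applying the first identity to each operand and then
    // the second identity to the result gives ~((a & b) | (c & d)).
    static int combinedBefore(int a, int b, int c, int d) { return (~a | ~b) & (~c | ~d); }
    static int combinedAfter(int a, int b, int c, int d) { return ~((a & b) | (c & d)); }

    // Fails loudly even when JVM assertions (-ea) are disabled.
    static void check(boolean ok) {
        if (!ok) {
            throw new AssertionError("identity violated");
        }
    }

    public static void main(String[] args) {
        // Hypothetical sample values; edge cases plus a mixed bit pattern.
        int[] samples = {0, 1, -1, 42, 0x55555555, Integer.MIN_VALUE, Integer.MAX_VALUE};
        for (int a : samples) {
            for (int b : samples) {
                check(orOfNots(a, b) == notOfAnd(a, b));
                check(andOfNots(a, b) == notOfOr(a, b));
                for (int c : samples) {
                    for (int d : samples) {
                        check(combinedBefore(a, b, c, d) == combinedAfter(a, b, c, d));
                    }
                }
            }
        }
        System.out.println("identities hold on all sampled inputs");
    }
}
```

The rewritten forms use one `~` instead of two (or one instead of four in the combined case), which is presumably why the transformation is attractive; whether C2 actually emits fewer instructions depends on the platform and is not shown by this sketch.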
------------- Changes: - all: https://git.openjdk.org/jdk/pull/16334/files - new: https://git.openjdk.org/jdk/pull/16334/files/15a38bda..d8ed0f35 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16334&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16334&range=06-07 Stats: 218 lines in 2 files changed: 218 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/16334.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16334/head:pull/16334 PR: https://git.openjdk.org/jdk/pull/16334 From duke at openjdk.org Sat Jan 6 00:44:29 2024 From: duke at openjdk.org (Zhiqiang Zang) Date: Sat, 6 Jan 2024 00:44:29 GMT Subject: RFR: 8322589: Add Ideal transformation: (~a) & (~b) => ~(a | b) [v5] In-Reply-To: References: <-pwjKEB97C-bM068JQN0PY1hl65IzcuQZfHzRoKu92g=.d6118773-20d2-46cd-9284-5168c9334bb5@github.com> Message-ID: On Fri, 5 Jan 2024 16:00:22 GMT, Emanuel Peter wrote: >> Zhiqiang Zang has updated the pull request incrementally with one additional commit since the last revision: >> >> update the copyright dates. > > Looks like a good idea :) > I left a few suggestions below. @eme64 @TobiHartmann Thanks for the comments. All addressed. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16333#issuecomment-1879462996 From duke at openjdk.org Sat Jan 6 00:47:28 2024 From: duke at openjdk.org (Zhiqiang Zang) Date: Sat, 6 Jan 2024 00:47:28 GMT Subject: RFR: 8322077: Add Ideal transformation: (~a) | (~b) => ~(a & b) [v4] In-Reply-To: References: <5KKTYmY7dJ4nW0OEQ2UPuloIWOYc-pI9M8HRjoaRzw4=.f5eda6f9-c38c-4a2b-9690-cbb1791a2622@github.com> Message-ID: <2yuELMZxtnZVtWL2rhdtdzwIjg0tgZEi0csMHasBXtQ=.3e62b735-b4e5-4bfe-9594-bfb66e19205c@github.com> On Fri, 5 Jan 2024 16:22:38 GMT, Emanuel Peter wrote: >> Zhiqiang Zang has refreshed the contents of this pull request, and previous commits have been removed. Incremental views are not available. > > Looks like a good idea. Left a few comments. 
> > I would have merged this with https://github.com/openjdk/jdk/pull/16333, since it is essentially the symmetric case. But leave it separate now. > > It would be nice to have some shared tests, where both optimizations need to be combined. Like: > `(~a | ~b) & (~c | ~d)` -> `~(a & b) & ~(c & d)` -> `~((a & b) | (c & d))` @eme64 @TobiHartmann Thanks for the comments. All addressed. I rebased this PR onto #16333 so I was able to add these tests for using both optimizations. (the history was messed up). ------------- PR Comment: https://git.openjdk.org/jdk/pull/16334#issuecomment-1879464432 From aph at openjdk.org Sat Jan 6 10:10:23 2024 From: aph at openjdk.org (Andrew Haley) Date: Sat, 6 Jan 2024 10:10:23 GMT Subject: RFR: 8320649: C2: Optimize scoped values [v4] In-Reply-To: References: Message-ID: On Fri, 5 Jan 2024 23:10:37 GMT, Dean Long wrote: > > Maybe I'm misunderstanding this question, but that's what the scoped value cache does. > > @theRealAph I guess it boils down to whether the hash value can be treated as a compile-time constant, which seems possible because it's marked final. It always has been in the tests I've done. One of the interesting challenges with this work has been to make sure scoped value performance doesn't regress. A great advantage of this PR is that a dedicated scoped value optimization helps to make such regressions less likely. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16966#issuecomment-1879624947 From igavrilin at openjdk.org Sat Jan 6 16:27:41 2024 From: igavrilin at openjdk.org (Ilya Gavrilin) Date: Sat, 6 Jan 2024 16:27:41 GMT Subject: RFR: 8322790: RISC-V: Tune costs for shuffles with no conversion [v2] In-Reply-To: References: Message-ID: > Hi all, please review this small change to RISC-V nodes insertion costs. 
> Now we have several nodes which provide shuffles without conversion: https://github.com/openjdk/jdk/blob/32d80e2caf6063b58128bd5f3dc87b276f3bd0cb/src/hotspot/cpu/riscv/riscv.ad#L8525-L8741
> On most RISC-V CPUs we prefer reg<->reg operations because they are faster, but currently stack<->reg operations are used (for details on the reasons, please see the linked JBS issue).
> After changing the insertion costs, reg<->reg operations are selected, and we see performance improvements for benchmarks that use such shuffles (tested on a thead C910 board):
> | Benchmark | Upstream build (ops/ms) | Patched build (ops/ms) | difference (%) |
> |:-----------------------------------:|:-----------------------:|:----------------------:|:--------------:|
> | MathBench.doubleToRawLongBitsDouble | 30935.139 | 32171.761 | +4.00 |
> | StrictMathBench.ceilDouble | 24682.810 | 29782.050 | +20.66 |
> | StrictMathBench.cosDouble | 6948.309 | 6938.276 | -0.14 |
> | StrictMathBench.expDouble | 6816.143 | 7211.021 | +5.79 |
> | StrictMathBench.floorDouble | 30699.630 | 34189.509 | +11.37 |
> | StrictMathBench.maxDouble | 35157.355 | 34675.191 | -1.37 |
> | StrictMathBench.minDouble | 35192.135 | 35183.015 | -0.03 |
> | StrictMathBench.sinDouble | 6698.405 | 6721.809 | +0.35 |
>
> New benchmark for changed nodes:
>
> --- a/test/micro/org/openjdk/bench/java/lang/MathBench.java
> +++ b/test/micro/org/openjdk/bench/java/lang/MathBench.java
> @@ -540,4 +540,11 @@ public class MathBench {
>          return Math.ulp(float7);
>      }
>
> +    @Benchmark
> +    public long doubleToRawLongBitsDouble() {
> +        double dbl162Dot5 = double81 * 2.0d + double0Dot5;
> +        double dbl3 = double2 + double1;
> +        return Double.doubleToRawLongBits(dbl162Dot5) + Double.doubleToRawLongBits(dbl3);
> +    }
> +

Ilya Gavrilin has updated the pull request incrementally with one additional commit since the last revision:

  Revert some costs changes

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/17206/files
  - new: https://git.openjdk.org/jdk/pull/17206/files/31066965..ae8bca99

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=17206&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17206&range=00-01

  Stats: 8 lines in 1 file changed: 0 ins; 0 del; 8 mod
  Patch: https://git.openjdk.org/jdk/pull/17206.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/17206/head:pull/17206

PR: https://git.openjdk.org/jdk/pull/17206

From igavrilin at openjdk.org Sat Jan 6 16:27:43 2024
From: igavrilin at openjdk.org (Ilya Gavrilin)
Date: Sat, 6 Jan 2024 16:27:43 GMT
Subject: RFR: 8322790: RISC-V: Tune costs for shuffles with no conversion [v2]
In-Reply-To:
References:
Message-ID:

On Wed, 3 Jan 2024 05:19:43 GMT, Fei Yang wrote:

>> those nodes need to go below 100 which then starts looking ugly
>
> Seems that the performance gain is still there (tested on lichee-pi-4a board) when reverting part of the changes. I haven't checked the JIT code though. Try this addon change:
>
> [addon-change.diff.txt](https://github.com/openjdk/jdk/files/13815870/addon-change.diff.txt)

Thanks, reverting some of the changes still yields good code generation. I have run some more benchmarks on the thead board; in all cases the necessary instructions are generated in the JIT code.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/17206#discussion_r1443795840

From igavrilin at openjdk.org Sat Jan 6 16:30:21 2024
From: igavrilin at openjdk.org (Ilya Gavrilin)
Date: Sat, 6 Jan 2024 16:30:21 GMT
Subject: RFR: 8322790: RISC-V: Tune costs for shuffles with no conversion
In-Reply-To:
References:
Message-ID:

On Sat, 30 Dec 2023 20:07:13 GMT, Ilya Gavrilin wrote:

> Hi all, please review this small change to RISC-V nodes insertion costs.
> Now we have several nodes which provide shuffles without conversion: https://github.com/openjdk/jdk/blob/32d80e2caf6063b58128bd5f3dc87b276f3bd0cb/src/hotspot/cpu/riscv/riscv.ad#L8525-L8741
> On most RISC-V CPUs we prefer reg<->reg operations because they are faster, but currently stack<->reg operations are used (for details on the reasons, please see the linked JBS issue).
> After changing the insertion costs, reg<->reg operations are selected, and we see performance improvements for benchmarks that use such shuffles (tested on a thead C910 board):
> | Benchmark | Upstream build (ops/ms) | Patched build (ops/ms) | difference (%) |
> |:-----------------------------------:|:-----------------------:|:----------------------:|:--------------:|
> | MathBench.doubleToRawLongBitsDouble | 30935.139 | 32171.761 | +4.00 |
> | StrictMathBench.ceilDouble | 24682.810 | 29782.050 | +20.66 |
> | StrictMathBench.cosDouble | 6948.309 | 6938.276 | -0.14 |
> | StrictMathBench.expDouble | 6816.143 | 7211.021 | +5.79 |
> | StrictMathBench.floorDouble | 30699.630 | 34189.509 | +11.37 |
> | StrictMathBench.maxDouble | 35157.355 | 34675.191 | -1.37 |
> | StrictMathBench.minDouble | 35192.135 | 35183.015 | -0.03 |
> | StrictMathBench.sinDouble | 6698.405 | 6721.809 | +0.35 |
>
> New benchmark for changed nodes:
>
> --- a/test/micro/org/openjdk/bench/java/lang/MathBench.java
> +++ b/test/micro/org/openjdk/bench/java/lang/MathBench.java
> @@ -540,4 +540,11 @@ public class MathBench {
>          return Math.ulp(float7);
>      }
>
> +    @Benchmark
> +    public long doubleToRawLongBitsDouble() {
> +        double dbl162Dot5 = double81 * 2.0d + double0Dot5;
> +        double dbl3 = double2 + double1;
> +        return Double.doubleToRawLongBits(dbl162Dot5) + Double.doubleToRawLongBits(dbl3);
> +    }
> +

Thanks @RealFYang for the suggested changes. I performed some additional tests on the thead board and also checked the JIT code for some of the tests.
| Benchmark | Upstream | Old patch | Current patch |
|------------------------------------------|-----------|-----------|---------------|
| lang.MathBench.doubleToRawLongBitsDouble | 30495.868 | 32332.48 | 31635.15 |
| lang.MathBench.longBitsToDoubleLong | 35161.101 | 34542.878 | 34146.705 |
| lang.StrictMathBench.ceilDouble | 24272.224 | 29797.862 | 29094.981 |
| lang.StrictMathBench.cosDouble | 6967.161 | 6930.468 | 6960.957 |
| lang.StrictMathBench.expDouble | 6812.605 | 7211.988 | 7123.429 |
| lang.StrictMathBench.floorDouble | 29893.151 | 34193.412 | 33257.669 |
| lang.StrictMathBench.maxDouble | 34684.497 | 35194.694 | 35199.944 |
| lang.StrictMathBench.minDouble | 34692.521 | 34673.531 | 34678.324 |
| lang.StrictMathBench.sinDouble | 6769.593 | 6714.003 | 6736.884 |
| math.FpRoundingBenchmark.testnativeceil | 67.801 | 115.6 | 116.822 |
| math.FpRoundingBenchmark.testnativefloor | 71.745 | 116.59 | 116.662 |

Additional benchmarks:

diff --git a/test/micro/org/openjdk/bench/java/lang/MathBench.java b/test/micro/org/openjdk/bench/java/lang/MathBench.java
index 27d8033b8b7..fd39cc58222 100644
--- a/test/micro/org/openjdk/bench/java/lang/MathBench.java
+++ b/test/micro/org/openjdk/bench/java/lang/MathBench.java
@@ -540,4 +540,17 @@ public class MathBench {
         return Math.ulp(float7);
     }

+    @Benchmark
+    public long doubleToRawLongBitsDouble() {
+        double dbl162Dot5 = double81 * 2.0d + double0Dot5;
+        double dbl3 = double2 + double1;
+        return Double.doubleToRawLongBits(dbl162Dot5) + Double.doubleToRawLongBits(dbl3);
+    }
+
+    @Benchmark
+    public double longBitsToDoubleLong() {
+        long lng14 = long13 + long1;
+        long lng750 = long747 + 3;
+        return Double.longBitsToDouble(lng14) + Double.longBitsToDouble(lng750);
+    }
 }
diff --git a/test/micro/org/openjdk/bench/java/math/FpRoundingBenchmark.java b/test/micro/org/openjdk/bench/java/math/FpRoundingBenchmark.java
index cf0eed32e07..3687f43b886 100644
--- a/test/micro/org/openjdk/bench/java/math/FpRoundingBenchmark.java
+++ b/test/micro/org/openjdk/bench/java/math/FpRoundingBenchmark.java
@@ -75,4 +75,16 @@ public class FpRoundingBenchmark {
         for (int i = 0; i < TESTSIZE; i++)
             Res[i] = Math.rint(DargV1[i]);
     }
+
+    @Benchmark
+    public void testnativeceil(Blackhole bh) {
+        for (int i = 0; i < TESTSIZE; i++)
+            Res[i] = StrictMath.ceil(DargV1[i]);
+    }
+
+    @Benchmark
+    public void testnativefloor(Blackhole bh) {
+        for (int i = 0; i < TESTSIZE; i++)
+            Res[i] = StrictMath.floor(DargV1[i]);
+    }
 }

-------------

PR Comment: https://git.openjdk.org/jdk/pull/17206#issuecomment-1879745479

From aph at openjdk.org Sat Jan 6 17:46:21 2024
From: aph at openjdk.org (Andrew Haley)
Date: Sat, 6 Jan 2024 17:46:21 GMT
Subject: RFR: 8319690: [AArch64] C2 compilation hits offset_ok_for_immed: assert "c2 compiler bug"
In-Reply-To:
References: <16J-lJ2AceGTVcRWBcP15yKcwO-1IA1XsngyOuNjf7k=.0776f081-ae2c-4279-87cf-d909806c2bc4@github.com>
 <4KRCqYxn02wMjYDuN3_HbYWxc9BDtYMNd40bNrJ4K8w=.5362dda4-8d30-48fd-8915-909eb15a6023@github.com>
Message-ID:

On Thu, 7 Dec 2023 06:45:30 GMT, Fei Gao wrote:

>> After this change, `immIOffset` and `immLOffset` appear to be obsolete.
>
>> After this change, `immIOffset` and `immLOffset` appear to be obsolete.
>
> Removed them in the new commit. Thanks!

> @fg1417 what is the state on this?
>
> The example here may look like a strange edge-case. But with my plans this may become more common: [JDK-8318446](https://bugs.openjdk.org/browse/JDK-8318446) C2: implement StoreNode::Ideal_merge_stores
>
> I will merge stores, which can then look like unaligned stores of a larger type. This bug here blocks my progress. I'm not in a hurry, just curious if there is now a plan how to proceed here ;)

The problem with this PR is that the code is way too complex for such a simple problem. The port is correct as it is, in the release build. The only problem is an assertion. We could simply remove that assertion, but if it were me I'd fix the problem properly.
Both @dean-long and I have suggested ways to improve this patch with less code. If @fg1417 decides to drop this PR I'll fix it.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/16991#issuecomment-1879765450

From qamai at openjdk.org Sun Jan 7 15:52:17 2024
From: qamai at openjdk.org (Quan Anh Mai)
Date: Sun, 7 Jan 2024 15:52:17 GMT
Subject: RFR: 8282365: Consolidate and improve division by constant idealizations [v43]
In-Reply-To:
References:
Message-ID:

> This patch implements idealisation for unsigned divisions to change a division by a constant into a series of multiplication and shift. I also change the idealisation of `DivI` to get a more efficient series when the magic constant overflows an int32.
>
> In general, the idea behind a signed division transformation is that for a positive constant `d`, we would need to find constants `c` and `m` so that:
>
>     floor(x / d) = floor(x * c / 2**m)        for 0 < x < 2**(N - 1)      (1)
>     ceil(x / d)  = floor(x * c / 2**m) + 1    for -2**(N - 1) <= x < 0    (2)
>
> The implementation in the original book takes into consideration that the machine may not be able to perform the full multiplication `x * c`, so the constant overflow and we need to add back the dividend as in `DivLNode::Ideal` cases. However, for int32 division, `x * c` cannot overflow an int64. As a result, it is always feasible to just calculate the product and shift the result.
>
> For unsigned multiplication, the situation is somewhat trickier because the condition needs to be twice as strong (the condition (1) and (2) above are mostly the same). This results in the magic constant `c` calculated based on the method presented in Hacker's Delight by Henry S. Warren, Jr. may overflow an uintN. For int division, we can depend on the theorem devised by Arch D. Robison in N-Bit Unsigned Division Via N-Bit Multiply-Add, which states that there exists either:
>
>     c1 in uint32 and m1, such that floor(x / d) = floor(x * c1 / 2**m1)          for 0 < x < 2**32    (3)
>     c2 in uint32 and m2, such that floor(x / d) = floor((x + 1) * c2 / 2**m2)    for 0 < x < 2**32    (4)
>
> which means that either `x * c1` never overflows an uint64 or `(x + 1) * c2` never overflows an uint64. And we can perform a full multiplication.
>
> For longs, there is no way to do a full multiplication so we do some basic transformations to achieve a computable formula. The details I have written as comments in the overflow case.
>
> More tests are added to cover the possible patterns.
>
> Please take a look and have some reviews. Thank you very much.

Quan Anh Mai has updated the pull request incrementally with two additional commits since the last revision:

 - parentheses
 - another round of reviews

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/9947/files
  - new: https://git.openjdk.org/jdk/pull/9947/files/0f2c57c7..bba52b74

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=9947&range=42
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=9947&range=41-42

  Stats: 18 lines in 3 files changed: 8 ins; 2 del; 8 mod
  Patch: https://git.openjdk.org/jdk/pull/9947.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/9947/head:pull/9947

PR: https://git.openjdk.org/jdk/pull/9947

From qamai at openjdk.org Sun Jan 7 15:52:20 2024
From: qamai at openjdk.org (Quan Anh Mai)
Date: Sun, 7 Jan 2024 15:52:20 GMT
Subject: RFR: 8282365: Consolidate and improve division by constant idealizations [v42]
In-Reply-To:
References:
Message-ID:

On Tue, 2 Jan 2024 09:23:37 GMT, Stefan Karlsson wrote:

>> Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision:
>>
>> power of 2
>
> test/hotspot/gtest/opto/test_constant_division.cpp line 29:
>
>> 27: #include "runtime/os.hpp"
>> 28: #include "utilities/growableArray.hpp"
>> 29:
#include > > Move include. I was told that `unittest.hpp` should come last so this is the order, I have added a line between JDK header and stdlib header as well as resolved your other comments. Thanks a lot. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/9947#discussion_r1444028106 From qamai at openjdk.org Sun Jan 7 15:52:22 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Sun, 7 Jan 2024 15:52:22 GMT Subject: RFR: 8282365: Consolidate and improve division by constant idealizations [v42] In-Reply-To: References: Message-ID: On Sat, 30 Dec 2023 18:36:19 GMT, Kim Barrett wrote: >> Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: >> >> power of 2 > > test/hotspot/gtest/opto/test_constant_division.cpp line 33: > >> 31: >> 32: // Generate a random positive integer of type T in a way that biases >> 33: // towards smaller values > > Why is there a bias toward smaller numbers? Maybe it should be named differently to indicate that bias? Because we are dealing with inputs of division so it makes more sense to have them following somewhat a reciprocal distribution. > test/hotspot/gtest/opto/test_constant_division.cpp line 54: > >> 52: template <> >> 53: julong random() { >> 54: juint bits = juint(os::random()) % 63 + 1; > > This change (`&` => `%`, and the similar change below) go a long way toward explaining why I couldn't > puzzle out what this function was intended to do. Note that `&` has lower precedence than `+`, so the > earlier version was masking with 64. The new version doesn't have that operator precedence mistake, > though I'd prefer the precedence be made explicit using parens. Yes that was my mistake, have added parentheses. > test/hotspot/gtest/opto/test_constant_division.cpp line 132: > >> 130: for (int i = 0; i < iter_num;) { >> 131: UT d = random(); >> 132: if ((d & (d - 1)) == 0) { > > We have `is_power_of_2` for this. 
This catches `d == 0` also so using `is_power_of_2` is a little misleading I think. > test/hotspot/gtest/opto/test_constant_division.cpp line 139: > >> 137: UT N_pos = random(); >> 138: if (N_neg < d && N_pos < d) { >> 139: continue; > > With sufficiently bad luck, we could spin here for a long time. (Similarly, though much less likely above with > the power-of-2 case.) That doesn't seem great. Of course, if one does count these skipped cases against > the iteration limit then with sufficiently bad luck one might not test anything. Rather than skipping the test > here, could you instead modify one of the values and proceed with the test? Yes I have done that, thanks a lot for your suggestions. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/9947#discussion_r1444028650 PR Review Comment: https://git.openjdk.org/jdk/pull/9947#discussion_r1444028423 PR Review Comment: https://git.openjdk.org/jdk/pull/9947#discussion_r1444028703 PR Review Comment: https://git.openjdk.org/jdk/pull/9947#discussion_r1444028776 From qamai at openjdk.org Sun Jan 7 16:22:55 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Sun, 7 Jan 2024 16:22:55 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v4] In-Reply-To: References: Message-ID: > Hi, > > This patch adds unsigned bounds and known bits constraints to `TypeInt` and `TypeLong`. This opens more transformation opportunities in an elegant manner as well as helps avoid some ad-hoc rules in Hotspot. The new constraints are applied to identity and value calls of the common nodes (Add, Sub, L/R/URShift, And, Or, Xor, bit counting, Cmp, Bool, ConvI2L/L2I), the detailed ideas for each node will be presented below. > > In general, a `TypeInt/Long` represents a set of values `x` that satisfies: `x s>= lo && x s<= hi && x u>= ulo && x u<= uhi && (x & zeros) == 0 && (~x & ones) == 0`. These constraints are not independent, e.g. 
an int that lies in [0, 3] in signed domain must also lie in [0, 3] in unsigned domain and have all bits but the last 2 being unset. As a result, we must normalize the constraints (tighten the constraints so that they are optimal) before constructing a `TypeInt/Long` instance. > > Please kindly review, thanks very much. > > Testing > > - [x] GHA > - [x] Linux x64, tier 1-4 Quan Anh Mai has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 26 commits: - Merge branch 'master' into improvevalue - improve add/sub implementation - Merge branch 'master' into improvevalue - typo - whitespace - fix tests for x86_32 - fix widen of ConvI2L - problem lists - format - comment - ... and 16 more: https://git.openjdk.org/jdk/compare/faa9c690...de1bac2e ------------- Changes: https://git.openjdk.org/jdk/pull/15440/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=15440&range=03 Stats: 3753 lines in 35 files changed: 1953 ins; 1234 del; 566 mod Patch: https://git.openjdk.org/jdk/pull/15440.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15440/head:pull/15440 PR: https://git.openjdk.org/jdk/pull/15440 From kbarrett at openjdk.org Mon Jan 8 01:05:39 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Mon, 8 Jan 2024 01:05:39 GMT Subject: RFR: 8322816: RISC-V: Incorrect guarantee in patch_vtype [v2] In-Reply-To: References: Message-ID: > Please review this change that fixes a test for a guarantee. This also > removes a -Wparentheses warning when those are enabled (which is how the > problem was discovered). > > The problem is that operator precedence groups the sub-expressions differently > than intended. The fix is to override the operator precedence by adding > parentheses to achieve the intended grouping. > > Testing: Local (linux-x64) cross-build for linux-riscv with this change plus > -Wparentheses enabled and other changes to allow that to work. 
> > Requesting someone from the riscv porters to properly test this. Kim Barrett has updated the pull request incrementally with one additional commit since the last revision: guarantee !vill ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17215/files - new: https://git.openjdk.org/jdk/pull/17215/files/a3723801..ab335602 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17215&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17215&range=00-01 Stats: 4 lines in 1 file changed: 0 ins; 2 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/17215.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17215/head:pull/17215 PR: https://git.openjdk.org/jdk/pull/17215 From kbarrett at openjdk.org Mon Jan 8 01:05:40 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Mon, 8 Jan 2024 01:05:40 GMT Subject: RFR: 8322816: RISC-V: Incorrect guarantee in patch_vtype [v2] In-Reply-To: References: <2Dii1vRWzgzjyBGHrx_tH28SlYMBWUZ0h2mO7D5so_4=.817b5636-aa8a-4c6d-807a-29b1855a59a7@github.com> Message-ID: On Wed, 3 Jan 2024 01:59:00 GMT, Fei Yang wrote: >> Rather than removing the guarantee, wouldn't it be better to guarantee/assert `vill == 0`? >> Although looking at uses, that argument is a bool, so it should be `guarantee(!vill, ...)`. > > Hi, Yes, that's better. Maybe: `guarantee(!vill, "should be");` I've changed the guarantee as discussed. There are further cleanups possible here, but I'll leave that to the riscv port maintainers. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17215#discussion_r1444109348 From fyang at openjdk.org Mon Jan 8 02:08:22 2024 From: fyang at openjdk.org (Fei Yang) Date: Mon, 8 Jan 2024 02:08:22 GMT Subject: RFR: 8322816: RISC-V: Incorrect guarantee in patch_vtype [v2] In-Reply-To: References: Message-ID: On Mon, 8 Jan 2024 01:05:39 GMT, Kim Barrett wrote: >> Please review this change that fixes a test for a guarantee. 
This also >> removes a -Wparentheses warning when those are enabled (which is how the >> problem was discovered). >> >> The problem is that operator precedence groups the sub-expressions differently >> than intended. The fix is to override the operator precedence by adding >> parentheses to achieve the intended grouping. >> >> Testing: Local (linux-x64) cross-build for linux-riscv with this change plus >> -Wparentheses enabled and other changes to allow that to work. >> >> Requesting someone from the riscv porters to properly test this. > > Kim Barrett has updated the pull request incrementally with one additional commit since the last revision: > > guarantee !vill Marked as reviewed by fyang (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/17215#pullrequestreview-1807946542 From xliu at openjdk.org Mon Jan 8 03:24:33 2024 From: xliu at openjdk.org (Xin Liu) Date: Mon, 8 Jan 2024 03:24:33 GMT Subject: RFR: 8322982: CTW fails to build after 8308753 Message-ID: This patch fixes the build error of CTW by sidelining ModuleInfoWriter.java. ModuleInfoWriter uses Class-File API, which has transitioned to preview. If we really need to compile it, we have to append --enable-preview and --source N. The fact is CTW itself doesn't depend on ModuleInfoWriter. I think it's easier to maintain CTW if we filter it out in Makefile. 
------------- Commit messages: - 8322982: CTW fails to build after 8308753 Changes: https://git.openjdk.org/jdk/pull/17292/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=17292&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8322982 Stats: 3 lines in 1 file changed: 1 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/17292.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17292/head:pull/17292 PR: https://git.openjdk.org/jdk/pull/17292 From kbarrett at openjdk.org Mon Jan 8 05:37:45 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Mon, 8 Jan 2024 05:37:45 GMT Subject: RFR: 8322759: Eliminate -Wparentheses warnings in compiler code [v2] In-Reply-To: <496tGkQ1KUCrW1IHOETyvhqopkNYjsEoupxjo0Ze3Wg=.0223f494-fd34-4e4a-a31a-5030603f2113@github.com> References: <496tGkQ1KUCrW1IHOETyvhqopkNYjsEoupxjo0Ze3Wg=.0223f494-fd34-4e4a-a31a-5030603f2113@github.com> Message-ID: <-w16Kse74yx2EiWCorBtcKf1KXA1Rh5q-6Ze2T_qors=.06ead22b-ec5a-4859-888c-f0e3a283d7f3@github.com> > Please review this change to eliminate some -Wparentheses warnings. This > involved simply adding a few parentheses to make some implicit operator > precedence explicit. > > This change addresses non-C2 parts of the compiler component. > > Testing: mach5 tier1 > > Also ran mach5 tier1 with these changes in conjunction enabling -Wparentheses > and other changes needed to make that work. Kim Barrett has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains four additional commits since the last revision: - Merge branch 'master' into compiler-wparentheses - simplify asserts - update copyrights for new year - fix -Wparentheses warnings in non-C2 compiler code ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17200/files - new: https://git.openjdk.org/jdk/pull/17200/files/b2a4515a..e130bb2f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17200&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17200&range=00-01 Stats: 5096 lines in 438 files changed: 2760 ins; 933 del; 1403 mod Patch: https://git.openjdk.org/jdk/pull/17200.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17200/head:pull/17200 PR: https://git.openjdk.org/jdk/pull/17200 From kbarrett at openjdk.org Mon Jan 8 05:37:47 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Mon, 8 Jan 2024 05:37:47 GMT Subject: RFR: 8322759: Eliminate -Wparentheses warnings in compiler code [v2] In-Reply-To: References: <496tGkQ1KUCrW1IHOETyvhqopkNYjsEoupxjo0Ze3Wg=.0223f494-fd34-4e4a-a31a-5030603f2113@github.com> Message-ID: On Wed, 3 Jan 2024 12:07:10 GMT, Aleksey Shipilev wrote: >> Kim Barrett has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: >> >> - Merge branch 'master' into compiler-wparentheses >> - simplify asserts >> - update copyrights for new year >> - fix -Wparentheses warnings in non-C2 compiler code > > src/hotspot/share/compiler/compilerDefinitions.inline.hpp line 60: > >> 58: >> 59: inline bool CompilerConfig::is_c1_or_interpreter_only_no_jvmci() { >> 60: assert((is_jvmci_compiler() && is_jvmci()) || !is_jvmci_compiler(), "JVMCI compiler implies enabled JVMCI"); > > This looks like simply: > > > assert(!is_jvmci_compiler() || is_jvmci(), "JVMCI compiler implies enabled JVMCI"); Agreed. Changed accordingly. 
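The simplification agreed on above relies on the Boolean identity `(A && B) || !A == !A || B`, which is the usual way to spell "A implies B". A minimal sketch that checks the identity over all inputs follows; the function names are hypothetical and are not HotSpot code.

```cpp
#include <cassert>

// Form used in the original assert: (a && b) || !a
inline bool implies_verbose(bool a, bool b) { return (a && b) || !a; }

// Form suggested in the review: !a || b
inline bool implies_simple(bool a, bool b) { return !a || b; }

// Compare both forms over all four combinations of the two inputs.
inline bool forms_agree() {
  for (int a = 0; a <= 1; a++) {
    for (int b = 0; b <= 1; b++) {
      if (implies_verbose(a != 0, b != 0) != implies_simple(a != 0, b != 0)) {
        return false;
      }
    }
  }
  return true;
}

// Small self-check: the only failing case of "a implies b" is a && !b.
inline void implication_self_check() {
  assert(forms_agree());
  assert(!implies_simple(true, false));
}
```

Since the two forms agree on every input, the shorter one can replace the longer one without changing when the assert fires.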
> src/hotspot/share/compiler/compilerDefinitions.inline.hpp line 117: > >> 115: // Tiered is basically C1 & (C2 | JVMCI) minus all the odd cases with restrictions. >> 116: inline bool CompilerConfig::is_tiered() { >> 117: assert((is_c1_simple_only() && is_c1_only()) || !is_c1_simple_only(), "c1 simple mode must imply c1-only mode"); > > Ditto, > > > assert(!is_c1_simple_only() || is_c1_only(), "c1 simple mode must imply c1-only mode"); Agreed. Changed accordingly. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17200#discussion_r1444184964 PR Review Comment: https://git.openjdk.org/jdk/pull/17200#discussion_r1444185000 From jbhateja at openjdk.org Mon Jan 8 06:09:22 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Mon, 8 Jan 2024 06:09:22 GMT Subject: RFR: 8322768: Optimize non-subword vector compress and expand APIs for AVX2 target. [v3] In-Reply-To: References: <_udOCEVG86x9V_WvYqFTaYnvmXdiZ7LzqxzR-D_ygYs=.db3ed37e-a8fa-413b-83c6-a785aba072ff@github.com> Message-ID: On Fri, 5 Jan 2024 10:02:28 GMT, Emanuel Peter wrote: > Thanks for the updates! > > One more idea: Your AVX2 solution has a lot of cost for converting the mask to a permutation. Might it make sense to split this off into a separate vector-node, so that it can float out of a loop if the mask is invariant? CompressV / ExpandV only accepts two inputs, vector to be operated on and mask under which operation is performed, permute table based implementation is specific to x86 backend implementation. ------------- PR Comment: https://git.openjdk.org/jdk/pull/17261#issuecomment-1880430502 From jbhateja at openjdk.org Mon Jan 8 06:09:24 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Mon, 8 Jan 2024 06:09:24 GMT Subject: RFR: 8322768: Optimize non-subword vector compress and expand APIs for AVX2 target. 
[v2] In-Reply-To: References: <_udOCEVG86x9V_WvYqFTaYnvmXdiZ7LzqxzR-D_ygYs=.db3ed37e-a8fa-413b-83c6-a785aba072ff@github.com> Message-ID: <1GHGK7AGinCMKjFIB5oadUP0jiZrC39Z0hncAS3H-9Y=.eb617125-1f51-4cff-889e-b15321f5c72b@github.com> On Fri, 5 Jan 2024 09:45:11 GMT, Emanuel Peter wrote: > You are using `VectorMask pred = VectorMask.fromLong(ispecies, maskctr++);`. That basically systematically iterates over all masks, which is nice for a correctness test. But that would use different density inside one test run, right? The average over the loop is still at `50%`, correct? > > I was thinking more a run where the percentage over the whole loop is lower than maybe `1%`. That would get us to a point where maybe the branch prediction of non-vectorized code might be faster, what do you think? An imperative loop compression will check each mask bit to select compressible lane. Therefore mask with low or high density of set bits should show similar performance. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17261#discussion_r1444196848 From jbhateja at openjdk.org Mon Jan 8 06:23:46 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Mon, 8 Jan 2024 06:23:46 GMT Subject: RFR: 8322768: Optimize non-subword vector compress and expand APIs for AVX2 target. [v4] In-Reply-To: <_udOCEVG86x9V_WvYqFTaYnvmXdiZ7LzqxzR-D_ygYs=.db3ed37e-a8fa-413b-83c6-a785aba072ff@github.com> References: <_udOCEVG86x9V_WvYqFTaYnvmXdiZ7LzqxzR-D_ygYs=.db3ed37e-a8fa-413b-83c6-a785aba072ff@github.com> Message-ID: > Hi, > > Patch optimizes non-subword vector compress and expand APIs for x86 AVX2 only targets. > Upcoming E-core Xeons (Sierra Forest) and Hybrid CPUs only support AVX2 instruction set. > These are very frequently used APIs in columnar database filter operation. > > Implementation uses a lookup table to record permute indices. Table index is computed using > mask argument of compress/expand operation. 
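The lookup-table idea described above can be sketched in scalar C++ for a 4-lane vector. This is an illustration under stated assumptions, not the x86 backend code from the patch: the names `kLanes`, `permute_row`, `build_table` and `compress` are hypothetical, and the table layout in the actual patch may differ.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

constexpr int kLanes = 4;  // assumed lane count for this sketch
using Vec = std::array<int32_t, kLanes>;
using Perm = std::array<uint8_t, kLanes>;

// Permute indices for one mask value: the lanes whose mask bit is set come
// first, in order; the remaining slots hold kLanes as a "fill with zero" mark.
inline Perm permute_row(unsigned mask) {
  Perm row{};
  int out = 0;
  for (int lane = 0; lane < kLanes; lane++) {
    if (mask & (1u << lane)) {
      row[out++] = static_cast<uint8_t>(lane);
    }
  }
  while (out < kLanes) {
    row[out++] = static_cast<uint8_t>(kLanes);
  }
  return row;
}

// One row per possible mask value, so the mask itself is the table index.
inline std::array<Perm, (1u << kLanes)> build_table() {
  std::array<Perm, (1u << kLanes)> table{};
  for (unsigned mask = 0; mask < (1u << kLanes); mask++) {
    table[mask] = permute_row(mask);
  }
  return table;
}

// Compress: pack the selected lanes into the low lanes, zero the rest.
inline Vec compress(const Vec& src, unsigned mask) {
  static const auto table = build_table();
  const Perm& row = table[mask & ((1u << kLanes) - 1)];
  Vec dst{};
  for (int i = 0; i < kLanes; i++) {
    dst[i] = (row[i] < kLanes) ? src[row[i]] : 0;
  }
  return dst;
}

inline void compress_self_check() {
  // Mask 0xA selects lanes 1 and 3.
  assert((compress(Vec{10, 20, 30, 40}, 0xAu) == Vec{20, 40, 0, 0}));
}
```

In a vectorized version the per-lane loop in `compress` would presumably be replaced by a single permute instruction that takes the table row as its index vector.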
> > Following are the performance number of JMH micro included with the patch. > > > System : Intel(R) Xeon(R) Platinum 8480+ (Sapphire Rapids) > > Baseline: > Benchmark (size) Mode Cnt Score Error Units > ColumnFilterBenchmark.filterDoubleColumn 1024 thrpt 2 142.767 ops/ms > ColumnFilterBenchmark.filterDoubleColumn 2047 thrpt 2 71.436 ops/ms > ColumnFilterBenchmark.filterDoubleColumn 4096 thrpt 2 35.992 ops/ms > ColumnFilterBenchmark.filterFloatColumn 1024 thrpt 2 182.151 ops/ms > ColumnFilterBenchmark.filterFloatColumn 2047 thrpt 2 91.096 ops/ms > ColumnFilterBenchmark.filterFloatColumn 4096 thrpt 2 44.757 ops/ms > ColumnFilterBenchmark.filterIntColumn 1024 thrpt 2 184.099 ops/ms > ColumnFilterBenchmark.filterIntColumn 2047 thrpt 2 91.981 ops/ms > ColumnFilterBenchmark.filterIntColumn 4096 thrpt 2 45.170 ops/ms > ColumnFilterBenchmark.filterLongColumn 1024 thrpt 2 148.017 ops/ms > ColumnFilterBenchmark.filterLongColumn 2047 thrpt 2 73.516 ops/ms > ColumnFilterBenchmark.filterLongColumn 4096 thrpt 2 36.844 ops/ms > > Withopt: > Benchmark (size) Mode Cnt Score Error Units > ColumnFilterBenchmark.filterDoubleColumn 1024 thrpt 2 2051.707 ops/ms > ColumnFilterBenchmark.filterDoubleColumn 2047 thrpt 2 914.072 ops/ms > ColumnFilterBenchmark.filterDoubleColumn 4096 thrpt 2 489.898 ops/ms > ColumnFilterBenchmark.filterFloatColumn 1024 thrpt 2 5324.195 ops/ms > ColumnFilterBenchmark.filterFloatColumn 2047 thrpt 2 2587.229 ops/ms > ColumnFilterBenchmark.filterFloatColumn 4096 thrpt 2 1278.665 ops/ms > ColumnFilterBenchmark.filterIntColumn 1024 thrpt 2 4149.384 ops/ms > ColumnFilterBenchmark.filterIntColumn 2047 thrpt 2 1791.170 ops/ms > ColumnFilterBenchmark.filterIntColumn 4096... Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Review suggestions incorporated. 
------------- Changes: - all: https://git.openjdk.org/jdk/pull/17261/files - new: https://git.openjdk.org/jdk/pull/17261/files/ea0aa0b4..257a6351 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17261&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17261&range=02-03 Stats: 24 lines in 1 file changed: 2 ins; 2 del; 20 mod Patch: https://git.openjdk.org/jdk/pull/17261.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17261/head:pull/17261 PR: https://git.openjdk.org/jdk/pull/17261 From thartmann at openjdk.org Mon Jan 8 06:55:24 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 8 Jan 2024 06:55:24 GMT Subject: RFR: 8320128: Clean up Parse constructor for OSR [v4] In-Reply-To: <-QQZANSu7PVpZ3CUX7kbp00f_xkhjiy7la3Z3gXutKA=.a1c938e4-12e0-44b8-a2dd-7cc0c0c9f09f@github.com> References: <0mxhhLR9DDdpiBhbXWPxCmMzo_OI9DYiIX36xGhxC_c=.1eed99b9-2dc3-4403-b3c1-22b6ebed6079@github.com> <-QQZANSu7PVpZ3CUX7kbp00f_xkhjiy7la3Z3gXutKA=.a1c938e4-12e0-44b8-a2dd-7cc0c0c9f09f@github.com> Message-ID: <_LWzDNl5A61rNi5D-W0kgE3nFG5dScUQ8KO1TtqMCKw=.ad6c34cc-1bd8-4bb8-9949-00f0ce09a432@github.com> On Thu, 4 Jan 2024 17:06:38 GMT, Xin Liu wrote: >> There's a special case for the constructor of Parse. If current compilation is OSR and it is handling the top-level method(depth() == 1), then >> >> 1. _tf = C->tf(); >> 2. _entry_bci = C->entry_bci(); >> 3. _flow = method()->get_osr_flow_analysis(_entry_bci); >> >> We don't need to assign those member data twice. We can also factor out _flow->failing() for the special case and normal cases. >> >> It's worth mentioning that we can't save ciTypeFlow computation because >> get_osr_flow_analysis(_entry_bci) actually needs get_flow_analysis(method()). > > Xin Liu has updated the pull request incrementally with one additional commit since the last revision: > > Use print_cr for the log message. All tests passed. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/16669#issuecomment-1880464791 From thartmann at openjdk.org Mon Jan 8 06:58:21 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 8 Jan 2024 06:58:21 GMT Subject: RFR: 8322694: C1: Handle Constant and IfOp in NullCheckEliminator [v2] In-Reply-To: References: <1vvyuwLRjlWItKwCyighjCSM5SNbO4CSEE59hQtCU24=.b4783e52-328a-4ce3-8c92-7b736cea7546@github.com> Message-ID: On Wed, 3 Jan 2024 13:37:21 GMT, Denghui Dong wrote: >> This patch adds support for Constant and IfOp in NullCheckEliminator to eliminate more null checks. >> >> testing: tier1-4 in progress > > Denghui Dong has updated the pull request incrementally with one additional commit since the last revision: > > update Looks good to me. ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/17191#pullrequestreview-1808102434 From thartmann at openjdk.org Mon Jan 8 07:01:25 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 8 Jan 2024 07:01:25 GMT Subject: RFR: 8322589: Add Ideal transformation: (~a) & (~b) => ~(a | b) [v5] In-Reply-To: References: <-pwjKEB97C-bM068JQN0PY1hl65IzcuQZfHzRoKu92g=.d6118773-20d2-46cd-9284-5168c9334bb5@github.com> Message-ID: On Fri, 5 Jan 2024 21:43:39 GMT, Zhiqiang Zang wrote: >> Hello, >> >> `(~a) & (~b) => ~(a | b)` is a widely seen pattern, for example it is implemented for LLVM [here](https://github.com/llvm/llvm-project/blob/397f1ce9efb4eea1ee10fe4833f733b8c7abd878/llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp#L1616C28-L1616C28); however it is missing in the current implementation of HotSpot. This pull request adds this transformation and associated tests. >> >> Thanks. > > Zhiqiang Zang has updated the pull request incrementally with one additional commit since the last revision: > > update the copyright dates. Looks good to me otherwise.
src/hotspot/share/opto/mulnode.cpp line 615: > 613: // Convert "(~a) & (~b)" into "~(a | b)" > 614: if (AddNode::is_not(phase, in(1), T_INT) && AddNode::is_not(phase, in(2), T_INT)) { > 615: Node *or_a_b = new OrINode(in(1)->in(1), in(2)->in(1)); Suggestion: Node* or_a_b = new OrINode(in(1)->in(1), in(2)->in(1)); Same below. ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/16333#pullrequestreview-1808105721 PR Review Comment: https://git.openjdk.org/jdk/pull/16333#discussion_r1444223718 From thartmann at openjdk.org Mon Jan 8 07:05:27 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 8 Jan 2024 07:05:27 GMT Subject: RFR: 8322589: Add Ideal transformation: (~a) & (~b) => ~(a | b) [v5] In-Reply-To: References: <-pwjKEB97C-bM068JQN0PY1hl65IzcuQZfHzRoKu92g=.d6118773-20d2-46cd-9284-5168c9334bb5@github.com> Message-ID: On Fri, 5 Jan 2024 21:43:39 GMT, Zhiqiang Zang wrote: >> Hello, >> >> `(~a) & (~b) => ~(a | b)` is a widely seen pattern, for example it is implemented for LLVM [here](https://github.com/llvm/llvm-project/blob/397f1ce9efb4eea1ee10fe4833f733b8c7abd878/llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp#L1616C28-L1616C28); however it is missing in current implementation of hotspot. This pull request adds this transformation and associated tests. >> >> Thanks. > > Zhiqiang Zang has updated the pull request incrementally with one additional commit since the last revision: > > update the copyright dates. src/hotspot/share/opto/addnode.hpp line 84: > 82: // Utility function to check if the given node is a NOT operation, > 83: // i.e., n == m ^ (-1). > 84: static bool is_not(PhaseGVN* phase, Node* n, BasicType bt); Could these be made non-static? 
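For readers following the thread, the identity behind the proposed Ideal transformation is De Morgan's law on bit vectors: `(~a) & (~b)` uses two NOTs and one AND, while `~(a | b)` uses one OR and one NOT. The sketch below only checks the arithmetic identity on a handful of values; it is not C2 code, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Left-hand side of the rewrite: two NOTs followed by an AND.
inline int32_t and_of_nots(int32_t a, int32_t b) { return (~a) & (~b); }

// Right-hand side of the rewrite: one OR followed by one NOT.
inline int32_t not_of_or(int32_t a, int32_t b) { return ~(a | b); }

// A NOT can equally be written as XOR with -1, as the is_not comment notes.
inline int32_t not_via_xor(int32_t n) { return n ^ (-1); }

// Check the identity on a small grid of sample values, including the extremes.
inline bool identity_holds_on_samples() {
  const int32_t samples[] = {0, 1, -1, 42, 0x55555555, INT32_MIN, INT32_MAX};
  for (int32_t a : samples) {
    for (int32_t b : samples) {
      if (and_of_nots(a, b) != not_of_or(a, b)) return false;
      if (not_via_xor(a) != ~a) return false;
    }
  }
  return true;
}

inline void de_morgan_self_check() {
  assert(identity_holds_on_samples());
}
```

The same identity holds for 64-bit values, which is why the PR can cover both the int and the long node.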
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16333#discussion_r1444226697 From rehn at openjdk.org Mon Jan 8 07:30:25 2024 From: rehn at openjdk.org (Robbin Ehn) Date: Mon, 8 Jan 2024 07:30:25 GMT Subject: RFR: 8322790: RISC-V: Tune costs for shuffles with no conversion [v2] In-Reply-To: References: Message-ID: On Sat, 6 Jan 2024 16:27:41 GMT, Ilya Gavrilin wrote: >> Hi all, please review this small change to RISC-V nodes insertion costs. >> Now we have several nodes which provide shuffles without conversion: https://github.com/openjdk/jdk/blob/32d80e2caf6063b58128bd5f3dc87b276f3bd0cb/src/hotspot/cpu/riscv/riscv.ad#L8525-L8741 >> On most RISC-V cpu`s we prefer reg<->reg operations, because they are faster, but now stack<->reg operations used (for details about reasons, please, visit connected jbs issue). >> After changing insertion costs reg<->reg operations selected, and we can see performance improvements for benchmarks, which use such shuffles (tested on thead C910 board): >> | Benchmark | Upstream build (ops/ms) | Patched build (ops/ms) | difference (%) | >> |:-----------------------------------:|:-----------------------:|:----------------------:|:--------------:| >> | MathBench.doubleToRawLongBitsDouble | 30935.139 | 32171.761 | +4.00 | >> | StrictMathBench.ceilDouble | 24682.810 | 29782.050 | +20.66 | >> | StrictMathBench.cosDouble | 6948.309 | 6938.276 | -0.14 | >> | StrictMathBench.expDouble | 6816.143 | 7211.021 | +5.79 | >> | StrictMathBench.floorDouble | 30699.630 | 34189.509 | +11.37 | >> | StrictMathBench.maxDouble | 35157.355 | 34675.191 | -1.37 | >> | StrictMathBench.minDouble | 35192.135 | 35183.015 | -0.03 | >> | StrictMathBench.sinDouble | 6698.405 | 6721.809 | +0.35 | >> >> New benchmark for changed nodes: >> >> --- a/test/micro/org/openjdk/bench/java/lang/MathBench.java >> +++ b/test/micro/org/openjdk/bench/java/lang/MathBench.java >> @@ -540,4 +540,11 @@ public class MathBench { >> return Math.ulp(float7); >> } >> >> + 
@Benchmark >> + public long doubleToRawLongBitsDouble() { >> + double dbl162Dot5 = double81 * 2.0d + double0Dot5; >> + double dbl3 = double2 + double1; >> + return Double.doubleToRawLongBits(dbl162Dot5) + Double.doubleToRawLongBits(dbl3); >> + } >> + > > Ilya Gavrilin has updated the pull request incrementally with one additional commit since the last revision: > > Revert some costs changes Still reasonable to me. ------------- Marked as reviewed by rehn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/17206#pullrequestreview-1808157228 From epeter at openjdk.org Mon Jan 8 07:47:25 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 8 Jan 2024 07:47:25 GMT Subject: RFR: 8322589: Add Ideal transformation: (~a) & (~b) => ~(a | b) [v5] In-Reply-To: References: <-pwjKEB97C-bM068JQN0PY1hl65IzcuQZfHzRoKu92g=.d6118773-20d2-46cd-9284-5168c9334bb5@github.com> Message-ID: On Mon, 8 Jan 2024 07:02:50 GMT, Tobias Hartmann wrote: >> Zhiqiang Zang has updated the pull request incrementally with one additional commit since the last revision: >> >> update the copyright dates. > > src/hotspot/share/opto/addnode.hpp line 84: > >> 82: // Utility function to check if the given node is a NOT operation, >> 83: // i.e., n == m ^ (-1). >> 84: static bool is_not(PhaseGVN* phase, Node* n, BasicType bt); > > Could these be made non-static? Hmm, I agree with this idea. `n->is_not(...)` would really be nicer. You'd probably have to move the two methods to `node.hpp`. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16333#discussion_r1444251176 From epeter at openjdk.org Mon Jan 8 07:50:25 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 8 Jan 2024 07:50:25 GMT Subject: RFR: 8322077: Add Ideal transformation: (~a) | (~b) => ~(a & b) [v8] In-Reply-To: <7nWuqQFeSjlCr0JfuC9sxty9HNF37Ytrz6H6EWTApZg=.b9ef1659-5815-4540-828e-69f1f450afe0@github.com> References: <7nWuqQFeSjlCr0JfuC9sxty9HNF37Ytrz6H6EWTApZg=.b9ef1659-5815-4540-828e-69f1f450afe0@github.com> Message-ID: On Sat, 6 Jan 2024 00:44:07 GMT, Zhiqiang Zang wrote: >> Hello, >> >> (~a) | (~b) => ~(a & b) is a widely seen pattern, for example it is implemented for LLVM [here](https://github.com/llvm/llvm-project/blob/397f1ce9efb4eea1ee10fe4833f733b8c7abd878/llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp#L1617C28-L1617C28); however it is missing in current implementation of hotspot. This pull request adds this transformation and associated tests. >> >> Thanks. > > Zhiqiang Zang has updated the pull request incrementally with one additional commit since the last revision: > > Add tests for using De Morgan's Law for both optimizations. Nice, looks much better, thanks for the updates! ------------- PR Comment: https://git.openjdk.org/jdk/pull/16334#issuecomment-1880515343 From kbarrett at openjdk.org Mon Jan 8 08:00:39 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Mon, 8 Jan 2024 08:00:39 GMT Subject: RFR: 8323115: x86-32: Incorrect predicates for cmov instruct transforms with UseSSE Message-ID: <7BcH08lab4xxFOxm0UAfhHKVyBBEhRWtIPEUJuOIlmo=.2c5b0918-19ac-4cb3-b2f5-11131ce73422@github.com> Please review this change that fixes generation of CMOV by C2 as controlled by UseSSE. The predicates controlling that generation were using implicit operator precedence that didn't have the expected grouping. Fixed by adding parentheses to make the desired grouping explicit. 
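The class of bug described here comes from `&&` binding tighter than `||`, so `a || b && c` is parsed as `a || (b && c)`. The sketch below uses hypothetical Boolean inputs rather than the actual predicates from the `.ad` file, which are not shown in this message, and writes both groupings with explicit parentheses so it compiles cleanly with -Wparentheses enabled.

```cpp
#include <cassert>

// What the compiler does with the unparenthesized expression `a || b && c`.
inline bool as_parsed(bool a, bool b, bool c) { return a || (b && c); }

// What an author may have intended: the final condition guards both cases.
inline bool as_intended(bool a, bool b, bool c) { return (a || b) && c; }

// Count how many of the eight input combinations give different answers.
inline int count_disagreements() {
  int count = 0;
  for (int bits = 0; bits < 8; bits++) {
    bool a = (bits & 1) != 0;
    bool b = (bits & 2) != 0;
    bool c = (bits & 4) != 0;
    if (as_parsed(a, b, c) != as_intended(a, b, c)) count++;
  }
  return count;
}

inline void precedence_self_check() {
  // With a set and c clear, the two groupings disagree.
  assert(as_parsed(true, false, false));
  assert(!as_intended(true, false, false));
}
```

The two groupings differ exactly when `a` is true and `c` is false, which is the kind of case where a guard condition can be silently bypassed.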
Testing: Ran GHA with -Wparentheses enabled along with this and other changes needed to make that work. ------------- Commit messages: - fix predicates for cmov with UseSSE Changes: https://git.openjdk.org/jdk/pull/17296/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=17296&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8323115 Stats: 13 lines in 1 file changed: 0 ins; 0 del; 13 mod Patch: https://git.openjdk.org/jdk/pull/17296.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17296/head:pull/17296 PR: https://git.openjdk.org/jdk/pull/17296 From thartmann at openjdk.org Mon Jan 8 08:39:27 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 8 Jan 2024 08:39:27 GMT Subject: RFR: 8322589: Add Ideal transformation: (~a) & (~b) => ~(a | b) [v5] In-Reply-To: References: <-pwjKEB97C-bM068JQN0PY1hl65IzcuQZfHzRoKu92g=.d6118773-20d2-46cd-9284-5168c9334bb5@github.com> Message-ID: On Fri, 5 Jan 2024 21:43:39 GMT, Zhiqiang Zang wrote: >> Hello, >> >> `(~a) & (~b) => ~(a | b)` is a widely seen pattern, for example it is implemented for LLVM [here](https://github.com/llvm/llvm-project/blob/397f1ce9efb4eea1ee10fe4833f733b8c7abd878/llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp#L1616C28-L1616C28); however it is missing in current implementation of hotspot. This pull request adds this transformation and associated tests. >> >> Thanks. > > Zhiqiang Zang has updated the pull request incrementally with one additional commit since the last revision: > > update the copyright dates. 
src/hotspot/share/opto/addnode.cpp line 260: > 258: } > 259: > 260: AddNode* AddNode::make_not(PhaseGVN* phase, Node*n, BasicType bt) { Suggestion: AddNode* AddNode::make_not(PhaseGVN* phase, Node* n, BasicType bt) { ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16333#discussion_r1444288412 From thartmann at openjdk.org Mon Jan 8 08:40:26 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 8 Jan 2024 08:40:26 GMT Subject: RFR: 8322077: Add Ideal transformation: (~a) | (~b) => ~(a & b) [v8] In-Reply-To: <7nWuqQFeSjlCr0JfuC9sxty9HNF37Ytrz6H6EWTApZg=.b9ef1659-5815-4540-828e-69f1f450afe0@github.com> References: <7nWuqQFeSjlCr0JfuC9sxty9HNF37Ytrz6H6EWTApZg=.b9ef1659-5815-4540-828e-69f1f450afe0@github.com> Message-ID: On Sat, 6 Jan 2024 00:44:07 GMT, Zhiqiang Zang wrote: >> Hello, >> >> (~a) | (~b) => ~(a & b) is a widely seen pattern, for example it is implemented for LLVM [here](https://github.com/llvm/llvm-project/blob/397f1ce9efb4eea1ee10fe4833f733b8c7abd878/llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp#L1617C28-L1617C28); however it is missing in current implementation of hotspot. This pull request adds this transformation and associated tests. >> >> Thanks. > > Zhiqiang Zang has updated the pull request incrementally with one additional commit since the last revision: > > Add tests for using De Morgan's Law for both optimizations. Looks good to me otherwise. 
test/hotspot/jtreg/compiler/c2/irTests/DeMorganLawIntTests.java line 31: > 29: * @test > 30: * @bug 8322077 > 31: * @summary Test that Ideal transformations on the De Morgan's Law performe Suggestion: * @summary Test that Ideal transformations on the De Morgan's Law perform test/hotspot/jtreg/compiler/c2/irTests/DeMorganLawLongTests.java line 31: > 29: * @test > 30: * @bug 8322077 > 31: * @summary Test that Ideal transformations on the De Morgan's Law performe Suggestion: * @summary Test that Ideal transformations on the De Morgan's Law perform ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/16334#pullrequestreview-1808302290 PR Review Comment: https://git.openjdk.org/jdk/pull/16334#discussion_r1444279955 PR Review Comment: https://git.openjdk.org/jdk/pull/16334#discussion_r1444280127 From epeter at openjdk.org Mon Jan 8 08:49:23 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 8 Jan 2024 08:49:23 GMT Subject: RFR: 8322735: C2: minor improvements of bubble sort used in SuperWord::packset_sort [v2] In-Reply-To: <1W_HcoiwEnXLvtqprc7L8mghr0BRH3a_ITn6Cerzb_c=.f6c90e24-7d41-418d-bc50-41bff396663b@github.com> References: <1W_HcoiwEnXLvtqprc7L8mghr0BRH3a_ITn6Cerzb_c=.f6c90e24-7d41-418d-bc50-41bff396663b@github.com> Message-ID: On Fri, 5 Jan 2024 08:57:33 GMT, Denghui Dong wrote: >> A minor improvement could be made for bubble sort in SuperWord::packset_sort to reduce the comparison count in bad cases. >> >> See https://en.wikipedia.org/wiki/Bubble_sort > > Denghui Dong has updated the pull request incrementally with one additional commit since the last revision: > > update @D-D-H testing passed. Looks good. Thanks for the change! ------------- Marked as reviewed by epeter (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/17190#pullrequestreview-1808357678 From ddong at openjdk.org Mon Jan 8 09:24:20 2024 From: ddong at openjdk.org (Denghui Dong) Date: Mon, 8 Jan 2024 09:24:20 GMT Subject: RFR: 8322735: C2: minor improvements of bubble sort used in SuperWord::packset_sort In-Reply-To: References: Message-ID: On Fri, 5 Jan 2024 10:14:08 GMT, Emanuel Peter wrote: >>> Please do the renaming, and then I can run testing and give you my approval. >> >> Updated. Thanks. > > @D-D-H nice. Testing running. @eme64 Thank you! Do I need a second review? ------------- PR Comment: https://git.openjdk.org/jdk/pull/17190#issuecomment-1880632132 From epeter at openjdk.org Mon Jan 8 09:24:23 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 8 Jan 2024 09:24:23 GMT Subject: RFR: 8322735: C2: minor improvements of bubble sort used in SuperWord::packset_sort [v2] In-Reply-To: <1W_HcoiwEnXLvtqprc7L8mghr0BRH3a_ITn6Cerzb_c=.f6c90e24-7d41-418d-bc50-41bff396663b@github.com> References: <1W_HcoiwEnXLvtqprc7L8mghr0BRH3a_ITn6Cerzb_c=.f6c90e24-7d41-418d-bc50-41bff396663b@github.com> Message-ID: On Fri, 5 Jan 2024 08:57:33 GMT, Denghui Dong wrote: >> A minor improvement could be made for bubble sort in SuperWord::packset_sort to reduce the comparison count in bad cases. >> >> See https://en.wikipedia.org/wiki/Bubble_sort > > Denghui Dong has updated the pull request incrementally with one additional commit since the last revision: > > update Yes, I think that would be preferable, even though this is not a very complicated fix.
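As background for the change under review: one well-known bubble sort refinement that reduces the comparison count is to shrink the scanned range to the position of the last swap in each pass, since everything after that position is already in place. The sketch below illustrates that refinement on a plain `std::vector<int>`. It is an assumption that this resembles the PR; the actual change to `SuperWord::packset_sort` is not shown in this thread, and `bubble_sort_last_swap` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Sorts v in ascending order and returns the number of comparisons made.
// After each pass, elements at or beyond the last swap position are final,
// so the next pass only scans up to that position.
inline std::size_t bubble_sort_last_swap(std::vector<int>& v) {
  std::size_t comparisons = 0;
  std::size_t n = v.size();
  while (n > 1) {
    std::size_t last_swap = 0;
    for (std::size_t i = 1; i < n; i++) {
      comparisons++;
      if (v[i - 1] > v[i]) {
        std::swap(v[i - 1], v[i]);
        last_swap = i;
      }
    }
    n = last_swap;  // 0 when no swap happened, which ends the loop
  }
  return comparisons;
}

inline void bubble_sort_self_check() {
  // Already sorted input needs a single pass of size - 1 comparisons.
  std::vector<int> sorted{1, 2, 3, 4};
  assert(bubble_sort_last_swap(sorted) == 3);
}
```

A naive version that always runs every pass over the full range would need 12 comparisons for four elements, so the saving is largest on nearly sorted input.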
------------- PR Comment: https://git.openjdk.org/jdk/pull/17190#issuecomment-1880633452 From stefank at openjdk.org Mon Jan 8 09:28:49 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Mon, 8 Jan 2024 09:28:49 GMT Subject: RFR: 8282365: Consolidate and improve division by constant idealizations [v42] In-Reply-To: References: Message-ID: On Sun, 7 Jan 2024 15:44:44 GMT, Quan Anh Mai wrote: >> test/hotspot/gtest/opto/test_constant_division.cpp line 29: >> >>> 27: #include "runtime/os.hpp" >>> 28: #include "utilities/growableArray.hpp" >>> 29: #include >> >> Move include. > > I was told that `unittest.hpp` should come last so this is the order, I have added a line between JDK header and stdlib header as well as resolved your other comments. Thanks a lot. The rules around the include lines in our tests and what we currently have in the tests are messy at the moment. We should fix that when we find the time to. For HotSpot source code files the includes should be structured as: hotspot includes blank line system includes There are some deviations from that, but those should be cleaned up instead of used as a precedent. For our tests we should add "unittest.hpp" at the end of the "hotspot includes" section. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/9947#discussion_r1444334222 From thartmann at openjdk.org Mon Jan 8 09:33:34 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 8 Jan 2024 09:33:34 GMT Subject: RFR: 8310711: [IR Framework] Remove safepoint while printing handling [v2] In-Reply-To: <278BZfCuvI5xJSWh2PvZtONVaZ2QjxkWKj1NifCbFYE=.af0a47ab-63a6-47bd-953b-4b0756107227@github.com> References: <278BZfCuvI5xJSWh2PvZtONVaZ2QjxkWKj1NifCbFYE=.af0a47ab-63a6-47bd-953b-4b0756107227@github.com> Message-ID: On Fri, 5 Jan 2024 10:58:41 GMT, Christian Hagedorn wrote: >> This clean-up PR removes the handling of the `` message in the IR framework.
It is no longer required since we dump the output of `PrintIdeal` to the hotspot_pid file differently since [JDK-8306922](https://bugs.openjdk.org/browse/JDK-8306922). There is no interrupting `` message anymore. I removed the corresponding now unneeded code together with the previously added test case for it. >> >> Testing: tier1-4 >> >> Thanks, >> Christian > > Christian Hagedorn has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: > > - Update copyright year > - Merge branch 'master' into JDK-8310711 > - 8310711: [IR Framework] Remove safepoint while printing handling Marked as reviewed by thartmann (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/16921#pullrequestreview-1808483045 From epeter at openjdk.org Mon Jan 8 09:33:35 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 8 Jan 2024 09:33:35 GMT Subject: RFR: 8310711: [IR Framework] Remove safepoint while printing handling [v2] In-Reply-To: <278BZfCuvI5xJSWh2PvZtONVaZ2QjxkWKj1NifCbFYE=.af0a47ab-63a6-47bd-953b-4b0756107227@github.com> References: <278BZfCuvI5xJSWh2PvZtONVaZ2QjxkWKj1NifCbFYE=.af0a47ab-63a6-47bd-953b-4b0756107227@github.com> Message-ID: On Fri, 5 Jan 2024 10:58:41 GMT, Christian Hagedorn wrote: >> This clean-up PR removes the handling of the `` message in the IR framework. It is no longer required since we dump the output of `PrintIdeal` to the hotspot_pid file differently since [JDK-8306922](https://bugs.openjdk.org/browse/JDK-8306922). There is no interrupting `` message anymore. I removed the corresponding now unneeded code together with the previously added test case for it. >> >> Testing: tier1-4 >> >> Thanks, >> Christian > > Christian Hagedorn has updated the pull request with a new target base due to a merge or a rebase. 
The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: > > - Update copyright year > - Merge branch 'master' into JDK-8310711 > - 8310711: [IR Framework] Remove safepoint while printing handling Marked as reviewed by epeter (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/16921#pullrequestreview-1808483801 From ddong at openjdk.org Mon Jan 8 09:40:22 2024 From: ddong at openjdk.org (Denghui Dong) Date: Mon, 8 Jan 2024 09:40:22 GMT Subject: RFR: 8322735: C2: minor improvements of bubble sort used in SuperWord::packset_sort [v2] In-Reply-To: References: <1W_HcoiwEnXLvtqprc7L8mghr0BRH3a_ITn6Cerzb_c=.f6c90e24-7d41-418d-bc50-41bff396663b@github.com> Message-ID: On Mon, 8 Jan 2024 09:21:56 GMT, Emanuel Peter wrote: > Yes, I think that would be preferable, even though this is not a very complicated fix. Okay. Thanks. ------------- PR Comment: https://git.openjdk.org/jdk/pull/17190#issuecomment-1880655516 From thartmann at openjdk.org Mon Jan 8 09:45:49 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 8 Jan 2024 09:45:49 GMT Subject: [jdk22] RFR: 8310844: [AArch64] C1 compilation fails because monitor offset in OSR buffer is too large for immediate Message-ID: Hi all, This pull request contains a backport of commit [ade21a96](https://github.com/openjdk/jdk/commit/ade21a965f8a5fc889cd48bba76fad507bdeddf5) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository. The commit being backported was authored by Tobias Hartmann on 5 Jan 2024 and was reviewed by Andrew Haley and Christian Hagedorn. Thanks!
------------- Commit messages: - Backport ade21a965f8a5fc889cd48bba76fad507bdeddf5 Changes: https://git.openjdk.org/jdk22/pull/38/files Webrev: https://webrevs.openjdk.org/?repo=jdk22&pr=38&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8310844 Stats: 149 lines in 2 files changed: 147 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk22/pull/38.diff Fetch: git fetch https://git.openjdk.org/jdk22.git pull/38/head:pull/38 PR: https://git.openjdk.org/jdk22/pull/38 From chagedorn at openjdk.org Mon Jan 8 09:45:49 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 8 Jan 2024 09:45:49 GMT Subject: [jdk22] RFR: 8310844: [AArch64] C1 compilation fails because monitor offset in OSR buffer is too large for immediate In-Reply-To: References: Message-ID: <6RC5XTDso8lKKf66R1NODDilXgqJ0mpO_08yXAmFJuw=.51a89b78-6aed-4d65-bd2b-f5e40145db61@github.com> On Mon, 8 Jan 2024 09:36:25 GMT, Tobias Hartmann wrote: > Hi all, > > This pull request contains a backport of commit [ade21a96](https://github.com/openjdk/jdk/commit/ade21a965f8a5fc889cd48bba76fad507bdeddf5) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository. > > The commit being backported was authored by Tobias Hartmann on 5 Jan 2024 and was reviewed by Andrew Haley and Christian Hagedorn. > > Thanks! Looks good! ------------- Marked as reviewed by chagedorn (Reviewer). 
PR Review: https://git.openjdk.org/jdk22/pull/38#pullrequestreview-1808522531 From thartmann at openjdk.org Mon Jan 8 09:45:50 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 8 Jan 2024 09:45:50 GMT Subject: [jdk22] RFR: 8310844: [AArch64] C1 compilation fails because monitor offset in OSR buffer is too large for immediate In-Reply-To: References: Message-ID: On Mon, 8 Jan 2024 09:36:25 GMT, Tobias Hartmann wrote: > Hi all, > > This pull request contains a backport of commit [ade21a96](https://github.com/openjdk/jdk/commit/ade21a965f8a5fc889cd48bba76fad507bdeddf5) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository. > > The commit being backported was authored by Tobias Hartmann on 5 Jan 2024 and was reviewed by Andrew Haley and Christian Hagedorn. > > Thanks! Thanks, Christian! ------------- PR Comment: https://git.openjdk.org/jdk22/pull/38#issuecomment-1880663155 From kbarrett at openjdk.org Mon Jan 8 09:47:49 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Mon, 8 Jan 2024 09:47:49 GMT Subject: RFR: 8282365: Consolidate and improve division by constant idealizations [v42] In-Reply-To: References: Message-ID: On Mon, 8 Jan 2024 09:25:53 GMT, Stefan Karlsson wrote: >> I was told that `unittest.hpp` should come last so this is the order, I have added a line between JDK header and stdlib header as well as resolved your other comments. Thanks a lot. > > The rules around the include lines in our tests and what we currently have in the tests are messy at the moment. We should fix that when we find the time to. > > For HotSpot source code files the includes should be structured as: > > hotspot includes > blank line > system includes > > > There are some deviations from that, but those should be cleaned up instead of used as precedent. For our tests we should add "unittest.hpp" at the end of the "hotspot includes" section.
In the Oracle-internal discussion of include order from about a year ago, there was not a consensus decision about the position of "unittest.hpp". There was a concern that in some cases it really was required to be last for some technical reason. That needed (and still needs) investigation. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/9947#discussion_r1444358649 From shade at openjdk.org Mon Jan 8 09:53:22 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 8 Jan 2024 09:53:22 GMT Subject: RFR: 8323115: x86-32: Incorrect predicates for cmov instruct transforms with UseSSE In-Reply-To: <7BcH08lab4xxFOxm0UAfhHKVyBBEhRWtIPEUJuOIlmo=.2c5b0918-19ac-4cb3-b2f5-11131ce73422@github.com> References: <7BcH08lab4xxFOxm0UAfhHKVyBBEhRWtIPEUJuOIlmo=.2c5b0918-19ac-4cb3-b2f5-11131ce73422@github.com> Message-ID: On Mon, 8 Jan 2024 07:55:21 GMT, Kim Barrett wrote: > Please review this change that fixes generation of CMOV by C2 as controlled by > UseSSE. The predicates controlling that generation were using implicit > operator precedence that didn't have the expected grouping. Fixed by adding > parentheses to make the desired grouping explicit. > > Testing: Ran GHA with -Wparentheses enabled along with this and other changes > needed to make that work. Oh, ouch. ------------- Marked as reviewed by shade (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/17296#pullrequestreview-1808549197 From thartmann at openjdk.org Mon Jan 8 10:09:23 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 8 Jan 2024 10:09:23 GMT Subject: RFR: 8323095: Expand TraceOptoParse block output abbreviations In-Reply-To: References: Message-ID: On Fri, 5 Jan 2024 23:07:12 GMT, Joshua Cao wrote: > `e` -> `exception block` > `lphd` -> `loop head` > > Also removing an unnecessary space. The successor ids have a space before them. 
> > Examples from `java -Xcomp -XX:+TraceOptoParse -version`: > > > Parsing block #8 at bci [33,39), successors: 9 16(exception block) loop head Looks good to me. ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/17289#pullrequestreview-1808606516 From thartmann at openjdk.org Mon Jan 8 10:11:21 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 8 Jan 2024 10:11:21 GMT Subject: RFR: 8323065: Unneccesary CodeBlob lookup in CompiledIC::internal_set_ic_destination In-Reply-To: References: Message-ID: On Fri, 5 Jan 2024 14:37:16 GMT, Aleksey Shipilev wrote: > I was looking at hotpath for IC stub cleaning (happens at safepoint), and one obvious thing is that we look-up `CodeBlob` from `call->instruction_address()` only to assert that is compiled one. It used to be protected by `#ifdef ASSERT` before [JDK-8212681](https://bugs.openjdk.org/browse/JDK-8212681), and pulled from it to be used in Mutex in JDK 12: https://hg.openjdk.org/jdk/jdk/rev/d6dc479bcdd3#l15.62 And the Mutex was shortly gone after [JDK-8214257](https://bugs.openjdk.org/browse/JDK-8214257). So we are exposing this code to product binaries since JDK 12. > > This fix reinstates the `ASSERT` block again. There are small improvements (~1..10us) for safepoint cleanup on small ad-hoc tests in release builds on my Mac. But since this whole thing involves looking up things in code cache, it may cost quite a lot. Looks good to me too. ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/17281#pullrequestreview-1808611926 From thartmann at openjdk.org Mon Jan 8 10:19:22 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 8 Jan 2024 10:19:22 GMT Subject: RFR: 8320649: C2: Optimize scoped values [v4] In-Reply-To: References: Message-ID: On Thu, 4 Jan 2024 08:12:22 GMT, Roland Westrelin wrote: >> This change implements C2 optimizations for calls to >> ScopedValue.get(). 
Indeed, in: >> >> >> v1 = scopedValue.get(); >> ... >> v2 = scopedValue.get(); >> >> >> `v2` can be replaced by `v1` and the second call to `get()` can be >> optimized out. That's true whatever is between the 2 calls unless a >> new mapping for `scopedValue` is created in between (when that happens >> no optimizations is performed for the method being compiled). Hoisting >> a `get()` call out of loop for a loop invariant `scopedValue` should >> also be legal in most cases. >> >> `ScopedValue.get()` is implemented in java code as a 2 step process. A >> cache is attached to the current thread object. If the `ScopedValue` >> object is in the cache then the result from `get()` is read from >> there. Otherwise a slow call is performed that also inserts the >> mapping in the cache. The cache itself is lazily allocated. One >> `ScopedValue` can be hashed to 2 different indexes in the cache. On a >> cache probe, both indexes are checked. As a consequence, the process >> of probing the cache is a multi step process (check if the cache is >> present, check first index, check second index if first index >> failed). If the cache is populated early on, then when the method that >> calls `ScopedValue.get()` is compiled, profile reports the slow path >> as never taken and only the read from the cache is compiled. >> >> To perform the optimizations, I added 3 new node types to C2: >> >> - the pair >> ScopedValueGetHitsInCacheNode/ScopedValueGetLoadFromCacheNode for >> the cache probe >> >> - a cfg node ScopedValueGetResultNode to help locate the result of the >> `get()` call in the IR graph. >> >> In pseudo code, once the nodes are inserted, the code of a `get()` is: >> >> >> hits_in_the_cache = ScopedValueGetHitsInCache(scopedValue) >> if (hits_in_the_cache) { >> res = ScopedValueGetLoadFromCache(hits_in_the_cache); >> } else { >> res = ..; //slow call possibly inlined. 
Subgraph can be arbitray complex >> } >> res = ScopedValueGetResult(res) >> >> >> In the snippet: >> >> >> v1 = scopedValue.get(); >> ... >> v2 = scopedValue.get(); >> >> >> Replacing `v2` by `v1` is then done by starting from the >> `ScopedValueGetResult` node for the second `get()` and looking for a >> dominating `ScopedValueGetResult` for the same `ScopedValue` >> object. When one is found, it is used as a replacement. Eliminating >> the second `get()` call is achieved by making >> `ScopedValueGetHitsInCache` always successful if there's a dominating >> `Scoped... > > Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: > > merge fix Tests all pass now. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16966#issuecomment-1880715729 From qamai at openjdk.org Mon Jan 8 10:23:22 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Mon, 8 Jan 2024 10:23:22 GMT Subject: RFR: 8322768: Optimize non-subword vector compress and expand APIs for AVX2 target. [v3] In-Reply-To: References: <_udOCEVG86x9V_WvYqFTaYnvmXdiZ7LzqxzR-D_ygYs=.db3ed37e-a8fa-413b-83c6-a785aba072ff@github.com> Message-ID: <6ipaD7eRW4J37zaeFEKVf2LUVE3C0LmZmoAeePCG2PE=.7bb8ff9a-638e-4e7f-bea2-a40a424004f0@github.com> On Mon, 8 Jan 2024 06:06:22 GMT, Jatin Bhateja wrote: >> Thanks for the updates! >> >> One more idea: Your AVX2 solution has a lot of cost for converting the mask to a permutation. Might it make sense to split this off into a separate vector-node, so that it can float out of a loop if the mask is invariant? > >> Thanks for the updates! >> >> One more idea: Your AVX2 solution has a lot of cost for converting the mask to a permutation. Might it make sense to split this off into a separate vector-node, so that it can float out of a loop if the mask is invariant? 
> > CompressV / ExpandV only accepts two inputs, vector to be operated on and mask under which operation is performed, permute table based implementation is specific to x86 backend implementation. @jatin-bhateja I think you can expand them in the matcher into several `MachNode`s that will get scheduled separately. ------------- PR Comment: https://git.openjdk.org/jdk/pull/17261#issuecomment-1880724248 From shade at openjdk.org Mon Jan 8 10:29:30 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 8 Jan 2024 10:29:30 GMT Subject: RFR: 8323065: Unneccesary CodeBlob lookup in CompiledIC::internal_set_ic_destination In-Reply-To: References: Message-ID: On Fri, 5 Jan 2024 14:37:16 GMT, Aleksey Shipilev wrote: > I was looking at hotpath for IC stub cleaning (happens at safepoint), and one obvious thing is that we look-up `CodeBlob` from `call->instruction_address()` only to assert that is compiled one. It used to be protected by `#ifdef ASSERT` before [JDK-8212681](https://bugs.openjdk.org/browse/JDK-8212681), and pulled from it to be used in Mutex in JDK 12: https://hg.openjdk.org/jdk/jdk/rev/d6dc479bcdd3#l15.62 And the Mutex was shortly gone after [JDK-8214257](https://bugs.openjdk.org/browse/JDK-8214257). So we are exposing this code to product binaries since JDK 12. > > This fix reinstates the `ASSERT` block again. There are small improvements (~1..10us) for safepoint cleanup on small ad-hoc tests in release builds on my Mac. But since this whole thing involves looking up things in code cache, it may cost quite a lot. Thanks! I am going to integrate it then. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/17281#issuecomment-1880732904 From shade at openjdk.org Mon Jan 8 10:29:31 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 8 Jan 2024 10:29:31 GMT Subject: Integrated: 8323065: Unneccesary CodeBlob lookup in CompiledIC::internal_set_ic_destination In-Reply-To: References: Message-ID: On Fri, 5 Jan 2024 14:37:16 GMT, Aleksey Shipilev wrote: > I was looking at hotpath for IC stub cleaning (happens at safepoint), and one obvious thing is that we look-up `CodeBlob` from `call->instruction_address()` only to assert that is compiled one. It used to be protected by `#ifdef ASSERT` before [JDK-8212681](https://bugs.openjdk.org/browse/JDK-8212681), and pulled from it to be used in Mutex in JDK 12: https://hg.openjdk.org/jdk/jdk/rev/d6dc479bcdd3#l15.62 And the Mutex was shortly gone after [JDK-8214257](https://bugs.openjdk.org/browse/JDK-8214257). So we are exposing this code to product binaries since JDK 12. > > This fix reinstates the `ASSERT` block again. There are small improvements (~1..10us) for safepoint cleanup on small ad-hoc tests in release builds on my Mac. But since this whole thing involves looking up things in code cache, it may cost quite a lot. This pull request has now been integrated. 
Changeset: eb9e754b Author: Aleksey Shipilev URL: https://git.openjdk.org/jdk/commit/eb9e754b3a439cc3ce36c2c9393bc8b250343844 Stats: 4 lines in 1 file changed: 3 ins; 1 del; 0 mod 8323065: Unneccesary CodeBlob lookup in CompiledIC::internal_set_ic_destination Reviewed-by: dlong, thartmann ------------- PR: https://git.openjdk.org/jdk/pull/17281 From thartmann at openjdk.org Mon Jan 8 10:32:21 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 8 Jan 2024 10:32:21 GMT Subject: RFR: 8323115: x86-32: Incorrect predicates for cmov instruct transforms with UseSSE In-Reply-To: <7BcH08lab4xxFOxm0UAfhHKVyBBEhRWtIPEUJuOIlmo=.2c5b0918-19ac-4cb3-b2f5-11131ce73422@github.com> References: <7BcH08lab4xxFOxm0UAfhHKVyBBEhRWtIPEUJuOIlmo=.2c5b0918-19ac-4cb3-b2f5-11131ce73422@github.com> Message-ID: On Mon, 8 Jan 2024 07:55:21 GMT, Kim Barrett wrote: > Please review this change that fixes generation of CMOV by C2 as controlled by > UseSSE. The predicates controlling that generation were using implicit > operator precedence that didn't have the expected grouping. Fixed by adding > parentheses to make the desired grouping explicit. > > Testing: Ran GHA with -Wparentheses enabled along with this and other changes > needed to make that work. The fix looks good to me but it's concerning that we never hit this in testing. Maybe it never fails because the expanded instructions are guarded by `predicate (UseSSE ...` as well. ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/17296#pullrequestreview-1808685025 From epeter at openjdk.org Mon Jan 8 10:36:22 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 8 Jan 2024 10:36:22 GMT Subject: RFR: 8322768: Optimize non-subword vector compress and expand APIs for AVX2 target. 
[v4] In-Reply-To: References: <_udOCEVG86x9V_WvYqFTaYnvmXdiZ7LzqxzR-D_ygYs=.db3ed37e-a8fa-413b-83c6-a785aba072ff@github.com> Message-ID: <7x_AB9EVEuOwt5SldzxWgEKIqDG3ovw6ngBCjL4XKzU=.c8c79b8a-3023-42f5-b8d6-9ed6183d97f8@github.com> On Mon, 8 Jan 2024 06:23:46 GMT, Jatin Bhateja wrote: >> Hi, >> >> Patch optimizes non-subword vector compress and expand APIs for x86 AVX2 only targets. >> Upcoming E-core Xeons (Sierra Forest) and Hybrid CPUs only support AVX2 instruction set. >> These are very frequently used APIs in columnar database filter operation. >> >> Implementation uses a lookup table to record permute indices. Table index is computed using >> mask argument of compress/expand operation. >> >> Following are the performance number of JMH micro included with the patch. >> >> >> System : Intel(R) Xeon(R) Platinum 8480+ (Sapphire Rapids) >> >> Baseline: >> Benchmark (size) Mode Cnt Score Error Units >> ColumnFilterBenchmark.filterDoubleColumn 1024 thrpt 2 142.767 ops/ms >> ColumnFilterBenchmark.filterDoubleColumn 2047 thrpt 2 71.436 ops/ms >> ColumnFilterBenchmark.filterDoubleColumn 4096 thrpt 2 35.992 ops/ms >> ColumnFilterBenchmark.filterFloatColumn 1024 thrpt 2 182.151 ops/ms >> ColumnFilterBenchmark.filterFloatColumn 2047 thrpt 2 91.096 ops/ms >> ColumnFilterBenchmark.filterFloatColumn 4096 thrpt 2 44.757 ops/ms >> ColumnFilterBenchmark.filterIntColumn 1024 thrpt 2 184.099 ops/ms >> ColumnFilterBenchmark.filterIntColumn 2047 thrpt 2 91.981 ops/ms >> ColumnFilterBenchmark.filterIntColumn 4096 thrpt 2 45.170 ops/ms >> ColumnFilterBenchmark.filterLongColumn 1024 thrpt 2 148.017 ops/ms >> ColumnFilterBenchmark.filterLongColumn 2047 thrpt 2 73.516 ops/ms >> ColumnFilterBenchmark.filterLongColumn 4096 thrpt 2 36.844 ops/ms >> >> Withopt: >> Benchmark (size) Mode Cnt Score Error Units >> ColumnFilterBenchmark.filterDoubleColumn 1024 thrpt 2 2051.707 ops/ms >> ColumnFilterBenchmark.filterDoubleColumn 2047 thrpt 2 914.072 ops/ms >> 
ColumnFilterBenchmark.filterDoubleColumn 4096 thrpt 2 489.898 ops/ms >> ColumnFilterBenchmark.filterFloatColumn 1024 thrpt 2 5324.195 ops/ms >> ColumnFilterBenchmark.filterFloatColumn 2047 thrpt 2 2587.229 ops/ms >> ColumnFilterBenchmark.filterFloatColumn 4096 thrpt 2 1278.665 ops/ms >> ColumnFilterBenchmark.filterIntColumn 1024 thrpt 2 4149.384 ops/ms >> ColumnFilterBenchmark.filterIntColumn 2047 thrpt ... > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Review suggestions incorporated. Exactly, like @merykitty suggests: you can do a platform-dependent expansion. ------------- Changes requested by epeter (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/17261#pullrequestreview-1808218664 From epeter at openjdk.org Mon Jan 8 10:36:23 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 8 Jan 2024 10:36:23 GMT Subject: RFR: 8322768: Optimize non-subword vector compress and expand APIs for AVX2 target. [v2] In-Reply-To: References: <_udOCEVG86x9V_WvYqFTaYnvmXdiZ7LzqxzR-D_ygYs=.db3ed37e-a8fa-413b-83c6-a785aba072ff@github.com> Message-ID: <_IaxcZYOfUasnC-VujuwT4nFF3KVdEcKU2Pt92o5UO8=.bc760b2f-8ce9-4021-beb8-bfb19827cce3@github.com> On Fri, 5 Jan 2024 09:35:34 GMT, Emanuel Peter wrote: >> Thanks for the comment addition! > > Improvement suggestion: > For a vector with 8 ints, we get `2^8 = 256` many bit patterns for the mask. The table has a row for each `mask` value, consisting of 8 ints, which provide the valid permute index corresponding to set bit position in the `mask`, or a -1 (default) value. @jatin-bhateja thanks for the update! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17261#discussion_r1444256359 From epeter at openjdk.org Mon Jan 8 10:36:25 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 8 Jan 2024 10:36:25 GMT Subject: RFR: 8322768: Optimize non-subword vector compress and expand APIs for AVX2 target. 
[v2] In-Reply-To: <1GHGK7AGinCMKjFIB5oadUP0jiZrC39Z0hncAS3H-9Y=.eb617125-1f51-4cff-889e-b15321f5c72b@github.com> References: <_udOCEVG86x9V_WvYqFTaYnvmXdiZ7LzqxzR-D_ygYs=.db3ed37e-a8fa-413b-83c6-a785aba072ff@github.com> <1GHGK7AGinCMKjFIB5oadUP0jiZrC39Z0hncAS3H-9Y=.eb617125-1f51-4cff-889e-b15321f5c72b@github.com> Message-ID: On Mon, 8 Jan 2024 06:06:20 GMT, Jatin Bhateja wrote: >> You are using `VectorMask pred = VectorMask.fromLong(ispecies, maskctr++);`. >> That basically systematically iterates over all masks, which is nice for a correctness test. >> But that would use different density inside one test run, right? The average over the loop is still at `50%`, correct? >> >> I was thinking more a run where the percentage over the whole loop is lower than maybe `1%`. That would get us to a point where maybe the branch prediction of non-vectorized code might be faster, what do you think? > >> You are using `VectorMask pred = VectorMask.fromLong(ispecies, maskctr++);`. That basically systematically iterates over all masks, which is nice for a correctness test. But that would use different density inside one test run, right? The average over the loop is still at `50%`, correct? >> >> I was thinking more a run where the percentage over the whole loop is lower than maybe `1%`. That would get us to a point where maybe the branch prediction of non-vectorized code might be faster, what do you think? > > An imperative loop for compression will check each mask bit to select compressible lane. Therefore mask with low or high density of set bits should show similar performance. Yes, IF it is vectorized, then there is no difference between high and low density. My concern was more if vectorization is preferrable over the scalar alternative in the low-density case, where branch prediction is more stable. 
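For readers following the thread, the lookup-table idea described above (one row per mask value, holding the lane indices of the set bits followed by -1 padding) can be modelled in plain scalar Java. This is only a sketch under assumptions: the class and method names are hypothetical, the 8-lane int case is taken from the review comment, and this is not the stub code from the patch. The second method is the branchy scalar loop that the low-density question compares against.

```java
import java.util.Arrays;

public class CompressPermuteTable {
    // Assumption: 8 int lanes, i.e. a 256-bit vector, as in the review comment.
    static final int LANES = 8;

    // Builds 2^8 = 256 rows. Row m lists, in ascending order, the lane
    // indices whose bit is set in m; unused slots hold -1.
    static int[][] buildTable() {
        int[][] table = new int[1 << LANES][LANES];
        for (int mask = 0; mask < table.length; mask++) {
            int out = 0;
            for (int lane = 0; lane < LANES; lane++) {
                if ((mask & (1 << lane)) != 0) {
                    table[mask][out++] = lane;
                }
            }
            while (out < LANES) {
                table[mask][out++] = -1;
            }
        }
        return table;
    }

    // Table-driven compress: a single row lookup, then a branch-free
    // "permute" of the source lanes; slots marked -1 become zero.
    static int[] compress(int[] src, int mask, int[][] table) {
        int[] row = table[mask];
        int[] dst = new int[LANES];
        for (int i = 0; i < LANES; i++) {
            dst[i] = row[i] >= 0 ? src[row[i]] : 0;
        }
        return dst;
    }

    // Scalar alternative: tests every mask bit with a data-dependent branch.
    static int[] compressScalar(int[] src, int mask) {
        int[] dst = new int[LANES];
        int out = 0;
        for (int lane = 0; lane < LANES; lane++) {
            if ((mask & (1 << lane)) != 0) {
                dst[out++] = src[lane];
            }
        }
        return dst;
    }

    public static void main(String[] args) {
        int[][] table = buildTable();
        int[] src = {10, 11, 12, 13, 14, 15, 16, 17};
        // Bits 0, 2, 5 and 7 are set, so lanes 0, 2, 5 and 7 are kept.
        System.out.println(Arrays.toString(compress(src, 0b10100101, table)));
        System.out.println(Arrays.toString(compressScalar(src, 0b10100101)));
    }
}
```

Both calls in `main` print `[10, 12, 15, 17, 0, 0, 0, 0]`. The model says nothing about relative speed; whether the branchy loop wins at very low mask density is exactly the open question and would need a benchmark run.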
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17261#discussion_r1444257535 From fyang at openjdk.org Mon Jan 8 10:43:23 2024 From: fyang at openjdk.org (Fei Yang) Date: Mon, 8 Jan 2024 10:43:23 GMT Subject: RFR: 8322790: RISC-V: Tune costs for shuffles with no conversion [v2] In-Reply-To: References: Message-ID: On Sat, 6 Jan 2024 16:27:41 GMT, Ilya Gavrilin wrote: >> Hi all, please review this small change to RISC-V nodes insertion costs. >> Now we have several nodes which provide shuffles without conversion: https://github.com/openjdk/jdk/blob/32d80e2caf6063b58128bd5f3dc87b276f3bd0cb/src/hotspot/cpu/riscv/riscv.ad#L8525-L8741 >> On most RISC-V cpu`s we prefer reg<->reg operations, because they are faster, but now stack<->reg operations used (for details about reasons, please, visit connected jbs issue). >> After changing insertion costs reg<->reg operations selected, and we can see performance improvements for benchmarks, which use such shuffles (tested on thead C910 board): >> | Benchmark | Upstream build (ops/ms) | Patched build (ops/ms) | difference (%) | >> |:-----------------------------------:|:-----------------------:|:----------------------:|:--------------:| >> | MathBench.doubleToRawLongBitsDouble | 30935.139 | 32171.761 | +4.00 | >> | StrictMathBench.ceilDouble | 24682.810 | 29782.050 | +20.66 | >> | StrictMathBench.cosDouble | 6948.309 | 6938.276 | -0.14 | >> | StrictMathBench.expDouble | 6816.143 | 7211.021 | +5.79 | >> | StrictMathBench.floorDouble | 30699.630 | 34189.509 | +11.37 | >> | StrictMathBench.maxDouble | 35157.355 | 34675.191 | -1.37 | >> | StrictMathBench.minDouble | 35192.135 | 35183.015 | -0.03 | >> | StrictMathBench.sinDouble | 6698.405 | 6721.809 | +0.35 | >> >> New benchmark for changed nodes: >> >> --- a/test/micro/org/openjdk/bench/java/lang/MathBench.java >> +++ b/test/micro/org/openjdk/bench/java/lang/MathBench.java >> @@ -540,4 +540,11 @@ public class MathBench { >> return Math.ulp(float7); >> } >> >> + 
@Benchmark >> + public long doubleToRawLongBitsDouble() { >> + double dbl162Dot5 = double81 * 2.0d + double0Dot5; >> + double dbl3 = double2 + double1; >> + return Double.doubleToRawLongBits(dbl162Dot5) + Double.doubleToRawLongBits(dbl3); >> + } >> + > > Ilya Gavrilin has updated the pull request incrementally with one additional commit since the last revision: > > Revert some costs changes Marked as reviewed by fyang (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/17206#pullrequestreview-1808723779 From stuefe at openjdk.org Mon Jan 8 11:24:51 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Mon, 8 Jan 2024 11:24:51 GMT Subject: RFR: JDK-8318444: Write details about compilation bailouts into crash reports [v6] In-Reply-To: References: Message-ID: > A little debugging aid to help analyze broken bailout chains, mainly in C2 (C1 is pretty clean). > > A broken bailout chain occurs when code marks a compilation as failed, but then either that function itself or any of its caller functions fails to abort the compilation. That may cause crashes, e.g. [JDK-8318183](https://bugs.openjdk.org/browse/JDK-8318183) or [JDK-8318445](https://bugs.openjdk.org/browse/JDK-8318445). > > Now, if the compiler initiates a bailout, it stores some context information - compile id, time, and call stack. That information is stored as part of `Compile` or `Compilation`, depending on the compiler. > > If we crash later during the same compilation, we print out that information as part of the crash report. That way, we have two call stacks, and it is easy to spot where the compiler failed to heed the bailout. > > --------- > > Looks like this (from https://github.com/openjdk/jdk/pull/16248). The first call stack is the crash point. The second call stack is the point where the compiler bailout was initiated. 
> > > Current CompileTask: > C2:2574 45 45 843 4 sun.nio.fs.UnixPath::resolve (17 bytes) > > Stack: [0x00007fa608cb3000,0x00007fa608db4000], sp=0x00007fa608daf310, free space=1008k > Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code) > V [libjvm.so+0x631bb4] Unique_Node_List::push(Node*)+0x20 (node.hpp:1650) > V [libjvm.so+0xb8ea65] ConnectionGraph::verify_ram_nodes(Compile*, Node*)+0x87 (escape.cpp:743) > V [libjvm.so+0x960dda] Compile::Optimize()+0x956 (compile.cpp:2361) > V [libjvm.so+0x959d6c] Compile::Compile(ciEnv*, ciMethod*, int, Options, DirectiveSet*)+0x165e (compile.cpp:860) > V [libjvm.so+0x81bcd9] C2Compiler::compile_method(ciEnv*, ciMethod*, int, bool, DirectiveSet*)+0x203 (c2compiler.cpp:134) > V [libjvm.so+0x97bf63] CompileBroker::invoke_compiler_on_method(CompileTask*)+0xac5 (compileBroker.cpp:2290) > V [libjvm.so+0x97a981] CompileBroker::compiler_thread_loop()+0x411 (compileBroker.cpp:1951) > V [libjvm.so+0x99ebc0] CompilerThread::thread_entry(JavaThread*, JavaThread*)+0x84 (compilerThread.cpp:61) > V [libjvm.so+0xde0050] JavaThread::thread_main_inner()+0x15c (javaThread.cpp:720) > V [libjvm.so+0xddfeea] JavaThread::run()+0x258 (javaThread.cpp:705) > V [libjvm.so+0x15f5a04] Thread::call_run()+0x1a8 (thread.cpp:220) > V [libjvm.so+0x12de0a2] thread_native_entry(Thread*)+0x1c3 (os_linux.cpp:785) > > siginfo: si_signo: 11 (SIGSEGV), si_code: 1 (SEGV_MAPERR), si_addr: 0x000000000000002... Thomas Stuefe has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 13 commits: - Merge branch 'openjdk:master' into JDK-8318444-Write-details-about-compilation-bailouts-into-crash-reports - Merge branch 'master' into JDK-8318444-Write-details-about-compilation-bailouts-into-crash-reports - Feedback Christian - Merge branch 'master' into JDK-8318444-Write-details-about-compilation-bailouts-into-crash-reports - Update src/hotspot/share/compiler/compilationFailureInfo.hpp Co-authored-by: Tobias Hartmann - Update src/hotspot/share/utilities/vmError.cpp Co-authored-by: Tobias Hartmann - reinstate elapsed time prefix in hs-err file - Merge branch 'openjdk:master' into JDK-8318444-Write-details-about-compilation-bailouts-into-crash-reports - wip - wip - ... and 3 more: https://git.openjdk.org/jdk/compare/eb9e754b...06f157c4 ------------- Changes: https://git.openjdk.org/jdk/pull/16247/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=16247&range=05 Stats: 236 lines in 11 files changed: 224 ins; 5 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/16247.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16247/head:pull/16247 PR: https://git.openjdk.org/jdk/pull/16247 From thartmann at openjdk.org Mon Jan 8 11:39:26 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 8 Jan 2024 11:39:26 GMT Subject: [jdk22] Integrated: 8310844: [AArch64] C1 compilation fails because monitor offset in OSR buffer is too large for immediate In-Reply-To: References: Message-ID: On Mon, 8 Jan 2024 09:36:25 GMT, Tobias Hartmann wrote: > Hi all, > > This pull request contains a backport of commit [ade21a96](https://github.com/openjdk/jdk/commit/ade21a965f8a5fc889cd48bba76fad507bdeddf5) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository. > > The commit being backported was authored by Tobias Hartmann on 5 Jan 2024 and was reviewed by Andrew Haley and Christian Hagedorn. > > Thanks! This pull request has now been integrated. 
Changeset: 0442d772 Author: Tobias Hartmann URL: https://git.openjdk.org/jdk22/commit/0442d772b0eb253aebf8638eb966957ab2b694c2 Stats: 149 lines in 2 files changed: 147 ins; 0 del; 2 mod 8310844: [AArch64] C1 compilation fails because monitor offset in OSR buffer is too large for immediate Reviewed-by: chagedorn Backport-of: ade21a965f8a5fc889cd48bba76fad507bdeddf5 ------------- PR: https://git.openjdk.org/jdk22/pull/38 From shade at openjdk.org Mon Jan 8 11:45:25 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 8 Jan 2024 11:45:25 GMT Subject: RFR: 8320128: Clean up Parse constructor for OSR [v4] In-Reply-To: <-QQZANSu7PVpZ3CUX7kbp00f_xkhjiy7la3Z3gXutKA=.a1c938e4-12e0-44b8-a2dd-7cc0c0c9f09f@github.com> References: <0mxhhLR9DDdpiBhbXWPxCmMzo_OI9DYiIX36xGhxC_c=.1eed99b9-2dc3-4403-b3c1-22b6ebed6079@github.com> <-QQZANSu7PVpZ3CUX7kbp00f_xkhjiy7la3Z3gXutKA=.a1c938e4-12e0-44b8-a2dd-7cc0c0c9f09f@github.com> Message-ID: On Thu, 4 Jan 2024 17:06:38 GMT, Xin Liu wrote: >> There's a special case for the constructor of Parse. If current compilation is OSR and it is handling the top-level method(depth() == 1), then >> >> 1. _tf = C->tf(); >> 2. _entry_bci = C->entry_bci(); >> 3. _flow = method()->get_osr_flow_analysis(_entry_bci); >> >> We don't need to assign those member data twice. We can also factor out _flow->failing() for the special case and normal cases. >> >> It's worth mentioning that we can't save ciTypeFlow computation because >> get_osr_flow_analysis(_entry_bci) actually needs get_flow_analysis(method()). > > Xin Liu has updated the pull request incrementally with one additional commit since the last revision: > > Use print_cr for the log message. All right then, I think we are good to go. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/16669#issuecomment-1880843848 From tholenstein at openjdk.org Mon Jan 8 11:48:32 2024 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Mon, 8 Jan 2024 11:48:32 GMT Subject: RFR: JDK-8277869: Maven POMs are using HTTP links where HTTPS is available Message-ID: Replace `http` with `https` in xml files of IdealGraphVisualizer and LogCompilation. Moreover, replace the old URL (./maven-v4_0_0.xsd) with of the suggested one (./maven-4.0.0.xsd) for `xsi:schemaLocation` in pom.xml. Tested: IdealGraphVisualizer and LogCompilation build and run as expected. ------------- Commit messages: - replace http:// with https:// in IdealGraphVisualizer - LogCompilation use https and maven-4.0.0.xsd in pom.xml Changes: https://git.openjdk.org/jdk/pull/17302/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=17302&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8277869 Stats: 43 lines in 40 files changed: 1 ins; 1 del; 41 mod Patch: https://git.openjdk.org/jdk/pull/17302.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17302/head:pull/17302 PR: https://git.openjdk.org/jdk/pull/17302 From shade at openjdk.org Mon Jan 8 11:50:22 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 8 Jan 2024 11:50:22 GMT Subject: RFR: 8322759: Eliminate -Wparentheses warnings in compiler code [v2] In-Reply-To: <-w16Kse74yx2EiWCorBtcKf1KXA1Rh5q-6Ze2T_qors=.06ead22b-ec5a-4859-888c-f0e3a283d7f3@github.com> References: <496tGkQ1KUCrW1IHOETyvhqopkNYjsEoupxjo0Ze3Wg=.0223f494-fd34-4e4a-a31a-5030603f2113@github.com> <-w16Kse74yx2EiWCorBtcKf1KXA1Rh5q-6Ze2T_qors=.06ead22b-ec5a-4859-888c-f0e3a283d7f3@github.com> Message-ID: On Mon, 8 Jan 2024 05:37:45 GMT, Kim Barrett wrote: >> Please review this change to eliminate some -Wparentheses warnings. This >> involved simply adding a few parentheses to make some implicit operator >> precedence explicit. >> >> This change addresses non-C2 parts of the compiler component. 
>> >> Testing: mach5 tier1 >> >> Also ran mach5 tier1 with these changes in conjunction enabling -Wparentheses >> and other changes needed to make that work. > > Kim Barrett has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: > > - Merge branch 'master' into compiler-wparentheses > - simplify asserts > - update copyrights for new year > - fix -Wparentheses warnings in non-C2 compiler code Looks reasonable, thanks! ------------- Marked as reviewed by shade (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/17200#pullrequestreview-1808919758 From shade at openjdk.org Mon Jan 8 11:55:24 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 8 Jan 2024 11:55:24 GMT Subject: RFR: 8322982: CTW fails to build after 8308753 In-Reply-To: References: Message-ID: On Mon, 8 Jan 2024 03:19:52 GMT, Xin Liu wrote: > This patch fixes the build error of CTW by sidelining ModuleInfoWriter.java. ModuleInfoWriter uses Class-File API, which has transitioned to preview. > If we really need to compile it, we have to append --enable-preview and --source N. > > The fact is CTW itself doesn't depend on ModuleInfoWriter. I think it's easier to maintain CTW if we filter it out in Makefile. 
test/hotspot/jtreg/testlibrary/ctw/Makefile line 50: > 48: $(TESTLIBRARY_DIR)/jtreg \ > 49: -maxdepth 1 -name '*.java') > 50: LIB_FILES=$(filter-out %ModuleInfoWriter.java, $(LIB_FILES_ORIG)) Looks reasonable, but I think you can chain these without introducing new variables: LIB_FILES = $(filter-out %ModuleInfoWriter.java, \ $(shell find $(TESTLIBRARY_DIR)/jdk/test/lib/ \ $(TESTLIBRARY_DIR)/jdk/test/lib/process \ $(TESTLIBRARY_DIR)/jdk/test/lib/util \ $(TESTLIBRARY_DIR)/jtreg \ -maxdepth 1 -name '*.java')) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17292#discussion_r1444501336 From shade at openjdk.org Mon Jan 8 12:13:21 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 8 Jan 2024 12:13:21 GMT Subject: RFR: 8323115: x86-32: Incorrect predicates for cmov instruct transforms with UseSSE In-Reply-To: References: <7BcH08lab4xxFOxm0UAfhHKVyBBEhRWtIPEUJuOIlmo=.2c5b0918-19ac-4cb3-b2f5-11131ce73422@github.com> Message-ID: On Mon, 8 Jan 2024 10:29:35 GMT, Tobias Hartmann wrote: > The fix looks good to me but it's concerning that we never hit this in testing. Maybe it never fails because the expanded instructions are guarded by `predicate (UseSSE ...` as well. My experience fixing bugs in these FPU-related match rules is that hitting them takes a combination of code shape and relevant hardware (one that defaults to an unusual `UseSSE <= 2`), or specific testing that runs with lower `UseSSE`. I think I was one of the few remaining people who ran x86_32 with `-XX:UseSSE=0`, for example, but finally stopped. I think going forward we would just need to require `UseSSE >= 2` for x86_32, like for x86_64, making these issues go away.
------------- PR Comment: https://git.openjdk.org/jdk/pull/17296#issuecomment-1880883043 From chagedorn at openjdk.org Mon Jan 8 13:00:34 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 8 Jan 2024 13:00:34 GMT Subject: RFR: 8310711: [IR Framework] Remove safepoint while printing handling [v2] In-Reply-To: References: <278BZfCuvI5xJSWh2PvZtONVaZ2QjxkWKj1NifCbFYE=.af0a47ab-63a6-47bd-953b-4b0756107227@github.com> Message-ID: On Mon, 8 Jan 2024 09:30:53 GMT, Tobias Hartmann wrote: >> Christian Hagedorn has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: >> >> - Update copyright year >> - Merge branch 'master' into JDK-8310711 >> - 8310711: [IR Framework] Remove safepoint while printing handling > > Marked as reviewed by thartmann (Reviewer). Thanks for the re-review @TobiHartmann @eme64! ------------- PR Comment: https://git.openjdk.org/jdk/pull/16921#issuecomment-1880957797 From chagedorn at openjdk.org Mon Jan 8 13:00:36 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 8 Jan 2024 13:00:36 GMT Subject: Integrated: 8310711: [IR Framework] Remove safepoint while printing handling In-Reply-To: References: Message-ID: On Fri, 1 Dec 2023 12:47:48 GMT, Christian Hagedorn wrote: > This clean-up PR removes the handling of the `` message in the IR framework. It is no longer required since we dump the output of `PrintIdeal` to the hotspot_pid file differently since [JDK-8306922](https://bugs.openjdk.org/browse/JDK-8306922). There is no interrupting `` message anymore. I removed the corresponding now unneeded code together with the previously added test case for it. > > Testing: tier1-4 > > Thanks, > Christian This pull request has now been integrated. 
Changeset: 458e563c Author: Christian Hagedorn URL: https://git.openjdk.org/jdk/commit/458e563cd994f5e0f590c2144e8ed35d020d53d6 Stats: 461 lines in 6 files changed: 0 ins; 457 del; 4 mod 8310711: [IR Framework] Remove safepoint while printing handling Reviewed-by: thartmann, epeter ------------- PR: https://git.openjdk.org/jdk/pull/16921 From jvernee at openjdk.org Mon Jan 8 13:45:31 2024 From: jvernee at openjdk.org (Jorn Vernee) Date: Mon, 8 Jan 2024 13:45:31 GMT Subject: RFR: 8320310: CompiledMethod::has_monitors flag can be incorrect [v4] In-Reply-To: References: <4efExybeWDkEbcsckI1Qdz8kpYFqd-Rbmt7oiWz5qlo=.d8d38d0e-affa-48dc-b963-45f958041c4e@github.com> Message-ID: On Mon, 11 Dec 2023 18:38:55 GMT, Jorn Vernee wrote: >> Currently, the `CompiledMethod::has_monitors` flag is set when either a `monitorenter` is parsed by C1, and `monitorexit` is parsed by C1 or C2 during method compilation. However, not necessarily every bytecode of a method is parsed, which means that we could miss all `monitorenter`/`monitorexit` byte codes in a method, while it actually does use monitors. This can lead to situations where a thread holds a monitor, but `has_monitors` for all frames is set to `false`, leading to an assertion failure in 'freeze_internal' in continuationFreezeThaw.cpp: >> >> assert(monitors_on_stack(current) == ((current->held_monitor_count() - current->jni_monitor_count()) > 0), >> "Held monitor count and locks on stack invariant: " INT64_FORMAT " JNI: " INT64_FORMAT, (int64_t)current->held_monitor_count(), (int64_t)current->jni_monitor_count()); >> >> The proposed fix is to rely on `Method::has_monitor_bytecodes` to set the `has_monitors` flag when compiling, which is immune to issues where not all byte codes of a method are parsed during compilation. We can follow the pattern established for `has_reserved_stack_access`, which is similar. >> >> Note that this PR is based on: https://github.com/openjdk/jdk/pull/16416 which disables the assertion. 
The goal of this PR is to fix the issue, and then re-enable the assertion. >> >> Testing: Tier 1-4, `java/lang/Thread/virtual/stress/PinALot.java` > > Jorn Vernee has updated the pull request incrementally with one additional commit since the last revision: > > re-enable assert again Thanks for the reviews! ------------- PR Comment: https://git.openjdk.org/jdk/pull/16799#issuecomment-1881031735 From stuefe at openjdk.org Mon Jan 8 13:50:39 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Mon, 8 Jan 2024 13:50:39 GMT Subject: Integrated: JDK-8318444: Write details about compilation bailouts into crash reports In-Reply-To: References: Message-ID: On Wed, 18 Oct 2023 13:32:32 GMT, Thomas Stuefe wrote: > A little debugging aid to help analyze broken bailout chains, mainly in C2 (C1 is pretty clean). > > A broken bailout chain occurs when code marks a compilation as failed, but then either that function itself or any of its caller functions fails to abort the compilation. That may cause crashes, e.g. [JDK-8318183](https://bugs.openjdk.org/browse/JDK-8318183) or [JDK-8318445](https://bugs.openjdk.org/browse/JDK-8318445). > > Now, if the compiler initiates a bailout, it stores some context information - compile id, time, and call stack. That information is stored as part of `Compile` or `Compilation`, depending on the compiler. > > If we crash later during the same compilation, we print out that information as part of the crash report. That way, we have two call stacks, and it is easy to spot where the compiler failed to heed the bailout. > > --------- > > Looks like this (from https://github.com/openjdk/jdk/pull/16248). The first call stack is the crash point. The second call stack is the point where the compiler bailout was initiated. 
> > > Current CompileTask: > C2:2574 45 45 843 4 sun.nio.fs.UnixPath::resolve (17 bytes) > > Stack: [0x00007fa608cb3000,0x00007fa608db4000], sp=0x00007fa608daf310, free space=1008k > Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code) > V [libjvm.so+0x631bb4] Unique_Node_List::push(Node*)+0x20 (node.hpp:1650) > V [libjvm.so+0xb8ea65] ConnectionGraph::verify_ram_nodes(Compile*, Node*)+0x87 (escape.cpp:743) > V [libjvm.so+0x960dda] Compile::Optimize()+0x956 (compile.cpp:2361) > V [libjvm.so+0x959d6c] Compile::Compile(ciEnv*, ciMethod*, int, Options, DirectiveSet*)+0x165e (compile.cpp:860) > V [libjvm.so+0x81bcd9] C2Compiler::compile_method(ciEnv*, ciMethod*, int, bool, DirectiveSet*)+0x203 (c2compiler.cpp:134) > V [libjvm.so+0x97bf63] CompileBroker::invoke_compiler_on_method(CompileTask*)+0xac5 (compileBroker.cpp:2290) > V [libjvm.so+0x97a981] CompileBroker::compiler_thread_loop()+0x411 (compileBroker.cpp:1951) > V [libjvm.so+0x99ebc0] CompilerThread::thread_entry(JavaThread*, JavaThread*)+0x84 (compilerThread.cpp:61) > V [libjvm.so+0xde0050] JavaThread::thread_main_inner()+0x15c (javaThread.cpp:720) > V [libjvm.so+0xddfeea] JavaThread::run()+0x258 (javaThread.cpp:705) > V [libjvm.so+0x15f5a04] Thread::call_run()+0x1a8 (thread.cpp:220) > V [libjvm.so+0x12de0a2] thread_native_entry(Thread*)+0x1c3 (os_linux.cpp:785) > > siginfo: si_signo: 11 (SIGSEGV), si_code: 1 (SEGV_MAPERR), si_addr: 0x000000000000002... This pull request has now been integrated. 
Changeset: c90768c9 Author: Thomas Stuefe URL: https://git.openjdk.org/jdk/commit/c90768c93b26771bb8f4bdbe855d054ad089b337 Stats: 236 lines in 11 files changed: 224 ins; 5 del; 7 mod 8318444: Write details about compilation bailouts into crash reports Reviewed-by: thartmann, chagedorn ------------- PR: https://git.openjdk.org/jdk/pull/16247 From chagedorn at openjdk.org Mon Jan 8 14:41:47 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 8 Jan 2024 14:41:47 GMT Subject: RFR: 8310190: C2 SuperWord: AlignVector is broken, generates misaligned packs [v59] In-Reply-To: References: Message-ID: <7koVi33WJ_-H7Z69jVF3kWIFB8cGzKlLW2chbzv2WZc=.cfdfda1c-1c7c-446a-bb9e-4df94643a566@github.com> On Thu, 4 Jan 2024 07:00:48 GMT, Emanuel Peter wrote: >> I want to push this in JDK23. >> After this fix here, I'm doing [this refactoring](https://github.com/openjdk/jdk/pull/16620). >> >> To calm your nerves: most of the changes are in auto-generated tests, and tests in general. >> >> **Context** >> >> `-XX:+AlignVector` ensures that SuperWord only creates LoadVector and StoreVector that can be memory aligned. This is achieved by iterating in the pre-loop until we reach the alignment boundary, then we can start the main loop properly aligned. However, this is not possible in all cases, sometimes some memory accesses cannot be guaranteed to be aligned, and we need to reject vectorization (at least partially, for some of the packs). >> >> Alignment is split into two tasks: >> - Alignment Correctness Checks: only relevant if `-XX:+AlignVector`. Need to reject vectorization if alignment is not possible. We must check if the address of the vector load/store is aligned with (divisible by) `ObjectAlignmentInBytes`. >> - Alignment by adjusting pre-loop limit: alignment is desirable even if `-XX:-AlignVector`. We would like to align the vectors with their vector width. 
>> >> **Problem** >> >> I have recently found a bug with our AlignVector [JDK-8310190](https://bugs.openjdk.org/browse/JDK-8310190). >> In that bug, we perform a misaligned memory vector access, which results in a `SIGBUS` on an ARM32 machine. >> Thanks @fg1417 for confirming this! >> Hence, we need to fix the alignment correctness checks. >> >> While working on this task, I also found some bugs in the "alignment by adjusting pre-loop limit": there were cases where it did not align the vectors correctly. >> >> **Problem Details** >> >> Reproducer: >> >> >> static void test(short[] a, short[] b, short mask) { >> for (int i = 0; i < RANGE; i+=8) { >> // Problematic for AlignVector >> b[i+0] = (short)(a[i+0] & mask); // best_memref, align 0 >> >> b[i+3] = (short)(a[i+3] & mask); // pack at offset 6 bytes >> b[i+4] = (short)(a[i+4] & mask); >> b[i+5] = (short)(a[i+5] & mask); >> b[i+6] = (short)(a[i+6] & mask); >> } >> } >> >> >> During `SuperWord::find_adjacent_refs` we used to check if the references are expected to be aligned. For that, we look at each "group" of references (eg all `LoadS`) and take the reference with the lowest offset. For that chosen reference, we check if it is alignable. If yes, we accept all references of that group, if no we reject all. >> >> This is problemati... > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > some minor changes for Vladimir Some last minor comments. Otherwise, looks good! 
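Editorial sketch (not part of the original thread): the alignment arithmetic behind the quoted reproducer can be checked with a few lines of Java. The class and method names below are hypothetical, and the sketch assumes the array data starts on an 8-byte boundary (taking 8 as the `ObjectAlignmentInBytes` value, which is an assumption here). It only asks whether any number of pre-loop iterations can bring a given access onto that boundary; it is not the `AlignmentSolver` logic from the patch.

```java
public class PackAlignmentSketch {
    /**
     * Returns true if some pre-loop iteration count k in [0, alignBytes) puts the
     * access at element index (stride * k + indexOffset) on an alignBytes boundary.
     * Assumption: the array data itself starts on such a boundary.
     */
    static boolean alignableByPreLoop(int indexOffset, int stride, int elemBytes, int alignBytes) {
        for (int k = 0; k < alignBytes; k++) {
            long byteOffset = (long) elemBytes * ((long) stride * k + indexOffset);
            if (byteOffset % alignBytes == 0) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        int stride = 8;      // the reproducer advances i by 8 per iteration
        int elemBytes = 2;   // sizeof(short)
        int alignBytes = 8;  // assumed ObjectAlignmentInBytes; also the width of a 4-short pack
        for (int indexOffset : new int[] {0, 3}) {
            System.out.println("index offset " + indexOffset
                + " (byte offset " + (indexOffset * elemBytes) + "): alignable = "
                + alignableByPreLoop(indexOffset, stride, elemBytes, alignBytes));
        }
    }
}
```

With a stride of 8 shorts (16 bytes), the byte offset modulo 8 never changes between iterations, so the reference at index offset 0 is always aligned while the pack starting at index offset 3 (byte offset 6) never is. Checking only the lowest-offset reference of a group therefore says nothing about the pack at offset 6.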
src/hotspot/share/opto/chaitin.cpp line 1794: > 1792: Node* PhaseChaitin::find_base_for_derived(Node** derived_base_map, Node* derived, uint& maxlrg) { > 1793: // See if already computed; if so return it > 1794: if(derived_base_map[derived->_idx]) { Suggestion: if (derived_base_map[derived->_idx]) { src/hotspot/share/opto/superword.cpp line 1620: > 1618: > 1619: const MemNode* mem_ref = pack->at(0)->as_Mem(); > 1620: VPointer mem_ref_p(mem_ref, phase(), lpt(), nullptr, false); Since you renamed `p` -> `pack`, you should also rename this one to pack: Suggestion: VPointer mem_ref_pack(mem_ref, phase(), lpt(), nullptr, false); src/hotspot/share/opto/superword.cpp line 1630: > 1628: mem_ref_p.invar(), > 1629: mem_ref_p.invar_factor(), > 1630: mem_ref_p.scale_in_bytes(), Suggestion: AlignmentSolver solver(pack->at(0)->as_Mem(), pack->size(), mem_ref_pack.base(), mem_ref_pack.offset_in_bytes(), mem_ref_pack.invar(), mem_ref_pack.invar_factor(), mem_ref_pack.scale_in_bytes(), src/hotspot/share/opto/superword.cpp line 1702: > 1700: if (current->is_constrained()) { > 1701: // Solution is constrained (not trivial) > 1702: // -> must change pre-limit to acheive alignment Suggestion: // -> must change pre-limit to achieve alignment src/hotspot/share/opto/vectorization.cpp line 756: > 754: // We describe the 6 terms: > 755: // 1) The "base" of the address is the address of a Java object (e.g. array), > 756: // and as such ObjectAlignmentInBytes (a power of 2) aligned. We have Suggestion: // and as such ObjectAlignmentInBytes (a power of 2) aligned. We have src/hotspot/share/opto/vectorization.cpp line 934: > 932: // > 933: // Hence, pre_iter_C_const has a non-trivial (because x > 1) periodic (periodicity x) > 934: // solution, i.e it has a constrained solution. Suggestion: // solution, i.e. it has a constrained solution. 
src/hotspot/share/opto/vectorization.cpp line 947: > 945: // (C_const + C_pre * pre_iter_C_const) % aw != 0 > 946: // > 947: // This is in constradiction with (4a), and therefore there cannot be any solution, Suggestion: // This is in contradiction with (4a), and therefore there cannot be any solution, src/hotspot/share/opto/vectorization.cpp line 1038: > 1036: // sign(C_pre) = C_pre / abs(C_pre) = (C_pre > 0) ? 1 : -1, (7) > 1037: // > 1038: // We know that abs(C_pre) as well as aw are a powers of 2, and since (5) we can define integer q: Suggestion: // We know that abs(C_pre) as well as aw are powers of 2, and since (5) we can define integer q: ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/14785#pullrequestreview-1809074856 PR Review Comment: https://git.openjdk.org/jdk/pull/14785#discussion_r1444615477 PR Review Comment: https://git.openjdk.org/jdk/pull/14785#discussion_r1444624179 PR Review Comment: https://git.openjdk.org/jdk/pull/14785#discussion_r1444624796 PR Review Comment: https://git.openjdk.org/jdk/pull/14785#discussion_r1444632226 PR Review Comment: https://git.openjdk.org/jdk/pull/14785#discussion_r1444680843 PR Review Comment: https://git.openjdk.org/jdk/pull/14785#discussion_r1444699344 PR Review Comment: https://git.openjdk.org/jdk/pull/14785#discussion_r1444700628 PR Review Comment: https://git.openjdk.org/jdk/pull/14785#discussion_r1444711822 From chagedorn at openjdk.org Mon Jan 8 14:44:25 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 8 Jan 2024 14:44:25 GMT Subject: RFR: 8323095: Expand TraceOptoParse block output abbreviations In-Reply-To: References: Message-ID: On Fri, 5 Jan 2024 23:07:12 GMT, Joshua Cao wrote: > `e` -> `exception block` > `lphd` -> `loop head` > > Also removing an unnecessary space. The successor ids have a space before them. 
> > Examples from `java -Xcomp -XX:+TraceOptoParse -version`: > > > Parsing block #8 at bci [33,39), successors: 9 16(exception block) loop head Looks good to me, too. ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/17289#pullrequestreview-1809259801 From fgao at openjdk.org Mon Jan 8 14:46:41 2024 From: fgao at openjdk.org (Fei Gao) Date: Mon, 8 Jan 2024 14:46:41 GMT Subject: RFR: 8319690: [AArch64] C2 compilation hits offset_ok_for_immed: assert "c2 compiler bug" In-Reply-To: References: <16J-lJ2AceGTVcRWBcP15yKcwO-1IA1XsngyOuNjf7k=.0776f081-ae2c-4279-87cf-d909806c2bc4@github.com> <4KRCqYxn02wMjYDuN3_HbYWxc9BDtYMNd40bNrJ4K8w=.5362dda4-8d30-48fd-8915-909eb15a6023@github.com> Message-ID: On Sat, 6 Jan 2024 17:44:04 GMT, Andrew Haley wrote: >>> After this change, `immIOffset` and `immLOffset` appear to be obsolete. >> >> Removed them in the new commit. Thanks! > >> @fg1417 what is the state on this? >> >> The example here may look like a strange edge-case. But with my plans this may become more common: [JDK-8318446](https://bugs.openjdk.org/browse/JDK-8318446) C2: implement StoreNode::Ideal_merge_stores >> >> I will merge stores, which can then look like unaligned stores of a larger type. This bug here blocks my progress. I'm not in a hurry, just curious if there is now a plan how to proceed here ;) > > The problem with this PR is that the code is way too complex for such a simple problem. The port is correct as it is, in the release build. > > The only problem is an assertion. We could simply remove that assertion, but if it were me I'd fix the problem properly. Both @dean-long and I have suggested ways to improve this patch with less code. If @fg1417 decides to drop this PR I'll fix it. Sorry, I can't work on this right now. @theRealAph could you help to push the changes please? Thanks. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/16991#issuecomment-1881144322 From fgao at openjdk.org Mon Jan 8 14:46:42 2024 From: fgao at openjdk.org (Fei Gao) Date: Mon, 8 Jan 2024 14:46:42 GMT Subject: Withdrawn: 8319690: [AArch64] C2 compilation hits offset_ok_for_immed: assert "c2 compiler bug" In-Reply-To: <16J-lJ2AceGTVcRWBcP15yKcwO-1IA1XsngyOuNjf7k=.0776f081-ae2c-4279-87cf-d909806c2bc4@github.com> References: <16J-lJ2AceGTVcRWBcP15yKcwO-1IA1XsngyOuNjf7k=.0776f081-ae2c-4279-87cf-d909806c2bc4@github.com> Message-ID: On Wed, 6 Dec 2023 06:24:59 GMT, Fei Gao wrote: > On LP64 systems, if the heap can be moved into low virtual address space (below 4GB) and the heap size is smaller than the interesting threshold of 4 GB, we can use unscaled decoding pattern for narrow klass decoding. It means that a generic field reference can be decoded by: > > cast<64> (32-bit compressed reference) + field_offset > > > When the `field_offset` is an immediate, on aarch64 platform, the unscaled decoding pattern can match perfectly with a direct addressing mode, i.e., `base_plus_offset`, supported by `LDR/STR` instructions. But for certain data width, not all immediates can be encoded in the instruction field of `LDR/STR` [[1]](https://github.com/openjdk/jdk/blob/8db7bad992a0f31de9c7e00c2657c18670539102/src/hotspot/cpu/aarch64/assembler_aarch64.inline.hpp#L33). The ranges are different as data widths vary. > > For example, when we try to load a value of long type at offset of `1030`, the address expression is `(AddP (DecodeN base) 1030)`. Before the patch, the expression was matching with `operand indOffIN()`. But, for 64-bit `LDR/STR`, signed immediate byte offset must be in the range -256 to 255 or positive immediate byte offset must be a multiple of 8 in the range 0 to 32760 [[2]](https://developer.arm.com/documentation/ddi0602/2023-09/Base-Instructions/LDR--immediate---Load-Register--immediate--?lang=en). `1030` can't be encoded in the instruction field. 
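Editorial sketch (not part of the original message): the encoding rule just cited can be written as a small predicate. The class and method names are hypothetical; this is not HotSpot's `offset_ok_for_immed`. The sketch assumes the two AArch64 immediate forms described above, an unscaled signed offset in -256..255 and a non-negative offset that is a multiple of the access size with a scaled value of at most 4095 (which gives 32760 for 8-byte accesses).

```java
public class OffsetEncodingSketch {
    /**
     * Returns true if byteOffset can be encoded as an LDR/STR immediate for an
     * access of sizeInBytes (1, 2, 4 or 8), under the two forms assumed above.
     */
    static boolean fitsImmediate(long byteOffset, int sizeInBytes) {
        // Unscaled form: signed 9-bit byte offset.
        boolean unscaled = byteOffset >= -256 && byteOffset <= 255;
        // Scaled form: non-negative multiple of the access size, scaled value in 0..4095.
        boolean scaled = byteOffset >= 0
            && byteOffset % sizeInBytes == 0
            && byteOffset / sizeInBytes <= 4095;
        return unscaled || scaled;
    }

    public static void main(String[] args) {
        System.out.println("1030, 8-byte access: " + fitsImmediate(1030, 8));
        System.out.println("1032, 8-byte access: " + fitsImmediate(1032, 8));
        System.out.println("1030, 2-byte access: " + fitsImmediate(1030, 2));
    }
}
```

Offset `1030` fails for an 8-byte access because it is above 255 and not a multiple of 8, while `1032` passes; the same `1030` is fine for a 2-byte access. That width dependence is why a single generic operand cannot describe the valid range for every data size.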
So, after matching, when we do checking for instruction encoding, the assertion would fail. > > In this patch, we're going to filter out invalid immediates when deciding if current addressing mode can be matched as `base_plus_offset`. We introduce `indOffIN4/indOffLN4` and `indOffIN8/indOffLN8` for 32-bit data type and 64-bit data type separately in the patch. E.g., for `memory4`, we remove the generic `indOffIN/indOffLN`, which matches wrong unscaled immediate range, and replace them with `indOffIN4/indOffLN4` instead. > > Since 8-bit and 16-bit `LDR/STR` instructions also support the unscaled decoding pattern, we add the addressing mode in the lists of `memory1` and `memory2` by introducing `indOffIN1/indOffLN1` and `indOffIN2/indOffLN2`. > > We also remove unused operands `indOffI/indOffl/indOffIN/indOffLN` to avoid misuse. > > Tier 1-3 passed on aarch64. This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.org/jdk/pull/16991 From roland at openjdk.org Mon Jan 8 14:49:00 2024 From: roland at openjdk.org (Roland Westrelin) Date: Mon, 8 Jan 2024 14:49:00 GMT Subject: RFR: 8319793: C2 compilation fails with "Bad graph detected in build_loop_late" after JDK-8279888 [v10] In-Reply-To: References: Message-ID: > Range check smearing and range check predication make an array access > dependent on 2 (or more in the case of RC smearing) conditions. As a > consequence, if a range check can be eliminated because there's an > identical dominating range check, the control dependent nodes that > could float and become dependent on the dominating range check cannot > be allowed to float because there's a risk that they would then bypass > one of the checks that make the access legal. > > `IfNode::dominated_by()` and `PhaseIdealLoop::dominated_by()` have > logic to prevent this: nodes that are control dependent on a range > check or predicate are not allowed to float. This is however not > sufficient as demonstrated by the test cases. 
> > In `TestArrayAccessAboveRCAfterSmearingOrPredication.testRangeCheckSmearing()`: > > > v += array[i]; > if (flag2) { > if (flag3) { > field = 0x42; > } > } > if (flagField == 1) { > v += array[i]; > } > > > The range check for the second `array[i]` load is replaced by the > dominating range check for the first `array[i]` but because the second > `array[i]` load could really be dependent on multiple range checks (in > case smearing happened which is not the case here), c2 doesn't allow > the second `array[i]` to float when the second range check is > removed. The second `array[i]` is then control dependent on: > > > if (flagField == 1) { > > > which is next found to be dominated by the same test: > > > if (flag == 1) { > > > and is removed. However nothing in `dominated_by()` treats node > dependent on tests that are not range check or predicates > specially. So the second `array[i]` is allowed to float and become > dependent on: > > > if (flag == 1) { > > > which is above the range check for that access. The test method in its > last invocation is passed an index for the array access that's widely > out of range. The array load happens before the range check and > crashes the VM. `testLoopPredication()` is a similar test where array > loads become dependent on predicates and end up above range checks. > > `TestArrayAccessCastIIAboveRC.java` is the test case from the bug > where for similar reasons a range check `CastII` ends up above its > range check, becomes top because its input becomes some integer that > conflicts with its type (but there's no condition to catch it). The > graph becomes broken and c2 crashes. > > Logic in the `dominated_by()` methods ... 
Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: review ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16886/files - new: https://git.openjdk.org/jdk/pull/16886/files/dbe3c4c1..2cc6f1d3 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16886&range=09 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16886&range=08-09 Stats: 64 lines in 10 files changed: 29 ins; 0 del; 35 mod Patch: https://git.openjdk.org/jdk/pull/16886.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16886/head:pull/16886 PR: https://git.openjdk.org/jdk/pull/16886 From epeter at openjdk.org Mon Jan 8 14:53:07 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 8 Jan 2024 14:53:07 GMT Subject: RFR: 8310190: C2 SuperWord: AlignVector is broken, generates misaligned packs [v60] In-Reply-To: References: Message-ID: > I want to push this in JDK23. > After this fix here, I'm doing [this refactoring](https://github.com/openjdk/jdk/pull/16620). > > To calm your nerves: most of the changes are in auto-generated tests, and tests in general. > > **Context** > > `-XX:+AlignVector` ensures that SuperWord only creates LoadVector and StoreVector that can be memory aligned. This is achieved by iterating in the pre-loop until we reach the alignment boundary, then we can start the main loop properly aligned. However, this is not possible in all cases, sometimes some memory accesses cannot be guaranteed to be aligned, and we need to reject vectorization (at least partially, for some of the packs). > > Alignment is split into two tasks: > - Alignment Correctness Checks: only relevant if `-XX:+AlignVector`. Need to reject vectorization if alignment is not possible. We must check if the address of the vector load/store is aligned with (divisible by) `ObjectAlignmentInBytes`. > - Alignment by adjusting pre-loop limit: alignment is desirable even if `-XX:-AlignVector`. 
We would like to align the vectors with their vector width. > > **Problem** > > I have recently found a bug with our AlignVector [JDK-8310190](https://bugs.openjdk.org/browse/JDK-8310190). > In that bug, we perform a misaligned memory vector access, which results in a `SIGBUS` on an ARM32 machine. > Thanks @fg1417 for confirming this! > Hence, we need to fix the alignment correctness checks. > > While working on this task, I also found some bugs in the "alignment by adjusting pre-loop limit": there were cases where it did not align the vectors correctly. > > **Problem Details** > > Reproducer: > > > static void test(short[] a, short[] b, short mask) { > for (int i = 0; i < RANGE; i+=8) { > // Problematic for AlignVector > b[i+0] = (short)(a[i+0] & mask); // best_memref, align 0 > > b[i+3] = (short)(a[i+3] & mask); // pack at offset 6 bytes > b[i+4] = (short)(a[i+4] & mask); > b[i+5] = (short)(a[i+5] & mask); > b[i+6] = (short)(a[i+6] & mask); > } > } > > > During `SuperWord::find_adjacent_refs` we used to check if the references are expected to be aligned. For that, we look at each "group" of references (eg all `LoadS`) and take the reference with the lowest offset. For that chosen reference, we check if it is alignable. If yes, we accept all references of that group, if no we reject all. > > This is problematic as shown in this example. We have references at index offset `0, 3, 4, 5, 6`, and by... 
Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: Apply suggestions from code review Thanks to Christian Co-authored-by: Christian Hagedorn ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14785/files - new: https://git.openjdk.org/jdk/pull/14785/files/aef48ab4..76630041 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14785&range=59 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14785&range=58-59 Stats: 6 lines in 3 files changed: 0 ins; 0 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/14785.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14785/head:pull/14785 PR: https://git.openjdk.org/jdk/pull/14785 From epeter at openjdk.org Mon Jan 8 14:53:08 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 8 Jan 2024 14:53:08 GMT Subject: RFR: 8310190: C2 SuperWord: AlignVector is broken, generates misaligned packs [v59] In-Reply-To: <7koVi33WJ_-H7Z69jVF3kWIFB8cGzKlLW2chbzv2WZc=.cfdfda1c-1c7c-446a-bb9e-4df94643a566@github.com> References: <7koVi33WJ_-H7Z69jVF3kWIFB8cGzKlLW2chbzv2WZc=.cfdfda1c-1c7c-446a-bb9e-4df94643a566@github.com> Message-ID: On Mon, 8 Jan 2024 13:19:23 GMT, Christian Hagedorn wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> some minor changes for Vladimir > > src/hotspot/share/opto/superword.cpp line 1620: > >> 1618: >> 1619: const MemNode* mem_ref = pack->at(0)->as_Mem(); >> 1620: VPointer mem_ref_p(mem_ref, phase(), lpt(), nullptr, false); > > Since you renamed `p` -> `pack`, you should also rename this one to pack: > Suggestion: > > VPointer mem_ref_pack(mem_ref, phase(), lpt(), nullptr, false); I keep it without your suggestion. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14785#discussion_r1444766758 From epeter at openjdk.org Mon Jan 8 14:53:08 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 8 Jan 2024 14:53:08 GMT Subject: RFR: 8310190: C2 SuperWord: AlignVector is broken, generates misaligned packs [v59] In-Reply-To: References: <7koVi33WJ_-H7Z69jVF3kWIFB8cGzKlLW2chbzv2WZc=.cfdfda1c-1c7c-446a-bb9e-4df94643a566@github.com> Message-ID: On Mon, 8 Jan 2024 14:48:01 GMT, Emanuel Peter wrote: >> src/hotspot/share/opto/superword.cpp line 1620: >> >>> 1618: >>> 1619: const MemNode* mem_ref = pack->at(0)->as_Mem(); >>> 1620: VPointer mem_ref_p(mem_ref, phase(), lpt(), nullptr, false); >> >> Since you renamed `p` -> `pack`, you should also rename this one to pack: >> Suggestion: >> >> VPointer mem_ref_pack(mem_ref, phase(), lpt(), nullptr, false); > > I keep it without your suggestion. The idea is that it is the pointer of the mem_ref ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14785#discussion_r1444769346 From roland at openjdk.org Mon Jan 8 14:58:29 2024 From: roland at openjdk.org (Roland Westrelin) Date: Mon, 8 Jan 2024 14:58:29 GMT Subject: RFR: 8319850: PrintInlining should print which methods are late inlines [v2] In-Reply-To: <42h7t16pyeYV2jszIztjGu0JE2ZZWnnJCiyRd2s2oLg=.fffb35a5-e208-442c-9157-ec5d3fcaa31d@github.com> References: <42h7t16pyeYV2jszIztjGu0JE2ZZWnnJCiyRd2s2oLg=.fffb35a5-e208-442c-9157-ec5d3fcaa31d@github.com> Message-ID: <_0ZJL7u55Fcg1yID2yjH4DHPkrgKTKeekpYtWG1YsAI=.e9caec05-a88d-4123-832d-6699a1990e49@github.com> On Thu, 7 Dec 2023 22:51:50 GMT, Joshua Cao wrote: >> I'm not 100% sure if this covers all cases of late inlines. >> >> Passes jtreg tier1 locally on my Linux machine with a fastdebug build.
With sample Java programs and -XX:+PrintInlining, I can see >> >> >> @ 15 java.lang.Float::valueOf (9 bytes) late inline (boxing method) > > Joshua Cao has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains three commits: > > - 8319850: PrintInlining should report late inlines > - Revert "8319850: PrintInlining should report late inlines" > > This reverts commit c5bfb832ff989261b6b2c98f26017c6491fe3067. > - 8319850: PrintInlining should report late inlines When `InlineTree::ok_to_inline()` is called, some diagnostic message is recorded for the call site. Do I understand right that with this patch, if the call is inlined late, then that message is dropped and replaced by a new "late inline.." message? If that's the case, isn't it the case that sometimes the `InlineTree::ok_to_inline()` has some useful information that's lost when late inlining happens? ------------- PR Comment: https://git.openjdk.org/jdk/pull/16595#issuecomment-1881167177 From jvernee at openjdk.org Mon Jan 8 14:58:33 2024 From: jvernee at openjdk.org (Jorn Vernee) Date: Mon, 8 Jan 2024 14:58:33 GMT Subject: Integrated: 8320310: CompiledMethod::has_monitors flag can be incorrect In-Reply-To: <4efExybeWDkEbcsckI1Qdz8kpYFqd-Rbmt7oiWz5qlo=.d8d38d0e-affa-48dc-b963-45f958041c4e@github.com> References: <4efExybeWDkEbcsckI1Qdz8kpYFqd-Rbmt7oiWz5qlo=.d8d38d0e-affa-48dc-b963-45f958041c4e@github.com> Message-ID: On Thu, 23 Nov 2023 15:55:07 GMT, Jorn Vernee wrote: > Currently, the `CompiledMethod::has_monitors` flag is set when either a `monitorenter` is parsed by C1, and `monitorexit` is parsed by C1 or C2 during method compilation. However, not necessarily every bytecode of a method is parsed, which means that we could miss all `monitorenter`/`monitorexit` byte codes in a method, while it actually does use monitors. 
This can lead to situations where a thread holds a monitor, but `has_monitors` for all frames is set to `false`, leading to an assertion failure in 'freeze_internal' in continuationFreezeThaw.cpp: > > assert(monitors_on_stack(current) == ((current->held_monitor_count() - current->jni_monitor_count()) > 0), > "Held monitor count and locks on stack invariant: " INT64_FORMAT " JNI: " INT64_FORMAT, (int64_t)current->held_monitor_count(), (int64_t)current->jni_monitor_count()); > > The proposed fix is to rely on `Method::has_monitor_bytecodes` to set the `has_monitors` flag when compiling, which is immune to issues where not all byte codes of a method are parsed during compilation. We can follow the pattern established for `has_reserved_stack_access`, which is similar. > > Note that this PR is based on: https://github.com/openjdk/jdk/pull/16416 which disables the assertion. The goal of this PR is to fix the issue, and then re-enable the assertion. > > Testing: Tier 1-4, `java/lang/Thread/virtual/stress/PinALot.java` This pull request has now been integrated. Changeset: c8fa3e21 Author: Jorn Vernee URL: https://git.openjdk.org/jdk/commit/c8fa3e21e6a4fd7846932b545a1748cc1dc6d9f1 Stats: 48 lines in 5 files changed: 9 ins; 17 del; 22 mod 8320310: CompiledMethod::has_monitors flag can be incorrect Reviewed-by: vlivanov, thartmann ------------- PR: https://git.openjdk.org/jdk/pull/16799 From roland at openjdk.org Mon Jan 8 15:01:12 2024 From: roland at openjdk.org (Roland Westrelin) Date: Mon, 8 Jan 2024 15:01:12 GMT Subject: RFR: 8319793: C2 compilation fails with "Bad graph detected in build_loop_late" after JDK-8279888 [v11] In-Reply-To: References: Message-ID: > Range check smearing and range check predication make an array access > dependent on 2 (or more in the case of RC smearing) conditions. 
> As a consequence, if a range check can be eliminated because there's an
> identical dominating range check, the control dependent nodes that
> could float and become dependent on the dominating range check cannot
> be allowed to float because there's a risk that they would then bypass
> one of the checks that make the access legal.
>
> `IfNode::dominated_by()` and `PhaseIdealLoop::dominated_by()` have
> logic to prevent this: nodes that are control dependent on a range
> check or predicate are not allowed to float. This is however not
> sufficient as demonstrated by the test cases.
>
> In `TestArrayAccessAboveRCAfterSmearingOrPredication.testRangeCheckSmearing()`:
>
>
>     v += array[i];
>     if (flag2) {
>         if (flag3) {
>             field = 0x42;
>         }
>     }
>     if (flagField == 1) {
>         v += array[i];
>     }
>
>
> The range check for the second `array[i]` load is replaced by the
> dominating range check for the first `array[i]` but because the second
> `array[i]` load could really be dependent on multiple range checks (in
> case smearing happened which is not the case here), C2 doesn't allow
> the second `array[i]` to float when the second range check is
> removed. The second `array[i]` is then control dependent on:
>
>
>     if (flagField == 1) {
>
>
> which is next found to be dominated by the same test:
>
>
>     if (flag == 1) {
>
>
> and is removed. However nothing in `dominated_by()` treats nodes
> dependent on tests that are not range checks or predicates
> specially. So the second `array[i]` is allowed to float and become
> dependent on:
>
>
>     if (flag == 1) {
>
>
> which is above the range check for that access. The test method in its
> last invocation is passed an index for the array access that's widely
> out of range. The array load happens before the range check and
> crashes the VM. `testLoopPredication()` is a similar test where array
> loads become dependent on predicates and end up above range checks.
> > `TestArrayAccessCastIIAboveRC.java` is the test case from the bug > where for similar reasons a range check `CastII` ends up above its > range check, becomes top because its input becomes some integer that > conflicts with its type (but there's no condition to catch it). The > graph becomes broken and c2 crashes. > > Logic in the `dominated_by()` methods ... Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: whitespaces ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16886/files - new: https://git.openjdk.org/jdk/pull/16886/files/2cc6f1d3..51231631 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16886&range=10 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16886&range=09-10 Stats: 6 lines in 1 file changed: 0 ins; 0 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/16886.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16886/head:pull/16886 PR: https://git.openjdk.org/jdk/pull/16886 From igavrilin at openjdk.org Mon Jan 8 15:52:24 2024 From: igavrilin at openjdk.org (Ilya Gavrilin) Date: Mon, 8 Jan 2024 15:52:24 GMT Subject: RFR: 8322790: RISC-V: Tune costs for shuffles with no conversion [v2] In-Reply-To: References: Message-ID: <1PfPMl6oI_lYd-rw0LevGwVDph6ffIrIM_gZ2ikL0D0=.1e57ac0b-e14e-4b84-9920-71c18df0ecbe@github.com> On Mon, 8 Jan 2024 07:27:46 GMT, Robbin Ehn wrote: >> Ilya Gavrilin has updated the pull request incrementally with one additional commit since the last revision: >> >> Revert some costs changes > > Still reasonable to me. @robehn @RealFYang Thanks for your reviews. 
-------------

PR Comment: https://git.openjdk.org/jdk/pull/17206#issuecomment-1881311125

From igavrilin at openjdk.org Mon Jan 8 15:56:33 2024
From: igavrilin at openjdk.org (Ilya Gavrilin)
Date: Mon, 8 Jan 2024 15:56:33 GMT
Subject: Integrated: 8322790: RISC-V: Tune costs for shuffles with no conversion
In-Reply-To:
References:
Message-ID:

On Sat, 30 Dec 2023 20:07:13 GMT, Ilya Gavrilin wrote:

> Hi all, please review this small change to RISC-V nodes insertion costs.
> Now we have several nodes which provide shuffles without conversion: https://github.com/openjdk/jdk/blob/32d80e2caf6063b58128bd5f3dc87b276f3bd0cb/src/hotspot/cpu/riscv/riscv.ad#L8525-L8741
> On most RISC-V CPUs we prefer reg<->reg operations, because they are faster, but now stack<->reg operations are used (for details about reasons, please, visit connected JBS issue).
> After changing insertion costs reg<->reg operations are selected, and we can see performance improvements for benchmarks, which use such shuffles (tested on thead C910 board):
> | Benchmark                           | Upstream build (ops/ms) | Patched build (ops/ms) | difference (%) |
> |:-----------------------------------:|:-----------------------:|:----------------------:|:--------------:|
> | MathBench.doubleToRawLongBitsDouble | 30935.139               | 32171.761              | +4.00          |
> | StrictMathBench.ceilDouble          | 24682.810               | 29782.050              | +20.66         |
> | StrictMathBench.cosDouble           | 6948.309                | 6938.276               | -0.14          |
> | StrictMathBench.expDouble           | 6816.143                | 7211.021               | +5.79          |
> | StrictMathBench.floorDouble         | 30699.630               | 34189.509              | +11.37         |
> | StrictMathBench.maxDouble           | 35157.355               | 34675.191              | -1.37          |
> | StrictMathBench.minDouble           | 35192.135               | 35183.015              | -0.03          |
> | StrictMathBench.sinDouble           | 6698.405                | 6721.809               | +0.35          |
>
> New benchmark for changed nodes:
>
> --- a/test/micro/org/openjdk/bench/java/lang/MathBench.java
> +++ b/test/micro/org/openjdk/bench/java/lang/MathBench.java
> @@ -540,4 +540,11 @@ public class MathBench {
>          return Math.ulp(float7);
>      }
>
> +    @Benchmark
> +    public long doubleToRawLongBitsDouble() {
> +        double dbl162Dot5 = double81 * 2.0d + double0Dot5;
> +        double dbl3 = double2 + double1;
> +        return Double.doubleToRawLongBits(dbl162Dot5) + Double.doubleToRawLongBits(dbl3);
> +    }
> +

This pull request has now been integrated.

Changeset: 2acb5bd9
Author:    Ilya Gavrilin
Committer: Vladimir Kempik
URL:       https://git.openjdk.org/jdk/commit/2acb5bd9924511b58b0e57ea9eb6c2dee9fd3ee8
Stats:     5 lines in 1 file changed: 1 ins; 0 del; 4 mod

8322790: RISC-V: Tune costs for shuffles with no conversion

Reviewed-by: rehn, fyang

-------------

PR: https://git.openjdk.org/jdk/pull/17206

From roland at openjdk.org Mon Jan 8 16:12:34 2024
From: roland at openjdk.org (Roland Westrelin)
Date: Mon, 8 Jan 2024 16:12:34 GMT
Subject: RFR: 8319793: C2 compilation fails with "Bad graph detected in build_loop_late" after JDK-8279888 [v9]
In-Reply-To:
References:
Message-ID: <7FURGQ8UskC-HVN6r-ne-VJnCH8XjxV7BAX5dIYiJhw=.7a559dc9-f6ef-4602-b784-66d2256a210d@github.com>

On Thu, 4 Jan 2024 16:22:38 GMT, Emanuel Peter wrote:

> I mostly left suggestions for better comments and improved naming.

Thanks for reviewing this. I pushed an update with more comments/some renaming following your suggestions.
------------- PR Comment: https://git.openjdk.org/jdk/pull/16886#issuecomment-1881379045 From epeter at openjdk.org Mon Jan 8 16:12:55 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 8 Jan 2024 16:12:55 GMT Subject: RFR: 8310190: C2 SuperWord: AlignVector is broken, generates misaligned packs [v57] In-Reply-To: References: <4g4SbB2RBLU-ZFcrH_ukdqC_QSoSvibNGanasAFl-lw=.731266a6-9974-402e-954e-e441706426ab@github.com> Message-ID: On Fri, 22 Dec 2023 15:40:47 GMT, Christian Hagedorn wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> Apply suggestions from code review by Christian >> >> Co-authored-by: Christian Hagedorn > > Thanks a lot Emanuel for all the discussions and for addressing all my comments online and offline :-) > > It looks very good now and it's easy to follow the logic. The proofs are great and really helpful to better understand the (rather simple in the end) code for proving and calculating the alignment solutions. Thanks for putting the extra effort in here. > > I will have another complete look at the entire PR in the new year. But I think it looks good! Thanks @chhagedorn for all the help to get this over the line :) Thanks @vnkozlov for the review! ------------- PR Comment: https://git.openjdk.org/jdk/pull/14785#issuecomment-1881377644 From epeter at openjdk.org Mon Jan 8 16:12:57 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 8 Jan 2024 16:12:57 GMT Subject: Integrated: 8310190: C2 SuperWord: AlignVector is broken, generates misaligned packs In-Reply-To: References: Message-ID: On Thu, 6 Jul 2023 14:13:01 GMT, Emanuel Peter wrote: > I want to push this in JDK23. > After this fix here, I'm doing [this refactoring](https://github.com/openjdk/jdk/pull/16620). > > To calm your nerves: most of the changes are in auto-generated tests, and tests in general. 
> > **Context** > > `-XX:+AlignVector` ensures that SuperWord only creates LoadVector and StoreVector that can be memory aligned. This is achieved by iterating in the pre-loop until we reach the alignment boundary, then we can start the main loop properly aligned. However, this is not possible in all cases, sometimes some memory accesses cannot be guaranteed to be aligned, and we need to reject vectorization (at least partially, for some of the packs). > > Alignment is split into two tasks: > - Alignment Correctness Checks: only relevant if `-XX:+AlignVector`. Need to reject vectorization if alignment is not possible. We must check if the address of the vector load/store is aligned with (divisible by) `ObjectAlignmentInBytes`. > - Alignment by adjusting pre-loop limit: alignment is desirable even if `-XX:-AlignVector`. We would like to align the vectors with their vector width. > > **Problem** > > I have recently found a bug with our AlignVector [JDK-8310190](https://bugs.openjdk.org/browse/JDK-8310190). > In that bug, we perform a misaligned memory vector access, which results in a `SIGBUS` on an ARM32 machine. > Thanks @fg1417 for confirming this! > Hence, we need to fix the alignment correctness checks. > > While working on this task, I also found some bugs in the "alignment by adjusting pre-loop limit": there were cases where it did not align the vectors correctly. > > **Problem Details** > > Reproducer: > > > static void test(short[] a, short[] b, short mask) { > for (int i = 0; i < RANGE; i+=8) { > // Problematic for AlignVector > b[i+0] = (short)(a[i+0] & mask); // best_memref, align 0 > > b[i+3] = (short)(a[i+3] & mask); // pack at offset 6 bytes > b[i+4] = (short)(a[i+4] & mask); > b[i+5] = (short)(a[i+5] & mask); > b[i+6] = (short)(a[i+6] & mask); > } > } > > > During `SuperWord::find_adjacent_refs` we used to check if the references are expected to be aligned. 
For that, we look at each "group" of references (eg all `LoadS`) and take the reference with the lowest offset. For that chosen reference, we check if it is alignable. If yes, we accept all references of that group, if no we reject all. > > This is problematic as shown in this example. We have references at index offset `0, 3, 4, 5, 6`, and by... This pull request has now been integrated. Changeset: 827c71da Author: Emanuel Peter URL: https://git.openjdk.org/jdk/commit/827c71dac9a5732f70bc7341743bce314cad302f Stats: 8892 lines in 23 files changed: 7569 ins; 362 del; 961 mod 8310190: C2 SuperWord: AlignVector is broken, generates misaligned packs Co-authored-by: Christian Hagedorn Reviewed-by: kvn, chagedorn ------------- PR: https://git.openjdk.org/jdk/pull/14785 From kxu at openjdk.org Mon Jan 8 17:45:31 2024 From: kxu at openjdk.org (Kangcheng Xu) Date: Mon, 8 Jan 2024 17:45:31 GMT Subject: RFR: 8320237 C2: late inlining of method handle invoke causes duplicate lines in PrintInlining output Message-ID: This PR resolves [JDK-8320237](https://bugs.openjdk.org/browse/JDK-8320237) The original behavior produces both a failure and success message upon late inlining which is confusing. The patch removes the failure message if inlining was successful. Huge thanks to @rwestrel for reporting and working out a solution. Unit test `test/hotspot/jtreg/compiler/inlining/TestDuplicatedLateInliningOutput.java` is added and passing. 
------------- Commit messages: - update test summary, requirements, and VM flags - Merge branch 'master' into JDK-8320237 - make regex whitespace consistent - 8320237: C2: late inlining of method handle invoke causes duplicate lines in PrintInlining output Changes: https://git.openjdk.org/jdk/pull/17147/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=17147&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8320237 Stats: 186 lines in 2 files changed: 186 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/17147.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17147/head:pull/17147 PR: https://git.openjdk.org/jdk/pull/17147 From xliu at openjdk.org Mon Jan 8 18:53:38 2024 From: xliu at openjdk.org (Xin Liu) Date: Mon, 8 Jan 2024 18:53:38 GMT Subject: RFR: 8322982: CTW fails to build after 8308753 [v2] In-Reply-To: References: Message-ID: > This patch fixes the build error of CTW by sidelining ModuleInfoWriter.java. ModuleInfoWriter uses Class-File API, which has transitioned to preview. > If we really need to compile it, we have to append --enable-preview and --source N. > > The fact is CTW itself doesn't depend on ModuleInfoWriter. I think it's easier to maintain CTW if we filter it out in Makefile. Xin Liu has updated the pull request incrementally with one additional commit since the last revision: Combine two functions into one. 
------------- Changes: - all: https://git.openjdk.org/jdk/pull/17292/files - new: https://git.openjdk.org/jdk/pull/17292/files/efd4e973..5ac1d9f1 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17292&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17292&range=00-01 Stats: 4 lines in 1 file changed: 1 ins; 1 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/17292.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17292/head:pull/17292 PR: https://git.openjdk.org/jdk/pull/17292 From xliu at openjdk.org Mon Jan 8 18:56:34 2024 From: xliu at openjdk.org (Xin Liu) Date: Mon, 8 Jan 2024 18:56:34 GMT Subject: Integrated: 8320128: Clean up Parse constructor for OSR In-Reply-To: <0mxhhLR9DDdpiBhbXWPxCmMzo_OI9DYiIX36xGhxC_c=.1eed99b9-2dc3-4403-b3c1-22b6ebed6079@github.com> References: <0mxhhLR9DDdpiBhbXWPxCmMzo_OI9DYiIX36xGhxC_c=.1eed99b9-2dc3-4403-b3c1-22b6ebed6079@github.com> Message-ID: On Wed, 15 Nov 2023 07:01:35 GMT, Xin Liu wrote: > There's a special case for the constructor of Parse. If current compilation is OSR and it is handling the top-level method(depth() == 1), then > > 1. _tf = C->tf(); > 2. _entry_bci = C->entry_bci(); > 3. _flow = method()->get_osr_flow_analysis(_entry_bci); > > We don't need to assign those member data twice. We can also factor out _flow->failing() for the special case and normal cases. > > It's worth mentioning that we can't save ciTypeFlow computation because > get_osr_flow_analysis(_entry_bci) actually needs get_flow_analysis(method()). This pull request has now been integrated. 
Changeset: d47393bd Author: Xin Liu URL: https://git.openjdk.org/jdk/commit/d47393bd8225e818f0f9cd45192a5e656018af11 Stats: 45 lines in 2 files changed: 19 ins; 17 del; 9 mod 8320128: Clean up Parse constructor for OSR Reviewed-by: thartmann, shade ------------- PR: https://git.openjdk.org/jdk/pull/16669 From shade at openjdk.org Mon Jan 8 19:28:24 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 8 Jan 2024 19:28:24 GMT Subject: RFR: 8322982: CTW fails to build after 8308753 [v2] In-Reply-To: References: Message-ID: On Mon, 8 Jan 2024 18:53:38 GMT, Xin Liu wrote: >> This patch fixes the build error of CTW by sidelining ModuleInfoWriter.java. ModuleInfoWriter uses Class-File API, which has transitioned to preview. >> If we really need to compile it, we have to append --enable-preview and --source N. >> >> The fact is CTW itself doesn't depend on ModuleInfoWriter. I think it's easier to maintain CTW if we filter it out in Makefile. > > Xin Liu has updated the pull request incrementally with one additional commit since the last revision: > > Combine two functions into one. Looks fine to me. Marked as reviewed by shade (Reviewer). test/hotspot/jtreg/testlibrary/ctw/Makefile line 45: > 43: > 44: SRC_FILES = $(shell find $(SRC_DIR) -name '*.java') > 45: # Exclude ModuleInfoWriter.java to circumvent '--enable-preview'. Wording: `Exclude files that need --enable-preview to compile`. There would probably be more files later. ------------- Marked as reviewed by shade (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/17292#pullrequestreview-1809880221
PR Review: https://git.openjdk.org/jdk/pull/17292#pullrequestreview-1809881199
PR Review Comment: https://git.openjdk.org/jdk/pull/17292#discussion_r1445230097

From xliu at openjdk.org Mon Jan 8 19:48:34 2024
From: xliu at openjdk.org (Xin Liu)
Date: Mon, 8 Jan 2024 19:48:34 GMT
Subject: RFR: 8323095: Expand TraceOptoParse block output abbreviations
In-Reply-To:
References:
Message-ID:

On Fri, 5 Jan 2024 23:07:12 GMT, Joshua Cao wrote:

> `e` -> `exception block`
> `lphd` -> `loop head`
>
> Also removing an unnecessary space. The successor ids have a space before them.
>
> Examples from `java -Xcomp -XX:+TraceOptoParse -version`:
>
>
> Parsing block #8 at bci [33,39), successors: 9 16(exception block) loop head

LGTM. I am not a reviewer.

-------------

Marked as reviewed by xliu (Committer).

PR Review: https://git.openjdk.org/jdk/pull/17289#pullrequestreview-1809905645

From duke at openjdk.org Mon Jan 8 19:48:35 2024
From: duke at openjdk.org (Joshua Cao)
Date: Mon, 8 Jan 2024 19:48:35 GMT
Subject: Integrated: 8323095: Expand TraceOptoParse block output abbreviations
In-Reply-To:
References:
Message-ID:

On Fri, 5 Jan 2024 23:07:12 GMT, Joshua Cao wrote:

> `e` -> `exception block`
> `lphd` -> `loop head`
>
> Also removing an unnecessary space. The successor ids have a space before them.
>
> Examples from `java -Xcomp -XX:+TraceOptoParse -version`:
>
>
> Parsing block #8 at bci [33,39), successors: 9 16(exception block) loop head

This pull request has now been integrated.
Changeset: 24823ba6 Author: Joshua Cao Committer: Xin Liu URL: https://git.openjdk.org/jdk/commit/24823ba647d4bf412586372cd5076f35bbc131a5 Stats: 3 lines in 1 file changed: 0 ins; 0 del; 3 mod 8323095: Expand TraceOptoParse block output abbreviations Reviewed-by: thartmann, chagedorn, xliu ------------- PR: https://git.openjdk.org/jdk/pull/17289 From xliu at openjdk.org Mon Jan 8 20:08:37 2024 From: xliu at openjdk.org (Xin Liu) Date: Mon, 8 Jan 2024 20:08:37 GMT Subject: RFR: 8322982: CTW fails to build after 8308753 [v3] In-Reply-To: References: Message-ID: > This patch fixes the build error of CTW by sidelining ModuleInfoWriter.java. ModuleInfoWriter uses Class-File API, which has transitioned to preview. > If we really need to compile it, we have to append --enable-preview and --source N. > > The fact is CTW itself doesn't depend on ModuleInfoWriter. I think it's easier to maintain CTW if we filter it out in Makefile. Xin Liu has updated the pull request incrementally with one additional commit since the last revision: Wording and also remove add-modules required by ModuleInfoWriter.java ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17292/files - new: https://git.openjdk.org/jdk/pull/17292/files/5ac1d9f1..7978052e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17292&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17292&range=01-02 Stats: 5 lines in 1 file changed: 0 ins; 4 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/17292.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17292/head:pull/17292 PR: https://git.openjdk.org/jdk/pull/17292 From xliu at openjdk.org Mon Jan 8 20:08:40 2024 From: xliu at openjdk.org (Xin Liu) Date: Mon, 8 Jan 2024 20:08:40 GMT Subject: RFR: 8322982: CTW fails to build after 8308753 [v2] In-Reply-To: References: Message-ID: On Mon, 8 Jan 2024 19:25:54 GMT, Aleksey Shipilev wrote: >> Xin Liu has updated the pull request incrementally with one additional commit since the last revision: 
>> >> Combine two functions into one. > > test/hotspot/jtreg/testlibrary/ctw/Makefile line 45: > >> 43: >> 44: SRC_FILES = $(shell find $(SRC_DIR) -name '*.java') >> 45: # Exclude ModuleInfoWriter.java to circumvent '--enable-preview'. > > Wording: `Exclude files that need --enable-preview to compile`. There would probably be more files later. I took a look at LIB_FILES. Only 'ModuleInfoWriter.java' depends on advanced APIs. It was added to testlibrary in [JDK-8304163](https://bugs.openjdk.org/browse/JDK-8304163). Yes, we may need to exclude more files in the future. Currently, Makefile selects LIB_FILES using wildcard matching. If it's necessary, we need to define LIB_FILES explicitly. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17292#discussion_r1445271062 From kvn at openjdk.org Mon Jan 8 20:50:21 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 8 Jan 2024 20:50:21 GMT Subject: RFR: JDK-8277869: Maven POMs are using HTTP links where HTTPS is available In-Reply-To: References: Message-ID: <_yC54VkkHOUc9a7YC6Wf-7QjqTiJkA9ieAWMlwJYApQ=.032ae2c6-0471-4b8f-bf78-dd57fb6c90db@github.com> On Mon, 8 Jan 2024 10:29:38 GMT, Tobias Holenstein wrote: > Replace `http` with `https` in xml files of IdealGraphVisualizer and LogCompilation. > Moreover, replace the old URL (./maven-v4_0_0.xsd) with of the suggested one (./maven-4.0.0.xsd) for `xsi:schemaLocation` in pom.xml. > > Tested: IdealGraphVisualizer and LogCompilation build and run as expected. Looks good. ------------- Marked as reviewed by kvn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/17302#pullrequestreview-1809988510

From kvn at openjdk.org Mon Jan 8 21:07:24 2024
From: kvn at openjdk.org (Vladimir Kozlov)
Date: Mon, 8 Jan 2024 21:07:24 GMT
Subject: RFR: 8322735: C2: minor improvements of bubble sort used in SuperWord::packset_sort [v2]
In-Reply-To: <1W_HcoiwEnXLvtqprc7L8mghr0BRH3a_ITn6Cerzb_c=.f6c90e24-7d41-418d-bc50-41bff396663b@github.com>
References: <1W_HcoiwEnXLvtqprc7L8mghr0BRH3a_ITn6Cerzb_c=.f6c90e24-7d41-418d-bc50-41bff396663b@github.com>
Message-ID:

On Fri, 5 Jan 2024 08:57:33 GMT, Denghui Dong wrote:

>> A minor improvement could be made for bubble sort in SuperWord::packset_sort to reduce the comparison count in bad cases.
>>
>> See https://en.wikipedia.org/wiki/Bubble_sort
>
> Denghui Dong has updated the pull request incrementally with one additional commit since the last revision:
>
> update

Clever.

src/hotspot/share/opto/superword.cpp line 3526:

> 3524: // only swap when we find something to swap
> 3525: if (alignment(q_low->at(0)) > alignment(q_i->at(0))) {
> 3526: Node_List* t = q_i;

Why do you need this local `t`?

src/hotspot/share/opto/superword.cpp line 3529:

> 3527: *(_packset.adr_at(i)) = q_low;
> 3528: *(_packset.adr_at(i-1)) = q_i;
> 3529: max_swap_index = i;

So we are not using `i+1` here because all previous values should be less than `i`'s, right?
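
For readers following the review above, here is a minimal, self-contained sketch of the "remember the last swap" variant of bubble sort that the quoted lines appear to implement. It is not the code from `superword.cpp`: it sorts plain `int` values instead of packs ordered by `alignment(...)`, and the names `BubbleSortSketch`, `limit` and `lastSwap` are made up for illustration (`lastSwap` plays the role that `max_swap_index` seems to play in the patch).

```java
import java.util.Arrays;

class BubbleSortSketch {
    // Sorts in place, ascending. After a pass, everything at or beyond the
    // position of the last swap is already in its final place, so the next
    // pass stops there instead of shrinking the range by just one element.
    static int[] sort(int[] a) {
        int limit = a.length;              // indices >= limit are final
        while (limit > 1) {
            int lastSwap = 0;              // index of the last swap in this pass
            for (int i = 1; i < limit; i++) {
                if (a[i - 1] > a[i]) {     // only swap when out of order
                    int t = a[i - 1];
                    a[i - 1] = a[i];
                    a[i] = t;
                    lastSwap = i;
                }
            }
            limit = lastSwap;              // no swap => limit == 0 => loop ends
        }
        return a;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(sort(new int[] {3, 1, 2, 4, 5})));
    }
}
```

In this sketch the next pass is bounded by `lastSwap` itself rather than `lastSwap + 1`: the element that ended up at index `lastSwap` is the largest of the prefix, and no later pair needed swapping, so it is already in its final position.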
------------- PR Review: https://git.openjdk.org/jdk/pull/17190#pullrequestreview-1810006561 PR Review Comment: https://git.openjdk.org/jdk/pull/17190#discussion_r1445326103 PR Review Comment: https://git.openjdk.org/jdk/pull/17190#discussion_r1445331241 From kbarrett at openjdk.org Mon Jan 8 21:29:36 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Mon, 8 Jan 2024 21:29:36 GMT Subject: Integrated: 8322759: Eliminate -Wparentheses warnings in compiler code In-Reply-To: <496tGkQ1KUCrW1IHOETyvhqopkNYjsEoupxjo0Ze3Wg=.0223f494-fd34-4e4a-a31a-5030603f2113@github.com> References: <496tGkQ1KUCrW1IHOETyvhqopkNYjsEoupxjo0Ze3Wg=.0223f494-fd34-4e4a-a31a-5030603f2113@github.com> Message-ID: On Fri, 29 Dec 2023 03:33:11 GMT, Kim Barrett wrote: > Please review this change to eliminate some -Wparentheses warnings. This > involved simply adding a few parentheses to make some implicit operator > precedence explicit. > > This change addresses non-C2 parts of the compiler component. > > Testing: mach5 tier1 > > Also ran mach5 tier1 with these changes in conjunction enabling -Wparentheses > and other changes needed to make that work. This pull request has now been integrated. 
Changeset: ca9635df Author: Kim Barrett URL: https://git.openjdk.org/jdk/commit/ca9635df3357bf70b41645f619237b6d2068afb7 Stats: 16 lines in 5 files changed: 0 ins; 0 del; 16 mod 8322759: Eliminate -Wparentheses warnings in compiler code Reviewed-by: kvn, shade ------------- PR: https://git.openjdk.org/jdk/pull/17200 From kbarrett at openjdk.org Mon Jan 8 21:29:35 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Mon, 8 Jan 2024 21:29:35 GMT Subject: RFR: 8322759: Eliminate -Wparentheses warnings in compiler code [v2] In-Reply-To: <2WJkEZqCHKmE27ORwdudo3QC0JLzBxShw6HBBJ8k2qE=.4f172823-b930-418a-924d-578342d2c991@github.com> References: <496tGkQ1KUCrW1IHOETyvhqopkNYjsEoupxjo0Ze3Wg=.0223f494-fd34-4e4a-a31a-5030603f2113@github.com> <2WJkEZqCHKmE27ORwdudo3QC0JLzBxShw6HBBJ8k2qE=.4f172823-b930-418a-924d-578342d2c991@github.com> Message-ID: On Tue, 2 Jan 2024 20:13:47 GMT, Vladimir Kozlov wrote: >> Kim Barrett has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: >> >> - Merge branch 'master' into compiler-wparentheses >> - simplify asserts >> - update copyrights for new year >> - fix -Wparentheses warnings in non-C2 compiler code > > Looks good. Thanks for reviews @vnkozlov and @shipilev . ------------- PR Comment: https://git.openjdk.org/jdk/pull/17200#issuecomment-1881844523 From sviswanathan at openjdk.org Tue Jan 9 00:10:39 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Tue, 9 Jan 2024 00:10:39 GMT Subject: RFR: 8321712: C2: "failed: Multiple uses of register" in C2_MacroAssembler::vminmax_fp Message-ID: The C2_MacroAssembler::vminmax_fp and evminmax_fp were incorrectly checking for the two USE operand registers to be different and thus resulting in assertion failure. 
In x86_64.ad:

instruct maxF_reg(legRegF dst, legRegF a, legRegF b, legRegF tmp, legRegF atmp, legRegF btmp) %{
  ...
  effect(USE a, USE b, TEMP tmp, TEMP atmp, TEMP btmp);
  ...
    __ vminmax_fp(Op_MaxV, T_FLOAT, $dst$$XMMRegister, $a$$XMMRegister, $b$$XMMRegister, $tmp$$XMMRegister, $atmp$$XMMRegister, $btmp$$XMMRegister, Assembler::AVX_128bit);
%}

Changing the assert in vminmax_fp from:
    assert_different_registers(a, b, tmp, atmp, btmp);
to:
    assert_different_registers(a, tmp, atmp, btmp);
    assert_different_registers(b, tmp, atmp, btmp);
fixes the issue.

Similar change done in evminmax_fp.

Please review.

Best Regards,
Sandhya

-------------

Commit messages:
 - 8321712: C2: "failed: Multiple uses of register" in C2_MacroAssembler::vminmax_fp

Changes: https://git.openjdk.org/jdk/pull/17315/files
 Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=17315&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8321712
  Stats: 4 lines in 1 file changed: 2 ins; 0 del; 2 mod
  Patch: https://git.openjdk.org/jdk/pull/17315.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/17315/head:pull/17315

PR: https://git.openjdk.org/jdk/pull/17315

From duke at openjdk.org Tue Jan 9 01:52:50 2024
From: duke at openjdk.org (Zhiqiang Zang)
Date: Tue, 9 Jan 2024 01:52:50 GMT
Subject: RFR: 8322589: Add Ideal transformation: (~a) & (~b) => ~(a | b) [v6]
In-Reply-To: <-pwjKEB97C-bM068JQN0PY1hl65IzcuQZfHzRoKu92g=.d6118773-20d2-46cd-9284-5168c9334bb5@github.com>
References: <-pwjKEB97C-bM068JQN0PY1hl65IzcuQZfHzRoKu92g=.d6118773-20d2-46cd-9284-5168c9334bb5@github.com>
Message-ID:

> Hello,
>
> `(~a) & (~b) => ~(a | b)` is a widely seen pattern, for example it is implemented for LLVM [here](https://github.com/llvm/llvm-project/blob/397f1ce9efb4eea1ee10fe4833f733b8c7abd878/llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp#L1616C28-L1616C28); however it is missing in current implementation of hotspot. This pull request adds this transformation and associated tests.
>
> Thanks.
Zhiqiang Zang has updated the pull request incrementally with one additional commit since the last revision: Update src/hotspot/share/opto/addnode.cpp Co-authored-by: Tobias Hartmann ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16333/files - new: https://git.openjdk.org/jdk/pull/16333/files/ecb2098b..afa0737a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16333&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16333&range=04-05 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/16333.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16333/head:pull/16333 PR: https://git.openjdk.org/jdk/pull/16333 From duke at openjdk.org Tue Jan 9 01:53:50 2024 From: duke at openjdk.org (Zhiqiang Zang) Date: Tue, 9 Jan 2024 01:53:50 GMT Subject: RFR: 8322077: Add Ideal transformation: (~a) | (~b) => ~(a & b) [v9] In-Reply-To: References: Message-ID: <1X-pxmUfbW67Uog-E7xJBsSmO_fJHahJj16iR_ZL7Ds=.083a0765-0375-4792-b835-8a43aa7c46d2@github.com> > Hello, > > (~a) | (~b) => ~(a & b) is a widely seen pattern, for example it is implemented for LLVM [here](https://github.com/llvm/llvm-project/blob/397f1ce9efb4eea1ee10fe4833f733b8c7abd878/llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp#L1617C28-L1617C28); however it is missing in current implementation of hotspot. This pull request adds this transformation and associated tests. > > Thanks. 
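
The two rewrites proposed in 8322589 and 8322077 are De Morgan's laws applied to Java's bitwise operators. The sketch below only checks the identities at the source level; it says nothing about how C2's `Ideal` methods match or build the nodes. The class and method names (`DeMorganSketch`, `andOfNots`, and so on) are hypothetical and do not come from either pull request.

```java
class DeMorganSketch {
    // Left-hand and right-hand sides of (~a) & (~b) => ~(a | b)
    static int andOfNots(int a, int b) { return (~a) & (~b); }
    static int notOfOr(int a, int b)   { return ~(a | b); }

    // Left-hand and right-hand sides of (~a) | (~b) => ~(a & b)
    static int orOfNots(int a, int b)  { return (~a) | (~b); }
    static int notOfAnd(int a, int b)  { return ~(a & b); }

    public static void main(String[] args) {
        int[] samples = {0, 1, -1, 0b1100, 0b1010,
                         Integer.MIN_VALUE, Integer.MAX_VALUE};
        boolean ok = true;
        for (int a : samples) {
            for (int b : samples) {
                ok &= andOfNots(a, b) == notOfOr(a, b);
                ok &= orOfNots(a, b) == notOfAnd(a, b);
            }
        }
        System.out.println("identities hold on all sample pairs: " + ok);
    }
}
```

In both cases the rewritten form needs two bitwise operations (one binary operation and one complement) where the original needs three, which is presumably the motivation for the transformation.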
Zhiqiang Zang has updated the pull request incrementally with two additional commits since the last revision: - Update test/hotspot/jtreg/compiler/c2/irTests/DeMorganLawLongTests.java Co-authored-by: Tobias Hartmann - Update test/hotspot/jtreg/compiler/c2/irTests/DeMorganLawIntTests.java Co-authored-by: Tobias Hartmann ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16334/files - new: https://git.openjdk.org/jdk/pull/16334/files/d8ed0f35..6eb29aef Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16334&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16334&range=07-08 Stats: 2 lines in 2 files changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/16334.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16334/head:pull/16334 PR: https://git.openjdk.org/jdk/pull/16334 From duke at openjdk.org Tue Jan 9 01:55:59 2024 From: duke at openjdk.org (Zhiqiang Zang) Date: Tue, 9 Jan 2024 01:55:59 GMT Subject: RFR: 8322589: Add Ideal transformation: (~a) & (~b) => ~(a | b) [v7] In-Reply-To: <-pwjKEB97C-bM068JQN0PY1hl65IzcuQZfHzRoKu92g=.d6118773-20d2-46cd-9284-5168c9334bb5@github.com> References: <-pwjKEB97C-bM068JQN0PY1hl65IzcuQZfHzRoKu92g=.d6118773-20d2-46cd-9284-5168c9334bb5@github.com> Message-ID: > Hello, > > `(~a) & (~b) => ~(a | b)` is a widely seen pattern, for example it is implemented for LLVM [here](https://github.com/llvm/llvm-project/blob/397f1ce9efb4eea1ee10fe4833f733b8c7abd878/llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp#L1616C28-L1616C28); however it is missing in current implementation of hotspot. This pull request adds this transformation and associated tests. > > Thanks. Zhiqiang Zang has updated the pull request incrementally with one additional commit since the last revision: address minor comments. 
------------- Changes: - all: https://git.openjdk.org/jdk/pull/16333/files - new: https://git.openjdk.org/jdk/pull/16333/files/afa0737a..c4fa2e40 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16333&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16333&range=05-06 Stats: 4 lines in 1 file changed: 0 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/16333.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16333/head:pull/16333 PR: https://git.openjdk.org/jdk/pull/16333 From kvn at openjdk.org Tue Jan 9 02:28:21 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 9 Jan 2024 02:28:21 GMT Subject: RFR: 8321712: C2: "failed: Multiple uses of register" in C2_MacroAssembler::vminmax_fp In-Reply-To: References: Message-ID: On Tue, 9 Jan 2024 00:01:04 GMT, Sandhya Viswanathan wrote: > The C2_MacroAssembler::vminmax_fp and evminmax_fp were incorrectly checking for the two USE operand registers to be different and thus resulting in assertion failure. > > In x86_64.ad: > > instruct maxF_reg(legRegF dst, legRegF a, legRegF b, legRegF tmp, legRegF atmp, legRegF btmp) %{ > ... > effect(USE a, USE b, TEMP tmp, TEMP atmp, TEMP btmp); > ... > __ vminmax_fp(Op_MaxV, T_FLOAT, $dst$$XMMRegister, $a$$XMMRegister, $b$$XMMRegister, $tmp$$XMMRegister, $atmp$$XMMRegister, $btmp$$XMMRegister, Assembler::AVX_128bit); > %} > > > Changing the assert in vminmax_fp from: > assert_different_registers(a, b, tmp, atmp, btmp); > to: > assert_different_registers(a, tmp, atmp, btmp); > assert_different_registers(b, tmp, atmp, btmp); > fixes the issue. > > Similar change done in evminmax_fp. > > Please review. > > Best Regards, > Sandhya Should we "short cut" code when registers are the same? 
------------- PR Review: https://git.openjdk.org/jdk/pull/17315#pullrequestreview-1810310673 From duke at openjdk.org Tue Jan 9 02:35:04 2024 From: duke at openjdk.org (Zhiqiang Zang) Date: Tue, 9 Jan 2024 02:35:04 GMT Subject: RFR: 8322589: Add Ideal transformation: (~a) & (~b) => ~(a | b) [v8] In-Reply-To: <-pwjKEB97C-bM068JQN0PY1hl65IzcuQZfHzRoKu92g=.d6118773-20d2-46cd-9284-5168c9334bb5@github.com> References: <-pwjKEB97C-bM068JQN0PY1hl65IzcuQZfHzRoKu92g=.d6118773-20d2-46cd-9284-5168c9334bb5@github.com> Message-ID: <-0O_jW7NWGynEROp33izEgAreJ1FQEjVOg4AA8h5E8E=.a85abda4-54a0-4fba-abfe-1d1628f8a9ca@github.com> > Hello, > > `(~a) & (~b) => ~(a | b)` is a widely seen pattern, for example it is implemented for LLVM [here](https://github.com/llvm/llvm-project/blob/397f1ce9efb4eea1ee10fe4833f733b8c7abd878/llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp#L1616C28-L1616C28); however it is missing in current implementation of hotspot. This pull request adds this transformation and associated tests. > > Thanks. Zhiqiang Zang has updated the pull request incrementally with one additional commit since the last revision: move the two helper functions to member functions of the node class. 
------------- Changes: - all: https://git.openjdk.org/jdk/pull/16333/files - new: https://git.openjdk.org/jdk/pull/16333/files/c4fa2e40..7a962d69 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16333&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16333&range=06-07 Stats: 53 lines in 5 files changed: 24 ins; 23 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/16333.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16333/head:pull/16333 PR: https://git.openjdk.org/jdk/pull/16333 From duke at openjdk.org Tue Jan 9 02:52:52 2024 From: duke at openjdk.org (Zhiqiang Zang) Date: Tue, 9 Jan 2024 02:52:52 GMT Subject: RFR: 8322077: Add Ideal transformation: (~a) | (~b) => ~(a & b) [v10] In-Reply-To: References: Message-ID: > Hello, > > (~a) | (~b) => ~(a & b) is a widely seen pattern, for example it is implemented for LLVM [here](https://github.com/llvm/llvm-project/blob/397f1ce9efb4eea1ee10fe4833f733b8c7abd878/llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp#L1617C28-L1617C28); however it is missing in current implementation of hotspot. This pull request adds this transformation and associated tests. > > Thanks. Zhiqiang Zang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 12 additional commits since the last revision: - adapt changes from the dependent pr. - Merge branch 'andnode-PNewDeMorganLawAndToOr' into ornode-PNewDeMorganLawOrToAnd - Update test/hotspot/jtreg/compiler/c2/irTests/DeMorganLawLongTests.java Co-authored-by: Tobias Hartmann - Update test/hotspot/jtreg/compiler/c2/irTests/DeMorganLawIntTests.java Co-authored-by: Tobias Hartmann - Add tests for using De Morgan's Law for both optimizations. - remove unused code from tests. - update the copyright dates. - address comments. - untabify. - use common helpful functions. - ... 
and 2 more: https://git.openjdk.org/jdk/compare/9fcae094...0c8d1077 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16334/files - new: https://git.openjdk.org/jdk/pull/16334/files/6eb29aef..0c8d1077 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16334&range=09 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16334&range=08-09 Stats: 60 lines in 5 files changed: 24 ins; 23 del; 13 mod Patch: https://git.openjdk.org/jdk/pull/16334.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16334/head:pull/16334 PR: https://git.openjdk.org/jdk/pull/16334 From duke at openjdk.org Tue Jan 9 02:53:58 2024 From: duke at openjdk.org (Zhiqiang Zang) Date: Tue, 9 Jan 2024 02:53:58 GMT Subject: RFR: 8322589: Add Ideal transformation: (~a) & (~b) => ~(a | b) [v9] In-Reply-To: <-pwjKEB97C-bM068JQN0PY1hl65IzcuQZfHzRoKu92g=.d6118773-20d2-46cd-9284-5168c9334bb5@github.com> References: <-pwjKEB97C-bM068JQN0PY1hl65IzcuQZfHzRoKu92g=.d6118773-20d2-46cd-9284-5168c9334bb5@github.com> Message-ID: <5z37bxWaSr9AFumvmDHQgPYfj_qz5P0XFifGU-j8Mjk=.5fa90579-6998-4a94-a4da-345d59a4f69e@github.com> > Hello, > > `(~a) & (~b) => ~(a | b)` is a widely seen pattern, for example it is implemented for LLVM [here](https://github.com/llvm/llvm-project/blob/397f1ce9efb4eea1ee10fe4833f733b8c7abd878/llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp#L1616C28-L1616C28); however it is missing in current implementation of hotspot. This pull request adds this transformation and associated tests. > > Thanks. Zhiqiang Zang has updated the pull request incrementally with one additional commit since the last revision: update copyright dates. 
------------- Changes: - all: https://git.openjdk.org/jdk/pull/16333/files - new: https://git.openjdk.org/jdk/pull/16333/files/7a962d69..3665de2f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16333&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16333&range=07-08 Stats: 2 lines in 2 files changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/16333.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16333/head:pull/16333 PR: https://git.openjdk.org/jdk/pull/16333 From ddong at openjdk.org Tue Jan 9 05:26:43 2024 From: ddong at openjdk.org (Denghui Dong) Date: Tue, 9 Jan 2024 05:26:43 GMT Subject: RFR: 8322735: C2: minor improvements of bubble sort used in SuperWord::packset_sort [v3] In-Reply-To: References: Message-ID: > A minor improvement could be made for bubble sort in SuperWord::packset_sort to reduce the comparison count in bad cases. > > See https://en.wikipedia.org/wiki/Bubble_sort Denghui Dong has updated the pull request incrementally with one additional commit since the last revision: update ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17190/files - new: https://git.openjdk.org/jdk/pull/17190/files/ba53ed56..c635b10d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17190&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17190&range=01-02 Stats: 1 line in 1 file changed: 0 ins; 1 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/17190.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17190/head:pull/17190 PR: https://git.openjdk.org/jdk/pull/17190 From ddong at openjdk.org Tue Jan 9 05:26:46 2024 From: ddong at openjdk.org (Denghui Dong) Date: Tue, 9 Jan 2024 05:26:46 GMT Subject: RFR: 8322735: C2: minor improvements of bubble sort used in SuperWord::packset_sort [v2] In-Reply-To: References: <1W_HcoiwEnXLvtqprc7L8mghr0BRH3a_ITn6Cerzb_c=.f6c90e24-7d41-418d-bc50-41bff396663b@github.com> Message-ID: On Mon, 8 Jan 2024 20:59:53 GMT, Vladimir Kozlov wrote: >> Denghui Dong 
has updated the pull request incrementally with one additional commit since the last revision: >> >> update > > src/hotspot/share/opto/superword.cpp line 3526: > >> 3524: // only swap when we find something to swap >> 3525: if (alignment(q_low->at(0)) > alignment(q_i->at(0))) { >> 3526: Node_List* t = q_i; > > Why you need this local `t`? Good catch. Deleted. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17190#discussion_r1445638467 From ddong at openjdk.org Tue Jan 9 05:30:22 2024 From: ddong at openjdk.org (Denghui Dong) Date: Tue, 9 Jan 2024 05:30:22 GMT Subject: RFR: 8322735: C2: minor improvements of bubble sort used in SuperWord::packset_sort [v2] In-Reply-To: References: <1W_HcoiwEnXLvtqprc7L8mghr0BRH3a_ITn6Cerzb_c=.f6c90e24-7d41-418d-bc50-41bff396663b@github.com> Message-ID: <15K1TZYYnVPyFf2zZD2hlqQI7ddz-U-1Ued9JNBq5vM=.816a182b-8604-4e6c-94e1-2145fc60cdfb@github.com> On Mon, 8 Jan 2024 21:05:03 GMT, Vladimir Kozlov wrote: >> Denghui Dong has updated the pull request incrementally with one additional commit since the last revision: >> >> update > > src/hotspot/share/opto/superword.cpp line 3529: > >> 3527: *(_packset.adr_at(i)) = q_low; >> 3528: *(_packset.adr_at(i-1)) = q_i; >> 3529: max_swap_index = i; > > So we not using `i+1` here because all previous values should be < than `i`'s > Right? Yes. The last `i`'s value is > previous values and values between `i` and end are already sorted. 
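Editor's sketch: the improvement discussed here is the classic bubble-sort refinement of remembering the index of the last swap, since everything from that index onward is already in its final position. The Java below shows the idea on an `int` array, assuming that `max_swap_index` in the patch plays the role of `maxSwapIndex`; it is not the `SuperWord::packset_sort` code, and the names are hypothetical.

```java
public final class BubbleSortSketch {
    /** Sorts a in place (ascending) and returns the number of comparisons performed. */
    public static int sort(int[] a) {
        int comparisons = 0;
        // Elements at index >= limit are already in their final position.
        int limit = a.length;
        while (limit > 1) {
            int maxSwapIndex = 0;
            for (int i = 1; i < limit; i++) {
                comparisons++;
                if (a[i - 1] > a[i]) {
                    int t = a[i - 1];
                    a[i - 1] = a[i];
                    a[i] = t;
                    // Remember the highest index touched by a swap in this pass.
                    maxSwapIndex = i;
                }
            }
            // Everything from maxSwapIndex onward is sorted and not smaller than
            // the prefix, so the next pass can stop before it.
            limit = maxSwapIndex;
        }
        return comparisons;
    }

    public static void main(String[] args) {
        int[] data = {5, 1, 4, 2, 8};
        int comparisons = sort(data);
        System.out.println(java.util.Arrays.toString(data) + " using " + comparisons + " comparisons");
    }
}
```

Using `i` rather than `i + 1` as the new bound matches the answer given in the thread: after the last swap, the element at `i` is larger than everything before it, so the next pass never needs to compare against it.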
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17190#discussion_r1445640947 From duke at openjdk.org Tue Jan 9 05:54:39 2024 From: duke at openjdk.org (Zhiqiang Zang) Date: Tue, 9 Jan 2024 05:54:39 GMT Subject: RFR: 8322589: Add Ideal transformation: (~a) & (~b) => ~(a | b) [v10] In-Reply-To: <-pwjKEB97C-bM068JQN0PY1hl65IzcuQZfHzRoKu92g=.d6118773-20d2-46cd-9284-5168c9334bb5@github.com> References: <-pwjKEB97C-bM068JQN0PY1hl65IzcuQZfHzRoKu92g=.d6118773-20d2-46cd-9284-5168c9334bb5@github.com> Message-ID: <_lWTVDYsWmINZsi0bPleMs3F3n-WPgHbLoTfpu8sHSg=.0dde4ccd-77e8-4e0e-80ad-cb233b858579@github.com> > Hello, > > `(~a) & (~b) => ~(a | b)` is a widely seen pattern, for example it is implemented for LLVM [here](https://github.com/llvm/llvm-project/blob/397f1ce9efb4eea1ee10fe4833f733b8c7abd878/llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp#L1616C28-L1616C28); however it is missing in current implementation of hotspot. This pull request adds this transformation and associated tests. > > Thanks. Zhiqiang Zang has updated the pull request incrementally with one additional commit since the last revision: move make_not back to AddNode because it cannot compile for architecture arm, s390x, ppc64le. 
------------- Changes: - all: https://git.openjdk.org/jdk/pull/16333/files - new: https://git.openjdk.org/jdk/pull/16333/files/3665de2f..4ee8b089 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16333&range=09 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16333&range=08-09 Stats: 35 lines in 5 files changed: 15 ins; 16 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/16333.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16333/head:pull/16333 PR: https://git.openjdk.org/jdk/pull/16333 From duke at openjdk.org Tue Jan 9 06:02:24 2024 From: duke at openjdk.org (Zhiqiang Zang) Date: Tue, 9 Jan 2024 06:02:24 GMT Subject: RFR: 8322589: Add Ideal transformation: (~a) & (~b) => ~(a | b) [v5] In-Reply-To: References: <-pwjKEB97C-bM068JQN0PY1hl65IzcuQZfHzRoKu92g=.d6118773-20d2-46cd-9284-5168c9334bb5@github.com> Message-ID: On Mon, 8 Jan 2024 07:02:50 GMT, Tobias Hartmann wrote: >> Zhiqiang Zang has updated the pull request incrementally with one additional commit since the last revision: >> >> update the copyright dates. > > src/hotspot/share/opto/addnode.hpp line 84: > >> 82: // Utility function to check if the given node is a NOT operation, >> 83: // i.e., n == m ^ (-1). >> 84: static bool is_not(PhaseGVN* phase, Node* n, BasicType bt); > > Could these be made non-static? @TobiHartmann @eme64 I moved `is_not` but I was not able to move `make_not` to `node` class, because otherwise it would not compile for arm, s390x, ppc64le. /home/runner/work/jdk/jdk/src/hotspot/share/opto/node.cpp:1605:18: error: expected type-specifier before 'XorINode' 1605 | return new XorINode(this, phase->intcon(-1)); Please let me know if we still want to move `make_not`. Thanks. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16333#discussion_r1445656863 From duke at openjdk.org Tue Jan 9 06:06:51 2024 From: duke at openjdk.org (Zhiqiang Zang) Date: Tue, 9 Jan 2024 06:06:51 GMT Subject: RFR: 8322077: Add Ideal transformation: (~a) | (~b) => ~(a & b) [v11] In-Reply-To: References: Message-ID: > Hello, > > (~a) | (~b) => ~(a & b) is a widely seen pattern, for example it is implemented for LLVM [here](https://github.com/llvm/llvm-project/blob/397f1ce9efb4eea1ee10fe4833f733b8c7abd878/llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp#L1617C28-L1617C28); however it is missing in current implementation of hotspot. This pull request adds this transformation and associated tests. > > Thanks. Zhiqiang Zang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 14 additional commits since the last revision: - adapt to new changes from the dependant pr. - Merge branch 'andnode-PNewDeMorganLawAndToOr' into ornode-PNewDeMorganLawOrToAnd - adapt changes from the dependent pr. - Merge branch 'andnode-PNewDeMorganLawAndToOr' into ornode-PNewDeMorganLawOrToAnd - Update test/hotspot/jtreg/compiler/c2/irTests/DeMorganLawLongTests.java Co-authored-by: Tobias Hartmann - Update test/hotspot/jtreg/compiler/c2/irTests/DeMorganLawIntTests.java Co-authored-by: Tobias Hartmann - Add tests for using De Morgan's Law for both optimizations. - remove unused code from tests. - update the copyright dates. - address comments. - ... 
and 4 more: https://git.openjdk.org/jdk/compare/851dbbb1...b21e242b ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16334/files - new: https://git.openjdk.org/jdk/pull/16334/files/0c8d1077..b21e242b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16334&range=10 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16334&range=09-10 Stats: 38 lines in 5 files changed: 15 ins; 16 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/16334.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16334/head:pull/16334 PR: https://git.openjdk.org/jdk/pull/16334 From jbhateja at openjdk.org Tue Jan 9 06:16:23 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 9 Jan 2024 06:16:23 GMT Subject: RFR: 8322768: Optimize non-subword vector compress and expand APIs for AVX2 target. [v2] In-Reply-To: References: <_udOCEVG86x9V_WvYqFTaYnvmXdiZ7LzqxzR-D_ygYs=.db3ed37e-a8fa-413b-83c6-a785aba072ff@github.com> <1GHGK7AGinCMKjFIB5oadUP0jiZrC39Z0hncAS3H-9Y=.eb617125-1f51-4cff-889e-b15321f5c72b@github.com> Message-ID: On Mon, 8 Jan 2024 07:55:00 GMT, Emanuel Peter wrote: >>> You are using `VectorMask pred = VectorMask.fromLong(ispecies, maskctr++);`. That basically systematically iterates over all masks, which is nice for a correctness test. But that would use different density inside one test run, right? The average over the loop is still at `50%`, correct? >>> >>> I was thinking more a run where the percentage over the whole loop is lower than maybe `1%`. That would get us to a point where maybe the branch prediction of non-vectorized code might be faster, what do you think? >> >> An imperative loop for compression will check each mask bit to select compressible lane. Therefore mask with low or high density of set bits should show similar performance. > > Yes, IF it is vectorized, then there is no difference between high and low density. 
My concern was more if vectorization is preferable over the scalar alternative in the low-density case, where branch prediction is more stable.

At runtime we do need to scan the entire mask to pick the compressible lane corresponding to each set mask bit. Thus the loop overhead of the mask compare (BTW, masks are held in a vector register for AVX2 targets) and the jump will be incurred anyway. In addition, for a sparsely populated mask we may incur an additional misprediction penalty for not taking the if block, which extracts an element from the appropriate source vector lane and inserts it into the destination vector lane. Overall, the vector solution will win for the most common cases with varying masks and also for very sparsely populated masks. Here is the result of setting just a single mask bit. I am in the process of updating the benchmark for 128-bit species and will update the patch.

    @Benchmark
    public void fuzzyFilterIntColumn() {
        int i = 0;
        int j = 0;
        long maskctr = 1;
        int endIndex = ispecies.loopBound(size);
        for (; i < endIndex; i += ispecies.length()) {
            IntVector vec = IntVector.fromArray(ispecies, intinCol, i);
            VectorMask<Integer> pred = VectorMask.fromLong(ispecies, 1);
            vec.compress(pred).intoArray(intoutCol, j);
            j += pred.trueCount();
        }
    }

Baseline:
Benchmark                                    (size)   Mode  Cnt     Score   Error   Units
ColumnFilterBenchmark.fuzzyFilterIntColumn     1024  thrpt    2   379.059          ops/ms
ColumnFilterBenchmark.fuzzyFilterIntColumn     2047  thrpt    2   188.355          ops/ms
ColumnFilterBenchmark.fuzzyFilterIntColumn     4096  thrpt    2    95.315          ops/ms

Withopt:
Benchmark                                    (size)   Mode  Cnt     Score   Error   Units
ColumnFilterBenchmark.fuzzyFilterIntColumn     1024  thrpt    2  7390.074          ops/ms
ColumnFilterBenchmark.fuzzyFilterIntColumn     2047  thrpt    2  3483.247          ops/ms
ColumnFilterBenchmark.fuzzyFilterIntColumn     4096  thrpt    2  1823.817          ops/ms

------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17261#discussion_r1445666305 From kuaiwei.kw at alibaba-inc.com Tue Jan 9 06:23:59 2024 From: kuaiwei.kw at alibaba-inc.com (Kuai Wei) Date: Tue, 09 Jan 2024 14:23:59 +0800 Subject:
discuss about release barrier for final fields initialization Message-ID: <5e41f867-b31b-4ded-b737-8d0c869c8895.kuaiwei.kw@alibaba-inc.com>

Hi,

I ran some experiments on object allocation performance, and I found that on aarch64 N1, if an object has a final field, the allocation rate is about 75% of that of normal allocation. The cause is that C2 will insert a release membar in , which will be translated as "dmb.ish" on aarch64. For normal allocation, a storestore membar is inserted and emitted as "dmb.ishst"; that makes the difference.

The JMH test is https://gist.github.com/kuaiwei/f71fba40df29991c93325a8600e34c13

    java -jar target/benchmarks.jar -f 1 -wi 5 -w 3 -i 3 -r 3 testAlloc
    ...
    Benchmark                       Mode  Cnt     Score    Error  Units
    AllocFinal.testAlloc           thrpt    3  1167.903 ± 44.973  ops/s
    AllocFinal.testAllocWithFinal  thrpt    3   915.330 ± 52.596  ops/s

I found that only C2 inserts a release membar; C1 just inserts a storestore for both final and normal allocation.

In Doug Lea's cookbook https://gee.cs.oswego.edu/dl/jmm/cookbook.html only storestore is required. Alex has a great post on this topic: https://shipilev.net/blog/2014/all-fields-are-final/ . It refers to a case that explains why loadstore is needed: https://www.hboehm.info/c++mm/no_write_fences.html

I checked this case and IMO it looks like some legacy architectures may break data dependency and cause problems. As far as I know, the Alpha architecture is an example. I think it doesn't break on modern architectures. Is there another case I missed?

If storestore is enough in this situation, I will send a PR to loosen the barrier.

Thanks,
Kuai Wei
-------------- next part --------------
An HTML attachment was scrubbed...
URL: From jbhateja at openjdk.org Tue Jan 9 07:42:20 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 9 Jan 2024 07:42:20 GMT Subject: RFR: 8322768: Optimize non-subword vector compress and expand APIs for AVX2 target. [v3] In-Reply-To: <6ipaD7eRW4J37zaeFEKVf2LUVE3C0LmZmoAeePCG2PE=.7bb8ff9a-638e-4e7f-bea2-a40a424004f0@github.com> References: <_udOCEVG86x9V_WvYqFTaYnvmXdiZ7LzqxzR-D_ygYs=.db3ed37e-a8fa-413b-83c6-a785aba072ff@github.com> <6ipaD7eRW4J37zaeFEKVf2LUVE3C0LmZmoAeePCG2PE=.7bb8ff9a-638e-4e7f-bea2-a40a424004f0@github.com> Message-ID: On Mon, 8 Jan 2024 10:20:33 GMT, Quan Anh Mai wrote: >>> Thanks for the updates! >>> >>> One more idea: Your AVX2 solution has a lot of cost for converting the mask to a permutation. Might it make sense to split this off into a separate vector-node, so that it can float out of a loop if the mask is invariant? >> >> CompressV / ExpandV only accepts two inputs, vector to be operated on and mask under which operation is performed, permute table based implementation is specific to x86 backend implementation. > > @jatin-bhateja I think you can expand them in the matcher into several `MachNode`s that will get scheduled separately. > Exactly, like @merykitty suggests: you can do a platform-dependent expansion.

Hi @merykitty, @eme64, in principle platform-specific lowering is a good idea wherever it is useful. Our main concern here is to identify a loop-invariant constant mask in matcher patterns and save the cost of re-loading from a permute table index. Existing loop-invariant analysis moves invariant masks out of the loop, and GCM should be able to move the expanded load from the permute table out of the loop. But this looks very restrictive and will mainly be useful for a constant one-hot bit mask pattern. A constant mask may have more than one set bit, and in such a case we will need to generate multiple loads from permute tables and handle multiple expansion scenarios. I think we can defer that complexity for the time being.
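Editor's sketch: for readers following the compress discussion in this thread, the scalar semantics that a vector `compress` has to reproduce can be written as a per-lane loop over the mask bits. The Java below is a plain-array illustration under that assumption, with hypothetical names; it does not use the Vector API and is not the AVX2 permute-table implementation from the patch.

```java
public final class ScalarCompressSketch {
    /**
     * Copies src[srcPos + lane] to dst for every lane whose bit is set in mask,
     * packing the selected elements contiguously starting at dstPos.
     * Returns the number of elements written.
     */
    public static int compress(int[] src, int srcPos, long mask, int lanes,
                               int[] dst, int dstPos) {
        int j = dstPos;
        for (int lane = 0; lane < lanes; lane++) {
            // This test is the per-lane branch whose predictability was discussed above.
            if ((mask & (1L << lane)) != 0) {
                dst[j++] = src[srcPos + lane];
            }
        }
        return j - dstPos;
    }

    public static void main(String[] args) {
        int[] src = {10, 20, 30, 40, 50, 60, 70, 80};
        int[] dst = new int[8];
        int written = compress(src, 0, 0b10100101L, 8, dst, 0);
        System.out.println(written + " lanes kept: "
                + java.util.Arrays.toString(java.util.Arrays.copyOf(dst, written)));
    }
}
```

The loop visits every lane regardless of how many mask bits are set, which is the point made earlier about the scan cost being paid at any density.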
------------- PR Comment: https://git.openjdk.org/jdk/pull/17261#issuecomment-1882549544 From roland at openjdk.org Tue Jan 9 07:46:22 2024 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 9 Jan 2024 07:46:22 GMT Subject: RFR: 8320237 C2: late inlining of method handle invoke causes duplicate lines in PrintInlining output In-Reply-To: References: Message-ID: <213kgE2Qkgv1LsELuvCGboaJ6IobOND34Hl5842a3dU=.b5561082-324a-4c96-995f-6dd43b7b3d97@github.com> On Mon, 18 Dec 2023 15:28:47 GMT, Kangcheng Xu wrote: > This PR resolves [JDK-8320237](https://bugs.openjdk.org/browse/JDK-8320237) > > The original behavior produces both a failure and success message upon late inlining which is confusing. The patch removes the failure message if inlining was successful. Huge thanks to @rwestrel for reporting and working out a solution. > > Unit test `test/hotspot/jtreg/compiler/inlining/TestDuplicatedLateInliningOutput.java` is added and passing. Looks good to me. ------------- Marked as reviewed by roland (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/17147#pullrequestreview-1810562097 From epeter at openjdk.org Tue Jan 9 08:08:42 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 9 Jan 2024 08:08:42 GMT Subject: RFR: 8315361: C2 SuperWord: refactor out loop analysis into shared auto-vectorization facility VLoopAnalyzer [v2] In-Reply-To: References: Message-ID: > This is a refactoring of `SuperWord`. > I intend to push it for JDK23, after [this bug fix](https://github.com/openjdk/jdk/pull/14785). > > **Goals** > > 1. Clean up `SuperWord`: disentangle different components, make them more **modular**. > 2. Make the loop analysis parts a **shared facility**, not just for SuperWord but also the post-loop-vectorizer ([JDK-8308994](https://bugs.openjdk.org/browse/JDK-8308994)). > 3. 
It is also a necessary step on my bigger plans for improvement with the C2 Auto-Vectorizer ([see my blog post](https://eme64.github.io/blog/2023/11/03/C2-AutoVectorizer-Improvement-Ideas.html)). > 4. Improve tracing in the auto-vectorization by making it more systematic. > > **Summary** > > - I wrote a summary of how C2 auto-vectorization with SuperWord works (please read!): > https://github.com/openjdk/jdk/blob/95fd361e60fc66eb91edad321662e508b2d1bdde/src/hotspot/share/opto/superword.hpp#L32-L177 > - I moved many `Superword` components out to `VLoop` and its subclass `VLoopAnalyzer`. The idea is that any vectorizer can use these facilities in the future. They are therefore made more modular, which should hopefully make future changes easier. These components are: > - Checking the pre-conditions for vectorization (e.g. no unwanted ctrl-flow). > - `VLoop::check_preconditions_helper` replaces code from old `SuperWord::transform_loop`. > - Running all submodules of `VLoopAnalyzer`: `VLoopAnalyzer::analyze_helper`. Replaces analysis part of `SuperWord::SLP_extract`. > - Finding and marking reductions -> `VLoopReductions` > - Detecting memory slices -> `VLoopMemorySlices` > - Analyzing the body -> `VLoopBody` (renamed `in_bb` -> `in_body`) > - Determining vector element types, and functions to determine the `vector_width` of a node -> `VLoopTypes` > - Constructing the dependence graph -> `VLoopDependenceGraph`. Replaces old `DepGraph` with all its components. > - New: CompileCommand option `TraceAutovectorization` > - Run with `-XX:CompileCommand=traceAutovectorization,*::*,help` to get a usage description. > - Replaced all printing with flags `TraceSuperWord` (and `Verbose`) and of `VectorizeDebug`. > - The advantage of a CompileCommand is that tracing can be applied selectively for only a limited set of java classes / methods. > - It uses tags, which are more readable than the `VectorizeDebug` bit-flags. 
These tags can be used for all parts of the vectorizer, but one can also target SuperWord specifically. > - I systematically added tracing at every point where vector... Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 83 commits: - manual merge - fix PRODUCT / DEBUG_ONLY guards - manual merge - fix whitespace issue - added CompileCommand TraceAutoVectorization Usage - add comments to trace flags - trace flag subtraction implemented - replace SuperWord with trace flags - refactor tracing for alignment - SuperWord algo summary - ... and 73 more: https://git.openjdk.org/jdk/compare/827c71da...e876d845 ------------- Changes: https://git.openjdk.org/jdk/pull/16620/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=16620&range=01 Stats: 3809 lines in 29 files changed: 1999 ins; 1307 del; 503 mod Patch: https://git.openjdk.org/jdk/pull/16620.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16620/head:pull/16620 PR: https://git.openjdk.org/jdk/pull/16620 From chagedorn at openjdk.org Tue Jan 9 08:36:35 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 9 Jan 2024 08:36:35 GMT Subject: RFR: 8319793: C2 compilation fails with "Bad graph detected in build_loop_late" after JDK-8279888 [v11] In-Reply-To: References: Message-ID: On Mon, 8 Jan 2024 15:01:12 GMT, Roland Westrelin wrote: >> Range check smearing and range check predication make an array access >> dependent on 2 (or more in the case of RC smearing) conditions. As a >> consequence, if a range check can be eliminated because there's an >> identical dominating range check, the control dependent nodes that >> could float and become dependent on the dominating range check cannot >> be allowed to float because there's a risk that they would then bypass >> one of the checks that make the access legal. 
>> >> `IfNode::dominated_by()` and `PhaseIdealLoop::dominated_by()` have >> logic to prevent this: nodes that are control dependent on a range >> check or predicate are not allowed to float. This is however not >> sufficient as demonstrated by the test cases. >> >> In `TestArrayAccessAboveRCAfterSmearingOrPredication.testRangeCheckSmearing()`: >> >> >> v += array[i]; >> if (flag2) { >> if (flag3) { >> field = 0x42; >> } >> } >> if (flagField == 1) { >> v += array[i]; >> } >> >> >> The range check for the second `array[i]` load is replaced by the >> dominating range check for the first `array[i]` but because the second >> `array[i]` load could really be dependent on multiple range checks (in >> case smearing happened which is not the case here), c2 doesn't allow >> the second `array[i]` to float when the second range check is >> removed. The second `array[i]` is then control dependent on: >> >> >> if (flagField == 1) { >> >> >> which is next found to be dominated by the same test: >> >> >> if (flag == 1) { >> >> >> and is removed. However nothing in `dominated_by()` treats node >> dependent on tests that are not range check or predicates >> specially. So the second `array[i]` is allowed to float and become >> dependent on: >> >> >> if (flag == 1) { >> >> >> which is above the range check for that access. The test method in its >> last invocation is passed an index for the array access that's widely >> out of range. The array load happens before the range check and >> crashes the VM. `testLoopPredication()` is a similar test where array >> loads become dependent on predicates and end up above range checks. >> >> `TestArrayAccessCastIIAboveRC.java` is the test case from the bug >> where for similar reasons a range check `CastII` ends up above its >> range check, becomes top because its input becomes some integer that >> conflicts with its... 
> > Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: > > whitespaces Apart from some minor comment improvement suggestions, the new comments and renaming look good. src/hotspot/share/opto/ifnode.cpp line 569: > 567: igvn->rehash_node_delayed(iff); > 568: iff->set_req_X(1, new_bol, igvn); > 569: // As part of range check smearing, this range check is widen. Loads and range check Cast nodes that are control Suggestion: // As part of range check smearing, this range check is widened. Loads and range check Cast nodes that are control src/hotspot/share/opto/loopPredicate.cpp line 1300: > 1298: // Eliminate the old If in the loop body > 1299: // If a range check is eliminated, data dependent nodes (Load and range check CastII nodes) are now dependent on 2 > 1300: // range check predicates (one for the start of the loop, one for the end) but we can only keep track of one control To follow the naming conventions added by the changes around JDK-8288981: Suggestion: // Hoisted Check Predicates (one for the start of the loop, one for the end) but we can only keep track of one control src/hotspot/share/opto/loopopts.cpp line 356: > 354: _igvn.replace_input_of(cd, 0, prevdom); > 355: if (pin_array_nodes) { > 356: // Because of range check predication, Loads and range check Cast nodes that are control dependent on this range Loop Predication? Suggestion: // Because of Loop Predication, Loads and range check Cast nodes that are control dependent on this range src/hotspot/share/opto/loopopts.cpp line 357: > 355: if (pin_array_nodes) { > 356: // Because of range check predication, Loads and range check Cast nodes that are control dependent on this range > 357: // check (that is about to be removed) now depend on multiple dominating range check predicates. After the Suggestion: // check (that is about to be removed) now depend on multiple dominating Hoisted Check Predicates. 
After the src/hotspot/share/opto/node.hpp line 1140: > 1138: // Returns a clone of the current node that's pinned (if the current node is not) for nodes found in array accesses > 1139: // (Load and range check CastII nodes). > 1140: // This is used when an array access is made dependent on 2 or more range checks (range check smearing or predication). Suggestion: // This is used when an array access is made dependent on 2 or more range checks (range check smearing or Loop Predication). ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/16886#pullrequestreview-1810631648 PR Review Comment: https://git.openjdk.org/jdk/pull/16886#discussion_r1445772859 PR Review Comment: https://git.openjdk.org/jdk/pull/16886#discussion_r1445770144 PR Review Comment: https://git.openjdk.org/jdk/pull/16886#discussion_r1445770896 PR Review Comment: https://git.openjdk.org/jdk/pull/16886#discussion_r1445771288 PR Review Comment: https://git.openjdk.org/jdk/pull/16886#discussion_r1445772291 From thartmann at openjdk.org Tue Jan 9 08:51:31 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 9 Jan 2024 08:51:31 GMT Subject: RFR: 8322077: Add Ideal transformation: (~a) | (~b) => ~(a & b) [v11] In-Reply-To: References: Message-ID: On Tue, 9 Jan 2024 06:06:51 GMT, Zhiqiang Zang wrote: >> Hello, >> >> (~a) | (~b) => ~(a & b) is a widely seen pattern, for example it is implemented for LLVM [here](https://github.com/llvm/llvm-project/blob/397f1ce9efb4eea1ee10fe4833f733b8c7abd878/llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp#L1617C28-L1617C28); however it is missing in current implementation of hotspot. This pull request adds this transformation and associated tests. >> >> Thanks. > > Zhiqiang Zang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains 14 additional commits since the last revision: > > - adapt to new changes from the dependant pr. > - Merge branch 'andnode-PNewDeMorganLawAndToOr' into ornode-PNewDeMorganLawOrToAnd > - adapt changes from the dependent pr. > - Merge branch 'andnode-PNewDeMorganLawAndToOr' into ornode-PNewDeMorganLawOrToAnd > - Update test/hotspot/jtreg/compiler/c2/irTests/DeMorganLawLongTests.java > > Co-authored-by: Tobias Hartmann > - Update test/hotspot/jtreg/compiler/c2/irTests/DeMorganLawIntTests.java > > Co-authored-by: Tobias Hartmann > - Add tests for using De Morgan's Law for both optimizations. > - remove unused code from tests. > - update the copyright dates. > - address comments. > - ... and 4 more: https://git.openjdk.org/jdk/compare/2acdb5e1...b21e242b Marked as reviewed by thartmann (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/16334#pullrequestreview-1810664512 From epeter at openjdk.org Tue Jan 9 08:52:00 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 9 Jan 2024 08:52:00 GMT Subject: RFR: 8315361: C2 SuperWord: refactor out loop analysis into shared auto-vectorization facility VLoopAnalyzer [v3] In-Reply-To: References: Message-ID: > This is a refactoring of `SuperWord`. > I intend to push it for JDK23, after [this bug fix](https://github.com/openjdk/jdk/pull/14785). > > **Goals** > > 1. Clean up `SuperWord`: disentangle different components, make them more **modular**. > 2. Make the loop analysis parts a **shared facility**, not just for SuperWord but also the post-loop-vectorizer ([JDK-8308994](https://bugs.openjdk.org/browse/JDK-8308994)). > 3. It is also a necessary step on my bigger plans for improvement with the C2 Auto-Vectorizer ([see my blog post](https://eme64.github.io/blog/2023/11/03/C2-AutoVectorizer-Improvement-Ideas.html)). > 4. Improve tracing in the auto-vectorization by making it more systematic. 
> > **Summary** > > - I wrote a summary of how C2 auto-vectorization with SuperWord works (please read!): > https://github.com/openjdk/jdk/blob/95fd361e60fc66eb91edad321662e508b2d1bdde/src/hotspot/share/opto/superword.hpp#L32-L177 > - I moved many `Superword` components out to `VLoop` and its subclass `VLoopAnalyzer`. The idea is that any vectorizer can use these facilities in the future. They are therefore made more modular, which should hopefully make future changes easier. These components are: > - Checking the pre-conditions for vectorization (e.g. no unwanted ctrl-flow). > - `VLoop::check_preconditions_helper` replaces code from old `SuperWord::transform_loop`. > - Running all submodules of `VLoopAnalyzer`: `VLoopAnalyzer::analyze_helper`. Replaces analysis part of `SuperWord::SLP_extract`. > - Finding and marking reductions -> `VLoopReductions` > - Detecting memory slices -> `VLoopMemorySlices` > - Analyzing the body -> `VLoopBody` (renamed `in_bb` -> `in_body`) > - Determining vector element types, and functions to determine the `vector_width` of a node -> `VLoopTypes` > - Constructing the dependence graph -> `VLoopDependenceGraph`. Replaces old `DepGraph` with all its components. > - New: CompileCommand option `TraceAutovectorization` > - Run with `-XX:CompileCommand=traceAutovectorization,*::*,help` to get a usage description. > - Replaced all printing with flags `TraceSuperWord` (and `Verbose`) and of `VectorizeDebug`. > - The advantage of a CompileCommand is that tracing can be applied selectively for only a limited set of java classes / methods. > - It uses tags, which are more readable than the `VectorizeDebug` bit-flags. These tags can be used for all parts of the vectorizer, but one can also target SuperWord specifically. > - I systematically added tracing at every point where vector... 
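As a rough illustration of the kind of loops the analysis components listed above are concerned with (element-wise bodies, reductions, vector element types), here is a minimal Java sketch. The class and method names are made up for this example and do not come from the PR. Whether C2 actually vectorizes either loop depends on the platform and the JVM flags in use.

```java
public class VectorizableLoopsSketch {
    // Element-wise loop: each iteration reads and writes distinct array elements.
    // Assumes a and b are at least as long as out.
    static void addArrays(int[] a, int[] b, int[] out) {
        for (int i = 0; i < out.length; i++) {
            out[i] = a[i] + b[i];
        }
    }

    // Reduction loop: the accumulator carries a dependence across iterations.
    static int sum(int[] a) {
        int total = 0;
        for (int i = 0; i < a.length; i++) {
            total += a[i];
        }
        return total;
    }

    public static void main(String[] args) {
        int n = 1024;
        int[] a = new int[n];
        int[] b = new int[n];
        int[] out = new int[n];
        for (int i = 0; i < n; i++) {
            a[i] = i;
            b[i] = 2 * i;
        }
        // Repeat the calls so the JIT has a chance to compile both methods.
        int checksum = 0;
        for (int rep = 0; rep < 10_000; rep++) {
            addArrays(a, b, out);
            checksum = sum(out);
        }
        System.out.println(checksum);
    }
}
```

The description names `-XX:CompileCommand=traceAutovectorization,*::*,help` as the way to get the usage text for the tracing tags. No tags are listed here because the quoted text is truncated before naming most of them.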
Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: error state for align vector ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16620/files - new: https://git.openjdk.org/jdk/pull/16620/files/e876d845..0831bb59 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16620&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16620&range=01-02 Stats: 25 lines in 2 files changed: 19 ins; 1 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/16620.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16620/head:pull/16620 PR: https://git.openjdk.org/jdk/pull/16620 From thartmann at openjdk.org Tue Jan 9 09:08:26 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 9 Jan 2024 09:08:26 GMT Subject: RFR: 8322589: Add Ideal transformation: (~a) & (~b) => ~(a | b) [v5] In-Reply-To: References: <-pwjKEB97C-bM068JQN0PY1hl65IzcuQZfHzRoKu92g=.d6118773-20d2-46cd-9284-5168c9334bb5@github.com> Message-ID: On Tue, 9 Jan 2024 05:59:26 GMT, Zhiqiang Zang wrote: >> src/hotspot/share/opto/addnode.hpp line 84: >> >>> 82: // Utility function to check if the given node is a NOT operation, >>> 83: // i.e., n == m ^ (-1). >>> 84: static bool is_not(PhaseGVN* phase, Node* n, BasicType bt); >> >> Could these be made non-static? > > @TobiHartmann @eme64 > I moved `is_not` but I was not able to move `make_not` to the `node` class, because otherwise it would not compile for arm, s390x, ppc64le. > > /home/runner/work/jdk/jdk/src/hotspot/share/opto/node.cpp:1605:18: error: expected type-specifier before 'XorINode' > 1605 | return new XorINode(this, phase->intcon(-1)); > > I do not see any similar use of `new XorINode` in `node.cpp`, so I was hesitant to include new header files in that file. > Please let me know if we still want to move `make_not`. Thanks. I would say it's better to leave both methods as static methods then, for consistency. Thanks for giving it a try!
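For readers less familiar with the identities behind this transformation, the following is a small Java sketch that checks them on a handful of values. It only exercises Java's bitwise operators; it says nothing about how `is_not` or `make_not` are implemented in C2, and the class and method names are invented for this example.

```java
public class DeMorganSketch {
    // Bitwise NOT written as XOR with all ones, the shape the quoted
    // is_not comment describes: n == m ^ (-1).
    static int notViaXor(int m) {
        return m ^ -1;
    }

    // (~a) & (~b) == ~(a | b)
    static boolean andOfNotsHolds(int a, int b) {
        return ((~a) & (~b)) == ~(a | b);
    }

    // (~a) | (~b) == ~(a & b)
    static boolean orOfNotsHolds(int a, int b) {
        return ((~a) | (~b)) == ~(a & b);
    }

    public static void main(String[] args) {
        int[] samples = {0, 1, -1, 42, Integer.MIN_VALUE, Integer.MAX_VALUE};
        for (int a : samples) {
            for (int b : samples) {
                if (notViaXor(a) != ~a || !andOfNotsHolds(a, b) || !orOfNotsHolds(a, b)) {
                    throw new AssertionError("identity failed for " + a + ", " + b);
                }
            }
        }
        System.out.println("identities hold for all sampled pairs");
    }
}
```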
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16333#discussion_r1445810287 From thartmann at openjdk.org Tue Jan 9 09:12:23 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 9 Jan 2024 09:12:23 GMT Subject: RFR: JDK-8277869: Maven POMs are using HTTP links where HTTPS is available In-Reply-To: References: Message-ID: On Mon, 8 Jan 2024 10:29:38 GMT, Tobias Holenstein wrote: > Replace `http` with `https` in the XML files of IdealGraphVisualizer and LogCompilation. > Moreover, replace the old URL (./maven-v4_0_0.xsd) with the suggested one (./maven-4.0.0.xsd) for `xsi:schemaLocation` in pom.xml. > > Tested: IdealGraphVisualizer and LogCompilation build and run as expected. Looks good to me too. ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/17302#pullrequestreview-1810703150 From thartmann at openjdk.org Tue Jan 9 09:18:23 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 9 Jan 2024 09:18:23 GMT Subject: RFR: 8322858: compiler/c2/aarch64/TestFarJump.java fails on AArch64 due to unexpected PrintAssembly output In-Reply-To: <2CaDCuUHVZ7kwRvW-aKzLtsSM0yJ5crh-DZ5QbHevRU=.28845f90-4c77-4875-b88d-8d5af411ea17@github.com> References: <2CaDCuUHVZ7kwRvW-aKzLtsSM0yJ5crh-DZ5QbHevRU=.28845f90-4c77-4875-b88d-8d5af411ea17@github.com> Message-ID: On Fri, 5 Jan 2024 11:32:16 GMT, Boris Ulasevich wrote: > Test checks for the ADRP instruction in the [Exception Handler] section of the TestFarJump::main PrintAssembly output. > > The -XX:CompileOnly=TestFarJump::main option is not sufficient to restrict PrintAssembly output to a specific method.
A number of method assemblies precede the output of the TestFarJump::main method: > - java.lang.invoke.MethodHandle::linkToStatic(LLLLLLL)L > - java.lang.invoke.MethodHandle::linkToStatic(LLLL)V > - java.lang.invoke.MethodHandle::invokeBasic(LLLLLL)L > - java.lang.invoke.MethodHandle::linkToSpecial(LLLLLLLL)L > - java.lang.invoke.MethodHandle::linkToSpecial(LLLLL)V > - java.lang.invoke.MethodHandle::invokeBasic(LLL)L > - java.lang.invoke.MethodHandle::linkToSpecial(LLLLL)L > - java.lang.invoke.MethodHandle::linkToStatic(LL)L > - java.lang.invoke.MethodHandle::linkToSpecial(LL)V > - java.lang.invoke.MethodHandle::invokeBasic()L > - java.lang.invoke.MethodHandle::linkToSpecial(LL)L > - java.lang.invoke.MethodHandle::linkToStatic(LLL)L > - java.lang.invoke.MethodHandle::linkToStatic(LLLL)L > - java.lang.invoke.MethodHandle::linkToSpecial(LLL)V > - java.lang.invoke.MethodHandle::invokeBasic(L)L > - java.lang.invoke.MethodHandle::linkToSpecial(LLL)L > - java.lang.invoke.MethodHandle::linkToStatic(LLL)V > - java.lang.invoke.MethodHandle::linkToStatic(LL)I > - jdk.internal.vm.Continuation::enterSpecial > - compiler.c2.aarch64.TestFarJump::main > > With this change, I use the -XX:CompileCommand=option,TestFarJump::main,bool,PrintAssembly,true option to restrict the PrintAssembly output to a specific method. Now the only [Exception Handler] in the listing is the section of TestFarJump::main method. Looks good to me too. ------------- Marked as reviewed by thartmann (Reviewer). 
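To make the failure mode in the quoted description concrete, here is a hedged Java sketch of a scan that inspects only the first `[Exception Handler]` section of a listing. It is not the code of `TestFarJump.java`. The listings are invented and heavily simplified, the `[Made-Up Next Section]` marker is a placeholder, and the assumption that a line starting with `[` opens a new section is mine.

```java
import java.util.List;

public class ExceptionHandlerScanSketch {
    static final String MARKER = "[Exception Handler]";

    // Counts how many exception handler sections appear in the listing.
    static long countHandlerSections(List<String> lines) {
        return lines.stream().filter(l -> l.contains(MARKER)).count();
    }

    // Looks only at the FIRST handler section: scans from the marker until the
    // next line that starts with '[' (assumed here to open another section).
    static boolean firstHandlerContains(List<String> lines, String mnemonic) {
        boolean inHandler = false;
        for (String line : lines) {
            String t = line.trim();
            if (!inHandler) {
                inHandler = t.contains(MARKER);
            } else if (t.startsWith("[")) {
                return false; // left the first handler without a match
            } else if (t.contains(mnemonic)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Hypothetical listings; real PrintAssembly output looks different.
        List<String> restricted = List.of(
            "[Exception Handler]",
            "  adrp <operands>",
            "[Made-Up Next Section]");
        List<String> unrestricted = List.of(
            "[Exception Handler]",   // belongs to some other method printed first
            "  b <operands>",
            "[Made-Up Next Section]",
            "[Exception Handler]",   // the method the check cares about
            "  adrp <operands>");
        System.out.println(firstHandlerContains(restricted, "adrp"));
        System.out.println(firstHandlerContains(unrestricted, "adrp"));
        System.out.println(countHandlerSections(unrestricted));
    }
}
```

Under these assumptions the scan succeeds on the single-method listing and misses the instruction when another method's handler is printed first, which is why limiting the output to one method makes such a check simpler.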
PR Review: https://git.openjdk.org/jdk/pull/17278#pullrequestreview-1810713559 From roland at openjdk.org Tue Jan 9 09:28:01 2024 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 9 Jan 2024 09:28:01 GMT Subject: RFR: 8319793: C2 compilation fails with "Bad graph detected in build_loop_late" after JDK-8279888 [v12] In-Reply-To: References: Message-ID: <7WZcrJtRFhNaGiw4c_ov_XW-dWcJ5-GdRy_8Vh2ikWA=.0cdfcad8-eb9b-43ee-b160-ad96f698a0b6@github.com> > Range check smearing and range check predication make an array access > dependent on 2 (or more in the case of RC smearing) conditions. As a > consequence, if a range check can be eliminated because there's an > identical dominating range check, the control dependent nodes that > could float and become dependent on the dominating range check cannot > be allowed to float because there's a risk that they would then bypass > one of the checks that make the access legal. > > `IfNode::dominated_by()` and `PhaseIdealLoop::dominated_by()` have > logic to prevent this: nodes that are control dependent on a range > check or predicate are not allowed to float. This is however not > sufficient as demonstrated by the test cases. > > In `TestArrayAccessAboveRCAfterSmearingOrPredication.testRangeCheckSmearing()`: > > > v += array[i]; > if (flag2) { > if (flag3) { > field = 0x42; > } > } > if (flagField == 1) { > v += array[i]; > } > > > The range check for the second `array[i]` load is replaced by the > dominating range check for the first `array[i]` but because the second > `array[i]` load could really be dependent on multiple range checks (in > case smearing happened which is not the case here), c2 doesn't allow > the second `array[i]` to float when the second range check is > removed. The second `array[i]` is then control dependent on: > > > if (flagField == 1) { > > > which is next found to be dominated by the same test: > > > if (flag == 1) { > > > and is removed. 
However nothing in `dominated_by()` treats node > dependent on tests that are not range check or predicates > specially. So the second `array[i]` is allowed to float and become > dependent on: > > > if (flag == 1) { > > > which is above the range check for that access. The test method in its > last invocation is passed an index for the array access that's widely > out of range. The array load happens before the range check and > crashes the VM. `testLoopPredication()` is a similar test where array > loads become dependent on predicates and end up above range checks. > > `TestArrayAccessCastIIAboveRC.java` is the test case from the bug > where for similar reasons a range check `CastII` ends up above its > range check, becomes top because its input becomes some integer that > conflicts with its type (but there's no condition to catch it). The > graph becomes broken and c2 crashes. > > Logic in the `dominated_by()` methods ... Roland Westrelin has updated the pull request incrementally with five additional commits since the last revision: - Update src/hotspot/share/opto/ifnode.cpp Co-authored-by: Christian Hagedorn - Update src/hotspot/share/opto/node.hpp Co-authored-by: Christian Hagedorn - Update src/hotspot/share/opto/loopopts.cpp Co-authored-by: Christian Hagedorn - Update src/hotspot/share/opto/loopopts.cpp Co-authored-by: Christian Hagedorn - Update src/hotspot/share/opto/loopPredicate.cpp Co-authored-by: Christian Hagedorn ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16886/files - new: https://git.openjdk.org/jdk/pull/16886/files/51231631..04a9d3a5 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16886&range=11 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16886&range=10-11 Stats: 5 lines in 4 files changed: 0 ins; 0 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/16886.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16886/head:pull/16886 PR: https://git.openjdk.org/jdk/pull/16886 From chagedorn at openjdk.org Tue Jan 
9 09:48:31 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 9 Jan 2024 09:48:31 GMT Subject: RFR: 8319793: C2 compilation fails with "Bad graph detected in build_loop_late" after JDK-8279888 [v12] In-Reply-To: <7WZcrJtRFhNaGiw4c_ov_XW-dWcJ5-GdRy_8Vh2ikWA=.0cdfcad8-eb9b-43ee-b160-ad96f698a0b6@github.com> References: <7WZcrJtRFhNaGiw4c_ov_XW-dWcJ5-GdRy_8Vh2ikWA=.0cdfcad8-eb9b-43ee-b160-ad96f698a0b6@github.com> Message-ID: <3tBQwAHsTPlkltypH11S3rj0Ptxaa1kZHTWFZMyWJYY=.19ccf21d-9029-4f77-928f-c9ef823e6b92@github.com> On Tue, 9 Jan 2024 09:28:01 GMT, Roland Westrelin wrote: >> Range check smearing and range check predication make an array access >> dependent on 2 (or more in the case of RC smearing) conditions. As a >> consequence, if a range check can be eliminated because there's an >> identical dominating range check, the control dependent nodes that >> could float and become dependent on the dominating range check cannot >> be allowed to float because there's a risk that they would then bypass >> one of the checks that make the access legal. >> >> `IfNode::dominated_by()` and `PhaseIdealLoop::dominated_by()` have >> logic to prevent this: nodes that are control dependent on a range >> check or predicate are not allowed to float. This is however not >> sufficient as demonstrated by the test cases. >> >> In `TestArrayAccessAboveRCAfterSmearingOrPredication.testRangeCheckSmearing()`: >> >> >> v += array[i]; >> if (flag2) { >> if (flag3) { >> field = 0x42; >> } >> } >> if (flagField == 1) { >> v += array[i]; >> } >> >> >> The range check for the second `array[i]` load is replaced by the >> dominating range check for the first `array[i]` but because the second >> `array[i]` load could really be dependent on multiple range checks (in >> case smearing happened which is not the case here), c2 doesn't allow >> the second `array[i]` to float when the second range check is >> removed. 
The second `array[i]` is then control dependent on: >> >> >> if (flagField == 1) { >> >> >> which is next found to be dominated by the same test: >> >> >> if (flag == 1) { >> >> >> and is removed. However nothing in `dominated_by()` treats node >> dependent on tests that are not range check or predicates >> specially. So the second `array[i]` is allowed to float and become >> dependent on: >> >> >> if (flag == 1) { >> >> >> which is above the range check for that access. The test method in its >> last invocation is passed an index for the array access that's widely >> out of range. The array load happens before the range check and >> crashes the VM. `testLoopPredication()` is a similar test where array >> loads become dependent on predicates and end up above range checks. >> >> `TestArrayAccessCastIIAboveRC.java` is the test case from the bug >> where for similar reasons a range check `CastII` ends up above its >> range check, becomes top because its input becomes some integer that >> conflicts with its... > > Roland Westrelin has updated the pull request incrementally with five additional commits since the last revision: > > - Update src/hotspot/share/opto/ifnode.cpp > > Co-authored-by: Christian Hagedorn > - Update src/hotspot/share/opto/node.hpp > > Co-authored-by: Christian Hagedorn > - Update src/hotspot/share/opto/loopopts.cpp > > Co-authored-by: Christian Hagedorn > - Update src/hotspot/share/opto/loopopts.cpp > > Co-authored-by: Christian Hagedorn > - Update src/hotspot/share/opto/loopPredicate.cpp > > Co-authored-by: Christian Hagedorn Marked as reviewed by chagedorn (Reviewer). 
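The guarantee that the quoted description is protecting can be shown at the Java level with a short sketch. The method below is only loosely modeled on the quoted fragment: the enclosing method, its parameters, and the class name are guesses made for illustration, not the real jtreg test. It does not reproduce the bug; it shows the behavior a correct JVM must keep after compilation, namely that an out-of-range index raises an exception instead of letting the load run ahead of its range check.

```java
public class RangeCheckSketch {
    static int field;
    static int flagField;

    // Two loads of array[i]; the second one sits under a separate test,
    // similar in shape to the quoted snippet (hypothetical reconstruction).
    static int twoLoads(int[] array, int i, int flag, boolean flag2, boolean flag3) {
        flagField = flag;
        int v = 0;
        if (flag == 1) {
            v += array[i];
            if (flag2) {
                if (flag3) {
                    field = 0x42;
                }
            }
            if (flagField == 1) {
                v += array[i];
            }
        }
        return v;
    }

    // Returns true if the access with index i is rejected with the expected exception.
    static boolean throwsOutOfBounds(int[] array, int i) {
        try {
            twoLoads(array, i, 1, false, false);
            return false;
        } catch (ArrayIndexOutOfBoundsException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        int[] array = new int[100];
        // Warm up with in-range indices so the method is likely to get compiled.
        for (int n = 0; n < 20_000; n++) {
            twoLoads(array, n % array.length, 1, (n & 1) == 0, false);
        }
        // An index far out of range must still be caught by the range check.
        System.out.println(throwsOutOfBounds(array, Integer.MAX_VALUE));
    }
}
```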
------------- PR Review: https://git.openjdk.org/jdk/pull/16886#pullrequestreview-1810774372 From epeter at openjdk.org Tue Jan 9 10:12:53 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 9 Jan 2024 10:12:53 GMT Subject: RFR: 8315361: C2 SuperWord: refactor out loop analysis into shared auto-vectorization facility VLoopAnalyzer [v4] In-Reply-To: References: Message-ID: > This is a refactoring of `SuperWord`. > I intend to push it for JDK23, after [this bug fix](https://github.com/openjdk/jdk/pull/14785). > > **Goals** > > 1. Clean up `SuperWord`: disentangle different components, make them more **modular**. > 2. Make the loop analysis parts a **shared facility**, not just for SuperWord but also the post-loop-vectorizer ([JDK-8308994](https://bugs.openjdk.org/browse/JDK-8308994)). > 3. It is also a necessary step on my bigger plans for improvement with the C2 Auto-Vectorizer ([see my blog post](https://eme64.github.io/blog/2023/11/03/C2-AutoVectorizer-Improvement-Ideas.html)). > 4. Improve tracing in the auto-vectorization by making it more systematic. > > **Summary** > > - I wrote a summary of how C2 auto-vectorization with SuperWord works (please read!): > https://github.com/openjdk/jdk/blob/95fd361e60fc66eb91edad321662e508b2d1bdde/src/hotspot/share/opto/superword.hpp#L32-L177 > - I moved many `Superword` components out to `VLoop` and its subclass `VLoopAnalyzer`. The idea is that any vectorizer can use these facilities in the future. They are therefore made more modular, which should hopefully make future changes easier. These components are: > - Checking the pre-conditions for vectorization (e.g. no unwanted ctrl-flow). > - `VLoop::check_preconditions_helper` replaces code from old `SuperWord::transform_loop`. > - Running all submodules of `VLoopAnalyzer`: `VLoopAnalyzer::analyze_helper`. Replaces analysis part of `SuperWord::SLP_extract`. 
> - Finding and marking reductions -> `VLoopReductions` > - Detecting memory slices -> `VLoopMemorySlices` > - Analyzing the body -> `VLoopBody` (renamed `in_bb` -> `in_body`) > - Determining vector element types, and functions to determine the `vector_width` of a node -> `VLoopTypes` > - Constructing the dependence graph -> `VLoopDependenceGraph`. Replaces old `DepGraph` with all its components. > - New: CompileCommand option `TraceAutovectorization` > - Run with `-XX:CompileCommand=traceAutovectorization,*::*,help` to get a usage description. > - Replaced all printing with flags `TraceSuperWord` (and `Verbose`) and of `VectorizeDebug`. > - The advantage of a CompileCommand is that tracing can be applied selectively for only a limited set of java classes / methods. > - It uses tags, which are more readable than the `VectorizeDebug` bit-flags. These tags can be used for all parts of the vectorizer, but one can also target SuperWord specifically. > - I systematically added tracing at every point where vector... 
Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: move superword tracing ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16620/files - new: https://git.openjdk.org/jdk/pull/16620/files/0831bb59..99b577bd Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16620&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16620&range=02-03 Stats: 94 lines in 4 files changed: 23 ins; 32 del; 39 mod Patch: https://git.openjdk.org/jdk/pull/16620.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16620/head:pull/16620 PR: https://git.openjdk.org/jdk/pull/16620 From bulasevich at openjdk.org Tue Jan 9 10:36:30 2024 From: bulasevich at openjdk.org (Boris Ulasevich) Date: Tue, 9 Jan 2024 10:36:30 GMT Subject: RFR: 8322858: compiler/c2/aarch64/TestFarJump.java fails on AArch64 due to unexpected PrintAssembly output In-Reply-To: <2CaDCuUHVZ7kwRvW-aKzLtsSM0yJ5crh-DZ5QbHevRU=.28845f90-4c77-4875-b88d-8d5af411ea17@github.com> References: <2CaDCuUHVZ7kwRvW-aKzLtsSM0yJ5crh-DZ5QbHevRU=.28845f90-4c77-4875-b88d-8d5af411ea17@github.com> Message-ID: On Fri, 5 Jan 2024 11:32:16 GMT, Boris Ulasevich wrote: > Test checks for the ADRP instruction in the [Exception Handler] section of the TestFarJump::main PrintAssembly output. > > The -XX:CompileOnly=TestFarJump::main option is not sufficient to restrict PrintAssembly output to a specific method. 
A number of method assemblies precede the output of the TestFarJump::main method: > - java.lang.invoke.MethodHandle::linkToStatic(LLLLLLL)L > - java.lang.invoke.MethodHandle::linkToStatic(LLLL)V > - java.lang.invoke.MethodHandle::invokeBasic(LLLLLL)L > - java.lang.invoke.MethodHandle::linkToSpecial(LLLLLLLL)L > - java.lang.invoke.MethodHandle::linkToSpecial(LLLLL)V > - java.lang.invoke.MethodHandle::invokeBasic(LLL)L > - java.lang.invoke.MethodHandle::linkToSpecial(LLLLL)L > - java.lang.invoke.MethodHandle::linkToStatic(LL)L > - java.lang.invoke.MethodHandle::linkToSpecial(LL)V > - java.lang.invoke.MethodHandle::invokeBasic()L > - java.lang.invoke.MethodHandle::linkToSpecial(LL)L > - java.lang.invoke.MethodHandle::linkToStatic(LLL)L > - java.lang.invoke.MethodHandle::linkToStatic(LLLL)L > - java.lang.invoke.MethodHandle::linkToSpecial(LLL)V > - java.lang.invoke.MethodHandle::invokeBasic(L)L > - java.lang.invoke.MethodHandle::linkToSpecial(LLL)L > - java.lang.invoke.MethodHandle::linkToStatic(LLL)V > - java.lang.invoke.MethodHandle::linkToStatic(LL)I > - jdk.internal.vm.Continuation::enterSpecial > - compiler.c2.aarch64.TestFarJump::main > > With this change, I use the -XX:CompileCommand=option,TestFarJump::main,bool,PrintAssembly,true option to restrict the PrintAssembly output to a specific method. Now the only [Exception Handler] in the listing is the section of TestFarJump::main method. Tobias and Andrew, thank you! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/17278#issuecomment-1882814890 From bulasevich at openjdk.org Tue Jan 9 10:36:31 2024 From: bulasevich at openjdk.org (Boris Ulasevich) Date: Tue, 9 Jan 2024 10:36:31 GMT Subject: Integrated: 8322858: compiler/c2/aarch64/TestFarJump.java fails on AArch64 due to unexpected PrintAssembly output In-Reply-To: <2CaDCuUHVZ7kwRvW-aKzLtsSM0yJ5crh-DZ5QbHevRU=.28845f90-4c77-4875-b88d-8d5af411ea17@github.com> References: <2CaDCuUHVZ7kwRvW-aKzLtsSM0yJ5crh-DZ5QbHevRU=.28845f90-4c77-4875-b88d-8d5af411ea17@github.com> Message-ID: <9QSoSVKI_YVlpOE47Akaz_gV9EeML20B-AlXu9CpGVY=.4bfab3ea-9719-4dce-b647-deb87b6ed107@github.com> On Fri, 5 Jan 2024 11:32:16 GMT, Boris Ulasevich wrote: > Test checks for the ADRP instruction in the [Exception Handler] section of the TestFarJump::main PrintAssembly output. > > The -XX:CompileOnly=TestFarJump::main option is not sufficient to restrict PrintAssembly output to a specific method. A number of method assemblies precede the output of the TestFarJump::main method: > - java.lang.invoke.MethodHandle::linkToStatic(LLLLLLL)L > - java.lang.invoke.MethodHandle::linkToStatic(LLLL)V > - java.lang.invoke.MethodHandle::invokeBasic(LLLLLL)L > - java.lang.invoke.MethodHandle::linkToSpecial(LLLLLLLL)L > - java.lang.invoke.MethodHandle::linkToSpecial(LLLLL)V > - java.lang.invoke.MethodHandle::invokeBasic(LLL)L > - java.lang.invoke.MethodHandle::linkToSpecial(LLLLL)L > - java.lang.invoke.MethodHandle::linkToStatic(LL)L > - java.lang.invoke.MethodHandle::linkToSpecial(LL)V > - java.lang.invoke.MethodHandle::invokeBasic()L > - java.lang.invoke.MethodHandle::linkToSpecial(LL)L > - java.lang.invoke.MethodHandle::linkToStatic(LLL)L > - java.lang.invoke.MethodHandle::linkToStatic(LLLL)L > - java.lang.invoke.MethodHandle::linkToSpecial(LLL)V > - java.lang.invoke.MethodHandle::invokeBasic(L)L > - java.lang.invoke.MethodHandle::linkToSpecial(LLL)L > - java.lang.invoke.MethodHandle::linkToStatic(LLL)V > - 
java.lang.invoke.MethodHandle::linkToStatic(LL)I > - jdk.internal.vm.Continuation::enterSpecial > - compiler.c2.aarch64.TestFarJump::main > > With this change, I use the -XX:CompileCommand=option,TestFarJump::main,bool,PrintAssembly,true option to restrict the PrintAssembly output to a specific method. Now the only [Exception Handler] in the listing is the section of TestFarJump::main method. This pull request has now been integrated. Changeset: 52a6c375 Author: Boris Ulasevich URL: https://git.openjdk.org/jdk/commit/52a6c37558fa970f595067bc1bb5bc2b710c3876 Stats: 3 lines in 1 file changed: 0 ins; 1 del; 2 mod 8322858: compiler/c2/aarch64/TestFarJump.java fails on AArch64 due to unexpected PrintAssembly output Reviewed-by: aph, thartmann ------------- PR: https://git.openjdk.org/jdk/pull/17278 From epeter at openjdk.org Tue Jan 9 10:49:57 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 9 Jan 2024 10:49:57 GMT Subject: RFR: 8315361: C2 SuperWord: refactor out loop analysis into shared auto-vectorization facility VLoopAnalyzer [v5] In-Reply-To: References: Message-ID: > This is a refactoring of `SuperWord`. > > **Goals** > > 1. Clean up `SuperWord`: disentangle different components, make them more **modular**. > 2. Make the loop analysis parts a **shared facility**, not just for SuperWord but also the post-loop-vectorizer ([JDK-8308994](https://bugs.openjdk.org/browse/JDK-8308994)). > 3. It is also a necessary step on my bigger plans for improvement with the C2 Auto-Vectorizer ([see my blog post](https://eme64.github.io/blog/2023/11/03/C2-AutoVectorizer-Improvement-Ideas.html)). > 4. Improve tracing in the auto-vectorization by making it more systematic. 
> > **Summary** > > - I wrote a summary of how C2 auto-vectorization with SuperWord works (please read!): > https://github.com/openjdk/jdk/blob/95fd361e60fc66eb91edad321662e508b2d1bdde/src/hotspot/share/opto/superword.hpp#L32-L177 > - I moved many `Superword` components out to `VLoop` and its subclass `VLoopAnalyzer`. The idea is that any vectorizer can use these facilities in the future. They are therefore made more modular, which should hopefully make future changes easier. These components are: > - Checking the pre-conditions for vectorization (e.g. no unwanted ctrl-flow). > - `VLoop::check_preconditions_helper` replaces code from old `SuperWord::transform_loop`. > - Running all submodules of `VLoopAnalyzer`: `VLoopAnalyzer::analyze_helper`. Replaces analysis part of `SuperWord::SLP_extract`. > - Finding and marking reductions -> `VLoopReductions` > - Detecting memory slices -> `VLoopMemorySlices` > - Analyzing the body -> `VLoopBody` (renamed `in_bb` -> `in_body`) > - Determining vector element types, and functions to determine the `vector_width` of a node -> `VLoopTypes` > - Constructing the dependence graph -> `VLoopDependenceGraph`. Replaces old `DepGraph` with all its components. > - New: CompileCommand option `TraceAutovectorization` > - Run with `-XX:CompileCommand=traceAutovectorization,*::*,help` to get a usage description. > - Replaced all printing with flags `TraceSuperWord` (and `Verbose`) and of `VectorizeDebug`. > - The advantage of a CompileCommand is that tracing can be applied selectively for only a limited set of java classes / methods. > - It uses tags, which are more readable than the `VectorizeDebug` bit-flags. These tags can be used for all parts of the vectorizer, but one can also target SuperWord specifically. > - I systematically added tracing at every point where vectorization (partially) fails (use tag `SW_REJECTIONS`). > - `TraceSuperWord` still works, and perfo... 
Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: fix debug / product guards for tracing, now consistently not_product ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16620/files - new: https://git.openjdk.org/jdk/pull/16620/files/99b577bd..28e0e4e0 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16620&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16620&range=03-04 Stats: 19 lines in 3 files changed: 0 ins; 1 del; 18 mod Patch: https://git.openjdk.org/jdk/pull/16620.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16620/head:pull/16620 PR: https://git.openjdk.org/jdk/pull/16620 From epeter at openjdk.org Tue Jan 9 10:57:40 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 9 Jan 2024 10:57:40 GMT Subject: RFR: 8315361: C2 SuperWord: refactor out loop analysis into shared auto-vectorization facility VLoopAnalyzer [v6] In-Reply-To: References: Message-ID: > This is a refactoring of `SuperWord`. > > **Goals** > > 1. Clean up `SuperWord`: disentangle different components, make them more **modular**. > 2. Make the loop analysis parts a **shared facility**, not just for SuperWord but also the post-loop-vectorizer ([JDK-8308994](https://bugs.openjdk.org/browse/JDK-8308994)). > 3. It is also a necessary step on my bigger plans for improvement with the C2 Auto-Vectorizer ([see my blog post](https://eme64.github.io/blog/2023/11/03/C2-AutoVectorizer-Improvement-Ideas.html)). > 4. Improve tracing in the auto-vectorization by making it more systematic. > > **Summary** > > - I wrote a summary of how C2 auto-vectorization with SuperWord works (please read!): > https://github.com/openjdk/jdk/blob/95fd361e60fc66eb91edad321662e508b2d1bdde/src/hotspot/share/opto/superword.hpp#L32-L177 > - I moved many `Superword` components out to `VLoop` and its subclass `VLoopAnalyzer`. The idea is that any vectorizer can use these facilities in the future. 
They are therefore made more modular, which should hopefully make future changes easier. These components are: > - Checking the pre-conditions for vectorization (e.g. no unwanted ctrl-flow). > - `VLoop::check_preconditions_helper` replaces code from old `SuperWord::transform_loop`. > - Running all submodules of `VLoopAnalyzer`: `VLoopAnalyzer::analyze_helper`. Replaces analysis part of `SuperWord::SLP_extract`. > - Finding and marking reductions -> `VLoopReductions` > - Detecting memory slices -> `VLoopMemorySlices` > - Analyzing the body -> `VLoopBody` (renamed `in_bb` -> `in_body`) > - Determining vector element types, and functions to determine the `vector_width` of a node -> `VLoopTypes` > - Constructing the dependence graph -> `VLoopDependenceGraph`. Replaces old `DepGraph` with all its components. > - New: CompileCommand option `TraceAutovectorization` > - Run with `-XX:CompileCommand=traceAutovectorization,*::*,help` to get a usage description. > - Replaced all printing with flags `TraceSuperWord` (and `Verbose`) and of `VectorizeDebug`. > - The advantage of a CompileCommand is that tracing can be applied selectively for only a limited set of java classes / methods. > - It uses tags, which are more readable than the `VectorizeDebug` bit-flags. These tags can be used for all parts of the vectorizer, but one can also target SuperWord specifically. > - I systematically added tracing at every point where vectorization (partially) fails (use tag `SW_REJECTIONS`). > - `TraceSuperWord` still works, and perfo... 
Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: beautify bailout on failure state ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16620/files - new: https://git.openjdk.org/jdk/pull/16620/files/28e0e4e0..c9079656 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16620&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16620&range=04-05 Stats: 16 lines in 1 file changed: 6 ins; 5 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/16620.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16620/head:pull/16620 PR: https://git.openjdk.org/jdk/pull/16620 From epeter at openjdk.org Tue Jan 9 11:43:40 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 9 Jan 2024 11:43:40 GMT Subject: RFR: 8315361: C2 SuperWord: refactor out loop analysis into shared auto-vectorization facility VLoopAnalyzer [v7] In-Reply-To: References: Message-ID: <_ltybJEddrDgVxjogJgcbodsqPTEoeZfWTZsX1v4Jvg=.43b52e44-c1ec-42d3-b117-de9bc28432e8@github.com> > This is a refactoring of `SuperWord`. > > **Goals** > > 1. Clean up `SuperWord`: disentangle different components, make them more **modular**. > 2. Make the loop analysis parts a **shared facility**, not just for SuperWord but also the post-loop-vectorizer ([JDK-8308994](https://bugs.openjdk.org/browse/JDK-8308994)). > 3. It is also a necessary step on my bigger plans for improvement with the C2 Auto-Vectorizer ([see my blog post](https://eme64.github.io/blog/2023/11/03/C2-AutoVectorizer-Improvement-Ideas.html)). > 4. Improve tracing in the auto-vectorization by making it more systematic. > > **Summary** > > - I wrote a summary of how C2 auto-vectorization with SuperWord works (please read!): > https://github.com/openjdk/jdk/blob/95fd361e60fc66eb91edad321662e508b2d1bdde/src/hotspot/share/opto/superword.hpp#L32-L177 > - I moved many `Superword` components out to `VLoop` and its subclass `VLoopAnalyzer`. 
The idea is that any vectorizer can use these facilities in the future. They are therefore made more modular, which should hopefully make future changes easier. These components are: > - Checking the pre-conditions for vectorization (e.g. no unwanted ctrl-flow). > - `VLoop::check_preconditions_helper` replaces code from old `SuperWord::transform_loop`. > - Running all submodules of `VLoopAnalyzer`: `VLoopAnalyzer::analyze_helper`. Replaces analysis part of `SuperWord::SLP_extract`. > - Finding and marking reductions -> `VLoopReductions` > - Detecting memory slices -> `VLoopMemorySlices` > - Analyzing the body -> `VLoopBody` (renamed `in_bb` -> `in_body`) > - Determining vector element types, and functions to determine the `vector_width` of a node -> `VLoopTypes` > - Constructing the dependence graph -> `VLoopDependenceGraph`. Replaces old `DepGraph` with all its components. > - New: CompileCommand option `TraceAutovectorization` > - Run with `-XX:CompileCommand=traceAutovectorization,*::*,help` to get a usage description. > - Replaced all printing with flags `TraceSuperWord` (and `Verbose`) and of `VectorizeDebug`. > - The advantage of a CompileCommand is that tracing can be applied selectively for only a limited set of java classes / methods. > - It uses tags, which are more readable than the `VectorizeDebug` bit-flags. These tags can be used for all parts of the vectorizer, but one can also target SuperWord specifically. > - I systematically added tracing at every point where vectorization (partially) fails (use tag `SW_REJECTIONS`). > - `TraceSuperWord` still works, and perfo... 
Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: product guard for TraceSuperWordLoopUnrollAnalysis tracing ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16620/files - new: https://git.openjdk.org/jdk/pull/16620/files/c9079656..1f5d4ef2 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16620&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16620&range=05-06 Stats: 6 lines in 1 file changed: 6 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/16620.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16620/head:pull/16620 PR: https://git.openjdk.org/jdk/pull/16620 From epeter at openjdk.org Tue Jan 9 11:47:29 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 9 Jan 2024 11:47:29 GMT Subject: RFR: 8315361: C2 SuperWord: refactor out loop analysis into shared auto-vectorization facility VLoopAnalyzer [v7] In-Reply-To: <_ltybJEddrDgVxjogJgcbodsqPTEoeZfWTZsX1v4Jvg=.43b52e44-c1ec-42d3-b117-de9bc28432e8@github.com> References: <_ltybJEddrDgVxjogJgcbodsqPTEoeZfWTZsX1v4Jvg=.43b52e44-c1ec-42d3-b117-de9bc28432e8@github.com> Message-ID: On Tue, 9 Jan 2024 11:43:40 GMT, Emanuel Peter wrote: >> This is a refactoring of `SuperWord`. >> >> **Goals** >> >> 1. Clean up `SuperWord`: disentangle different components, make them more **modular**. >> 2. Make the loop analysis parts a **shared facility**, not just for SuperWord but also the post-loop-vectorizer ([JDK-8308994](https://bugs.openjdk.org/browse/JDK-8308994)). >> 3. It is also a necessary step on my bigger plans for improvement with the C2 Auto-Vectorizer ([see my blog post](https://eme64.github.io/blog/2023/11/03/C2-AutoVectorizer-Improvement-Ideas.html)). >> 4. Improve tracing in the auto-vectorization by making it more systematic. 
>> >> **Summary** >> >> - I wrote a summary of how C2 auto-vectorization with SuperWord works (please read!): >> https://github.com/openjdk/jdk/blob/95fd361e60fc66eb91edad321662e508b2d1bdde/src/hotspot/share/opto/superword.hpp#L32-L177 >> - I moved many `Superword` components out to `VLoop` and its subclass `VLoopAnalyzer`. The idea is that any vectorizer can use these facilities in the future. They are therefore made more modular, which should hopefully make future changes easier. These components are: >> - Checking the pre-conditions for vectorization (e.g. no unwanted ctrl-flow). >> - `VLoop::check_preconditions_helper` replaces code from old `SuperWord::transform_loop`. >> - Running all submodules of `VLoopAnalyzer`: `VLoopAnalyzer::analyze_helper`. Replaces analysis part of `SuperWord::SLP_extract`. >> - Finding and marking reductions -> `VLoopReductions` >> - Detecting memory slices -> `VLoopMemorySlices` >> - Analyzing the body -> `VLoopBody` (renamed `in_bb` -> `in_body`) >> - Determining vector element types, and functions to determine the `vector_width` of a node -> `VLoopTypes` >> - Constructing the dependence graph -> `VLoopDependenceGraph`. Replaces old `DepGraph` with all its components. >> - New: CompileCommand option `TraceAutovectorization` >> - Run with `-XX:CompileCommand=traceAutovectorization,*::*,help` to get a usage description. >> - Replaced all printing with flags `TraceSuperWord` (and `Verbose`) and of `VectorizeDebug`. >> - The advantage of a CompileCommand is that tracing can be applied selectively for only a limited set of java classes / methods. >> - It uses tags, which are more readable than the `VectorizeDebug` bit-flags. These tags can be used for all parts of the vectorizer, but one can also target SuperWord specifically. >> - I systematically added tracing at every point where vectorization (partially) fails (use tag `SW_R... 
> > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > product guard for TraceSuperWordLoopUnrollAnalysis tracing @fg1417 @chhagedorn I merged in my other SuperWord change (AlignVector fix), and addressed the previous comments. Would you mind reviewing (again)? ------------- PR Comment: https://git.openjdk.org/jdk/pull/16620#issuecomment-1882920281 From chagedorn at openjdk.org Tue Jan 9 13:44:25 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 9 Jan 2024 13:44:25 GMT Subject: RFR: 8315361: C2 SuperWord: refactor out loop analysis into shared auto-vectorization facility VLoopAnalyzer [v7] In-Reply-To: <_ltybJEddrDgVxjogJgcbodsqPTEoeZfWTZsX1v4Jvg=.43b52e44-c1ec-42d3-b117-de9bc28432e8@github.com> References: <_ltybJEddrDgVxjogJgcbodsqPTEoeZfWTZsX1v4Jvg=.43b52e44-c1ec-42d3-b117-de9bc28432e8@github.com> Message-ID: On Tue, 9 Jan 2024 11:43:40 GMT, Emanuel Peter wrote: >> This is a refactoring of `SuperWord`. >> >> **Goals** >> >> 1. Clean up `SuperWord`: disentangle different components, make them more **modular**. >> 2. Make the loop analysis parts a **shared facility**, not just for SuperWord but also the post-loop-vectorizer ([JDK-8308994](https://bugs.openjdk.org/browse/JDK-8308994)). >> 3. It is also a necessary step on my bigger plans for improvement with the C2 Auto-Vectorizer ([see my blog post](https://eme64.github.io/blog/2023/11/03/C2-AutoVectorizer-Improvement-Ideas.html)). >> 4. Improve tracing in the auto-vectorization by making it more systematic. >> >> **Summary** >> >> - I wrote a summary of how C2 auto-vectorization with SuperWord works (please read!): >> https://github.com/openjdk/jdk/blob/95fd361e60fc66eb91edad321662e508b2d1bdde/src/hotspot/share/opto/superword.hpp#L32-L177 >> - I moved many `Superword` components out to `VLoop` and its subclass `VLoopAnalyzer`. The idea is that any vectorizer can use these facilities in the future. 
They are therefore made more modular, which should hopefully make future changes easier. These components are: >> - Checking the pre-conditions for vectorization (e.g. no unwanted ctrl-flow). >> - `VLoop::check_preconditions_helper` replaces code from old `SuperWord::transform_loop`. >> - Running all submodules of `VLoopAnalyzer`: `VLoopAnalyzer::analyze_helper`. Replaces analysis part of `SuperWord::SLP_extract`. >> - Finding and marking reductions -> `VLoopReductions` >> - Detecting memory slices -> `VLoopMemorySlices` >> - Analyzing the body -> `VLoopBody` (renamed `in_bb` -> `in_body`) >> - Determining vector element types, and functions to determine the `vector_width` of a node -> `VLoopTypes` >> - Constructing the dependence graph -> `VLoopDependenceGraph`. Replaces old `DepGraph` with all its components. >> - New: CompileCommand option `TraceAutovectorization` >> - Run with `-XX:CompileCommand=traceAutovectorization,*::*,help` to get a usage description. >> - Replaced all printing with flags `TraceSuperWord` (and `Verbose`) and of `VectorizeDebug`. >> - The advantage of a CompileCommand is that tracing can be applied selectively for only a limited set of java classes / methods. >> - It uses tags, which are more readable than the `VectorizeDebug` bit-flags. These tags can be used for all parts of the vectorizer, but one can also target SuperWord specifically. >> - I systematically added tracing at every point where vectorization (partially) fails (use tag `SW_R... > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > product guard for TraceSuperWordLoopUnrollAnalysis tracing Sure, I'll try to have a look later this week. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/16620#issuecomment-1883071528 From epeter at openjdk.org Tue Jan 9 13:54:42 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 9 Jan 2024 13:54:42 GMT Subject: RFR: 8319793: C2 compilation fails with "Bad graph detected in build_loop_late" after JDK-8279888 [v12] In-Reply-To: <7WZcrJtRFhNaGiw4c_ov_XW-dWcJ5-GdRy_8Vh2ikWA=.0cdfcad8-eb9b-43ee-b160-ad96f698a0b6@github.com> References: <7WZcrJtRFhNaGiw4c_ov_XW-dWcJ5-GdRy_8Vh2ikWA=.0cdfcad8-eb9b-43ee-b160-ad96f698a0b6@github.com> Message-ID: <9VeaoeqFVJApx5G4UWDoZM8UqZgm4lKlDiRxxZoha5c=.62c6c314-ae6c-42da-9ebd-de9d200b39ce@github.com> On Tue, 9 Jan 2024 09:28:01 GMT, Roland Westrelin wrote: >> Range check smearing and range check predication make an array access >> dependent on 2 (or more in the case of RC smearing) conditions. As a >> consequence, if a range check can be eliminated because there's an >> identical dominating range check, the control dependent nodes that >> could float and become dependent on the dominating range check cannot >> be allowed to float because there's a risk that they would then bypass >> one of the checks that make the access legal. >> >> `IfNode::dominated_by()` and `PhaseIdealLoop::dominated_by()` have >> logic to prevent this: nodes that are control dependent on a range >> check or predicate are not allowed to float. This is however not >> sufficient as demonstrated by the test cases. 
>> >> In `TestArrayAccessAboveRCAfterSmearingOrPredication.testRangeCheckSmearing()`: >> >> >> v += array[i]; >> if (flag2) { >> if (flag3) { >> field = 0x42; >> } >> } >> if (flagField == 1) { >> v += array[i]; >> } >> >> >> The range check for the second `array[i]` load is replaced by the >> dominating range check for the first `array[i]` but because the second >> `array[i]` load could really be dependent on multiple range checks (in >> case smearing happened which is not the case here), c2 doesn't allow >> the second `array[i]` to float when the second range check is >> removed. The second `array[i]` is then control dependent on: >> >> >> if (flagField == 1) { >> >> >> which is next found to be dominated by the same test: >> >> >> if (flag == 1) { >> >> >> and is removed. However nothing in `dominated_by()` treats node >> dependent on tests that are not range check or predicates >> specially. So the second `array[i]` is allowed to float and become >> dependent on: >> >> >> if (flag == 1) { >> >> >> which is above the range check for that access. The test method in its >> last invocation is passed an index for the array access that's widely >> out of range. The array load happens before the range check and >> crashes the VM. `testLoopPredication()` is a similar test where array >> loads become dependent on predicates and end up above range checks. >> >> `TestArrayAccessCastIIAboveRC.java` is the test case from the bug >> where for similar reasons a range check `CastII` ends up above its >> range check, becomes top because its input becomes some integer that >> conflicts with its... 
> > Roland Westrelin has updated the pull request incrementally with five additional commits since the last revision: > > - Update src/hotspot/share/opto/ifnode.cpp > > Co-authored-by: Christian Hagedorn > - Update src/hotspot/share/opto/node.hpp > > Co-authored-by: Christian Hagedorn > - Update src/hotspot/share/opto/loopopts.cpp > > Co-authored-by: Christian Hagedorn > - Update src/hotspot/share/opto/loopopts.cpp > > Co-authored-by: Christian Hagedorn > - Update src/hotspot/share/opto/loopPredicate.cpp > > Co-authored-by: Christian Hagedorn @rwestrel thanks for the update, I really like the comments now! Just one more comment suggestion and a single renaming idea. Otherwise LGTM ? src/hotspot/share/opto/cfgnode.hpp line 434: > 432: static Node* up_one_dom(Node* curr, bool linear_only = false); > 433: bool is_zero_trip_guard() const; > 434: Node* dominated_by(Node* prev_dom, PhaseIterGVN* igvn, bool pin_array_nodes); Suggestion: Node* dominated_by(Node* prev_dom, PhaseIterGVN* igvn, bool pin_array_access_nodes); src/hotspot/share/opto/ifnode.cpp line 1502: > 1500: > 1501: //------------------------------dominated_by----------------------------------- > 1502: Node* IfNode::dominated_by(Node* prev_dom, PhaseIterGVN* igvn, bool pin_array_nodes) { Suggestion: Node* IfNode::dominated_by(Node* prev_dom, PhaseIterGVN* igvn, bool pin_array_access_nodes) { src/hotspot/share/opto/ifnode.cpp line 1537: > 1535: // Do not rewire Div and Mod nodes which could have a zero divisor to avoid skipping their zero check. > 1536: igvn->replace_input_of(s, 0, data_target); // Move child to data-target > 1537: if (pin_array_nodes && data_target != top) { Suggestion: if (pin_array_access_nodes && data_target != top) { src/hotspot/share/opto/loopnode.hpp line 1510: > 1508: // Mark an IfNode as being dominated by a prior test, > 1509: // without actually altering the CFG (and hence IDOM info). 
> 1510: void dominated_by(IfProjNode* prevdom, IfNode* iff, bool flip = false, bool pin_array_nodes = false); Suggestion: void dominated_by(IfProjNode* prevdom, IfNode* iff, bool flip = false, bool pin_array_access_nodes = false); src/hotspot/share/opto/loopopts.cpp line 308: > 306: // IGVN worklist for later cleanup. Move control-dependent data Nodes on the > 307: // live path up to the dominating control. > 308: void PhaseIdealLoop::dominated_by(IfProjNode* prevdom, IfNode* iff, bool flip, bool pin_array_nodes) { Suggestion: void PhaseIdealLoop::dominated_by(IfProjNode* prevdom, IfNode* iff, bool flip, bool pin_array_access_nodes) { src/hotspot/share/opto/loopopts.cpp line 355: > 353: assert(cd->in(0) == dp, ""); > 354: _igvn.replace_input_of(cd, 0, prevdom); > 355: if (pin_array_nodes) { Suggestion: if (pin_array_access_nodes) { ------------- Changes requested by epeter (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/16886#pullrequestreview-1811165732 Marked as reviewed by epeter (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/16886#pullrequestreview-1811191003 PR Review Comment: https://git.openjdk.org/jdk/pull/16886#discussion_r1446097554 PR Review Comment: https://git.openjdk.org/jdk/pull/16886#discussion_r1446098611 PR Review Comment: https://git.openjdk.org/jdk/pull/16886#discussion_r1446098965 PR Review Comment: https://git.openjdk.org/jdk/pull/16886#discussion_r1446108306 PR Review Comment: https://git.openjdk.org/jdk/pull/16886#discussion_r1446108500 PR Review Comment: https://git.openjdk.org/jdk/pull/16886#discussion_r1446108919 From epeter at openjdk.org Tue Jan 9 13:54:43 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 9 Jan 2024 13:54:43 GMT Subject: RFR: 8319793: C2 compilation fails with "Bad graph detected in build_loop_late" after JDK-8279888 [v9] In-Reply-To: References: Message-ID: On Fri, 5 Jan 2024 09:53:09 GMT, Roland Westrelin wrote: >> Ah. 
Does this mean that if there are multiple RangeChecks in a loop, where some could be smeared, these are not smeared, and then we have more RangeChecks to eliminate out of the loop? Maybe in the end this all comes down to the same anyway. Just wondering. > >> Why is it ok to delay this to post-loop-opts? Does it not prevent moving some CFG from being eliminated out of loops? Would be nice to have a little justification comment. > > Maybe. With this fix, range check smearing requires pinning nodes. So running it early also has a drawback: it can cause nodes that would otherwise float to be pinned. The way I see it, range check smearing is a local optimization for cases where range checks can't be eliminated some other way, so running it late should not make a difference. If the range check is in a loop and predication removes it, then running RC smearing early doesn't make a difference. If the range check is part of a range check sequence that can only be optimized by RC smearing, then having a longer range check sequence for the duration of loop opts probably makes no difference. @rwestrel would you mind explaining exactly that in a comment? Something like: We are about to perform range check smearing (i.e. remove this RangeCheck if it is dominated by two RangeChecks which have a range that covers this RangeCheck). This can cause nodes to be pinned. We want to avoid that and first allow RangeCheckElimination a chance to remove the RangeChecks from loops. Hence, we delay range check smearing until after loop opts.
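A source-level illustration of the "dominated by two RangeChecks which have a range that covers this RangeCheck" idea in the comment above may help. This is only a sketch in plain Java: the class and method names are made up for illustration, and C2 performs this reasoning on its IR, not on source code. Whether C2 actually smears this exact shape is not something the example shows; it only demonstrates why the middle check is implied once the two outer checks have passed.

```java
public class RangeCheckCoverSketch {
    // Hypothetical helper: the usual bounds test 0 <= idx < len,
    // written as a single unsigned comparison.
    static boolean inBounds(int idx, int len) {
        return Integer.compareUnsigned(idx, len) < 0;
    }

    // Three array accesses, each with an implicit range check.
    // Once a[i] and a[i + 2] have been checked, the check for a[i + 1]
    // cannot fail, because i + 1 lies between two in-bounds indices.
    static int sumThree(int[] a, int i) {
        return a[i] + a[i + 2] + a[i + 1];
    }

    // Checks the implication "i and i + 2 in bounds => i + 1 in bounds"
    // over small lengths and a set of indices that includes overflow cases.
    static boolean impliedForAll(int maxLen) {
        int[] extremes = {
            Integer.MIN_VALUE, Integer.MAX_VALUE - 2,
            Integer.MAX_VALUE - 1, Integer.MAX_VALUE
        };
        for (int len = 0; len <= maxLen; len++) {
            for (int i = -4; i <= maxLen + 4; i++) {
                if (!implied(i, len)) return false;
            }
            for (int i : extremes) {
                if (!implied(i, len)) return false;
            }
        }
        return true;
    }

    private static boolean implied(int i, int len) {
        boolean outerChecksPass = inBounds(i, len) && inBounds(i + 2, len);
        return !outerChecksPass || inBounds(i + 1, len);
    }

    public static void main(String[] args) {
        System.out.println(impliedForAll(16));
    }
}
```

The hazard discussed in this thread is the reverse direction: once a load depends on more than one check, letting it float up to just one of them can place it above a check it still needs, which is why such nodes get pinned.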
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/16886#discussion_r1446106265 From epeter at openjdk.org Tue Jan 9 14:01:28 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 9 Jan 2024 14:01:28 GMT Subject: RFR: 8322589: Add Ideal transformation: (~a) & (~b) => ~(a | b) [v10] In-Reply-To: <_lWTVDYsWmINZsi0bPleMs3F3n-WPgHbLoTfpu8sHSg=.0dde4ccd-77e8-4e0e-80ad-cb233b858579@github.com> References: <-pwjKEB97C-bM068JQN0PY1hl65IzcuQZfHzRoKu92g=.d6118773-20d2-46cd-9284-5168c9334bb5@github.com> <_lWTVDYsWmINZsi0bPleMs3F3n-WPgHbLoTfpu8sHSg=.0dde4ccd-77e8-4e0e-80ad-cb233b858579@github.com> Message-ID: On Tue, 9 Jan 2024 05:54:39 GMT, Zhiqiang Zang wrote: >> Hello, >> >> `(~a) & (~b) => ~(a | b)` is a widely seen pattern, for example it is implemented for LLVM [here](https://github.com/llvm/llvm-project/blob/397f1ce9efb4eea1ee10fe4833f733b8c7abd878/llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp#L1616C28-L1616C28); however it is missing in the current implementation of HotSpot. This pull request adds this transformation and associated tests. >> >> Thanks. > > Zhiqiang Zang has updated the pull request incrementally with one additional commit since the last revision: > > move make_not back to AddNode because it cannot compile for architecture arm, s390x, ppc64le. LGTM, thanks for the work! ------------- Marked as reviewed by epeter (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/16333#pullrequestreview-1811202862 From epeter at openjdk.org Tue Jan 9 14:05:27 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 9 Jan 2024 14:05:27 GMT Subject: RFR: 8322077: Add Ideal transformation: (~a) | (~b) => ~(a & b) [v11] In-Reply-To: References: Message-ID: On Tue, 9 Jan 2024 06:06:51 GMT, Zhiqiang Zang wrote: >> Hello, >> >> (~a) | (~b) => ~(a & b) is a widely seen pattern, for example it is implemented for LLVM [here](https://github.com/llvm/llvm-project/blob/397f1ce9efb4eea1ee10fe4833f733b8c7abd878/llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp#L1617C28-L1617C28); however it is missing in current implementation of hotspot. This pull request adds this transformation and associated tests. >> >> Thanks. > > Zhiqiang Zang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 14 additional commits since the last revision: > > - adapt to new changes from the dependant pr. > - Merge branch 'andnode-PNewDeMorganLawAndToOr' into ornode-PNewDeMorganLawOrToAnd > - adapt changes from the dependent pr. > - Merge branch 'andnode-PNewDeMorganLawAndToOr' into ornode-PNewDeMorganLawOrToAnd > - Update test/hotspot/jtreg/compiler/c2/irTests/DeMorganLawLongTests.java > > Co-authored-by: Tobias Hartmann > - Update test/hotspot/jtreg/compiler/c2/irTests/DeMorganLawIntTests.java > > Co-authored-by: Tobias Hartmann > - Add tests for using De Morgan's Law for both optimizations. > - remove unused code from tests. > - update the copyright dates. > - address comments. > - ... and 4 more: https://git.openjdk.org/jdk/compare/25f84663...b21e242b LGTM, and thanks for the work! Please only integrate this once your other change is integrated, and merged into this one. Then wait for GHA to complete, and run your own testing. ------------- Marked as reviewed by epeter (Reviewer). 
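For readers who want to convince themselves of the two rewrites discussed in these reviews, `(~a) & (~b) => ~(a | b)` and `(~a) | (~b) => ~(a & b)`, a small Java check is sketched here. It only tests the bitwise identities on sampled `int` and `long` values; it says nothing about how the C2 `Ideal` transformation is implemented. The class and method names are hypothetical and do not come from either pull request.

```java
import java.util.Random;

public class DeMorganSketch {
    // Left-hand side of the first rewrite: two NOTs feeding an AND.
    static int andOfNots(int a, int b) { return (~a) & (~b); }

    // Right-hand side of the first rewrite: one OR followed by one NOT.
    static int notOfOr(int a, int b) { return ~(a | b); }

    // Left-hand side of the second rewrite: two NOTs feeding an OR.
    static long orOfNots(long a, long b) { return (~a) | (~b); }

    // Right-hand side of the second rewrite: one AND followed by one NOT.
    static long notOfAnd(long a, long b) { return ~(a & b); }

    // Compares both sides of both rewrites on n pseudo-random input pairs.
    static boolean holdsForSamples(int n) {
        Random r = new Random(42);
        for (int i = 0; i < n; i++) {
            int a = r.nextInt();
            int b = r.nextInt();
            long x = r.nextLong();
            long y = r.nextLong();
            if (andOfNots(a, b) != notOfOr(a, b)) return false;
            if (orOfNots(x, y) != notOfAnd(x, y)) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(holdsForSamples(1000));
    }
}
```

In both cases the rewritten form needs one fewer operation (two NOTs become one), which is the point of the transformation.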
PR Review: https://git.openjdk.org/jdk/pull/16334#pullrequestreview-1811210531 From epeter at openjdk.org Tue Jan 9 14:16:25 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 9 Jan 2024 14:16:25 GMT Subject: RFR: 8322768: Optimize non-subword vector compress and expand APIs for AVX2 target. [v4] In-Reply-To: References: <_udOCEVG86x9V_WvYqFTaYnvmXdiZ7LzqxzR-D_ygYs=.db3ed37e-a8fa-413b-83c6-a785aba072ff@github.com> Message-ID: On Mon, 8 Jan 2024 06:23:46 GMT, Jatin Bhateja wrote: >> Hi, >> >> Patch optimizes non-subword vector compress and expand APIs for x86 AVX2 only targets. >> Upcoming E-core Xeons (Sierra Forest) and Hybrid CPUs only support AVX2 instruction set. >> These are very frequently used APIs in columnar database filter operation. >> >> Implementation uses a lookup table to record permute indices. Table index is computed using >> mask argument of compress/expand operation. >> >> Following are the performance number of JMH micro included with the patch. >> >> >> System : Intel(R) Xeon(R) Platinum 8480+ (Sapphire Rapids) >> >> Baseline: >> Benchmark (size) Mode Cnt Score Error Units >> ColumnFilterBenchmark.filterDoubleColumn 1024 thrpt 2 142.767 ops/ms >> ColumnFilterBenchmark.filterDoubleColumn 2047 thrpt 2 71.436 ops/ms >> ColumnFilterBenchmark.filterDoubleColumn 4096 thrpt 2 35.992 ops/ms >> ColumnFilterBenchmark.filterFloatColumn 1024 thrpt 2 182.151 ops/ms >> ColumnFilterBenchmark.filterFloatColumn 2047 thrpt 2 91.096 ops/ms >> ColumnFilterBenchmark.filterFloatColumn 4096 thrpt 2 44.757 ops/ms >> ColumnFilterBenchmark.filterIntColumn 1024 thrpt 2 184.099 ops/ms >> ColumnFilterBenchmark.filterIntColumn 2047 thrpt 2 91.981 ops/ms >> ColumnFilterBenchmark.filterIntColumn 4096 thrpt 2 45.170 ops/ms >> ColumnFilterBenchmark.filterLongColumn 1024 thrpt 2 148.017 ops/ms >> ColumnFilterBenchmark.filterLongColumn 2047 thrpt 2 73.516 ops/ms >> ColumnFilterBenchmark.filterLongColumn 4096 thrpt 2 36.844 ops/ms >> >> Withopt: >> Benchmark (size) Mode 
Cnt Score Error Units >> ColumnFilterBenchmark.filterDoubleColumn 1024 thrpt 2 2051.707 ops/ms >> ColumnFilterBenchmark.filterDoubleColumn 2047 thrpt 2 914.072 ops/ms >> ColumnFilterBenchmark.filterDoubleColumn 4096 thrpt 2 489.898 ops/ms >> ColumnFilterBenchmark.filterFloatColumn 1024 thrpt 2 5324.195 ops/ms >> ColumnFilterBenchmark.filterFloatColumn 2047 thrpt 2 2587.229 ops/ms >> ColumnFilterBenchmark.filterFloatColumn 4096 thrpt 2 1278.665 ops/ms >> ColumnFilterBenchmark.filterIntColumn 1024 thrpt 2 4149.384 ops/ms >> ColumnFilterBenchmark.filterIntColumn 2047 thrpt ... > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Review suggestions incorporated. I think we are almost there! ? src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 5291: > 5289: if (bt == T_INT || bt == T_FLOAT) { > 5290: vmovmskps(rtmp, mask, vec_enc); > 5291: shlq(rtmp, 5); Suggestion: shlq(rtmp, 5); // for 32 bit rows (8 int) src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 5309: > 5307: assert(bt == T_LONG || bt == T_DOUBLE, ""); > 5308: vmovmskpd(rtmp, mask, vec_enc); > 5309: shlq(rtmp, 5); Suggestion: shlq(rtmp, 5); // for 32 bit rows (4 long) src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 1018: > 1016: } else { > 1017: assert(esize == 64, ""); > 1018: // Loop to generate 16 x 4 int expand permute index table. A row is accessed Suggestion: // Loop to generate 16 x 4 long expand permute index table. A row is accessed ------------- Changes requested by epeter (Reviewer). 
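The patch description says the implementation "uses a lookup table to record permute indices" and that the table index is computed from the mask argument. A scalar Java emulation of that idea for an 8-lane `int` compress is sketched here. It is an illustration under assumptions: the table layout, the `-1` marker for zero-filled lanes, and all names are hypothetical and are not taken from the stub generator in the pull request.

```java
public class CompressTableSketch {
    static final int LANES = 8;

    // TABLE[mask][k] is the source lane that lands in destination lane k,
    // or -1 if destination lane k is to be zero-filled.
    static final int[][] TABLE = buildTable();

    static int[][] buildTable() {
        int[][] table = new int[1 << LANES][LANES];
        for (int mask = 0; mask < (1 << LANES); mask++) {
            int k = 0;
            // Selected lanes are packed into the lowest destination lanes, in order.
            for (int lane = 0; lane < LANES; lane++) {
                if ((mask & (1 << lane)) != 0) {
                    table[mask][k++] = lane;
                }
            }
            // Remaining destination lanes carry no source element.
            while (k < LANES) {
                table[mask][k++] = -1;
            }
        }
        return table;
    }

    // Compress: one table lookup by mask, then a permute of the source lanes.
    static int[] compress(int[] src, int mask) {
        int[] row = TABLE[mask & ((1 << LANES) - 1)];
        int[] dst = new int[LANES];
        for (int k = 0; k < LANES; k++) {
            dst[k] = row[k] < 0 ? 0 : src[row[k]];
        }
        return dst;
    }

    public static void main(String[] args) {
        int[] src = {10, 11, 12, 13, 14, 15, 16, 17};
        System.out.println(java.util.Arrays.toString(compress(src, 0b10100101)));
    }
}
```

The lookup replaces a per-lane loop with data-dependent branches, which is consistent with the observation later in the thread that mask density should not matter much once the operation is vectorized.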
PR Review: https://git.openjdk.org/jdk/pull/17261#pullrequestreview-1811224600 PR Review Comment: https://git.openjdk.org/jdk/pull/17261#discussion_r1446133371 PR Review Comment: https://git.openjdk.org/jdk/pull/17261#discussion_r1446133800 PR Review Comment: https://git.openjdk.org/jdk/pull/17261#discussion_r1446132575 From epeter at openjdk.org Tue Jan 9 14:16:27 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 9 Jan 2024 14:16:27 GMT Subject: RFR: 8322768: Optimize non-subword vector compress and expand APIs for AVX2 target. [v2] In-Reply-To: References: <_udOCEVG86x9V_WvYqFTaYnvmXdiZ7LzqxzR-D_ygYs=.db3ed37e-a8fa-413b-83c6-a785aba072ff@github.com> <1GHGK7AGinCMKjFIB5oadUP0jiZrC39Z0hncAS3H-9Y=.eb617125-1f51-4cff-889e-b15321f5c72b@github.com> Message-ID: On Tue, 9 Jan 2024 06:13:44 GMT, Jatin Bhateja wrote: >> Yes, IF it is vectorized, then there is no difference between high and low density. My concern was more if vectorization is preferrable over the scalar alternative in the low-density case, where branch prediction is more stable. > > At runtime we do need to scan entire mask to pick the compressible lane corresponding to set mask bit. Thus the loop overhead of mask compare (BTW masks are held in a vector register for AVX2 targets) and jump will anyways be incurred , in addition for sparsely populated mask we may incur additional misprediction penalty for not taking if block which extracts an element from appropriate source vector lane and insert into destination vector lane. Overall vector solution will win for most common cases for varying mask and also for very sparsely populate masks. Here is the result of setting just a single mask bit. 
> > > @Benchmark > public void fuzzyFilterIntColumn() { > int i = 0; > int j = 0; > long maskctr = 1; > int endIndex = ispecies.loopBound(size); > for (; i < endIndex; i += ispecies.length()) { > IntVector vec = IntVector.fromArray(ispecies, intinCol, i); > VectorMask pred = VectorMask.fromLong(ispecies, 1); > vec.compress(pred).intoArray(intoutCol, j); > j += pred.trueCount(); > } > } > > > Baseline: > Benchmark (size) Mode Cnt Score Error Units > ColumnFilterBenchmark.fuzzyFilterIntColumn 1024 thrpt 2 379.059 ops/ms > ColumnFilterBenchmark.fuzzyFilterIntColumn 2047 thrpt 2 188.355 ops/ms > ColumnFilterBenchmark.fuzzyFilterIntColumn 4096 thrpt 2 95.315 ops/ms > > > Withopt: > Benchmark (size) Mode Cnt Score Error Units > ColumnFilterBenchmark.fuzzyFilterIntColumn 1024 thrpt 2 7390.074 ops/ms > ColumnFilterBenchmark.fuzzyFilterIntColumn 2047 thrpt 2 3483.247 ops/ms > ColumnFilterBenchmark.fuzzyFilterIntColumn 4096 thrpt 2 1823.817 ops/ms Nice, thanks for the data! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17261#discussion_r1446138902 From rrich at openjdk.org Tue Jan 9 14:17:23 2024 From: rrich at openjdk.org (Richard Reingruber) Date: Tue, 9 Jan 2024 14:17:23 GMT Subject: RFR: 8290965: PPC64: Implement post-call NOPs [v2] In-Reply-To: References: Message-ID: On Sat, 23 Dec 2023 11:56:10 GMT, Richard Reingruber wrote: >> #### Implementation of post call nops (PCNs) on ppc64. >> >> Depends on https://github.com/openjdk/jdk/pull/17150 >> >> About post call nops: >> >> - instruction(s) at return addresses of compiled java calls >> - emitted iff vm continuations are enabled to support virtual threads >> - encode data that can be be used to find the corresponding CodeBlob and oop map faster >> - mt-safe patchable to trigger deoptimization >> >> Background: >> >> - Frames in continuation StackChunks are not visited if their compiled method is made not entrant (in contrast to frames on stack). 
>> Instead all PCNs of the compiled method are patched to trigger deoptimization when control returns to such frames. >> - With vm continuations, stacks are walked and inspected more frequently. This requires lookup of metadata like frame size and oop maps. As an optimization the offset of the CodeBlob to the PCN and the oop map slot are encoded as data in the PCN. >> >> Post call nops on ppc64 >> >> - 1 instruction, i.e. 4 bytes (either CMPI or CMPLI[1]) >> x86_64: 1 instruction, 8 bytes >> aarch64: 3 instruction, 12 bytes >> [1] 3.1.10 Fixed Point Compare Instructions in Power ISA 3.1B >> https://openpowerfoundation.org/specifications/isa/ >> >> - 26 bits data payload >> x86_64: 32 bits; aarch64: 32 bits >> - 9 bits dedicated to oop map slot. With 8 bits there where cases with SPECjvm2008 where the slot could not be encoded (on ppc64 and x86_64). >> x86_64: 8 bits; aarch64: 8 bits >> - 17 bits dedicated to cb offset. Effectively 19 bits due to instruction alignment. >> x86_64: 24 bits; aarch64: 24 bits >> - Also used when reconstructing the back chain after thawing continuation frames (see `Thaw::patch_caller_links`) >> >> - Refactored frame constructors to make use of fast CodeBlob lookup based on PCNs. >> The fast lookup may only be used if the pc is known to be in the code cache because `CodeCache::find_blob_fast` can yield wrong results if it finds instructions outside the code cache that look just like PCNs. Callers of the frame class constructors need to pass `frame::kind::native` in that case to avoid errors. Other platforms don't make this explicit which is a problem in my eyes. Picking the wrong constructor can cause errors when porting and in future development. >> >> - Currently only the PCNs in nmethods are initialized. Therefore we don't even try to make a fast lookup based on PCNs if we know the CodeBlob is, e.g., a RuntimeStub. To achieve this we call the frame cons... 
> > Richard Reingruber has updated the pull request incrementally with one additional commit since the last revision: > > Fix comment > > Co-authored-by: Andrew Haley > _Mailing list message from [Andrew Haley](mailto:aph-open at littlepinkcloud.com) on [hotspot-compiler-dev](mailto:hotspot-compiler-dev at mail.openjdk.org):_ > > On 12/20/23 20:36, Richard Reingruber wrote: > > > | test/jdk/java/lang/Thread/virtual/stress/Skynet.java | ppc64le | x86_64 | > > |------------------------------------------------------|-----------|-----------| > > | PCN lookup success | 306955525 | 247185016 | > > | PCN lookup failure | 500975 | 421098 | > > | PCN decode success (C2) | 306951893 | 247181691 | > > | PCN decode failure | 3168 | 59 | > > | PCN patch success | 2080 | 2662 | > > | PCN patch cb offset failure | 0 | 0 | > > | PCN patch oopmap slot failure | 0 | 0 | > > These data are really interesting. How did you gather them? Thanks. This is the code for the stats based on master: https://github.com/openjdk/jdk/commit/c376fcc9099251a3f62edc246748f26d0a54e2c0 This is the version for this pr: https://github.com/openjdk/jdk/commit/ae2b6ba70bfdca6a58f9af6b3a675c0f2aec7d85 (Actually these are a cleaner reimplementations of the original code) ------------- PR Comment: https://git.openjdk.org/jdk/pull/17171#issuecomment-1883125887 From jbhateja at openjdk.org Tue Jan 9 15:17:22 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 9 Jan 2024 15:17:22 GMT Subject: RFR: 8321712: C2: "failed: Multiple uses of register" in C2_MacroAssembler::vminmax_fp In-Reply-To: References: Message-ID: <8ZGiFoB4TkRgQSP67ekJ_Tw_uMnEyVNdU9GSa4bx69M=.f252a9b8-367c-49e6-916e-48dd0e6e936e@github.com> On Tue, 9 Jan 2024 02:25:15 GMT, Vladimir Kozlov wrote: > Should we "short cut" code when registers are the same? Hi @sviswa7 , An identity transformation may be useful here to prevent generating MaxF/D in case both the arguments are same. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/17315#issuecomment-1883238177 From qamai at openjdk.org Tue Jan 9 15:32:51 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Tue, 9 Jan 2024 15:32:51 GMT Subject: RFR: 8282365: Consolidate and improve division by constant idealizations [v42] In-Reply-To: References: Message-ID: On Mon, 8 Jan 2024 09:44:51 GMT, Kim Barrett wrote: >> The rules around the include lines in our tests and what we currently have in the tests are messy at the moment. We should fix that when we find the time. >> >> For HotSpot source code files the includes should be structured as: >> >> hotspot includes >> blank line >> system includes >> >> >> There are some deviations from that, but those should be cleaned up instead of used as a precedent. For our tests we should add "unittest.hpp" at the end of the "hotspot includes" section. > > In the Oracle-internal discussion of include order from about a year ago, there was not a consensus > decision about the position of "unittest.hpp". There was a concern that in some cases it really was > required to be last for some technical reason. That needed (and still needs) investigation. I assume this means that the include order is good as it is now? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/9947#discussion_r1446242593 From stefank at openjdk.org Tue Jan 9 16:00:01 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Tue, 9 Jan 2024 16:00:01 GMT Subject: RFR: 8282365: Consolidate and improve division by constant idealizations [v42] In-Reply-To: References: Message-ID: On Tue, 9 Jan 2024 15:29:44 GMT, Quan Anh Mai wrote: >> In the Oracle-internal discussion of include order from about a year ago, there was not a consensus >> decision about the position of "unittest.hpp". There was a concern that in some cases it really was >> required to be last for some technical reason. That needed (and still needs) investigation.
> > I assume this means that the include order is good as it is now? Please update it to: #include "precompiled.hpp" #include "opto/divconstants.hpp" #include "runtime/os.hpp" #include "utilities/growableArray.hpp" #include "unittest.hpp" #include ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/9947#discussion_r1446279651 From duke at openjdk.org Tue Jan 9 16:47:08 2024 From: duke at openjdk.org (ArsenyBochkarev) Date: Tue, 9 Jan 2024 16:47:08 GMT Subject: RFR: 8317721: RISC-V: Implement CRC32 intrinsic [v4] In-Reply-To: References: Message-ID: > Hi everyone! Please review this port of [AArch64](https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp#L4224) `_updateBytesCRC32`, `_updateByteBufferCRC32` and `_updateCRC32` intrinsics. This patch introduces only the plain (non-vectorized, no Zbc) version. > > ### Correctness checks > > Tier 1/2 tests are ok. > > ### Performance results on T-Head board > > #### Results for enabled intrinsic: > > Used test is `test/micro/org/openjdk/bench/java/util/TestCRC32.java` > > | Benchmark | (count) | Mode | Cnt | Score | Error | Units | > | --- | ---- | ----- | --- | ---- | --- | ---- | > | CRC32.TestCRC32.testCRC32Update | 64 | thrpt | 24 | 3730.929 | 37.773 | ops/ms | > | CRC32.TestCRC32.testCRC32Update | 128 | thrpt | 24 | 2126.673 | 2.032 | ops/ms | > | CRC32.TestCRC32.testCRC32Update | 256 | thrpt | 24 | 1134.330 | 6.714 | ops/ms | > | CRC32.TestCRC32.testCRC32Update | 512 | thrpt | 24 | 584.017 | 2.267 | ops/ms | > | CRC32.TestCRC32.testCRC32Update | 2048 | thrpt | 24 | 151.173 | 0.346 | ops/ms | > | CRC32.TestCRC32.testCRC32Update | 16384 | thrpt | 24 | 19.113 | 0.008 | ops/ms | > | CRC32.TestCRC32.testCRC32Update | 65536 | thrpt | 24 | 4.647 | 0.022 | ops/ms | > > #### Results for disabled intrinsic: > > | Benchmark | (count) | Mode | Cnt | Score | Error | Units | > | --------------------------------------------------- | ---------- | --------- | ---- | ----------- | 
--------- | ---------- | > | CRC32.TestCRC32.testCRC32Update | 64 | thrpt | 15 | 798.365 | 35.486 | ops/ms | > | CRC32.TestCRC32.testCRC32Update | 128 | thrpt | 15 | 677.756 | 46.619 | ops/ms | > | CRC32.TestCRC32.testCRC32Update | 256 | thrpt | 15 | 552.781 | 27.143 | ops/ms | > | CRC32.TestCRC32.testCRC32Update | 512 | thrpt | 15 | 429.304 | 12.518 | ops/ms | > | CRC32.TestCRC32.testCRC32Update | 2048 | thrpt | 15 | 166.738 | 0.935 | ops/ms | > | CRC32.TestCRC32.testCRC32Update | 16384 | thrpt | 15 | 25.060 | 0.034 | ops/ms | > | CRC32.TestCRC32.testCRC32Update | 65536 | thrpt | 15 | 6.196 | 0.030 | ops/ms | ArsenyBochkarev has updated the pull request incrementally with four additional commits since the last revision: - Fix unroll size - Rename constants - Partially unroll loop - Optimize loop counter in L_by16_loop ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17046/files - new: https://git.openjdk.org/jdk/pull/17046/files/a59481b4..046d5530 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17046&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17046&range=02-03 Stats: 33 lines in 1 file changed: 7 ins; 1 del; 25 mod Patch: https://git.openjdk.org/jdk/pull/17046.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17046/head:pull/17046 PR: https://git.openjdk.org/jdk/pull/17046 From duke at openjdk.org Tue Jan 9 16:47:08 2024 From: duke at openjdk.org (ArsenyBochkarev) Date: Tue, 9 Jan 2024 16:47:08 GMT Subject: RFR: 8317721: RISC-V: Implement CRC32 intrinsic [v3] In-Reply-To: <_CysHDX3CV-ZM4ilLgHSRrcDk4DHDNe1ClAKFCV_uoM=.751d91bf-e7e0-4b78-8ff5-2b864c38dd73@github.com> References: <_CysHDX3CV-ZM4ilLgHSRrcDk4DHDNe1ClAKFCV_uoM=.751d91bf-e7e0-4b78-8ff5-2b864c38dd73@github.com> Message-ID: On Thu, 21 Dec 2023 22:20:15 GMT, ArsenyBochkarev wrote: >> Hi everyone! 
Please review this port of [AArch64](https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp#L4224) `_updateBytesCRC32`, `_updateByteBufferCRC32` and `_updateCRC32` intrinsics. This patch introduces only the plain (non-vectorized, no Zbc) version. >> >> ### Correctness checks >> >> Tier 1/2 tests are ok. >> >> ### Performance results on T-Head board >> >> #### Results for enabled intrinsic: >> >> Used test is `test/micro/org/openjdk/bench/java/util/TestCRC32.java` >> >> | Benchmark | (count) | Mode | Cnt | Score | Error | Units | >> | --- | ---- | ----- | --- | ---- | --- | ---- | >> | CRC32.TestCRC32.testCRC32Update | 64 | thrpt | 24 | 3730.929 | 37.773 | ops/ms | >> | CRC32.TestCRC32.testCRC32Update | 128 | thrpt | 24 | 2126.673 | 2.032 | ops/ms | >> | CRC32.TestCRC32.testCRC32Update | 256 | thrpt | 24 | 1134.330 | 6.714 | ops/ms | >> | CRC32.TestCRC32.testCRC32Update | 512 | thrpt | 24 | 584.017 | 2.267 | ops/ms | >> | CRC32.TestCRC32.testCRC32Update | 2048 | thrpt | 24 | 151.173 | 0.346 | ops/ms | >> | CRC32.TestCRC32.testCRC32Update | 16384 | thrpt | 24 | 19.113 | 0.008 | ops/ms | >> | CRC32.TestCRC32.testCRC32Update | 65536 | thrpt | 24 | 4.647 | 0.022 | ops/ms | >> >> #### Results for disabled intrinsic: >> >> | Benchmark | (count) | Mode | Cnt | Score | Error | Units | >> | --------------------------------------------------- | ---------- | --------- | ---- | ----------- | --------- | ---------- | >> | CRC32.TestCRC32.testCRC32Update | 64 | thrpt | 15 | 798.365 | 35.486 | ops/ms | >> | CRC32.TestCRC32.testCRC32Update | 128 | thrpt | 15 | 677.756 | 46.619 | ops/ms | >> | CRC32.TestCRC32.testCRC32Update | 256 | thrpt | 15 | 552.781 | 27.143 | ops/ms | >> | CRC32.TestCRC32.testCRC32Update | 512 | thrpt | 15 | 429.304 | 12.518 | ops/ms | >> | CRC32.TestCRC32.testCRC32Update | 2048 | thrpt | 15 | 166.738 | 0.935 | ops/ms | >> | CRC32.TestCRC32.testCRC32Update | 16384 | thrpt | 15 | 25.060 | 0.034 | ops/ms | >> | 
CRC32.TestCRC32.testCRC32Update | 65536 | thrpt | 15 | 6.196 | 0.030 | ops/ms | > > ArsenyBochkarev has updated the pull request incrementally with five additional commits since the last revision: > > - Use MacroAssembler::lwu instead of Assembler::lwu > - Save instruction when getting table3 address > - Left note on how table elements are accessed > - Fix comment for result register > - Remove unused L_by16 label Hello again everyone! I was able to fix the regressions for most cases on large amounts of data by partially unrolling the big loop and getting rid of the loop counter (previously held in the `len` register). Results with `-XX:+UseZba` on a StarFive VisionFive2 board: | Benchmark | (count) | Mode | Cnt | Score | Error | Units | | ------------------------------ | ------------ | --------- | ----- | ---------- | ----------- | ----- | | CRC32.TestCRC32.testCRC32Update | 64 | thrpt | 12 | 4215.728 | 3.972 | ops/ms | | CRC32.TestCRC32.testCRC32Update | 128 | thrpt | 12 | 2607.882 | 1.627 | ops/ms | | CRC32.TestCRC32.testCRC32Update | 256 | thrpt | 12 | 1364.899 | 8.857 | ops/ms | | CRC32.TestCRC32.testCRC32Update | 512 | thrpt | 12 | 704.316 | 3.222 | ops/ms | | CRC32.TestCRC32.testCRC32Update | 2048 | thrpt | 12 | 180.738 | 0.474 | ops/ms | | CRC32.TestCRC32.testCRC32Update | 16384 | thrpt | 12 | 22.722 | 0.059 | ops/ms | | CRC32.TestCRC32.testCRC32Update | 65536 | thrpt | 12 | 5.327 | 0.019 | ops/ms | while the results for `-XX:-UseCRC32Intrinsics` are [here](https://github.com/openjdk/jdk/pull/17046#issuecomment-1850364667) ------------- PR Comment: https://git.openjdk.org/jdk/pull/17046#issuecomment-1883404214 From jbhateja at openjdk.org Tue Jan 9 16:48:56 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 9 Jan 2024 16:48:56 GMT Subject: RFR: 8322768: Optimize non-subword vector compress and expand APIs for AVX2 target.
[v5] In-Reply-To: <_udOCEVG86x9V_WvYqFTaYnvmXdiZ7LzqxzR-D_ygYs=.db3ed37e-a8fa-413b-83c6-a785aba072ff@github.com> References: <_udOCEVG86x9V_WvYqFTaYnvmXdiZ7LzqxzR-D_ygYs=.db3ed37e-a8fa-413b-83c6-a785aba072ff@github.com> Message-ID: > Hi, > > Patch optimizes non-subword vector compress and expand APIs for x86 AVX2 only targets. > Upcoming E-core Xeons (Sierra Forest) and Hybrid CPUs only support AVX2 instruction set. > These are very frequently used APIs in columnar database filter operation. > > Implementation uses a lookup table to record permute indices. Table index is computed using > mask argument of compress/expand operation. > > Following are the performance number of JMH micro included with the patch. > > > System : Intel(R) Xeon(R) Platinum 8480+ (Sapphire Rapids) > > Baseline: > Benchmark (size) Mode Cnt Score Error Units > ColumnFilterBenchmark.filterDoubleColumn 1024 thrpt 2 142.767 ops/ms > ColumnFilterBenchmark.filterDoubleColumn 2047 thrpt 2 71.436 ops/ms > ColumnFilterBenchmark.filterDoubleColumn 4096 thrpt 2 35.992 ops/ms > ColumnFilterBenchmark.filterFloatColumn 1024 thrpt 2 182.151 ops/ms > ColumnFilterBenchmark.filterFloatColumn 2047 thrpt 2 91.096 ops/ms > ColumnFilterBenchmark.filterFloatColumn 4096 thrpt 2 44.757 ops/ms > ColumnFilterBenchmark.filterIntColumn 1024 thrpt 2 184.099 ops/ms > ColumnFilterBenchmark.filterIntColumn 2047 thrpt 2 91.981 ops/ms > ColumnFilterBenchmark.filterIntColumn 4096 thrpt 2 45.170 ops/ms > ColumnFilterBenchmark.filterLongColumn 1024 thrpt 2 148.017 ops/ms > ColumnFilterBenchmark.filterLongColumn 2047 thrpt 2 73.516 ops/ms > ColumnFilterBenchmark.filterLongColumn 4096 thrpt 2 36.844 ops/ms > > Withopt: > Benchmark (size) Mode Cnt Score Error Units > ColumnFilterBenchmark.filterDoubleColumn 1024 thrpt 2 2051.707 ops/ms > ColumnFilterBenchmark.filterDoubleColumn 2047 thrpt 2 914.072 ops/ms > ColumnFilterBenchmark.filterDoubleColumn 4096 thrpt 2 489.898 ops/ms > ColumnFilterBenchmark.filterFloatColumn 1024 thrpt 2 
5324.195 ops/ms > ColumnFilterBenchmark.filterFloatColumn 2047 thrpt 2 2587.229 ops/ms > ColumnFilterBenchmark.filterFloatColumn 4096 thrpt 2 1278.665 ops/ms > ColumnFilterBenchmark.filterIntColumn 1024 thrpt 2 4149.384 ops/ms > ColumnFilterBenchmark.filterIntColumn 2047 thrpt 2 1791.170 ops/ms > ColumnFilterBenchmark.filterIntColumn 4096... Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Using emulated variable blend E-Core optimized instruction. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17261/files - new: https://git.openjdk.org/jdk/pull/17261/files/257a6351..c3f1c50e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17261&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17261&range=03-04 Stats: 28 lines in 4 files changed: 18 ins; 0 del; 10 mod Patch: https://git.openjdk.org/jdk/pull/17261.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17261/head:pull/17261 PR: https://git.openjdk.org/jdk/pull/17261 From roland at openjdk.org Tue Jan 9 16:51:02 2024 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 9 Jan 2024 16:51:02 GMT Subject: RFR: 8319793: C2 compilation fails with "Bad graph detected in build_loop_late" after JDK-8279888 [v13] In-Reply-To: References: Message-ID: > Range check smearing and range check predication make an array access > dependent on 2 (or more in the case of RC smearing) conditions. As a > consequence, if a range check can be eliminated because there's an > identical dominating range check, the control dependent nodes that > could float and become dependent on the dominating range check cannot > be allowed to float because there's a risk that they would then bypass > one of the checks that make the access legal. > > `IfNode::dominated_by()` and `PhaseIdealLoop::dominated_by()` have > logic to prevent this: nodes that are control dependent on a range > check or predicate are not allowed to float. 
This is however not > sufficient as demonstrated by the test cases. > > In `TestArrayAccessAboveRCAfterSmearingOrPredication.testRangeCheckSmearing()`: > > > v += array[i]; > if (flag2) { > if (flag3) { > field = 0x42; > } > } > if (flagField == 1) { > v += array[i]; > } > > > The range check for the second `array[i]` load is replaced by the > dominating range check for the first `array[i]` but because the second > `array[i]` load could really be dependent on multiple range checks (in > case smearing happened which is not the case here), c2 doesn't allow > the second `array[i]` to float when the second range check is > removed. The second `array[i]` is then control dependent on: > > > if (flagField == 1) { > > > which is next found to be dominated by the same test: > > > if (flag == 1) { > > > and is removed. However nothing in `dominated_by()` treats node > dependent on tests that are not range check or predicates > specially. So the second `array[i]` is allowed to float and become > dependent on: > > > if (flag == 1) { > > > which is above the range check for that access. The test method in its > last invocation is passed an index for the array access that's widely > out of range. The array load happens before the range check and > crashes the VM. `testLoopPredication()` is a similar test where array > loads become dependent on predicates and end up above range checks. > > `TestArrayAccessCastIIAboveRC.java` is the test case from the bug > where for similar reasons a range check `CastII` ends up above its > range check, becomes top because its input becomes some integer that > conflicts with its type (but there's no condition to catch it). The > graph becomes broken and c2 crashes. > > Logic in the `dominated_by()` methods ... 
Roland Westrelin has updated the pull request incrementally with six additional commits since the last revision: - Update src/hotspot/share/opto/loopopts.cpp Co-authored-by: Emanuel Peter - Update src/hotspot/share/opto/loopopts.cpp Co-authored-by: Emanuel Peter - Update src/hotspot/share/opto/loopnode.hpp Co-authored-by: Emanuel Peter - Update src/hotspot/share/opto/ifnode.cpp Co-authored-by: Emanuel Peter - Update src/hotspot/share/opto/ifnode.cpp Co-authored-by: Emanuel Peter - Update src/hotspot/share/opto/cfgnode.hpp Co-authored-by: Emanuel Peter ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16886/files - new: https://git.openjdk.org/jdk/pull/16886/files/04a9d3a5..372021b6 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16886&range=12 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16886&range=11-12 Stats: 6 lines in 4 files changed: 0 ins; 0 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/16886.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16886/head:pull/16886 PR: https://git.openjdk.org/jdk/pull/16886 From duke at openjdk.org Tue Jan 9 16:56:50 2024 From: duke at openjdk.org (Zhiqiang Zang) Date: Tue, 9 Jan 2024 16:56:50 GMT Subject: RFR: 8322589: Add Ideal transformation: (~a) & (~b) => ~(a | b) [v11] In-Reply-To: <-pwjKEB97C-bM068JQN0PY1hl65IzcuQZfHzRoKu92g=.d6118773-20d2-46cd-9284-5168c9334bb5@github.com> References: <-pwjKEB97C-bM068JQN0PY1hl65IzcuQZfHzRoKu92g=.d6118773-20d2-46cd-9284-5168c9334bb5@github.com> Message-ID: > Hello, > > `(~a) & (~b) => ~(a | b)` is a widely seen pattern, for example it is implemented for LLVM [here](https://github.com/llvm/llvm-project/blob/397f1ce9efb4eea1ee10fe4833f733b8c7abd878/llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp#L1616C28-L1616C28); however it is missing in current implementation of hotspot. This pull request adds this transformation and associated tests. > > Thanks. 
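The identity behind this transformation can be checked on plain Java values. The following is only a minimal sketch and not code from the pull request: the class `DeMorganSketch` and its method names are hypothetical, and the check runs at the Java level rather than on C2 nodes. It compares the shape before the rewrite, `(~a) & (~b)`, with the shape after it, `~(a | b)`, over a handful of sample values.

```java
// Hypothetical sketch: DeMorganSketch and its method names are illustrative
// and do not come from the patch under review.
class DeMorganSketch {
    // Shape before the transformation: two NOTs feeding one AND.
    static int andOfNots(int a, int b) {
        return (~a) & (~b);
    }

    // Shape after the transformation: one OR feeding one NOT.
    static int notOfOr(int a, int b) {
        return ~(a | b);
    }

    public static void main(String[] args) {
        int[] samples = {0, 1, -1, 42, 0x55555555, Integer.MIN_VALUE, Integer.MAX_VALUE};
        for (int a : samples) {
            for (int b : samples) {
                // Fail loudly if the two shapes ever disagree.
                if (andOfNots(a, b) != notOfOr(a, b)) {
                    throw new AssertionError("mismatch for a=" + a + ", b=" + b);
                }
            }
        }
        System.out.println("both shapes agree on all sampled pairs");
    }
}
```

The rewritten shape needs one bitwise NOT instead of two, which is presumably the motivation for the transformation; whether that pays off on a given target is something the tests and benchmarks in the pull request would have to show.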
Zhiqiang Zang has updated the pull request incrementally with three additional commits since the last revision: - Revert "move the two helper functions to member functions of the node class." This reverts commit 7a962d69ac687c0476e54cd004037f2ebb0800ba. - Revert "update copyright dates." This reverts commit 3665de2f3d38b4fce3ba968ec6247dd2576ecc32. - Revert "move make_not back to AddNode because it cannot compile for architecture arm, s390x, ppc64le." This reverts commit 4ee8b089b6f119b1692dcd2c512d249d92734697. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16333/files - new: https://git.openjdk.org/jdk/pull/16333/files/4ee8b089..65942221 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16333&range=10 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16333&range=09-10 Stats: 21 lines in 5 files changed: 8 ins; 8 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/16333.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16333/head:pull/16333 PR: https://git.openjdk.org/jdk/pull/16333 From roland at openjdk.org Tue Jan 9 17:02:48 2024 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 9 Jan 2024 17:02:48 GMT Subject: RFR: 8319793: C2 compilation fails with "Bad graph detected in build_loop_late" after JDK-8279888 [v14] In-Reply-To: References: Message-ID: > Range check smearing and range check predication make an array access > dependent on 2 (or more in the case of RC smearing) conditions. As a > consequence, if a range check can be eliminated because there's an > identical dominating range check, the control dependent nodes that > could float and become dependent on the dominating range check cannot > be allowed to float because there's a risk that they would then bypass > one of the checks that make the access legal. > > `IfNode::dominated_by()` and `PhaseIdealLoop::dominated_by()` have > logic to prevent this: nodes that are control dependent on a range > check or predicate are not allowed to float. 
This is however not > sufficient as demonstrated by the test cases. > > In `TestArrayAccessAboveRCAfterSmearingOrPredication.testRangeCheckSmearing()`: > > > v += array[i]; > if (flag2) { > if (flag3) { > field = 0x42; > } > } > if (flagField == 1) { > v += array[i]; > } > > > The range check for the second `array[i]` load is replaced by the > dominating range check for the first `array[i]` but because the second > `array[i]` load could really be dependent on multiple range checks (in > case smearing happened which is not the case here), c2 doesn't allow > the second `array[i]` to float when the second range check is > removed. The second `array[i]` is then control dependent on: > > > if (flagField == 1) { > > > which is next found to be dominated by the same test: > > > if (flag == 1) { > > > and is removed. However nothing in `dominated_by()` treats node > dependent on tests that are not range check or predicates > specially. So the second `array[i]` is allowed to float and become > dependent on: > > > if (flag == 1) { > > > which is above the range check for that access. The test method in its > last invocation is passed an index for the array access that's widely > out of range. The array load happens before the range check and > crashes the VM. `testLoopPredication()` is a similar test where array > loads become dependent on predicates and end up above range checks. > > `TestArrayAccessCastIIAboveRC.java` is the test case from the bug > where for similar reasons a range check `CastII` ends up above its > range check, becomes top because its input becomes some integer that > conflicts with its type (but there's no condition to catch it). The > graph becomes broken and c2 crashes. > > Logic in the `dominated_by()` methods ... 
Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: review ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16886/files - new: https://git.openjdk.org/jdk/pull/16886/files/372021b6..998d030e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16886&range=13 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16886&range=12-13 Stats: 4 lines in 1 file changed: 4 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/16886.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16886/head:pull/16886 PR: https://git.openjdk.org/jdk/pull/16886 From roland at openjdk.org Tue Jan 9 17:02:50 2024 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 9 Jan 2024 17:02:50 GMT Subject: RFR: 8319793: C2 compilation fails with "Bad graph detected in build_loop_late" after JDK-8279888 [v12] In-Reply-To: <9VeaoeqFVJApx5G4UWDoZM8UqZgm4lKlDiRxxZoha5c=.62c6c314-ae6c-42da-9ebd-de9d200b39ce@github.com> References: <7WZcrJtRFhNaGiw4c_ov_XW-dWcJ5-GdRy_8Vh2ikWA=.0cdfcad8-eb9b-43ee-b160-ad96f698a0b6@github.com> <9VeaoeqFVJApx5G4UWDoZM8UqZgm4lKlDiRxxZoha5c=.62c6c314-ae6c-42da-9ebd-de9d200b39ce@github.com> Message-ID: On Tue, 9 Jan 2024 13:51:47 GMT, Emanuel Peter wrote: >> Roland Westrelin has updated the pull request incrementally with five additional commits since the last revision: >> >> - Update src/hotspot/share/opto/ifnode.cpp >> >> Co-authored-by: Christian Hagedorn >> - Update src/hotspot/share/opto/node.hpp >> >> Co-authored-by: Christian Hagedorn >> - Update src/hotspot/share/opto/loopopts.cpp >> >> Co-authored-by: Christian Hagedorn >> - Update src/hotspot/share/opto/loopopts.cpp >> >> Co-authored-by: Christian Hagedorn >> - Update src/hotspot/share/opto/loopPredicate.cpp >> >> Co-authored-by: Christian Hagedorn > > Otherwise LGTM ? @eme64 @chhagedorn thanks for the suggestions. I made the change you requested. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/16886#issuecomment-1883434825 From epeter at openjdk.org Tue Jan 9 17:10:36 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 9 Jan 2024 17:10:36 GMT Subject: RFR: 8319793: C2 compilation fails with "Bad graph detected in build_loop_late" after JDK-8279888 [v14] In-Reply-To: References: Message-ID: On Tue, 9 Jan 2024 17:02:48 GMT, Roland Westrelin wrote: >> Range check smearing and range check predication make an array access >> dependent on 2 (or more in the case of RC smearing) conditions. As a >> consequence, if a range check can be eliminated because there's an >> identical dominating range check, the control dependent nodes that >> could float and become dependent on the dominating range check cannot >> be allowed to float because there's a risk that they would then bypass >> one of the checks that make the access legal. >> >> `IfNode::dominated_by()` and `PhaseIdealLoop::dominated_by()` have >> logic to prevent this: nodes that are control dependent on a range >> check or predicate are not allowed to float. This is however not >> sufficient as demonstrated by the test cases. >> >> In `TestArrayAccessAboveRCAfterSmearingOrPredication.testRangeCheckSmearing()`: >> >> >> v += array[i]; >> if (flag2) { >> if (flag3) { >> field = 0x42; >> } >> } >> if (flagField == 1) { >> v += array[i]; >> } >> >> >> The range check for the second `array[i]` load is replaced by the >> dominating range check for the first `array[i]` but because the second >> `array[i]` load could really be dependent on multiple range checks (in >> case smearing happened which is not the case here), c2 doesn't allow >> the second `array[i]` to float when the second range check is >> removed. The second `array[i]` is then control dependent on: >> >> >> if (flagField == 1) { >> >> >> which is next found to be dominated by the same test: >> >> >> if (flag == 1) { >> >> >> and is removed. 
However nothing in `dominated_by()` treats node >> dependent on tests that are not range check or predicates >> specially. So the second `array[i]` is allowed to float and become >> dependent on: >> >> >> if (flag == 1) { >> >> >> which is above the range check for that access. The test method in its >> last invocation is passed an index for the array access that's widely >> out of range. The array load happens before the range check and >> crashes the VM. `testLoopPredication()` is a similar test where array >> loads become dependent on predicates and end up above range checks. >> >> `TestArrayAccessCastIIAboveRC.java` is the test case from the bug >> where for similar reasons a range check `CastII` ends up above its >> range check, becomes top because its input becomes some integer that >> conflicts with its... > > Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: > > review Thanks, still LGTM ------------- Marked as reviewed by epeter (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/16886#pullrequestreview-1811622944 From epeter at openjdk.org Tue Jan 9 17:21:06 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 9 Jan 2024 17:21:06 GMT Subject: RFR: 8315361: C2 SuperWord: refactor out loop analysis into shared auto-vectorization facility VLoopAnalyzer [v8] In-Reply-To: References: Message-ID: > This is a refactoring of `SuperWord`. > > **Goals** > > 1. Clean up `SuperWord`: disentangle different components, make them more **modular**. > 2. Make the loop analysis parts a **shared facility**, not just for SuperWord but also the post-loop-vectorizer ([JDK-8308994](https://bugs.openjdk.org/browse/JDK-8308994)). > 3. It is also a necessary step on my bigger plans for improvement with the C2 Auto-Vectorizer ([see my blog post](https://eme64.github.io/blog/2023/11/03/C2-AutoVectorizer-Improvement-Ideas.html)). > 4. Improve tracing in the auto-vectorization by making it more systematic. 
> > **Summary** > > - I wrote a summary of how C2 auto-vectorization with SuperWord works (please read!): > https://github.com/openjdk/jdk/blob/95fd361e60fc66eb91edad321662e508b2d1bdde/src/hotspot/share/opto/superword.hpp#L32-L177 > - I moved many `Superword` components out to `VLoop` and its subclass `VLoopAnalyzer`. The idea is that any vectorizer can use these facilities in the future. They are therefore made more modular, which should hopefully make future changes easier. These components are: > - Checking the pre-conditions for vectorization (e.g. no unwanted ctrl-flow). > - `VLoop::check_preconditions_helper` replaces code from old `SuperWord::transform_loop`. > - Running all submodules of `VLoopAnalyzer`: `VLoopAnalyzer::analyze_helper`. Replaces analysis part of `SuperWord::SLP_extract`. > - Finding and marking reductions -> `VLoopReductions` > - Detecting memory slices -> `VLoopMemorySlices` > - Analyzing the body -> `VLoopBody` (renamed `in_bb` -> `in_body`) > - Determining vector element types, and functions to determine the `vector_width` of a node -> `VLoopTypes` > - Constructing the dependence graph -> `VLoopDependenceGraph`. Replaces old `DepGraph` with all its components. > - New: CompileCommand option `TraceAutovectorization` > - Run with `-XX:CompileCommand=traceAutovectorization,*::*,help` to get a usage description. > - Replaced all printing with flags `TraceSuperWord` (and `Verbose`) and of `VectorizeDebug`. > - The advantage of a CompileCommand is that tracing can be applied selectively for only a limited set of java classes / methods. > - It uses tags, which are more readable than the `VectorizeDebug` bit-flags. These tags can be used for all parts of the vectorizer, but one can also target SuperWord specifically. > - I systematically added tracing at every point where vectorization (partially) fails (use tag `SW_REJECTIONS`). > - `TraceSuperWord` still works, and perfo... 
Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: update copyright for 2024 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16620/files - new: https://git.openjdk.org/jdk/pull/16620/files/1f5d4ef2..4302f58b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16620&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16620&range=06-07 Stats: 21 lines in 21 files changed: 0 ins; 0 del; 21 mod Patch: https://git.openjdk.org/jdk/pull/16620.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16620/head:pull/16620 PR: https://git.openjdk.org/jdk/pull/16620 From duke at openjdk.org Tue Jan 9 17:23:58 2024 From: duke at openjdk.org (Zhiqiang Zang) Date: Tue, 9 Jan 2024 17:23:58 GMT Subject: RFR: 8322077: Add Ideal transformation: (~a) | (~b) => ~(a & b) [v12] In-Reply-To: References: Message-ID: > Hello, > > (~a) | (~b) => ~(a & b) is a widely seen pattern, for example it is implemented for LLVM [here](https://github.com/llvm/llvm-project/blob/397f1ce9efb4eea1ee10fe4833f733b8c7abd878/llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp#L1617C28-L1617C28); however it is missing in current implementation of hotspot. This pull request adds this transformation and associated tests. > > Thanks. Zhiqiang Zang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 18 additional commits since the last revision: - update copyright dates. - Merge branch 'andnode-PNewDeMorganLawAndToOr' into ornode-PNewDeMorganLawOrToAnd - Revert "adapt changes from the dependent pr." This reverts commit 0c8d10776fe55a28ea9f60160b4f6f57aabad5ab. - Revert "adapt to new changes from the dependant pr." This reverts commit b21e242b306d9cf01e1ba84ebba47ef5b471d98e. - adapt to new changes from the dependant pr. 
- Merge branch 'andnode-PNewDeMorganLawAndToOr' into ornode-PNewDeMorganLawOrToAnd - adapt changes from the dependent pr. - Merge branch 'andnode-PNewDeMorganLawAndToOr' into ornode-PNewDeMorganLawOrToAnd - Update test/hotspot/jtreg/compiler/c2/irTests/DeMorganLawLongTests.java Co-authored-by: Tobias Hartmann - Update test/hotspot/jtreg/compiler/c2/irTests/DeMorganLawIntTests.java Co-authored-by: Tobias Hartmann - ... and 8 more: https://git.openjdk.org/jdk/compare/8ab76889...dc60a548 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/16334/files - new: https://git.openjdk.org/jdk/pull/16334/files/b21e242b..dc60a548 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=16334&range=11 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=16334&range=10-11 Stats: 23 lines in 5 files changed: 8 ins; 8 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/16334.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16334/head:pull/16334 PR: https://git.openjdk.org/jdk/pull/16334 From qamai at openjdk.org Tue Jan 9 17:27:38 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Tue, 9 Jan 2024 17:27:38 GMT Subject: RFR: 8282365: Consolidate and improve division by constant idealizations [v44] In-Reply-To: References: Message-ID: > This patch implements idealisation for unsigned divisions to change a division by a constant into a series of multiplication and shift. I also change the idealisation of `DivI` to get a more efficient series when the magic constant overflows an int32. 
>
> In general, the idea behind a signed division transformation is that for a positive constant `d`, we would need to find constants `c` and `m` so that:
>
>     floor(x / d) = floor(x * c / 2**m)       for 0 < x < 2**(N - 1)      (1)
>     ceil(x / d)  = floor(x * c / 2**m) + 1   for -2**(N - 1) <= x < 0    (2)
>
> The implementation in the original book takes into consideration that the machine may not be able to perform the full multiplication `x * c`, so the constant may overflow and we need to add back the dividend, as in the `DivLNode::Ideal` cases. However, for int32 division, `x * c` cannot overflow an int64. As a result, it is always feasible to just calculate the product and shift the result.
>
> For unsigned division, the situation is somewhat trickier because the condition needs to be twice as strong (conditions (1) and (2) above are mostly the same). As a result, the magic constant `c` calculated with the method presented in Hacker's Delight by Henry S. Warren, Jr. may overflow a uintN. For int division, we can depend on the theorem devised by Arch D. Robison in N-Bit Unsigned Division Via N-Bit Multiply-Add, which states that there exists either:
>
>     c1 in uint32 and m1, such that floor(x / d) = floor(x * c1 / 2**m1)         for 0 < x < 2**32    (3)
>     c2 in uint32 and m2, such that floor(x / d) = floor((x + 1) * c2 / 2**m2)   for 0 < x < 2**32    (4)
>
> which means that either `x * c1` never overflows a uint64 or `(x + 1) * c2` never overflows a uint64, so we can perform a full multiplication.
>
> For longs, there is no way to do a full multiplication, so we do some basic transformations to achieve a computable formula. I have written the details as comments in the overflow case.
>
> More tests are added to cover the possible patterns.
>
> Please take a look and review. Thank you very much.
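As a rough illustration of conditions (1) and (2) in the description above, the following Java sketch divides a signed 32-bit value by a positive constant using one 64-bit multiplication and a shift. It is not taken from the patch. The class name `MagicDivision`, the choice `m = 31 + ceil(log2(d))`, and `c = ceil(2**m / d)` are assumptions made for this example, and it deliberately rejects powers of two, for which condition (2) does not hold with this choice of `c`.

```java
final class MagicDivision {
    private final int shift;   // m in the formulas above
    private final long magic;  // c in the formulas above

    // Hypothetical helper: precompute c and m for a divisor d >= 3 that is not a power of two.
    MagicDivision(int d) {
        if (d < 3 || Integer.bitCount(d) == 1) {
            throw new IllegalArgumentException("sketch only supports d >= 3, not a power of two");
        }
        int log2Ceil = 32 - Integer.numberOfLeadingZeros(d - 1); // ceil(log2(d))
        this.shift = 31 + log2Ceil;                              // at most 62, so 1L << shift fits in a long
        this.magic = ((1L << shift) + d - 1) / d;                // ceil(2**m / d)
    }

    // Truncating division of x by d, the same result as the Java expression x / d.
    int divide(int x) {
        long product = (long) x * magic;     // cannot overflow an int64 for int32 x
        int floorPart = (int) (product >> shift); // arithmetic shift, i.e. floor(x * c / 2**m)
        return floorPart + (x >>> 31);       // add 1 for negative x, as in condition (2)
    }

    public static void main(String[] args) {
        MagicDivision by7 = new MagicDivision(7);
        int[] samples = {100, -100, 6, -7, Integer.MAX_VALUE, Integer.MIN_VALUE};
        for (int x : samples) {
            System.out.println(x + " / 7 = " + by7.divide(x) + " (expected " + (x / 7) + ")");
        }
    }
}
```

The 64-bit product is what makes the int32 case simple: with `c <= 2**32` and `|x| <= 2**31`, the product stays within the int64 range, so no "add back the dividend" step is needed.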
Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: update include order and license year ------------- Changes: - all: https://git.openjdk.org/jdk/pull/9947/files - new: https://git.openjdk.org/jdk/pull/9947/files/bba52b74..db80bd4a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=9947&range=43 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=9947&range=42-43 Stats: 15 lines in 13 files changed: 1 ins; 1 del; 13 mod Patch: https://git.openjdk.org/jdk/pull/9947.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/9947/head:pull/9947 PR: https://git.openjdk.org/jdk/pull/9947 From qamai at openjdk.org Tue Jan 9 17:27:39 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Tue, 9 Jan 2024 17:27:39 GMT Subject: RFR: 8282365: Consolidate and improve division by constant idealizations [v42] In-Reply-To: References: Message-ID: <111zTZKk5JMIxSLq08oZKKvd4xYn4ZKhH8VjOxX0BVI=.b3ed64da-03e1-4ebe-8af7-124a152292cd@github.com> On Tue, 9 Jan 2024 15:56:27 GMT, Stefan Karlsson wrote: >> I assume this means that the include order is good as it is now? > > Please update it to: > > #include "precompiled.hpp" > #include "opto/divconstants.hpp" > #include "runtime/os.hpp" > #include "utilities/growableArray.hpp" > #include "unittest.hpp" > > #include Got it, fixed. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/9947#discussion_r1446390806 From epeter at openjdk.org Tue Jan 9 17:35:21 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 9 Jan 2024 17:35:21 GMT Subject: RFR: 8321712: C2: "failed: Multiple uses of register" in C2_MacroAssembler::vminmax_fp In-Reply-To: References: Message-ID: On Tue, 9 Jan 2024 00:01:04 GMT, Sandhya Viswanathan wrote: > The C2_MacroAssembler::vminmax_fp and evminmax_fp were incorrectly checking for the two USE operand registers to be different and thus resulting in assertion failure. 
> > In x86_64.ad: > > instruct maxF_reg(legRegF dst, legRegF a, legRegF b, legRegF tmp, legRegF atmp, legRegF btmp) %{ > ... > effect(USE a, USE b, TEMP tmp, TEMP atmp, TEMP btmp); > ... > __ vminmax_fp(Op_MaxV, T_FLOAT, $dst$$XMMRegister, $a$$XMMRegister, $b$$XMMRegister, $tmp$$XMMRegister, $atmp$$XMMRegister, $btmp$$XMMRegister, Assembler::AVX_128bit); > %} > > > Changing the assert in vminmax_fp from: > assert_different_registers(a, b, tmp, atmp, btmp); > to: > assert_different_registers(a, tmp, atmp, btmp); > assert_different_registers(b, tmp, atmp, btmp); > fixes the issue. > > Similar change done in evminmax_fp. > > Please review. > > Best Regards, > Sandhya Also: do you have a regression test for this? ------------- PR Comment: https://git.openjdk.org/jdk/pull/17315#issuecomment-1883492250 From sviswanathan at openjdk.org Tue Jan 9 18:03:37 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Tue, 9 Jan 2024 18:03:37 GMT Subject: RFR: 8321712: C2: "failed: Multiple uses of register" in C2_MacroAssembler::vminmax_fp [v2] In-Reply-To: References: Message-ID: <1MyRVb7ND-RQj3XrqDQfJt7KGQ5tCAjRdRjUDVfkqOM=.db88d703-26b5-495b-9bb9-5acebce915e1@github.com> On Tue, 9 Jan 2024 02:25:15 GMT, Vladimir Kozlov wrote: >> Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision: >> >> review comments > > Should we "short cut" code when registers are the same? @vnkozlov @jatin-bhateja Your review comments are addressed, please take a look. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/17315#issuecomment-1883534160 From sviswanathan at openjdk.org Tue Jan 9 18:03:35 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Tue, 9 Jan 2024 18:03:35 GMT Subject: RFR: 8321712: C2: "failed: Multiple uses of register" in C2_MacroAssembler::vminmax_fp [v2] In-Reply-To: References: Message-ID: > The C2_MacroAssembler::vminmax_fp and evminmax_fp were incorrectly checking for the two USE operand registers to be different and thus resulting in assertion failure. > > In x86_64.ad: > > instruct maxF_reg(legRegF dst, legRegF a, legRegF b, legRegF tmp, legRegF atmp, legRegF btmp) %{ > ... > effect(USE a, USE b, TEMP tmp, TEMP atmp, TEMP btmp); > ... > __ vminmax_fp(Op_MaxV, T_FLOAT, $dst$$XMMRegister, $a$$XMMRegister, $b$$XMMRegister, $tmp$$XMMRegister, $atmp$$XMMRegister, $btmp$$XMMRegister, Assembler::AVX_128bit); > %} > > > Changing the assert in vminmax_fp from: > assert_different_registers(a, b, tmp, atmp, btmp); > to: > assert_different_registers(a, tmp, atmp, btmp); > assert_different_registers(b, tmp, atmp, btmp); > fixes the issue. > > Similar change done in evminmax_fp. > > Please review. 
> > Best Regards, > Sandhya Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision: review comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17315/files - new: https://git.openjdk.org/jdk/pull/17315/files/aee22d07..55c6e32e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17315&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17315&range=00-01 Stats: 23 lines in 3 files changed: 23 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/17315.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17315/head:pull/17315 PR: https://git.openjdk.org/jdk/pull/17315 From epeter at openjdk.org Tue Jan 9 18:06:23 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 9 Jan 2024 18:06:23 GMT Subject: RFR: 8321712: C2: "failed: Multiple uses of register" in C2_MacroAssembler::vminmax_fp [v2] In-Reply-To: <1MyRVb7ND-RQj3XrqDQfJt7KGQ5tCAjRdRjUDVfkqOM=.db88d703-26b5-495b-9bb9-5acebce915e1@github.com> References: <1MyRVb7ND-RQj3XrqDQfJt7KGQ5tCAjRdRjUDVfkqOM=.db88d703-26b5-495b-9bb9-5acebce915e1@github.com> Message-ID: <7i0OLV4-N2s3xtJ1cD0zcwFcqQQKWqfQ4ZlxKVaPcJA=.2dbe8082-120d-47d6-a131-656780d9b3e2@github.com> On Tue, 9 Jan 2024 18:00:59 GMT, Sandhya Viswanathan wrote: >> Should we "short cut" code when registers are the same? > > @vnkozlov @jatin-bhateja Your review comments are addressed, please take a look. @sviswa7 but is the "same address" not an indication of a missing ideal transformation? Hence, the assert may actually be ok, and the root cause be fixed in the ideal transformation. I think this maybe what @jatin-bhateja was suggesting. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/17315#issuecomment-1883537937 From sviswanathan at openjdk.org Tue Jan 9 18:14:21 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Tue, 9 Jan 2024 18:14:21 GMT Subject: RFR: 8321712: C2: "failed: Multiple uses of register" in C2_MacroAssembler::vminmax_fp [v2] In-Reply-To: <7i0OLV4-N2s3xtJ1cD0zcwFcqQQKWqfQ4ZlxKVaPcJA=.2dbe8082-120d-47d6-a131-656780d9b3e2@github.com> References: <1MyRVb7ND-RQj3XrqDQfJt7KGQ5tCAjRdRjUDVfkqOM=.db88d703-26b5-495b-9bb9-5acebce915e1@github.com> <7i0OLV4-N2s3xtJ1cD0zcwFcqQQKWqfQ4ZlxKVaPcJA=.2dbe8082-120d-47d6-a131-656780d9b3e2@github.com> Message-ID: On Tue, 9 Jan 2024 18:03:40 GMT, Emanuel Peter wrote: >> @vnkozlov @jatin-bhateja Your review comments are addressed, please take a look. > > @sviswa7 but is the "same address" not an indication of a missing ideal transformation? Hence, the assert may actually be ok, and the root cause be fixed in the ideal transformation. I think this maybe what @jatin-bhateja was suggesting. @eme64 Probably, but my goal here is limited. We have to fix this PR within RDP2 i.e. asap. That is why I kept the changes to minimum. On Vladimir's request I have added a minimum change to handle the case when a and b are same. ------------- PR Comment: https://git.openjdk.org/jdk/pull/17315#issuecomment-1883549079 From epeter at openjdk.org Tue Jan 9 18:17:22 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 9 Jan 2024 18:17:22 GMT Subject: RFR: 8321712: C2: "failed: Multiple uses of register" in C2_MacroAssembler::vminmax_fp [v2] In-Reply-To: References: <1MyRVb7ND-RQj3XrqDQfJt7KGQ5tCAjRdRjUDVfkqOM=.db88d703-26b5-495b-9bb9-5acebce915e1@github.com> <7i0OLV4-N2s3xtJ1cD0zcwFcqQQKWqfQ4ZlxKVaPcJA=.2dbe8082-120d-47d6-a131-656780d9b3e2@github.com> Message-ID: On Tue, 9 Jan 2024 18:11:18 GMT, Sandhya Viswanathan wrote: >> @sviswa7 but is the "same address" not an indication of a missing ideal transformation? 
Hence, the assert may actually be ok, and the root cause be fixed in the ideal transformation. I think this maybe what @jatin-bhateja was suggesting. > > @eme64 Probably, but my goal here is limited. We have to fix this PR within RDP2 i.e. asap. That is why I kept the changes to minimum. On Vladimir's request I have added a minimum change to handle the case when a and b are same. @sviswa7 Ok, I understand. But a regression test would still be good. We should just reduce the regression test attached to https://bugs.openjdk.org/browse/JDK-8322090, @TobiHartmann mentioned it on JIRA. I guess we can also file a follow up RFE to improve the Ideal transformations. ------------- PR Comment: https://git.openjdk.org/jdk/pull/17315#issuecomment-1883554067 From sviswanathan at openjdk.org Tue Jan 9 18:17:22 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Tue, 9 Jan 2024 18:17:22 GMT Subject: RFR: 8321712: C2: "failed: Multiple uses of register" in C2_MacroAssembler::vminmax_fp [v2] In-Reply-To: References: <1MyRVb7ND-RQj3XrqDQfJt7KGQ5tCAjRdRjUDVfkqOM=.db88d703-26b5-495b-9bb9-5acebce915e1@github.com> <7i0OLV4-N2s3xtJ1cD0zcwFcqQQKWqfQ4ZlxKVaPcJA=.2dbe8082-120d-47d6-a131-656780d9b3e2@github.com> Message-ID: <0EuE5ubzcizekbWuuI5KVIYG1VE_4mqtN-Paw3Z_UYU=.dddffc40-42c8-418b-b405-e415bd35099a@github.com> On Tue, 9 Jan 2024 18:14:55 GMT, Emanuel Peter wrote: >> @eme64 Probably, but my goal here is limited. We have to fix this PR within RDP2 i.e. asap. That is why I kept the changes to minimum. On Vladimir's request I have added a minimum change to handle the case when a and b are same. > > @sviswa7 Ok, I understand. But a regression test would still be good. We should just reduce the regression test attached to https://bugs.openjdk.org/browse/JDK-8322090, @TobiHartmann mentioned it on JIRA. > > I guess we can also file a follow up RFE to improve the Ideal transformations. @eme64 No, I don't have a regression test for this. 
I followed the ctw.sh mechanism provided in the bug report by Roland Westrelin to reproduce and verify. ------------- PR Comment: https://git.openjdk.org/jdk/pull/17315#issuecomment-1883554360 From epeter at openjdk.org Tue Jan 9 18:20:24 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 9 Jan 2024 18:20:24 GMT Subject: RFR: 8321712: C2: "failed: Multiple uses of register" in C2_MacroAssembler::vminmax_fp [v2] In-Reply-To: References: Message-ID: On Tue, 9 Jan 2024 18:03:35 GMT, Sandhya Viswanathan wrote: >> The C2_MacroAssembler::vminmax_fp and evminmax_fp were incorrectly checking for the two USE operand registers to be different and thus resulting in assertion failure. >> >> In x86_64.ad: >> >> instruct maxF_reg(legRegF dst, legRegF a, legRegF b, legRegF tmp, legRegF atmp, legRegF btmp) %{ >> ... >> effect(USE a, USE b, TEMP tmp, TEMP atmp, TEMP btmp); >> ... >> __ vminmax_fp(Op_MaxV, T_FLOAT, $dst$$XMMRegister, $a$$XMMRegister, $b$$XMMRegister, $tmp$$XMMRegister, $atmp$$XMMRegister, $btmp$$XMMRegister, Assembler::AVX_128bit); >> %} >> >> >> Changing the assert in vminmax_fp from: >> assert_different_registers(a, b, tmp, atmp, btmp); >> to: >> assert_different_registers(a, tmp, atmp, btmp); >> assert_different_registers(b, tmp, atmp, btmp); >> fixes the issue. >> >> Similar change done in evminmax_fp. >> >> Please review. >> >> Best Regards, >> Sandhya > > Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision: > > review comments https://bugs.openjdk.org/secure/attachment/107681/Test_276.java This is the regression test of the bug that is closed as duplicate of your issue, am I correct? This is the duplicate bug: https://bugs.openjdk.org/browse/JDK-8322090 Fails with: `assert(regs[i] != regs[j]) failed: Multiple uses of register: xmm3` You need to at least verify if this bug is fixed with your patch, otherwise we would need to re-open it, since it would not be a duplicate. 
-------------

PR Comment: https://git.openjdk.org/jdk/pull/17315#issuecomment-1883557719

From kvn at openjdk.org Tue Jan 9 19:57:23 2024
From: kvn at openjdk.org (Vladimir Kozlov)
Date: Tue, 9 Jan 2024 19:57:23 GMT
Subject: RFR: 8321712: C2: "failed: Multiple uses of register" in C2_MacroAssembler::vminmax_fp [v2]
In-Reply-To:
References:
Message-ID:

On Tue, 9 Jan 2024 18:03:35 GMT, Sandhya Viswanathan wrote:

>> The C2_MacroAssembler::vminmax_fp and evminmax_fp were incorrectly checking for the two USE operand registers to be different and thus resulting in assertion failure.
>>
>> In x86_64.ad:
>>
>> instruct maxF_reg(legRegF dst, legRegF a, legRegF b, legRegF tmp, legRegF atmp, legRegF btmp) %{
>>   ...
>>   effect(USE a, USE b, TEMP tmp, TEMP atmp, TEMP btmp);
>>   ...
>>   __ vminmax_fp(Op_MaxV, T_FLOAT, $dst$$XMMRegister, $a$$XMMRegister, $b$$XMMRegister, $tmp$$XMMRegister, $atmp$$XMMRegister, $btmp$$XMMRegister, Assembler::AVX_128bit);
>> %}
>>
>>
>> Changing the assert in vminmax_fp from:
>>   assert_different_registers(a, b, tmp, atmp, btmp);
>> to:
>>   assert_different_registers(a, tmp, atmp, btmp);
>>   assert_different_registers(b, tmp, atmp, btmp);
>> fixes the issue.
>>
>> Similar change done in evminmax_fp.
>>
>> Please review.
>>
>> Best Regards,
>> Sandhya
>
> Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision:
>
> review comments

Actually, an Ideal transformation fix could be smaller than these changes, and you would not need to change platform-specific code. Hmm, maybe NaN values could be a problem. We have to check for them as we do in other operations. Even the suggested "short cut" (use a move) could be wrong for NaN.

Okay, let's go back to the first version of these changes: only the assert fix. And file a separate RFE to make changes in the Ideal graph.

And we need a regression test, as @eme64 pointed out.
------------- PR Review: https://git.openjdk.org/jdk/pull/17315#pullrequestreview-1811885864 From kvn at openjdk.org Tue Jan 9 20:21:21 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 9 Jan 2024 20:21:21 GMT Subject: RFR: 8320237 C2: late inlining of method handle invoke causes duplicate lines in PrintInlining output In-Reply-To: References: Message-ID: On Mon, 18 Dec 2023 15:28:47 GMT, Kangcheng Xu wrote: > This PR resolves [JDK-8320237](https://bugs.openjdk.org/browse/JDK-8320237) > > The original behavior produces both a failure and success message upon late inlining which is confusing. The patch removes the failure message if inlining was successful. Huge thanks to @rwestrel for reporting and working out a solution. > > Unit test `test/hotspot/jtreg/compiler/inlining/TestDuplicatedLateInliningOutput.java` is added and passing. Looks good to me too. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/17147#pullrequestreview-1811918664 From duke at openjdk.org Tue Jan 9 21:24:26 2024 From: duke at openjdk.org (Zhiqiang Zang) Date: Tue, 9 Jan 2024 21:24:26 GMT Subject: RFR: 8322589: Add Ideal transformation: (~a) & (~b) => ~(a | b) [v5] In-Reply-To: References: <-pwjKEB97C-bM068JQN0PY1hl65IzcuQZfHzRoKu92g=.d6118773-20d2-46cd-9284-5168c9334bb5@github.com> Message-ID: On Mon, 8 Jan 2024 06:58:58 GMT, Tobias Hartmann wrote: >> Zhiqiang Zang has updated the pull request incrementally with one additional commit since the last revision: >> >> update the copyright dates. > > Looks good to me otherwise. @TobiHartmann @eme64 Thanks a lot for reviewing and all the comments. Can you sponsor when you get a chance? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/16333#issuecomment-1883815046 From sviswanathan at openjdk.org Tue Jan 9 22:00:37 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Tue, 9 Jan 2024 22:00:37 GMT Subject: RFR: 8321712: C2: "failed: Multiple uses of register" in C2_MacroAssembler::vminmax_fp [v3] In-Reply-To: References: Message-ID: <30j6-cR2RH4NxQzduweT7lsy9BaJ-q4OF52MA30N0vo=.557a3a38-7dac-4475-b186-538a45f57d10@github.com> > The C2_MacroAssembler::vminmax_fp and evminmax_fp were incorrectly checking for the two USE operand registers to be different and thus resulting in assertion failure. > > In x86_64.ad: > > instruct maxF_reg(legRegF dst, legRegF a, legRegF b, legRegF tmp, legRegF atmp, legRegF btmp) %{ > ... > effect(USE a, USE b, TEMP tmp, TEMP atmp, TEMP btmp); > ... > __ vminmax_fp(Op_MaxV, T_FLOAT, $dst$$XMMRegister, $a$$XMMRegister, $b$$XMMRegister, $tmp$$XMMRegister, $atmp$$XMMRegister, $btmp$$XMMRegister, Assembler::AVX_128bit); > %} > > > Changing the assert in vminmax_fp from: > assert_different_registers(a, b, tmp, atmp, btmp); > to: > assert_different_registers(a, tmp, atmp, btmp); > assert_different_registers(b, tmp, atmp, btmp); > fixes the issue. > > Similar change done in evminmax_fp. > > Please review. 
> > Best Regards, > Sandhya Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision: Retain only asserts ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17315/files - new: https://git.openjdk.org/jdk/pull/17315/files/55c6e32e..c5dac9b5 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17315&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17315&range=01-02 Stats: 23 lines in 3 files changed: 0 ins; 23 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/17315.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17315/head:pull/17315 PR: https://git.openjdk.org/jdk/pull/17315 From sviswanathan at openjdk.org Tue Jan 9 22:36:23 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Tue, 9 Jan 2024 22:36:23 GMT Subject: RFR: 8321712: C2: "failed: Multiple uses of register" in C2_MacroAssembler::vminmax_fp [v3] In-Reply-To: <30j6-cR2RH4NxQzduweT7lsy9BaJ-q4OF52MA30N0vo=.557a3a38-7dac-4475-b186-538a45f57d10@github.com> References: <30j6-cR2RH4NxQzduweT7lsy9BaJ-q4OF52MA30N0vo=.557a3a38-7dac-4475-b186-538a45f57d10@github.com> Message-ID: On Tue, 9 Jan 2024 22:00:37 GMT, Sandhya Viswanathan wrote: >> The C2_MacroAssembler::vminmax_fp and evminmax_fp were incorrectly checking for the two USE operand registers to be different and thus resulting in assertion failure. >> >> In x86_64.ad: >> >> instruct maxF_reg(legRegF dst, legRegF a, legRegF b, legRegF tmp, legRegF atmp, legRegF btmp) %{ >> ... >> effect(USE a, USE b, TEMP tmp, TEMP atmp, TEMP btmp); >> ... >> __ vminmax_fp(Op_MaxV, T_FLOAT, $dst$$XMMRegister, $a$$XMMRegister, $b$$XMMRegister, $tmp$$XMMRegister, $atmp$$XMMRegister, $btmp$$XMMRegister, Assembler::AVX_128bit); >> %} >> >> >> Changing the assert in vminmax_fp from: >> assert_different_registers(a, b, tmp, atmp, btmp); >> to: >> assert_different_registers(a, tmp, atmp, btmp); >> assert_different_registers(b, tmp, atmp, btmp); >> fixes the issue. 
>> >> Similar change done in evminmax_fp. >> >> Please review. >> >> Best Regards, >> Sandhya > > Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision: > > Retain only asserts RFE filed: https://bugs.openjdk.org/browse/JDK-8323429 ------------- PR Comment: https://git.openjdk.org/jdk/pull/17315#issuecomment-1883906977 From kbarrett at openjdk.org Tue Jan 9 22:36:50 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Tue, 9 Jan 2024 22:36:50 GMT Subject: RFR: 8323115: x86-32: Incorrect predicates for cmov instruct transforms with UseSSE [v2] In-Reply-To: <7BcH08lab4xxFOxm0UAfhHKVyBBEhRWtIPEUJuOIlmo=.2c5b0918-19ac-4cb3-b2f5-11131ce73422@github.com> References: <7BcH08lab4xxFOxm0UAfhHKVyBBEhRWtIPEUJuOIlmo=.2c5b0918-19ac-4cb3-b2f5-11131ce73422@github.com> Message-ID: > Please review this change that fixes generation of CMOV by C2 as controlled by > UseSSE. The predicates controlling that generation were using implicit > operator precedence that didn't have the expected grouping. Fixed by adding > parentheses to make the desired grouping explicit. > > Testing: Ran GHA with -Wparentheses enabled along with this and other changes > needed to make that work. Kim Barrett has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains two additional commits since the last revision: - Merge branch 'master' into x86-32-cmov-preds - fix predicates for cmov with UseSSE ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17296/files - new: https://git.openjdk.org/jdk/pull/17296/files/f2c5ba0d..6f49985d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17296&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17296&range=00-01 Stats: 13013 lines in 180 files changed: 10072 ins; 1583 del; 1358 mod Patch: https://git.openjdk.org/jdk/pull/17296.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17296/head:pull/17296 PR: https://git.openjdk.org/jdk/pull/17296 From kbarrett at openjdk.org Tue Jan 9 22:36:51 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Tue, 9 Jan 2024 22:36:51 GMT Subject: RFR: 8323115: x86-32: Incorrect predicates for cmov instruct transforms with UseSSE [v2] In-Reply-To: References: <7BcH08lab4xxFOxm0UAfhHKVyBBEhRWtIPEUJuOIlmo=.2c5b0918-19ac-4cb3-b2f5-11131ce73422@github.com> Message-ID: On Mon, 8 Jan 2024 12:11:11 GMT, Aleksey Shipilev wrote: >> The fix looks good to me but it's concerning that we never hit this in testing. Maybe it never fails because the expanded instructions are guarded by `predicate (UseSSE ...` as well. > >> The fix looks good to me but it's concerning that we never hit this in testing. Maybe it never fails because the expanded instructions are guarded by `predicate (UseSSE ...` as well. > > In my experience fixing bugs in these FPU-related match rules is that it takes a combination of code shape and relevant hardware (that defaults for unusual `UseSSE <= 2`), or specific testing that runs with lower `UseSSE`. I think I was one of the few remaining people who ran x86_32 with `-XX:UseSSE=0`, for example, but finally stopped. I think going forward we would just need to require `UseSSE >= 2` for x86_32, like for x86_64, making these issues go away. Thanks for reviews, @shipilev and @TobiHartmann . 
------------- PR Comment: https://git.openjdk.org/jdk/pull/17296#issuecomment-1883903590 From kbarrett at openjdk.org Tue Jan 9 22:36:52 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Tue, 9 Jan 2024 22:36:52 GMT Subject: Integrated: 8323115: x86-32: Incorrect predicates for cmov instruct transforms with UseSSE In-Reply-To: <7BcH08lab4xxFOxm0UAfhHKVyBBEhRWtIPEUJuOIlmo=.2c5b0918-19ac-4cb3-b2f5-11131ce73422@github.com> References: <7BcH08lab4xxFOxm0UAfhHKVyBBEhRWtIPEUJuOIlmo=.2c5b0918-19ac-4cb3-b2f5-11131ce73422@github.com> Message-ID: On Mon, 8 Jan 2024 07:55:21 GMT, Kim Barrett wrote: > Please review this change that fixes generation of CMOV by C2 as controlled by > UseSSE. The predicates controlling that generation were using implicit > operator precedence that didn't have the expected grouping. Fixed by adding > parentheses to make the desired grouping explicit. > > Testing: Ran GHA with -Wparentheses enabled along with this and other changes > needed to make that work. This pull request has now been integrated. Changeset: 28d8149c Author: Kim Barrett URL: https://git.openjdk.org/jdk/commit/28d8149c693a9470bbde4b1a27c4b9be6c5f365c Stats: 13 lines in 1 file changed: 0 ins; 0 del; 13 mod 8323115: x86-32: Incorrect predicates for cmov instruct transforms with UseSSE Reviewed-by: shade, thartmann ------------- PR: https://git.openjdk.org/jdk/pull/17296 From kbarrett at openjdk.org Tue Jan 9 22:52:25 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Tue, 9 Jan 2024 22:52:25 GMT Subject: RFR: 8322816: RISC-V: Incorrect guarantee in patch_vtype [v2] In-Reply-To: References: Message-ID: On Mon, 8 Jan 2024 01:05:39 GMT, Kim Barrett wrote: >> Please review this change that fixes a test for a guarantee. This also >> removes a -Wparentheses warning when those are enabled (which is how the >> problem was discovered). >> >> The problem is that operator precedence groups the sub-expressions differently >> than intended. 
The fix is to override the operator precedence by adding >> parentheses to achieve the intended grouping. >> >> Testing: Local (linux-x64) cross-build for linux-riscv with this change plus >> -Wparentheses enabled and other changes to allow that to work. >> >> Requesting someone from the riscv porters to properly test this. > > Kim Barrett has updated the pull request incrementally with one additional commit since the last revision: > > guarantee !vill Waiting for second review. ------------- PR Comment: https://git.openjdk.org/jdk/pull/17215#issuecomment-1883922445 From sviswanathan at openjdk.org Tue Jan 9 23:03:35 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Tue, 9 Jan 2024 23:03:35 GMT Subject: RFR: 8321712: C2: "failed: Multiple uses of register" in C2_MacroAssembler::vminmax_fp [v4] In-Reply-To: References: Message-ID: > The C2_MacroAssembler::vminmax_fp and evminmax_fp were incorrectly checking for the two USE operand registers to be different and thus resulting in assertion failure. > > In x86_64.ad: > > instruct maxF_reg(legRegF dst, legRegF a, legRegF b, legRegF tmp, legRegF atmp, legRegF btmp) %{ > ... > effect(USE a, USE b, TEMP tmp, TEMP atmp, TEMP btmp); > ... > __ vminmax_fp(Op_MaxV, T_FLOAT, $dst$$XMMRegister, $a$$XMMRegister, $b$$XMMRegister, $tmp$$XMMRegister, $atmp$$XMMRegister, $btmp$$XMMRegister, Assembler::AVX_128bit); > %} > > > Changing the assert in vminmax_fp from: > assert_different_registers(a, b, tmp, atmp, btmp); > to: > assert_different_registers(a, tmp, atmp, btmp); > assert_different_registers(b, tmp, atmp, btmp); > fixes the issue. > > Similar change done in evminmax_fp. > > Please review. 
> > Best Regards, > Sandhya Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision: add test case from vpaprotsk ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17315/files - new: https://git.openjdk.org/jdk/pull/17315/files/c5dac9b5..43462531 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17315&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17315&range=02-03 Stats: 11 lines in 1 file changed: 11 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/17315.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17315/head:pull/17315 PR: https://git.openjdk.org/jdk/pull/17315 From luhenry at openjdk.org Tue Jan 9 23:12:24 2024 From: luhenry at openjdk.org (Ludovic Henry) Date: Tue, 9 Jan 2024 23:12:24 GMT Subject: RFR: 8322816: RISC-V: Incorrect guarantee in patch_vtype [v2] In-Reply-To: References: Message-ID: On Mon, 8 Jan 2024 01:05:39 GMT, Kim Barrett wrote: >> Please review this change that fixes a test for a guarantee. This also >> removes a -Wparentheses warning when those are enabled (which is how the >> problem was discovered). >> >> The problem is that operator precedence groups the sub-expressions differently >> than intended. The fix is to override the operator precedence by adding >> parentheses to achieve the intended grouping. >> >> Testing: Local (linux-x64) cross-build for linux-riscv with this change plus >> -Wparentheses enabled and other changes to allow that to work. >> >> Requesting someone from the riscv porters to properly test this. > > Kim Barrett has updated the pull request incrementally with one additional commit since the last revision: > > guarantee !vill Marked as reviewed by luhenry (Committer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/17215#pullrequestreview-1812165209 From sviswanathan at openjdk.org Tue Jan 9 23:23:23 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Tue, 9 Jan 2024 23:23:23 GMT Subject: RFR: 8321712: C2: "failed: Multiple uses of register" in C2_MacroAssembler::vminmax_fp [v2] In-Reply-To: References: Message-ID: On Tue, 9 Jan 2024 19:55:03 GMT, Vladimir Kozlov wrote: >> Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision: >> >> review comments > > Actually Ideal transformation fix could be smaller than these changes. You will not need to change platform specific code. Hmm, may be NaN values could be a problem. Have to check for them as we do in other operations. Even suggested "short cut" (use move) could be wrong for NaN. > > Okay, lets go to the first version of these changes: only assert fix. And file separate RFE to make changes in Ideal graph. > > And we need regression test as @eme64 pointed. @vnkozlov I have reverted the changes to just asserts and added a test case to the existing test. The new test case fails without this PR and passes with the PR changes. @eme64 I have verified that [Test_276.java](https://bugs.openjdk.org/secure/attachment/107681/Test_276.java) fails without this PR with the given arguments in the [JBS Bug Entry](https://bugs.openjdk.org/browse/JDK-8322090) and passes with the PR changes. I have filed an [RFE](https://bugs.openjdk.org/browse/JDK-8323429) for future optimization as requested. 
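To make the difference between the two forms of the assert in this thread concrete, here is a small Java model. It is only a sketch under stated assumptions: `RegisterChecks`, `allDifferent`, `originalCheck`, and `relaxedCheck` are hypothetical names, registers are modelled as plain strings, and the real HotSpot `assert_different_registers` is a C++ facility that is merely imitated here.

```java
import java.util.HashSet;
import java.util.Set;

final class RegisterChecks {
    // Hypothetical stand-in for a pairwise-distinctness check over register names.
    static boolean allDifferent(String... regs) {
        Set<String> seen = new HashSet<>();
        for (String r : regs) {
            if (!seen.add(r)) {
                return false; // the same register appears twice
            }
        }
        return true;
    }

    // Models the original assert: a and b must differ from each other and from all temps.
    static boolean originalCheck(String a, String b, String tmp, String atmp, String btmp) {
        return allDifferent(a, b, tmp, atmp, btmp);
    }

    // Models the relaxed asserts: a and b may coincide, but neither may alias a temp.
    static boolean relaxedCheck(String a, String b, String tmp, String atmp, String btmp) {
        return allDifferent(a, tmp, atmp, btmp) && allDifferent(b, tmp, atmp, btmp);
    }

    public static void main(String[] args) {
        // Both USE operands in the same register, as in the reported failure on xmm3.
        System.out.println(originalCheck("xmm3", "xmm3", "xmm0", "xmm1", "xmm2"));
        System.out.println(relaxedCheck("xmm3", "xmm3", "xmm0", "xmm1", "xmm2"));
    }
}
```

In this model the relaxed form still rejects any case where an input aliases one of the TEMP registers, which is the property the macro assembler code presumably relies on.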
------------- PR Comment: https://git.openjdk.org/jdk/pull/17315#issuecomment-1883949886 From kvn at openjdk.org Tue Jan 9 23:36:26 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 9 Jan 2024 23:36:26 GMT Subject: RFR: 8321712: C2: "failed: Multiple uses of register" in C2_MacroAssembler::vminmax_fp [v4] In-Reply-To: References: Message-ID: On Tue, 9 Jan 2024 23:03:35 GMT, Sandhya Viswanathan wrote: >> The C2_MacroAssembler::vminmax_fp and evminmax_fp were incorrectly checking for the two USE operand registers to be different and thus resulting in assertion failure. >> >> In x86_64.ad: >> >> instruct maxF_reg(legRegF dst, legRegF a, legRegF b, legRegF tmp, legRegF atmp, legRegF btmp) %{ >> ... >> effect(USE a, USE b, TEMP tmp, TEMP atmp, TEMP btmp); >> ... >> __ vminmax_fp(Op_MaxV, T_FLOAT, $dst$$XMMRegister, $a$$XMMRegister, $b$$XMMRegister, $tmp$$XMMRegister, $atmp$$XMMRegister, $btmp$$XMMRegister, Assembler::AVX_128bit); >> %} >> >> >> Changing the assert in vminmax_fp from: >> assert_different_registers(a, b, tmp, atmp, btmp); >> to: >> assert_different_registers(a, tmp, atmp, btmp); >> assert_different_registers(b, tmp, atmp, btmp); >> fixes the issue. >> >> Similar change done in evminmax_fp. >> >> Please review. >> >> Best Regards, >> Sandhya > > Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision: > > add test case from vpaprotsk Looks good. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/17315#pullrequestreview-1812183690 From kvn at openjdk.org Tue Jan 9 23:36:40 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 9 Jan 2024 23:36:40 GMT Subject: RFR: 8322743: assert(held_monitor_count() == jni_monitor_count()) failed Message-ID: Corner case with a local (not escaped) object used for synchronization. C2 Escape Analysis thinks that it can eliminate locks for it. In most cases it is true but not in this case. 

    for (int i = 0; i < 2; ++i) {
        Object o = new Object();
        synchronized (o) { // monitorenter
            // Trigger OSR compilation
            for (int j = 0; j < 100_000; ++j) {

The test has a nested loop that triggers OSR compilation. The locked object comes from the Interpreter into the compiled OSR code. During parsing, C2 creates another non-escaping object and correctly merges the two (with a Phi node), so the non-escaping object is not scalar replaceable. Because it does not globally escape, EA still removes the locks for it and, as a result, also for the merged locked object from the Interpreter, which is the bug.

The fix is to check that the synchronized block does not have any associated escaped objects when EA decides whether locks can be eliminated.

Added regression test prepared by @TobiHartmann. Tested tier1-5, xcomp and stress. Performance testing shows no difference.

-------------

Commit messages:
 - Fix trailing and other spaces.
 - 8322743: assert(held_monitor_count() == jni_monitor_count()) failed

Changes: https://git.openjdk.org/jdk/pull/17331/files
Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=17331&range=00
Issue: https://bugs.openjdk.org/browse/JDK-8322743
Stats: 132 lines in 6 files changed: 115 ins; 2 del; 15 mod
Patch: https://git.openjdk.org/jdk/pull/17331.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/17331/head:pull/17331

PR: https://git.openjdk.org/jdk/pull/17331

From sviswanathan at openjdk.org Tue Jan 9 23:46:45 2024
From: sviswanathan at openjdk.org (Sandhya Viswanathan)
Date: Tue, 9 Jan 2024 23:46:45 GMT
Subject: RFR: 8321712: C2: "failed: Multiple uses of register" in C2_MacroAssembler::vminmax_fp [v5]
In-Reply-To:
References:
Message-ID: <-rzXk3WJ9aOgrgoyIVDpCeiQLBkq5N5yQCYp_42oMEo=.df74c3a7-508d-48a1-baa2-796a2015bea1@github.com>

> The C2_MacroAssembler::vminmax_fp and evminmax_fp were incorrectly checking for the two USE operand registers to be different and thus resulting in assertion failure.
> > In x86_64.ad: > > instruct maxF_reg(legRegF dst, legRegF a, legRegF b, legRegF tmp, legRegF atmp, legRegF btmp) %{ > ... > effect(USE a, USE b, TEMP tmp, TEMP atmp, TEMP btmp); > ... > __ vminmax_fp(Op_MaxV, T_FLOAT, $dst$$XMMRegister, $a$$XMMRegister, $b$$XMMRegister, $tmp$$XMMRegister, $atmp$$XMMRegister, $btmp$$XMMRegister, Assembler::AVX_128bit); > %} > > > Changing the assert in vminmax_fp from: > assert_different_registers(a, b, tmp, atmp, btmp); > to: > assert_different_registers(a, tmp, atmp, btmp); > assert_different_registers(b, tmp, atmp, btmp); > fixes the issue. > > Similar change done in evminmax_fp. > > Please review. > > Best Regards, > Sandhya Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision: copyright year update ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17315/files - new: https://git.openjdk.org/jdk/pull/17315/files/43462531..05f8cf81 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17315&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17315&range=03-04 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/17315.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17315/head:pull/17315 PR: https://git.openjdk.org/jdk/pull/17315 From sviswanathan at openjdk.org Tue Jan 9 23:46:47 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Tue, 9 Jan 2024 23:46:47 GMT Subject: RFR: 8321712: C2: "failed: Multiple uses of register" in C2_MacroAssembler::vminmax_fp [v4] In-Reply-To: References: Message-ID: On Tue, 9 Jan 2024 23:33:39 GMT, Vladimir Kozlov wrote: >> Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision: >> >> add test case from vpaprotsk > > Looks good. Thanks a lot @vnkozlov. 
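
The effect of splitting the assertion can be illustrated outside HotSpot. The following is only a minimal sketch, assuming registers are modelled as plain strings; `RegisterAliasSketch`, `allDifferent`, `oldCheck` and `newCheck` are hypothetical names standing in for the behaviour of `assert_different_registers` and are not HotSpot code. It shows that the old check rejects `a == b`, while the new pair of checks accepts it and still rejects any USE operand that aliases a TEMP.

```java
import java.util.HashSet;
import java.util.Set;

class RegisterAliasSketch {
    // Hypothetical stand-in for assert_different_registers:
    // returns true only if all given register names are pairwise distinct.
    static boolean allDifferent(String... regs) {
        Set<String> seen = new HashSet<>();
        for (String r : regs) {
            if (!seen.add(r)) {
                return false;
            }
        }
        return true;
    }

    // Old form: a and b must also differ from each other.
    static boolean oldCheck(String a, String b, String tmp, String atmp, String btmp) {
        return allDifferent(a, b, tmp, atmp, btmp);
    }

    // New form: a and b may be the same register, but neither may alias a TEMP.
    static boolean newCheck(String a, String b, String tmp, String atmp, String btmp) {
        return allDifferent(a, tmp, atmp, btmp) && allDifferent(b, tmp, atmp, btmp);
    }

    public static void main(String[] args) {
        // Both USE operands allocated to the same register (e.g. max(x, x)).
        System.out.println(oldCheck("xmm0", "xmm0", "xmm1", "xmm2", "xmm3")); // false
        System.out.println(newCheck("xmm0", "xmm0", "xmm1", "xmm2", "xmm3")); // true
        // A USE operand aliasing a TEMP is still rejected.
        System.out.println(newCheck("xmm0", "xmm1", "xmm1", "xmm2", "xmm3")); // false
    }
}
```

The sketch says nothing about whether the generated code is correct when `a` and `b` share a register; it only models which register assignments each form of the assertion accepts.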
-------------

PR Comment: https://git.openjdk.org/jdk/pull/17315#issuecomment-1883967275

From dean.long at oracle.com Wed Jan 10 00:11:10 2024
From: dean.long at oracle.com (dean.long at oracle.com)
Date: Tue, 9 Jan 2024 16:11:10 -0800
Subject: discuss about release barrier for final fields initialization
In-Reply-To: <5e41f867-b31b-4ded-b737-8d0c869c8895.kuaiwei.kw@alibaba-inc.com>
References: <5e41f867-b31b-4ded-b737-8d0c869c8895.kuaiwei.kw@alibaba-inc.com>
Message-ID: <581217fe-bb9b-4cbb-865b-316559ad8646@oracle.com>

We only have https://bugs.openjdk.org/browse/JDK-8300148 for that.

thanks,

dl

On 1/8/24 10:23 PM, Kuai Wei wrote:
>
> Hi,
>
> I ran some experiments on object allocation performance. I found that on
> aarch64 N1, if an object has a final field, the allocation rate is about
> 75% of the normal allocation rate.
> The cause is that C2 inserts a release membar in , which is translated
> as "dmb.ish" on aarch64. For normal allocation, a storestore membar is
> inserted and emitted as "dmb.ishst", which makes the difference. The JMH
> test is
> https://gist.github.com/kuaiwei/f71fba40df29991c93325a8600e34c13
>
> java -jar target/benchmarks.jar -f 1 -wi 5 -w 3 -i 3 -r 3 testAlloc
> ...
>
> Benchmark                       Mode  Cnt     Score    Error  Units
> AllocFinal.testAlloc           thrpt    3  1167.903 ± 44.973  ops/s
> AllocFinal.testAllocWithFinal  thrpt    3   915.330 ± 52.596  ops/s
>
>
> I found that only C2 inserts a release membar; C1 just inserts a
> storestore for both final and normal allocation. According to Doug Lea's
> cookbook https://gee.cs.oswego.edu/dl/jmm/cookbook.html
> only storestore is required. Alex has a great post on this topic:
> https://shipilev.net/blog/2014/all-fields-are-final/. It refers to a
> case explaining why loadstore is needed:
> https://www.hboehm.info/c++mm/no_write_fences.html
> I checked this case and, IMO, it looks like some legacy architectures may
> break data dependencies and cause problems. As far as I know, the Alpha
> architecture is an example. I don't think it breaks on modern
> architectures. Are there other cases I missed?
>
> If storestore is enough in this situation, I will send a PR to relax
> the barrier.
>
> Thanks,
> Kuai Wei

From kbarrett at openjdk.org Wed Jan 10 00:15:37 2024
From: kbarrett at openjdk.org (Kim Barrett)
Date: Wed, 10 Jan 2024 00:15:37 GMT
Subject: RFR: 8322816: RISC-V: Incorrect guarantee in patch_vtype [v3]
In-Reply-To:
References:
Message-ID:

> Please review this change that fixes a test for a guarantee. This also
> removes a -Wparentheses warning when those are enabled (which is how the
> problem was discovered).
>
> The problem is that operator precedence groups the sub-expressions differently
> than intended. The fix is to override the operator precedence by adding
> parentheses to achieve the intended grouping.
>
> Testing: Local (linux-x64) cross-build for linux-riscv with this change plus
> -Wparentheses enabled and other changes to allow that to work.
>
> Requesting someone from the riscv porters to properly test this.

Kim Barrett has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains three additional commits since the last revision: - Merge branch 'master' into riscv-paren-bug - guarantee !vill - fix subexpression grouping in patch_vtype guarantee ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17215/files - new: https://git.openjdk.org/jdk/pull/17215/files/ab335602..22fc7a2b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17215&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17215&range=01-02 Stats: 18102 lines in 593 files changed: 12845 ins; 2510 del; 2747 mod Patch: https://git.openjdk.org/jdk/pull/17215.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17215/head:pull/17215 PR: https://git.openjdk.org/jdk/pull/17215 From kbarrett at openjdk.org Wed Jan 10 00:15:37 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Wed, 10 Jan 2024 00:15:37 GMT Subject: RFR: 8322816: RISC-V: Incorrect guarantee in patch_vtype [v3] In-Reply-To: References: Message-ID: On Mon, 8 Jan 2024 02:06:05 GMT, Fei Yang wrote: >> Kim Barrett has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: >> >> - Merge branch 'master' into riscv-paren-bug >> - guarantee !vill >> - fix subexpression grouping in patch_vtype guarantee > > Marked as reviewed by fyang (Reviewer). Thanks for reviews, @RealFYang and @luhenry . ------------- PR Comment: https://git.openjdk.org/jdk/pull/17215#issuecomment-1883991703 From kbarrett at openjdk.org Wed Jan 10 00:21:30 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Wed, 10 Jan 2024 00:21:30 GMT Subject: Integrated: 8322816: RISC-V: Incorrect guarantee in patch_vtype In-Reply-To: References: Message-ID: On Tue, 2 Jan 2024 07:23:56 GMT, Kim Barrett wrote: > Please review this change that fixes a test for a guarantee. 
This also > removes a -Wparentheses warning when those are enabled (which is how the > problem was discovered). > > The problem is that operator precedence groups the sub-expressions differently > than intended. The fix is to override the operator precedence by adding > parentheses to achieve the intended grouping. > > Testing: Local (linux-x64) cross-build for linux-riscv with this change plus > -Wparentheses enabled and other changes to allow that to work. > > Requesting someone from the riscv porters to properly test this. This pull request has now been integrated. Changeset: f4ca41ad Author: Kim Barrett URL: https://git.openjdk.org/jdk/commit/f4ca41ad75fa78a08ff069ba0b6ac3596e35c23d Stats: 5 lines in 1 file changed: 0 ins; 2 del; 3 mod 8322816: RISC-V: Incorrect guarantee in patch_vtype Reviewed-by: fyang, luhenry ------------- PR: https://git.openjdk.org/jdk/pull/17215 From vlivanov at openjdk.org Wed Jan 10 01:00:33 2024 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Wed, 10 Jan 2024 01:00:33 GMT Subject: RFR: 8323115: x86-32: Incorrect predicates for cmov instruct transforms with UseSSE [v2] In-Reply-To: References: <7BcH08lab4xxFOxm0UAfhHKVyBBEhRWtIPEUJuOIlmo=.2c5b0918-19ac-4cb3-b2f5-11131ce73422@github.com> Message-ID: On Mon, 8 Jan 2024 12:11:11 GMT, Aleksey Shipilev wrote: > The fix looks good to me but it's concerning that we never hit this in testing. @TobiHartmann it looks more like the bug is benign since the predicates are effectively redundant. The AD instructions have different operands (`regFPR`/`regDPR` vs `regF`/`regD` which also have `UseSSE` predicates) , so they don't conflict at runtime. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/17296#issuecomment-1884026389 From cslucas at openjdk.org Wed Jan 10 01:29:40 2024 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Wed, 10 Jan 2024 01:29:40 GMT Subject: RFR: JDK-8323190: Segfault during deoptimization of C2-compiled code Message-ID: Currently, if `ReduceAllocationMerges` reduces an allocation merge that is used as a monitor C2 will SIGFAULT in `Process_OopMap_Node` because it's missing code to handle that case. This patch fixes C2 to properly handle reduced allocation merges that are used as monitors. Tested with Linux x86_64 hotspot_all. ------------- Commit messages: - Fix invalid location. Changes: https://git.openjdk.org/jdk/pull/17333/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=17333&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8323190 Stats: 88 lines in 2 files changed: 88 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/17333.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17333/head:pull/17333 PR: https://git.openjdk.org/jdk/pull/17333 From duke at openjdk.org Wed Jan 10 02:09:35 2024 From: duke at openjdk.org (Yude Lin) Date: Wed, 10 Jan 2024 02:09:35 GMT Subject: RFR: 8323122: AArch64: Increase itable stub size estimate Message-ID: Since [JDK-8307352](https://bugs.openjdk.org/browse/JDK-8307352), itable stub size has grown by 20 bytes on linux-aarch64. In particular, the "slop-counted" code increases from 100->120 bytes, where the current estimate is 124 bytes. I haven't found a case where it exceeds the estimate. For now this size is stable across the few linux-aarch64 configurations I ran with. It doesn't vary (for example) by different klass decoding schemes. But I think the idea of the estimate is that we can never know. I propose we increase the estimate to be safe. 
Passed hotspot/jtreg/:tier1 ------------- Commit messages: - 8323122: AArch64: Increase itable stub size estimate Changes: https://git.openjdk.org/jdk/pull/17336/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=17336&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8323122 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/17336.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17336/head:pull/17336 PR: https://git.openjdk.org/jdk/pull/17336 From dlong at openjdk.org Wed Jan 10 02:10:22 2024 From: dlong at openjdk.org (Dean Long) Date: Wed, 10 Jan 2024 02:10:22 GMT Subject: RFR: 8322743: assert(held_monitor_count() == jni_monitor_count()) failed In-Reply-To: References: Message-ID: On Tue, 9 Jan 2024 22:57:27 GMT, Vladimir Kozlov wrote: > Corner case with a local (not escaped) object used for synchronization. C2 Escape Analysis thinks that it can eliminate locks for it. In most cases it is true but not in this case. > > > for (int i = 0; i < 2; ++i) { > Object o = new Object(); > synchronized (o) { // monitorenter > // Trigger OSR compilation > for (int j = 0; j < 100_000; ++j) { > > The test has nested loop which trigger OSR compilation. The locked object comes from Interpreter into compiled OSR code. During parsing C2 creates an other non escaped object and correctly merge both together (with Phi node) so that non escaped object is not scalar replaceable. Because it does not globally escapes EA still removes locks for it and, as result, also for merged locked object from Interpreter which is the bug. > > The fix is to check that synchronized block does not have any associated escaped objects when EA decides if locks can be eliminated. > > Added regression test prepared by @TobiHartmann. Tested tier1-5, xcomp and stress. > Performance testing show no difference. I'm wondering if there is a simpler solution. 
What if, in `Parse::load_interpreter_state`, we mark the lock objects from the interpreter as global escape?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/17331#issuecomment-1884076429

From fyang at openjdk.org Wed Jan 10 06:11:24 2024
From: fyang at openjdk.org (Fei Yang)
Date: Wed, 10 Jan 2024 06:11:24 GMT
Subject: RFR: 8317721: RISC-V: Implement CRC32 intrinsic [v4]
In-Reply-To:
References:
Message-ID:

On Tue, 9 Jan 2024 16:47:08 GMT, ArsenyBochkarev wrote:

>> Hi everyone! Please review this port of [AArch64](https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp#L4224) `_updateBytesCRC32`, `_updateByteBufferCRC32` and `_updateCRC32` intrinsics. This patch introduces only the plain (non-vectorized, no Zbc) version.
>>
>> ### Correctness checks
>>
>> Tier 1/2 tests are ok.
>>
>> ### Performance results on T-Head board
>>
>> #### Results for enabled intrinsic:
>>
>> Used test is `test/micro/org/openjdk/bench/java/util/TestCRC32.java`
>>
>> | Benchmark | (count) | Mode | Cnt | Score | Error | Units |
>> | --- | --- | --- | --- | --- | --- | --- |
>> | CRC32.TestCRC32.testCRC32Update | 64 | thrpt | 24 | 3730.929 | 37.773 | ops/ms |
>> | CRC32.TestCRC32.testCRC32Update | 128 | thrpt | 24 | 2126.673 | 2.032 | ops/ms |
>> | CRC32.TestCRC32.testCRC32Update | 256 | thrpt | 24 | 1134.330 | 6.714 | ops/ms |
>> | CRC32.TestCRC32.testCRC32Update | 512 | thrpt | 24 | 584.017 | 2.267 | ops/ms |
>> | CRC32.TestCRC32.testCRC32Update | 2048 | thrpt | 24 | 151.173 | 0.346 | ops/ms |
>> | CRC32.TestCRC32.testCRC32Update | 16384 | thrpt | 24 | 19.113 | 0.008 | ops/ms |
>> | CRC32.TestCRC32.testCRC32Update | 65536 | thrpt | 24 | 4.647 | 0.022 | ops/ms |
>>
>> #### Results for disabled intrinsic:
>>
>> | Benchmark | (count) | Mode | Cnt | Score | Error | Units |
>> | --- | --- | --- | --- | --- | --- | --- |
>> | CRC32.TestCRC32.testCRC32Update | 64 | thrpt | 15 | 798.365 | 35.486 | ops/ms |
>> | CRC32.TestCRC32.testCRC32Update | 128 | thrpt | 15 | 677.756 | 46.619 | ops/ms |
>> | CRC32.TestCRC32.testCRC32Update | 256 | thrpt | 15 | 552.781 | 27.143 | ops/ms |
>> | CRC32.TestCRC32.testCRC32Update | 512 | thrpt | 15 | 429.304 | 12.518 | ops/ms |
>> | CRC32.TestCRC32.testCRC32Update | 2048 | thrpt | 15 | 166.738 | 0.935 | ops/ms |
>> | CRC32.TestCRC32.testCRC32Update | 16384 | thrpt | 15 | 25.060 | 0.034 | ops/ms |
>> | CRC32.TestCRC32.testCRC32Update | 65536 | thrpt | 15 | 6.196 | 0.030 | ops/ms |
>
> ArsenyBochkarev has updated the pull request incrementally with four additional commits since the last revision:
>
> - Fix unroll size
> - Rename constants
> - Partially unroll loop
> - Optimize loop counter in L_by16_loop

Hi, do we have performance numbers on other hardware platforms like Unmatched? Thanks.

src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 4655:

> 4653: const Register table3 = c_rarg6;
> 4654:
> 4655: const Register tmp1 = t0;

As previously discussed elsewhere, it is error-prone to create aliases for scratch registers like `t0` and pass them as parameters to other assembler functions. It will be safer if we use `t0` directly in `kernel_crc32` and remove the `tmp` formal parameter of `kernel_crc32`.

-------------

Changes requested by fyang (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/17046#pullrequestreview-1812473851
PR Review Comment: https://git.openjdk.org/jdk/pull/17046#discussion_r1446923292

From chagedorn at openjdk.org Wed Jan 10 07:20:31 2024
From: chagedorn at openjdk.org (Christian Hagedorn)
Date: Wed, 10 Jan 2024 07:20:31 GMT
Subject: RFR: 8319793: C2 compilation fails with "Bad graph detected in build_loop_late" after JDK-8279888 [v14]
In-Reply-To:
References:
Message-ID:

On Tue, 9 Jan 2024 17:02:48 GMT, Roland Westrelin wrote:

>> Range check smearing and range check predication make an array access
>> dependent on 2 (or more in the case of RC smearing) conditions. As a
As a >> consequence, if a range check can be eliminated because there's an >> identical dominating range check, the control dependent nodes that >> could float and become dependent on the dominating range check cannot >> be allowed to float because there's a risk that they would then bypass >> one of the checks that make the access legal. >> >> `IfNode::dominated_by()` and `PhaseIdealLoop::dominated_by()` have >> logic to prevent this: nodes that are control dependent on a range >> check or predicate are not allowed to float. This is however not >> sufficient as demonstrated by the test cases. >> >> In `TestArrayAccessAboveRCAfterSmearingOrPredication.testRangeCheckSmearing()`: >> >> >> v += array[i]; >> if (flag2) { >> if (flag3) { >> field = 0x42; >> } >> } >> if (flagField == 1) { >> v += array[i]; >> } >> >> >> The range check for the second `array[i]` load is replaced by the >> dominating range check for the first `array[i]` but because the second >> `array[i]` load could really be dependent on multiple range checks (in >> case smearing happened which is not the case here), c2 doesn't allow >> the second `array[i]` to float when the second range check is >> removed. The second `array[i]` is then control dependent on: >> >> >> if (flagField == 1) { >> >> >> which is next found to be dominated by the same test: >> >> >> if (flag == 1) { >> >> >> and is removed. However nothing in `dominated_by()` treats node >> dependent on tests that are not range check or predicates >> specially. So the second `array[i]` is allowed to float and become >> dependent on: >> >> >> if (flag == 1) { >> >> >> which is above the range check for that access. The test method in its >> last invocation is passed an index for the array access that's widely >> out of range. The array load happens before the range check and >> crashes the VM. `testLoopPredication()` is a similar test where array >> loads become dependent on predicates and end up above range checks. 
>> >> `TestArrayAccessCastIIAboveRC.java` is the test case from the bug >> where for similar reasons a range check `CastII` ends up above its >> range check, becomes top because its input becomes some integer that >> conflicts with its... > > Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: > > review Thanks for the updates, looks good! ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/16886#pullrequestreview-1812551628 From thartmann at openjdk.org Wed Jan 10 07:33:31 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 10 Jan 2024 07:33:31 GMT Subject: RFR: 8322077: Add Ideal transformation: (~a) | (~b) => ~(a & b) [v12] In-Reply-To: References: Message-ID: <8yZZsbOchNFkdPEmTKwxVZ_j_XjzWHV42j32ZXG9fAU=.e1899965-da0a-422c-8898-858c81ecb96b@github.com> On Tue, 9 Jan 2024 17:23:58 GMT, Zhiqiang Zang wrote: >> Hello, >> >> (~a) | (~b) => ~(a & b) is a widely seen pattern, for example it is implemented for LLVM [here](https://github.com/llvm/llvm-project/blob/397f1ce9efb4eea1ee10fe4833f733b8c7abd878/llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp#L1617C28-L1617C28); however it is missing in current implementation of hotspot. This pull request adds this transformation and associated tests. >> >> Thanks. > > Zhiqiang Zang has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 18 additional commits since the last revision: > > - update copyright dates. > - Merge branch 'andnode-PNewDeMorganLawAndToOr' into ornode-PNewDeMorganLawOrToAnd > - Revert "adapt changes from the dependent pr." > > This reverts commit 0c8d10776fe55a28ea9f60160b4f6f57aabad5ab. > - Revert "adapt to new changes from the dependant pr." > > This reverts commit b21e242b306d9cf01e1ba84ebba47ef5b471d98e. 
> - adapt to new changes from the dependant pr. > - Merge branch 'andnode-PNewDeMorganLawAndToOr' into ornode-PNewDeMorganLawOrToAnd > - adapt changes from the dependent pr. > - Merge branch 'andnode-PNewDeMorganLawAndToOr' into ornode-PNewDeMorganLawOrToAnd > - Update test/hotspot/jtreg/compiler/c2/irTests/DeMorganLawLongTests.java > > Co-authored-by: Tobias Hartmann > - Update test/hotspot/jtreg/compiler/c2/irTests/DeMorganLawIntTests.java > > Co-authored-by: Tobias Hartmann > - ... and 8 more: https://git.openjdk.org/jdk/compare/f2cd45d6...dc60a548 Marked as reviewed by thartmann (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/16334#pullrequestreview-1812568240 From thartmann at openjdk.org Wed Jan 10 07:34:39 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 10 Jan 2024 07:34:39 GMT Subject: RFR: 8322589: Add Ideal transformation: (~a) & (~b) => ~(a | b) [v11] In-Reply-To: References: <-pwjKEB97C-bM068JQN0PY1hl65IzcuQZfHzRoKu92g=.d6118773-20d2-46cd-9284-5168c9334bb5@github.com> Message-ID: On Tue, 9 Jan 2024 16:56:50 GMT, Zhiqiang Zang wrote: >> Hello, >> >> `(~a) & (~b) => ~(a | b)` is a widely seen pattern, for example it is implemented for LLVM [here](https://github.com/llvm/llvm-project/blob/397f1ce9efb4eea1ee10fe4833f733b8c7abd878/llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp#L1616C28-L1616C28); however it is missing in current implementation of hotspot. This pull request adds this transformation and associated tests. >> >> Thanks. > > Zhiqiang Zang has updated the pull request incrementally with three additional commits since the last revision: > > - Revert "move the two helper functions to member functions of the node class." > > This reverts commit 7a962d69ac687c0476e54cd004037f2ebb0800ba. > - Revert "update copyright dates." > > This reverts commit 3665de2f3d38b4fce3ba968ec6247dd2576ecc32. > - Revert "move make_not back to AddNode because it cannot compile for architecture arm, s390x, ppc64le." 
> > This reverts commit 4ee8b089b6f119b1692dcd2c512d249d92734697. Thanks, looks good. ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/16333#pullrequestreview-1812567403 From duke at openjdk.org Wed Jan 10 07:34:42 2024 From: duke at openjdk.org (Zhiqiang Zang) Date: Wed, 10 Jan 2024 07:34:42 GMT Subject: Integrated: 8322589: Add Ideal transformation: (~a) & (~b) => ~(a | b) In-Reply-To: <-pwjKEB97C-bM068JQN0PY1hl65IzcuQZfHzRoKu92g=.d6118773-20d2-46cd-9284-5168c9334bb5@github.com> References: <-pwjKEB97C-bM068JQN0PY1hl65IzcuQZfHzRoKu92g=.d6118773-20d2-46cd-9284-5168c9334bb5@github.com> Message-ID: <-TukAXeNv1T-WEIfmKX8CfgtWfhnHflXws1wGZ78e5s=.68f2ac9d-2d85-4821-aa37-3375d52619d1@github.com> On Tue, 24 Oct 2023 04:49:20 GMT, Zhiqiang Zang wrote: > Hello, > > `(~a) & (~b) => ~(a | b)` is a widely seen pattern, for example it is implemented for LLVM [here](https://github.com/llvm/llvm-project/blob/397f1ce9efb4eea1ee10fe4833f733b8c7abd878/llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp#L1616C28-L1616C28); however it is missing in current implementation of hotspot. This pull request adds this transformation and associated tests. > > Thanks. This pull request has now been integrated. 
Changeset: 85692274 Author: Zhiqiang Zang Committer: Tobias Hartmann URL: https://git.openjdk.org/jdk/commit/856922747358291ed2e112c328fb776a7be2567d Stats: 132 lines in 6 files changed: 121 ins; 0 del; 11 mod 8322589: Add Ideal transformation: (~a) & (~b) => ~(a | b) Reviewed-by: thartmann, epeter ------------- PR: https://git.openjdk.org/jdk/pull/16333 From thartmann at openjdk.org Wed Jan 10 07:38:25 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 10 Jan 2024 07:38:25 GMT Subject: RFR: 8321712: C2: "failed: Multiple uses of register" in C2_MacroAssembler::vminmax_fp [v5] In-Reply-To: <-rzXk3WJ9aOgrgoyIVDpCeiQLBkq5N5yQCYp_42oMEo=.df74c3a7-508d-48a1-baa2-796a2015bea1@github.com> References: <-rzXk3WJ9aOgrgoyIVDpCeiQLBkq5N5yQCYp_42oMEo=.df74c3a7-508d-48a1-baa2-796a2015bea1@github.com> Message-ID: <4yFqfJwrep8NWbNTztQQvQY9dap5gQkLIrOFh6Od2Js=.98c4d04c-1b3b-4b71-9c3c-9c3ff021793e@github.com> On Tue, 9 Jan 2024 23:46:45 GMT, Sandhya Viswanathan wrote: >> The C2_MacroAssembler::vminmax_fp and evminmax_fp were incorrectly checking for the two USE operand registers to be different and thus resulting in assertion failure. >> >> In x86_64.ad: >> >> instruct maxF_reg(legRegF dst, legRegF a, legRegF b, legRegF tmp, legRegF atmp, legRegF btmp) %{ >> ... >> effect(USE a, USE b, TEMP tmp, TEMP atmp, TEMP btmp); >> ... >> __ vminmax_fp(Op_MaxV, T_FLOAT, $dst$$XMMRegister, $a$$XMMRegister, $b$$XMMRegister, $tmp$$XMMRegister, $atmp$$XMMRegister, $btmp$$XMMRegister, Assembler::AVX_128bit); >> %} >> >> >> Changing the assert in vminmax_fp from: >> assert_different_registers(a, b, tmp, atmp, btmp); >> to: >> assert_different_registers(a, tmp, atmp, btmp); >> assert_different_registers(b, tmp, atmp, btmp); >> fixes the issue. >> >> Similar change done in evminmax_fp. >> >> Please review. 
>> >> Best Regards, >> Sandhya > > Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision: > > copyright year update Looks good to me. Thanks for including a test. I submitted some testing and will report back once it passed. ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/17315#pullrequestreview-1812580424 From epeter at openjdk.org Wed Jan 10 07:50:26 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 10 Jan 2024 07:50:26 GMT Subject: RFR: 8322077: Add Ideal transformation: (~a) | (~b) => ~(a & b) [v4] In-Reply-To: <2yuELMZxtnZVtWL2rhdtdzwIjg0tgZEi0csMHasBXtQ=.3e62b735-b4e5-4bfe-9594-bfb66e19205c@github.com> References: <5KKTYmY7dJ4nW0OEQ2UPuloIWOYc-pI9M8HRjoaRzw4=.f5eda6f9-c38c-4a2b-9690-cbb1791a2622@github.com> <2yuELMZxtnZVtWL2rhdtdzwIjg0tgZEi0csMHasBXtQ=.3e62b735-b4e5-4bfe-9594-bfb66e19205c@github.com> Message-ID: On Sat, 6 Jan 2024 00:44:19 GMT, Zhiqiang Zang wrote: >> Looks like a good idea. Left a few comments. >> >> I would have merged this with https://github.com/openjdk/jdk/pull/16333, since it is essentially the symmetric case. But leave it separate now. >> >> It would be nice to have some shared tests, where both optimizations need to be combined. Like: >> `(~a | ~b) & (~c | ~d)` -> `~(a & b) & ~(c & d)` -> `~((a & b) | (c & d))` > > @eme64 @TobiHartmann Thanks for the comments. All addressed. > > I rebased this PR onto #16333 so I was able to add these tests for using both optimizations. (the history was messed up). @CptGit can you merge from master again, please? It looks now like you are pushing both the changes here and the ones from your previous PR. Once you did that, I'd like to run some testing before we push this. 
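
The identities behind the two De Morgan transformations, and the combined pattern suggested in the quoted review, can be spot-checked in plain Java. This is only a sketch under stated assumptions: `DeMorganSketch` and its method names are hypothetical, and the code checks the bitwise arithmetic on a handful of sample values. It is not the IR test from the PR and does not show whether C2 actually applies the rewrite.

```java
class DeMorganSketch {
    // (~a) | (~b) == ~(a & b)
    static boolean orOfNots(int a, int b) {
        return ((~a) | (~b)) == ~(a & b);
    }

    // (~a) & (~b) == ~(a | b)
    static boolean andOfNots(int a, int b) {
        return ((~a) & (~b)) == ~(a | b);
    }

    // Combined pattern: (~a | ~b) & (~c | ~d) == ~((a & b) | (c & d))
    static boolean combined(int a, int b, int c, int d) {
        return ((~a | ~b) & (~c | ~d)) == ~((a & b) | (c & d));
    }

    // Checks all three identities over every combination of a few edge-case values.
    static boolean holdsForSamples() {
        int[] samples = {0, 1, -1, 0x0F0F0F0F, 0x33333333, Integer.MIN_VALUE, Integer.MAX_VALUE};
        for (int a : samples) {
            for (int b : samples) {
                if (!orOfNots(a, b) || !andOfNots(a, b)) {
                    return false;
                }
                for (int c : samples) {
                    for (int d : samples) {
                        if (!combined(a, b, c, d)) {
                            return false;
                        }
                    }
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(holdsForSamples()); // true
    }
}
```

The combined case needs both rewrites in sequence: the OR-of-NOTs rule turns each parenthesised term into `~(a & b)` and `~(c & d)`, and the AND-of-NOTs rule then folds the two negations into one.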
------------- PR Comment: https://git.openjdk.org/jdk/pull/16334#issuecomment-1884342502 From epeter at openjdk.org Wed Jan 10 07:59:26 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 10 Jan 2024 07:59:26 GMT Subject: RFR: 8321712: C2: "failed: Multiple uses of register" in C2_MacroAssembler::vminmax_fp [v5] In-Reply-To: <-rzXk3WJ9aOgrgoyIVDpCeiQLBkq5N5yQCYp_42oMEo=.df74c3a7-508d-48a1-baa2-796a2015bea1@github.com> References: <-rzXk3WJ9aOgrgoyIVDpCeiQLBkq5N5yQCYp_42oMEo=.df74c3a7-508d-48a1-baa2-796a2015bea1@github.com> Message-ID: On Tue, 9 Jan 2024 23:46:45 GMT, Sandhya Viswanathan wrote: >> The C2_MacroAssembler::vminmax_fp and evminmax_fp were incorrectly checking for the two USE operand registers to be different and thus resulting in assertion failure. >> >> In x86_64.ad: >> >> instruct maxF_reg(legRegF dst, legRegF a, legRegF b, legRegF tmp, legRegF atmp, legRegF btmp) %{ >> ... >> effect(USE a, USE b, TEMP tmp, TEMP atmp, TEMP btmp); >> ... >> __ vminmax_fp(Op_MaxV, T_FLOAT, $dst$$XMMRegister, $a$$XMMRegister, $b$$XMMRegister, $tmp$$XMMRegister, $atmp$$XMMRegister, $btmp$$XMMRegister, Assembler::AVX_128bit); >> %} >> >> >> Changing the assert in vminmax_fp from: >> assert_different_registers(a, b, tmp, atmp, btmp); >> to: >> assert_different_registers(a, tmp, atmp, btmp); >> assert_different_registers(b, tmp, atmp, btmp); >> fixes the issue. >> >> Similar change done in evminmax_fp. >> >> Please review. >> >> Best Regards, >> Sandhya > > Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision: > > copyright year update @sviswa7 Thanks for filing the follow up RFE! Nice, the reproducer fails without, and passes with your patch :) I also verified that the reproducer `Test_276.java` fails without, and passes with the patch. ==> LGTM ? (pending testing from Tobias) ------------- Marked as reviewed by epeter (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/17315#pullrequestreview-1812606665 From thartmann at openjdk.org Wed Jan 10 08:03:28 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 10 Jan 2024 08:03:28 GMT Subject: RFR: JDK-8323190: Segfault during deoptimization of C2-compiled code In-Reply-To: References: Message-ID: <6jsDdUuD6X9u8CT-4SUN_r2VUgPA4QuPCMLQyz-myaY=.52f1830e-0c87-44bf-931a-c272cabb74b4@github.com> On Wed, 10 Jan 2024 01:22:37 GMT, Cesar Soares Lucas wrote: > Currently, if `ReduceAllocationMerges` reduces an allocation merge that is used as a monitor C2 will SIGFAULT in `Process_OopMap_Node` because it's missing code to handle that case. This patch fixes C2 to properly handle reduced allocation merges that are used as monitors. > > Tested with Linux x86_64 hotspot_all. Thanks for quickly jumping on this, Cesar! The fix looks good to me. I also submitted testing and will report back once it passed. It's concerning though that we don't have any other test covering this. Would it make sense to extend `AllocationMergesTests.java` to cover some more variants? src/hotspot/share/opto/output.cpp line 1096: > 1094: > 1095: int merge_pointer_idx = smerge->merge_pointer_idx(youngest_jvms); > 1096: (void)FillLocArray(0, sfn, sfn->in(merge_pointer_idx), &deps, objs); Suggestion: FillLocArray(0, sfn, sfn->in(merge_pointer_idx), &deps, objs); Also below. I know that this is used in old code but I don't think it has any value. test/hotspot/jtreg/compiler/escapeAnalysis/TestInvalidLocation.java line 2: > 1: /* Copyright (c) 2023, Oracle and/or its affiliates. All rights reserved. > 2: * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER. Suggestion: /* * Copyright (c) 2024, Oracle and/or its affiliates. All rights reserved. * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER. 
test/hotspot/jtreg/compiler/escapeAnalysis/TestInvalidLocation.java line 27: > 25: * @bug 8323190 > 26: * @summary C2 Segfaults during code generation because of unhandled SafePointScalarMerge monitor debug info. > 27: * @run main/othervm -XX:+UnlockDiagnosticVMOptions -Xcomp -Xbatch -XX:+ReduceAllocationMerges TestInvalidLocation Suggestion: * @run main/othervm -XX:+UnlockDiagnosticVMOptions -Xcomp -XX:+ReduceAllocationMerges TestInvalidLocation `-Xcomp` implies `-Xbatch`. ------------- PR Review: https://git.openjdk.org/jdk/pull/17333#pullrequestreview-1812601748 PR Review Comment: https://git.openjdk.org/jdk/pull/17333#discussion_r1447003530 PR Review Comment: https://git.openjdk.org/jdk/pull/17333#discussion_r1447001783 PR Review Comment: https://git.openjdk.org/jdk/pull/17333#discussion_r1447002283 From thartmann at openjdk.org Wed Jan 10 08:09:22 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 10 Jan 2024 08:09:22 GMT Subject: RFR: 8320237 C2: late inlining of method handle invoke causes duplicate lines in PrintInlining output In-Reply-To: References: Message-ID: On Mon, 18 Dec 2023 15:28:47 GMT, Kangcheng Xu wrote: > This PR resolves [JDK-8320237](https://bugs.openjdk.org/browse/JDK-8320237) > > The original behavior produces both a failure and success message upon late inlining which is confusing. The patch removes the failure message if inlining was successful. Huge thanks to @rwestrel for reporting and working out a solution. > > Unit test `test/hotspot/jtreg/compiler/inlining/TestDuplicatedLateInliningOutput.java` is added and passing. Please do not sponsor this yet. We see various test failures. I'll follow-up shortly. ------------- Changes requested by thartmann (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/17147#pullrequestreview-1812622680 From thartmann at openjdk.org Wed Jan 10 08:16:23 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 10 Jan 2024 08:16:23 GMT Subject: RFR: 8320237 C2: late inlining of method handle invoke causes duplicate lines in PrintInlining output In-Reply-To: References: Message-ID: On Mon, 18 Dec 2023 15:28:47 GMT, Kangcheng Xu wrote: > This PR resolves [JDK-8320237](https://bugs.openjdk.org/browse/JDK-8320237) > > The original behavior produces both a failure and success message upon late inlining which is confusing. The patch removes the failure message if inlining was successful. Huge thanks to @rwestrel for reporting and working out a solution. > > Unit test `test/hotspot/jtreg/compiler/inlining/TestDuplicatedLateInliningOutput.java` is added and passing. `compiler/print/PrintInlining.java` fails with `-ea -esa -XX:CompileThreshold=100 -XX:+UnlockExperimentalVMOptions -server -XX:+TieredCompilation`: # A fatal error has been detected by the Java Runtime Environment: # # Internal Error (workspace/open/src/hotspot/share/opto/compile.cpp:4601), pid=418042, tid=418058 # assert(_print_inlining_stream->size() > 0) failed: missing inlining msg # # JRE version: Java(TM) SE Runtime Environment (23.0) (fastdebug build 23-internal-2024-01-10-0732483.tobias.hartmann.jdk2) # Java VM: Java HotSpot(TM) 64-Bit Server VM (fastdebug 23-internal-2024-01-10-0732483.tobias.hartmann.jdk2, compiled mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64) # Problematic frame: # V [libjvm.so+0x9db4e6] Compile::print_inlining_update_delayed(CallGenerator*)+0x1c6 Current CompileTask: C2:643 171 b 4 java.lang.String::substring (58 bytes) Stack: [0x00007f59706d4000,0x00007f59707d4000], sp=0x00007f59707cf220, free space=1004k Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code) V [libjvm.so+0x9db4e6] 
Compile::print_inlining_update_delayed(CallGenerator*)+0x1c6 (compile.cpp:4601) V [libjvm.so+0x844d7e] CallGenerator::do_late_inline_helper()+0x8ee (callGenerator.cpp:687) V [libjvm.so+0x9e1a52] Compile::inline_boxing_calls(PhaseIterGVN&)+0xc2 (compile.cpp:2026) V [libjvm.so+0x9e42e3] Compile::Optimize()+0x583 (compile.cpp:2276) V [libjvm.so+0x9e81a4] Compile::Compile(ciEnv*, ciMethod*, int, Options, DirectiveSet*)+0x1b44 (compile.cpp:860) V [libjvm.so+0x83d245] C2Compiler::compile_method(ciEnv*, ciMethod*, int, bool, DirectiveSet*)+0x1d5 (c2compiler.cpp:142) V [libjvm.so+0x9f3bbc] CompileBroker::invoke_compiler_on_method(CompileTask*)+0x92c (compileBroker.cpp:2299) V [libjvm.so+0x9f4848] CompileBroker::compiler_thread_loop()+0x468 (compileBroker.cpp:1958) V [libjvm.so+0xeb98ec] JavaThread::thread_main_inner()+0xcc (javaThread.cpp:721) V [libjvm.so+0x179b586] Thread::call_run()+0xb6 (thread.cpp:220) V [libjvm.so+0x14a8d47] thread_native_entry(Thread*)+0x127 (os_linux.cpp:789) `compiler/cha/StrengthReduceInterfaceCall.java` and `compiler/ciReplay/TestIncrementalInlining.java` fail as well. ------------- PR Comment: https://git.openjdk.org/jdk/pull/17147#issuecomment-1884373123 From tholenstein at openjdk.org Wed Jan 10 08:33:28 2024 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Wed, 10 Jan 2024 08:33:28 GMT Subject: RFR: JDK-8277869: Maven POMs are using HTTP links where HTTPS is available In-Reply-To: <_yC54VkkHOUc9a7YC6Wf-7QjqTiJkA9ieAWMlwJYApQ=.032ae2c6-0471-4b8f-bf78-dd57fb6c90db@github.com> References: <_yC54VkkHOUc9a7YC6Wf-7QjqTiJkA9ieAWMlwJYApQ=.032ae2c6-0471-4b8f-bf78-dd57fb6c90db@github.com> Message-ID: <96w5MYwXQAoIrPHpLChBCHwm9zACbJCq1UYQhGIhxXI=.08bf5e8b-a2ba-4b23-8734-73ec5743f115@github.com> On Mon, 8 Jan 2024 20:48:09 GMT, Vladimir Kozlov wrote: >> Replace `http` with `https` in xml files of IdealGraphVisualizer and LogCompilation. 
>> Moreover, replace the old URL (./maven-v4_0_0.xsd) with of the suggested one (./maven-4.0.0.xsd) for `xsi:schemaLocation` in pom.xml. >> >> Tested: IdealGraphVisualizer and LogCompilation build and run as expected. > > Looks good. Thanks for the reviews @vnkozlov and @TobiHartmann ------------- PR Comment: https://git.openjdk.org/jdk/pull/17302#issuecomment-1884392529 From tholenstein at openjdk.org Wed Jan 10 08:33:32 2024 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Wed, 10 Jan 2024 08:33:32 GMT Subject: Integrated: JDK-8277869: Maven POMs are using HTTP links where HTTPS is available In-Reply-To: References: Message-ID: <-GmghZFPYEBDTNBFTXvcbWgk0bhUM8WD92ZZRImPJTI=.f1c52038-315e-445a-925c-c90a428136b2@github.com> On Mon, 8 Jan 2024 10:29:38 GMT, Tobias Holenstein wrote: > Replace `http` with `https` in xml files of IdealGraphVisualizer and LogCompilation. > Moreover, replace the old URL (./maven-v4_0_0.xsd) with of the suggested one (./maven-4.0.0.xsd) for `xsi:schemaLocation` in pom.xml. > > Tested: IdealGraphVisualizer and LogCompilation build and run as expected. This pull request has now been integrated. 
Changeset: 88378ed0 Author: Tobias Holenstein URL: https://git.openjdk.org/jdk/commit/88378ed0584c7eb0849b6fc1e361fd8ea0698caf Stats: 43 lines in 40 files changed: 1 ins; 1 del; 41 mod 8277869: Maven POMs are using HTTP links where HTTPS is available Reviewed-by: kvn, thartmann ------------- PR: https://git.openjdk.org/jdk/pull/17302 From rrich at openjdk.org Wed Jan 10 08:55:50 2024 From: rrich at openjdk.org (Richard Reingruber) Date: Wed, 10 Jan 2024 08:55:50 GMT Subject: RFR: 8322294: Cleanup NativePostCallNop [v5] In-Reply-To: <6LS57mCF2fgaosnyfnNydaqfT3cD3F42xsDOujG5SgY=.2db5f614-f64d-4fe4-8e68-1c06e70205d3@github.com> References: <6LS57mCF2fgaosnyfnNydaqfT3cD3F42xsDOujG5SgY=.2db5f614-f64d-4fe4-8e68-1c06e70205d3@github.com> Message-ID: > This is a refactoring/cleanup of `NativePostCallNop` that simplifies the ppc64 port (dependent pr https://github.com/openjdk/jdk/pull/17171). > > * `frame::get_oop_map()` is moved to shared code > > * encoding / decoding details of the oopmap slot and the CodeBlob offset are moved from shared code to the platform dependent implementations of `bool NativePostCallNop::patch(int32_t oopmap_slot, int32_t cb_offset)` and `bool NativePostCallNop::decode(int32_t& oopmap_slot, int32_t& cb_offset)` > > The change passed our CI testing. JTReg tests: tier1-4 of hotspot and jdk. All of Langtools and jaxp. SPECjvm2008, SPECjbb2015, Renaissance Suite, and SAP specific tests. > All testing was done with fastdebug and release builds on the main platforms and also on Linux/PPC64le and AIX. > > EDIT 2023-12-22: Statistics > > The statistical numbers were generated with release builds. For riscv64 I used qemu. > The variance is high on all platforms. Up to 80% I think. Numbers with fastdebug are also very different. > Nevertheless, they are consistent within one run, and I'd expect errors in encoding or decoding to manifest in the numbers. 
> > | test/jdk/java/lang/Thread/virtual/stress/Skynet.java | x86_64: base | x86_64: pr | aarch64: base | aarch64: pr | riscv64: base | riscv64: pr | > |------------------------------------------------------|--------------|------------|---------------|-------------|---------------|-------------| > | PCN lookup success | 17517455 | 15339681 | 13179049 | 15980253 | 19400110 | 30017193 | > | PCN lookup failure | 328164 | 372555 | 237617 | 138164 | 415341 | 586476 | > | PCN decode success | 17513991 | 15336485 | 13176061 | 15977651 | 19397398 | 30014226 | > | PCN decode failure | 3464 | 3196 | 2988 | 2602 | 2712 | 2967 | > | PCN patch success | 2676 | 2465 | 2459 | 2089 | 2214 | 2259 | > | PCN patch cb offset failure | 0 | 0 | 0 | 0 | 0 | 0 | > | PCN patch oopmap slot failure | 0 | 0 | 0 | 0 | 0 | 0 | > > > | SpecJVM2008 compiler.compiler with fix iterations | x86_64: base | x8... Richard Reingruber has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains five additional commits since the last revision: - Merge branch 'master' - Suggstion Andrew Co-authored-by: Andrew Haley - Add newline - Review Martin - 8322294: Cleanup NativePostCallNop ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17150/files - new: https://git.openjdk.org/jdk/pull/17150/files/6c1fd588..1dfa9628 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17150&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17150&range=03-04 Stats: 21335 lines in 773 files changed: 14969 ins; 3022 del; 3344 mod Patch: https://git.openjdk.org/jdk/pull/17150.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17150/head:pull/17150 PR: https://git.openjdk.org/jdk/pull/17150 From chagedorn at openjdk.org Wed Jan 10 09:02:22 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 10 Jan 2024 09:02:22 GMT Subject: RFR: JDK-8323190: Segfault during deoptimization of C2-compiled code In-Reply-To: References: Message-ID: On Wed, 10 Jan 2024 01:22:37 GMT, Cesar Soares Lucas wrote: > Currently, if `ReduceAllocationMerges` reduces an allocation merge that is used as a monitor C2 will SIGFAULT in `Process_OopMap_Node` because it's missing code to handle that case. This patch fixes C2 to properly handle reduced allocation merges that are used as monitors. > > Tested with Linux x86_64 hotspot_all. src/hotspot/share/opto/output.cpp line 1092: > 1090: ObjectMergeValue* mv = (ObjectMergeValue*) sv_for_node_id(objs, smerge->_idx); > 1091: > 1092: if (mv == NULL) { You should replace `NULL` with `nullptr` here and below. This also seems wrong here where you took the code from: https://github.com/openjdk/jdk/blob/88378ed0584c7eb0849b6fc1e361fd8ea0698caf/src/hotspot/share/opto/output.cpp#L775-L796 On a separate note, the code looks almost identical. Could it be shared somehow? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17333#discussion_r1447068114 From aph-open at littlepinkcloud.com Wed Jan 10 09:53:43 2024 From: aph-open at littlepinkcloud.com (Andrew Haley) Date: Wed, 10 Jan 2024 09:53:43 +0000 Subject: discuss about release barrier for final fields initialization In-Reply-To: <5e41f867-b31b-4ded-b737-8d0c869c8895.kuaiwei.kw@alibaba-inc.com> References: <5e41f867-b31b-4ded-b737-8d0c869c8895.kuaiwei.kw@alibaba-inc.com> Message-ID: <20965b9a-ec64-4d81-866e-b1a0a94ed1e0@littlepinkcloud.com> On 1/9/24 06:23, Kuai Wei wrote: > I found that only C2 inserts a release membar, while C1 just inserts a storestore for both final and normal allocation. In Doug Lea's cookbook https://gee.cs.oswego.edu/dl/jmm/cookbook.html > only storestore is required. Alex has a great post on this topic https://shipilev.net/blog/2014/all-fields-are-final/ which refers to a case where loadstore is needed: https://www.hboehm.info/c++mm/no_write_fences.html > I checked this case and IMO it looks like some legacy architectures may break data dependency and cause problems. As far as I know, the Alpha architecture is an example. I think it doesn't > break on modern architectures. Is there another case I missed? I think it requires a very careful analysis of the compiler to be sure. The problem occurs if an optimizer knows what a store is going to do. If it does, then there's nothing to prevent a load from being elided, and your load dependency has gone. This isn't a problem with C1, because C1 doesn't do that kind of optimization. I don't know that C2 does either, or even whether it is allowed to do so. From what I remember of the conversation, we left the release barrier in because of an abundance of caution rather than any proof that a storestore was inadequate. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd.
https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From rrich at openjdk.org Wed Jan 10 10:38:25 2024 From: rrich at openjdk.org (Richard Reingruber) Date: Wed, 10 Jan 2024 10:38:25 GMT Subject: RFR: 8290965: PPC64: Implement post-call NOPs [v2] In-Reply-To: References: Message-ID: On Wed, 27 Dec 2023 18:27:22 GMT, Martin Doerr wrote: >> Richard Reingruber has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix comment >> >> Co-authored-by: Andrew Haley > > src/hotspot/cpu/ppc/frame_ppc.hpp line 414: > >> 412: // Constructors >> 413: inline frame(intptr_t* sp, intptr_t* fp, address pc); >> 414: inline frame(intptr_t* sp, address pc, kind knd = kind::nmethod); > > I think using `kind::nmethod` by default is potentially dangerous. The pc may be outside of the code cache and calling find_blob_fast would be unreliable. It's used by pns for debugging code. It doesn't look performance critical and we could use a conservative default. > I guess that we don't see issues because native code doesn't set bit 9 in CMPI/CMPLI. `pns` does not use this constructor. It uses `frame::frame(void* sp, void* fp, void* pc) : frame((intptr_t*)sp, (address)pc, kind::code_blob)`. So there's no problem. `pns` seems to be the only user of this one. It might be good to use `kind::native` there. Using `kind::native` (or `kind::unknow`) as default instead of `kind::nmethod` is potentially problematic since there might be locations in shared code that should set `kind::nmethod`. I think this requires a clean-up of the shared frame API.
Note also that using the wrong kind (wrong constructor on other platforms) hits the assertion in `CodeCache::find_blob_and_oopmap` (that's how I noticed that the distinction is actually needed :)) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17171#discussion_r1447187050 From mli at openjdk.org Wed Jan 10 10:41:23 2024 From: mli at openjdk.org (Hamlin Li) Date: Wed, 10 Jan 2024 10:41:23 GMT Subject: RFR: 8317721: RISC-V: Implement CRC32 intrinsic [v4] In-Reply-To: References: Message-ID: On Tue, 9 Jan 2024 16:47:08 GMT, ArsenyBochkarev wrote: >> Hi everyone! Please review this port of [AArch64](https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp#L4224) `_updateBytesCRC32`, `_updateByteBufferCRC32` and `_updateCRC32` intrinsics. This patch introduces only the plain (non-vectorized, no Zbc) version. >> >> ### Correctness checks >> >> Tier 1/2 tests are ok. >> >> ### Performance results on T-Head board >> >> #### Results for enabled intrinsic: >> >> Used test is `test/micro/org/openjdk/bench/java/util/TestCRC32.java` >> >> | Benchmark | (count) | Mode | Cnt | Score | Error | Units | >> | --- | ---- | ----- | --- | ---- | --- | ---- | >> | CRC32.TestCRC32.testCRC32Update | 64 | thrpt | 24 | 3730.929 | 37.773 | ops/ms | >> | CRC32.TestCRC32.testCRC32Update | 128 | thrpt | 24 | 2126.673 | 2.032 | ops/ms | >> | CRC32.TestCRC32.testCRC32Update | 256 | thrpt | 24 | 1134.330 | 6.714 | ops/ms | >> | CRC32.TestCRC32.testCRC32Update | 512 | thrpt | 24 | 584.017 | 2.267 | ops/ms | >> | CRC32.TestCRC32.testCRC32Update | 2048 | thrpt | 24 | 151.173 | 0.346 | ops/ms | >> | CRC32.TestCRC32.testCRC32Update | 16384 | thrpt | 24 | 19.113 | 0.008 | ops/ms | >> | CRC32.TestCRC32.testCRC32Update | 65536 | thrpt | 24 | 4.647 | 0.022 | ops/ms | >> >> #### Results for disabled intrinsic: >> >> | Benchmark | (count) | Mode | Cnt | Score | Error | Units | >> | --------------------------------------------------- | ---------- |
--------- | ---- | ----------- | --------- | ---------- | >> | CRC32.TestCRC32.testCRC32Update | 64 | thrpt | 15 | 798.365 | 35.486 | ops/ms | >> | CRC32.TestCRC32.testCRC32Update | 128 | thrpt | 15 | 677.756 | 46.619 | ops/ms | >> | CRC32.TestCRC32.testCRC32Update | 256 | thrpt | 15 | 552.781 | 27.143 | ops/ms | >> | CRC32.TestCRC32.testCRC32Update | 512 | thrpt | 15 | 429.304 | 12.518 | ops/ms | >> | CRC32.TestCRC32.testCRC32Update | 2048 | thrpt | 15 | 166.738 | 0.935 | ops/ms | >> | CRC32.TestCRC32.testCRC32Update | 16384 | thrpt | 15 | 25.060 | 0.034 | ops/ms | >> | CRC32.TestCRC32.testCRC32Update | 65536 | thrpt | 15 | 6.196 | 0.030 | ops/ms | > > ArsenyBochkarev has updated the pull request incrementally with four additional commits since the last revision: > > - Fix unroll size > - Rename constants > - Partially unroll loop > - Optimize loop counter in L_by16_loop The performance trend is the same: the larger the data size, the smaller the performance gap. When the size is `65536`, there seems to be a small perf regression. So I wonder how it will behave when the size is bigger than 65536. Whether we need to consider sizes bigger than 65536 depends on the expected typical data size for Java CRC32: are larger data sizes (65536 or more) common cases? ------------- PR Comment: https://git.openjdk.org/jdk/pull/17046#issuecomment-1884592765 From rrich at openjdk.org Wed Jan 10 12:20:45 2024 From: rrich at openjdk.org (Richard Reingruber) Date: Wed, 10 Jan 2024 12:20:45 GMT Subject: RFR: 8322294: Cleanup NativePostCallNop [v5] In-Reply-To: References: <6LS57mCF2fgaosnyfnNydaqfT3cD3F42xsDOujG5SgY=.2db5f614-f64d-4fe4-8e68-1c06e70205d3@github.com> Message-ID: On Wed, 10 Jan 2024 08:55:50 GMT, Richard Reingruber wrote: >> This is a refactoring/cleanup of `NativePostCallNop` that simplifies the ppc64 port (dependent pr https://github.com/openjdk/jdk/pull/17171).
>> >> * `frame::get_oop_map()` is moved to shared code >> >> * encoding / decoding details of the oopmap slot and the CodeBlob offset are moved from shared code to the platform dependent implementations of `bool NativePostCallNop::patch(int32_t oopmap_slot, int32_t cb_offset)` and `bool NativePostCallNop::decode(int32_t& oopmap_slot, int32_t& cb_offset)` >> >> The change passed our CI testing. JTReg tests: tier1-4 of hotspot and jdk. All of Langtools and jaxp. SPECjvm2008, SPECjbb2015, Renaissance Suite, and SAP specific tests. >> All testing was done with fastdebug and release builds on the main platforms and also on Linux/PPC64le and AIX. >> >> EDIT 2023-12-22: Statistics >> >> The statistical numbers were generated with release builds. For riscv64 I used qemu. >> The variance is high on all platforms. Up to 80% I think. Numbers with fastdebug are also very different. >> Nevertheless, they are consistent within one run, and I'd expect errors in encoding or decoding to manifest in the numbers. >> >> | test/jdk/java/lang/Thread/virtual/stress/Skynet.java | x86_64: base | x86_64: pr | aarch64: base | aarch64: pr | riscv64: base | riscv64: pr | >> |------------------------------------------------------|--------------|------------|---------------|-------------|---------------|-------------| >> | PCN lookup success | 17517455 | 15339681 | 13179049 | 15980253 | 19400110 | 30017193 | >> | PCN lookup failure | 328164 | 372555 | 237617 | 138164 | 415341 | 586476 | >> | PCN decode success | 17513991 | 15336485 | 13176061 | 15977651 | 19397398 | 30014226 | >> | PCN decode failure | 3464 | 3196 | 2988 | 2602 | 2712 | 2967 | >> | PCN patch success | 2676 | 2465 | 2459 | 2089 | 2214 | 2259 | >> | PCN patch cb offset failure | 0 | 0 | 0 | 0 | 0 | 0 | >> | PCN patch oopmap slot failure | 0 | 0 | 0 | 0 | 0 | 0 | >> >> >> | SpecJVM2008 compil... > > Richard Reingruber has updated the pull request with a new target base due to a merge or a rebase. 
The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: > > - Merge branch 'master' > - Suggstion Andrew > > Co-authored-by: Andrew Haley > - Add newline > - Review Martin > - 8322294: Cleanup NativePostCallNop Tests are good after merging master. Shipping now... Thanks again for the feedback and reviews! ------------- PR Comment: https://git.openjdk.org/jdk/pull/17150#issuecomment-1884739328 From rrich at openjdk.org Wed Jan 10 12:20:46 2024 From: rrich at openjdk.org (Richard Reingruber) Date: Wed, 10 Jan 2024 12:20:46 GMT Subject: Integrated: 8322294: Cleanup NativePostCallNop In-Reply-To: <6LS57mCF2fgaosnyfnNydaqfT3cD3F42xsDOujG5SgY=.2db5f614-f64d-4fe4-8e68-1c06e70205d3@github.com> References: <6LS57mCF2fgaosnyfnNydaqfT3cD3F42xsDOujG5SgY=.2db5f614-f64d-4fe4-8e68-1c06e70205d3@github.com> Message-ID: On Mon, 18 Dec 2023 22:05:32 GMT, Richard Reingruber wrote: > This is a refactoring/cleanup of `NativePostCallNop` that simplifies the ppc64 port (dependent pr https://github.com/openjdk/jdk/pull/17171). > > * `frame::get_oop_map()` is moved to shared code > > * encoding / decoding details of the oopmap slot and the CodeBlob offset are moved from shared code to the platform dependent implementations of `bool NativePostCallNop::patch(int32_t oopmap_slot, int32_t cb_offset)` and `bool NativePostCallNop::decode(int32_t& oopmap_slot, int32_t& cb_offset)` > > The change passed our CI testing. JTReg tests: tier1-4 of hotspot and jdk. All of Langtools and jaxp. SPECjvm2008, SPECjbb2015, Renaissance Suite, and SAP specific tests. > All testing was done with fastdebug and release builds on the main platforms and also on Linux/PPC64le and AIX. > > EDIT 2023-12-22: Statistics > > The statistical numbers were generated with release builds. For riscv64 I used qemu. > The variance is high on all platforms. Up to 80% I think. Numbers with fastdebug are also very different. 
> Nevertheless, they are consistent within one run, and I'd expect errors in encoding or decoding to manifest in the numbers. > > | test/jdk/java/lang/Thread/virtual/stress/Skynet.java | x86_64: base | x86_64: pr | aarch64: base | aarch64: pr | riscv64: base | riscv64: pr | > |------------------------------------------------------|--------------|------------|---------------|-------------|---------------|-------------| > | PCN lookup success | 17517455 | 15339681 | 13179049 | 15980253 | 19400110 | 30017193 | > | PCN lookup failure | 328164 | 372555 | 237617 | 138164 | 415341 | 586476 | > | PCN decode success | 17513991 | 15336485 | 13176061 | 15977651 | 19397398 | 30014226 | > | PCN decode failure | 3464 | 3196 | 2988 | 2602 | 2712 | 2967 | > | PCN patch success | 2676 | 2465 | 2459 | 2089 | 2214 | 2259 | > | PCN patch cb offset failure | 0 | 0 | 0 | 0 | 0 | 0 | > | PCN patch oopmap slot failure | 0 | 0 | 0 | 0 | 0 | 0 | > > > | SpecJVM2008 compiler.compiler with fix iterations | x86_64: base | x8... This pull request has now been integrated. Changeset: 2e472fe7 Author: Richard Reingruber URL: https://git.openjdk.org/jdk/commit/2e472fe7ea98ca1f07a90d1ad6704e8b2bb3afcf Stats: 200 lines in 30 files changed: 51 ins; 114 del; 35 mod 8322294: Cleanup NativePostCallNop Reviewed-by: mdoerr, aph ------------- PR: https://git.openjdk.org/jdk/pull/17150 From rrich at openjdk.org Wed Jan 10 15:11:35 2024 From: rrich at openjdk.org (Richard Reingruber) Date: Wed, 10 Jan 2024 15:11:35 GMT Subject: RFR: 8290965: PPC64: Implement post-call NOPs [v3] In-Reply-To: References: Message-ID: > #### Implementation of post call nops (PCNs) on ppc64. 
> > Depends on https://github.com/openjdk/jdk/pull/17150 > > About post call nops: > > - instruction(s) at return addresses of compiled java calls > - emitted iff vm continuations are enabled to support virtual threads > - encode data that can be used to find the corresponding CodeBlob and oop map faster > - mt-safe patchable to trigger deoptimization > > Background: > > - Frames in continuation StackChunks are not visited if their compiled method is made not entrant (in contrast to frames on stack). > Instead all PCNs of the compiled method are patched to trigger deoptimization when control returns to such frames. > - With vm continuations, stacks are walked and inspected more frequently. This requires lookup of metadata like frame size and oop maps. As an optimization the offset of the CodeBlob to the PCN and the oop map slot are encoded as data in the PCN. > > Post call nops on ppc64 > > - 1 instruction, i.e. 4 bytes (either CMPI or CMPLI[1]) > x86_64: 1 instruction, 8 bytes > aarch64: 3 instructions, 12 bytes > [1] 3.1.10 Fixed Point Compare Instructions in Power ISA 3.1B > https://openpowerfoundation.org/specifications/isa/ > > - 26 bits data payload > x86_64: 32 bits; aarch64: 32 bits > - 9 bits dedicated to oop map slot. With 8 bits there were cases with SPECjvm2008 where the slot could not be encoded (on ppc64 and x86_64). > x86_64: 8 bits; aarch64: 8 bits > - 17 bits dedicated to cb offset. Effectively 19 bits due to instruction alignment. > x86_64: 24 bits; aarch64: 24 bits > - Also used when reconstructing the back chain after thawing continuation frames (see `Thaw::patch_caller_links`) > > - Refactored frame constructors to make use of fast CodeBlob lookup based on PCNs. > The fast lookup may only be used if the pc is known to be in the code cache because `CodeCache::find_blob_fast` can yield wrong results if it finds instructions outside the code cache that look just like PCNs.
Callers of the frame class constructors need to pass `frame::kind::native` in that case to avoid errors. Other platforms don't make this explicit which is a problem in my eyes. Picking the wrong constructor can cause errors when porting and in future development. > > - Currently only the PCNs in nmethods are initialized. Therefore we don't even try to make a fast lookup based on PCNs if we know the CodeBlob is, e.g., a RuntimeStub. To achieve this we call the frame constructor passing `frame::kind::code_blob`. > > #### Statistics > > > | SpecJVM2008... Richard Reingruber has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains four commits: - Merge branch 'master' - Fix comment Co-authored-by: Andrew Haley - 8290965: PPC64: Implement post-call NOPs - 8322294: Cleanup NativePostCallNop ------------- Changes: https://git.openjdk.org/jdk/pull/17171/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=17171&range=02 Stats: 133 lines in 13 files changed: 96 ins; 0 del; 37 mod Patch: https://git.openjdk.org/jdk/pull/17171.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17171/head:pull/17171 PR: https://git.openjdk.org/jdk/pull/17171 From rrich at openjdk.org Wed Jan 10 15:17:30 2024 From: rrich at openjdk.org (Richard Reingruber) Date: Wed, 10 Jan 2024 15:17:30 GMT Subject: RFR: 8290965: PPC64: Implement post-call NOPs [v2] In-Reply-To: References: Message-ID: On Wed, 27 Dec 2023 17:34:11 GMT, Martin Doerr wrote: >> Richard Reingruber has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix comment >> >> Co-authored-by: Andrew Haley > > src/hotspot/cpu/ppc/macroAssembler_ppc.cpp line 1191: > >> 1189: } >> 1190: // We use CMPI/CMPLI instructions to encode post call nops. >> 1191: // We set bit 9 to distinguish post call nops from real CMPI/CMPI instructions > > Should be CMPI/CMPLI. 
Maybe add that CMPI and CMPLI opcodes only differ in one bit which we use to encode data. Done. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17171#discussion_r1447525235 From rrich at openjdk.org Wed Jan 10 15:22:30 2024 From: rrich at openjdk.org (Richard Reingruber) Date: Wed, 10 Jan 2024 15:22:30 GMT Subject: RFR: 8290965: PPC64: Implement post-call NOPs [v2] In-Reply-To: References: Message-ID: <0K_jX-ImmJLihXnogDXZKQYhMn7JfdpRgRr_KVxGdcQ=.9c3d3e21-75c0-436c-978a-7907ad60ff95@github.com> On Wed, 27 Dec 2023 17:26:10 GMT, Martin Doerr wrote: > I think `kind::nmethod` should only be used if cb != nullptr which is not checked, here. Is this one performance critical? I don't quite understand: the purpose of using `kind::nmethod` is to allow for a fast lookup of the cb which is only done if cb != nullptr. See also my other response where `kind::nmethod` is default. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17171#discussion_r1447531558 From rrich at openjdk.org Wed Jan 10 15:55:23 2024 From: rrich at openjdk.org (Richard Reingruber) Date: Wed, 10 Jan 2024 15:55:23 GMT Subject: RFR: 8290965: PPC64: Implement post-call NOPs [v2] In-Reply-To: <0K_jX-ImmJLihXnogDXZKQYhMn7JfdpRgRr_KVxGdcQ=.9c3d3e21-75c0-436c-978a-7907ad60ff95@github.com> References: <0K_jX-ImmJLihXnogDXZKQYhMn7JfdpRgRr_KVxGdcQ=.9c3d3e21-75c0-436c-978a-7907ad60ff95@github.com> Message-ID: On Wed, 10 Jan 2024 15:19:38 GMT, Richard Reingruber wrote: > Is this one performance critical? This is a good question. Honestly I have difficulties understanding why PCNs should be performance critical at all. AFAIK frames are only iterated on the slow path when freezing/thawing. Maybe the slow path is not that uncommon, e.g. if StackChunks are visited by GC. I wanted to use `kind::nmethod` as default whenever possible in order not to miss a place that actually is performance critical.
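For readers following the bit budget in the 8290965 description (a 26-bit payload split into 9 bits for the oop map slot and 17 bits for the cb offset, with the offset effectively covering 19 bits because instructions are 4-byte aligned), a minimal sketch of such packing might look as follows. It is written in Java purely for illustration: the class name, the method names, the `-1` failure convention and the order of the two fields inside the payload are all assumptions, not the layout or API used by the patch.

```java
// Hypothetical illustration only; this is not HotSpot code.
final class PostCallNopPayload {
    // Bit widths taken from the RFR text for ppc64.
    static final int OOPMAP_SLOT_BITS = 9;
    static final int CB_OFFSET_BITS = 17;
    // Assumption: offsets are multiples of 4 bytes, so the low 2 bits need not be stored.
    static final int ALIGNMENT_SHIFT = 2;

    /** Returns the 26-bit payload, or -1 if a value cannot be encoded. */
    static int pack(int oopmapSlot, int cbOffset) {
        if (oopmapSlot < 0 || oopmapSlot >= (1 << OOPMAP_SLOT_BITS)) {
            return -1; // slot does not fit into 9 bits
        }
        if (cbOffset < 0 || (cbOffset & ((1 << ALIGNMENT_SHIFT) - 1)) != 0) {
            return -1; // offset must be non-negative and 4-byte aligned
        }
        int scaled = cbOffset >>> ALIGNMENT_SHIFT;
        if (scaled >= (1 << CB_OFFSET_BITS)) {
            return -1; // offset needs more than 19 effective bits
        }
        // Assumed field order: slot in the high bits, scaled offset in the low bits.
        return (oopmapSlot << CB_OFFSET_BITS) | scaled;
    }

    static int oopmapSlot(int payload) {
        return payload >>> CB_OFFSET_BITS;
    }

    static int cbOffset(int payload) {
        return (payload & ((1 << CB_OFFSET_BITS) - 1)) << ALIGNMENT_SHIFT;
    }

    public static void main(String[] args) {
        // Slot 300 would not fit into 8 bits but fits into 9.
        int payload = pack(300, 4096);
        System.out.println(oopmapSlot(payload) + " " + cbOffset(payload));
        // An offset of 2^19 bytes is just out of range and is rejected.
        System.out.println(pack(300, 1 << 19));
    }
}
```

Dropping the two alignment bits is what turns 17 stored bits into 19 effective bits, and reporting failure instead of truncating mirrors the idea that patching can fail when a value does not fit.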
See also https://github.com/openjdk/jdk/pull/8955#issuecomment-1142317441 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17171#discussion_r1447580465 From sviswanathan at openjdk.org Wed Jan 10 16:23:26 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Wed, 10 Jan 2024 16:23:26 GMT Subject: RFR: 8321712: C2: "failed: Multiple uses of register" in C2_MacroAssembler::vminmax_fp [v2] In-Reply-To: References: Message-ID: On Tue, 9 Jan 2024 18:17:33 GMT, Emanuel Peter wrote: >> Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision: >> >> review comments > > https://bugs.openjdk.org/secure/attachment/107681/Test_276.java > This is the regression test of the bug that is closed as duplicate of your issue, am I correct? > This is the duplicate bug: https://bugs.openjdk.org/browse/JDK-8322090 > > Fails with: `assert(regs[i] != regs[j]) failed: Multiple uses of register: xmm3` > > You need to at least verify if this bug is fixed with your patch, otherwise we would need to re-open it, since it would not be a duplicate. > > Culpable node seems to be: > `7274 MaxD === _ 363 363 [[ 4874 ]] !jvms: Test_276::mainTest @ bci:291 (line 1084)` Thanks a lot @eme64 and @TobiHartmann for the review, I will wait for the test results before integrating. ------------- PR Comment: https://git.openjdk.org/jdk/pull/17315#issuecomment-1885172150 From kxu at openjdk.org Wed Jan 10 16:37:44 2024 From: kxu at openjdk.org (Kangcheng Xu) Date: Wed, 10 Jan 2024 16:37:44 GMT Subject: RFR: 8320237 C2: late inlining of method handle invoke causes duplicate lines in PrintInlining output [v2] In-Reply-To: References: Message-ID: > This PR resolves [JDK-8320237](https://bugs.openjdk.org/browse/JDK-8320237) > > The original behavior produces both a failure and success message upon late inlining which is confusing. The patch removes the failure message if inlining was successful. 
Huge thanks to @rwestrel for reporting and working out a solution. > > Unit test `test/hotspot/jtreg/compiler/inlining/TestDuplicatedLateInliningOutput.java` is added and passing. Kangcheng Xu has updated the pull request incrementally with one additional commit since the last revision: fix VM crashes ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17147/files - new: https://git.openjdk.org/jdk/pull/17147/files/3e53d03a..94d78fa1 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17147&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17147&range=00-01 Stats: 3 lines in 1 file changed: 2 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/17147.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17147/head:pull/17147 PR: https://git.openjdk.org/jdk/pull/17147 From duke at openjdk.org Wed Jan 10 16:57:47 2024 From: duke at openjdk.org (Zhiqiang Zang) Date: Wed, 10 Jan 2024 16:57:47 GMT Subject: RFR: 8322077: Add Ideal transformation: (~a) | (~b) => ~(a & b) [v13] In-Reply-To: References: Message-ID: > Hello, > > (~a) | (~b) => ~(a & b) is a widely seen pattern, for example it is implemented for LLVM [here](https://github.com/llvm/llvm-project/blob/397f1ce9efb4eea1ee10fe4833f733b8c7abd878/llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp#L1617C28-L1617C28); however it is missing in current implementation of hotspot. This pull request adds this transformation and associated tests. > > Thanks. Zhiqiang Zang has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 34 commits: - Merge branch 'master' of https://github.com/openjdk/jdk into ornode-PNewDeMorganLawOrToAnd - update copyright dates. - Merge branch 'andnode-PNewDeMorganLawAndToOr' into ornode-PNewDeMorganLawOrToAnd - Revert "move the two helper functions to member functions of the node class." This reverts commit 7a962d69ac687c0476e54cd004037f2ebb0800ba. - Revert "update copyright dates." 
This reverts commit 3665de2f3d38b4fce3ba968ec6247dd2576ecc32. - Revert "move make_not back to AddNode because it cannot compile for architecture arm, s390x, ppc64le." This reverts commit 4ee8b089b6f119b1692dcd2c512d249d92734697. - Revert "adapt changes from the dependent pr." This reverts commit 0c8d10776fe55a28ea9f60160b4f6f57aabad5ab. - Revert "adapt to new changes from the dependant pr." This reverts commit b21e242b306d9cf01e1ba84ebba47ef5b471d98e. - adapt to new changes from the dependant pr. - Merge branch 'andnode-PNewDeMorganLawAndToOr' into ornode-PNewDeMorganLawOrToAnd - ... and 24 more: https://git.openjdk.org/jdk/compare/b86c3b7a...f908668b ------------- Changes: https://git.openjdk.org/jdk/pull/16334/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=16334&range=12 Stats: 369 lines in 5 files changed: 369 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/16334.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/16334/head:pull/16334 PR: https://git.openjdk.org/jdk/pull/16334 From cslucas at openjdk.org Wed Jan 10 17:24:06 2024 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Wed, 10 Jan 2024 17:24:06 GMT Subject: RFR: JDK-8323190: Segfault during deoptimization of C2-compiled code [v2] In-Reply-To: References: Message-ID: > Currently, if `ReduceAllocationMerges` reduces an allocation merge that is used as a monitor C2 will SIGFAULT in `Process_OopMap_Node` because it's missing code to handle that case. This patch fixes C2 to properly handle reduced allocation merges that are used as monitors. > > Tested with Linux x86_64 hotspot_all. 
Cesar Soares Lucas has updated the pull request incrementally with three additional commits since the last revision: - Update test/hotspot/jtreg/compiler/escapeAnalysis/TestInvalidLocation.java Co-authored-by: Tobias Hartmann - Update src/hotspot/share/opto/output.cpp Co-authored-by: Tobias Hartmann - Update test/hotspot/jtreg/compiler/escapeAnalysis/TestInvalidLocation.java Co-authored-by: Tobias Hartmann ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17333/files - new: https://git.openjdk.org/jdk/pull/17333/files/95fe08dd..5e2f0089 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17333&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17333&range=00-01 Stats: 4 lines in 2 files changed: 1 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/17333.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17333/head:pull/17333 PR: https://git.openjdk.org/jdk/pull/17333 From phh at openjdk.org Wed Jan 10 17:32:25 2024 From: phh at openjdk.org (Paul Hohensee) Date: Wed, 10 Jan 2024 17:32:25 GMT Subject: RFR: 8322982: CTW fails to build after 8308753 [v3] In-Reply-To: References: Message-ID: On Mon, 8 Jan 2024 20:08:37 GMT, Xin Liu wrote: >> This patch fixes the build error of CTW by sidelining ModuleInfoWriter.java. ModuleInfoWriter uses Class-File API, which has transitioned to preview. >> If we really need to compile it, we have to append --enable-preview and --source N. >> >> The fact is CTW itself doesn't depend on ModuleInfoWriter. I think it's easier to maintain CTW if we filter it out in Makefile. > > Xin Liu has updated the pull request incrementally with one additional commit since the last revision: > > Wording and also remove add-modules required by ModuleInfoWriter.java Marked as reviewed by phh (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/17292#pullrequestreview-1813751239 From qamai at openjdk.org Wed Jan 10 18:08:39 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Wed, 10 Jan 2024 18:08:39 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v5] In-Reply-To: References: Message-ID: <6SOOtHaZrEN8UIKKLHBJniz_nUvZjRvV0TLofW7Xjxk=.debaf12a-b7c8-4b04-b217-3b8cd9b3c6f5@github.com> > Hi, > > This patch adds unsigned bounds and known bits constraints to `TypeInt` and `TypeLong`. This opens more transformation opportunities in an elegant manner as well as helps avoid some ad-hoc rules in Hotspot. The new constraints are applied to identity and value calls of the common nodes (Add, Sub, L/R/URShift, And, Or, Xor, bit counting, Cmp, Bool, ConvI2L/L2I), the detailed ideas for each node will be presented below. > > In general, a `TypeInt/Long` represents a set of values `x` that satisfies: `x s>= lo && x s<= hi && x u>= ulo && x u<= uhi && (x & zeros) == 0 && (~x & ones) == 0`. These constraints are not independent, e.g. an int that lies in [0, 3] in signed domain must also lie in [0, 3] in unsigned domain and have all bits but the last 2 being unset. As a result, we must normalize the constraints (tighten the constraints so that they are optimal) before constructing a `TypeInt/Long` instance. > > Please kindly review, thanks very much. > > Testing > > - [x] GHA > - [x] Linux x64, tier 1-4 Quan Anh Mai has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 27 commits: - Merge branch 'master' into improvevalue - Merge branch 'master' into improvevalue - improve add/sub implementation - Merge branch 'master' into improvevalue - typo - whitespace - fix tests for x86_32 - fix widen of ConvI2L - problem lists - format - ... 
and 17 more: https://git.openjdk.org/jdk/compare/f0169341...843ad076 ------------- Changes: https://git.openjdk.org/jdk/pull/15440/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=15440&range=04 Stats: 3692 lines in 35 files changed: 1895 ins; 1235 del; 562 mod Patch: https://git.openjdk.org/jdk/pull/15440.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15440/head:pull/15440 PR: https://git.openjdk.org/jdk/pull/15440 From jbhateja at openjdk.org Wed Jan 10 18:09:29 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 10 Jan 2024 18:09:29 GMT Subject: RFR: 8321712: C2: "failed: Multiple uses of register" in C2_MacroAssembler::vminmax_fp [v5] In-Reply-To: <-rzXk3WJ9aOgrgoyIVDpCeiQLBkq5N5yQCYp_42oMEo=.df74c3a7-508d-48a1-baa2-796a2015bea1@github.com> References: <-rzXk3WJ9aOgrgoyIVDpCeiQLBkq5N5yQCYp_42oMEo=.df74c3a7-508d-48a1-baa2-796a2015bea1@github.com> Message-ID: On Tue, 9 Jan 2024 23:46:45 GMT, Sandhya Viswanathan wrote: >> The C2_MacroAssembler::vminmax_fp and evminmax_fp were incorrectly checking for the two USE operand registers to be different and thus resulting in assertion failure. >> >> In x86_64.ad: >> >> instruct maxF_reg(legRegF dst, legRegF a, legRegF b, legRegF tmp, legRegF atmp, legRegF btmp) %{ >> ... >> effect(USE a, USE b, TEMP tmp, TEMP atmp, TEMP btmp); >> ... >> __ vminmax_fp(Op_MaxV, T_FLOAT, $dst$$XMMRegister, $a$$XMMRegister, $b$$XMMRegister, $tmp$$XMMRegister, $atmp$$XMMRegister, $btmp$$XMMRegister, Assembler::AVX_128bit); >> %} >> >> >> Changing the assert in vminmax_fp from: >> assert_different_registers(a, b, tmp, atmp, btmp); >> to: >> assert_different_registers(a, tmp, atmp, btmp); >> assert_different_registers(b, tmp, atmp, btmp); >> fixes the issue. >> >> Similar change done in evminmax_fp. >> >> Please review. 
>> >> Best Regards, >> Sandhya > > Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision: > > copyright year update Thanks for filing RFE, LGTM ------------- Marked as reviewed by jbhateja (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/17315#pullrequestreview-1813816523 From sviswanathan at openjdk.org Wed Jan 10 18:12:28 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Wed, 10 Jan 2024 18:12:28 GMT Subject: RFR: 8321712: C2: "failed: Multiple uses of register" in C2_MacroAssembler::vminmax_fp [v5] In-Reply-To: <8ZGiFoB4TkRgQSP67ekJ_Tw_uMnEyVNdU9GSa4bx69M=.f252a9b8-367c-49e6-916e-48dd0e6e936e@github.com> References: <8ZGiFoB4TkRgQSP67ekJ_Tw_uMnEyVNdU9GSa4bx69M=.f252a9b8-367c-49e6-916e-48dd0e6e936e@github.com> Message-ID: On Tue, 9 Jan 2024 15:14:33 GMT, Jatin Bhateja wrote: >> Should we "short cut" code when registers are the same? > >> Should we "short cut" code when registers are the same? > > Hi @sviswa7 , An identity transformation may be useful here to prevent generating MaxF/D in case both the arguments are same. Thanks a lot @jatin-bhateja for the review. 
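The identity mentioned in the quoted comment (a max or min of a value with itself is just that value) can be spot-checked at the Java level. The following is only an illustrative sketch: the class and method names are made up here, it says nothing about how C2 would implement such a transformation, and it compares `Float.floatToIntBits` results so that NaN and signed zeros are covered as well.

```java
class MinMaxIdentitySketch {
    // Sample inputs, including the special cases where float min/max is subtle.
    static final float[] SAMPLES = {
        0.0f, -0.0f, 1.5f, -3.25f, Float.MAX_VALUE, Float.MIN_VALUE,
        Float.POSITIVE_INFINITY, Float.NEGATIVE_INFINITY, Float.NaN
    };

    // True if max(a, a) and min(a, a) both have the same bit pattern as a.
    // floatToIntBits collapses every NaN to the canonical NaN, so NaN payloads
    // do not affect the comparison.
    static boolean identityHolds(float a) {
        int bits = Float.floatToIntBits(a);
        return Float.floatToIntBits(Math.max(a, a)) == bits
            && Float.floatToIntBits(Math.min(a, a)) == bits;
    }

    // Checks the identity for every sample value.
    static boolean identityHoldsForAllSamples() {
        for (float a : SAMPLES) {
            if (!identityHolds(a)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println("identity holds: " + identityHoldsForAllSamples());
    }
}
```

If the identity holds for all inputs, a node whose two inputs are the same could in principle be replaced by that input; whether and where to do that in C2 is left to the RFE mentioned in the thread.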
------------- PR Comment: https://git.openjdk.org/jdk/pull/17315#issuecomment-1885368974 From cslucas at openjdk.org Wed Jan 10 18:14:24 2024 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Wed, 10 Jan 2024 18:14:24 GMT Subject: RFR: JDK-8323190: Segfault during deoptimization of C2-compiled code [v2] In-Reply-To: References: Message-ID: On Wed, 10 Jan 2024 08:57:58 GMT, Christian Hagedorn wrote: >> Cesar Soares Lucas has updated the pull request incrementally with three additional commits since the last revision: >> >> - Update test/hotspot/jtreg/compiler/escapeAnalysis/TestInvalidLocation.java >> >> Co-authored-by: Tobias Hartmann >> - Update src/hotspot/share/opto/output.cpp >> >> Co-authored-by: Tobias Hartmann >> - Update test/hotspot/jtreg/compiler/escapeAnalysis/TestInvalidLocation.java >> >> Co-authored-by: Tobias Hartmann > > src/hotspot/share/opto/output.cpp line 1092: > >> 1090: ObjectMergeValue* mv = (ObjectMergeValue*) sv_for_node_id(objs, smerge->_idx); >> 1091: >> 1092: if (mv == NULL) { > > You should replace `NULL` with `nullptr` here and below. This also seems wrong here where you took the code from: > https://github.com/openjdk/jdk/blob/88378ed0584c7eb0849b6fc1e361fd8ea0698caf/src/hotspot/share/opto/output.cpp#L775-L796 > > On a separate note, the code looks almost identical. Could it be shared somehow? Thank you for reviewing @chhagedorn. I've converted the NULLs to nullptrs. However, I'll defer the refactoring of the identical code to a RFE - mainly because I'll have to backport the current patch and I'd like to keep it as minimal as possible. Please let me know if you disagree. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17333#discussion_r1447752271 From cslucas at openjdk.org Wed Jan 10 18:20:43 2024 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Wed, 10 Jan 2024 18:20:43 GMT Subject: RFR: JDK-8323190: Segfault during deoptimization of C2-compiled code [v3] In-Reply-To: <6jsDdUuD6X9u8CT-4SUN_r2VUgPA4QuPCMLQyz-myaY=.52f1830e-0c87-44bf-931a-c272cabb74b4@github.com> References: <6jsDdUuD6X9u8CT-4SUN_r2VUgPA4QuPCMLQyz-myaY=.52f1830e-0c87-44bf-931a-c272cabb74b4@github.com> Message-ID: On Wed, 10 Jan 2024 08:00:38 GMT, Tobias Hartmann wrote: > It's concerning though that we don't have any other test covering this. Would it make sense to extend AllocationMergesTests.java to cover some more variants? Thank you for reviewing @TobiHartmann ! I think `AllocationMergesTests.java` isn't the ideal place for these tests. The tests in `AllocationMergesTests.java` are for checking the IR shape after the optimization, the current issue was actually because of a problem emitting debug info for a compilation unit - it's not something that we can capture with the IR-framework I believe. In this other PR (https://github.com/openjdk/jdk/pull/15825) I have a test file that I think will be more appropriate for this kind of test. ------------- PR Comment: https://git.openjdk.org/jdk/pull/17333#issuecomment-1885378367 From cslucas at openjdk.org Wed Jan 10 18:20:42 2024 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Wed, 10 Jan 2024 18:20:42 GMT Subject: RFR: JDK-8323190: Segfault during deoptimization of C2-compiled code [v3] In-Reply-To: References: Message-ID: > Currently, if `ReduceAllocationMerges` reduces an allocation merge that is used as a monitor C2 will SIGFAULT in `Process_OopMap_Node` because it's missing code to handle that case. This patch fixes C2 to properly handle reduced allocation merges that are used as monitors. > > Tested with Linux x86_64 hotspot_all. 
Cesar Soares Lucas has updated the pull request incrementally with one additional commit since the last revision: Convert NULL to nullptr. Remove type cast. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17333/files - new: https://git.openjdk.org/jdk/pull/17333/files/5e2f0089..8c21a4b5 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17333&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17333&range=01-02 Stats: 7 lines in 2 files changed: 0 ins; 0 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/17333.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17333/head:pull/17333 PR: https://git.openjdk.org/jdk/pull/17333 From kvn at openjdk.org Wed Jan 10 18:29:25 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 10 Jan 2024 18:29:25 GMT Subject: RFR: 8322743: assert(held_monitor_count() == jni_monitor_count()) failed In-Reply-To: References: Message-ID: On Wed, 10 Jan 2024 02:07:29 GMT, Dean Long wrote: > I'm wondering if there is a simpler solution. What if in `Parse::load_interpreter_state` we mark the lock objects from the interpreter as global escape? Thank you, Dean, for looking at the changes. You are correct, we can mark the created `BoxLock` node in `Parse::load_interpreter_state` as having an escaped object. But in general case it could be only dead path where such object is referenced. Also, there could be other cases where EA thinks that an object escapes on one of the paths. I wanted to check the graph only after some transformations which happen before EA, and use EA's analysis to find escaped objects.
------------- PR Comment: https://git.openjdk.org/jdk/pull/17331#issuecomment-1885396689 From kvn at openjdk.org Wed Jan 10 18:55:27 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 10 Jan 2024 18:55:27 GMT Subject: RFR: 8322735: C2: minor improvements of bubble sort used in SuperWord::packset_sort [v3] In-Reply-To: References: Message-ID: On Tue, 9 Jan 2024 05:26:43 GMT, Denghui Dong wrote: >> A minor improvement could be made for bubble sort in SuperWord::packset_sort to reduce the comparison count in bad cases. >> >> See https://en.wikipedia.org/wiki/Bubble_sort > > Denghui Dong has updated the pull request incrementally with one additional commit since the last revision: > > update Good. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/17190#pullrequestreview-1813896857 From jbhateja at openjdk.org Wed Jan 10 19:20:26 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 10 Jan 2024 19:20:26 GMT Subject: RFR: 8322768: Optimize non-subword vector compress and expand APIs for AVX2 target. [v5] In-Reply-To: References: <_udOCEVG86x9V_WvYqFTaYnvmXdiZ7LzqxzR-D_ygYs=.db3ed37e-a8fa-413b-83c6-a785aba072ff@github.com> Message-ID: On Tue, 9 Jan 2024 16:48:56 GMT, Jatin Bhateja wrote: >> Hi, >> >> Patch optimizes non-subword vector compress and expand APIs for x86 AVX2 only targets. >> Upcoming E-core Xeons (Sierra Forest) and Hybrid CPUs only support AVX2 instruction set. >> These are very frequently used APIs in columnar database filter operation. >> >> Implementation uses a lookup table to record permute indices. Table index is computed using >> mask argument of compress/expand operation. >> >> Following are the performance number of JMH micro included with the patch. 
>>
>>
>> System : Intel(R) Xeon(R) Platinum 8480+ (Sapphire Rapids)
>>
>> Baseline:
>> Benchmark                                 (size)   Mode  Cnt     Score  Error  Units
>> ColumnFilterBenchmark.filterDoubleColumn    1024  thrpt    2   142.767         ops/ms
>> ColumnFilterBenchmark.filterDoubleColumn    2047  thrpt    2    71.436         ops/ms
>> ColumnFilterBenchmark.filterDoubleColumn    4096  thrpt    2    35.992         ops/ms
>> ColumnFilterBenchmark.filterFloatColumn     1024  thrpt    2   182.151         ops/ms
>> ColumnFilterBenchmark.filterFloatColumn     2047  thrpt    2    91.096         ops/ms
>> ColumnFilterBenchmark.filterFloatColumn     4096  thrpt    2    44.757         ops/ms
>> ColumnFilterBenchmark.filterIntColumn       1024  thrpt    2   184.099         ops/ms
>> ColumnFilterBenchmark.filterIntColumn       2047  thrpt    2    91.981         ops/ms
>> ColumnFilterBenchmark.filterIntColumn       4096  thrpt    2    45.170         ops/ms
>> ColumnFilterBenchmark.filterLongColumn      1024  thrpt    2   148.017         ops/ms
>> ColumnFilterBenchmark.filterLongColumn      2047  thrpt    2    73.516         ops/ms
>> ColumnFilterBenchmark.filterLongColumn      4096  thrpt    2    36.844         ops/ms
>>
>> Withopt:
>> Benchmark                                 (size)   Mode  Cnt     Score  Error  Units
>> ColumnFilterBenchmark.filterDoubleColumn    1024  thrpt    2  2051.707         ops/ms
>> ColumnFilterBenchmark.filterDoubleColumn    2047  thrpt    2   914.072         ops/ms
>> ColumnFilterBenchmark.filterDoubleColumn    4096  thrpt    2   489.898         ops/ms
>> ColumnFilterBenchmark.filterFloatColumn     1024  thrpt    2  5324.195         ops/ms
>> ColumnFilterBenchmark.filterFloatColumn     2047  thrpt    2  2587.229         ops/ms
>> ColumnFilterBenchmark.filterFloatColumn     4096  thrpt    2  1278.665         ops/ms
>> ColumnFilterBenchmark.filterIntColumn       1024  thrpt    2  4149.384         ops/ms
>> ColumnFilterBenchmark.filterIntColumn       2047  thrpt ...
>
> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:
>
>   Using emulated variable blend E-Core optimized instruction.

Following are the performance numbers for existing Vector API JMH micro benchmark over Meteor Lake - Crestmont E-cores.
![image](https://github.com/openjdk/jdk/assets/59989778/dab762f8-2379-4fcf-90da-f765e907c6c1) ------------- PR Comment: https://git.openjdk.org/jdk/pull/17261#issuecomment-1885525420 From xliu at openjdk.org Wed Jan 10 19:44:31 2024 From: xliu at openjdk.org (Xin Liu) Date: Wed, 10 Jan 2024 19:44:31 GMT Subject: Integrated: 8322982: CTW fails to build after 8308753 In-Reply-To: References: Message-ID: On Mon, 8 Jan 2024 03:19:52 GMT, Xin Liu wrote: > This patch fixes the build error of CTW by sidelining ModuleInfoWriter.java. ModuleInfoWriter uses Class-File API, which has transitioned to preview. > If we really need to compile it, we have to append --enable-preview and --source N. > > The fact is CTW itself doesn't depend on ModuleInfoWriter. I think it's easier to maintain CTW if we filter it out in Makefile. This pull request has now been integrated. Changeset: d89602a5 Author: Xin Liu URL: https://git.openjdk.org/jdk/commit/d89602a53f173e4fc1e0aa10bb0ffdf7232456cb Stats: 8 lines in 1 file changed: 1 ins; 4 del; 3 mod 8322982: CTW fails to build after 8308753 Reviewed-by: shade, phh ------------- PR: https://git.openjdk.org/jdk/pull/17292 From duke at openjdk.org Wed Jan 10 20:32:25 2024 From: duke at openjdk.org (Zhiqiang Zang) Date: Wed, 10 Jan 2024 20:32:25 GMT Subject: RFR: 8322077: Add Ideal transformation: (~a) | (~b) => ~(a & b) [v4] In-Reply-To: References: <5KKTYmY7dJ4nW0OEQ2UPuloIWOYc-pI9M8HRjoaRzw4=.f5eda6f9-c38c-4a2b-9690-cbb1791a2622@github.com> <2yuELMZxtnZVtWL2rhdtdzwIjg0tgZEi0csMHasBXtQ=.3e62b735-b4e5-4bfe-9594-bfb66e19205c@github.com> Message-ID: On Wed, 10 Jan 2024 07:47:35 GMT, Emanuel Peter wrote: >> @eme64 @TobiHartmann Thanks for the comments. All addressed. >> >> I rebased this PR onto #16333 so I was able to add these tests for using both optimizations. (the history was messed up). > > @CptGit can you merge from master again, please? It looks now like you are pushing both the changes here and the ones from your previous PR. 
Once you did that, I'd like to run some testing before we push this. @eme64 Yes I merged. Looks clean now. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16334#issuecomment-1885669143 From kxu at openjdk.org Wed Jan 10 22:23:23 2024 From: kxu at openjdk.org (Kangcheng Xu) Date: Wed, 10 Jan 2024 22:23:23 GMT Subject: RFR: 8320237 C2: late inlining of method handle invoke causes duplicate lines in PrintInlining output In-Reply-To: References: Message-ID: On Wed, 10 Jan 2024 08:13:46 GMT, Tobias Hartmann wrote: >> This PR resolves [JDK-8320237](https://bugs.openjdk.org/browse/JDK-8320237) >> >> The original behavior produces both a failure and success message upon late inlining which is confusing. The patch removes the failure message if inlining was successful. Huge thanks to @rwestrel for reporting and working out a solution. >> >> Unit test `test/hotspot/jtreg/compiler/inlining/TestDuplicatedLateInliningOutput.java` is added and passing. > > `compiler/print/PrintInlining.java` fails with `-ea -esa -XX:CompileThreshold=100 -XX:+UnlockExperimentalVMOptions -server -XX:+TieredCompilation`: > > > # A fatal error has been detected by the Java Runtime Environment: > # > # Internal Error (workspace/open/src/hotspot/share/opto/compile.cpp:4601), pid=418042, tid=418058 > # assert(_print_inlining_stream->size() > 0) failed: missing inlining msg > # > # JRE version: Java(TM) SE Runtime Environment (23.0) (fastdebug build 23-internal-2024-01-10-0732483.tobias.hartmann.jdk2) > # Java VM: Java HotSpot(TM) 64-Bit Server VM (fastdebug 23-internal-2024-01-10-0732483.tobias.hartmann.jdk2, compiled mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64) > # Problematic frame: > # V [libjvm.so+0x9db4e6] Compile::print_inlining_update_delayed(CallGenerator*)+0x1c6 > > Current CompileTask: > C2:643 171 b 4 java.lang.String::substring (58 bytes) > > Stack: [0x00007f59706d4000,0x00007f59707d4000], sp=0x00007f59707cf220, free space=1004k > Native 
frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code) > V [libjvm.so+0x9db4e6] Compile::print_inlining_update_delayed(CallGenerator*)+0x1c6 (compile.cpp:4601) > V [libjvm.so+0x844d7e] CallGenerator::do_late_inline_helper()+0x8ee (callGenerator.cpp:687) > V [libjvm.so+0x9e1a52] Compile::inline_boxing_calls(PhaseIterGVN&)+0xc2 (compile.cpp:2026) > V [libjvm.so+0x9e42e3] Compile::Optimize()+0x583 (compile.cpp:2276) > V [libjvm.so+0x9e81a4] Compile::Compile(ciEnv*, ciMethod*, int, Options, DirectiveSet*)+0x1b44 (compile.cpp:860) > V [libjvm.so+0x83d245] C2Compiler::compile_method(ciEnv*, ciMethod*, int, bool, DirectiveSet*)+0x1d5 (c2compiler.cpp:142) > V [libjvm.so+0x9f3bbc] CompileBroker::invoke_compiler_on_method(CompileTask*)+0x92c (compileBroker.cpp:2299) > V [libjvm.so+0x9f4848] CompileBroker::compiler_thread_loop()+0x468 (compileBroker.cpp:1958) > V [libjvm.so+0xeb98ec] JavaThread::thread_main_inner()+0xcc (javaThread.cpp:721) > V [libjvm.so+0x179b586] Thread::call_run()+0xb6 (thread.cpp:220) > V [libjvm.so+0x14a8d47] thread_native_entry(Thread*)+0x127 (os_linux.cpp:789) > > > `compiler/cha/StrengthReduceInterfaceCall.java` and `compiler/ciReplay/TestIncrementalInlining.java` fail as well. @TobiHartmann Thanks for the report. The tests were crashing in fastdebug config with or without those specific flags. The latest commit should fix the problem. Please take a look. Thank you very much. ------------- PR Comment: https://git.openjdk.org/jdk/pull/17147#issuecomment-1885834913 From dlong at openjdk.org Wed Jan 10 23:05:21 2024 From: dlong at openjdk.org (Dean Long) Date: Wed, 10 Jan 2024 23:05:21 GMT Subject: RFR: 8322743: assert(held_monitor_count() == jni_monitor_count()) failed In-Reply-To: References: Message-ID: On Tue, 9 Jan 2024 22:57:27 GMT, Vladimir Kozlov wrote: > Corner case with a local (not escaped) object used for synchronization. C2 Escape Analysis thinks that it can eliminate locks for it. 
In most cases it is true but not in this case. > > > for (int i = 0; i < 2; ++i) { > Object o = new Object(); > synchronized (o) { // monitorenter > // Trigger OSR compilation > for (int j = 0; j < 100_000; ++j) { > > The test has nested loop which trigger OSR compilation. The locked object comes from Interpreter into compiled OSR code. During parsing C2 creates an other non escaped object and correctly merge both together (with Phi node) so that non escaped object is not scalar replaceable. Because it does not globally escapes EA still removes locks for it and, as result, also for merged locked object from Interpreter which is the bug. > > The fix is to check that synchronized block does not have any associated escaped objects when EA decides if locks can be eliminated. > > Added regression test prepared by @TobiHartmann. Tested tier1-5, xcomp and stress. > Performance testing show no difference. I was thinking that the OSR situation is similar to this: for (int i = 0; i < 2; ++i) { Object o = osr ? static_volatile_field /* black hole, can't eliminate */ : new Object() /* can eliminate */; synchronized (o) { // monitorenter // Trigger OSR compilation for (int j = 0; j < 100_000; ++j) { but maybe we can do better. If C2 can eliminate allocations/locks for non-escaping objects, and that works in one direction C2 --> interpreter (deopt), then the reverse direction, interpreter --> C2 (OSR) might also be made to work. In other words, I think we could eliminate the lock, even in the OSR case. We know from EA that the object coming from the interpreter does not escape, so if load_interpreter_state did the reverse of deopt, we would end up with a scalar-replaced object. Deopt does scalar-replaced object --> materialized, so OSR would need to do materialized --> scalar-replaced object. The fields of the scalar-replaced object would be populated from the fields of the interpreter object, but ignoring fields with a default (0) value. 
Assuming I'm right, and this could work, that doesn't mean it's worth doing. I'm just throwing this idea out mostly for completeness. ------------- PR Comment: https://git.openjdk.org/jdk/pull/17331#issuecomment-1885884165 From dlong at openjdk.org Wed Jan 10 23:45:22 2024 From: dlong at openjdk.org (Dean Long) Date: Wed, 10 Jan 2024 23:45:22 GMT Subject: RFR: 8322743: assert(held_monitor_count() == jni_monitor_count()) failed In-Reply-To: References: Message-ID: On Tue, 9 Jan 2024 22:57:27 GMT, Vladimir Kozlov wrote: > Corner case with a local (not escaped) object used for synchronization. C2 Escape Analysis thinks that it can eliminate locks for it. In most cases it is true but not in this case. > > > for (int i = 0; i < 2; ++i) { > Object o = new Object(); > synchronized (o) { // monitorenter > // Trigger OSR compilation > for (int j = 0; j < 100_000; ++j) { > > The test has nested loop which trigger OSR compilation. The locked object comes from Interpreter into compiled OSR code. During parsing C2 creates an other non escaped object and correctly merge both together (with Phi node) so that non escaped object is not scalar replaceable. Because it does not globally escapes EA still removes locks for it and, as result, also for merged locked object from Interpreter which is the bug. > > The fix is to check that synchronized block does not have any associated escaped objects when EA decides if locks can be eliminated. > > Added regression test prepared by @TobiHartmann. Tested tier1-5, xcomp and stress. > Performance testing show no difference. Nevermind, object fields from the interpreter could have any value, so my idea doesn't work. 
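For reference, the loop shape quoted above can be expanded into a self-contained program along the following lines. This is only a sketch under stated assumptions: the class name `OsrLockSketch`, the `run` helper and the iteration counter are made up for illustration, it is not the regression test added by the PR, and whether OSR compilation actually happens depends on the JVM and its flags.

```java
class OsrLockSketch {
    // Hypothetical helper: executes the loop shape quoted in the thread and
    // returns the number of inner-loop iterations that ran.
    static long run() {
        long iterations = 0;
        for (int i = 0; i < 2; ++i) {
            Object o = new Object();        // local object that never escapes
            synchronized (o) {              // monitorenter
                // Long-running inner loop; a loop like this is what typically
                // triggers OSR compilation while the monitor is still held.
                for (int j = 0; j < 100_000; ++j) {
                    iterations++;
                }
            }                               // monitorexit
        }
        return iterations;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

The point of the shape is that the first lock on `o` is taken in the interpreter, while the matching unlock may run in OSR-compiled code, so the compiled code has to treat the incoming locked object as something it did not allocate itself.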
------------- PR Comment: https://git.openjdk.org/jdk/pull/17331#issuecomment-1885931501 From kvn at openjdk.org Thu Jan 11 00:01:21 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 11 Jan 2024 00:01:21 GMT Subject: RFR: 8322743: assert(held_monitor_count() == jni_monitor_count()) failed In-Reply-To: References: Message-ID: On Tue, 9 Jan 2024 22:57:27 GMT, Vladimir Kozlov wrote: > Corner case with a local (not escaped) object used for synchronization. C2 Escape Analysis thinks that it can eliminate locks for it. In most cases it is true but not in this case. > > > for (int i = 0; i < 2; ++i) { > Object o = new Object(); > synchronized (o) { // monitorenter > // Trigger OSR compilation > for (int j = 0; j < 100_000; ++j) { > > The test has nested loop which trigger OSR compilation. The locked object comes from Interpreter into compiled OSR code. During parsing C2 creates an other non escaped object and correctly merge both together (with Phi node) so that non escaped object is not scalar replaceable. Because it does not globally escapes EA still removes locks for it and, as result, also for merged locked object from Interpreter which is the bug. > > The fix is to check that synchronized block does not have any associated escaped objects when EA decides if locks can be eliminated. > > Added regression test prepared by @TobiHartmann. Tested tier1-5, xcomp and stress. > Performance testing show no difference. "We know from EA that the object coming from the interpreter does not escape" - we don't know what happens in Interpreter to this object. There is no information where this object is coming from (no method and no bci info). We only know that we have monitor at slot 0 which uses this object. Yes, we can do bytecode analysis to determine that but it is a lot more code. There could be other, more complicated, ways to remove locks for this case. 
I was thinking about splitting `unlock(obj)` through Phi node to keep separate `unlock` for object coming from Interpreter. Unfortunately it is not enough. We need also to keep separate synchronization blocks defined by BoxLock node. Otherwise we still eliminate all locks/unlocks during locks elimination [macro.cpp#L1946](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/macro.cpp#L1946). Note, we can't eliminate only part of locks/unlocks associated with one synchronization block. Otherwise we can't guarantee that we have balanced locks and unlocks (we had bugs about it). So we either eliminate or keep all of them. I think my fix is conservative solution for this issue. ------------- PR Comment: https://git.openjdk.org/jdk/pull/17331#issuecomment-1885946951 From vlivanov at openjdk.org Thu Jan 11 00:17:23 2024 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Thu, 11 Jan 2024 00:17:23 GMT Subject: RFR: 8322743: assert(held_monitor_count() == jni_monitor_count()) failed In-Reply-To: References: Message-ID: On Wed, 10 Jan 2024 23:58:36 GMT, Vladimir Kozlov wrote: > I think my fix is conservative solution for this issue. It's still not clear to me why conservatively marking all objects coming from interpreter as globally escaped wouldn't work (what Dean initially proposed). My reading of your response is that it may be way too conservative: > But in general case it could be only dead path where such object is referenced. Is it your main concern? ------------- PR Comment: https://git.openjdk.org/jdk/pull/17331#issuecomment-1885966563 From kvn at openjdk.org Thu Jan 11 00:38:21 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 11 Jan 2024 00:38:21 GMT Subject: RFR: 8322743: assert(held_monitor_count() == jni_monitor_count()) failed In-Reply-To: References: Message-ID: On Thu, 11 Jan 2024 00:14:38 GMT, Vladimir Ivanov wrote: > > I think my fix is conservative solution for this issue. 
> > It's still not clear to me why conservatively marking all objects coming from interpreter as globally escaped wouldn't work (what Dean initially proposed). It would work only for this OSR case. > > My reading of your response is that it may be way too conservative: > > > But in general case it could be only dead path where such object is referenced. > > Is it your main concern? First, I am concern that marking synchronization region as `has_escaped_object` during parsing when we load OSR state could be premature and later we can still eliminate locks if we don't do that. That was my comment about dead path. Second, marking during OSR load could be not enough. We may get an escaped locked object not only in such case. And **not** checking all objects in EA will miss it. Which may be not true and I am paranoid. I think my fix cover all cases. ------------- PR Comment: https://git.openjdk.org/jdk/pull/17331#issuecomment-1885987490 From qamai at openjdk.org Thu Jan 11 03:23:31 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Thu, 11 Jan 2024 03:23:31 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v5] In-Reply-To: <6SOOtHaZrEN8UIKKLHBJniz_nUvZjRvV0TLofW7Xjxk=.debaf12a-b7c8-4b04-b217-3b8cd9b3c6f5@github.com> References: <6SOOtHaZrEN8UIKKLHBJniz_nUvZjRvV0TLofW7Xjxk=.debaf12a-b7c8-4b04-b217-3b8cd9b3c6f5@github.com> Message-ID: On Wed, 10 Jan 2024 18:08:39 GMT, Quan Anh Mai wrote: >> Hi, >> >> This patch adds unsigned bounds and known bits constraints to `TypeInt` and `TypeLong`. This opens more transformation opportunities in an elegant manner as well as helps avoid some ad-hoc rules in Hotspot. The new constraints are applied to identity and value calls of the common nodes (Add, Sub, L/R/URShift, And, Or, Xor, bit counting, Cmp, Bool, ConvI2L/L2I), the detailed ideas for each node will be presented below. 
>> >> In general, a `TypeInt/Long` represents a set of values `x` that satisfies: `x s>= lo && x s<= hi && x u>= ulo && x u<= uhi && (x & zeros) == 0 && (~x & ones) == 0`. These constraints are not independent, e.g. an int that lies in [0, 3] in signed domain must also lie in [0, 3] in unsigned domain and have all bits but the last 2 being unset. As a result, we must normalize the constraints (tighten the constraints so that they are optimal) before constructing a `TypeInt/Long` instance. >> >> Please kindly review, thanks very much. >> >> Testing >> >> - [x] GHA >> - [x] Linux x64, tier 1-4 > > Quan Anh Mai has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 27 commits: > > - Merge branch 'master' into improvevalue > - Merge branch 'master' into improvevalue > - improve add/sub implementation > - Merge branch 'master' into improvevalue > - typo > - whitespace > - fix tests for x86_32 > - fix widen of ConvI2L > - problem lists > - format > - ... and 17 more: https://git.openjdk.org/jdk/compare/f0169341...843ad076 May someone give their opinion on this PR, please? Thanks a lot. ------------- PR Comment: https://git.openjdk.org/jdk/pull/15440#issuecomment-1886159291 From thartmann at openjdk.org Thu Jan 11 06:52:23 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 11 Jan 2024 06:52:23 GMT Subject: RFR: 8321712: C2: "failed: Multiple uses of register" in C2_MacroAssembler::vminmax_fp [v5] In-Reply-To: <-rzXk3WJ9aOgrgoyIVDpCeiQLBkq5N5yQCYp_42oMEo=.df74c3a7-508d-48a1-baa2-796a2015bea1@github.com> References: <-rzXk3WJ9aOgrgoyIVDpCeiQLBkq5N5yQCYp_42oMEo=.df74c3a7-508d-48a1-baa2-796a2015bea1@github.com> Message-ID: On Tue, 9 Jan 2024 23:46:45 GMT, Sandhya Viswanathan wrote: >> The C2_MacroAssembler::vminmax_fp and evminmax_fp were incorrectly checking for the two USE operand registers to be different and thus resulting in assertion failure. 
>> >> In x86_64.ad: >> >> instruct maxF_reg(legRegF dst, legRegF a, legRegF b, legRegF tmp, legRegF atmp, legRegF btmp) %{ >> ... >> effect(USE a, USE b, TEMP tmp, TEMP atmp, TEMP btmp); >> ... >> __ vminmax_fp(Op_MaxV, T_FLOAT, $dst$$XMMRegister, $a$$XMMRegister, $b$$XMMRegister, $tmp$$XMMRegister, $atmp$$XMMRegister, $btmp$$XMMRegister, Assembler::AVX_128bit); >> %} >> >> >> Changing the assert in vminmax_fp from: >> assert_different_registers(a, b, tmp, atmp, btmp); >> to: >> assert_different_registers(a, tmp, atmp, btmp); >> assert_different_registers(b, tmp, atmp, btmp); >> fixes the issue. >> >> Similar change done in evminmax_fp. >> >> Please review. >> >> Best Regards, >> Sandhya > > Sandhya Viswanathan has updated the pull request incrementally with one additional commit since the last revision: > > copyright year update Testing all passed. ------------- PR Comment: https://git.openjdk.org/jdk/pull/17315#issuecomment-1886401306 From chagedorn at openjdk.org Thu Jan 11 07:50:23 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 11 Jan 2024 07:50:23 GMT Subject: RFR: JDK-8323190: Segfault during deoptimization of C2-compiled code [v3] In-Reply-To: References: Message-ID: <-BFrEysV5oZgA6gf67EwVlmFC5Lkpv7V-N5HvQ6B_sI=.b9e08a64-8b6f-418e-8810-f908cbeb68c5@github.com> On Wed, 10 Jan 2024 18:11:14 GMT, Cesar Soares Lucas wrote: >> src/hotspot/share/opto/output.cpp line 1092: >> >>> 1090: ObjectMergeValue* mv = (ObjectMergeValue*) sv_for_node_id(objs, smerge->_idx); >>> 1091: >>> 1092: if (mv == NULL) { >> >> You should replace `NULL` with `nullptr` here and below. This also seems wrong here where you took the code from: >> https://github.com/openjdk/jdk/blob/88378ed0584c7eb0849b6fc1e361fd8ea0698caf/src/hotspot/share/opto/output.cpp#L775-L796 >> >> On a separate note, the code looks almost identical. Could it be shared somehow? > > Thank you for reviewing @chhagedorn. I've converted the NULLs to nullptrs. 
However, I'll defer the refactoring of the identical code to a RFE - mainly because I'll have to backport the current patch and I'd like to keep it as minimal as possible. Please let me know if you disagree. That's perfectly fine, especially since the code for the `is_SafePointScalarObject()` case was already duplicated before. So, we could change both in one go in a follow-up RFE. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17333#discussion_r1448432678 From chagedorn at openjdk.org Thu Jan 11 07:54:24 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 11 Jan 2024 07:54:24 GMT Subject: RFR: JDK-8323190: Segfault during deoptimization of C2-compiled code [v3] In-Reply-To: References: Message-ID: <5kN_x85owu9QMT9jIpfo-URWZFpJjUuaL97fVQfo1Zk=.c8bc54ac-8c2a-4ad8-a09e-a87b6dc354d9@github.com> On Wed, 10 Jan 2024 18:20:42 GMT, Cesar Soares Lucas wrote: >> Currently, if `ReduceAllocationMerges` reduces an allocation merge that is used as a monitor C2 will SIGFAULT in `Process_OopMap_Node` because it's missing code to handle that case. This patch fixes C2 to properly handle reduced allocation merges that are used as monitors. >> >> Tested with Linux x86_64 hotspot_all. > > Cesar Soares Lucas has updated the pull request incrementally with one additional commit since the last revision: > > Convert NULL to nullptr. Remove type cast. Update looks good, thanks. ------------- Marked as reviewed by chagedorn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/17333#pullrequestreview-1814889165 From epeter at openjdk.org Thu Jan 11 08:25:36 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 11 Jan 2024 08:25:36 GMT Subject: RFR: 8322077: Add Ideal transformation: (~a) | (~b) => ~(a & b) [v13] In-Reply-To: References: Message-ID: <_b5ODSBa8YhAf5i7hafehvmw40MAdi4z5yF0EicXBUE=.bb562b91-9751-4dab-a487-0e9961b1f199@github.com> On Wed, 10 Jan 2024 16:57:47 GMT, Zhiqiang Zang wrote: >> Hello, >> >> (~a) | (~b) => ~(a & b) is a widely seen pattern, for example it is implemented for LLVM [here](https://github.com/llvm/llvm-project/blob/397f1ce9efb4eea1ee10fe4833f733b8c7abd878/llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp#L1617C28-L1617C28); however it is missing in current implementation of hotspot. This pull request adds this transformation and associated tests. >> >> Thanks. > > Zhiqiang Zang has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 34 commits: > > - Merge branch 'master' of https://github.com/openjdk/jdk into ornode-PNewDeMorganLawOrToAnd > - update copyright dates. > - Merge branch 'andnode-PNewDeMorganLawAndToOr' into ornode-PNewDeMorganLawOrToAnd > - Revert "move the two helper functions to member functions of the node class." > > This reverts commit 7a962d69ac687c0476e54cd004037f2ebb0800ba. > - Revert "update copyright dates." > > This reverts commit 3665de2f3d38b4fce3ba968ec6247dd2576ecc32. > - Revert "move make_not back to AddNode because it cannot compile for architecture arm, s390x, ppc64le." > > This reverts commit 4ee8b089b6f119b1692dcd2c512d249d92734697. > - Revert "adapt changes from the dependent pr." > > This reverts commit 0c8d10776fe55a28ea9f60160b4f6f57aabad5ab. > - Revert "adapt to new changes from the dependant pr." > > This reverts commit b21e242b306d9cf01e1ba84ebba47ef5b471d98e. > - adapt to new changes from the dependant pr. 
> - Merge branch 'andnode-PNewDeMorganLawAndToOr' into ornode-PNewDeMorganLawOrToAnd > - ... and 24 more: https://git.openjdk.org/jdk/compare/b86c3b7a...f908668b Testing running for commit 34... ------------- PR Comment: https://git.openjdk.org/jdk/pull/16334#issuecomment-1886601525 From thartmann at openjdk.org Thu Jan 11 08:30:25 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 11 Jan 2024 08:30:25 GMT Subject: RFR: JDK-8323190: Segfault during deoptimization of C2-compiled code [v3] In-Reply-To: References: <6jsDdUuD6X9u8CT-4SUN_r2VUgPA4QuPCMLQyz-myaY=.52f1830e-0c87-44bf-931a-c272cabb74b4@github.com> Message-ID: On Wed, 10 Jan 2024 18:16:42 GMT, Cesar Soares Lucas wrote: > I think AllocationMergesTests.java isn't the ideal place for these tests. The tests in AllocationMergesTests.java are for checking the IR shape after the optimization, the current issue was actually because of a problem emitting debug info for a compilation unit - it's not something that we can capture with the IR-framework I believe Right, what I meant is that this issue shows that we don't have enough test coverage for all the cases optimized by [JDK-8287061](https://bugs.openjdk.org/browse/JDK-8287061). Or do we have an existing test for reduced allocation merges that are used as monitors? Ideally, we would have an IR framework test for the important cases that would then check both that the code is optimized as expected as well as that it's correct (no crash, correct result, ...). I'm fine with adding more tests with https://github.com/openjdk/jdk/pull/15825 though. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/17333#issuecomment-1886607265 From thartmann at openjdk.org Thu Jan 11 08:30:24 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 11 Jan 2024 08:30:24 GMT Subject: RFR: JDK-8323190: Segfault during deoptimization of C2-compiled code [v3] In-Reply-To: References: Message-ID: <_NKT-XZ-IwknOUt5nqBmxAFXrI2cOSYjQcFmuERRc4I=.149c1d56-f82f-4469-b2ac-6bca87e98f8c@github.com> On Wed, 10 Jan 2024 18:20:42 GMT, Cesar Soares Lucas wrote: >> Currently, if `ReduceAllocationMerges` reduces an allocation merge that is used as a monitor C2 will SIGFAULT in `Process_OopMap_Node` because it's missing code to handle that case. This patch fixes C2 to properly handle reduced allocation merges that are used as monitors. >> >> Tested with Linux x86_64 hotspot_all. > > Cesar Soares Lucas has updated the pull request incrementally with one additional commit since the last revision: > > Convert NULL to nullptr. Remove type cast. Looks good to me. ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/17333#pullrequestreview-1814951569 From rrich at openjdk.org Thu Jan 11 08:57:52 2024 From: rrich at openjdk.org (Richard Reingruber) Date: Thu, 11 Jan 2024 08:57:52 GMT Subject: RFR: 8290965: PPC64: Implement post-call NOPs [v4] In-Reply-To: References: Message-ID: > #### Implementation of post call nops (PCNs) on ppc64. > > Depends on https://github.com/openjdk/jdk/pull/17150 > > About post call nops: > > - instruction(s) at return addresses of compiled java calls > - emitted iff vm continuations are enabled to support virtual threads > - encode data that can be be used to find the corresponding CodeBlob and oop map faster > - mt-safe patchable to trigger deoptimization > > Background: > > - Frames in continuation StackChunks are not visited if their compiled method is made not entrant (in contrast to frames on stack). 
> Instead all PCNs of the compiled method are patched to trigger deoptimization when control returns to such frames. > - With vm continuations, stacks are walked and inspected more frequently. This requires lookup of metadata like frame size and oop maps. As an optimization the offset of the CodeBlob to the PCN and the oop map slot are encoded as data in the PCN. > > Post call nops on ppc64 > > - 1 instruction, i.e. 4 bytes (either CMPI or CMPLI[1]) > x86_64: 1 instruction, 8 bytes > aarch64: 3 instruction, 12 bytes > [1] 3.1.10 Fixed Point Compare Instructions in Power ISA 3.1B > https://openpowerfoundation.org/specifications/isa/ > > - 26 bits data payload > x86_64: 32 bits; aarch64: 32 bits > - 9 bits dedicated to oop map slot. With 8 bits there where cases with SPECjvm2008 where the slot could not be encoded (on ppc64 and x86_64). > x86_64: 8 bits; aarch64: 8 bits > - 17 bits dedicated to cb offset. Effectively 19 bits due to instruction alignment. > x86_64: 24 bits; aarch64: 24 bits > - Also used when reconstructing the back chain after thawing continuation frames (see `Thaw::patch_caller_links`) > > - Refactored frame constructors to make use of fast CodeBlob lookup based on PCNs. > The fast lookup may only be used if the pc is known to be in the code cache because `CodeCache::find_blob_fast` can yield wrong results if it finds instructions outside the code cache that look just like PCNs. Callers of the frame class constructors need to pass `frame::kind::native` in that case to avoid errors. Other platforms don't make this explicit which is a problem in my eyes. Picking the wrong constructor can cause errors when porting and in future development. > > - Currently only the PCNs in nmethods are initialized. Therefore we don't even try to make a fast lookup based on PCNs if we know the CodeBlob is, e.g., a RuntimeStub. To achieve this we call the frame constructor passing `frame::kind::code_blob`. > > #### Statistics > > > | SpecJVM2008... 
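As a rough illustration of the ppc64 payload arithmetic described above (9 bits for the oop map slot plus 17 bits for the 4-byte-aligned CodeBlob offset, 26 bits in total), here is a minimal Java sketch. The class name, the method names, and the bit order (slot in the upper bits, scaled offset in the lower bits) are assumptions made purely for illustration; the actual HotSpot encoding may well differ.

```java
// Hypothetical sketch, not HotSpot code: packs an oop map slot and a
// CodeBlob offset into a 26-bit payload as described in the PR text.
final class PostCallNopPayload {
    static final int OOPMAP_SLOT_BITS = 9;   // from the PR description
    static final int CB_OFFSET_BITS = 17;    // from the PR description
    static final int ALIGN_SHIFT = 2;        // instructions are 4 bytes, so offsets are multiples of 4

    // Returns the packed payload, or -1 if the values cannot be encoded.
    static int pack(int oopMapSlot, int cbOffsetBytes) {
        if (oopMapSlot < 0 || cbOffsetBytes < 0) return -1;
        if ((cbOffsetBytes & ((1 << ALIGN_SHIFT) - 1)) != 0) return -1; // not instruction aligned
        int scaled = cbOffsetBytes >>> ALIGN_SHIFT;
        if (oopMapSlot >= (1 << OOPMAP_SLOT_BITS)) return -1;
        if (scaled >= (1 << CB_OFFSET_BITS)) return -1;
        // Assumed layout: slot in the upper 9 bits, scaled offset in the lower 17 bits.
        return (oopMapSlot << CB_OFFSET_BITS) | scaled;
    }

    static int oopMapSlot(int payload) {
        return payload >>> CB_OFFSET_BITS;
    }

    static int cbOffsetBytes(int payload) {
        return (payload & ((1 << CB_OFFSET_BITS) - 1)) << ALIGN_SHIFT;
    }
}
```

Storing the offset divided by 4 is what makes the 17 offset bits "effectively 19 bits": the largest encodable byte offset in this sketch is 2^19 - 4.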
Richard Reingruber has updated the pull request incrementally with one additional commit since the last revision: Review Martin ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17171/files - new: https://git.openjdk.org/jdk/pull/17171/files/5852ea38..05fa480f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17171&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17171&range=02-03 Stats: 16 lines in 9 files changed: 0 ins; 0 del; 16 mod Patch: https://git.openjdk.org/jdk/pull/17171.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17171/head:pull/17171 PR: https://git.openjdk.org/jdk/pull/17171 From roland at openjdk.org Thu Jan 11 09:03:42 2024 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 11 Jan 2024 09:03:42 GMT Subject: RFR: 8319793: C2 compilation fails with "Bad graph detected in build_loop_late" after JDK-8279888 [v14] In-Reply-To: References: Message-ID: <0kbxDa-wMnnD6VR6S0O6DYDfoQ0BVeXTg1cx1CEheGI=.3e14d559-c3e4-4962-b3f2-2b45a5ce4771@github.com> On Tue, 9 Jan 2024 17:07:14 GMT, Emanuel Peter wrote: >> Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: >> >> review > > Thanks, still LGTM @eme64 @chhagedorn thanks for the reviews ------------- PR Comment: https://git.openjdk.org/jdk/pull/16886#issuecomment-1886660971 From roland at openjdk.org Thu Jan 11 09:03:45 2024 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 11 Jan 2024 09:03:45 GMT Subject: Integrated: 8319793: C2 compilation fails with "Bad graph detected in build_loop_late" after JDK-8279888 In-Reply-To: References: Message-ID: On Wed, 29 Nov 2023 15:52:59 GMT, Roland Westrelin wrote: > Range check smearing and range check predication make an array access > dependent on 2 (or more in the case of RC smearing) conditions. 
As a > consequence, if a range check can be eliminated because there's an > identical dominating range check, the control dependent nodes that > could float and become dependent on the dominating range check cannot > be allowed to float because there's a risk that they would then bypass > one of the checks that make the access legal. > > `IfNode::dominated_by()` and `PhaseIdealLoop::dominated_by()` have > logic to prevent this: nodes that are control dependent on a range > check or predicate are not allowed to float. This is however not > sufficient as demonstrated by the test cases. > > In `TestArrayAccessAboveRCAfterSmearingOrPredication.testRangeCheckSmearing()`: > > > v += array[i]; > if (flag2) { > if (flag3) { > field = 0x42; > } > } > if (flagField == 1) { > v += array[i]; > } > > > The range check for the second `array[i]` load is replaced by the > dominating range check for the first `array[i]` but because the second > `array[i]` load could really be dependent on multiple range checks (in > case smearing happened which is not the case here), c2 doesn't allow > the second `array[i]` to float when the second range check is > removed. The second `array[i]` is then control dependent on: > > > if (flagField == 1) { > > > which is next found to be dominated by the same test: > > > if (flag == 1) { > > > and is removed. However nothing in `dominated_by()` treats node > dependent on tests that are not range check or predicates > specially. So the second `array[i]` is allowed to float and become > dependent on: > > > if (flag == 1) { > > > which is above the range check for that access. The test method in its > last invocation is passed an index for the array access that's widely > out of range. The array load happens before the range check and > crashes the VM. `testLoopPredication()` is a similar test where array > loads become dependent on predicates and end up above range checks. 
> > `TestArrayAccessCastIIAboveRC.java` is the test case from the bug > where for similar reasons a range check `CastII` ends up above its > range check, becomes top because its input becomes some integer that > conflicts with its type (but there's no condition to catch it). The > graph becomes broken and c2 crashes. > > Logic in the `dominated_by()` methods ... This pull request has now been integrated. Changeset: b922f8d4 Author: Roland Westrelin URL: https://git.openjdk.org/jdk/commit/b922f8d45951250b7c39cb179b9bc1a8a6256a9e Stats: 400 lines in 14 files changed: 342 ins; 27 del; 31 mod 8319793: C2 compilation fails with "Bad graph detected in build_loop_late" after JDK-8279888 Reviewed-by: chagedorn, epeter ------------- PR: https://git.openjdk.org/jdk/pull/16886 From dlunden at openjdk.org Thu Jan 11 10:23:36 2024 From: dlunden at openjdk.org (Daniel Lundén) Date: Thu, 11 Jan 2024 10:23:36 GMT Subject: RFR: 8322996: BoxLockNode creation fails with assert(reg < CHUNK_SIZE) failed: sanity Message-ID: This changeset fixes an issue where deeply nested synchronized statements triggered an assert in C2. Changes: - Bail out on compilation when we create a `BoxLockNode` with a slot index that cannot fit in a `RegMask`. This is similar to how we handle the case when we do not have space to represent arguments in [`opto/matcher.cpp`](https://github.com/openjdk/jdk/blob/58b01dce054c50bcb5a28aad4c1b574acaa90f6d/src/hotspot/share/opto/matcher.cpp#L314-L318) - Add a regression test.
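For readers unfamiliar with the code shape in question, the following is a minimal Java sketch of nested `synchronized` statements. It is not the regression test from the PR: the class name, the method name, and the nesting depth of three are hypothetical, and the depth needed to hit the assert is presumably far larger. The assumption being illustrated is only that each nesting level holds its own monitor at the same time, so the number of monitor slots grows with the depth.

```java
// Hypothetical sketch, not the PR's regression test: three nested
// synchronized statements, each holding a monitor at the same time.
final class NestedSynchronizeSketch {
    private static final Object LOCK = new Object();

    static int test() {
        int depth = 0;
        synchronized (LOCK) {           // nesting level 1
            depth++;
            synchronized (LOCK) {       // nesting level 2 (monitors are reentrant)
                depth++;
                synchronized (LOCK) {   // nesting level 3
                    depth++;
                }
            }
        }
        return depth;
    }
}
```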
Testing: - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/7446998688) - tier1, tier2, tier3, tier4, and tier5 on windows-x64, linux-x64, linux-aarch64, macosx-x64, and macosx-aarch64 ------------- Commit messages: - Fix issue and add test case Changes: https://git.openjdk.org/jdk/pull/17370/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=17370&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8322996 Stats: 251 lines in 4 files changed: 248 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/17370.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17370/head:pull/17370 PR: https://git.openjdk.org/jdk/pull/17370 From kuaiwei.kw at alibaba-inc.com Thu Jan 11 11:58:26 2024 From: kuaiwei.kw at alibaba-inc.com (Kuai Wei) Date: Thu, 11 Jan 2024 19:58:26 +0800 Subject: Re: discuss about release barrier for final fields initialization In-Reply-To: References: <5e41f867-b31b-4ded-b737-8d0c869c8895.kuaiwei.kw@alibaba-inc.com> <20965b9a-ec64-4d81-866e-b1a0a94ed1e0@littlepinkcloud.com>, Message-ID: <822b0f93-ae9e-4196-8d20-1b87286b91d5.kuaiwei.kw@alibaba-inc.com> Hi Andrew and Dean, Thanks for the reply. I checked the previous discussion and am still not clear about the root cause. If you can provide more detail about the optimization, such as which load or load dependency would be elided, we can check whether there is a chance to detect or prevent it. Here are some cases I have in mind: 1) the loaded value is used by the final field store, like x.final_field=x.other +1; it has a data dependency and cannot be reordered by the compiler. 2) load from the final field after the final store: x.final_field = xxx; t=x.final_field; The loaded value is always the final value. It's safe to elide below the barrier. 3) load from the final field before the final store: t=x.final_field; x.final_field = xxx; The load could be elided with a non-final value, but that looks like expected behavior.
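The three cases listed above can be sketched in plain Java as follows. This is only an illustration of the source-level shapes under discussion: the class and member names are hypothetical, and the sketch shows single-threaded semantics, not what C1 or C2 actually emit or reorder. Because Java does not allow reading a blank final field by name before it is assigned in the constructor, case 3 goes through a small helper method.

```java
// Hypothetical sketch of the three load/store shapes around a final field store.
final class FinalFieldCases {
    int other = 41;        // ordinary (non-final) field, initialized before the constructor body
    final int dependent;   // case 1: final store that depends on a loaded value
    final int stored;      // the final field read in cases 2 and 3
    final int before;      // what case 3 observed (load before the final store)
    final int after;       // what case 2 observed (load after the final store)

    FinalFieldCases() {
        dependent = other + 1;   // case 1: the store has a data dependency on the load of 'other'
        before = peekStored();   // case 3: load before the final store sees the default value
        stored = 7;              // the final store
        after = stored;          // case 2: load after the final store sees the final value
    }

    // Helper so that the constructor can read 'stored' before assigning it.
    private int peekStored() {
        return stored;
    }
}
```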
Thanks, Kuai Wei From: Andrew Haley > Date: Wed, Jan 10, 2024 at 5:54 PM Subject: Re: discuss about release barrier for final fields initialization To: > On 1/9/24 06:23, Kuai Wei wrote: > I found only C2 will insert release membar and C1 just insert storestore for both final and normal allocation. In Doug Lea's cookbook https://gee.cs.oswego.edu/dl/jmm/cookbook.html > > Only storestore is required. Alex has a great post on this topic https://shipilev.net/blog/2014/all-fields-are-final/ > . It referred to a case where loadstore is needed. https://www.hboehm.info/c++mm/no_write_fences.html > > I checked this case and IMO it looks like some legacy architectures may break data dependency and cause problems. As far as I know, the alpha architecture is an example. I think it doesn't > break on modern architectures. Is there another case I missed? I think it requires a very careful analysis of the compiler to be sure. The problem occurs if an optimizer knows what a store is going to do. If it does, then there's nothing to prevent a load from being elided, and your load dependency has gone. This isn't a problem with C1, because C1 doesn't do that kind of optimization. I don't know that C2 does either, or even whether it is allowed to do so. From what I remember of the conversation, we left the release barrier in because of an abundance of caution rather than any proof that a storestore was inadequate. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. > https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From shade at openjdk.org Thu Jan 11 12:23:35 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 11 Jan 2024 12:23:35 GMT Subject: RFR: 8323584: AArch64: Unnecessary ResourceMark in NativeCall::set_destination_mt_safe Message-ID: Was looking at `ICBuffer` cleaning paths that run at safepoint.
In `NativeCall::set_destination_mt_safe`, there is a `ResourceMark` that does not seem to have any purpose: no code in its scope uses resource allocation. There is adjacent dead code too. Looks like a development/debugging leftover. Additional testing: - [ ] Linux x86_64 AArch64 server fastdebug, `tier{1,2,3}` ------------- Commit messages: - Fix Changes: https://git.openjdk.org/jdk/pull/17372/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=17372&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8323584 Stats: 3 lines in 1 file changed: 0 ins; 3 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/17372.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17372/head:pull/17372 PR: https://git.openjdk.org/jdk/pull/17372 From epeter at openjdk.org Thu Jan 11 12:37:32 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 11 Jan 2024 12:37:32 GMT Subject: RFR: 8323577: C2 SuperWord: remove AlignVector restrictions on IR tests added in JDK-8305055 Message-ID: These IR rules were restricted in [JDK-8305055](https://bugs.openjdk.org/browse/JDK-8305055), [PR for comparison](https://git.openjdk.org/jdk/pull/13236). This had to be done because those cases were no longer vectorized with `AlignVector` after my bugfix [JDK-8298935](https://bugs.openjdk.org/browse/JDK-8298935); they were "collateral damage". Before this bugfix, we would vectorize, even though the alignment constraints had rejected some memops, but then they were re-introduced during pair extension. This re-introduction led to this bug for other reasons. I proposed to restore vectorization (i.e. fix the "collateral damage") by improving alignment-constraints ([JDK-8303827](https://bugs.openjdk.org/browse/JDK-8303827)). Since I already had to completely rework the alignment constraints because of a bug, I relaxed the constraints: [JDK-8310190](https://bugs.openjdk.org/browse/JDK-8310190) Now I can remove the restrictions on those rules.
------------- Commit messages: - 8323577 Changes: https://git.openjdk.org/jdk/pull/17369/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=17369&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8323577 Stats: 13 lines in 2 files changed: 0 ins; 12 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/17369.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17369/head:pull/17369 PR: https://git.openjdk.org/jdk/pull/17369 From rcastanedalo at openjdk.org Thu Jan 11 12:44:24 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Thu, 11 Jan 2024 12:44:24 GMT Subject: RFR: 8322996: BoxLockNode creation fails with assert(reg < CHUNK_SIZE) failed: sanity In-Reply-To: References: Message-ID: On Thu, 11 Jan 2024 10:19:12 GMT, Daniel Lundén wrote: > This changeset fixes an issue where deeply nested synchronized statements triggered an assert in C2. > > Changes: > - Bail out on compilation when we create a `BoxLockNode` with a slot index that cannot fit in a `RegMask`. This is similar to how we handle the case when we do not have space to represent arguments in [`opto/matcher.cpp`](https://github.com/openjdk/jdk/blob/58b01dce054c50bcb5a28aad4c1b574acaa90f6d/src/hotspot/share/opto/matcher.cpp#L314-L318) > - Add a regression test. > > Testing: > - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/7446998688) > - tier1, tier2, tier3, tier4, and tier5 on windows-x64, linux-x64, linux-aarch64, macosx-x64, and macosx-aarch64 Changes requested by rcastanedalo (Reviewer). src/hotspot/share/opto/locknode.cpp line 47: > 45: init_flags(Flag_rematerialize); > 46: OptoReg::Name reg = OptoReg::stack2reg(_slot); > 47: if (!RegMask::can_represent_arg(reg)) { I am not very familiar with this code, but would it be possible to use `!RegMask::can_represent(reg)` instead of `!RegMask::can_represent_arg(reg)` here? Or is it necessary to use the latter (which is stricter) for correctness?
test/hotspot/jtreg/compiler/c2/TestNestedSynchronize.java line 27: > 25: * @test > 26: * @bug 8322996 > 27: * @requires vm.debug I suggest removing this line for better test coverage, the test does not really require debug mode. test/hotspot/jtreg/compiler/c2/TestNestedSynchronize.java line 32: > 30: * > 31: * @run main/othervm -XX:CompileCommand=compileonly,compiler.c2.TestNestedSynchronize::test > 32: * -XX:-TieredCompilation -Xcomp No need to use `-XX:-TieredCompilation` here (already in `-Xcomp` mode). test/hotspot/jtreg/compiler/c2/TestNestedSynchronize.java line 36: > 34: */ > 35: > 36: package compiler.c2; The test case might fit better under `test/hotspot/jtreg/compiler/locks`. ------------- PR Review: https://git.openjdk.org/jdk/pull/17370#pullrequestreview-1815464703 PR Review Comment: https://git.openjdk.org/jdk/pull/17370#discussion_r1448790888 PR Review Comment: https://git.openjdk.org/jdk/pull/17370#discussion_r1448797933 PR Review Comment: https://git.openjdk.org/jdk/pull/17370#discussion_r1448798283 PR Review Comment: https://git.openjdk.org/jdk/pull/17370#discussion_r1448801274 From tholenstein at openjdk.org Thu Jan 11 12:45:21 2024 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Thu, 11 Jan 2024 12:45:21 GMT Subject: RFR: 8323584: AArch64: Unnecessary ResourceMark in NativeCall::set_destination_mt_safe In-Reply-To: References: Message-ID: <3Asv6oRvd6Ht-E7p-9l6zqfmNDsQgdTOlVVHM9sdqOo=.306bcd45-1b00-44ca-97d6-19976c6bd2f9@github.com> On Thu, 11 Jan 2024 12:17:21 GMT, Aleksey Shipilev wrote: > Was looking at `ICBuffer` cleaning paths that run at safepoint. In `NativeCall::set_destination_mt_safe`, there is a `ResourceMark` that does not seem to have any purpose: no code in its scope uses resource allocation. There is adjacent dead code too. Looks like a development/debugging leftover. > > Additional testing: > - [ ] Linux x86_64 AArch64 server fastdebug, `tier{1,2,3}` Thanks for removing! 
Looks good to me ------------- Marked as reviewed by tholenstein (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/17372#pullrequestreview-1815483070 From rcastanedalo at openjdk.org Thu Jan 11 13:06:25 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Thu, 11 Jan 2024 13:06:25 GMT Subject: RFR: 8323577: C2 SuperWord: remove AlignVector restrictions on IR tests added in JDK-8305055 In-Reply-To: References: Message-ID: On Thu, 11 Jan 2024 10:16:14 GMT, Emanuel Peter wrote: > These IR rules were restricted in [JDK-8305055](https://bugs.openjdk.org/browse/JDK-8305055), [PR for comparison](https://git.openjdk.org/jdk/pull/13236). > > This had to be done because those cases were no longer vectorized with `AlignVector` after my bugfix [JDK-8298935](https://bugs.openjdk.org/browse/JDK-8298935); they were "collateral damage". Before this bugfix, we would vectorize, even though the alignment constraints had rejected some memops, but then they were re-introduced during pair extension. This re-introduction led to this bug for other reasons. I proposed to restore vectorization (i.e. fix the "collateral damage") by improving alignment-constraints ([JDK-8303827](https://bugs.openjdk.org/browse/JDK-8303827)). > > Since I already had to completely rework the alignment constraints because of a bug, I relaxed the constraints: > [JDK-8310190](https://bugs.openjdk.org/browse/JDK-8310190) > > Now I can remove the restrictions on those rules. Looks good. ------------- Marked as reviewed by rcastanedalo (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/17369#pullrequestreview-1815535313 From chagedorn at openjdk.org Thu Jan 11 13:17:25 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 11 Jan 2024 13:17:25 GMT Subject: RFR: 8323577: C2 SuperWord: remove AlignVector restrictions on IR tests added in JDK-8305055 In-Reply-To: References: Message-ID: On Thu, 11 Jan 2024 10:16:14 GMT, Emanuel Peter wrote: > These IR rules were restricted in [JDK-8305055](https://bugs.openjdk.org/browse/JDK-8305055), [PR for comparison](https://git.openjdk.org/jdk/pull/13236). > > This had to be done because those cases were no longer vectorized with `AlignVector` after my bugfix [JDK-8298935](https://bugs.openjdk.org/browse/JDK-8298935); they were "collateral damage". Before this bugfix, we would vectorize, even though the alignment constraints had rejected some memops, but then they were re-introduced during pair extension. This re-introduction led to this bug for other reasons. I proposed to restore vectorization (i.e. fix the "collateral damage") by improving alignment-constraints ([JDK-8303827](https://bugs.openjdk.org/browse/JDK-8303827)). > > Since I already had to completely rework the alignment constraints because of a bug, I relaxed the constraints: > [JDK-8310190](https://bugs.openjdk.org/browse/JDK-8310190) > > Now I can remove the restrictions on those rules. Looks good! ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/17369#pullrequestreview-1815556911 From mdoerr at openjdk.org Thu Jan 11 13:21:28 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Thu, 11 Jan 2024 13:21:28 GMT Subject: RFR: 8290965: PPC64: Implement post-call NOPs [v4] In-Reply-To: References: Message-ID: On Thu, 11 Jan 2024 08:57:52 GMT, Richard Reingruber wrote: >> #### Implementation of post call nops (PCNs) on ppc64.
>> >> Depends on https://github.com/openjdk/jdk/pull/17150 >> >> About post call nops: >> >> - instruction(s) at return addresses of compiled java calls >> - emitted iff vm continuations are enabled to support virtual threads >> - encode data that can be used to find the corresponding CodeBlob and oop map faster >> - mt-safe patchable to trigger deoptimization >> >> Background: >> >> - Frames in continuation StackChunks are not visited if their compiled method is made not entrant (in contrast to frames on stack). >> Instead all PCNs of the compiled method are patched to trigger deoptimization when control returns to such frames. >> - With vm continuations, stacks are walked and inspected more frequently. This requires lookup of metadata like frame size and oop maps. As an optimization the offset of the CodeBlob to the PCN and the oop map slot are encoded as data in the PCN. >> >> Post call nops on ppc64 >> >> - 1 instruction, i.e. 4 bytes (either CMPI or CMPLI[1]) >> x86_64: 1 instruction, 8 bytes >> aarch64: 3 instructions, 12 bytes >> [1] 3.1.10 Fixed Point Compare Instructions in Power ISA 3.1B >> https://openpowerfoundation.org/specifications/isa/ >> >> - 26 bits data payload >> x86_64: 32 bits; aarch64: 32 bits >> - 9 bits dedicated to oop map slot. With 8 bits there were cases with SPECjvm2008 where the slot could not be encoded (on ppc64 and x86_64). >> x86_64: 8 bits; aarch64: 8 bits >> - 17 bits dedicated to cb offset. Effectively 19 bits due to instruction alignment. >> x86_64: 24 bits; aarch64: 24 bits >> - Also used when reconstructing the back chain after thawing continuation frames (see `Thaw::patch_caller_links`) >> >> - Refactored frame constructors to make use of fast CodeBlob lookup based on PCNs. >> The fast lookup may only be used if the pc is known to be in the code cache because `CodeCache::find_blob_fast` can yield wrong results if it finds instructions outside the code cache that look just like PCNs.
Callers of the frame class constructors need to pass `frame::kind::native` in that case to avoid errors. Other platforms don't make this explicit which is a problem in my eyes. Picking the wrong constructor can cause errors when porting and in future development. >> >> - Currently only the PCNs in nmethods are initialized. Therefore we don't even try to make a fast lookup based on PCNs if we know the CodeBlob is, e.g., a RuntimeStub. To achieve this we call the frame cons... > > Richard Reingruber has updated the pull request incrementally with one additional commit since the last revision: > > Review Martin Thanks for the updates! The constructors should still be used with care, but I think your code is at least as good as other platforms (rather better IMHO). ------------- Marked as reviewed by mdoerr (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/17171#pullrequestreview-1815566892 From dlunden at openjdk.org Thu Jan 11 13:51:25 2024 From: dlunden at openjdk.org (Daniel Lundén) Date: Thu, 11 Jan 2024 13:51:25 GMT Subject: RFR: 8322996: BoxLockNode creation fails with assert(reg < CHUNK_SIZE) failed: sanity In-Reply-To: References: Message-ID: <_hY0TrBgPqf3CtprnvKjNi3158j2U-49RGnP44f3p1c=.be4e2ce8-1163-4820-82c9-0a0ff2900dfa@github.com> On Thu, 11 Jan 2024 12:34:28 GMT, Roberto Castañeda Lozano wrote: >> This changeset fixes an issue where deeply nested synchronized statements triggered an assert in C2. >> >> Changes: >> - Bail out on compilation when we create a `BoxLockNode` with a slot index that cannot fit in a `RegMask`. This is similar to how we handle the case when we do not have space to represent arguments in [`opto/matcher.cpp`](https://github.com/openjdk/jdk/blob/58b01dce054c50bcb5a28aad4c1b574acaa90f6d/src/hotspot/share/opto/matcher.cpp#L314-L318) >> - Add a regression test.
>> >> Testing: >> - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/7446998688) >> - tier1, tier2, tier3, tier4, and tier5 on windows-x64, linux-x64, linux-aarch64, macosx-x64, and macosx-aarch64 > > src/hotspot/share/opto/locknode.cpp line 47: > >> 45: init_flags(Flag_rematerialize); >> 46: OptoReg::Name reg = OptoReg::stack2reg(_slot); >> 47: if (!RegMask::can_represent_arg(reg)) { > > I am not very familiar with this code, but would it be possible to use `!RegMask::can_represent(reg)` instead of `!RegMask::can_represent_arg(reg)` here? Or is it necessary to use the latter (which is stricter) for correctness? That is a fair question, and I'm not sure what is the preferred solution. The number of stack slots for a monitor seems to be determined by [`sync_stack_slots`](https://github.com/dlunde/jdk/blob/06d6b4be9750a326f87acf04a3dc717e307d14d5/src/hotspot/share/opto/compile.hpp#L1166-L1167). If I'm not mistaken the value of `sync_stack_slots()` varies between platforms (on my machine it is `int Compile::sync_stack_slots() const { return 2; }`). Therefore, I don't think `can_represent` always works. However, we should maybe have a new function `can_represent_sync_entry` (or similar) instead of `can_represent_arg`. 
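[Editorial sketch, not part of the original message.] The concern above can be illustrated with a small model: if a monitor occupies more than one stack slot, a range check written for a single slot can accept a starting index whose last slot falls outside the mask. The sketch below is only an assumption-laden illustration: `CHUNK_SIZE`, the reserved last bit, and both method names are hypothetical stand-ins, not HotSpot's actual `RegMask` code.

```java
// Hypothetical model of a fixed-size slot mask; none of these names or
// values are taken from HotSpot.
final class SlotRangeSketch {
    // Assumed mask capacity in slots (illustrative value only).
    static final int CHUNK_SIZE = 512;

    // Single-slot check: assumes the last bit of the mask is reserved,
    // so a one-slot entry must have an index below CHUNK_SIZE - 1.
    static boolean canRepresent(int reg) {
        return reg < CHUNK_SIZE - 1;
    }

    // Multi-slot check: additionally leaves headroom for an entry that
    // spans `slots` stack slots (for example, a monitor).
    static boolean canRepresentSyncEntry(int reg, int slots) {
        return reg < CHUNK_SIZE - 1 - slots;
    }
}
```

With these assumed numbers, index 510 passes the single-slot check but is rejected once room for a two-slot entry is required, which is the gap a dedicated check would close.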
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17370#discussion_r1448893468 From tholenstein at openjdk.org Thu Jan 11 16:13:33 2024 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Thu, 11 Jan 2024 16:13:33 GMT Subject: RFR: JDK-8316756: C2 EA fails with "missing memory path" when encountering unsafe_arraycopy stub call Message-ID: Before https://github.com/openjdk/jdk/pull/5259 the graph of the following program looked like this during Escape Analysis: static int test() { MyClass obj = new MyClass(); // Non-escaping to trigger Escape Analysis UNSAFE.copyMemory(null, SRC_BASE, null, DST_BASE, 4); obj.x = 42; return obj.x; } With MemBarCPUOrder: working Setting `RC_NARROW_MEM` flag in `LibraryCallKit::inline_unsafe_copyMemory()` removes the `428 MergeMem` node. Without a `MergeMem` node after `432 StoreB` EA fails in Phase 2 of `ConnectionGraph::split_unique_types(...)` when trying to push the allocation's users on the appropriate worklist - `429 CallLeafNoFP` is not an expected user of `428 StoreB`. Therefore the assert `"EA: missing memory path"` is hit.
Without MemBarCPUOrder and after setting `RC_NARROW_MEM`: failing ### Proposed Fix Dropping the `RC_NARROW_MEM` flag in `LibraryCallKit::inline_unsafe_copyMemory()` causes the introduction of a `MergeMem` between `StoreB` and `CallLeafNoFP`, so the corresponding code in EA doesn't encounter a `CallLeafNoFP` anymore: fixed Testing: Tier1-4 passed ------------- Commit messages: - JDK-8316756: C2 EA fails with "missing memory path" when encountering unsafe_arraycopy stub call Changes: https://git.openjdk.org/jdk/pull/17347/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=17347&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8316756 Stats: 106 lines in 2 files changed: 67 ins; 37 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/17347.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17347/head:pull/17347 PR: https://git.openjdk.org/jdk/pull/17347 From dlunden at openjdk.org Thu Jan 11 16:37:50 2024 From: dlunden at openjdk.org (Daniel Lundén) Date: Thu, 11 Jan 2024 16:37:50 GMT Subject: RFR: 8322996: BoxLockNode creation fails with assert(reg < CHUNK_SIZE) failed: sanity [v2] In-Reply-To: References: Message-ID: > This changeset fixes an issue where deeply nested synchronized statements triggered an assert in C2. > > Changes: > - Bail out on compilation when we create a `BoxLockNode` with a slot index that cannot fit in a `RegMask`. This is similar to how we handle the case when we do not have space to represent arguments in [`opto/matcher.cpp`](https://github.com/openjdk/jdk/blob/58b01dce054c50bcb5a28aad4c1b574acaa90f6d/src/hotspot/share/opto/matcher.cpp#L314-L318) > - Add a regression test.
> > Testing: > - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/7446998688) > - tier1, tier2, tier3, tier4, and tier5 on windows-x64, linux-x64, linux-aarch64, macosx-x64, and macosx-aarch64 Daniel Lundén has updated the pull request incrementally with one additional commit since the last revision: Fixes after comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17370/files - new: https://git.openjdk.org/jdk/pull/17370/files/06d6b4be..735543d0 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17370&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17370&range=00-01 Stats: 10 lines in 3 files changed: 7 ins; 2 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/17370.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17370/head:pull/17370 PR: https://git.openjdk.org/jdk/pull/17370 From dlunden at openjdk.org Thu Jan 11 16:50:52 2024 From: dlunden at openjdk.org (Daniel Lundén) Date: Thu, 11 Jan 2024 16:50:52 GMT Subject: RFR: 8322996: BoxLockNode creation fails with assert(reg < CHUNK_SIZE) failed: sanity [v3] In-Reply-To: References: Message-ID: <66zey4Kt56pbcieRZ0AjBLemzwsAQcHCdDG1nlK-P1c=.14dc02d6-bef8-4239-9f39-02456756b7ea@github.com> > This changeset fixes an issue where deeply nested synchronized statements triggered an assert in C2. > > Changes: > - Bail out on compilation when we create a `BoxLockNode` with a slot index that cannot fit in a `RegMask`. This is similar to how we handle the case when we do not have space to represent arguments in [`opto/matcher.cpp`](https://github.com/openjdk/jdk/blob/58b01dce054c50bcb5a28aad4c1b574acaa90f6d/src/hotspot/share/opto/matcher.cpp#L314-L318) > - Add a regression test.
> > Testing: > - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/7446998688) > - tier1, tier2, tier3, tier4, and tier5 on windows-x64, linux-x64, linux-aarch64, macosx-x64, and macosx-aarch64 Daniel Lundén has updated the pull request incrementally with one additional commit since the last revision: Remove superfluous -TieredCompilation ------------- Changes: - all: https://git.openjdk.org/jdk/pull/17370/files - new: https://git.openjdk.org/jdk/pull/17370/files/735543d0..9ab6e561 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=17370&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=17370&range=01-02 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/17370.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17370/head:pull/17370 PR: https://git.openjdk.org/jdk/pull/17370 From sviswanathan at openjdk.org Thu Jan 11 16:57:34 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Thu, 11 Jan 2024 16:57:34 GMT Subject: Integrated: 8321712: C2: "failed: Multiple uses of register" in C2_MacroAssembler::vminmax_fp In-Reply-To: References: Message-ID: On Tue, 9 Jan 2024 00:01:04 GMT, Sandhya Viswanathan wrote: > The C2_MacroAssembler::vminmax_fp and evminmax_fp were incorrectly checking for the two USE operand registers to be different and thus resulting in assertion failure. > > In x86_64.ad: > > instruct maxF_reg(legRegF dst, legRegF a, legRegF b, legRegF tmp, legRegF atmp, legRegF btmp) %{ > ... > effect(USE a, USE b, TEMP tmp, TEMP atmp, TEMP btmp); > ... > __ vminmax_fp(Op_MaxV, T_FLOAT, $dst$$XMMRegister, $a$$XMMRegister, $b$$XMMRegister, $tmp$$XMMRegister, $atmp$$XMMRegister, $btmp$$XMMRegister, Assembler::AVX_128bit); > %} > > > Changing the assert in vminmax_fp from: > assert_different_registers(a, b, tmp, atmp, btmp); > to: > assert_different_registers(a, tmp, atmp, btmp); > assert_different_registers(b, tmp, atmp, btmp); > fixes the issue.
> > Similar change done in evminmax_fp. > > Please review. > > Best Regards, > Sandhya This pull request has now been integrated. Changeset: e10d1400 Author: Sandhya Viswanathan URL: https://git.openjdk.org/jdk/commit/e10d14004fa25998231ab1d2611b75aea9b5c67d Stats: 16 lines in 2 files changed: 14 ins; 0 del; 2 mod 8321712: C2: "failed: Multiple uses of register" in C2_MacroAssembler::vminmax_fp Co-authored-by: Volodymyr Paprotski Reviewed-by: kvn, thartmann, epeter, jbhateja ------------- PR: https://git.openjdk.org/jdk/pull/17315 From duke at openjdk.org Thu Jan 11 17:47:45 2024 From: duke at openjdk.org (Joshua Cao) Date: Thu, 11 Jan 2024 17:47:45 GMT Subject: RFR: 8323220: Reassociate loop invariants involved in Cmps and Add/Subs Message-ID: // inv1 == (x + inv2) => ( inv1 - inv2 ) == x // inv1 == (x - inv2) => ( inv1 + inv2 ) == x // inv1 == (inv2 - x) => (-inv1 + inv2 ) == x For example, fn(inv1, inv2) while(...) x = foobar() if inv1 == x + inv2 blackhole() We can transform this into fn(inv1, inv2) t = inv1 - inv2 while(...) x = foobar() if t == x blackhole() I have two examples in JDK source code 1. https://github.com/openjdk/jdk/blob/b78896b9aafcb15f453eaed6e154a5461581407b/src/java.base/share/classes/java/lang/invoke/LambdaFormEditor.java#L910. LHS `1` and RHS `pos` are both loop invariant 2. https://github.com/openjdk/jdk/blob/b78896b9aafcb15f453eaed6e154a5461581407b/src/jdk.zipfs/share/classes/jdk/nio/zipfs/ZipFileSystem.java#L1606. In separate transformation, the `>` is transformed into `!=` (not sure why TBH), and both sides have invariants Passes tier1 locally on Linux machine. Passes GHA on my fork. 
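[Editorial sketch, not part of the original message.] The first rewrite rule above (`inv1 == (x + inv2)` becomes `(inv1 - inv2) == x`) can be spelled out at the Java source level. The class and method names below are hypothetical and only mirror the pseudocode in the message; this shows the intended before/after shape, not how C2 performs the transformation on its IR. For `int` equality the two forms agree even when the arithmetic wraps, because Java `int` addition and subtraction are modular.

```java
// Source-level illustration of reassociating loop invariants in an
// equality comparison. Names are illustrative assumptions.
final class ReassociateCmpSketch {
    // Original shape: x + inv2 is recomputed on every iteration.
    static int countOriginal(int[] xs, int inv1, int inv2) {
        int hits = 0;
        for (int x : xs) {
            if (inv1 == x + inv2) {
                hits++;
            }
        }
        return hits;
    }

    // Reassociated shape: the invariant difference is computed once,
    // outside the loop, leaving a plain comparison in the loop body.
    static int countReassociated(int[] xs, int inv1, int inv2) {
        int t = inv1 - inv2;
        int hits = 0;
        for (int x : xs) {
            if (t == x) {
                hits++;
            }
        }
        return hits;
    }
}
```

Both methods return the same count for any inputs. Ordering comparisons such as `<` do not share this wrap-around property in general, so the same rewrite is not automatically safe for them.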
------------- Commit messages: - 8323220: Reassociate loop invariants involved in Cmps and Add/Subs Changes: https://git.openjdk.org/jdk/pull/17375/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=17375&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8323220 Stats: 270 lines in 3 files changed: 258 ins; 0 del; 12 mod Patch: https://git.openjdk.org/jdk/pull/17375.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17375/head:pull/17375 PR: https://git.openjdk.org/jdk/pull/17375 From kvn at openjdk.org Thu Jan 11 18:02:22 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 11 Jan 2024 18:02:22 GMT Subject: RFR: JDK-8316756: C2 EA fails with "missing memory path" when encountering unsafe_arraycopy stub call In-Reply-To: References: Message-ID: On Wed, 10 Jan 2024 13:35:17 GMT, Tobias Holenstein wrote: > Before https://github.com/openjdk/jdk/pull/5259 the graph of the following program looked like this during Escape Analysis: > > > static int test() { > MyClass obj = new MyClass(); // Non-escaping to trigger Escape Analysis > UNSAFE.copyMemory(null, SRC_BASE, null, DST_BASE, 4); > obj.x = 42; > return obj.x; > } > > With MemBarCPUOrder: > working > > Setting `RC_NARROW_MEM` flag in `LibraryCallKit::inline_unsafe_copyMemory()` removes the `428 MergeMem` node. Without a `MergeMem` node after `432 StoreB` EA fails in Phase 2 of `ConnectionGraph::split_unique_types(...)` when trying to push the allocation's users on the appropriate worklist - `429 CallLeafNoFP` is not an expected user of `428 StoreB`. Therefore the assert `"EA: missing memory path"` is hit.
> Without MemBarCPUOrder and after setting `RC_NARROW_MEM`: > failing > > > ### Proposed Fix > Dropping the `RC_NARROW_MEM` flag in `LibraryCallKit::inline_unsafe_copyMemory()` causes the introduction of a `MergeMem` between `StoreB` and `CallLeafNoFP`, so the corresponding code in EA doesn't encounter a `CallLeafNoFP` anymore: > fixed > > Testing: Tier1-4 passed This should be reviewed by @iwanowww. I have too many questions about this. Based on code and test `UNSAFE.copyMemory()` copies "native" memory which should not affect anything. [#5259](https://github.com/openjdk/jdk/pull/5259) sets RC_NARROW_MEM exactly for that as I understand. Flag setting (StoreB nodes) in `JavaThread::_doing_unsafe_access` is also not affected but it is a volatile field and these stores should stay where they are. They can't go up or down. And it should be accomplished by some kind of barriers between `StoreB` and `unsafe_Arraycopy` call. But the call's memory edge should not point to `StoreB` - it is incorrect since it does not affect that field in this case. Call's memory should point to root memory in this case. Operations on fields of the new `MyClass` object could be moved around and the object can be eliminated since it does not escape.
------------- PR Review: https://git.openjdk.org/jdk/pull/17347#pullrequestreview-1816177520 From sviswanathan at openjdk.org Thu Jan 11 18:12:32 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Thu, 11 Jan 2024 18:12:32 GMT Subject: [jdk22] RFR: 8321712: C2: "failed: Multiple uses of register" in C2_MacroAssembler::vminmax_fp Message-ID: 8321712: C2: "failed: Multiple uses of register" in C2_MacroAssembler::vminmax_fp ------------- Commit messages: - Backport e10d14004fa25998231ab1d2611b75aea9b5c67d Changes: https://git.openjdk.org/jdk22/pull/62/files Webrev: https://webrevs.openjdk.org/?repo=jdk22&pr=62&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8321712 Stats: 16 lines in 2 files changed: 14 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk22/pull/62.diff Fetch: git fetch https://git.openjdk.org/jdk22.git pull/62/head:pull/62 PR: https://git.openjdk.org/jdk22/pull/62 From kvn at openjdk.org Thu Jan 11 18:18:23 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 11 Jan 2024 18:18:23 GMT Subject: RFR: JDK-8316756: C2 EA fails with "missing memory path" when encountering unsafe_arraycopy stub call In-Reply-To: References: Message-ID: On Wed, 10 Jan 2024 13:35:17 GMT, Tobias Holenstein wrote: > Before https://github.com/openjdk/jdk/pull/5259 the graph of the following program looked like this during Escape Analysis: > > > static int test() { > MyClass obj = new MyClass(); // Non-escaping to trigger Escape Analysis > UNSAFE.copyMemory(null, SRC_BASE, null, DST_BASE, 4); > obj.x = 42; > return obj.x; > } > > With MemBarCPUOrder: > working > > Setting `RC_NARROW_MEM` flag in `LibraryCallKit::inline_unsafe_copyMemory()` removes the `428 MergeMem` node. Without a `MergeMem` node after `432 StoreB` EA fails in Phase 2 of `ConnectionGraph::split_unique_types(...)` when trying to push the allocation's users on the appropriate worklist - `429 CallLeafNoFP` is not an expected user of `428 StoreB`.
Therefore the assert `"EA: missing memory path"` is hit. > Without MemBarCPUOrder and after setting `RC_NARROW_MEM`: > failing > > > ### Proposed Fix > Dropping the `RC_NARROW_MEM` flag in `LibraryCallKit::inline_unsafe_copyMemory()` causes the introduction of a `MergeMem` between `StoreB` and `CallLeafNoFP`, so the corresponding code in EA doesn't encounter a `CallLeafNoFP` anymore: > fixed > > Testing: Tier1-4 passed `JavaThread::_doing_unsafe_access` field is checked by runtime when we SEGV to find that it happens in `unsafe_arraycopy` code. Again, `unsafe_arraycopy` does not affect this field. ------------- PR Comment: https://git.openjdk.org/jdk/pull/17347#issuecomment-1887710723 From kvn at openjdk.org Thu Jan 11 18:22:22 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 11 Jan 2024 18:22:22 GMT Subject: RFR: JDK-8316756: C2 EA fails with "missing memory path" when encountering unsafe_arraycopy stub call In-Reply-To: References: Message-ID: On Wed, 10 Jan 2024 13:35:17 GMT, Tobias Holenstein wrote: > Before https://github.com/openjdk/jdk/pull/5259 the graph of the following program looked like this during Escape Analysis: > > > static int test() { > MyClass obj = new MyClass(); // Non-escaping to trigger Escape Analysis > UNSAFE.copyMemory(null, SRC_BASE, null, DST_BASE, 4); > obj.x = 42; > return obj.x; > } > > With MemBarCPUOrder: > working > > Setting `RC_NARROW_MEM` flag in `LibraryCallKit::inline_unsafe_copyMemory()` removes the `428 MergeMem` node. Without a `MergeMem` node after `432 StoreB` EA fails in Phase 2 of `ConnectionGraph::split_unique_types(...)` when trying to push the allocation's users on the appropriate worklist - `429 CallLeafNoFP` is not an expected user of `428 StoreB`. Therefore the assert `"EA: missing memory path"` is hit.
> Without MemBarCPUOrder and after setting `RC_NARROW_MEM`: > failing > > > ### Proposed Fix > Dropping the `RC_NARROW_MEM` flag in `LibraryCallKit::inline_unsafe_copyMemory()` causes the introduction of a `MergeMem` between `StoreB` and `CallLeafNoFP`, so the corresponding code in EA doesn't encounter a `CallLeafNoFP` anymore: > fixed > > Testing: Tier1-4 passed The result of `unsafe_arraycopy` should not affect memory too. ------------- PR Comment: https://git.openjdk.org/jdk/pull/17347#issuecomment-1887716294 From duke at openjdk.org Thu Jan 11 19:17:24 2024 From: duke at openjdk.org (Joshua Cao) Date: Thu, 11 Jan 2024 19:17:24 GMT Subject: RFR: 8319850: PrintInlining should print which methods are late inlines [v2] In-Reply-To: <_0ZJL7u55Fcg1yID2yjH4DHPkrgKTKeekpYtWG1YsAI=.e9caec05-a88d-4123-832d-6699a1990e49@github.com> References: <42h7t16pyeYV2jszIztjGu0JE2ZZWnnJCiyRd2s2oLg=.fffb35a5-e208-442c-9157-ec5d3fcaa31d@github.com> <_0ZJL7u55Fcg1yID2yjH4DHPkrgKTKeekpYtWG1YsAI=.e9caec05-a88d-4123-832d-6699a1990e49@github.com> Message-ID: On Mon, 8 Jan 2024 14:55:19 GMT, Roland Westrelin wrote: > When `InlineTree::ok_to_inline()` is called, some diagnostic message is recorded for the call site. Do I understand right that with this patch, if the call is inlined late, then that message is dropped and replaced by a new "late inline.." message? If that's the case, isn't it the case that sometimes the `InlineTree::ok_to_inline()` has some useful information that's lost when late inlining happens? Yeah I think you're right. It should not matter for the string/methodhandle/vector/boxing late inlines. But we can lose information for generic late inlines, for example a hot method that could not get inlined earlier due to lack of budget. I'll look into it.
------------- PR Comment: https://git.openjdk.org/jdk/pull/16595#issuecomment-1887800021 From kvn at openjdk.org Thu Jan 11 19:29:27 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 11 Jan 2024 19:29:27 GMT Subject: [jdk22] RFR: 8321712: C2: "failed: Multiple uses of register" in C2_MacroAssembler::vminmax_fp In-Reply-To: References: Message-ID: On Thu, 11 Jan 2024 18:07:43 GMT, Sandhya Viswanathan wrote: > Clean backport of JDK-8321712 to JDK 22. Original [PR](https://github.com/openjdk/jdk/pull/17315) was reviewed by Vladimir Kozlov, Tobias Hartmann, Emanuel Peter, and Jatin Bhateja. Please review and approve. > > Best Regards, > Sandhya Good. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk22/pull/62#pullrequestreview-1816417137 From duke at openjdk.org Thu Jan 11 20:10:59 2024 From: duke at openjdk.org (Joshua Cao) Date: Thu, 11 Jan 2024 20:10:59 GMT Subject: RFR: 8323220: Reassociate loop invariants involved in Cmps and Add/Subs In-Reply-To: References: Message-ID: On Thu, 11 Jan 2024 17:41:53 GMT, Joshua Cao wrote: > // inv1 == (x + inv2) => ( inv1 - inv2 ) == x > // inv1 == (x - inv2) => ( inv1 + inv2 ) == x > // inv1 == (inv2 - x) => (-inv1 + inv2 ) == x > > > For example, > > > fn(inv1, inv2) > while(...) > x = foobar() > if inv1 == x + inv2 > blackhole() > > > We can transform this into > > > fn(inv1, inv2) > t = inv1 - inv2 > while(...) > x = foobar() > if t == x > blackhole() > > > Here is an example: https://github.com/openjdk/jdk/blob/b78896b9aafcb15f453eaed6e154a5461581407b/src/java.base/share/classes/java/lang/invoke/LambdaFormEditor.java#L910. LHS `1` and RHS `pos` are both loop invariant > > Passes tier1 locally on Linux machine. Passes GHA on my fork. 
test/hotspot/jtreg/compiler/loopopts/InvariantCodeMotionReassociateCmp.java line 191: > 189: @Arguments({Argument.NUMBER_42, Argument.NUMBER_42}) > 190: @IR(failOn = {IRNode.SUB_I}) > 191: public void leDontReassociate(int inv1, int inv2) { I added DontReassociate tests for `le`, `gt`, and `ge`. For `lt`, C2 generates a second `SUB_I` as part of other transformations. IR matching for ADD/SUB is pretty hard in general. They are commonly created as part of other transformations. Any suggestions on how I can test this better are appreciated. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17375#discussion_r1449334691 From kvn at openjdk.org Thu Jan 11 20:11:00 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 11 Jan 2024 20:11:00 GMT Subject: RFR: 8322996: BoxLockNode creation fails with assert(reg < CHUNK_SIZE) failed: sanity [v3] In-Reply-To: <66zey4Kt56pbcieRZ0AjBLemzwsAQcHCdDG1nlK-P1c=.14dc02d6-bef8-4239-9f39-02456756b7ea@github.com> References: <66zey4Kt56pbcieRZ0AjBLemzwsAQcHCdDG1nlK-P1c=.14dc02d6-bef8-4239-9f39-02456756b7ea@github.com> Message-ID: On Thu, 11 Jan 2024 16:50:52 GMT, Daniel Lundén wrote: >> This changeset fixes an issue where deeply nested synchronized statements triggered an assert in C2. >> >> Changes: >> - Bail out on compilation when we create a `BoxLockNode` with a slot index that cannot fit in a `RegMask`. This is similar to how we handle the case when we do not have space to represent arguments in [`opto/matcher.cpp`](https://github.com/openjdk/jdk/blob/58b01dce054c50bcb5a28aad4c1b574acaa90f6d/src/hotspot/share/opto/matcher.cpp#L314-L318) >> - Add a regression test.
>> >> Testing: >> - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/7446998688) >> - tier1, tier2, tier3, tier4, and tier5 on windows-x64, linux-x64, linux-aarch64, macosx-x64, and macosx-aarch64 > > Daniel Lundén has updated the pull request incrementally with one additional commit since the last revision: > > Remove superfluous -TieredCompilation src/hotspot/share/opto/locknode.hpp line 66: > 64: return (int)reg < (int)(RegMask::CHUNK_SIZE - 1 - Compile::current()->sync_stack_slots()); > 65: } > 66: I think it should be in `regmask.hpp` together with other `can_represent_*` methods. Then you don't need part of the comment about those methods. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17370#discussion_r1449340301 From vlivanov at openjdk.org Thu Jan 11 22:51:18 2024 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Thu, 11 Jan 2024 22:51:18 GMT Subject: RFR: JDK-8316756: C2 EA fails with "missing memory path" when encountering unsafe_arraycopy stub call In-Reply-To: References: Message-ID: On Wed, 10 Jan 2024 13:35:17 GMT, Tobias Holenstein wrote: > Before https://github.com/openjdk/jdk/pull/5259 the graph of the following program looked like this during Escape Analysis: > > > static int test() { > MyClass obj = new MyClass(); // Non-escaping to trigger Escape Analysis > UNSAFE.copyMemory(null, SRC_BASE, null, DST_BASE, 4); > obj.x = 42; > return obj.x; > } > > With MemBarCPUOrder: > working > > Setting `RC_NARROW_MEM` flag in `LibraryCallKit::inline_unsafe_copyMemory()` removes the `428 MergeMem` node. Without a `MergeMem` node after `432 StoreB` EA fails in Phase 2 of `ConnectionGraph::split_unique_types(...)` when trying to push the allocation's users on the appropriate worklist - `429 CallLeafNoFP` is not an expected user of `428 StoreB`. Therefore the assert `"EA: missing memory path"` is hit.
> Without MemBarCPUOrder and after setting `RC_NARROW_MEM`: > failing > > > ### Proposed Fix > Dropping the `RC_NARROW_MEM` flag in `LibraryCallKit::inline_unsafe_copyMemory()` causes the introduction of a `MergeMem` between `StoreB` and `CallLeafNoFP`, so the corresponding code in EA doesn't encounter a `CallLeafNoFP` anymore: > fixed > > Testing: Tier1-4 passed Yes, proposed fix undoes some optimizations JDK-8269119 introduced: `RC_NARROW_MEM` was introduced to optimally represent memory effects of native-to-native memory copy. The whole off-heap memory state is tracked by a single raw memory slice, so it qualifies to be treated as operating on narrow memory. The IR shape as it is now looks fine. JVM models non-heap memory operations as raw accesses, but they are serialized on a single memory alias (raw memory). IMO the bug is in EA code which doesn't properly handle calls with narrow memory effects. ------------- PR Comment: https://git.openjdk.org/jdk/pull/17347#issuecomment-1888090765 From vlivanov at openjdk.org Thu Jan 11 23:34:18 2024 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Thu, 11 Jan 2024 23:34:18 GMT Subject: RFR: JDK-8316756: C2 EA fails with "missing memory path" when encountering unsafe_arraycopy stub call In-Reply-To: References: Message-ID: On Thu, 11 Jan 2024 17:59:59 GMT, Vladimir Kozlov wrote: > Flag setting (StoreB nodes) in JavaThread::_doing_unsafe_access is also not affected but it is volatile field and these stores should be staying where they are. They can't go up or down. In a private discussion with @tobiasholenstein, I proposed to move those stores into the corresponding stub. It would complicate the implementation a bit (platform-specific vs cross-platform implementation), but simplify things on IR level.
------------- PR Comment: https://git.openjdk.org/jdk/pull/17347#issuecomment-1888131977 From kvn at openjdk.org Thu Jan 11 23:43:18 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 11 Jan 2024 23:43:18 GMT Subject: RFR: JDK-8316756: C2 EA fails with "missing memory path" when encountering unsafe_arraycopy stub call In-Reply-To: References: Message-ID: On Thu, 11 Jan 2024 22:48:44 GMT, Vladimir Ivanov wrote: > all non-heap memory operations as raw accesses. Right. StoreB is also a RAW access. My previous comment is incorrect - StoreB can be the memory input for `unsafe_arraycopy` and as such it can preserve the order of execution. I agree with moving the Stores into the stub. C2 doesn't need to know about them. ------------- PR Comment: https://git.openjdk.org/jdk/pull/17347#issuecomment-1888138970 From fyang at openjdk.org Fri Jan 12 01:25:22 2024 From: fyang at openjdk.org (Fei Yang) Date: Fri, 12 Jan 2024 01:25:22 GMT Subject: RFR: 8323584: AArch64: Unnecessary ResourceMark in NativeCall::set_destination_mt_safe In-Reply-To: References: Message-ID: <0bmTMOxYMebuuZyS-Yxg31x_nxEESCvTCsI2twowt9w=.e7b46583-d9eb-4fa7-bc51-903dfe51e7c0@github.com> On Thu, 11 Jan 2024 12:17:21 GMT, Aleksey Shipilev wrote: > Was looking at `ICBuffer` cleaning paths that run at safepoint. In `NativeCall::set_destination_mt_safe`, there is a `ResourceMark` that does not seem to have any purpose: no code in its scope uses resource allocation. There is adjacent dead code too. Looks like a development/debugging leftover. > > Additional testing: > - [ ] Linux x86_64 AArch64 server fastdebug, `tier{1,2,3}` LGTM. ------------- Marked as reviewed by fyang (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/17372#pullrequestreview-1817120885 From thartmann at openjdk.org Fri Jan 12 08:44:19 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Fri, 12 Jan 2024 08:44:19 GMT Subject: [jdk22] RFR: 8321712: C2: "failed: Multiple uses of register" in C2_MacroAssembler::vminmax_fp In-Reply-To: References: Message-ID: <98CU5sgN8QGWIFJn5NYdo7c29T9WSstoqV17aed-sFU=.d6645440-5429-4a74-a30c-6a5b888fb648@github.com> On Thu, 11 Jan 2024 18:07:43 GMT, Sandhya Viswanathan wrote: > Clean backport of JDK-8321712 to JDK 22. Original [PR](https://github.com/openjdk/jdk/pull/17315) was reviewed by Vladimir Kozlov, Tobias Hartmann, Emanuel Peter, and Jatin Bhateja. Please review and approve. > > Best Regards, > Sandhya Looks good to me too. ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk22/pull/62#pullrequestreview-1817668195 From aph at openjdk.org Fri Jan 12 08:52:23 2024 From: aph at openjdk.org (Andrew Haley) Date: Fri, 12 Jan 2024 08:52:23 GMT Subject: RFR: 8323584: AArch64: Unnecessary ResourceMark in NativeCall::set_destination_mt_safe In-Reply-To: References: Message-ID: On Thu, 11 Jan 2024 12:17:21 GMT, Aleksey Shipilev wrote: > Was looking at `ICBuffer` cleaning paths that run at safepoint. In `NativeCall::set_destination_mt_safe`, there is a `ResourceMark` that does not seem to have any purpose: no code in its scope uses resource allocation. There is adjacent dead code too. Looks like a development/debugging leftover. > > Additional testing: > - [ ] Linux x86_64 AArch64 server fastdebug, `tier{1,2,3}` There are more of those. While they don't have much of an effect on runtime, it might be worth a cleanup pass to remove them in one go. ------------- Marked as reviewed by aph (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/17372#pullrequestreview-1817683863 From cslucas at openjdk.org Fri Jan 12 10:47:32 2024 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Fri, 12 Jan 2024 10:47:32 GMT Subject: Integrated: JDK-8323190: Segfault during deoptimization of C2-compiled code In-Reply-To: References: Message-ID: <_eqVXpIpEfN4cEN6TlHb8bgzICxvJga5i1Fz4c6AP9U=.1a779876-80d7-4ee8-a8eb-01ead5e03053@github.com> On Wed, 10 Jan 2024 01:22:37 GMT, Cesar Soares Lucas wrote: > Currently, if `ReduceAllocationMerges` reduces an allocation merge that is used as a monitor C2 will SIGFAULT in `Process_OopMap_Node` because it's missing code to handle that case. This patch fixes C2 to properly handle reduced allocation merges that are used as monitors. > > Tested with Linux x86_64 hotspot_all. This pull request has now been integrated. Changeset: ed182223 Author: Cesar Soares Lucas Committer: Tobias Hartmann URL: https://git.openjdk.org/jdk/commit/ed182223655feee5356d42a94dd74950e9595724 Stats: 92 lines in 2 files changed: 89 ins; 0 del; 3 mod 8323190: Segfault during deoptimization of C2-compiled code Reviewed-by: thartmann, chagedorn ------------- PR: https://git.openjdk.org/jdk/pull/17333 From thartmann at openjdk.org Fri Jan 12 10:52:03 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Fri, 12 Jan 2024 10:52:03 GMT Subject: [jdk22] RFR: 8323190: Segfault during deoptimization of C2-compiled code Message-ID: Hi all, This pull request contains a backport of commit [ed182223](https://github.com/openjdk/jdk/commit/ed182223655feee5356d42a94dd74950e9595724) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository. The commit being backported was authored by Cesar Soares Lucas on 12 Jan 2024 and was reviewed by Tobias Hartmann and Christian Hagedorn. Thanks! 
------------- Commit messages: - Backport ed182223655feee5356d42a94dd74950e9595724 Changes: https://git.openjdk.org/jdk22/pull/67/files Webrev: https://webrevs.openjdk.org/?repo=jdk22&pr=67&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8323190 Stats: 92 lines in 2 files changed: 89 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk22/pull/67.diff Fetch: git fetch https://git.openjdk.org/jdk22.git pull/67/head:pull/67 PR: https://git.openjdk.org/jdk22/pull/67 From rcastanedalo at openjdk.org Fri Jan 12 10:56:14 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Fri, 12 Jan 2024 10:56:14 GMT Subject: RFR: 8322692: ZGC: avoid over-unrolling due to hidden barrier size Message-ID: This changeset refines the C2 loop unrolling heuristic by including an estimation of the final size of (Generational) ZGC barriers in the loop size computation. These are not exposed in C2's intermediate representation and thus currently ignored by the heuristic, which can lead to over-unrolling. #### Testing - tier1-5, stress test, fuzzing (windows-x64, linux-x64, linux-aarch64, macosx-x64, macosx-aarch64). - tier6-9 (windows-x64, linux-x64, linux-aarch64, macosx-x64, macosx-aarch64, ZGC-specific tests only). #### Performance and code size evaluation - DaCapo, SPECjvm2008, SPECjbb2015 (linux-x64 with `-XX:+UseZGC -XX:+ZGenerational`). The changeset reduces slightly the size of the C2-generated code (around 0.5% fewer bytes per compiled bytecode for the `fop` and `luindex` DaCapo benchmarks) and has no overall significant performance effect.
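To make the idea concrete, here is a minimal Java sketch of a loop-size estimate that also counts code hidden from the IR. It is not the HotSpot implementation (the real heuristic is C++ code in C2): the class `UnrollEstimateSketch`, the `Node` type, the per-node sizes, and the `shouldUnroll` limit check are all hypothetical and chosen only for illustration.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only; none of these names exist in HotSpot.
public class UnrollEstimateSketch {

    // A loop-body node with its visible IR size and an assumed size for
    // barrier code that would only be expanded late, after the heuristic ran.
    static final class Node {
        final String name;
        final int irSize;
        final int hiddenBarrierSize;

        Node(String name, int irSize, int hiddenBarrierSize) {
            this.name = name;
            this.irSize = irSize;
            this.hiddenBarrierSize = hiddenBarrierSize;
        }
    }

    // Sums the visible sizes and, if requested, the hidden barrier sizes too.
    static int estimatedBodySize(List<Node> body, boolean countHiddenBarriers) {
        int size = 0;
        for (Node n : body) {
            size += n.irSize;
            if (countHiddenBarriers) {
                size += n.hiddenBarrierSize;
            }
        }
        return size;
    }

    // Allows unrolling only while the unrolled body stays within the limit.
    static boolean shouldUnroll(int bodySize, int unrollFactor, int sizeLimit) {
        return (long) bodySize * unrollFactor <= sizeLimit;
    }

    // A made-up loop body: one load that carries a barrier, two plain nodes.
    static List<Node> exampleLoop() {
        return Arrays.asList(
            new Node("LoadP", 1, 10),   // hypothetical barrier size
            new Node("AddI", 1, 0),
            new Node("StoreI", 1, 0));
    }

    public static void main(String[] args) {
        int plain = estimatedBodySize(exampleLoop(), false);
        int withBarriers = estimatedBodySize(exampleLoop(), true);
        System.out.println("IR-only size: " + plain
            + ", unroll x8: " + shouldUnroll(plain, 8, 60));
        System.out.println("With barriers: " + withBarriers
            + ", unroll x8: " + shouldUnroll(withBarriers, 8, 60));
    }
}
```

With these made-up numbers the IR-only estimate permits an 8x unroll while the barrier-aware estimate rejects it, which is the over-unrolling effect the changeset describes.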
------------- Commit messages: - Take into account late barrier size estimation in C2 unrolling heuristics Changes: https://git.openjdk.org/jdk/pull/17367/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=17367&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8322692 Stats: 110 lines in 7 files changed: 109 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/17367.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17367/head:pull/17367 PR: https://git.openjdk.org/jdk/pull/17367 From qamai at openjdk.org Fri Jan 12 10:56:17 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Fri, 12 Jan 2024 10:56:17 GMT Subject: RFR: 8322692: ZGC: avoid over-unrolling due to hidden barrier size In-Reply-To: References: Message-ID: On Thu, 11 Jan 2024 08:47:41 GMT, Roberto Castañeda Lozano wrote: > This changeset refines the C2 loop unrolling heuristic by including an estimation of the final size of (Generational) ZGC barriers in the loop size computation. These are not exposed in C2's intermediate representation and thus currently ignored by the heuristic, which can lead to over-unrolling. > > #### Testing > > - tier1-5, stress test, fuzzing (windows-x64, linux-x64, linux-aarch64, macosx-x64, macosx-aarch64). > - tier6-9 (windows-x64, linux-x64, linux-aarch64, macosx-x64, macosx-aarch64, ZGC-specific tests only). > > #### Performance and code size evaluation > > - DaCapo, SPECjvm2008, SPECjbb2015 (linux-x64 with `-XX:+UseZGC -XX:+ZGenerational`). The changeset reduces slightly the size of the C2-generated code (around 0.5% fewer bytes per compiled bytecode for the `fop` and `luindex` DaCapo benchmarks) and has no overall significant performance effect. src/hotspot/share/gc/z/c2/zBarrierSetC2.cpp line 334: > 332: // seven more nodes (CallLeaf, control Proj, memory Proj, data Proj, Region, > 333: // memory Phi, data Phi). > 334: return uncolor_or_color_size + 12; I thought the runtime call does not lie inside the loop.
Is it necessary to take them into account, too? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17367#discussion_r1448602659 From rcastanedalo at openjdk.org Fri Jan 12 10:56:19 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Fri, 12 Jan 2024 10:56:19 GMT Subject: RFR: 8322692: ZGC: avoid over-unrolling due to hidden barrier size In-Reply-To: References: Message-ID: On Thu, 11 Jan 2024 10:06:54 GMT, Quan Anh Mai wrote: >> This changeset refines the C2 loop unrolling heuristic by including an estimation of the final size of (Generational) ZGC barriers in the loop size computation. These are not exposed in C2's intermediate representation and thus currently ignored by the heuristic, which can lead to over-unrolling. >> >> #### Testing >> >> - tier1-5, stress test, fuzzing (windows-x64, linux-x64, linux-aarch64, macosx-x64, macosx-aarch64). >> - tier6-9 (windows-x64, linux-x64, linux-aarch64, macosx-x64, macosx-aarch64, ZGC-specific tests only). >> >> #### Performance and code size evaluation >> >> - DaCapo, SPECjvm2008, SPECjbb2015 (linux-x64 with `-XX:+UseZGC -XX:+ZGenerational`). The changeset reduces slightly the size of the C2-generated code (around 0.5% fewer bytes per compiled bytecode for the `fop` and `luindex` DaCapo benchmarks) and has no overall significant performance effect. > > src/hotspot/share/gc/z/c2/zBarrierSetC2.cpp line 334: > >> 332: // seven more nodes (CallLeaf, control Proj, memory Proj, data Proj, Region, >> 333: // memory Phi, data Phi). >> 334: return uncolor_or_color_size + 12; > > I thought the runtime call does not lie inside the loop. Is it necessary to take them into account, too? Conceptually, the runtime call belongs to the loop, even if it is laid out in the cold section of the method. The current unrolling heuristic counts all basic blocks in the loop, regardless of whether they are hot or cold and how they are arranged in the final code.
This changeset does the same for consistency. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17367#discussion_r1448818759 From chagedorn at openjdk.org Fri Jan 12 11:02:38 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 12 Jan 2024 11:02:38 GMT Subject: RFR: 8323101: C2: assert(n->in(0) == nullptr) failed: divisions with zero check should already have bailed out earlier in split-if Message-ID: <7RQTgz8ZyoAIk6gpdZgDCFsqIBA_WK6IJ6OkHFLg_ts=.da8bb3a9-674c-4b2e-863f-0b2afcb37f34@github.com> The assertion added by [JDK-8299259](https://bugs.openjdk.org/browse/JDK-8299259) is wrong. I originally assumed that, at this point, there are no pinned `Div/Mod` nodes left that we might want to split through a phi, because we bail out earlier here: https://github.com/openjdk/jdk/blob/3e19bf88d5b51fe10c183f930b99bce961a368c1/src/hotspot/share/opto/loopopts.cpp#L1137-L1140 The assumption was that a `Div/Mod` node with a control input to the zero-check `IfProj` could only have a `Phi` input with a `Region` that is further up in the graph. This is normally true. However, in `testIntDiv()`, we split an `If` with `do_split_if()` and need to empty the basic block. We split the store `iFld = sub` up. This includes the `StoreI` as well as the `AddI` which also has the `Region` as current `ctrl` (i.e. returned with `get_ctrl()`). The `AddI` also has the `DivI` as output which, however, is not pushed up since it's not part of the same basic block. We end up with the following graph after the completion of `do_split_if()`: ![image](https://github.com/openjdk/jdk/assets/17833009/b76293e1-a593-4319-b026-254be3c098fc) The `DivI` now has `252 Phi` as input which merges the split `AddI` nodes. `246 Region` of the `252 Phi` is further down than the control input `83 IfTrue` of the `DivI`. This is rather unusual and thus was missed when the assert was added in JDK-8299259.
When finally processing the `DivI` node in the DFS walk of Split-If, we fail with the assertion. The fix is straightforward: turn this assert into a simple bailout. We should not split a `Div/Mod` node that is pinned (i.e. has a zero check). While working on this bug, I also tried to trigger the assert with `DivL/ModL` nodes. However, this did not work because `split_up()` does not split the `Add` node up. The reason is that we set late ctrl to early ctrl for the `DivL/ModL` node (and thus also set the same late ctrl for the `Add` node) while we do not do that for `DivI/ModI` nodes. It seems that we fail to treat `DivL/ModL` nodes as unpinned here, which would allow us to set a later ctrl: https://github.com/openjdk/jdk/blob/82a63a03c0155288e8e43b9f766c8be70be50b6a/src/hotspot/share/opto/loopnode.cpp#L6091-L6101 I've done some digging and found that `DivL/ModL` nodes were added after this switch statement. So, I assume we simply forgot to also treat them as unpinned here. It's not wrong, but I think it is just an unnecessary limitation. I filed [JDK-8323652](https://bugs.openjdk.org/browse/JDK-8323652) to fix this. Either way, I left the code for the long cases in, even though they do not trigger the assert. They should once JDK-8323652 is fixed.
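For readers less familiar with why such a `Div/Mod` node is pinned: Java integer division must throw `ArithmeticException` when the divisor is zero, so the compiled division may only execute after a zero check has passed. The sketch below shows just that language-level behaviour. It is not the `testIntDiv()` test from the PR, which is not reproduced here, and the class and method names are made up for illustration.

```java
// Illustrative sketch only; not the reproducer from the PR.
public class DivZeroCheckSketch {

    // The JVM must check b against zero before dividing; in C2 terms the
    // division depends on the outcome of that check.
    static int div(int a, int b) {
        return a / b;
    }

    // Reports whether dividing a by b raises ArithmeticException.
    static boolean divThrows(int a, int b) {
        try {
            div(a, b);
            return false;
        } catch (ArithmeticException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("7 / 2 = " + div(7, 2));
        System.out.println("7 / 0 throws: " + divThrows(7, 0));
    }
}
```

Because the exception is observable, a division cannot be freely hoisted above its zero check, which is why splitting a pinned `Div/Mod` through a phi needs the bailout described above.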
Thanks, Christian ------------- Commit messages: - 8323101: C2: assert(n->in(0) == nullptr) failed: divisions with zero check should already have bailed out earlier in split-if Changes: https://git.openjdk.org/jdk/pull/17394/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=17394&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8323101 Stats: 215 lines in 2 files changed: 214 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/17394.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17394/head:pull/17394 PR: https://git.openjdk.org/jdk/pull/17394 From epeter at openjdk.org Fri Jan 12 11:45:20 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 12 Jan 2024 11:45:20 GMT Subject: [jdk22] RFR: 8323190: Segfault during deoptimization of C2-compiled code In-Reply-To: References: Message-ID: On Fri, 12 Jan 2024 10:45:11 GMT, Tobias Hartmann wrote: > Hi all, > > This pull request contains a backport of commit [ed182223](https://github.com/openjdk/jdk/commit/ed182223655feee5356d42a94dd74950e9595724) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository. > > The commit being backported was authored by Cesar Soares Lucas on 12 Jan 2024 and was reviewed by Tobias Hartmann and Christian Hagedorn. > > Thanks! LGTM ------------- Marked as reviewed by epeter (Reviewer). PR Review: https://git.openjdk.org/jdk22/pull/67#pullrequestreview-1818008006 From duke at openjdk.org Fri Jan 12 11:49:19 2024 From: duke at openjdk.org (Yude Lin) Date: Fri, 12 Jan 2024 11:49:19 GMT Subject: RFR: 8323122: AArch64: Increase itable stub size estimate In-Reply-To: References: Message-ID: On Wed, 10 Jan 2024 02:03:08 GMT, Yude Lin wrote: > Since [JDK-8307352](https://bugs.openjdk.org/browse/JDK-8307352), itable stub size has grown by 20 bytes on linux-aarch64. In particular, the "slop-counted" code increases from 100->120 bytes, where the current estimate is 124 bytes. I haven't found a case where it exceeds the estimate. 
For now this size is stable across the few linux-aarch64 configurations I ran with. It doesn't vary (for example) by different klass decoding schemes. But I think the idea of the estimate is that we can never know. I propose we increase the estimate to be safe. > > Passed hotspot/jtreg/:tier1 Can I get a review on this small patch please : ) ------------- PR Comment: https://git.openjdk.org/jdk/pull/17336#issuecomment-1888950779 From thartmann at openjdk.org Fri Jan 12 11:49:22 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Fri, 12 Jan 2024 11:49:22 GMT Subject: [jdk22] RFR: 8323190: Segfault during deoptimization of C2-compiled code In-Reply-To: References: Message-ID: <7KK3ymYR5c1IVbrS7xj8QjkFiDO7MdWST5mETMckagg=.eb430ce6-5528-4905-b3ce-6bfba08e552c@github.com> On Fri, 12 Jan 2024 10:45:11 GMT, Tobias Hartmann wrote: > Hi all, > > This pull request contains a backport of commit [ed182223](https://github.com/openjdk/jdk/commit/ed182223655feee5356d42a94dd74950e9595724) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository. > > The commit being backported was authored by Cesar Soares Lucas on 12 Jan 2024 and was reviewed by Tobias Hartmann and Christian Hagedorn. > > Thanks! Thanks, Emanuel! ------------- PR Comment: https://git.openjdk.org/jdk22/pull/67#issuecomment-1888951022 From shade at openjdk.org Fri Jan 12 11:59:19 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Fri, 12 Jan 2024 11:59:19 GMT Subject: RFR: 8323584: AArch64: Unnecessary ResourceMark in NativeCall::set_destination_mt_safe In-Reply-To: References: Message-ID: On Fri, 12 Jan 2024 08:49:27 GMT, Andrew Haley wrote: > There are more of those. While they don't have much of an effect on runtime, it might be worth a cleanup pass to remove them in one go. Right. I would prefer to remove `ResourceMark`-s one by one, though, because one needs to go through all callees in the scope to actually verify the absence of resource allocations. 
I would not trust testing to find missing RMs reliably, especially when obscure uses are hiding on some branches. ------------- PR Comment: https://git.openjdk.org/jdk/pull/17372#issuecomment-1888982565 From thartmann at openjdk.org Fri Jan 12 12:45:24 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Fri, 12 Jan 2024 12:45:24 GMT Subject: [jdk22] Integrated: 8323190: Segfault during deoptimization of C2-compiled code In-Reply-To: References: Message-ID: On Fri, 12 Jan 2024 10:45:11 GMT, Tobias Hartmann wrote: > Hi all, > > This pull request contains a backport of commit [ed182223](https://github.com/openjdk/jdk/commit/ed182223655feee5356d42a94dd74950e9595724) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository. > > The commit being backported was authored by Cesar Soares Lucas on 12 Jan 2024 and was reviewed by Tobias Hartmann and Christian Hagedorn. > > Thanks! This pull request has now been integrated. Changeset: d115295d Author: Tobias Hartmann URL: https://git.openjdk.org/jdk22/commit/d115295df8ccfec8670878ab5a7dc8d8661025d9 Stats: 92 lines in 2 files changed: 89 ins; 0 del; 3 mod 8323190: Segfault during deoptimization of C2-compiled code Reviewed-by: epeter Backport-of: ed182223655feee5356d42a94dd74950e9595724 ------------- PR: https://git.openjdk.org/jdk22/pull/67 From rcastanedalo at openjdk.org Fri Jan 12 14:04:22 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Fri, 12 Jan 2024 14:04:22 GMT Subject: RFR: 8322692: ZGC: avoid over-unrolling due to hidden barrier size In-Reply-To: References: Message-ID: On Thu, 11 Jan 2024 08:47:41 GMT, Roberto Castañeda Lozano wrote: > This changeset refines the C2 loop unrolling heuristic by including an estimation of the final size of (Generational) ZGC barriers in the loop size computation. These are not exposed in C2's intermediate representation and thus currently ignored by the heuristic, which can lead to over-unrolling.
> > #### Testing > > - tier1-5, stress test, fuzzing (windows-x64, linux-x64, linux-aarch64, macosx-x64, macosx-aarch64). > - tier6-9 (windows-x64, linux-x64, linux-aarch64, macosx-x64, macosx-aarch64, ZGC-specific tests only). > > #### Performance and code size evaluation > > - DaCapo, SPECjvm2008, SPECjbb2015 (linux-x64 with `-XX:+UseZGC -XX:+ZGenerational`). The changeset reduces slightly the size of the C2-generated code (around 0.5% fewer bytes per compiled bytecode for the `fop` and `luindex` DaCapo benchmarks) and has no overall significant performance effect. Switching back to draft mode to address some offline comments from Erik Österlund. ------------- PR Comment: https://git.openjdk.org/jdk/pull/17367#issuecomment-1889288008 From dnsimon at openjdk.org Fri Jan 12 14:34:49 2024 From: dnsimon at openjdk.org (Doug Simon) Date: Fri, 12 Jan 2024 14:34:49 GMT Subject: RFR: 8323616: [JVMCI] TestInvalidJVMCIOption.java fails intermittently with NPE Message-ID: <8KAT1E7MkSLCl1JrnEBjDNvgmaCn_ryBbGVMtY0uiTQ=.e6bc008c-5338-4361-9042-20f1ba5e65cd@github.com> This PR removes an assertion from `TestInvalidJVMCIOption` that can fail intermittently due to a race between JIT initialization and runtime class initialization. The only thing the test should guarantee is that an invalid option is detected and results in a VM exit.
------------- Commit messages: - remove racy (and unnecessary) assertion in TestInvalidJVMCIOption Changes: https://git.openjdk.org/jdk/pull/17397/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=17397&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8323616 Stats: 1 line in 1 file changed: 0 ins; 1 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/17397.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17397/head:pull/17397 PR: https://git.openjdk.org/jdk/pull/17397 From thartmann at openjdk.org Fri Jan 12 14:38:25 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Fri, 12 Jan 2024 14:38:25 GMT Subject: RFR: 8323616: [JVMCI] TestInvalidJVMCIOption.java fails intermittently with NPE In-Reply-To: <8KAT1E7MkSLCl1JrnEBjDNvgmaCn_ryBbGVMtY0uiTQ=.e6bc008c-5338-4361-9042-20f1ba5e65cd@github.com> References: <8KAT1E7MkSLCl1JrnEBjDNvgmaCn_ryBbGVMtY0uiTQ=.e6bc008c-5338-4361-9042-20f1ba5e65cd@github.com> Message-ID: On Fri, 12 Jan 2024 14:25:29 GMT, Doug Simon wrote: > This PR removes an assertion from `TestInvalidJVMCIOption` that can fail intermittently due to a race between JIT initialization and runtime class initialization. > The only thing the test should guarantee is that an invalid option is detected and results in a VM exit. Looks good and trivial. ------------- Marked as reviewed by thartmann (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/17397#pullrequestreview-1818397891 From thartmann at openjdk.org Fri Jan 12 14:45:29 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Fri, 12 Jan 2024 14:45:29 GMT Subject: RFR: 8322077: Add Ideal transformation: (~a) | (~b) => ~(a & b) [v13] In-Reply-To: References: Message-ID: On Wed, 10 Jan 2024 16:57:47 GMT, Zhiqiang Zang wrote: >> Hello, >> >> (~a) | (~b) => ~(a & b) is a widely seen pattern, for example it is implemented for LLVM [here](https://github.com/llvm/llvm-project/blob/397f1ce9efb4eea1ee10fe4833f733b8c7abd878/llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp#L1617C28-L1617C28); however it is missing in current implementation of hotspot. This pull request adds this transformation and associated tests. >> >> Thanks. > > Zhiqiang Zang has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 34 commits: > > - Merge branch 'master' of https://github.com/openjdk/jdk into ornode-PNewDeMorganLawOrToAnd > - update copyright dates. > - Merge branch 'andnode-PNewDeMorganLawAndToOr' into ornode-PNewDeMorganLawOrToAnd > - Revert "move the two helper functions to member functions of the node class." > > This reverts commit 7a962d69ac687c0476e54cd004037f2ebb0800ba. > - Revert "update copyright dates." > > This reverts commit 3665de2f3d38b4fce3ba968ec6247dd2576ecc32. > - Revert "move make_not back to AddNode because it cannot compile for architecture arm, s390x, ppc64le." > > This reverts commit 4ee8b089b6f119b1692dcd2c512d249d92734697. > - Revert "adapt changes from the dependent pr." > > This reverts commit 0c8d10776fe55a28ea9f60160b4f6f57aabad5ab. > - Revert "adapt to new changes from the dependant pr." > > This reverts commit b21e242b306d9cf01e1ba84ebba47ef5b471d98e. > - adapt to new changes from the dependant pr. > - Merge branch 'andnode-PNewDeMorganLawAndToOr' into ornode-PNewDeMorganLawOrToAnd > - ... 
and 24 more: https://git.openjdk.org/jdk/compare/b86c3b7a...f908668b All tests passed, this is good to go. ------------- PR Comment: https://git.openjdk.org/jdk/pull/16334#issuecomment-1889393797 From thartmann at openjdk.org Fri Jan 12 14:46:21 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Fri, 12 Jan 2024 14:46:21 GMT Subject: RFR: 8320237 C2: late inlining of method handle invoke causes duplicate lines in PrintInlining output [v2] In-Reply-To: References: Message-ID: On Wed, 10 Jan 2024 16:37:44 GMT, Kangcheng Xu wrote: >> This PR resolves [JDK-8320237](https://bugs.openjdk.org/browse/JDK-8320237) >> >> The original behavior produces both a failure and success message upon late inlining which is confusing. The patch removes the failure message if inlining was successful. Huge thanks to @rwestrel for reporting and working out a solution. >> >> Unit test `test/hotspot/jtreg/compiler/inlining/TestDuplicatedLateInliningOutput.java` is added and passing. > > Kangcheng Xu has updated the pull request incrementally with one additional commit since the last revision: > > fix VM crashes Thanks, I'll re-run testing. Could you please explain what the problem was? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/17147#issuecomment-1889397470 From dlunden at openjdk.org Fri Jan 12 15:09:20 2024 From: dlunden at openjdk.org (Daniel Lundén) Date: Fri, 12 Jan 2024 15:09:20 GMT Subject: RFR: 8322996: BoxLockNode creation fails with assert(reg < CHUNK_SIZE) failed: sanity [v3] In-Reply-To: References: <66zey4Kt56pbcieRZ0AjBLemzwsAQcHCdDG1nlK-P1c=.14dc02d6-bef8-4239-9f39-02456756b7ea@github.com> Message-ID: On Thu, 11 Jan 2024 20:05:26 GMT, Vladimir Kozlov wrote: >> Daniel Lundén has updated the pull request incrementally with one additional commit since the last revision: >> >> Remove superfluous -TieredCompilation > > src/hotspot/share/opto/locknode.hpp line 66: > >> 64: return (int)reg < (int)(RegMask::CHUNK_SIZE - 1 - Compile::current()->sync_stack_slots()); >> 65: } >> 66: > > I think it should be in `regmask.hpp` together with other `can_represent_*` methods. Then you don't need part of the comment about those methods. Thanks @vnkozlov. Do you know if we can directly use `can_represent` instead, and not take `sync_stack_slots()` into account? The field `_inmask` in `BoxLockNode` seems to only specify a single register (one bit in the mask). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/17370#discussion_r1450578513 From ddong at openjdk.org Fri Jan 12 15:24:30 2024 From: ddong at openjdk.org (Denghui Dong) Date: Fri, 12 Jan 2024 15:24:30 GMT Subject: RFR: 8322735: C2: minor improvements of bubble sort used in SuperWord::packset_sort [v3] In-Reply-To: References: Message-ID: On Tue, 9 Jan 2024 05:26:43 GMT, Denghui Dong wrote: >> A minor improvement could be made for bubble sort in SuperWord::packset_sort to reduce the comparison count in bad cases. >> >> See https://en.wikipedia.org/wiki/Bubble_sort > > Denghui Dong has updated the pull request incrementally with one additional commit since the last revision: > > update Thanks for the review.
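As a reference for the kind of refinement being discussed, here is a minimal Java sketch of the textbook bubble sort optimization described in the linked Wikipedia article: remember the position of the last swap and stop each later pass there. This is an assumption about the spirit of the change, not the actual diff, which touches C++ code in `SuperWord::packset_sort` and is not shown in this thread. The class name and the comparison counter are hypothetical.

```java
// Illustrative sketch only; not the code from SuperWord::packset_sort.
public class BubbleSortSketch {

    // Sorts a in ascending order and returns the number of comparisons made.
    static int sort(int[] a) {
        int comparisons = 0;
        int n = a.length;
        while (n > 1) {
            int lastSwap = 0;
            for (int i = 1; i < n; i++) {
                comparisons++;
                if (a[i - 1] > a[i]) {
                    int tmp = a[i - 1];
                    a[i - 1] = a[i];
                    a[i] = tmp;
                    lastSwap = i;
                }
            }
            // Everything from lastSwap onwards is already in its final place,
            // so the next pass only needs to look at the prefix before it.
            n = lastSwap;
        }
        return comparisons;
    }

    public static void main(String[] args) {
        int[] sorted = {1, 2, 3, 4};
        int[] reversed = {4, 3, 2, 1};
        System.out.println("comparisons (sorted input): " + sort(sorted));
        System.out.println("comparisons (reversed input): " + sort(reversed));
    }
}
```

On already sorted input this variant finishes after a single pass, and on other inputs each pass shrinks the range that still needs comparing.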
------------- PR Comment: https://git.openjdk.org/jdk/pull/17190#issuecomment-1889494658 From ddong at openjdk.org Fri Jan 12 15:24:32 2024 From: ddong at openjdk.org (Denghui Dong) Date: Fri, 12 Jan 2024 15:24:32 GMT Subject: Integrated: 8322735: C2: minor improvements of bubble sort used in SuperWord::packset_sort In-Reply-To: References: Message-ID: On Mon, 25 Dec 2023 15:34:55 GMT, Denghui Dong wrote: > A minor improvement could be made for bubble sort in SuperWord::packset_sort to reduce the comparison count in bad cases. > > See https://en.wikipedia.org/wiki/Bubble_sort This pull request has now been integrated. Changeset: c5e72450 Author: Denghui Dong URL: https://git.openjdk.org/jdk/commit/c5e72450966ad50d57a8d22e9d634bfcb319aee9 Stats: 7 lines in 1 file changed: 0 ins; 2 del; 5 mod 8322735: C2: minor improvements of bubble sort used in SuperWord::packset_sort Reviewed-by: epeter, kvn ------------- PR: https://git.openjdk.org/jdk/pull/17190 From roland at openjdk.org Fri Jan 12 15:31:23 2024 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 12 Jan 2024 15:31:23 GMT Subject: RFR: 8320237 C2: late inlining of method handle invoke causes duplicate lines in PrintInlining output [v2] In-Reply-To: References: Message-ID: On Fri, 12 Jan 2024 14:43:39 GMT, Tobias Hartmann wrote: > Could you please explain what the problem was? I did the fix. For virtual and method handle inline calls, late inlining happens because we couldn't resolve the call to a single target before and the late inlining logic goes through the inlining heuristics again to find one. As a result, a new inlining message is produced. For other type of calls, the call is known to successfully inline but was delayed due to lack of nodes. When late inlining succeeds then, it's because the graph has shrunk enough but there's no new inlining message. The other thing is that the print inlining logic is conditioned on PrintInlining and PrintIntrinsics. 
When PrintIntrinsics only is true, for method handle and virtual calls, during late inlining no new inlining message is produced. ------------- PR Comment: https://git.openjdk.org/jdk/pull/17147#issuecomment-1889510391 From tholenstein at openjdk.org Fri Jan 12 15:47:21 2024 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Fri, 12 Jan 2024 15:47:21 GMT Subject: RFR: JDK-8316756: C2 EA fails with "missing memory path" when encountering unsafe_arraycopy stub call In-Reply-To: References: Message-ID: On Thu, 11 Jan 2024 23:31:15 GMT, Vladimir Ivanov wrote: > > Flag setting (StoreB nodes) in JavaThread::_doing_unsafe_access is also not affected but it is volatile field and these stores should be staying where they are. They can't go up or down. > > In a private discussion with @tobiasholenstein, I proposed to move those stores into the corresponding stub. It would complicate the implementation a bit (platform-specific vs cross-platform implementation), but simplify things on IR level. So you think we should go for that solution instead of this fix? ------------- PR Comment: https://git.openjdk.org/jdk/pull/17347#issuecomment-1889534817 From epeter at openjdk.org Fri Jan 12 16:20:28 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 12 Jan 2024 16:20:28 GMT Subject: RFR: 8323641: Test compiler/loopopts/superword/TestAlignVectorFuzzer.java timed out Message-ID: It seems that allowing `90%` of the timeout-time was cutting it too close. Some individual tests can take more time occasionally, one even took more than `80 sec` (very rare). Why do these tests take so long? - Sometimes compilation can take quite a bit of time. `-XX:LoopUnrollLimit=250` already increases the number of nodes allowed in a loop-body, and with unrolling this increases significantly. SuperWord then has to work through all these nodes, and has a lot of quadratic and higher complexity loops. 
- The loop bodies are quite large (hand unrolled), and often lead to partial vectorization, with lots of scalar memory ops. This produces quite sub-optimal code, with somewhat too many instructions in the loop body. Combined with lots of array copying, this probably takes quite a hit on caches. I will investigate SuperWord compilation time in the future and lower the runtime complexity if necessary/possible; it is part of my autovectorization plans. I have now lowered the allowance to `40%`, which is hopefully small enough to avoid timeouts while still allowing sufficiently many runs to get decent test coverage. ------------- Commit messages: - reduce allowance even more, and fix typos - 8323641 Changes: https://git.openjdk.org/jdk/pull/17389/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=17389&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8323641 Stats: 7 lines in 1 file changed: 2 ins; 0 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/17389.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17389/head:pull/17389 PR: https://git.openjdk.org/jdk/pull/17389 From chagedorn at openjdk.org Fri Jan 12 16:33:21 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 12 Jan 2024 16:33:21 GMT Subject: RFR: 8323641: Test compiler/loopopts/superword/TestAlignVectorFuzzer.java timed out In-Reply-To: References: Message-ID: <5OAn4KSIJ8gtoBjvYcx8M71lSHs_zzImW0MMIhX0ZOE=.6fa3b25d-943d-4068-8edc-f9e77e616f83@github.com> On Fri, 12 Jan 2024 08:22:54 GMT, Emanuel Peter wrote: > It seems that allowing `90%` of the timeout-time was cutting it too close. Some individual tests can take more time occasionally, one even took more than `80 sec` (very rare). > > Why do these tests take so long? > - Sometimes compilation can take quite a bit of time. `-XX:LoopUnrollLimit=250` already increases the number of nodes allowed in a loop-body, and with unrolling this increases significantly.
SuperWord then has to work through all these nodes, and has a lot of quadratic and higher complexity loops. > - The loop bodies are quite large (hand unrolled), and often lead to partial vectorization, with lots of scalar memory ops. This produces quite sub-optimal code, with somewhat too many instructions in the loop body. Combined with lots of array copying, this probably takes quite a hit on caches. > > I will investigate SuperWord compilation time in the future and lower the runtime complexity if necessary/possible; it is part of my autovectorization plans. > > I have now lowered the allowance to `40%`, which is hopefully small enough to avoid timeouts while still allowing sufficiently many runs to get decent test coverage. That looks reasonable. test/hotspot/jtreg/compiler/loopopts/superword/TestAlignVectorFuzzer.java line 565: > 563: 20_000; > 564: System.out.println("Time Allowance: " + test_time_allowance_diff); > 565: long test_time_allowance = System.currentTimeMillis() + test_time_allowance_diff; Nit: You should use CamelCase for local Java variables. ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/17389#pullrequestreview-1818631559 PR Review Comment: https://git.openjdk.org/jdk/pull/17389#discussion_r1450676162 From duke at openjdk.org Fri Jan 12 16:47:26 2024 From: duke at openjdk.org (Zhiqiang Zang) Date: Fri, 12 Jan 2024 16:47:26 GMT Subject: RFR: 8322077: Add Ideal transformation: (~a) | (~b) => ~(a & b) [v13] In-Reply-To: <_b5ODSBa8YhAf5i7hafehvmw40MAdi4z5yF0EicXBUE=.bb562b91-9751-4dab-a487-0e9961b1f199@github.com> References: <_b5ODSBa8YhAf5i7hafehvmw40MAdi4z5yF0EicXBUE=.bb562b91-9751-4dab-a487-0e9961b1f199@github.com> Message-ID: On Thu, 11 Jan 2024 08:23:02 GMT, Emanuel Peter wrote: >> Zhiqiang Zang has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains 34 commits: >> >> - Merge branch 'master' of https://github.com/openjdk/jdk into ornode-PNewDeMorganLawOrToAnd >> - update copyright dates. >> - Merge branch 'andnode-PNewDeMorganLawAndToOr' into ornode-PNewDeMorganLawOrToAnd >> - Revert "move the two helper functions to member functions of the node class." >> >> This reverts commit 7a962d69ac687c0476e54cd004037f2ebb0800ba. >> - Revert "update copyright dates." >> >> This reverts commit 3665de2f3d38b4fce3ba968ec6247dd2576ecc32. >> - Revert "move make_not back to AddNode because it cannot compile for architecture arm, s390x, ppc64le." >> >> This reverts commit 4ee8b089b6f119b1692dcd2c512d249d92734697. >> - Revert "adapt changes from the dependent pr." >> >> This reverts commit 0c8d10776fe55a28ea9f60160b4f6f57aabad5ab. >> - Revert "adapt to new changes from the dependant pr." >> >> This reverts commit b21e242b306d9cf01e1ba84ebba47ef5b471d98e. >> - adapt to new changes from the dependant pr. >> - Merge branch 'andnode-PNewDeMorganLawAndToOr' into ornode-PNewDeMorganLawOrToAnd >> - ... and 24 more: https://git.openjdk.org/jdk/compare/b86c3b7a...f908668b > > Testing running for commit 34... @eme64 @TobiHartmann Thanks for testing! Can you sponsor it when you get a chance thank you! ------------- PR Comment: https://git.openjdk.org/jdk/pull/16334#issuecomment-1889628035 From sviswanathan at openjdk.org Fri Jan 12 17:02:25 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Fri, 12 Jan 2024 17:02:25 GMT Subject: [jdk22] RFR: 8321712: C2: "failed: Multiple uses of register" in C2_MacroAssembler::vminmax_fp In-Reply-To: References: Message-ID: On Thu, 11 Jan 2024 19:26:57 GMT, Vladimir Kozlov wrote: >> Clean backport of JDK-8321712 to JDK 22. Original [PR](https://github.com/openjdk/jdk/pull/17315) was reviewed by Vladimir Kozlov, Tobias Hartmann, Emanuel Peter, and Jatin Bhateja. Please review and approve. >> >> Best Regards, >> Sandhya > > Good. 
Thanks a lot @vnkozlov @TobiHartmann.

-------------

PR Comment: https://git.openjdk.org/jdk22/pull/62#issuecomment-1889646764

From sviswanathan at openjdk.org Fri Jan 12 17:02:27 2024
From: sviswanathan at openjdk.org (Sandhya Viswanathan)
Date: Fri, 12 Jan 2024 17:02:27 GMT
Subject: [jdk22] Integrated: 8321712: C2: "failed: Multiple uses of register" in C2_MacroAssembler::vminmax_fp
In-Reply-To:
References:
Message-ID:

On Thu, 11 Jan 2024 18:07:43 GMT, Sandhya Viswanathan wrote:

> Clean backport of JDK-8321712 to JDK 22. Original [PR](https://github.com/openjdk/jdk/pull/17315) was reviewed by Vladimir Kozlov, Tobias Hartmann, Emanuel Peter, and Jatin Bhateja. Please review and approve.
>
> Best Regards,
> Sandhya

This pull request has now been integrated.

Changeset: b0920c24
Author: Sandhya Viswanathan
URL: https://git.openjdk.org/jdk22/commit/b0920c24cd83d85a846a60fe2d784a48dd8c9b52
Stats: 16 lines in 2 files changed: 14 ins; 0 del; 2 mod

8321712: C2: "failed: Multiple uses of register" in C2_MacroAssembler::vminmax_fp

Reviewed-by: kvn, thartmann
Backport-of: e10d14004fa25998231ab1d2611b75aea9b5c67d

-------------

PR: https://git.openjdk.org/jdk22/pull/62

From kvn at openjdk.org Fri Jan 12 17:24:20 2024
From: kvn at openjdk.org (Vladimir Kozlov)
Date: Fri, 12 Jan 2024 17:24:20 GMT
Subject: RFR: JDK-8316756: C2 EA fails with "missing memory path" when encountering unsafe_arraycopy stub call
In-Reply-To:
References:
Message-ID:

On Fri, 12 Jan 2024 15:44:21 GMT, Tobias Holenstein wrote:

> > > Flag setting (StoreB nodes) in JavaThread::_doing_unsafe_access is also not affected, but it is a volatile field and these stores should stay where they are. They can't go up or down.
> >
> >
> > In a private discussion with @tobiasholenstein, I proposed to move those stores into the corresponding stub. It would complicate the implementation a bit (platform-specific vs cross-platform implementation), but simplify things on IR level.
>
> So you think we should go for that solution instead of this fix?

Yes. You may still need to fix EA to recognize RAW memory for `unsafe_arraycopy`.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/17347#issuecomment-1889681521

From kvn at openjdk.org Fri Jan 12 19:01:19 2024
From: kvn at openjdk.org (Vladimir Kozlov)
Date: Fri, 12 Jan 2024 19:01:19 GMT
Subject: RFR: JDK-8316756: C2 EA fails with "missing memory path" when encountering unsafe_arraycopy stub call
In-Reply-To:
References:
Message-ID:

On Wed, 10 Jan 2024 13:35:17 GMT, Tobias Holenstein wrote:

> Before https://github.com/openjdk/jdk/pull/5259 the graph of the following program looked like this during Escape Analysis:
>
>
> static int test() {
>     MyClass obj = new MyClass(); // Non-escaping to trigger Escape Analysis
>     UNSAFE.copyMemory(null, SRC_BASE, null, DST_BASE, 4);
>     obj.x = 42;
>     return obj.x;
> }
>
> With MemBarCPUOrder:
> [image: working]
>
> Setting `RC_NARROW_MEM` flag in `LibraryCallKit::inline_unsafe_copyMemory()` removes the `428 MergeMem` node. Without a `MergeMem` node after `432 StoreB` EA fails in Phase 2 of `ConnectionGraph::split_unique_types(...)` when trying to push the allocation's users on the appropriate worklist - `429 CallLeafNoFP` is not an expec