From xliu at openjdk.org Sat Jul 1 07:28:01 2023
From: xliu at openjdk.org (Xin Liu)
Date: Sat, 1 Jul 2023 07:28:01 GMT
Subject: Integrated: 8311125: Remove unused parameter 'phase' in AllocateNode::Ideal_allocation
In-Reply-To:
References:
Message-ID:

On Thu, 29 Jun 2023 22:46:57 GMT, Xin Liu wrote:

> There are 2 overloaded AllocateNode::Ideal_allocation() in graphkit.cpp.
> One of them never uses 'phase' in the pattern-matching effort.
>
> A C++ compiler may emit a warning for the unused parameter. We will need to take care of it if we treat
> warnings as errors. It also unnecessarily couples CheckCastPP with PhaseValue. In some places, we have to
> gain the instance for it.
>
> I would like to remove 'phase' as a parameter. This is a pure clean-up. The other Ideal_allocation() does
> use PhaseValue* phase to get constant nodes, so leave it alone.

This pull request has now been integrated.

Changeset: d2e11593
Author: Xin Liu
URL: https://git.openjdk.org/jdk/commit/d2e11593006dc32fb8ebbaf12488b8758c8a19ee
Stats: 43 lines in 12 files changed: 0 ins; 0 del; 43 mod

8311125: Remove unused parameter 'phase' in AllocateNode::Ideal_allocation

Reviewed-by: chagedorn, kvn

-------------

PR: https://git.openjdk.org/jdk/pull/14719

From duke at openjdk.org Sat Jul 1 07:59:08 2023
From: duke at openjdk.org (Swati Sharma)
Date: Sat, 1 Jul 2023 07:59:08 GMT
Subject: RFR: 8311178: JMH tests don't scale well when sharing output buffers
Message-ID:

The benchmark files below have scaling issues due to cache contention, which leads to poor scaling when run on multiple threads.
The patch sets the scope from benchmark level to thread level to fix the issue: - org/openjdk/bench/java/io/DataOutputStreamTest.java - org/openjdk/bench/java/lang/ArrayCopyObject.java - org/openjdk/bench/java/lang/ArrayFiddle.java - org/openjdk/bench/java/time/format/DateTimeFormatterBench.java - org/openjdk/bench/jdk/incubator/vector/IndexInRangeBenchmark.java - org/openjdk/bench/jdk/incubator/vector/MemorySegmentVectorAccess.java - org/openjdk/bench/jdk/incubator/vector/StoreMaskedBenchmark.java - org/openjdk/bench/jdk/incubator/vector/StoreMaskedIOOBEBenchmark.java - org/openjdk/bench/vm/compiler/ArrayFill.java - org/openjdk/bench/vm/compiler/IndexVector.java Also removing the static scope for variables in org/openjdk/bench/jdk/incubator/vector/VectorFPtoIntCastOperations.java for better scaling. Please review and share your feedback. Thanks, Swati ------------- Commit messages: - 8311178: JMH tests don't scale well when sharing output buffers Changes: https://git.openjdk.org/jdk/pull/14746/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=14746&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8311178 Stats: 17 lines in 12 files changed: 0 ins; 0 del; 17 mod Patch: https://git.openjdk.org/jdk/pull/14746.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14746/head:pull/14746 PR: https://git.openjdk.org/jdk/pull/14746 From stsypanov at openjdk.org Sat Jul 1 19:07:55 2023 From: stsypanov at openjdk.org (Sergey Tsypanov) Date: Sat, 1 Jul 2023 19:07:55 GMT Subject: RFR: 8311178: JMH tests don't scale well when sharing output buffers In-Reply-To: References: Message-ID: <-bJ0RJ4qNzcZoXn6WmqPd7VbtBEqoi-iMmUy3D-MNJo=.13545140-598a-4691-a13c-9c559756c918@github.com> On Sat, 1 Jul 2023 07:53:17 GMT, Swati Sharma wrote: > The below benchmark files have scaling issues due to cache contention and leads to poor scaling when run on multiple threads. 
The patch sets the scope from benchmark level to thread level to fix the issue: > - org/openjdk/bench/java/io/DataOutputStreamTest.java > - org/openjdk/bench/java/lang/ArrayCopyObject.java > - org/openjdk/bench/java/lang/ArrayFiddle.java > - org/openjdk/bench/java/time/format/DateTimeFormatterBench.java > - org/openjdk/bench/jdk/incubator/vector/IndexInRangeBenchmark.java > - org/openjdk/bench/jdk/incubator/vector/MemorySegmentVectorAccess.java > - org/openjdk/bench/jdk/incubator/vector/StoreMaskedBenchmark.java > - org/openjdk/bench/jdk/incubator/vector/StoreMaskedIOOBEBenchmark.java > - org/openjdk/bench/vm/compiler/ArrayFill.java > - org/openjdk/bench/vm/compiler/IndexVector.java > > Also removing the static scope for variables in org/openjdk/bench/jdk/incubator/vector/VectorFPtoIntCastOperations.java for better scaling. > > Please review and share your feedback. > > Thanks, > Swati test/micro/org/openjdk/bench/java/lang/ArrayCopyObject.java line 64: > 62: } > 63: > 64: @State(Scope.Thread) Are you sure it makes sense as in `main()` method we set `fork(1)` so there's only one thread running the benchmark? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14746#discussion_r1248932401 From dqu at openjdk.org Sun Jul 2 09:25:12 2023 From: dqu at openjdk.org (Daohan Qu) Date: Sun, 2 Jul 2023 09:25:12 GMT Subject: RFR: 8310331: JitTester: Exclude java.lang.Math.random Message-ID: <7mcEVlD5zO5ds7WO9kLAlsahYblH-kkIYh5ld0AL_tg=.23e79233-3e96-4bd8-924f-75394313a184@github.com> Test cases generated by JitTester might contain calls to `java.lang.Math.random()`. We could not set a seed for this random call. (In its implementation, `java.lang.Math` create `java.util.Random` instance statically (using the constructor `Random()`) and there is no way to set a seed for it.) Such tests might show up different variable values/printouts on each execution (Please refer to [the issue description](https://bugs.openjdk.org/browse/JDK-8310331)). 
Since it is meaningless to generate test cases with "unreproducible" results and JitTester has been able to assign random values to the generated variables (this seed could be set). Maybe we could just exclude the use of `java.lang.Math.random()` in JitTester's test case generation. ------------- Commit messages: - Exclude Math.random() from test case generated by jittester Changes: https://git.openjdk.org/jdk/pull/14748/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=14748&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8310331 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/14748.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14748/head:pull/14748 PR: https://git.openjdk.org/jdk/pull/14748 From dqu at openjdk.org Sun Jul 2 09:40:54 2023 From: dqu at openjdk.org (Daohan Qu) Date: Sun, 2 Jul 2023 09:40:54 GMT Subject: RFR: 8310331: JitTester: Exclude java.lang.Math.random In-Reply-To: <7mcEVlD5zO5ds7WO9kLAlsahYblH-kkIYh5ld0AL_tg=.23e79233-3e96-4bd8-924f-75394313a184@github.com> References: <7mcEVlD5zO5ds7WO9kLAlsahYblH-kkIYh5ld0AL_tg=.23e79233-3e96-4bd8-924f-75394313a184@github.com> Message-ID: On Sun, 2 Jul 2023 09:16:55 GMT, Daohan Qu wrote: > Test cases generated by JitTester might contain calls to `java.lang.Math.random()`. We could not set a seed for this random call. (In its implementation, `java.lang.Math` create `java.util.Random` instance statically (using the constructor `Random()`) and there is no way to set a seed for it.) > > Such tests might show up different variable values/printouts on each execution (Please refer to [the issue description](https://bugs.openjdk.org/browse/JDK-8310331)). > > Since it is meaningless to generate test cases with "unreproducible" results and JitTester has been able to assign random values to the generated variables (this seed could be set). Maybe we could just exclude the use of `java.lang.Math.random()` in JitTester's test case generation. 
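The reproducibility concern quoted above can be illustrated with a minimal, self-contained sketch. This is not JitTester code; the class and method names are hypothetical and chosen only for illustration. It assumes nothing beyond `java.util.Random`: a generator constructed with an explicit seed replays the same sequence on every run, whereas `Math.random()` offers no way to supply a seed.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical illustration only; not part of JitTester.
public class SeededRandomDemo {
    // Draw `count` doubles from a Random created with the given seed.
    public static double[] draw(long seed, int count) {
        Random random = new Random(seed);
        double[] values = new double[count];
        for (int i = 0; i < count; i++) {
            values[i] = random.nextDouble();
        }
        return values;
    }

    // True if two independent generators with the same seed agree exactly.
    public static boolean reproducible(long seed, int count) {
        return Arrays.equals(draw(seed, count), draw(seed, count));
    }

    public static void main(String[] args) {
        System.out.println("seeded Random reproducible: " + reproducible(42L, 8));
        // Math.random() exposes no seed, so this value typically differs between runs.
        System.out.println("Math.random(): " + Math.random());
    }
}
```

A generated test that draws its values from a seeded generator like this can be replayed exactly, which is the property the proposed exclusion is meant to preserve.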
It seems to be a leftover from [JDK-8239500: jittester shouldn't use non-deterministic System methods](https://bugs.openjdk.org/browse/JDK-8239500). ------------- PR Comment: https://git.openjdk.org/jdk/pull/14748#issuecomment-1616534623 From dholmes at openjdk.org Sun Jul 2 23:49:00 2023 From: dholmes at openjdk.org (David Holmes) Date: Sun, 2 Jul 2023 23:49:00 GMT Subject: RFR: 8309209: C2 failed "assert(_stack_guard_state == stack_guard_reserved_disabled) failed: inconsistent state" [v7] In-Reply-To: References: Message-ID: > This appears to be the same kind of issue as reported in [JDK-8146697](https://bugs.openjdk.org/browse/JDK-8146697) way back in Java 9, which was only "fixed" on x86. The current failure was seen on Aarch64. It seems prudent to apply the same changes to all the other platforms. I've done Aarch64, and took a guess at RISC-V but do not know PPC or S390, so I am looking to others to provide the appropriate equivalent code changes there. > > Testing so far is Aarch64 only: > - Tiers 1-3 > - 50x the closed stackoverflow test that failed previously > - 25x vmTestbase/nsk/stress/stack/* > > As these failures are so rare, passing tests don't really tell us much. This is more an attempt at additional robustness. > > Thanks. 
David Holmes has updated the pull request incrementally with one additional commit since the last revision: Made comment and assertion consistent on all platforms ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14669/files - new: https://git.openjdk.org/jdk/pull/14669/files/2070db9d..c9d2808e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14669&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14669&range=05-06 Stats: 4 lines in 3 files changed: 4 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/14669.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14669/head:pull/14669 PR: https://git.openjdk.org/jdk/pull/14669 From dholmes at openjdk.org Sun Jul 2 23:49:01 2023 From: dholmes at openjdk.org (David Holmes) Date: Sun, 2 Jul 2023 23:49:01 GMT Subject: RFR: 8309209: C2 failed "assert(_stack_guard_state == stack_guard_reserved_disabled) failed: inconsistent state" [v6] In-Reply-To: References: Message-ID: On Fri, 30 Jun 2023 09:41:43 GMT, Martin Doerr wrote: > Thanks for adding the assertions. I think they would be good to have for all platforms. I have tested it on PPC64, too. Thanks @TheRealMDoerr . I have made the comment and assertion the same on all platforms. ------------- PR Comment: https://git.openjdk.org/jdk/pull/14669#issuecomment-1616999341 From dholmes at openjdk.org Mon Jul 3 04:27:08 2023 From: dholmes at openjdk.org (David Holmes) Date: Mon, 3 Jul 2023 04:27:08 GMT Subject: RFR: 8309209: C2 failed "assert(_stack_guard_state == stack_guard_reserved_disabled) failed: inconsistent state" [v7] In-Reply-To: References: Message-ID: On Sun, 2 Jul 2023 23:49:00 GMT, David Holmes wrote: >> This appears to be the same kind of issue as reported in [JDK-8146697](https://bugs.openjdk.org/browse/JDK-8146697) way back in Java 9, which was only "fixed" on x86. The current failure was seen on Aarch64. It seems prudent to apply the same changes to all the other platforms. 
I've done Aarch64, and took a guess at RISC-V but do not know PPC or S390, so I am looking to others to provide the appropriate equivalent code changes there. >> >> Testing so far is Aarch64 only: >> - Tiers 1-3 >> - 50x the closed stackoverflow test that failed previously >> - 25x vmTestbase/nsk/stress/stack/* >> >> As these failures are so rare, passing tests don't really tell us much. This is more an attempt at additional robustness. >> >> Thanks. > > David Holmes has updated the pull request incrementally with one additional commit since the last revision: > > Made comment and assertion consistent on all platforms Re-testing has passed okay. ------------- PR Comment: https://git.openjdk.org/jdk/pull/14669#issuecomment-1617259141 From dholmes at openjdk.org Mon Jul 3 04:27:10 2023 From: dholmes at openjdk.org (David Holmes) Date: Mon, 3 Jul 2023 04:27:10 GMT Subject: Integrated: 8309209: C2 failed "assert(_stack_guard_state == stack_guard_reserved_disabled) failed: inconsistent state" In-Reply-To: References: Message-ID: On Tue, 27 Jun 2023 06:12:47 GMT, David Holmes wrote: > This appears to be the same kind of issue as reported in [JDK-8146697](https://bugs.openjdk.org/browse/JDK-8146697) way back in Java 9, which was only "fixed" on x86. The current failure was seen on Aarch64. It seems prudent to apply the same changes to all the other platforms. I've done Aarch64, and took a guess at RISC-V but do not know PPC or S390, so I am looking to others to provide the appropriate equivalent code changes there. > > Testing so far is Aarch64 only: > - Tiers 1-3 > - 50x the closed stackoverflow test that failed previously > - 25x vmTestbase/nsk/stress/stack/* > > As these failures are so rare, passing tests don't really tell us much. This is more an attempt at additional robustness. > > Thanks. This pull request has now been integrated. 
Changeset: 52ee5700 Author: David Holmes URL: https://git.openjdk.org/jdk/commit/52ee570025589d4d813ec4deae1f6133ca83156b Stats: 25 lines in 5 files changed: 25 ins; 0 del; 0 mod 8309209: C2 failed "assert(_stack_guard_state == stack_guard_reserved_disabled) failed: inconsistent state" Co-authored-by: Fei Yang Co-authored-by: Martin Doerr Co-authored-by: Amit Kumar Reviewed-by: aph, thartmann ------------- PR: https://git.openjdk.org/jdk/pull/14669 From thartmann at openjdk.org Mon Jul 3 05:18:57 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 3 Jul 2023 05:18:57 GMT Subject: RFR: 8303279: C2: crash in SubTypeCheckNode::sub() at IGVN split if [v3] In-Reply-To: <5Oi18HUJNORp0_wYO2nD_aUGaoyvJCuyT4YmppEWvgA=.f40a1a10-7ee2-483d-bb6c-2fb7ec3218ad@github.com> References: <5Oi18HUJNORp0_wYO2nD_aUGaoyvJCuyT4YmppEWvgA=.f40a1a10-7ee2-483d-bb6c-2fb7ec3218ad@github.com> Message-ID: On Thu, 29 Jun 2023 07:38:15 GMT, Roland Westrelin wrote: >> The crash occurs because at split if during IGVN, a `SubTypeCheck` is >> created with null as input. That happens because the control path the >> `SubTypeCheck` is cloned for is dead. To fix that I propose delaying >> split if until dead paths are collapsed. >> >> I added an assert to check a nullable first input to `SubTypeCheck` >> nodes (which should be impossible because it should be null >> checked). When I ran testing, a number of cases showed up with known >> non null values non properly marked as non null. I fixed them. > > Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: > > review Looks good to me too. 
test/hotspot/jtreg/compiler/splitif/TestCrashAtIGVNSplitIfSubType.java line 28: > 26: * @bug 8303279 > 27: * @summary C2: crash in SubTypeCheckNode::sub() at IGVN split if > 28: * @run main/othervm -XX:-TieredCompilation -XX:-BackgroundCompilation -XX:+UnlockDiagnosticVMOptions -XX:+StressIGVN -XX:StressSeed=598200189 TestCrashAtIGVNSplitIfSubType Maybe add a `@run` without a fixed seed to give this a chance to still trigger in the future. ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/14678#pullrequestreview-1510392072 PR Review Comment: https://git.openjdk.org/jdk/pull/14678#discussion_r1250249977 From thartmann at openjdk.org Mon Jul 3 05:23:05 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 3 Jul 2023 05:23:05 GMT Subject: RFR: 8310331: JitTester: Exclude java.lang.Math.random In-Reply-To: <7mcEVlD5zO5ds7WO9kLAlsahYblH-kkIYh5ld0AL_tg=.23e79233-3e96-4bd8-924f-75394313a184@github.com> References: <7mcEVlD5zO5ds7WO9kLAlsahYblH-kkIYh5ld0AL_tg=.23e79233-3e96-4bd8-924f-75394313a184@github.com> Message-ID: On Sun, 2 Jul 2023 09:16:55 GMT, Daohan Qu wrote: > Test cases generated by JitTester might contain calls to `java.lang.Math.random()`. We could not set a seed for this random call. (In its implementation, `java.lang.Math` create `java.util.Random` instance statically (using the constructor `Random()`) and there is no way to set a seed for it.) > > Such tests might show up different variable values/printouts on each execution (Please refer to [the issue description](https://bugs.openjdk.org/browse/JDK-8310331)). > > Since it is meaningless to generate test cases with "unreproducible" results and JitTester has been able to assign random values to the generated variables (this seed could be set). Maybe we could just exclude the use of `java.lang.Math.random()` in JitTester's test case generation. Looks good to me. ------------- Marked as reviewed by thartmann (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/14748#pullrequestreview-1510394923 From thartmann at openjdk.org Mon Jul 3 05:53:58 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 3 Jul 2023 05:53:58 GMT Subject: RFR: 8309531: Incorrect result with unwrapped iotaShuffle. [v5] In-Reply-To: <3Qaa5jHFD9FxiCcArBu7mhPIMmbQKcq1_YNjOTaM6hU=.673059d1-200f-4535-93ae-f79a4e5ef06b@github.com> References: <0OY_OdZGlSeQhVmmSyYitp4XjtKSXziY1p0wKy6V68s=.68c8351c-7f6e-44ca-91dd-35fa83cfbb8d@github.com> <3Qaa5jHFD9FxiCcArBu7mhPIMmbQKcq1_YNjOTaM6hU=.673059d1-200f-4535-93ae-f79a4e5ef06b@github.com> Message-ID: On Fri, 30 Jun 2023 10:50:01 GMT, Jatin Bhateja wrote: >> Patch fixes following two issues in iotaShuffle inline expander with unwrapped indices. >> 1) Disable intrinsification if effective index do not lie within byte value range. >> 2) Use GT predicate while computing comparison mask for all the indices above vector length. >> >> No performance degradation seen with existing slice/unslice operations which internally calls wrapped iotaShuffle. >> >> This interim patch addresses incorrectness around iotaShuffle till we introduce modified shuffle implementations >> with JDK-8310691. >> >> Kindly review and share feedback. >> >> Best Regards, >> Jatin > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Review comments resolutions. 
`TestVectorShuffleIota.java` fails with `bad AD file` on macosx-aarch64-debug: ``` o647 VectorMaskCmp === _ o707 o645 [[ o650 ]] 17 #vectord[4]:{byte} --N: o647 VectorMaskCmp === _ o707 o645 [[ o650 ]] 17 #vectord[4]:{byte} --N: o707 Binary === _ o646 o643 [[ o647 ]] _Binary_vReg_vReg 0 _Binary_vReg_vReg --N: o646 ReplicateB === _ o167 [[ o707 o649 ]] #vectord[4]:{byte} VREG 0 VREG VECD 0 VECD --N: o643 LShiftVB === _ o638 o642 [[ o707 o648 ]] #vectord[4]:{byte} VREG 0 VREG VECD 0 VECD --N: o645 ConI === o0 [[ o647 ]] #int:17 IMMI 0 IMMI IMMI_GT_1 0 IMMI_GT_1 IMMI_POSITIVE 0 IMMI_POSITIVE IMMI_CMPU_COND 0 IMMI_CMPU_COND IMMI26 0 IMMI26 IMMI19 0 IMMI19 IMMIU7 0 IMMIU7 IMMIU12 0 IMMIU12 IMMIOFFSET 0 IMMIOFFSET IMMIOFFSET1 0 IMMIOFFSET1 IMMIOFFSET2 0 IMMIOFFSET2 IMMIOFFSET4 0 IMMIOFFSET4 IMMIOFFSET8 0 IMMIOFFSET8 IMMIOFFSET16 0 IMMIOFFSET16 IMMI8 0 IMMI8 IMMI8_SHIFT8 0 IMMI8_SHIFT8 IMMBADDSUBV 0 IMMBADDSUBV IMMIADDSUB 0 IMMIADDSUB IMMIADDSUBV 0 IMMIADDSUBV IMMBLOG 0 IMMBLOG IREGI 100 loadConI IREGINOSP 100 loadConI IREGI_R0 100 loadConI IREGI_R2 100 loadConI IREGI_R3 100 loadConI IREGI_R4 100 loadConI IREGIORL2I 100 IREGI ------------- PR Comment: https://git.openjdk.org/jdk/pull/14700#issuecomment-1617401341 From chagedorn at openjdk.org Mon Jul 3 06:08:54 2023 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 3 Jul 2023 06:08:54 GMT Subject: RFR: 8308103: Massive (up to ~30x) increase in C2 compilation time since JDK 17 In-Reply-To: References: Message-ID: <8BHYhGbLecJ2CGb5QSI1L-FJ8Ju74GBwXk39tD9f3as=.d7ffb74c-0269-420c-b7e2-e713ba3b92ca@github.com> On Fri, 30 Jun 2023 13:23:38 GMT, Roland Westrelin wrote: > A long chain of nodes are sunk out of a loop. Every time a node is > moved out of the loop, a cast is created to pin the node out of the > loop. When its input is next sunk, the cast is removed (the cast is > replaced by its input) and a new cast is created. Some nodes on the > chain have 2 other nodes in the chain as uses. 
When such a node is > sunk, 2 cast nodes are created, one for each use. So as the compiler > moves forward in the chain, the number of cast to remove grows. From > some profiling, removing those casts is what takes a lot of time. > > The fix I propose is, when a node is processed, to check whether a > cast at the out of loop control was already created for that node and > to reuse it. > > The test case takes 6 minutes when I run it without the fix and 3 > seconds with it. test/hotspot/jtreg/compiler/loopopts/TestSinkingNodesCausesLongCompilation.java line 28: > 26: * @bug 8308103 > 27: * @summary Massive (up to ~30x) increase in C2 compilation time since JDK 17 > 28: * @run main/othervm -Xcomp -XX:CompileOnly=TestSinkingNodesCausesLongCompilation::mainTest -XX:RepeatCompilation=30 TestSinkingNodesCausesLongCompilation You should add `-XX:+UnlockDiagnosticVMOptions` since `RepeatCompilation` is diagnostic. test/hotspot/jtreg/compiler/loopopts/TestSinkingNodesCausesLongCompilation.java line 58: > 56: public static void main(String[] strArr) { > 57: TestSinkingNodesCausesLongCompilation _instance = new TestSinkingNodesCausesLongCompilation(); > 58: for (int i = 0; i < 10; i++ ) { Suggestion: for (int i = 0; i < 10; i++) { ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14732#discussion_r1250290774 PR Review Comment: https://git.openjdk.org/jdk/pull/14732#discussion_r1250291448 From xgong at openjdk.org Mon Jul 3 06:34:53 2023 From: xgong at openjdk.org (Xiaohong Gong) Date: Mon, 3 Jul 2023 06:34:53 GMT Subject: RFR: 8309531: Incorrect result with unwrapped iotaShuffle. 
[v5]
In-Reply-To:
References: <0OY_OdZGlSeQhVmmSyYitp4XjtKSXziY1p0wKy6V68s=.68c8351c-7f6e-44ca-91dd-35fa83cfbb8d@github.com> <3Qaa5jHFD9FxiCcArBu7mhPIMmbQKcq1_YNjOTaM6hU=.673059d1-200f-4535-93ae-f79a4e5ef06b@github.com>
Message-ID: <57TmaArrOUxndp4pnfHZyI6vJgSjoS5S-y10PJpHH1M=.106869dc-7f63-4e68-9eba-7e5631747041@github.com>

On Mon, 3 Jul 2023 05:50:40 GMT, Tobias Hartmann wrote:

> `TestVectorShuffleIota.java` fails with `bad AD file` on macosx-aarch64-debug:
>
> ```
> o647 VectorMaskCmp === _ o707 o645 [[ o650 ]] 17 #vectord[4]:{byte}
>
> --N: o647 VectorMaskCmp === _ o707 o645 [[ o650 ]] 17 #vectord[4]:{byte}
>
> --N: o707 Binary === _ o646 o643 [[ o647 ]]
> _Binary_vReg_vReg 0 _Binary_vReg_vReg
>
> --N: o646 ReplicateB === _ o167 [[ o707 o649 ]] #vectord[4]:{byte}
> VREG 0 VREG
> VECD 0 VECD
>
> --N: o643 LShiftVB === _ o638 o642 [[ o707 o648 ]] #vectord[4]:{byte}
> VREG 0 VREG
> VECD 0 VECD
>
> --N: o645 ConI === o0 [[ o647 ]] #int:17
> IMMI 0 IMMI
> IMMI_GT_1 0 IMMI_GT_1
> IMMI_POSITIVE 0 IMMI_POSITIVE
> IMMI_CMPU_COND 0 IMMI_CMPU_COND
> IMMI26 0 IMMI26
> IMMI19 0 IMMI19
> IMMIU7 0 IMMIU7
> IMMIU12 0 IMMIU12
> IMMIOFFSET 0 IMMIOFFSET
> IMMIOFFSET1 0 IMMIOFFSET1
> IMMIOFFSET2 0 IMMIOFFSET2
> IMMIOFFSET4 0 IMMIOFFSET4
> IMMIOFFSET8 0 IMMIOFFSET8
> IMMIOFFSET16 0 IMMIOFFSET16
> IMMI8 0 IMMI8
> IMMI8_SHIFT8 0 IMMI8_SHIFT8
> IMMBADDSUBV 0 IMMBADDSUBV
> IMMIADDSUB 0 IMMIADDSUB
> IMMIADDSUBV 0 IMMIADDSUBV
> IMMBLOG 0 IMMBLOG
> IREGI 100 loadConI
> IREGINOSP 100 loadConI
> IREGI_R0 100 loadConI
> IREGI_R2 100 loadConI
> IREGI_R3 100 loadConI
> IREGI_R4 100 loadConI
> IREGIORL2I 100 IREGI
> ```

I saw the same failure on an AArch64 NEON system on Linux. It seems that `do_wrap` should be changed to `!do_wrap`, as I commented.
------------- PR Comment: https://git.openjdk.org/jdk/pull/14700#issuecomment-1617444184 From xgong at openjdk.org Mon Jul 3 06:34:56 2023 From: xgong at openjdk.org (Xiaohong Gong) Date: Mon, 3 Jul 2023 06:34:56 GMT Subject: RFR: 8309531: Incorrect result with unwrapped iotaShuffle. [v5] In-Reply-To: <3Qaa5jHFD9FxiCcArBu7mhPIMmbQKcq1_YNjOTaM6hU=.673059d1-200f-4535-93ae-f79a4e5ef06b@github.com> References: <0OY_OdZGlSeQhVmmSyYitp4XjtKSXziY1p0wKy6V68s=.68c8351c-7f6e-44ca-91dd-35fa83cfbb8d@github.com> <3Qaa5jHFD9FxiCcArBu7mhPIMmbQKcq1_YNjOTaM6hU=.673059d1-200f-4535-93ae-f79a4e5ef06b@github.com> Message-ID: On Fri, 30 Jun 2023 10:50:01 GMT, Jatin Bhateja wrote: >> Patch fixes following two issues in iotaShuffle inline expander with unwrapped indices. >> 1) Disable intrinsification if effective index do not lie within byte value range. >> 2) Use GT predicate while computing comparison mask for all the indices above vector length. >> >> No performance degradation seen with existing slice/unslice operations which internally calls wrapped iotaShuffle. >> >> This interim patch addresses incorrectness around iotaShuffle till we introduce modified shuffle implementations >> with JDK-8310691. >> >> Kindly review and share feedback. >> >> Best Regards, >> Jatin > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Review comments resolutions. src/hotspot/share/opto/vectorIntrinsics.cpp line 635: > 633: } > 634: > 635: if (do_wrap && `do_wrap` should be `!do_wrap` instead? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14700#discussion_r1250320754 From epeter at openjdk.org Mon Jul 3 06:53:08 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 3 Jul 2023 06:53:08 GMT Subject: RFR: 8309660: C2: failed: XMM register should be 0-15 (UseKNLSetting and ConvF2HF) In-Reply-To: References: Message-ID: On Wed, 21 Jun 2023 23:41:58 GMT, Sandhya Viswanathan wrote: >> @eme64 Yes that was my mistake, that node requires AVX512VL so `vlRegF` and `regF` are the same. >> >>> Is there a way to stress-test the registers? >> >> Can we randomise the allocated register during register allocation? >> >> Thanks. > >> @merykitty Yes, randomization would be great. I don't know much about the register allocator, so feel free to do something like that if you want and have time ;) >> >> @sviswa7 Is there something you want me to change still? > > No additional changes from my side. Thanks @sviswa7 for the help with the patch! Thanks @jatin-bhateja for the review! ------------- PR Comment: https://git.openjdk.org/jdk/pull/14379#issuecomment-1617468870 From epeter at openjdk.org Mon Jul 3 06:53:10 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 3 Jul 2023 06:53:10 GMT Subject: Integrated: 8309660: C2: failed: XMM register should be 0-15 (UseKNLSetting and ConvF2HF) In-Reply-To: References: Message-ID: On Thu, 8 Jun 2023 14:45:54 GMT, Emanuel Peter wrote: > Context: `Float.floatToFloat16` -> `vcvtps2ph`. > > **Problem** > > vcvtps2ph > pre=Assembler::VEX_SIMD_66 > opc=Assembler::VEX_OPCODE_0F_3A > VEX.128.66.0F3A > requires F16C > > https://www.felixcloutier.com/x86/vcvtps2ph > > So this is a non-AVX512 feature, and we should only use the registers `xmm0-15`. > > There is also a AVX512 version, but it requires `AVX512VL and AVX512F`. > > So on `x64`, we should only use registers `xmm0-15` if we do not have `AVX512VL`, and if we have it, then we can use `xmm0-31`. 
> > **Suggested Solution** > As @sviswa7 suggested, we should just use the `vlRegF` instead of `regF`, see discussion in comments. > > **Testing** > I simulated the patch on intel's `sde`. So now I'm confident that I don't generate code that uses `AVX512VL` registers (XMM16-31). > > Running: tier1-6 + stress testing. This pull request has now been integrated. Changeset: 2c29705d Author: Emanuel Peter URL: https://git.openjdk.org/jdk/commit/2c29705d7bc9cf3d9884abf81ba6d3eeff881d73 Stats: 36 lines in 2 files changed: 32 ins; 0 del; 4 mod 8309660: C2: failed: XMM register should be 0-15 (UseKNLSetting and ConvF2HF) Co-authored-by: Sandhya Viswanathan Reviewed-by: sviswanathan, jbhateja ------------- PR: https://git.openjdk.org/jdk/pull/14379 From chagedorn at openjdk.org Mon Jul 3 06:53:53 2023 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 3 Jul 2023 06:53:53 GMT Subject: RFR: 8310331: JitTester: Exclude java.lang.Math.random In-Reply-To: <7mcEVlD5zO5ds7WO9kLAlsahYblH-kkIYh5ld0AL_tg=.23e79233-3e96-4bd8-924f-75394313a184@github.com> References: <7mcEVlD5zO5ds7WO9kLAlsahYblH-kkIYh5ld0AL_tg=.23e79233-3e96-4bd8-924f-75394313a184@github.com> Message-ID: On Sun, 2 Jul 2023 09:16:55 GMT, Daohan Qu wrote: > Test cases generated by JitTester might contain calls to `java.lang.Math.random()`. We could not set a seed for this random call. (In its implementation, `java.lang.Math` create `java.util.Random` instance statically (using the constructor `Random()`) and there is no way to set a seed for it.) > > Such tests might show up different variable values/printouts on each execution (Please refer to [the issue description](https://bugs.openjdk.org/browse/JDK-8310331)). > > Since it is meaningless to generate test cases with "unreproducible" results and JitTester has been able to assign random values to the generated variables (this seed could be set). Maybe we could just exclude the use of `java.lang.Math.random()` in JitTester's test case generation. Looks good. 
------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/14748#pullrequestreview-1510491679 From epeter at openjdk.org Mon Jul 3 07:19:09 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 3 Jul 2023 07:19:09 GMT Subject: RFR: 8310308: IR Framework: check for type and size of vector nodes [v15] In-Reply-To: References: Message-ID: > For some changes to `SuperWord`, and maybe auto-vectorization in general, I want to strengthen the IR Framework. > > **Motivation** > I want to not just find the relevant IR nodes, but also assert that they have the maximal length that they could have on the respective platform (given the CPU features and `MaxVectorSize`). Without this verification it is possible that a future change leads to a regression where we still vectorize but at shorter vector widths as before - leading to performance loss. > > **How to use it** > > All `IRNode`s in `test/hotspot/jtreg/compiler/lib/ir_framework/IRNode.java` that are created with `vectorNode` are now all matched with their `type` and `size`. The regex might now look something like this: > > `"(\d+(\s){2}(VectorCastF2X.*)+(\s){2}===.*vector[A-Za-z][8]:{int})"` > which would match with IR nodes dumped like that: > `1150 VectorCastF2X === _ 1151 [[ 1146 ]] #vectory[8]:{int} ...` > > The goal was to keep it simple and straight forward. In most cases, you can just use the nodes as before, and implicitly we now check for maximal size automatically. However, in some cases we want to ensure there is no or only a limited number of nodes (`failOn` or comparison `<` or `<=` or `=0`) - in those cases we usually want to make sure there is not any node of any size, so we match with any size by default. The size can also explicitly be constrained using `IRNode.VECTOR_SIZE`. > > Some examples: > 1. 
`@IR(counts = {IRNode.LOAD_VECTOR_I, " >0 "})` -> search for a `LoadVector` node with `type` `int`, and maximal `size` possible on the machine (limited by CPU features and `MaxVectorSize`). This is the most common use case. > 2. `@IR(failOn = { IRNode.LOAD_VECTOR_L, IRNode.STORE_VECTOR })` -> fail if there is a `LoadVector` with type `long`, of `any` size. > 3. `@IR(counts = { IRNode.XOR_VI, IRNode.VECTOR_SIZE_4, " > 0 "})` -> find at least one `XorV` node with type `int` and exactly `4` elements. Useful for VectorAPI when the vector species is fixed. > 4. `@IR(counts = { IRNode.LOAD_VECTOR_D, IRNode.VECTOR_SIZE + "min(4, max_double)", " >0 " })` -> search for a `LoadVector` node with `type` `double`, and `size` exactly equals to `min(4, max_double)` (so 4 elements, or if the hardware allows fewer `doubles`, then that number). > 5. `@IR(counts = { IRNode.ABS_VF, IRNode.VECTOR_SIZE + "min(LoopMaxUnroll, max_float)", ">= 1" })` -> find at least one `AbsV` nodes with type `float`, and the `size` exactly equals to the smaller of `LoopMaxUnroll` or the maximal size allow... 
Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: Fix 2 IR framework tests ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14539/files - new: https://git.openjdk.org/jdk/pull/14539/files/af21a9f0..e7f442e7 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14539&range=14 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14539&range=13-14 Stats: 16 lines in 2 files changed: 1 ins; 0 del; 15 mod Patch: https://git.openjdk.org/jdk/pull/14539.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14539/head:pull/14539 PR: https://git.openjdk.org/jdk/pull/14539 From pli at openjdk.org Mon Jul 3 07:37:22 2023 From: pli at openjdk.org (Pengfei Li) Date: Mon, 3 Jul 2023 07:37:22 GMT Subject: RFR: 8308994: C2: Re-implement experimental post loop vectorization [v2] In-Reply-To: References: Message-ID: > ## TL;DR > > This patch completely re-implements C2's experimental post loop vectorization for better stability, maintainability and performance. Compared with the original implementation, this new implementation adds a standalone loop phase in C2's ideal loop phases and can vectorize more post loops. The original implementation and all code related to multi-versioned post loops are deleted in this patch. More details about this patch can be found in the document replied in this pull request. 
Pengfei Li has updated the pull request incrementally with one additional commit since the last revision: Address part of comments from Emanuel ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14581/files - new: https://git.openjdk.org/jdk/pull/14581/files/11fe4cd6..a58e04e9 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14581&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14581&range=00-01 Stats: 172 lines in 8 files changed: 63 ins; 20 del; 89 mod Patch: https://git.openjdk.org/jdk/pull/14581.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14581/head:pull/14581 PR: https://git.openjdk.org/jdk/pull/14581 From epeter at openjdk.org Mon Jul 3 07:40:57 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 3 Jul 2023 07:40:57 GMT Subject: RFR: 8309894: compiler/vectorapi/VectorLogicalOpIdentityTest.java fails on SVE system with UseSVE=0 In-Reply-To: <3PFvaToBrRaOrXxhyZpR3G7fKQ0OzeYJeUzqJiCxvw0=.88fd1545-f1e2-4f3d-9e1a-cfb66ce3de27@github.com> References: <2kGmMRiqW2Myc4Iumj9PBSskUycPz7GzFMJ0Xe5Qg-4=.24b7f369-d291-4842-afa6-d2c7cd75cdd8@github.com> <3PFvaToBrRaOrXxhyZpR3G7fKQ0OzeYJeUzqJiCxvw0=.88fd1545-f1e2-4f3d-9e1a-cfb66ce3de27@github.com> Message-ID: On Fri, 30 Jun 2023 08:01:29 GMT, Xiaohong Gong wrote: >> Hi there, I'v filed the vm option and cpu feature sync issue here for AArch64: https://bugs.openjdk.org/browse/JDK-8311130, and will address the comment with it. Thanks again for the advice! >> >> Hi @eme64 , besides the sync issue, does the change to IR framework make sense to you? Currently, if we use an architecture specific vm options with `applyIf` for an IR check, and run the test on another different architecture, the whole test will fail by throwing exceptions, even if we add the `applyIfCPUFeature` to do the cpu check. The changes in the IR framework can fix this issue. >> >> If that part seems fine to you, maybe we can let this PR in first? Since the test failure will noise our internal ci testing. 
WDYT? Thanks! > >> @XiaohongGong I totally agree with the changes to the IR framework (having `applyIfCPUFeature` before `applyIf`). > > Thanks a lot! > >> Otherwise, using both `UseSVE=0` and `sve, false` is a temporary fix that should be reverted after [JDK-8311130](https://bugs.openjdk.org/browse/JDK-8311130). I'm accepting it as a temporary fix only. Who will do the real fix? > > We (Arm) will do the real fix. `UseSVE=0` is needed when `sve, true`, which only affects this test now. And yes, I can revert these IR checks once the real fix is in. > >> I was a bit afraid not keeping the CPU feature and the VM flag in sync could also lead to issues in the backend of aarch64. But it does indeed seem that we only use `UseSVE`, and never `VM_Version::supports_sve()`. Still, someone might use them synonymous in the future and expect that they are in sync. > > Agree, although we only use `UseSVE` in backend now. > >> Actually, since there are only so few uses of `VM_Version::supports_sve()`, is the risk not very low to just mask off the feature now directly with this fix? That fix does not look so complicated as I feared. What do you think? > > I prefer fixing that in a separate patch. One reason is syncing the vm options and cpu features is a refactory to AArch64 backend for me. It has other relative cpu features specific to different SVE systems besides `sve`. For example, the `svebitperm` which exists after sve2. We have to take a consideration for them as well. Besides, although the changes is not so big, we have to do more testing to make sure no regressions are involved. > > And besides the `UseSVE`, do you think it's necessary to sync other options as well? > >> Anyway, I just launched testing for commit 1: tier1-6 plus stress testing. Will report back on Monday probably. > > Thanks for doing this! @XiaohongGong Testing for commit 1 looks good. I don't have the expertise on the other CPU features. But from a distance, I'd say yes. 
The idea with all the CPU features is that we can simulate less powerful machines with more powerful machines. But I would first fix the SVE features, and handle the others separately. Maybe for those a separate discussion needs to happen first. @vnkozlov What do you think about syncing CPU features with their VM Flags? Only relevant for `AVX` and `SVE`, or more generally? What about `UseSHA`, `UseAES` for example? ------------- PR Comment: https://git.openjdk.org/jdk/pull/14533#issuecomment-1617535481 From epeter at openjdk.org Mon Jul 3 07:46:03 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 3 Jul 2023 07:46:03 GMT Subject: RFR: 8309894: compiler/vectorapi/VectorLogicalOpIdentityTest.java fails on SVE system with UseSVE=0 In-Reply-To: <2kGmMRiqW2Myc4Iumj9PBSskUycPz7GzFMJ0Xe5Qg-4=.24b7f369-d291-4842-afa6-d2c7cd75cdd8@github.com> References: <2kGmMRiqW2Myc4Iumj9PBSskUycPz7GzFMJ0Xe5Qg-4=.24b7f369-d291-4842-afa6-d2c7cd75cdd8@github.com> Message-ID: <0Ptn4bictx-0XbgcbNPZERlYzrTF8kvdvXwNSMV9EG4=.a433fbcf-b761-4cd6-9320-2a582e8ec22e@github.com> On Mon, 19 Jun 2023 01:49:57 GMT, Xiaohong Gong wrote: > This test fails with several IR check failures when run on ARM SVE systems with vm option `-XX:UseSVE=0`. Here is one of the failed IR rule: > > > @IR(counts = {IRNode.AND_V, "1"}, applyIfCPUFeatureOr = {"sve", "true", "avx512", "true"}) > public static void testAndMaskSameValue1() > > The specified IR in the test depends on the platform's predicate feature. Hence the IR check can be applied only on ARM SVE or X86 AVX512 systems. But with `-XX:UseSVE=0` on ARM SVE machines, JVM will disable SVE feature for compiler. But the CPU feature is not changed. To guarantee the IR rule is run with SVE as expected, it has to add another condition like `applyIf = {"UseSVE", ">0"}`. Consider `UseSVE` is an ARM specific option, it can be used only on ARM CPUs. 
So the right IR rules should be: > > > @IR(counts = {IRNode.AND_V, "1"}, applyIfCPUFeature = {"sve", "true"}, applyIf = {"UseSVE", "> 0"}) > @IR(counts = {IRNode.AND_V, "1"}, applyIfCPUFeature = {"avx512", "true"}) > public static void testAndMaskSameValue1() > > > This patch also changes the check order of conditions for a IR rule. It's better to check `applyIfCPUFeature` before `applyIf`, in case the vm option is invalid on running hardware, which makes test throw exception and abort. > > Verified on X86 systems with `UseAVX=1/2/3` by removing the test from ProblemList.txt, and SVE systems with `UseSVE=0/1`. Accepted as a temporary fix that has to be reverted with [JDK-8311130](https://bugs.openjdk.org/browse/JDK-8311130). ------------- Marked as reviewed by epeter (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/14533#pullrequestreview-1510571733 From pli at openjdk.org Mon Jul 3 07:46:15 2023 From: pli at openjdk.org (Pengfei Li) Date: Mon, 3 Jul 2023 07:46:15 GMT Subject: RFR: 8308994: C2: Re-implement experimental post loop vectorization [v2] In-Reply-To: References: Message-ID: On Fri, 23 Jun 2023 09:48:27 GMT, Emanuel Peter wrote: >> Pengfei Li has updated the pull request incrementally with one additional commit since the last revision: >> >> Address part of comments from Emanuel > > src/hotspot/share/opto/loopnode.cpp line 2280: > >> 2278: if (!stride_is_con()) { >> 2279: // Stride could be non-constant if a loop is vector masked >> 2280: return 0; > > Could this break the assumption anywhere else that `stride_con != 0`? > I fear that it may just silently succeed everywhere, or do checks like: > > if (stride_con() > 0) { > // assume positive > } else { > // assume negative (now wrong!) > } > > Might it be better to have an assert here, and do the `stride_is_con` checks at the call sites of `stride_con`? I have reverted this change and turned to update `CountedLoopNode::stride_con()` (and add asserts there) to mitigate this potential issue. 
That one is a call site of this function, and the "int" counted loop transformation calls it directly. Before my patch, that function could also return 0. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14581#discussion_r1250410252 From pli at openjdk.org Mon Jul 3 07:46:17 2023 From: pli at openjdk.org (Pengfei Li) Date: Mon, 3 Jul 2023 07:46:17 GMT Subject: RFR: 8308994: C2: Re-implement experimental post loop vectorization [v2] In-Reply-To: References: Message-ID: On Fri, 23 Jun 2023 10:36:49 GMT, Emanuel Peter wrote: >> src/hotspot/share/opto/loopnode.cpp line 4688: >> >>> 4686: for (LoopTreeIterator iter(_ltree_root); !iter.done(); iter.next()) { >>> 4687: IdealLoopTree* lpt = iter.current(); >>> 4688: if (lpt->is_counted() && lpt->is_innermost()) { >> >> Is this applied to all innermost counted loops? Or only post-loops? > > Ah, you do the check inside. Why not lift it out and assert inside? I have lifted the checks out in commit 2 and added assertions inside. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14581#discussion_r1250411661 From jbhateja at openjdk.org Mon Jul 3 07:51:20 2023 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Mon, 3 Jul 2023 07:51:20 GMT Subject: RFR: 8309531: Incorrect result with unwrapped iotaShuffle. [v6] In-Reply-To: <0OY_OdZGlSeQhVmmSyYitp4XjtKSXziY1p0wKy6V68s=.68c8351c-7f6e-44ca-91dd-35fa83cfbb8d@github.com> References: <0OY_OdZGlSeQhVmmSyYitp4XjtKSXziY1p0wKy6V68s=.68c8351c-7f6e-44ca-91dd-35fa83cfbb8d@github.com> Message-ID: <79P4C0c_nBrk5vF8IQkWhz3uALJPnLs-XE8BKnEC6Ho=.43391ac3-78ee-4a57-8042-6bf854a5ffb1@github.com> > The patch fixes the following two issues in the iotaShuffle inline expander with unwrapped indices. > 1) Disable intrinsification if the effective index does not lie within the byte value range. > 2) Use GT predicate while computing comparison mask for all the indices above vector length.
> > No performance degradation seen with existing slice/unslice operations which internally calls wrapped iotaShuffle. > > This interim patch addresses incorrectness around iotaShuffle till we introduce modified shuffle implementations > with JDK-8310691. > > Kindly review and share feedback. > > Best Regards, > Jatin Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Review comments resolution. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14700/files - new: https://git.openjdk.org/jdk/pull/14700/files/1276f73d..1a48af2a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14700&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14700&range=04-05 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/14700.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14700/head:pull/14700 PR: https://git.openjdk.org/jdk/pull/14700 From jbhateja at openjdk.org Mon Jul 3 07:51:21 2023 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Mon, 3 Jul 2023 07:51:21 GMT Subject: RFR: 8309531: Incorrect result with unwrapped iotaShuffle. 
[v5] In-Reply-To: <57TmaArrOUxndp4pnfHZyI6vJgSjoS5S-y10PJpHH1M=.106869dc-7f63-4e68-9eba-7e5631747041@github.com> References: <0OY_OdZGlSeQhVmmSyYitp4XjtKSXziY1p0wKy6V68s=.68c8351c-7f6e-44ca-91dd-35fa83cfbb8d@github.com> <3Qaa5jHFD9FxiCcArBu7mhPIMmbQKcq1_YNjOTaM6hU=.673059d1-200f-4535-93ae-f79a4e5ef06b@github.com> <57TmaArrOUxndp4pnfHZyI6vJgSjoS5S-y10PJpHH1M=.106869dc-7f63-4e68-9eba-7e5631747041@github.com> Message-ID: <1-kMANvzciSMiqEZupGWItwPCk-S1AyAZiiDpUk5j_I=.f8661444-3b03-47b4-bc12-925bbb607785@github.com> On Mon, 3 Jul 2023 06:32:05 GMT, Xiaohong Gong wrote: > `TestVectorShuffleIota.java` fails with `bad AD file` on macosx-aarch64-debug: > > ``` > o647 VectorMaskCmp === _ o707 o645 [[ o650 ]] 17 #vectord[4]:{byte} > > --N: o647 VectorMaskCmp === _ o707 o645 [[ o650 ]] 17 #vectord[4]:{byte} > > --N: o707 Binary === _ o646 o643 [[ o647 ]] > _Binary_vReg_vReg 0 _Binary_vReg_vReg > > --N: o646 ReplicateB === _ o167 [[ o707 o649 ]] #vectord[4]:{byte} > VREG 0 VREG > VECD 0 VECD > > --N: o643 LShiftVB === _ o638 o642 [[ o707 o648 ]] #vectord[4]:{byte} > VREG 0 VREG > VECD 0 VECD > > --N: o645 ConI === o0 [[ o647 ]] #int:17 > IMMI 0 IMMI > IMMI_GT_1 0 IMMI_GT_1 > IMMI_POSITIVE 0 IMMI_POSITIVE > IMMI_CMPU_COND 0 IMMI_CMPU_COND > IMMI26 0 IMMI26 > IMMI19 0 IMMI19 > IMMIU7 0 IMMIU7 > IMMIU12 0 IMMIU12 > IMMIOFFSET 0 IMMIOFFSET > IMMIOFFSET1 0 IMMIOFFSET1 > IMMIOFFSET2 0 IMMIOFFSET2 > IMMIOFFSET4 0 IMMIOFFSET4 > IMMIOFFSET8 0 IMMIOFFSET8 > IMMIOFFSET16 0 IMMIOFFSET16 > IMMI8 0 IMMI8 > IMMI8_SHIFT8 0 IMMI8_SHIFT8 > IMMBADDSUBV 0 IMMBADDSUBV > IMMIADDSUB 0 IMMIADDSUB > IMMIADDSUBV 0 IMMIADDSUBV > IMMBLOG 0 IMMBLOG > IREGI 100 loadConI > IREGINOSP 100 loadConI > IREGI_R0 100 loadConI > IREGI_R2 100 loadConI > IREGI_R3 100 loadConI > IREGI_R4 100 loadConI > IREGIORL2I 100 IREGI > ``` Hi @TobiHartmann, Fixed, can you kindly run this again through your test infra before checkin. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/14700#issuecomment-1617553552 From xgong at openjdk.org Mon Jul 3 07:52:01 2023 From: xgong at openjdk.org (Xiaohong Gong) Date: Mon, 3 Jul 2023 07:52:01 GMT Subject: RFR: 8309894: compiler/vectorapi/VectorLogicalOpIdentityTest.java fails on SVE system with UseSVE=0 In-Reply-To: <2kGmMRiqW2Myc4Iumj9PBSskUycPz7GzFMJ0Xe5Qg-4=.24b7f369-d291-4842-afa6-d2c7cd75cdd8@github.com> References: <2kGmMRiqW2Myc4Iumj9PBSskUycPz7GzFMJ0Xe5Qg-4=.24b7f369-d291-4842-afa6-d2c7cd75cdd8@github.com> Message-ID: On Mon, 19 Jun 2023 01:49:57 GMT, Xiaohong Gong wrote: > This test fails with several IR check failures when run on ARM SVE systems with vm option `-XX:UseSVE=0`. Here is one of the failed IR rule: > > > @IR(counts = {IRNode.AND_V, "1"}, applyIfCPUFeatureOr = {"sve", "true", "avx512", "true"}) > public static void testAndMaskSameValue1() > > The specified IR in the test depends on the platform's predicate feature. Hence the IR check can be applied only on ARM SVE or X86 AVX512 systems. But with `-XX:UseSVE=0` on ARM SVE machines, JVM will disable SVE feature for compiler. But the CPU feature is not changed. To guarantee the IR rule is run with SVE as expected, it has to add another condition like `applyIf = {"UseSVE", ">0"}`. Consider `UseSVE` is an ARM specific option, it can be used only on ARM CPUs. So the right IR rules should be: > > > @IR(counts = {IRNode.AND_V, "1"}, applyIfCPUFeature = {"sve", "true"}, applyIf = {"UseSVE", "> 0"}) > @IR(counts = {IRNode.AND_V, "1"}, applyIfCPUFeature = {"avx512", "true"}) > public static void testAndMaskSameValue1() > > > This patch also changes the check order of conditions for a IR rule. It's better to check `applyIfCPUFeature` before `applyIf`, in case the vm option is invalid on running hardware, which makes test throw exception and abort. > > Verified on X86 systems with `UseAVX=1/2/3` by removing the test from ProblemList.txt, and SVE systems with `UseSVE=0/1`. 
> Accepted as a temporary fix that has to be reverted with [JDK-8311130](https://bugs.openjdk.org/browse/JDK-8311130). Thanks for the review and testing! I will revert the IR test part with JDK-8311130. ------------- PR Comment: https://git.openjdk.org/jdk/pull/14533#issuecomment-1617558114 From pli at openjdk.org Mon Jul 3 07:59:10 2023 From: pli at openjdk.org (Pengfei Li) Date: Mon, 3 Jul 2023 07:59:10 GMT Subject: RFR: 8308994: C2: Re-implement experimental post loop vectorization [v2] In-Reply-To: References: Message-ID: On Fri, 23 Jun 2023 10:43:34 GMT, Emanuel Peter wrote: >> Pengfei Li has updated the pull request incrementally with one additional commit since the last revision: >> >> Address part of comments from Emanuel > > src/hotspot/share/opto/loopnode.hpp line 143: > >> 141: if (is_vector_masked()) { >> 142: return false; >> 143: } > > Does this mean that the post-loop has a `CountedLoop` node, but it does not adhere to the counted-loop assumptions, such as having a `incr`, `limit`, `phi` etc? With the old post-loop-vectorization, the LoopNode would always fold away, so it would disappear after IGVN. But now it would stick around, right? Could that turn out to be a problem? After being vectorized, the post loop still has `phi`, `incr` and `limit` as before. In other words, the post loop is still a loop now. I think the only difference is that the loop stride value is not a constant any more (as we introduce the `VectorMaskTrueCountNode` for the new stride). The old implementation of post loop vectorization makes the vector-masked post loop run only once so it can optimize the `LoopNode` away. But we cannot do this now without doing multi-versioning. (Without the scalar post loop, the loop may run too few iterations when the "atomic" post loop is not entered.) > src/hotspot/share/opto/loopnode.hpp line 775: > >> 773: >> 774: void collect_loop_core_nodes(PhaseIdealLoop* phase, Unique_Node_List& wq) const; >> 775: > > nit: why move it?
This function was private before. I need to make it public so I can use it in `vmaskloop.cpp`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14581#discussion_r1250428341 PR Review Comment: https://git.openjdk.org/jdk/pull/14581#discussion_r1250430133 From epeter at openjdk.org Mon Jul 3 07:59:12 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 3 Jul 2023 07:59:12 GMT Subject: RFR: 8308994: C2: Re-implement experimental post loop vectorization [v2] In-Reply-To: References: Message-ID: On Fri, 23 Jun 2023 11:24:59 GMT, Emanuel Peter wrote: >> Pengfei Li has updated the pull request incrementally with one additional commit since the last revision: >> >> Address part of comments from Emanuel > > src/hotspot/share/opto/vectornode.hpp line 1826: > >> 1824: class LoopVectorMaskNode : public TypeNode { >> 1825: private: >> 1826: int _max_trips; > > Add comment: what is this for exactly? Maybe consider adding more elaborate specification/description above the 3 node classes. > > General code style: I think we are trying to get away from the `//--------------NodeName/FunctionName-------` tags, so no need to add them anymore. That is already much better. Could you please also explain what the inputs mean and do? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14581#discussion_r1250426626 From pli at openjdk.org Mon Jul 3 08:17:10 2023 From: pli at openjdk.org (Pengfei Li) Date: Mon, 3 Jul 2023 08:17:10 GMT Subject: RFR: 8308994: C2: Re-implement experimental post loop vectorization [v2] In-Reply-To: <6IkvVTm9e60qXwaID0EihRXlUielrryBWoTmYAp3PuU=.c624b13d-bc6d-4c79-86a6-72bda016b50f@github.com> References: <6IkvVTm9e60qXwaID0EihRXlUielrryBWoTmYAp3PuU=.c624b13d-bc6d-4c79-86a6-72bda016b50f@github.com> Message-ID: On Fri, 23 Jun 2023 10:49:50 GMT, Emanuel Peter wrote: >> Pengfei Li has updated the pull request incrementally with one additional commit since the last revision: >> >> Address part of comments from Emanuel > > src/hotspot/share/opto/superword.cpp line 179: > >> 177: assert(_packset.length() == 0, "packset must be empty"); >> 178: success = SLP_extract(); >> 179: if (PostLoopMultiversioning) { > > Could we now have an assert for `cl->is_main_loop()` at the beginning of `SuperWord::transform_loop`, and remove all checks for it in SuperWord? Unfortunately, I just tried updating this but found assertion failures. I see `SuperWord::transform_loop()` is also called in `IdealLoopTree::policy_unroll_slp_analysis()` which can pass a normal loop (the loop before iteration-split). I assume only main loops require unrolling analysis and don't understand why it could be a normal loop. Maybe that's bad code and we need refactor C2's unrolling analysis first. > src/hotspot/share/opto/superword.cpp line 632: > >> 630: cl->set_slp_pack_count(_packset.length()); >> 631: } >> 632: } else { > > Again: Could we now have an assert for `cl->is_main_loop()` at the beginning of `SuperWord::SLP_extract`, and remove all checks for it in SuperWord? Ok to do it here as `do_optimization` is false in the unrolling analysis phase. I've updated the code in commit 2. 
> src/hotspot/share/opto/superword.hpp line 251: > >> 249: int count_size(int size) { >> 250: return _stats[exact_log2(size)]; >> 251: } > > Add assert from `record_size`? Done, thanks! > src/hotspot/share/opto/superword.hpp line 666: > >> 664: IdealLoopTree* lpt() const { return _lpt; } >> 665: PhiNode* iv() const { >> 666: return _slp ? _slp->iv() : _lpt->_head->as_CountedLoop()->phi()->as_Phi(); > > I'd suggest either cache it directly from `_lpt->_head->as_CountedLoop()->phi()->as_Phi()`, or just query it directly. Reduce dependence on `_slp`. Good catch! What do you think of getting rid of `_slp` completely in `SWPointer` refactoring? > src/hotspot/share/opto/superword.hpp line 669: > >> 667: } >> 668: >> 669: void init(); > > This is just a helper function for the constructors, right? Maybe move it closer to them? Done ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14581#discussion_r1250443717 PR Review Comment: https://git.openjdk.org/jdk/pull/14581#discussion_r1250447018 PR Review Comment: https://git.openjdk.org/jdk/pull/14581#discussion_r1250450494 PR Review Comment: https://git.openjdk.org/jdk/pull/14581#discussion_r1250452509 PR Review Comment: https://git.openjdk.org/jdk/pull/14581#discussion_r1250452658 From pli at openjdk.org Mon Jul 3 08:17:11 2023 From: pli at openjdk.org (Pengfei Li) Date: Mon, 3 Jul 2023 08:17:11 GMT Subject: RFR: 8308994: C2: Re-implement experimental post loop vectorization [v2] In-Reply-To: References: Message-ID: <7iru0xDm4lckuwyHvqGSld0_kWUVYSTg5BT3-rqP3Vw=.cab94902-98aa-40ed-ae13-d238380b6267@github.com> On Fri, 23 Jun 2023 11:02:39 GMT, Emanuel Peter wrote: >> If you are going to do that, I'd suggest doing this refactoring in a separate RFE. It would help in general with any future extension to auto-vectorization. > > Can we untangle it completely from SuperWord? it seems you have made it optional, so yes. And maybe we can also make the trace flags like `_slp->is_trace_alignment()` independent? 
It would be nice to also be able to trace this for non-SuperWord contexts like post-loop masked vectorization, right? I will try to do this in another JBS issue and come back here later. >> After all, should the `VectorizeDebug` flag not apply everywhere? See `phase->C->directive()->VectorizeDebugOption`. > > I'd also move this to some static functions in a potential "autovectorization.hpp", and move `_vector_loop_debug` there, together with all its `is_trace...` accessors. I agree the current code here is a bit ugly. I will try to make it better in the `SWPointer` refactoring. >> Oh dear, I just saw the same pattern in: >> >> bool TypeNode::cmp(const Node& n) const { >> return !Type::cmp(_type, ((TypeNode&)n)._type); >> } >> >> We should try to avoid doing that. > > Even if all callers currently ensure that `n` has the correct type, I'd say it is still not a great idea to cast without checking, at least in debug. I searched all C2 code and saw a lot of such patterns. Perhaps we could do this in another RFE? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14581#discussion_r1250448780 PR Review Comment: https://git.openjdk.org/jdk/pull/14581#discussion_r1250450253 PR Review Comment: https://git.openjdk.org/jdk/pull/14581#discussion_r1250456476 From pli at openjdk.org Mon Jul 3 08:17:13 2023 From: pli at openjdk.org (Pengfei Li) Date: Mon, 3 Jul 2023 08:17:13 GMT Subject: RFR: 8308994: C2: Re-implement experimental post loop vectorization [v2] In-Reply-To: References: Message-ID: On Mon, 3 Jul 2023 07:53:34 GMT, Emanuel Peter wrote: >> src/hotspot/share/opto/vectornode.hpp line 1826: >> >>> 1824: class LoopVectorMaskNode : public TypeNode { >>> 1825: private: >>> 1826: int _max_trips; >> >> Add comment: what is this for exactly? Maybe consider adding more elaborate specification/description above the 3 node classes. >> >> General code style: I think we are trying to get away from the `//--------------NodeName/FunctionName-------` tags, so no need to add them anymore.
> > That is already much better. Could you please also explain what the inputs mean and do? Ok, will do that later ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14581#discussion_r1250453061 From pli at openjdk.org Mon Jul 3 08:21:08 2023 From: pli at openjdk.org (Pengfei Li) Date: Mon, 3 Jul 2023 08:21:08 GMT Subject: RFR: 8308994: C2: Re-implement experimental post loop vectorization [v2] In-Reply-To: <9J-XGP_2qSJT-EefUtvLMt1HzWHWgtvN3RmanPRDt0I=.71bc5695-7eab-4d29-8ff4-b20f28721247@github.com> References: <9J-XGP_2qSJT-EefUtvLMt1HzWHWgtvN3RmanPRDt0I=.71bc5695-7eab-4d29-8ff4-b20f28721247@github.com> Message-ID: On Fri, 23 Jun 2023 11:58:16 GMT, Emanuel Peter wrote: >> Pengfei Li has updated the pull request incrementally with one additional commit since the last revision: >> >> Address part of comments from Emanuel > > src/hotspot/share/opto/vmaskloop.hpp line 85: > >> 83: >> 84: // Some node check utilities >> 85: bool is_loop_iv(Node* n) { return n == _iv; } > > General code style comment, applies everywhere: add more `const` everywhere. To arguments, and the functions themselves, wherever possible. Thanks for pointing out. I added some in commit 2. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14581#discussion_r1250463493 From dqu at openjdk.org Mon Jul 3 08:30:58 2023 From: dqu at openjdk.org (Daohan Qu) Date: Mon, 3 Jul 2023 08:30:58 GMT Subject: RFR: 8310331: JitTester: Exclude java.lang.Math.random In-Reply-To: References: <7mcEVlD5zO5ds7WO9kLAlsahYblH-kkIYh5ld0AL_tg=.23e79233-3e96-4bd8-924f-75394313a184@github.com> Message-ID: On Mon, 3 Jul 2023 05:19:43 GMT, Tobias Hartmann wrote: >> Test cases generated by JitTester might contain calls to `java.lang.Math.random()`. We could not set a seed for this random call. (In its implementation, `java.lang.Math` create `java.util.Random` instance statically (using the constructor `Random()`) and there is no way to set a seed for it.) 
>> >> Such tests might show different variable values/printouts on each execution (Please refer to [the issue description](https://bugs.openjdk.org/browse/JDK-8310331)). >> >> Since it is meaningless to generate test cases with "unreproducible" results and JitTester is already able to assign random values to the generated variables (with a seed that can be set), maybe we could just exclude the use of `java.lang.Math.random()` in JitTester's test case generation. > > Looks good to me. Thanks for your reviews, @TobiHartmann and @chhagedorn! ------------- PR Comment: https://git.openjdk.org/jdk/pull/14748#issuecomment-1617616776 From pli at openjdk.org Mon Jul 3 08:34:10 2023 From: pli at openjdk.org (Pengfei Li) Date: Mon, 3 Jul 2023 08:34:10 GMT Subject: RFR: 8308994: C2: Re-implement experimental post loop vectorization [v2] In-Reply-To: References: Message-ID: On Fri, 23 Jun 2023 12:22:14 GMT, Emanuel Peter wrote: >> Pengfei Li has updated the pull request incrementally with one additional commit since the last revision: >> >> Address part of comments from Emanuel > > src/hotspot/share/opto/vmaskloop.cpp line 63: > >> 61: if (cl->is_vector_masked()) return; >> 62: // Skip non-post loop >> 63: if (!cl->is_post_loop()) return; > > Check before entering, and assert here. Done > src/hotspot/share/opto/vmaskloop.hpp line 95: > >> 93: } >> 94: return false; >> 95: } > > Do you not want to do this sort of implementation in `SWPointer` instead? There are already methods like `scaled_iv_plus_offset`, so it would fit in next to that, right? It doesn't fit well, as functions in `SWPointer` can only be used for checking patterns in array indices. But this function may be used for checking a loop increment pattern that is not in array indices, for example `a[i] = b[i] * (i + 1)`. We don't have a `SWPointer` constructed for this. I have renamed the function to make the purpose clear.
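For reference, the loop shape mentioned above can be sketched in plain Java as follows. This is only an illustrative example with hypothetical class and method names, not a test from the pull request. The point is that the induction variable `i` appears twice: once in the array indices `a[i]` and `b[i]`, and once in the value expression `(i + 1)`, which is ordinary arithmetic rather than address computation.

```java
import java.util.Arrays;

public class IvInValueSketch {
    // a[i] = b[i] * (i + 1): the induction variable feeds the stored value,
    // not only the array index.
    static int[] scale(int[] b) {
        int[] a = new int[b.length];
        for (int i = 0; i < b.length; i++) {
            a[i] = b[i] * (i + 1);
        }
        return a;
    }

    public static void main(String[] args) {
        int[] b = {5, 5, 5, 5};
        System.out.println(Arrays.toString(scale(b))); // prints [5, 10, 15, 20]
    }
}
```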
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14581#discussion_r1250480876 PR Review Comment: https://git.openjdk.org/jdk/pull/14581#discussion_r1250480274 From pli at openjdk.org Mon Jul 3 08:34:13 2023 From: pli at openjdk.org (Pengfei Li) Date: Mon, 3 Jul 2023 08:34:13 GMT Subject: RFR: 8308994: C2: Re-implement experimental post loop vectorization [v2] In-Reply-To: References: Message-ID: <-Rjm33TqNYRuAQYLf6FL4rnNZOgLQvuDukt5Te-oXNM=.14d0efe5-c5f7-4a2e-8c51-bb18a6f19937@github.com> On Fri, 23 Jun 2023 12:08:34 GMT, Emanuel Peter wrote: >> src/hotspot/share/opto/vmaskloop.hpp line 97: >> >>> 95: } >>> 96: >>> 97: bool is_memory_phi(Node* n) { >> >> Looks like a helper method that could live in `node.hpp` or `cfgnode.hpp`. > > SuperWord also makes similar checks, you could refactor those too. Good suggestion. Done ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14581#discussion_r1250480720 From pli at openjdk.org Mon Jul 3 08:44:09 2023 From: pli at openjdk.org (Pengfei Li) Date: Mon, 3 Jul 2023 08:44:09 GMT Subject: RFR: 8308994: C2: Re-implement experimental post loop vectorization In-Reply-To: References: Message-ID: On Fri, 23 Jun 2023 14:06:33 GMT, Emanuel Peter wrote: > Tests are building... > > I already am getting this, from our build system: `Toolchain: clang (clang/LLVM from Xcode 12.4)`, for the `macosx-aarch64-...` builds. > > ``` > .../src/hotspot/share/opto/vmaskloop.cpp:970:20: error: format string is not a string literal [-Werror,-Wformat-nonliteral] > tty->vprint_cr(format, ap); > ``` > > That means we won't get any test coverage on those platforms from this test run. Build issues are fixed in commit 2 by removing the `va_list` which is not actually used in current code. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/14581#issuecomment-1617635167 From pli at openjdk.org Mon Jul 3 08:44:16 2023 From: pli at openjdk.org (Pengfei Li) Date: Mon, 3 Jul 2023 08:44:16 GMT Subject: RFR: 8308994: C2: Re-implement experimental post loop vectorization [v2] In-Reply-To: References: Message-ID: <63xntDgcTJN-51cfPjP1XsWdNLkeURQuWmE8hluHbIM=.84e6a8b9-66b1-4427-ab2c-355c0c621871@github.com> On Fri, 23 Jun 2023 12:24:57 GMT, Emanuel Peter wrote: >> Pengfei Li has updated the pull request incrementally with one additional commit since the last revision: >> >> Address part of comments from Emanuel > > src/hotspot/share/opto/vmaskloop.cpp line 71: > >> 69: if (cl->loopexit()->in(0) != cl) return; >> 70: // Skip if some loop operations are pinned to the backedge >> 71: if (cl->back_control()->outcnt() != 1) return; > > It would be interesting to have some trace flag that tells us why we bailed out here and did not do the post-loop vectorization. Unless of course it becomes too noisy. Great suggestion! Done. > src/hotspot/share/opto/vmaskloop.cpp line 104: > >> 102: _core_set.clear(); >> 103: _body_set.clear(); >> 104: _body_nodes.clear(); > > Would it make sense to somehow reserve the space, so that we do not allocate multiple times when growing these data structures later? Could you elaborate how to do such reservation in C2? Just allocation with some larger sizes at the beginning? Or any other examples to refer? > src/hotspot/share/opto/vmaskloop.cpp line 172: > >> 170: if (idx != -1) { >> 171: trace_msg(nullptr, "Loop has unreachable node while traversing from head"); >> 172: return false; > > Can this ever happen? Or could you add an assert here? Yes, it happened before. I will try to find a case. > src/hotspot/share/opto/vmaskloop.cpp line 214: > >> 212: } >> 213: } else if (in->is_Phi()) { >> 214: // 2) We don't support phi nodes except the iv phi of the loop > > Add: and memory phi's cannot be reached. 
Done > src/hotspot/share/opto/vmaskloop.cpp line 223: > >> 221: return true; >> 222: } else { >> 223: trace_msg(in, "Found unsupported memory load input"); > > This is a bit generic. Would be nice to have more specific info why it is "unsupported". See my example that hit it. Good suggestion! I have added more `trace_msg()` calls in `VectorMaskedLoop::mem_access_to_swpointer`. > src/hotspot/share/opto/vmaskloop.cpp line 269: > >> 267: Node_List* worklist = new Node_List(_arena); >> 268: if (!collect_statements_helper(store, MemNode::ValueIn, stmt, worklist)) { >> 269: return false; > > Why does the `store` need special handling here? Can you not just throw it on the `worklist`? Would be nice to have the code be shorter ;) Good catch! Fixed. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14581#discussion_r1250481354 PR Review Comment: https://git.openjdk.org/jdk/pull/14581#discussion_r1250488640 PR Review Comment: https://git.openjdk.org/jdk/pull/14581#discussion_r1250483771 PR Review Comment: https://git.openjdk.org/jdk/pull/14581#discussion_r1250492279 PR Review Comment: https://git.openjdk.org/jdk/pull/14581#discussion_r1250494389 PR Review Comment: https://git.openjdk.org/jdk/pull/14581#discussion_r1250492173 From epeter at openjdk.org Mon Jul 3 08:49:11 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 3 Jul 2023 08:49:11 GMT Subject: RFR: 8310308: IR Framework: check for type and size of vector nodes [v16] In-Reply-To: References: Message-ID: > For some changes to `SuperWord`, and maybe auto-vectorization in general, I want to strengthen the IR Framework. > > **Motivation** > I want to not just find the relevant IR nodes, but also assert that they have the maximal length that they could have on the respective platform (given the CPU features and `MaxVectorSize`). 
Without this verification it is possible that a future change leads to a regression where we still vectorize but at shorter vector widths as before - leading to performance loss. > > **How to use it** > > All `IRNode`s in `test/hotspot/jtreg/compiler/lib/ir_framework/IRNode.java` that are created with `vectorNode` are now all matched with their `type` and `size`. The regex might now look something like this: > > `"(\d+(\s){2}(VectorCastF2X.*)+(\s){2}===.*vector[A-Za-z][8]:{int})"` > which would match with IR nodes dumped like that: > `1150 VectorCastF2X === _ 1151 [[ 1146 ]] #vectory[8]:{int} ...` > > The goal was to keep it simple and straight forward. In most cases, you can just use the nodes as before, and implicitly we now check for maximal size automatically. However, in some cases we want to ensure there is no or only a limited number of nodes (`failOn` or comparison `<` or `<=` or `=0`) - in those cases we usually want to make sure there is not any node of any size, so we match with any size by default. The size can also explicitly be constrained using `IRNode.VECTOR_SIZE`. > > Some examples: > 1. `@IR(counts = {IRNode.LOAD_VECTOR_I, " >0 "})` -> search for a `LoadVector` node with `type` `int`, and maximal `size` possible on the machine (limited by CPU features and `MaxVectorSize`). This is the most common use case. > 2. `@IR(failOn = { IRNode.LOAD_VECTOR_L, IRNode.STORE_VECTOR })` -> fail if there is a `LoadVector` with type `long`, of `any` size. > 3. `@IR(counts = { IRNode.XOR_VI, IRNode.VECTOR_SIZE_4, " > 0 "})` -> find at least one `XorV` node with type `int` and exactly `4` elements. Useful for VectorAPI when the vector species is fixed. > 4. `@IR(counts = { IRNode.LOAD_VECTOR_D, IRNode.VECTOR_SIZE + "min(4, max_double)", " >0 " })` -> search for a `LoadVector` node with `type` `double`, and `size` exactly equals to `min(4, max_double)` (so 4 elements, or if the hardware allows fewer `doubles`, then that number). > 5. 
`@IR(counts = { IRNode.ABS_VF, IRNode.VECTOR_SIZE + "min(LoopMaxUnroll, max_float)", ">= 1" })` -> find at least one `AbsV` nodes with type `float`, and the `size` exactly equals to the smaller of `LoopMaxUnroll` or the maximal size allow... Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: fix TestUnorderedReductionPartialVectorization.java ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14539/files - new: https://git.openjdk.org/jdk/pull/14539/files/e7f442e7..c2637731 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14539&range=15 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14539&range=14-15 Stats: 4 lines in 1 file changed: 1 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/14539.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14539/head:pull/14539 PR: https://git.openjdk.org/jdk/pull/14539 From chagedorn at openjdk.org Mon Jul 3 08:50:00 2023 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 3 Jul 2023 08:50:00 GMT Subject: RFR: 8310308: IR Framework: check for type and size of vector nodes [v16] In-Reply-To: References: Message-ID: On Mon, 3 Jul 2023 08:49:11 GMT, Emanuel Peter wrote: >> For some changes to `SuperWord`, and maybe auto-vectorization in general, I want to strengthen the IR Framework. >> >> **Motivation** >> I want to not just find the relevant IR nodes, but also assert that they have the maximal length that they could have on the respective platform (given the CPU features and `MaxVectorSize`). Without this verification it is possible that a future change leads to a regression where we still vectorize but at shorter vector widths as before - leading to performance loss. >> >> **How to use it** >> >> All `IRNode`s in `test/hotspot/jtreg/compiler/lib/ir_framework/IRNode.java` that are created with `vectorNode` are now all matched with their `type` and `size`. 
The regex might now look something like this: >> >> `"(\d+(\s){2}(VectorCastF2X.*)+(\s){2}===.*vector[A-Za-z][8]:{int})"` >> which would match with IR nodes dumped like that: >> `1150 VectorCastF2X === _ 1151 [[ 1146 ]] #vectory[8]:{int} ...` >> >> The goal was to keep it simple and straight forward. In most cases, you can just use the nodes as before, and implicitly we now check for maximal size automatically. However, in some cases we want to ensure there is no or only a limited number of nodes (`failOn` or comparison `<` or `<=` or `=0`) - in those cases we usually want to make sure there is not any node of any size, so we match with any size by default. The size can also explicitly be constrained using `IRNode.VECTOR_SIZE`. >> >> Some examples: >> 1. `@IR(counts = {IRNode.LOAD_VECTOR_I, " >0 "})` -> search for a `LoadVector` node with `type` `int`, and maximal `size` possible on the machine (limited by CPU features and `MaxVectorSize`). This is the most common use case. >> 2. `@IR(failOn = { IRNode.LOAD_VECTOR_L, IRNode.STORE_VECTOR })` -> fail if there is a `LoadVector` with type `long`, of `any` size. >> 3. `@IR(counts = { IRNode.XOR_VI, IRNode.VECTOR_SIZE_4, " > 0 "})` -> find at least one `XorV` node with type `int` and exactly `4` elements. Useful for VectorAPI when the vector species is fixed. >> 4. `@IR(counts = { IRNode.LOAD_VECTOR_D, IRNode.VECTOR_SIZE + "min(4, max_double)", " >0 " })` -> search for a `LoadVector` node with `type` `double`, and `size` exactly equals to `min(4, max_double)` (so 4 elements, or if the hardware allows fewer `doubles`, then that number). >> 5. `@IR(counts = { IRNode.ABS_VF, IRNode.VECTOR_SIZE + "min(LoopMaxUnroll, max_float)", ">= 1" })` -> find at least one `AbsV` nodes with type `float`, and the `size` exactly equals to the smaller of... 
> > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > fix TestUnorderedReductionPartialVectorization.java Thanks for doing the updates! I have some more comments. I've also went through the test changes which look good. ------------- PR Review: https://git.openjdk.org/jdk/pull/14539#pullrequestreview-1510536110 From chagedorn at openjdk.org Mon Jul 3 08:50:10 2023 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 3 Jul 2023 08:50:10 GMT Subject: RFR: 8310308: IR Framework: check for type and size of vector nodes [v15] In-Reply-To: References: Message-ID: On Mon, 3 Jul 2023 07:19:09 GMT, Emanuel Peter wrote: >> For some changes to `SuperWord`, and maybe auto-vectorization in general, I want to strengthen the IR Framework. >> >> **Motivation** >> I want to not just find the relevant IR nodes, but also assert that they have the maximal length that they could have on the respective platform (given the CPU features and `MaxVectorSize`). Without this verification it is possible that a future change leads to a regression where we still vectorize but at shorter vector widths as before - leading to performance loss. >> >> **How to use it** >> >> All `IRNode`s in `test/hotspot/jtreg/compiler/lib/ir_framework/IRNode.java` that are created with `vectorNode` are now all matched with their `type` and `size`. The regex might now look something like this: >> >> `"(\d+(\s){2}(VectorCastF2X.*)+(\s){2}===.*vector[A-Za-z][8]:{int})"` >> which would match with IR nodes dumped like that: >> `1150 VectorCastF2X === _ 1151 [[ 1146 ]] #vectory[8]:{int} ...` >> >> The goal was to keep it simple and straight forward. In most cases, you can just use the nodes as before, and implicitly we now check for maximal size automatically. 
However, in some cases we want to ensure there is no or only a limited number of nodes (`failOn` or comparison `<` or `<=` or `=0`) - in those cases we usually want to make sure there is not any node of any size, so we match with any size by default. The size can also explicitly be constrained using `IRNode.VECTOR_SIZE`. >> >> Some examples: >> 1. `@IR(counts = {IRNode.LOAD_VECTOR_I, " >0 "})` -> search for a `LoadVector` node with `type` `int`, and maximal `size` possible on the machine (limited by CPU features and `MaxVectorSize`). This is the most common use case. >> 2. `@IR(failOn = { IRNode.LOAD_VECTOR_L, IRNode.STORE_VECTOR })` -> fail if there is a `LoadVector` with type `long`, of `any` size. >> 3. `@IR(counts = { IRNode.XOR_VI, IRNode.VECTOR_SIZE_4, " > 0 "})` -> find at least one `XorV` node with type `int` and exactly `4` elements. Useful for VectorAPI when the vector species is fixed. >> 4. `@IR(counts = { IRNode.LOAD_VECTOR_D, IRNode.VECTOR_SIZE + "min(4, max_double)", " >0 " })` -> search for a `LoadVector` node with `type` `double`, and `size` exactly equals to `min(4, max_double)` (so 4 elements, or if the hardware allows fewer `doubles`, then that number). >> 5. `@IR(counts = { IRNode.ABS_VF, IRNode.VECTOR_SIZE + "min(LoopMaxUnroll, max_float)", ">= 1" })` -> find at least one `AbsV` nodes with type `float`, and the `size` exactly equals to the smaller of... > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > Fix 2 IR framework tests test/hotspot/jtreg/compiler/c2/irTests/TestVectorConditionalMove.java line 209: > 207: IRNode.VECTOR_MASK_CMP_F, ">0", > 208: IRNode.VECTOR_BLEND_F, ">0", > 209: IRNode.STORE_VECTOR, ">0"}, Since "IR node count > 0" is quite a common usage, we could think about introducing a separate "hasAny" `@IR` node attribute at some point and replace all usages. But this is of course unrelated to this patch and could be done separately at some point. 
test/hotspot/jtreg/compiler/lib/ir_framework/IRNode.java line 83: > 81: * or {@code <=} or {@code =0}, the default size is {@link #VECTOR_SIZE_ANY}, allowing any > 82: * size. The motivation for these default values is that in most cases one wants to have > 83: * vectorization with maximal vector width, or no vectorization of any vectro width. Suggestion: * vectorization with maximal vector width, or no vectorization of any vector width. test/hotspot/jtreg/compiler/lib/ir_framework/IRNode.java line 2268: > 2266: size = Integer.parseInt(sizes[i]); > 2267: } catch (NumberFormatException e) { > 2268: TestFormat.checkNoReport(false, "Vector node has invalid size \"" + sizes[i] + "\", in \"" + sizeString + "\""); Can be replaced by a direct `throw new TestFormatException(...)`. test/hotspot/jtreg/compiler/lib/ir_framework/IRNode.java line 2316: > 2314: tag_val = Integer.parseInt(tag); > 2315: } catch (NumberFormatException e) { > 2316: TestFormat.checkNoReport(false, "Vector node has invalid size in \"min(...)\", argument " + i + ", \"" + tag + "\", in \"" + sizeTagString + "\""); Can be replaced by a direct `throw new TestFormatException(...)`. test/hotspot/jtreg/compiler/lib/ir_framework/IRNode.java line 2355: > 2353: boolean avx512bw = vmInfo.hasCPUFeature("avx512bw"); > 2354: if (avx512) { > 2355: maxBytes = 64; Since you set the default to 64 above, you don't need this assignment here anymore. test/hotspot/jtreg/compiler/lib/ir_framework/driver/irmatching/irrule/constraint/raw/RawCountsConstraint.java line 86: > 84: default -> { > 85: throw new TestFormatException("Comparator not handled: " + comparison.getComparator()); > 86: } For throw statements, you can omit the braces: Suggestion: case "!=" -> throw new TestFormatException("Not-equal comparator not supported for node count: \"" + comparison.getComparator() + "\".
Please rewrite the rule."); default -> throw new TestFormatException("Comparator not handled: " + comparison.getComparator()); Maybe we should also wrap some of the long lines in this `switch` statement. test/hotspot/jtreg/compiler/lib/ir_framework/driver/irmatching/parser/VMInfoParser.java line 37: > 35: import java.util.Map; > 36: import java.util.regex.Matcher; > 37: import java.util.regex.Pattern; Some imports are unused: Suggestion: import compiler.lib.ir_framework.TestFramework; import compiler.lib.ir_framework.shared.TestFrameworkException; import compiler.lib.ir_framework.test.VMInfoPrinter; import java.util.HashMap; import java.util.Map; import java.util.regex.Matcher; import java.util.regex.Pattern; test/hotspot/jtreg/compiler/lib/ir_framework/test/VMInfoPrinter.java line 37: > 35: import java.util.List; > 36: import java.util.Objects; > 37: import java.util.function.Function; Some imports are unused: Suggestion: import compiler.lib.ir_framework.shared.TestFrameworkSocket; import jdk.test.whitebox.WhiteBox; test/hotspot/jtreg/compiler/lib/ir_framework/test/VMInfoPrinter.java line 57: > 55: vmInfo.append("MaxVectorSize:" + maxVectorSize).append(System.lineSeparator()); > 56: long loopMaxUnroll = WHITE_BOX.getIntxVMFlag("LoopMaxUnroll"); > 57: vmInfo.append("LoopMaxUnroll:" + loopMaxUnroll).append(System.lineSeparator()); When using a `StringBuilder`, you should avoid concatenation with `+` and use `append()` calls instead: Suggestion: vmInfo.append("cpuFeatures:").append(cpuFeatures).append(System.lineSeparator()); long maxVectorSize = WHITE_BOX.getIntxVMFlag("MaxVectorSize"); vmInfo.append("MaxVectorSize:").append(maxVectorSize).append(System.lineSeparator()); long loopMaxUnroll = WHITE_BOX.getIntxVMFlag("LoopMaxUnroll"); vmInfo.append("LoopMaxUnroll:").append(loopMaxUnroll).append(System.lineSeparator()); ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14539#discussion_r1250444976 PR Review Comment:
https://git.openjdk.org/jdk/pull/14539#discussion_r1250381492 PR Review Comment: https://git.openjdk.org/jdk/pull/14539#discussion_r1250428438 PR Review Comment: https://git.openjdk.org/jdk/pull/14539#discussion_r1250427303 PR Review Comment: https://git.openjdk.org/jdk/pull/14539#discussion_r1250426196 PR Review Comment: https://git.openjdk.org/jdk/pull/14539#discussion_r1250405028 PR Review Comment: https://git.openjdk.org/jdk/pull/14539#discussion_r1250414863 PR Review Comment: https://git.openjdk.org/jdk/pull/14539#discussion_r1250417590 PR Review Comment: https://git.openjdk.org/jdk/pull/14539#discussion_r1250420248 From xgong at openjdk.org Mon Jul 3 08:53:01 2023 From: xgong at openjdk.org (Xiaohong Gong) Date: Mon, 3 Jul 2023 08:53:01 GMT Subject: RFR: 8309531: Incorrect result with unwrapped iotaShuffle. [v6] In-Reply-To: <79P4C0c_nBrk5vF8IQkWhz3uALJPnLs-XE8BKnEC6Ho=.43391ac3-78ee-4a57-8042-6bf854a5ffb1@github.com> References: <0OY_OdZGlSeQhVmmSyYitp4XjtKSXziY1p0wKy6V68s=.68c8351c-7f6e-44ca-91dd-35fa83cfbb8d@github.com> <79P4C0c_nBrk5vF8IQkWhz3uALJPnLs-XE8BKnEC6Ho=.43391ac3-78ee-4a57-8042-6bf854a5ffb1@github.com> Message-ID: On Mon, 3 Jul 2023 07:51:20 GMT, Jatin Bhateja wrote: >> Patch fixes following two issues in iotaShuffle inline expander with unwrapped indices. >> 1) Disable intrinsification if effective index do not lie within byte value range. >> 2) Use GT predicate while computing comparison mask for all the indices above vector length. >> >> No performance degradation seen with existing slice/unslice operations which internally calls wrapped iotaShuffle. >> >> This interim patch addresses incorrectness around iotaShuffle till we introduce modified shuffle implementations >> with JDK-8310691. >> >> Kindly review and share feedback. >> >> Best Regards, >> Jatin > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Review comments resolution. AArch64 tests pass on linux. So LGTM! 
------------- Marked as reviewed by xgong (Committer). PR Review: https://git.openjdk.org/jdk/pull/14700#pullrequestreview-1510696799 From dqu at openjdk.org Mon Jul 3 09:03:02 2023 From: dqu at openjdk.org (Daohan Qu) Date: Mon, 3 Jul 2023 09:03:02 GMT Subject: Integrated: 8310331: JitTester: Exclude java.lang.Math.random In-Reply-To: <7mcEVlD5zO5ds7WO9kLAlsahYblH-kkIYh5ld0AL_tg=.23e79233-3e96-4bd8-924f-75394313a184@github.com> References: <7mcEVlD5zO5ds7WO9kLAlsahYblH-kkIYh5ld0AL_tg=.23e79233-3e96-4bd8-924f-75394313a184@github.com> Message-ID: On Sun, 2 Jul 2023 09:16:55 GMT, Daohan Qu wrote: > Test cases generated by JitTester might contain calls to `java.lang.Math.random()`. We cannot set a seed for this call. (In its implementation, `java.lang.Math` creates a `java.util.Random` instance statically (using the constructor `Random()`), and there is no way to set a seed for it.) > > Such tests might show different variable values/printouts on each execution (please refer to [the issue description](https://bugs.openjdk.org/browse/JDK-8310331)). > > Since it is meaningless to generate test cases with "unreproducible" results, and JitTester is already able to assign random values to the generated variables (with a seed that can be set), maybe we could just exclude the use of `java.lang.Math.random()` in JitTester's test case generation. This pull request has now been integrated.
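The reproducibility argument in the description above can be illustrated with a small sketch. The class and method names (`SeededRandomDemo`, `draw`) are invented for this illustration and are not part of JitTester; the sketch only relies on the documented behavior that two `java.util.Random` instances created with the same seed produce the same sequence, while `Math.random()` offers no seed parameter.

```java
import java.util.Arrays;
import java.util.Random;

public class SeededRandomDemo {
    // Draws `count` values from a generator created with an explicit seed.
    // Hypothetical helper, used only for this illustration.
    static int[] draw(long seed, int count) {
        Random r = new Random(seed);
        int[] out = new int[count];
        for (int i = 0; i < count; i++) {
            out[i] = r.nextInt(100);
        }
        return out;
    }

    public static void main(String[] args) {
        // Same seed, same sequence: a generated test using this is reproducible.
        System.out.println(Arrays.equals(draw(42L, 5), draw(42L, 5))); // true

        // Math.random() takes no seed, so this value generally differs from run to run.
        System.out.println(Math.random());
    }
}
```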
Changeset: 8e0ca8e0 Author: Daohan Qu Committer: Tobias Hartmann URL: https://git.openjdk.org/jdk/commit/8e0ca8e05c0dcf201b2ede87620c6cde79e7d550 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod 8310331: JitTester: Exclude java.lang.Math.random Reviewed-by: thartmann, chagedorn ------------- PR: https://git.openjdk.org/jdk/pull/14748 From chagedorn at openjdk.org Mon Jul 3 09:07:02 2023 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 3 Jul 2023 09:07:02 GMT Subject: RFR: 8309894: compiler/vectorapi/VectorLogicalOpIdentityTest.java fails on SVE system with UseSVE=0 In-Reply-To: <2kGmMRiqW2Myc4Iumj9PBSskUycPz7GzFMJ0Xe5Qg-4=.24b7f369-d291-4842-afa6-d2c7cd75cdd8@github.com> References: <2kGmMRiqW2Myc4Iumj9PBSskUycPz7GzFMJ0Xe5Qg-4=.24b7f369-d291-4842-afa6-d2c7cd75cdd8@github.com> Message-ID: <8lV2gbQeZAFeYjU0FmNUd-Uehn4FhahFNaZRO9m-jkg=.06214543-a779-4884-a0b8-5983904e830c@github.com> On Mon, 19 Jun 2023 01:49:57 GMT, Xiaohong Gong wrote: > This test fails with several IR check failures when run on ARM SVE systems with vm option `-XX:UseSVE=0`. Here is one of the failed IR rule: > > > @IR(counts = {IRNode.AND_V, "1"}, applyIfCPUFeatureOr = {"sve", "true", "avx512", "true"}) > public static void testAndMaskSameValue1() > > The specified IR in the test depends on the platform's predicate feature. Hence the IR check can be applied only on ARM SVE or X86 AVX512 systems. But with `-XX:UseSVE=0` on ARM SVE machines, JVM will disable SVE feature for compiler. But the CPU feature is not changed. To guarantee the IR rule is run with SVE as expected, it has to add another condition like `applyIf = {"UseSVE", ">0"}`. Consider `UseSVE` is an ARM specific option, it can be used only on ARM CPUs. 
So the right IR rules should be: > > > @IR(counts = {IRNode.AND_V, "1"}, applyIfCPUFeature = {"sve", "true"}, applyIf = {"UseSVE", "> 0"}) > @IR(counts = {IRNode.AND_V, "1"}, applyIfCPUFeature = {"avx512", "true"}) > public static void testAndMaskSameValue1() > > > This patch also changes the check order of conditions for a IR rule. It's better to check `applyIfCPUFeature` before `applyIf`, in case the vm option is invalid on running hardware, which makes test throw exception and abort. > > Verified on X86 systems with `UseAVX=1/2/3` by removing the test from ProblemList.txt, and SVE systems with `UseSVE=0/1`. > Accepted as a temporary fix that has to be reverted with [JDK-8311130](https://bugs.openjdk.org/browse/JDK-8311130). I agree with this temporary fix and reverting it again when syncing the flags with the CPU features in JDK-8311130. test/hotspot/jtreg/compiler/lib/ir_framework/README.md line 99: > 97: Sometimes, an `@IR` rule should only be applied if a certain CPU feature is present. This can be done with the attributes `applyIfCPUFeatureXXX` in [@IR](./IR.java) which follow the same logic as the `applyIfXXX` methods for flags in the previous section. An example with `applyIfCPUFeatureXXX` can be found in [TestCPUFeatureCheck](../../../testlibrary_tests/ir_framework/tests/TestCPUFeatureCheck.java) (internal framework test). > 98: > 99: If a `@Test` annotated method has multiple preconditions (for example `applyIf` and `applyIfCPUFeature`), they are evaluated as a logical conjunction. It's worth noting that flags in `applyIf` are checked only if the cpu features in `applyIfCPUFeature` are matched when they are both specified. This can avoid the vm option being evaluated on hardware that does not support it. An example with both `applyIfCPUFeatureXXX` and `applyIfXXX` can be found in [TestPreconditions](../../../testlibrary_tests/ir_framework/tests/TestPreconditions.java) (internal framework test). 
Suggestion: If a `@Test` annotated method has multiple preconditions (for example `applyIf` and `applyIfCPUFeature`), they are evaluated as a logical conjunction. It's worth noting that flags in `applyIf` are checked only if the CPU features in `applyIfCPUFeature` are matched when they are both specified. This avoids the VM flag being evaluated on hardware that does not support it. An example with both `applyIfCPUFeatureXXX` and `applyIfXXX` can be found in [TestPreconditions](../../../testlibrary_tests/ir_framework/tests/TestPreconditions.java) (internal framework test). test/hotspot/jtreg/testlibrary_tests/ir_framework/tests/TestPreconditions.java line 65: > 63: // Note that precondition `applyIfCPUFeature` will be evaluated first with > 64: // early return. Hence the IR check should not be applied on non-aarch64 > 65: // systems, and no exception happens. Maybe you can expand here that no exception is thrown because we are not checking the value of the unsupported `UseSVE` flag on non-aarch64 systems. Same for x86 and `UseAVX` below. ------------- Marked as reviewed by chagedorn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/14533#pullrequestreview-1506628333 PR Review Comment: https://git.openjdk.org/jdk/pull/14533#discussion_r1250513134 PR Review Comment: https://git.openjdk.org/jdk/pull/14533#discussion_r1247511655 From pli at openjdk.org Mon Jul 3 09:07:12 2023 From: pli at openjdk.org (Pengfei Li) Date: Mon, 3 Jul 2023 09:07:12 GMT Subject: RFR: 8308994: C2: Re-implement experimental post loop vectorization [v2] In-Reply-To: References: <0s9ixJcCQIRzJ3h4tpPwVeC7HmYbdDqhd3V6BWZDUTg=.f2b9dad5-73f2-4a34-b24d-639f4fe3de9e@github.com> Message-ID: On Tue, 27 Jun 2023 16:58:52 GMT, Emanuel Peter wrote: >> src/hotspot/share/opto/vmaskloop.cpp line 550: >> >>> 548: // 2) Address is growing down (index scale * loop stride < 0) >>> 549: // 3) Memory access scale is different from data size >>> 550: // 4) The loop increment node is on the SWPointer's node stack >> >> Why should the `incr` not be on the node stack? > > Does that not prevent `a[i+1]` from being accepted? That's really a corner case. In C2's ideal graph, most loop statements eventually use the loop induction variable `phi` node as an input. That's good. But there is one exception: a loop statement may have a sub-expression of `iv + stride`. In such cases, IGVN may do common sub-expression elimination, and the inputs may then come from the loop increment node. As the final step of the vector masked transformation replaces the loop increment node, the calculation of `iv + stride` is replaced as well, which causes a miscompilation. In the current patch, I duplicate the loop increment pattern for update (that's why we have `is_loop_incr_pattern()`, see commit 2) to avoid this issue, but currently this only applies to expressions that are not in array indices, such as `a[i] = i + 1`. For patterns like `a[i+1] = i`, I'm still looking for a better approach.
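For readers following along, the two source-level loop shapes mentioned in this reply can be sketched in plain Java. This is only an illustration: the class and method names are made up, and the sketch does not show what C2 generates. It only shows where the `iv + stride` sub-expression sits in each case (in the stored value versus in the array index).

```java
import java.util.Arrays;

public class LoopIncrementPatterns {
    // `i + 1` appears only in the stored value, not in the array index.
    // Per the reply above, this shape is covered by the duplicated
    // loop increment pattern in the current patch.
    static void valueUsesIncrement(int[] a) {
        for (int i = 0; i < a.length; i++) {
            a[i] = i + 1;
        }
    }

    // `i + 1` appears in the array index itself.
    // Per the reply above, this shape is not handled yet.
    static void indexUsesIncrement(int[] a) {
        for (int i = 0; i < a.length - 1; i++) {
            a[i + 1] = i;
        }
    }

    public static void main(String[] args) {
        int[] x = new int[4];
        valueUsesIncrement(x);
        System.out.println(Arrays.toString(x)); // [1, 2, 3, 4]

        int[] y = new int[4];
        indexUsesIncrement(y);
        System.out.println(Arrays.toString(y)); // [0, 0, 1, 2]
    }
}
```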
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14581#discussion_r1250523973 From pli at openjdk.org Mon Jul 3 09:15:20 2023 From: pli at openjdk.org (Pengfei Li) Date: Mon, 3 Jul 2023 09:15:20 GMT Subject: RFR: 8308994: C2: Re-implement experimental post loop vectorization [v2] In-Reply-To: <0s9ixJcCQIRzJ3h4tpPwVeC7HmYbdDqhd3V6BWZDUTg=.f2b9dad5-73f2-4a34-b24d-639f4fe3de9e@github.com> References: <0s9ixJcCQIRzJ3h4tpPwVeC7HmYbdDqhd3V6BWZDUTg=.f2b9dad5-73f2-4a34-b24d-639f4fe3de9e@github.com> Message-ID: On Fri, 23 Jun 2023 14:44:43 GMT, Emanuel Peter wrote: >> Pengfei Li has updated the pull request incrementally with one additional commit since the last revision: >> >> Address part of comments from Emanuel > > src/hotspot/share/opto/vmaskloop.cpp line 548: > >> 546: // Check supported memory access via SWPointer. It's not supported if >> 547: // 1) The constructed SWPointer is invalid >> 548: // 2) Address is growing down (index scale * loop stride < 0) > > Is that a limitation that could be removed in the future? Yes, at least on SVE2. For memory accesses that grow up, we generate vector masks whose active lanes are in the lower part of a vector. For memory accesses that grow down it is the opposite: the active lanes are in the higher part of a vector. Only SVE2 on AArch64 can generate vector masks this way; the current SVE(1) cannot. I'm not sure whether x86 AVX-512 has a similar ability. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14581#discussion_r1250534252 From epeter at openjdk.org Mon Jul 3 09:17:23 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 3 Jul 2023 09:17:23 GMT Subject: RFR: 8310308: IR Framework: check for type and size of vector nodes [v17] In-Reply-To: References: Message-ID: <7cLT4DKDMwRGhsYIj_1CPh61lmgyJSup69-_DNT6DIQ=.8a828bfb-0131-4098-8f4c-5813c3dc0a50@github.com> > For some changes to `SuperWord`, and maybe auto-vectorization in general, I want to strengthen the IR Framework.
> > **Motivation** > I want to not just find the relevant IR nodes, but also assert that they have the maximal length that they could have on the respective platform (given the CPU features and `MaxVectorSize`). Without this verification it is possible that a future change leads to a regression where we still vectorize but at shorter vector widths as before - leading to performance loss. > > **How to use it** > > All `IRNode`s in `test/hotspot/jtreg/compiler/lib/ir_framework/IRNode.java` that are created with `vectorNode` are now all matched with their `type` and `size`. The regex might now look something like this: > > `"(\d+(\s){2}(VectorCastF2X.*)+(\s){2}===.*vector[A-Za-z][8]:{int})"` > which would match with IR nodes dumped like that: > `1150 VectorCastF2X === _ 1151 [[ 1146 ]] #vectory[8]:{int} ...` > > The goal was to keep it simple and straight forward. In most cases, you can just use the nodes as before, and implicitly we now check for maximal size automatically. However, in some cases we want to ensure there is no or only a limited number of nodes (`failOn` or comparison `<` or `<=` or `=0`) - in those cases we usually want to make sure there is not any node of any size, so we match with any size by default. The size can also explicitly be constrained using `IRNode.VECTOR_SIZE`. > > Some examples: > 1. `@IR(counts = {IRNode.LOAD_VECTOR_I, " >0 "})` -> search for a `LoadVector` node with `type` `int`, and maximal `size` possible on the machine (limited by CPU features and `MaxVectorSize`). This is the most common use case. > 2. `@IR(failOn = { IRNode.LOAD_VECTOR_L, IRNode.STORE_VECTOR })` -> fail if there is a `LoadVector` with type `long`, of `any` size. > 3. `@IR(counts = { IRNode.XOR_VI, IRNode.VECTOR_SIZE_4, " > 0 "})` -> find at least one `XorV` node with type `int` and exactly `4` elements. Useful for VectorAPI when the vector species is fixed. > 4. 
`@IR(counts = { IRNode.LOAD_VECTOR_D, IRNode.VECTOR_SIZE + "min(4, max_double)", " >0 " })` -> search for a `LoadVector` node with `type` `double`, and `size` exactly equals to `min(4, max_double)` (so 4 elements, or if the hardware allows fewer `doubles`, then that number). > 5. `@IR(counts = { IRNode.ABS_VF, IRNode.VECTOR_SIZE + "min(LoopMaxUnroll, max_float)", ">= 1" })` -> find at least one `AbsV` nodes with type `float`, and the `size` exactly equals to the smaller of `LoopMaxUnroll` or the maximal size allow... Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: Apply suggestions from code review Co-authored-by: Christian Hagedorn ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14539/files - new: https://git.openjdk.org/jdk/pull/14539/files/c2637731..9ba2aea2 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14539&range=16 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14539&range=15-16 Stats: 25 lines in 4 files changed: 0 ins; 18 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/14539.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14539/head:pull/14539 PR: https://git.openjdk.org/jdk/pull/14539 From pli at openjdk.org Mon Jul 3 09:23:12 2023 From: pli at openjdk.org (Pengfei Li) Date: Mon, 3 Jul 2023 09:23:12 GMT Subject: RFR: 8308994: C2: Re-implement experimental post loop vectorization [v2] In-Reply-To: References: Message-ID: On Fri, 23 Jun 2023 14:54:27 GMT, Emanuel Peter wrote: >> src/hotspot/share/opto/vmaskloop.cpp line 317: >> >>> 315: >>> 316: // Find element basic type for each vectorization candidate node >>> 317: bool VectorMaskedLoop::find_vector_element_types() { >> >> This is very similar to `SuperWord::compute_vector_element_type`. It would be nice to extract it from both and have some shared utility, right? > > Or is there a clear reason why the two are too different? We need more investigation and discussions about this. 
Will discuss with you later. >> src/hotspot/share/opto/vmaskloop.cpp line 337: >> >>> 335: // For load node, check if it has the same vector element size with >>> 336: // the bottom type of the statement >>> 337: if (!same_element_size(mem_type, stmt_bottom_type)) { >> >> Can this limitation be removed in the future? > > Write: > Vector element size does not match that of the store in the statement. Yes, we have tried supporting type conversions (between different type sizes), but the current solution is not mature and is not included in this patch. That is why this limitation was added here. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14581#discussion_r1250546161 PR Review Comment: https://git.openjdk.org/jdk/pull/14581#discussion_r1250545053 From pli at openjdk.org Mon Jul 3 09:23:15 2023 From: pli at openjdk.org (Pengfei Li) Date: Mon, 3 Jul 2023 09:23:15 GMT Subject: RFR: 8308994: C2: Re-implement experimental post loop vectorization [v2] In-Reply-To: <0s9ixJcCQIRzJ3h4tpPwVeC7HmYbdDqhd3V6BWZDUTg=.f2b9dad5-73f2-4a34-b24d-639f4fe3de9e@github.com> References: <0s9ixJcCQIRzJ3h4tpPwVeC7HmYbdDqhd3V6BWZDUTg=.f2b9dad5-73f2-4a34-b24d-639f4fe3de9e@github.com> Message-ID: On Fri, 23 Jun 2023 14:45:20 GMT, Emanuel Peter wrote: >> Pengfei Li has updated the pull request incrementally with one additional commit since the last revision: >> >> Address part of comments from Emanuel > > src/hotspot/share/opto/vmaskloop.cpp line 549: > >> 547: // 1) The constructed SWPointer is invalid >> 548: // 2) Address is growing down (index scale * loop stride < 0) >> 549: // 3) Memory access scale is different from data size > > I guess this could also be relaxed for strided accesses in the future? Exactly! I have tried supporting some basic strided accesses. The code is not included in this patch as it's not that beneficial on some CPUs and requires more C2 refactoring.
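To make the limitations discussed in these replies concrete, the following plain Java loops sketch the kinds of access shapes the quoted code comments describe: an address that grows down, a statement whose load and store have different element sizes, and a strided access. The class and method names are invented for illustration, and whether a specific loop is actually rejected by the patch is an assumption based only on the comments quoted above.

```java
import java.util.Arrays;

public class UnsupportedAccessShapes {
    // Address grows down: positive index scale, negative loop stride,
    // so (index scale * loop stride) < 0.
    static void growingDown(int[] a) {
        for (int i = a.length - 1; i >= 0; i--) {
            a[i] = i;
        }
    }

    // Element sizes differ within one statement:
    // the load reads 4-byte ints, the store writes 2-byte shorts.
    // Assumes src.length >= dst.length.
    static void sizeChangingConversion(int[] src, short[] dst) {
        for (int i = 0; i < dst.length; i++) {
            dst[i] = (short) src[i];
        }
    }

    // Strided access: each iteration advances the address by two ints,
    // while only one int is stored.
    static void strided(int[] a) {
        for (int i = 0; 2 * i < a.length; i++) {
            a[2 * i] = i;
        }
    }

    public static void main(String[] args) {
        int[] down = new int[4];
        growingDown(down);
        System.out.println(Arrays.toString(down)); // [0, 1, 2, 3]

        short[] narrowed = new short[3];
        sizeChangingConversion(new int[]{1, 70000, -1}, narrowed);
        System.out.println(Arrays.toString(narrowed)); // [1, 4464, -1]

        int[] everyOther = new int[5];
        strided(everyOther);
        System.out.println(Arrays.toString(everyOther)); // [0, 0, 1, 0, 2]
    }
}
```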
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14581#discussion_r1250541657 From pli at openjdk.org Mon Jul 3 09:36:10 2023 From: pli at openjdk.org (Pengfei Li) Date: Mon, 3 Jul 2023 09:36:10 GMT Subject: RFR: 8308994: C2: Re-implement experimental post loop vectorization [v2] In-Reply-To: References: Message-ID: On Fri, 23 Jun 2023 14:56:25 GMT, Emanuel Peter wrote: >> Pengfei Li has updated the pull request incrementally with one additional commit since the last revision: >> >> Address part of comments from Emanuel > > src/hotspot/share/opto/vmaskloop.cpp line 363: > >> 361: // Otherwise, use signed subword type or the statement's bottom type >> 362: if (subword_stmt) { >> 363: set_elem_bt(node, get_signed_subword_bt(stmt_bottom_type)); > > Why are you taking only the signed subword type, and not unsigned (eg for char you take short)? The current SuperWord also does it this way (see `SuperWord::container_type()`). A main reason is that some matching rules on some backends (like x86) only match signed subword types. AFAICR, it would be fine to remove this restriction for AArch64. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14581#discussion_r1250564508 From epeter at openjdk.org Mon Jul 3 09:37:22 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 3 Jul 2023 09:37:22 GMT Subject: RFR: 8310308: IR Framework: check for type and size of vector nodes [v18] In-Reply-To: References: Message-ID: > For some changes to `SuperWord`, and maybe auto-vectorization in general, I want to strengthen the IR Framework. > > **Motivation** > I want to not just find the relevant IR nodes, but also assert that they have the maximal length that they could have on the respective platform (given the CPU features and `MaxVectorSize`). Without this verification it is possible that a future change leads to a regression where we still vectorize but at shorter vector widths as before - leading to performance loss.
> > **How to use it** > > All `IRNode`s in `test/hotspot/jtreg/compiler/lib/ir_framework/IRNode.java` that are created with `vectorNode` are now all matched with their `type` and `size`. The regex might now look something like this: > > `"(\d+(\s){2}(VectorCastF2X.*)+(\s){2}===.*vector[A-Za-z][8]:{int})"` > which would match with IR nodes dumped like that: > `1150 VectorCastF2X === _ 1151 [[ 1146 ]] #vectory[8]:{int} ...` > > The goal was to keep it simple and straight forward. In most cases, you can just use the nodes as before, and implicitly we now check for maximal size automatically. However, in some cases we want to ensure there is no or only a limited number of nodes (`failOn` or comparison `<` or `<=` or `=0`) - in those cases we usually want to make sure there is not any node of any size, so we match with any size by default. The size can also explicitly be constrained using `IRNode.VECTOR_SIZE`. > > Some examples: > 1. `@IR(counts = {IRNode.LOAD_VECTOR_I, " >0 "})` -> search for a `LoadVector` node with `type` `int`, and maximal `size` possible on the machine (limited by CPU features and `MaxVectorSize`). This is the most common use case. > 2. `@IR(failOn = { IRNode.LOAD_VECTOR_L, IRNode.STORE_VECTOR })` -> fail if there is a `LoadVector` with type `long`, of `any` size. > 3. `@IR(counts = { IRNode.XOR_VI, IRNode.VECTOR_SIZE_4, " > 0 "})` -> find at least one `XorV` node with type `int` and exactly `4` elements. Useful for VectorAPI when the vector species is fixed. > 4. `@IR(counts = { IRNode.LOAD_VECTOR_D, IRNode.VECTOR_SIZE + "min(4, max_double)", " >0 " })` -> search for a `LoadVector` node with `type` `double`, and `size` exactly equals to `min(4, max_double)` (so 4 elements, or if the hardware allows fewer `doubles`, then that number). > 5. 
`@IR(counts = { IRNode.ABS_VF, IRNode.VECTOR_SIZE + "min(LoopMaxUnroll, max_float)", ">= 1" })` -> find at least one `AbsV` nodes with type `float`, and the `size` exactly equals to the smaller of `LoopMaxUnroll` or the maximal size allow... Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: more for Christian's reviews ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14539/files - new: https://git.openjdk.org/jdk/pull/14539/files/9ba2aea2..e1e7613c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14539&range=17 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14539&range=16-17 Stats: 17 lines in 2 files changed: 6 ins; 3 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/14539.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14539/head:pull/14539 PR: https://git.openjdk.org/jdk/pull/14539 From epeter at openjdk.org Mon Jul 3 09:37:22 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 3 Jul 2023 09:37:22 GMT Subject: RFR: 8310308: IR Framework: check for type and size of vector nodes [v15] In-Reply-To: References: Message-ID: On Mon, 3 Jul 2023 08:06:19 GMT, Christian Hagedorn wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix 2 IR framework tests > > test/hotspot/jtreg/compiler/c2/irTests/TestVectorConditionalMove.java line 209: > >> 207: IRNode.VECTOR_MASK_CMP_F, ">0", >> 208: IRNode.VECTOR_BLEND_F, ">0", >> 209: IRNode.STORE_VECTOR, ">0"}, > > Since "IR node count > 0" is quite a common usage, we could think about introducing a separate "hasAny" `@IR` node attribute at some point and replace all usages. But this is of course unrelated to this patch and could be done separately at some point. 
It would be nice not to have to write ">0" all the time :) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14539#discussion_r1250562850 From pli at openjdk.org Mon Jul 3 09:48:11 2023 From: pli at openjdk.org (Pengfei Li) Date: Mon, 3 Jul 2023 09:48:11 GMT Subject: RFR: 8308994: C2: Re-implement experimental post loop vectorization [v2] In-Reply-To: References: Message-ID: On Fri, 23 Jun 2023 15:02:15 GMT, Emanuel Peter wrote: >> Pengfei Li has updated the pull request incrementally with one additional commit since the last revision: >> >> Address part of comments from Emanuel > > src/hotspot/share/opto/vmaskloop.cpp line 357: > >> 355: set_elem_bt(node, mem_type); >> 356: } else { >> 357: trace_msg(node, "Subword operand does not have precise type"); > > Not clear to me what this means. Precise type info about signedness means that we know exactly whether the data is signed or unsigned. For some operations, such as right shift, results are different for signed and unsigned operands, so C2 has to know the signedness. However, in any Java arithmetic operation, operands of Java subword types are promoted to int first. Sometimes, for example, if an intermediate result is a binary operation of both signed and unsigned, we don't have the precise type info, so we don't know how to vectorize it. (see below example where the signedness info is lost after a short and a char are added) for (int i = 0; i < SIZE; i++) { shorts[i] = (shorts[i] + chars[i]) >> 10; } > src/hotspot/share/opto/vmaskloop.cpp line 367: > >> 365: BasicType self_type = node->bottom_type()->array_element_basic_type(); >> 366: if (!same_element_size(self_type, stmt_bottom_type)) { >> 367: trace_msg(node, "Vector element size does not match"); > > does not match with what? size of store of statement? The message is updated. Thanks! 
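To make the signedness point in the reply above concrete, here is a minimal Java sketch. The class and method names are hypothetical and are not taken from the patch. A `short` and a `char` holding the same 16 bits are promoted to different `int` values, so `>>` produces different results. Once the two are added, the `int` sum has no subword signedness of its own, which is the situation the trace message reports. Note that storing the result back into a `short` needs an explicit cast in real Java code.

```java
public class SubwordSignednessDemo {
    // A short operand is sign-extended to int before the shift.
    static int shiftShort(short s) {
        return s >> 10;
    }

    // A char operand is zero-extended to int before the shift.
    static int shiftChar(char c) {
        return c >> 10;
    }

    // Mixed case, mirroring the loop body quoted in the mail:
    // the sum is a plain int that is neither "signed short" nor "unsigned char".
    static short shiftMixed(short s, char c) {
        return (short) ((s + c) >> 10);
    }

    public static void main(String[] args) {
        short s = (short) 0xFC00; // same 16 bits, value -1024
        char c = (char) 0xFC00;   // same 16 bits, value 64512
        System.out.println(shiftShort(s));    // -1
        System.out.println(shiftChar(c));     // 63
        System.out.println(shiftMixed(s, c)); // 62
    }
}
```

The first two calls differ only in how the identical bit pattern is widened, which is why a vectorizer that picks a 16-bit lane type has to know which widening the scalar code used.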
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14581#discussion_r1250582207 PR Review Comment: https://git.openjdk.org/jdk/pull/14581#discussion_r1250582846 From xgong at openjdk.org Mon Jul 3 10:19:09 2023 From: xgong at openjdk.org (Xiaohong Gong) Date: Mon, 3 Jul 2023 10:19:09 GMT Subject: RFR: 8309894: compiler/vectorapi/VectorLogicalOpIdentityTest.java fails on SVE system with UseSVE=0 [v2] In-Reply-To: <2kGmMRiqW2Myc4Iumj9PBSskUycPz7GzFMJ0Xe5Qg-4=.24b7f369-d291-4842-afa6-d2c7cd75cdd8@github.com> References: <2kGmMRiqW2Myc4Iumj9PBSskUycPz7GzFMJ0Xe5Qg-4=.24b7f369-d291-4842-afa6-d2c7cd75cdd8@github.com> Message-ID: <1FK6BWl9Ij-hYz6MlaknQX6KlznKV5jFShT7-nuaxfI=.63568a1c-fc9e-428f-b637-2a7afbd49633@github.com> > This test fails with several IR check failures when run on ARM SVE systems with vm option `-XX:UseSVE=0`. Here is one of the failed IR rule: > > > @IR(counts = {IRNode.AND_V, "1"}, applyIfCPUFeatureOr = {"sve", "true", "avx512", "true"}) > public static void testAndMaskSameValue1() > > The specified IR in the test depends on the platform's predicate feature. Hence the IR check can be applied only on ARM SVE or X86 AVX512 systems. But with `-XX:UseSVE=0` on ARM SVE machines, JVM will disable SVE feature for compiler. But the CPU feature is not changed. To guarantee the IR rule is run with SVE as expected, it has to add another condition like `applyIf = {"UseSVE", ">0"}`. Consider `UseSVE` is an ARM specific option, it can be used only on ARM CPUs. So the right IR rules should be: > > > @IR(counts = {IRNode.AND_V, "1"}, applyIfCPUFeature = {"sve", "true"}, applyIf = {"UseSVE", "> 0"}) > @IR(counts = {IRNode.AND_V, "1"}, applyIfCPUFeature = {"avx512", "true"}) > public static void testAndMaskSameValue1() > > > This patch also changes the check order of conditions for a IR rule. It's better to check `applyIfCPUFeature` before `applyIf`, in case the vm option is invalid on running hardware, which makes test throw exception and abort. 
> > Verified on X86 systems with `UseAVX=1/2/3` by removing the test from ProblemList.txt, and SVE systems with `UseSVE=0/1`. Xiaohong Gong has updated the pull request incrementally with one additional commit since the last revision: Address review comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14533/files - new: https://git.openjdk.org/jdk/pull/14533/files/de33dd21..7c57d5e5 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14533&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14533&range=00-01 Stats: 6 lines in 2 files changed: 2 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/14533.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14533/head:pull/14533 PR: https://git.openjdk.org/jdk/pull/14533 From xgong at openjdk.org Mon Jul 3 10:19:11 2023 From: xgong at openjdk.org (Xiaohong Gong) Date: Mon, 3 Jul 2023 10:19:11 GMT Subject: RFR: 8309894: compiler/vectorapi/VectorLogicalOpIdentityTest.java fails on SVE system with UseSVE=0 [v2] In-Reply-To: <8lV2gbQeZAFeYjU0FmNUd-Uehn4FhahFNaZRO9m-jkg=.06214543-a779-4884-a0b8-5983904e830c@github.com> References: <2kGmMRiqW2Myc4Iumj9PBSskUycPz7GzFMJ0Xe5Qg-4=.24b7f369-d291-4842-afa6-d2c7cd75cdd8@github.com> <8lV2gbQeZAFeYjU0FmNUd-Uehn4FhahFNaZRO9m-jkg=.06214543-a779-4884-a0b8-5983904e830c@github.com> Message-ID: On Mon, 3 Jul 2023 09:03:56 GMT, Christian Hagedorn wrote: >> Xiaohong Gong has updated the pull request incrementally with one additional commit since the last revision: >> >> Address review comments > >> Accepted as a temporary fix that has to be reverted with [JDK-8311130](https://bugs.openjdk.org/browse/JDK-8311130). > > I agree with this temporary fix and reverting it again when syncing the flags with the CPU features in JDK-8311130. Thanks for the review @chhagedorn! I've addressed the comments in the latest commit.
------------- PR Comment: https://git.openjdk.org/jdk/pull/14533#issuecomment-1617808537 From chagedorn at openjdk.org Mon Jul 3 10:31:58 2023 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 3 Jul 2023 10:31:58 GMT Subject: RFR: 8309894: compiler/vectorapi/VectorLogicalOpIdentityTest.java fails on SVE system with UseSVE=0 [v2] In-Reply-To: <1FK6BWl9Ij-hYz6MlaknQX6KlznKV5jFShT7-nuaxfI=.63568a1c-fc9e-428f-b637-2a7afbd49633@github.com> References: <2kGmMRiqW2Myc4Iumj9PBSskUycPz7GzFMJ0Xe5Qg-4=.24b7f369-d291-4842-afa6-d2c7cd75cdd8@github.com> <1FK6BWl9Ij-hYz6MlaknQX6KlznKV5jFShT7-nuaxfI=.63568a1c-fc9e-428f-b637-2a7afbd49633@github.com> Message-ID: <_09au9G-u7-RiMqXNdy1aqpRJt9YVcfnerIyJuTYoPs=.e859efac-e888-46e1-b3f2-de7f9406f0f6@github.com> On Mon, 3 Jul 2023 10:19:09 GMT, Xiaohong Gong wrote: >> This test fails with several IR check failures when run on ARM SVE systems with vm option `-XX:UseSVE=0`. Here is one of the failed IR rule: >> >> >> @IR(counts = {IRNode.AND_V, "1"}, applyIfCPUFeatureOr = {"sve", "true", "avx512", "true"}) >> public static void testAndMaskSameValue1() >> >> The specified IR in the test depends on the platform's predicate feature. Hence the IR check can be applied only on ARM SVE or X86 AVX512 systems. But with `-XX:UseSVE=0` on ARM SVE machines, JVM will disable SVE feature for compiler. But the CPU feature is not changed. To guarantee the IR rule is run with SVE as expected, it has to add another condition like `applyIf = {"UseSVE", ">0"}`. Consider `UseSVE` is an ARM specific option, it can be used only on ARM CPUs. So the right IR rules should be: >> >> >> @IR(counts = {IRNode.AND_V, "1"}, applyIfCPUFeature = {"sve", "true"}, applyIf = {"UseSVE", "> 0"}) >> @IR(counts = {IRNode.AND_V, "1"}, applyIfCPUFeature = {"avx512", "true"}) >> public static void testAndMaskSameValue1() >> >> >> This patch also changes the check order of conditions for a IR rule. 
It's better to check `applyIfCPUFeature` before `applyIf`, in case the vm option is invalid on running hardware, which makes test throw exception and abort. >> >> Verified on X86 systems with `UseAVX=1/2/3` by removing the test from ProblemList.txt, and SVE systems with `UseSVE=0/1`. > > Xiaohong Gong has updated the pull request incrementally with one additional commit since the last revision: > > Address review comments Thanks for the update, looks good! ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/14533#pullrequestreview-1510902124 From chagedorn at openjdk.org Mon Jul 3 10:45:59 2023 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 3 Jul 2023 10:45:59 GMT Subject: RFR: 8310308: IR Framework: check for type and size of vector nodes [v18] In-Reply-To: References: Message-ID: On Mon, 3 Jul 2023 09:37:22 GMT, Emanuel Peter wrote: >> For some changes to `SuperWord`, and maybe auto-vectorization in general, I want to strengthen the IR Framework. >> >> **Motivation** >> I want to not just find the relevant IR nodes, but also assert that they have the maximal length that they could have on the respective platform (given the CPU features and `MaxVectorSize`). Without this verification it is possible that a future change leads to a regression where we still vectorize but at shorter vector widths as before - leading to performance loss. >> >> **How to use it** >> >> All `IRNode`s in `test/hotspot/jtreg/compiler/lib/ir_framework/IRNode.java` that are created with `vectorNode` are now all matched with their `type` and `size`. The regex might now look something like this: >> >> `"(\d+(\s){2}(VectorCastF2X.*)+(\s){2}===.*vector[A-Za-z][8]:{int})"` >> which would match with IR nodes dumped like that: >> `1150 VectorCastF2X === _ 1151 [[ 1146 ]] #vectory[8]:{int} ...` >> >> The goal was to keep it simple and straight forward. 
In most cases, you can just use the nodes as before, and implicitly we now check for maximal size automatically. However, in some cases we want to ensure there is no or only a limited number of nodes (`failOn` or comparison `<` or `<=` or `=0`) - in those cases we usually want to make sure there is not any node of any size, so we match with any size by default. The size can also explicitly be constrained using `IRNode.VECTOR_SIZE`. >> >> Some examples: >> 1. `@IR(counts = {IRNode.LOAD_VECTOR_I, " >0 "})` -> search for a `LoadVector` node with `type` `int`, and maximal `size` possible on the machine (limited by CPU features and `MaxVectorSize`). This is the most common use case. >> 2. `@IR(failOn = { IRNode.LOAD_VECTOR_L, IRNode.STORE_VECTOR })` -> fail if there is a `LoadVector` with type `long`, of `any` size. >> 3. `@IR(counts = { IRNode.XOR_VI, IRNode.VECTOR_SIZE_4, " > 0 "})` -> find at least one `XorV` node with type `int` and exactly `4` elements. Useful for VectorAPI when the vector species is fixed. >> 4. `@IR(counts = { IRNode.LOAD_VECTOR_D, IRNode.VECTOR_SIZE + "min(4, max_double)", " >0 " })` -> search for a `LoadVector` node with `type` `double`, and `size` exactly equals to `min(4, max_double)` (so 4 elements, or if the hardware allows fewer `doubles`, then that number). >> 5. `@IR(counts = { IRNode.ABS_VF, IRNode.VECTOR_SIZE + "min(LoopMaxUnroll, max_float)", ">= 1" })` -> find at least one `AbsV` nodes with type `float`, and the `size` exactly equals to the smaller of... > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > more for Christian's reviews Looks good, thanks for doing all the updates! ------------- Marked as reviewed by chagedorn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/14539#pullrequestreview-1510925121 From chagedorn at openjdk.org Mon Jul 3 10:46:02 2023 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 3 Jul 2023 10:46:02 GMT Subject: RFR: 8310308: IR Framework: check for type and size of vector nodes [v15] In-Reply-To: References: Message-ID: On Mon, 3 Jul 2023 09:32:03 GMT, Emanuel Peter wrote: >> test/hotspot/jtreg/compiler/c2/irTests/TestVectorConditionalMove.java line 209: >> >>> 207: IRNode.VECTOR_MASK_CMP_F, ">0", >>> 208: IRNode.VECTOR_BLEND_F, ">0", >>> 209: IRNode.STORE_VECTOR, ">0"}, >> >> Since "IR node count > 0" is quite a common usage, we could think about introducing a separate "hasAny" `@IR` node attribute at some point and replace all usages. But this is of course unrelated to this patch and could be done separately at some point. > > It would be nice not to have to write ">0" all the time :) Right :-) I've filed [JDK-8311238](https://bugs.openjdk.org/browse/JDK-8311238) to keep track of that. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14539#discussion_r1250680718 From epeter at openjdk.org Mon Jul 3 13:29:30 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 3 Jul 2023 13:29:30 GMT Subject: RFR: 8282365: Optimize divideUnsigned and remainderUnsigned for constants [v16] In-Reply-To: References: Message-ID: On Sat, 10 Jun 2023 01:30:09 GMT, Quan Anh Mai wrote: >> This patch implements idealisation for unsigned divisions to change a division by a constant into a series of multiplication and shift. I also change the idealisation of `DivI` to get a more efficient series when the magic constant overflows an int32. 
>> >> In general, the idea behind a signed division transformation is that for a positive constant `d`, we would need to find constants `c` and `m` so that: >> >> floor(x / d) = floor(x * c / 2**m) for 0 < x < 2**(N - 1) (1) >> ceil(x / d) = floor(x * c / 2**m) + 1 for -2**(N - 1) <= x < 0 (2) >> >> The implementation in the original book takes into consideration that the machine may not be able to perform the full multiplication `x * c`, so the constant overflow and we need to add back the dividend as in `DivLNode::Ideal` cases. However, for int32 division, `x * c` cannot overflow an int64. As a result, it is always feasible to just calculate the product and shift the result. >> >> For unsigned multiplication, the situation is somewhat trickier because the condition needs to be twice as strong (the condition (1) and (2) above are mostly the same). This results in the magic constant `c` calculated based on the method presented in Hacker's Delight by Henry S. Warren, Jr. may overflow an uintN. For int division, we can depend on the theorem devised by Arch D. Robison in N-Bit Unsigned Division Via N-Bit Multiply-Add, which states that there exists either: >> >> c1 in uint32 and m1, such that floor(x / d) = floor(x * c1 / 2**m1) for 0 < x < 2**32 (3) >> c2 in uint32 and m2, such that floor(x / d) = floor((x + 1) * c2 / 2**m2) for 0 < x < 2**32 (4) >> >> which means that either `x * c1` never overflows an uint64 or `(x + 1) * c2` never overflows an uint64. And we can perform a full multiplication. >> >> For longs, there is no way to do a full multiplication so we do some basic transformations to achieve a computable formula. The details I have written as comments in the overflow case. >> >> More tests are added to cover the possible patterns. >> >> Please take a look and have some reviews. Thank you very much. > > Quan Anh Mai has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 50 commits: > > - missing java_negate > - Merge branch 'master' into unsignedDiv > - whitespace > - move asserts to use sites > - windows complaints > - compiler complaints > - undefined internal linkage > - add tests, special casing large shift > - draft > - Merge branch 'master' into unsignedDiv > - ... and 40 more: https://git.openjdk.org/jdk/compare/5b147eb5...eb1f5dd9 @merykitty This is a great RFE, let's keep working on it! But I'm concerned about the tests. Yes, the gtests do maybe ensure that the magic constant computation is ok. But all the other code in all the `Ideal` methods is not tested by your gtests. For that you can only really generate Java tests. I think it would be really nice to see end-to-end tests here. Also the `Ideal` methods do different things for different ranges of the arguments. This really requires more testing. Maybe you already have some of those, but it would be nice to hear from you if they are covering all cases. Ah. I did not review the magic constant computation code itself. It looks reasonable, but reviewing that for correctness is almost impossible anyway. We really have to rely on the tests for that. Thanks again for the work, and let's make this happen! Emanuel src/hotspot/share/opto/divnode.cpp line 39: > 37: #include "utilities/powerOfTwo.hpp" > 38: > 39: // Portions of code courtesy of Clifford Click Not sure if this line should be removed? src/hotspot/share/opto/divnode.cpp line 123: > 121: // magic_const should be a u32 > 122: assert(magic_const >= 0 && magic_const <= jlong(max_juint), "sanity"); > 123: assert(shift_const >= 0 && shift_const < 32, "sanity"); Maybe just move this inside the `magic_int_divide_constants`? You already have similar asserts there. src/hotspot/share/opto/divnode.cpp line 141: > 139: // this has the effect of negating the quotient. > 140: if (!d_pos) { > 141: Node* temp = addend0; addend0 = addend1; addend1 = temp; `swap(addend0, addend1)` - would that work?
Would be easier to read. src/hotspot/share/opto/divnode.cpp line 152: > 150: } > 151: > 152: //--------------------------transform_int_udivide------------------------------ You could remove these lines `//------xxx------`, they are not required by the style guide any more src/hotspot/share/opto/divnode.cpp line 154: > 152: //--------------------------transform_int_udivide------------------------------ > 153: // Convert an unsigned division by constant divisor into an alternate Ideal graph. > 154: // Return NULL if no transformation occurs. There may have been the `NULL` -> `nullptr` change since you wrote this code. Please update your patch accordingly. src/hotspot/share/opto/divnode.cpp line 154: > 152: //--------------------------transform_int_udivide------------------------------ > 153: // Convert an unsigned division by constant divisor into an alternate Ideal graph. > 154: // Return NULL if no transformation occurs. null src/hotspot/share/opto/divnode.cpp line 160: > 158: > 159: // Result > 160: Node* q = NULL; nullptr src/hotspot/share/opto/divnode.cpp line 160: > 158: > 159: // Result > 160: Node* q = NULL; nullptr src/hotspot/share/opto/divnode.cpp line 178: > 176: magic_int_unsigned_divide_constants_down(divisor, magic_const, shift_const); > 177: assert(magic_const >= 0 && magic_const <= 0x1FFFFFFFFL, "sanity"); > 178: assert(shift_const >= 0 && shift_const < 33, "sanity"); Add these inside the function, that way also your gtest run these asserts. src/hotspot/share/opto/divnode.cpp line 184: > 182: julong max_dividend; > 183: if (dividend_type->_hi < 0 || dividend_type->_lo >= 0) { > 184: max_dividend = julong(juint(dividend_type->_hi)); This is quite dense. A bit more explanation would help. Ah, you are checking if conversion from `i32` to `u32` of the dividend leads to an overflow? 
Maybe also add an assert like this (this should hold, right?): `assert( julong(juint(dividend_type->_hi)) >= julong(juint(dividend_type->_lo)), "sanity")` This code also tells me that we probably want to have some tests with different dividends (i.e. dividends that have different kinds of ranges). Ranges like that could be achieved if you use the tripcount `i` as the `dividend`. The tripcount has a range defined by init and limit of the counted loop. src/hotspot/share/opto/divnode.cpp line 188: > 186: max_dividend = max_juint; > 187: } > 188: if (julong(magic_const) <= max_julong / max_dividend) { Could `max_dividend` ever be `zero`? I guess only if the dividend was exactly `zero`, in which case we should probably not end up here, or is that somehow possible? src/hotspot/share/opto/divnode.cpp line 191: > 189: // No overflow here, just do the transformation > 190: if (shift_const == 32) { > 191: q = phase->intcon(0); Would it not be nicer to handle this special case directly in the `URShiftLNode`? Just replace it during `Value` with zero, if the shift constant is too large. src/hotspot/share/opto/divnode.cpp line 208: > 206: magic_int_unsigned_divide_constants_up(divisor, magic_const, shift_const); > 207: assert(magic_const >= 0 && magic_const <= jlong(max_juint), "sanity"); > 208: assert(shift_const >= 0 && shift_const < 32, "sanity"); Again, add asserts inside the method. src/hotspot/share/opto/divnode.cpp line 406: > 404: // this has the effect of negating the quotient. > 405: if (!d_pos) { > 406: Node *temp = addend0; addend0 = addend1; addend1 = temp; If we are already here, might as well replace with `swap` src/hotspot/share/opto/divnode.cpp line 419: > 417: //--------------------------transform_long_udivide----------------------------- > 418: // Convert an unsigned division by constant divisor into an alternate Ideal graph. > 419: // Return NULL if no transformation occurs.
null src/hotspot/share/opto/divnode.cpp line 424: > 422: > 423: // Result > 424: Node* q = NULL; nullptr src/hotspot/share/opto/divnode.cpp line 441: > 439: jlong magic_const; > 440: jint shift_const; > 441: bool magic_const_ovf; `does_magic_const_overflow` Would that work too? src/hotspot/share/opto/divnode.cpp line 443: > 441: bool magic_const_ovf; > 442: magic_long_unsigned_divide_constants(divisor, magic_const, shift_const, magic_const_ovf); > 443: assert(shift_const >= 0 && shift_const < 65, "sanity"); Move asserts inside function. src/hotspot/share/opto/divnode.cpp line 448: > 446: Node* mul_hi = phase->transform(new UMulHiLNode(dividend, magic)); > 447: > 448: if (!magic_const_ovf) { Don't understand this case, what happens here? src/hotspot/share/opto/divnode.cpp line 462: > 460: } > 461: > 462: // Just do the minimum for now Minimum of what? Not sure what you mean src/hotspot/share/opto/divnode.cpp line 469: > 467: mul_hi = phase->transform(new AddLNode(mul_hi, dividend)); > 468: q = new URShiftLNode(mul_hi, phase->intcon(shift_const)); > 469: } I need more comments here. src/hotspot/share/opto/divnode.cpp line 906: > 904: } > 905: > 906: // TODO: Improve Value inference of both signed and unsigned division Did you miss a `TODO` here? 
src/hotspot/share/opto/divnode.cpp line 915: > 913: > 914: //------------------------------Idealize--------------------------------------- > 915: Node *UDivLNode::Ideal(PhaseGVN *phase, bool can_reshape) { Exactly same comments apply as for `UDivINode::Ideal` src/hotspot/share/opto/divnode.cpp line 921: > 919: Node* UDivINode::Ideal(PhaseGVN* phase, bool can_reshape) { > 920: // Check for dead control input > 921: if (in(0) && remove_dead_region(phase, can_reshape)) { Please make nullptr check explicit Suggestion: if (in(0) != nullptr && remove_dead_region(phase, can_reshape)) { src/hotspot/share/opto/divnode.cpp line 925: > 923: } > 924: // Don't bother trying to transform a dead node > 925: if(in(0) && in(0)->is_top()) { Suggestion: if(in(0) != nullptr && in(0)->is_top()) { src/hotspot/share/opto/divnode.cpp line 931: > 929: const Type* t = phase->type(in(2)); > 930: if(t == TypeInt::ONE) { // Identity? > 931: return nullptr; // Skip it Does `Value` handle this? src/hotspot/share/opto/divnode.cpp line 936: > 934: const TypeInt* ti = t->isa_int(); > 935: if(ti == nullptr) { > 936: return nullptr; Can this ever happen? Only if it is top? If so, add assert! src/hotspot/share/opto/divnode.cpp line 941: > 939: // Check for useless control input > 940: // Check for excluding div-zero case > 941: if (in(0) && (ti->_hi < 0 || ti->_lo > 0)) { Suggestion: if (in(0) != nullptr && (ti->_hi < 0 || ti->_lo > 0)) { src/hotspot/share/opto/divnode.cpp line 947: > 945: > 946: // Divisor very large, constant 2**31 can be transform to a shift > 947: if (ti->_hi <= 0 && ti->_hi > min_jint) { Would be easier to read as a range like this Suggestion: if (min_jint < ti->_hi && ti->_hi <= 0) { src/hotspot/share/opto/divnode.cpp line 956: > 954: return nullptr; > 955: } > 956: juint i = ti->get_con(); // Get divisor I'd replace `i` with `divisor_con`. 
src/hotspot/share/opto/divnode.cpp line 994: > 992: } > 993: > 994: // TODO: Improve Value inference of both signed and unsigned division Another stranded `TODO` src/hotspot/share/opto/divnode.cpp line 997: > 995: const TypeLong* i1 = t1->isa_long(); > 996: const TypeLong* i2 = t2->isa_long(); > 997: assert(i1 != nullptr && i2 != nullptr, ""); Use `t1->is_long()`, it already has a built in assert for `long` :) src/hotspot/share/opto/divnode.cpp line 1038: > 1036: Node* cmp = phase->transform(new CmpULNode(in(1), in(2))); > 1037: Node* bol = phase->transform(new BoolNode(cmp, BoolTest::ge)); > 1038: return new CMoveLNode(bol, phase->longcon(0), phase->longcon(1), TypeLong::make(0, 1, Type::WidenMin)); Do we have tests for this case? src/hotspot/share/opto/divnode.cpp line 1391: > 1389: //============================================================================= > 1390: //------------------------------Idealize--------------------------------------- > 1391: Node* UModINode::Ideal(PhaseGVN* phase, bool can_reshape) { Same comments again src/hotspot/share/opto/divnode.cpp line 1419: > 1417: } > 1418: juint con = ti->get_con(); > 1419: const Type* u = phase->type(in(1)); This is a constant foldable bailout? Why do you do it earlier here? Generally, I'm starting to wonder if all this code duplication makes sense in all the `Ideal` methods? src/hotspot/share/opto/divnode.cpp line 1426: > 1424: // See if we are MOD'ing by 2^k > 1425: if (is_power_of_2(con)) { > 1426: return new AndINode(in(1), phase->intcon(con - 1)); Do we have tests? src/hotspot/share/opto/divnode.cpp line 1428: > 1426: return new AndINode(in(1), phase->intcon(con - 1)); > 1427: } > 1428: // TODO: This can be calculated directly, see https://arxiv.org/abs/1902.01961 Stranded `TODO`? src/hotspot/share/opto/divnode.cpp line 1431: > 1429: Node* q = transform_int_udivide(phase, in(1), con); > 1430: if (q == nullptr) { > 1431: return nullptr; Is this possible? Can it ever return `nullptr`? 
src/hotspot/share/opto/divnode.cpp line 1465: > 1463: //============================================================================= > 1464: //------------------------------Idealize--------------------------------------- > 1465: Node* UModLNode::Ideal(PhaseGVN* phase, bool can_reshape) { Same comments as for other `Ideal` methods src/hotspot/share/opto/divnode.cpp line 1503: > 1501: } > 1502: Node* q = transform_long_udivide(phase, in(1), con); > 1503: if (q == nullptr) { Maybe assert `!Matcher::match_rule_supported(Op_UMulHiL)`, if that is the only case it should happen? src/hotspot/share/utilities/javaArithmetic.cpp line 71: > 69: > 70: assert(M < java_shift_left(jlong(1), 32), ""); > 71: assert(s < 32, ""); Just in case: assert that they are non-negative (`>=0`) src/hotspot/share/utilities/javaArithmetic.cpp line 148: > 146: int64_t p; > 147: uint64_t ad, anc, delta, q1, r1, q2, r2, t; > 148: const uint64_t two63 = UCONST64(0x8000000000000000); // 2**63. A shifted one might be more readable src/hotspot/share/utilities/javaArithmetic.cpp line 176: > 174: > 175: M = q2 + 1; > 176: s = p - 64; // shift amount to return. Add asserts here ------------- Changes requested by epeter (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/9947#pullrequestreview-1510911188 PR Review Comment: https://git.openjdk.org/jdk/pull/9947#discussion_r1250669894 PR Review Comment: https://git.openjdk.org/jdk/pull/9947#discussion_r1250686803 PR Review Comment: https://git.openjdk.org/jdk/pull/9947#discussion_r1250702239 PR Review Comment: https://git.openjdk.org/jdk/pull/9947#discussion_r1250729639 PR Review Comment: https://git.openjdk.org/jdk/pull/9947#discussion_r1250732261 PR Review Comment: https://git.openjdk.org/jdk/pull/9947#discussion_r1250732637 PR Review Comment: https://git.openjdk.org/jdk/pull/9947#discussion_r1250730579 PR Review Comment: https://git.openjdk.org/jdk/pull/9947#discussion_r1250732745 PR Review Comment: https://git.openjdk.org/jdk/pull/9947#discussion_r1250736080 PR Review Comment: https://git.openjdk.org/jdk/pull/9947#discussion_r1250765990 PR Review Comment: https://git.openjdk.org/jdk/pull/9947#discussion_r1250776391 PR Review Comment: https://git.openjdk.org/jdk/pull/9947#discussion_r1250788763 PR Review Comment: https://git.openjdk.org/jdk/pull/9947#discussion_r1250789401 PR Review Comment: https://git.openjdk.org/jdk/pull/9947#discussion_r1250794972 PR Review Comment: https://git.openjdk.org/jdk/pull/9947#discussion_r1250733095 PR Review Comment: https://git.openjdk.org/jdk/pull/9947#discussion_r1250733208 PR Review Comment: https://git.openjdk.org/jdk/pull/9947#discussion_r1250802121 PR Review Comment: https://git.openjdk.org/jdk/pull/9947#discussion_r1250800018 PR Review Comment: https://git.openjdk.org/jdk/pull/9947#discussion_r1250807470 PR Review Comment: https://git.openjdk.org/jdk/pull/9947#discussion_r1250808763 PR Review Comment: https://git.openjdk.org/jdk/pull/9947#discussion_r1250811574 PR Review Comment: https://git.openjdk.org/jdk/pull/9947#discussion_r1250814645 PR Review Comment: https://git.openjdk.org/jdk/pull/9947#discussion_r1250850870 PR Review Comment: https://git.openjdk.org/jdk/pull/9947#discussion_r1250817906 
PR Review Comment: https://git.openjdk.org/jdk/pull/9947#discussion_r1250818223 PR Review Comment: https://git.openjdk.org/jdk/pull/9947#discussion_r1250819940 PR Review Comment: https://git.openjdk.org/jdk/pull/9947#discussion_r1250822359 PR Review Comment: https://git.openjdk.org/jdk/pull/9947#discussion_r1250827220 PR Review Comment: https://git.openjdk.org/jdk/pull/9947#discussion_r1250830792 PR Review Comment: https://git.openjdk.org/jdk/pull/9947#discussion_r1250837670 PR Review Comment: https://git.openjdk.org/jdk/pull/9947#discussion_r1250842873 PR Review Comment: https://git.openjdk.org/jdk/pull/9947#discussion_r1250846780 PR Review Comment: https://git.openjdk.org/jdk/pull/9947#discussion_r1250852739 PR Review Comment: https://git.openjdk.org/jdk/pull/9947#discussion_r1250853251 PR Review Comment: https://git.openjdk.org/jdk/pull/9947#discussion_r1250855965 PR Review Comment: https://git.openjdk.org/jdk/pull/9947#discussion_r1250856771 PR Review Comment: https://git.openjdk.org/jdk/pull/9947#discussion_r1250857127 PR Review Comment: https://git.openjdk.org/jdk/pull/9947#discussion_r1250861546 PR Review Comment: https://git.openjdk.org/jdk/pull/9947#discussion_r1250865067 PR Review Comment: https://git.openjdk.org/jdk/pull/9947#discussion_r1250869173 PR Review Comment: https://git.openjdk.org/jdk/pull/9947#discussion_r1250682316 PR Review Comment: https://git.openjdk.org/jdk/pull/9947#discussion_r1250877478 PR Review Comment: https://git.openjdk.org/jdk/pull/9947#discussion_r1250792970 From epeter at openjdk.org Mon Jul 3 13:29:30 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 3 Jul 2023 13:29:30 GMT Subject: RFR: 8282365: Optimize divideUnsigned and remainderUnsigned for constants [v16] In-Reply-To: References: Message-ID: On Mon, 3 Jul 2023 12:19:43 GMT, Emanuel Peter wrote: >> Quan Anh Mai has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 50 commits: >> >> - missing java_negate >> - Merge branch 'master' into unsignedDiv >> - whitespace >> - move asserts to use sites >> - windows complaints >> - compiler complaints >> - undefined internal linkage >> - add tests, special casing large shift >> - draft >> - Merge branch 'master' into unsignedDiv >> - ... and 40 more: https://git.openjdk.org/jdk/compare/5b147eb5...eb1f5dd9 > > src/hotspot/share/opto/divnode.cpp line 441: > >> 439: jlong magic_const; >> 440: jint shift_const; >> 441: bool magic_const_ovf; > > `does_magic_const_overflow` Would that work too? I'm not sure exactly what this boolean means, and it is making it difficult to understand the logic below. > src/hotspot/share/opto/divnode.cpp line 947: > >> 945: >> 946: // Divisor very large, constant 2**31 can be transform to a shift >> 947: if (ti->_hi <= 0 && ti->_hi > min_jint) { > > Would be easier to read as a range like this > Suggestion: > > if (min_jint < ti->_hi && ti->_hi <= 0) { Do we have test cases for this? > src/hotspot/share/opto/divnode.cpp line 956: > >> 954: return nullptr; >> 955: } >> 956: juint i = ti->get_con(); // Get divisor > > I'd replace `i` with `divisor_con`. Or at the very least `ti_con`.
> src/hotspot/share/utilities/javaArithmetic.cpp line 71: > >> 69: >> 70: assert(M < java_shift_left(jlong(1), 32), ""); >> 71: assert(s < 32, ""); > > Just in case: assert that they are non-negative (`>=0`) Generally, take the asserts into all of these methods, so that they are also tested in the gtest ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/9947#discussion_r1250806030 PR Review Comment: https://git.openjdk.org/jdk/pull/9947#discussion_r1250834360 PR Review Comment: https://git.openjdk.org/jdk/pull/9947#discussion_r1250838627 PR Review Comment: https://git.openjdk.org/jdk/pull/9947#discussion_r1250875841 From epeter at openjdk.org Mon Jul 3 13:29:31 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 3 Jul 2023 13:29:31 GMT Subject: RFR: 8282365: Optimize divideUnsigned and remainderUnsigned for constants [v16] In-Reply-To: References: Message-ID: On Mon, 3 Jul 2023 12:43:57 GMT, Emanuel Peter wrote: >> src/hotspot/share/opto/divnode.cpp line 947: >> >>> 945: >>> 946: // Divisor very large, constant 2**31 can be transform to a shift >>> 947: if (ti->_hi <= 0 && ti->_hi > min_jint) { >> >> Would be easier to read as a range like this >> Suggestion: >> >> if (min_jint < ti->_hi && ti->_hi <= 0) { > > Do we have test cases for this? Especially we need a variable case that does not constant fold ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/9947#discussion_r1250834879 From epeter at openjdk.org Mon Jul 3 13:29:31 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 3 Jul 2023 13:29:31 GMT Subject: RFR: 8282365: Optimize divideUnsigned and remainderUnsigned for constants [v16] In-Reply-To: References: Message-ID: <4nGBVKmuZDK8giWSnQ8gi6VR62MERCB0moYv92Uae-8=.e10c405e-d58e-496f-b7d3-b3fa923b4b5c@github.com> On Mon, 3 Jul 2023 12:44:24 GMT, Emanuel Peter wrote: >> Do we have test cases for this? 
> > Especially we need a variable case that does not constant fold Maybe also say that you land in this case for negative constants `ti`. That is expected, right? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/9947#discussion_r1250842509 From epeter at openjdk.org Mon Jul 3 14:01:09 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 3 Jul 2023 14:01:09 GMT Subject: RFR: 8282365: Optimize divideUnsigned and remainderUnsigned for constants [v15] In-Reply-To: References: Message-ID: On Sat, 10 Jun 2023 01:25:55 GMT, Quan Anh Mai wrote: >> Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: >> >> whitespace > > May I have a second review for this patch, please? @merykitty I just discussed the testing with @TobiHartmann. He just came across this test: `test/hotspot/jtreg/compiler/c2/TestUnsignedByteCompare1.java`. The cool thing is that you can "simulate" constants with `MethodHandles.constant`. At runtime the invocation apparently speculates on a constant value and traps if it changes. That means you can just set a new value, it deopts, and hopefully eventually re-compiles with the next constant. You could easily set up one of these tests per node. And maybe throw in some interesting ranges for the `dividend`. An interesting experiment would be to have an IR test that works with a random constant, and then have an IR rule that fails if we find a `div` node. At least for those cases where that should work. And then you can easily compare the div results with a non-compiled method that computes the same value.
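A minimal sketch of the testing idea above, under stated assumptions: the class and method names are hypothetical, and using a `MutableCallSite` to swap the `MethodHandles.constant` target is an assumption of this sketch, not something taken from `TestUnsignedByteCompare1.java`. It only shows how a "simulated constant" divisor could be changed between runs and checked against a reference computed with plain `long` arithmetic; it contains no IR rules.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.invoke.MutableCallSite;

final class UnsignedDivConstantSketch {
    // Hypothetical holder for the "constant" divisor; the target can be swapped later.
    private static final MutableCallSite DIVISOR_SITE =
            new MutableCallSite(MethodType.methodType(int.class));
    private static final MethodHandle DIVISOR = DIVISOR_SITE.dynamicInvoker();

    // Installs a new constant-returning target for the divisor.
    static void setDivisor(int divisor) {
        DIVISOR_SITE.setTarget(MethodHandles.constant(int.class, divisor));
    }

    // Method under test: the divisor comes from the call site.
    static int divide(int dividend) {
        try {
            return Integer.divideUnsigned(dividend, (int) DIVISOR.invokeExact());
        } catch (Throwable t) {
            throw new AssertionError(t);
        }
    }

    // Reference result computed with 64-bit arithmetic on zero-extended inputs.
    static int reference(int dividend, int divisor) {
        return (int) ((dividend & 0xFFFF_FFFFL) / (divisor & 0xFFFF_FFFFL));
    }

    public static void main(String[] args) {
        int[] divisors = {3, 7, 641, -2, Integer.MIN_VALUE};
        int[] dividends = {0, 1, -1, 12345, Integer.MAX_VALUE, Integer.MIN_VALUE};
        for (int d : divisors) {
            setDivisor(d);
            for (int x : dividends) {
                if (divide(x) != reference(x, d)) {
                    throw new AssertionError("mismatch for " + x + " / " + d);
                }
            }
        }
    }
}
```

Whether the JIT actually folds the divisor in `divide` depends on the VM and on warm-up; the sketch only checks results, so it stays valid either way.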
------------- PR Comment: https://git.openjdk.org/jdk/pull/9947#issuecomment-1618337778 From epeter at openjdk.org Mon Jul 3 14:40:13 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 3 Jul 2023 14:40:13 GMT Subject: RFR: 8308994: C2: Re-implement experimental post loop vectorization [v2] In-Reply-To: References: Message-ID: <6EUcNQbNFGXwDM1MlNBIRqWVm7PaY4nwSA_SWq3s4bM=.a84e82df-4ef6-410e-b6b8-cda32f0bc455@github.com> On Mon, 3 Jul 2023 07:54:38 GMT, Pengfei Li wrote: >> src/hotspot/share/opto/loopnode.hpp line 143: >> >>> 141: if (is_vector_masked()) { >>> 142: return false; >>> 143: } >> >> Does this mean that the post-loop has a `CountedLoop` node, but it does not adhere to the counted-loop assumptions, such as having a `incr`, `limit`, `phi` etc? With the old post-loop-vectorization, the LoopNode would always fold away, so it would disappear after IGVN. But now it would stick around, right? Could that turn out to be a problem? > > After being vectorized, the post loop still has `phi`, `incr` and `limit` as before. In other words, the post loop is still a loop now. I think the only difference is that the loop stride value is not a constant any more (as we introduces the `VectorMaskTrueCountNode` for the new stride). The old implementation of post loop vectorization makes the vector-masked post loop run only once so it can optimize the `LoopNode` away. But we cannot do this now without doing multi-versioning. (Without the scalar post loop, loop may run insufficient iterations when the "atomic" post loop is not entered.) I see. Maybe it would be cleaner to separate the "outside/after" loop uses of the `incr` with what happens inside the loop? If we do take the backedge, the stride is a known constant. Only if we exit do we need to add the unknown number of iterations with `VectorMaskTrueCountNode`. 
See also this comment https://github.com/openjdk/jdk/pull/14581/files#r1250973547 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14581#discussion_r1250974056 From epeter at openjdk.org Mon Jul 3 14:40:18 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 3 Jul 2023 14:40:18 GMT Subject: RFR: 8308994: C2: Re-implement experimental post loop vectorization [v2] In-Reply-To: References: Message-ID: On Mon, 3 Jul 2023 07:37:22 GMT, Pengfei Li wrote: >> ## TL;DR >> >> This patch completely re-implements C2's experimental post loop vectorization for better stability, maintainability and performance. Compared with the original implementation, this new implementation adds a standalone loop phase in C2's ideal loop phases and can vectorize more post loops. The original implementation and all code related to multi-versioned post loops are deleted in this patch. More details about this patch can be found in the document replied in this pull request. > > Pengfei Li has updated the pull request incrementally with one additional commit since the last revision: > > Address part of comments from Emanuel src/hotspot/share/opto/vmaskloop.cpp line 978: > 976: > 977: // Update loop increment/decrement to the vector mask true count > 978: Node* true_cnt = new VectorMaskTrueCountNode(root_vmask, TypeInt::INT); This seems expensive to have to use inside the loop. Is there a way we could move this outside the loop? Because if we do take the backedge then we know that we have to take the full `stride`, right? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14581#discussion_r1250973547 From epeter at openjdk.org Mon Jul 3 14:40:19 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 3 Jul 2023 14:40:19 GMT Subject: RFR: 8308994: C2: Re-implement experimental post loop vectorization [v2] In-Reply-To: References: Message-ID: On Mon, 3 Jul 2023 14:34:00 GMT, Emanuel Peter wrote: >> Pengfei Li has updated the pull request incrementally with one additional commit since the last revision: >> >> Address part of comments from Emanuel > > src/hotspot/share/opto/vmaskloop.cpp line 978: > >> 976: >> 977: // Update loop increment/decrement to the vector mask true count >> 978: Node* true_cnt = new VectorMaskTrueCountNode(root_vmask, TypeInt::INT); > > This seems expensive to have to use inside the loop. Is there a way we could move this outside the loop? Because if we do take the backedge then we know that we have to take the full `stride`, right? I guess you would have to separate out the loop-internal uses and the outside uses of the `incr`. The inside uses would use the `stride` (or is there an exception?) and the outside ones could use the `VectorMaskTrueCountNode`. Doing something like that could have better performance. 
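The reasoning above can be checked with a small scalar simulation. This is only a sketch under stated assumptions: a stride of +1 in the original scalar loop, a mask that covers the lanes in `[i, limit)`, and hypothetical method names; it models the induction variable, not the C2 graph. It compares incrementing by the mask's true count on every iteration with using the constant vector length on the backedge and the true count only on exit.

```java
final class MaskedPostLoopStrideSketch {
    // Number of active lanes for a mask covering [i, limit), capped at the vector length.
    static int trueCount(int i, int limit, int vlen) {
        return Math.min(vlen, limit - i);
    }

    // Variant 1: the increment is the mask true count on every iteration.
    static int runWithTrueCountStride(int start, int limit, int vlen) {
        int i = start;
        while (i < limit) {
            i += trueCount(i, limit, vlen);
        }
        return i;
    }

    // Variant 2: constant stride on the backedge, true count only on the exit path.
    static int runWithConstantStride(int start, int limit, int vlen) {
        int i = start;
        if (i >= limit) {
            return i; // zero-trip guard
        }
        while (true) {
            // (masked loop body for lanes [i, min(i + vlen, limit)) would go here)
            if (limit - i > vlen) {
                i += vlen; // backedge taken, so the mask was all-true
                continue;
            }
            return i + trueCount(i, limit, vlen); // exit: add the partial count once
        }
    }

    public static void main(String[] args) {
        for (int vlen = 1; vlen <= 16; vlen *= 2) {
            for (int limit = 0; limit < 100; limit++) {
                int a = runWithTrueCountStride(0, limit, vlen);
                int b = runWithConstantStride(0, limit, vlen);
                if (a != b) {
                    throw new AssertionError("mismatch: limit=" + limit + " vlen=" + vlen);
                }
            }
        }
    }
}
```

Both variants end with the same induction value, which is what would allow loop-internal uses to see the constant stride while only outside uses need the true count.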
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14581#discussion_r1250976621 From epeter at openjdk.org Mon Jul 3 14:47:10 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 3 Jul 2023 14:47:10 GMT Subject: RFR: 8308994: C2: Re-implement experimental post loop vectorization [v2] In-Reply-To: References: <6IkvVTm9e60qXwaID0EihRXlUielrryBWoTmYAp3PuU=.c624b13d-bc6d-4c79-86a6-72bda016b50f@github.com> Message-ID: On Mon, 3 Jul 2023 08:05:33 GMT, Pengfei Li wrote: >> src/hotspot/share/opto/superword.cpp line 179: >> >>> 177: assert(_packset.length() == 0, "packset must be empty"); >>> 178: success = SLP_extract(); >>> 179: if (PostLoopMultiversioning) { >> >> Could we now have an assert for `cl->is_main_loop()` at the beginning of `SuperWord::transform_loop`, and remove all checks for it in SuperWord? > > Unfortunately, I just tried updating this but found assertion failures. I see `SuperWord::transform_loop()` is also called in `IdealLoopTree::policy_unroll_slp_analysis()` which can pass a normal loop (the loop before iteration-split). I assume only main loops require unrolling analysis and don't understand why it could be a normal loop. Maybe that's bad code and we need to refactor C2's unrolling analysis first. It would be great if we could untangle that a bit. Let me know what idea you come up with. It is also confusing that the analysis-only `policy_unroll_slp_analysis` calls a method whose name says "transform", namely `transform_loop`. >> src/hotspot/share/opto/superword.hpp line 666: >> >>> 664: IdealLoopTree* lpt() const { return _lpt; } >>> 665: PhiNode* iv() const { >>> 666: return _slp ? _slp->iv() : _lpt->_head->as_CountedLoop()->phi()->as_Phi(); >> >> I'd suggest either caching it directly from `_lpt->_head->as_CountedLoop()->phi()->as_Phi()`, or just querying it directly. Reduce dependence on `_slp`. > > Good catch! What do you think of getting rid of `_slp` completely in `SWPointer` refactoring?
I think that would be optimal, if it is possible. I would maybe call it a `CLPointer`, for counted-loop-pointer? And only have a reference to the `_lpt` / `cl`. Eventually, we may want to even allow non-counted loops, but that is really for the future. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14581#discussion_r1250981304 PR Review Comment: https://git.openjdk.org/jdk/pull/14581#discussion_r1250984838 From epeter at openjdk.org Mon Jul 3 14:47:10 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 3 Jul 2023 14:47:10 GMT Subject: RFR: 8308994: C2: Re-implement experimental post loop vectorization [v2] In-Reply-To: <7iru0xDm4lckuwyHvqGSld0_kWUVYSTg5BT3-rqP3Vw=.cab94902-98aa-40ed-ae13-d238380b6267@github.com> References: <7iru0xDm4lckuwyHvqGSld0_kWUVYSTg5BT3-rqP3Vw=.cab94902-98aa-40ed-ae13-d238380b6267@github.com> Message-ID: On Mon, 3 Jul 2023 08:08:29 GMT, Pengfei Li wrote: >> Can we untangle it completely from SuperWord? It seems you have made it optional, so yes. And maybe we can also make the trace flags like `_slp->is_trace_alignment()` independent? It would be nice to also be able to trace this for non-SuperWord contexts like post-loop masked vectorization, right? > > I will try to do this in another JBS and come back here later. That would be fantastic!
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14581#discussion_r1250982246 From epeter at openjdk.org Mon Jul 3 14:55:11 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 3 Jul 2023 14:55:11 GMT Subject: RFR: 8308994: C2: Re-implement experimental post loop vectorization [v2] In-Reply-To: <7iru0xDm4lckuwyHvqGSld0_kWUVYSTg5BT3-rqP3Vw=.cab94902-98aa-40ed-ae13-d238380b6267@github.com> References: <7iru0xDm4lckuwyHvqGSld0_kWUVYSTg5BT3-rqP3Vw=.cab94902-98aa-40ed-ae13-d238380b6267@github.com> Message-ID: On Mon, 3 Jul 2023 08:13:41 GMT, Pengfei Li wrote: >> Even if all callers currently ensure that `n` has the correct type, I'd say it is still not a great idea to cast without checking, at least in debug. > > I searched all C2 code and saw a lot of such patterns. Perhaps doing this in another RFE? Yes, please file an RFE. You can assign it to me if you don't want to do it yourself - I may find someone else to do it or do it myself eventually. But for new code please already use `as_LoopVectorMaskNode()`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14581#discussion_r1250988000 From epeter at openjdk.org Mon Jul 3 14:55:14 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 3 Jul 2023 14:55:14 GMT Subject: RFR: 8308994: C2: Re-implement experimental post loop vectorization [v2] In-Reply-To: References: Message-ID: On Mon, 3 Jul 2023 07:37:22 GMT, Pengfei Li wrote: >> ## TL;DR >> >> This patch completely re-implements C2's experimental post loop vectorization for better stability, maintainability and performance. Compared with the original implementation, this new implementation adds a standalone loop phase in C2's ideal loop phases and can vectorize more post loops. The original implementation and all code related to multi-versioned post loops are deleted in this patch. More details about this patch can be found in the document replied in this pull request. 
> > Pengfei Li has updated the pull request incrementally with one additional commit since the last revision: > > Address part of comments from Emanuel src/hotspot/share/opto/vmaskloop.cpp line 64: > 62: > 63: if (!cl->is_valid_counted_loop(T_INT)) { > 64: trace_msg(nullptr, "Loop is not a valid counted loop"); Would it help to dump the loop head here? Just that one knows which loop is being rejected here? src/hotspot/share/opto/vmaskloop.cpp line 68: > 66: } > 67: if (abs(cl->stride_con()) != 1) { > 68: trace_msg(nullptr, "Loop has unsupported stride value"); Dump loop head and the stride ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14581#discussion_r1250994587 PR Review Comment: https://git.openjdk.org/jdk/pull/14581#discussion_r1250995022 From epeter at openjdk.org Mon Jul 3 14:55:17 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 3 Jul 2023 14:55:17 GMT Subject: RFR: 8308994: C2: Re-implement experimental post loop vectorization [v2] In-Reply-To: <63xntDgcTJN-51cfPjP1XsWdNLkeURQuWmE8hluHbIM=.84e6a8b9-66b1-4427-ab2c-355c0c621871@github.com> References: <63xntDgcTJN-51cfPjP1XsWdNLkeURQuWmE8hluHbIM=.84e6a8b9-66b1-4427-ab2c-355c0c621871@github.com> Message-ID: <-KAmqqEkhtq1UTcfF5xv1etrrhY0CjW7t07wVrSstJY=.1b1e8e20-f016-4cef-9c9a-157c189d1653@github.com> On Mon, 3 Jul 2023 08:31:20 GMT, Pengfei Li wrote: >> src/hotspot/share/opto/vmaskloop.cpp line 71: >> >>> 69: if (cl->loopexit()->in(0) != cl) return; >>> 70: // Skip if some loop operations are pinned to the backedge >>> 71: if (cl->back_control()->outcnt() != 1) return; >> >> It would be interesting to have some trace flag that tells us why we bailed out here and did not do the post-loop vectorization. Unless of course it becomes too noisy. > > Great suggestion! Done. Thanks :) >> src/hotspot/share/opto/vmaskloop.hpp line 95: >> >>> 93: } >>> 94: return false; >>> 95: } >> >> Do you not want to do this sort of implementation in `SWPointer` instead? 
There are already methods like `scaled_iv_plus_offset`, so it would fit in next to that, right? > > It doesn't fit well as functions in `SWPointer` can only be used for checking the pattern in indices. But this function may be used for checking the loop increment pattern which is not in array indices, perhaps `a[i] = b[i] * (i + 1)`. We don't have `SWPointer` constructed for this. I have renamed the function to make the purpose clear. Ah, you are right, that is a different use. Yes, a better function name often does the trick. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14581#discussion_r1250996321 PR Review Comment: https://git.openjdk.org/jdk/pull/14581#discussion_r1250991219 From epeter at openjdk.org Mon Jul 3 15:01:22 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 3 Jul 2023 15:01:22 GMT Subject: RFR: 8308994: C2: Re-implement experimental post loop vectorization [v2] In-Reply-To: References: <63xntDgcTJN-51cfPjP1XsWdNLkeURQuWmE8hluHbIM=.84e6a8b9-66b1-4427-ab2c-355c0c621871@github.com> Message-ID: <4iKhNaaEAPEsAz0-1S0VJhU1rzQ0CNcJm7IepcBaUU4=.bcdb0b39-8d0c-4e99-b1f3-3d916ab8d6be@github.com> On Mon, 3 Jul 2023 14:57:50 GMT, Emanuel Peter wrote: >> Could you elaborate how to do such reservation in C2? Just allocation with some larger sizes at the beginning? Or any other examples to refer? > > I think what I have seen people do is just to `map` a high enough index value with `nullptr`. A bit hacky, but it ensures that the arrays underneath get grown sufficiently immediately. It would be nice to have some kind of `reserve` method... but I don't think we have that. Also: Use the `VectorSet` instead of the `Unique_Node_List` if you can.
It uses less space ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14581#discussion_r1251002744 From epeter at openjdk.org Mon Jul 3 15:01:21 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 3 Jul 2023 15:01:21 GMT Subject: RFR: 8308994: C2: Re-implement experimental post loop vectorization [v2] In-Reply-To: <63xntDgcTJN-51cfPjP1XsWdNLkeURQuWmE8hluHbIM=.84e6a8b9-66b1-4427-ab2c-355c0c621871@github.com> References: <63xntDgcTJN-51cfPjP1XsWdNLkeURQuWmE8hluHbIM=.84e6a8b9-66b1-4427-ab2c-355c0c621871@github.com> Message-ID: On Mon, 3 Jul 2023 08:36:44 GMT, Pengfei Li wrote: >> src/hotspot/share/opto/vmaskloop.cpp line 104: >> >>> 102: _core_set.clear(); >>> 103: _body_set.clear(); >>> 104: _body_nodes.clear(); >> >> Would it make sense to somehow reserve the space, so that we do not allocate multiple times when growing these data structures later? > > Could you elaborate how to do such reservation in C2? Just allocation with some larger sizes at the beginning? Or any other examples to refer? I think what I have seen people do is just to `map` a high enough index value with `nullptr`. A bit hacky, but it ensures that the arrays underneath get grown sufficiently immediately. It would be nice to have some kind of `reserve` methods... but I don't think we have that. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14581#discussion_r1251002205 From epeter at openjdk.org Mon Jul 3 15:01:25 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 3 Jul 2023 15:01:25 GMT Subject: RFR: 8308994: C2: Re-implement experimental post loop vectorization [v2] In-Reply-To: References: Message-ID: On Mon, 3 Jul 2023 07:37:22 GMT, Pengfei Li wrote: >> ## TL;DR >> >> This patch completely re-implements C2's experimental post loop vectorization for better stability, maintainability and performance. 
Compared with the original implementation, this new implementation adds a standalone loop phase in C2's ideal loop phases and can vectorize more post loops. The original implementation and all code related to multi-versioned post loops are deleted in this patch. More details about this patch can be found in the document replied in this pull request. > > Pengfei Li has updated the pull request incrementally with one additional commit since the last revision: > > Address part of comments from Emanuel src/hotspot/share/opto/vmaskloop.hpp line 46: > 44: > 45: // Data structures for loop analysis > 46: Unique_Node_List _core_set; // Loop core nodes set for fast membership check If this is really only for membership test, and you never need the list of nodes, you could just use the `VectorSet`. Uses less memory. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14581#discussion_r1250999324 From simonis at openjdk.org Mon Jul 3 15:25:54 2023 From: simonis at openjdk.org (Volker Simonis) Date: Mon, 3 Jul 2023 15:25:54 GMT Subject: RFR: 8303279: C2: crash in SubTypeCheckNode::sub() at IGVN split if [v3] In-Reply-To: <5Oi18HUJNORp0_wYO2nD_aUGaoyvJCuyT4YmppEWvgA=.f40a1a10-7ee2-483d-bb6c-2fb7ec3218ad@github.com> References: <5Oi18HUJNORp0_wYO2nD_aUGaoyvJCuyT4YmppEWvgA=.f40a1a10-7ee2-483d-bb6c-2fb7ec3218ad@github.com> Message-ID: On Thu, 29 Jun 2023 07:38:15 GMT, Roland Westrelin wrote: >> The crash occurs because at split if during IGVN, a `SubTypeCheck` is >> created with null as input. That happens because the control path the >> `SubTypeCheck` is cloned for is dead. To fix that I propose delaying >> split if until dead paths are collapsed. >> >> I added an assert to check a nullable first input to `SubTypeCheck` >> nodes (which should be impossible because it should be null >> checked). When I ran testing, a number of cases showed up with known >> non null values non properly marked as non null. I fixed them. 
> > Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: > > review Looks good to me. Thanks for the quick fix! ------------- Marked as reviewed by simonis (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/14678#pullrequestreview-1511407984 From jbhateja at openjdk.org Mon Jul 3 22:03:41 2023 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Mon, 3 Jul 2023 22:03:41 GMT Subject: RFR: 8311023: assert(false) failed: EA: missing memory path Message-ID: This patch handles missing cases for VectorizedHashCode while collecting memory nodes for propagating new type information through the graph. We associate new instance types with CheckCastPP nodes succeeding allocation IR, refresh connectivity of MemoryMerge slices at indices corresponding to this new alias type, and update the memory edges of user memory nodes in the ideal graph to ease scalar replacement for non-escaping allocations. Please review and share feedback. Best Regards, Jatin ------------- Commit messages: - 8311023: assert(false) failed: EA: missing memory path Changes: https://git.openjdk.org/jdk/pull/14764/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=14764&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8311023 Stats: 52 lines in 2 files changed: 51 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/14764.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14764/head:pull/14764 PR: https://git.openjdk.org/jdk/pull/14764 From pli at openjdk.org Tue Jul 4 01:34:23 2023 From: pli at openjdk.org (Pengfei Li) Date: Tue, 4 Jul 2023 01:34:23 GMT Subject: RFR: 8308994: C2: Re-implement experimental post loop vectorization [v2] In-Reply-To: <-UengrhToQL0qKVGetApNHkjRfUPMo8pEte_gtvCK5g=.b9b70067-9a40-445a-b37b-6a4ddee35be5@github.com> References: <-UengrhToQL0qKVGetApNHkjRfUPMo8pEte_gtvCK5g=.b9b70067-9a40-445a-b37b-6a4ddee35be5@github.com> Message-ID: On Mon, 26 Jun 2023 06:45:19 GMT, Emanuel Peter wrote: >> Pengfei Li has
updated the pull request incrementally with one additional commit since the last revision: >> >> Address part of comments from Emanuel > > src/hotspot/share/opto/vmaskloop.cpp line 424: > >> 422: int vopc = 0; >> 423: if (node->is_Mem()) { >> 424: vopc = node->is_Store() ? Op_StoreVectorMasked : Op_LoadVectorMasked; > > Maybe just for good measure: add an assert that it can only be a Load or a Store. Done in commit 2. > src/hotspot/share/opto/vmaskloop.cpp line 429: > >> 427: } >> 428: if (vopc == 0 || >> 429: !Matcher::match_rule_supported_vector_masked(vopc, vlen, bt)) { > > Do all nodes need to be maskable? Or is it enough if only load/store are maskable? Only load, store and reduction operations need to be masked. We previously supported reductions but that's excluded from this patch - only load and store now. I've updated the code in commit 2. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14581#discussion_r1251380343 PR Review Comment: https://git.openjdk.org/jdk/pull/14581#discussion_r1251380835 From xgong at openjdk.org Tue Jul 4 01:38:17 2023 From: xgong at openjdk.org (Xiaohong Gong) Date: Tue, 4 Jul 2023 01:38:17 GMT Subject: Integrated: 8309894: compiler/vectorapi/VectorLogicalOpIdentityTest.java fails on SVE system with UseSVE=0 In-Reply-To: <2kGmMRiqW2Myc4Iumj9PBSskUycPz7GzFMJ0Xe5Qg-4=.24b7f369-d291-4842-afa6-d2c7cd75cdd8@github.com> References: <2kGmMRiqW2Myc4Iumj9PBSskUycPz7GzFMJ0Xe5Qg-4=.24b7f369-d291-4842-afa6-d2c7cd75cdd8@github.com> Message-ID: On Mon, 19 Jun 2023 01:49:57 GMT, Xiaohong Gong wrote: > This test fails with several IR check failures when run on ARM SVE systems with vm option `-XX:UseSVE=0`. Here is one of the failed IR rules: > > > @IR(counts = {IRNode.AND_V, "1"}, applyIfCPUFeatureOr = {"sve", "true", "avx512", "true"}) > public static void testAndMaskSameValue1() > > The specified IR in the test depends on the platform's predicate feature. Hence the IR check can be applied only on ARM SVE or X86 AVX512 systems.
But with `-XX:UseSVE=0` on ARM SVE machines, JVM will disable SVE feature for compiler. But the CPU feature is not changed. To guarantee the IR rule is run with SVE as expected, it has to add another condition like `applyIf = {"UseSVE", ">0"}`. Consider `UseSVE` is an ARM specific option, it can be used only on ARM CPUs. So the right IR rules should be: > > > @IR(counts = {IRNode.AND_V, "1"}, applyIfCPUFeature = {"sve", "true"}, applyIf = {"UseSVE", "> 0"}) > @IR(counts = {IRNode.AND_V, "1"}, applyIfCPUFeature = {"avx512", "true"}) > public static void testAndMaskSameValue1() > > > This patch also changes the check order of conditions for a IR rule. It's better to check `applyIfCPUFeature` before `applyIf`, in case the vm option is invalid on running hardware, which makes test throw exception and abort. > > Verified on X86 systems with `UseAVX=1/2/3` by removing the test from ProblemList.txt, and SVE systems with `UseSVE=0/1`. This pull request has now been integrated. Changeset: 60544f90 Author: Xiaohong Gong URL: https://git.openjdk.org/jdk/commit/60544f9088c11e4718a9cd77f21792c6ba387440 Stats: 63 lines in 4 files changed: 39 ins; 9 del; 15 mod 8309894: compiler/vectorapi/VectorLogicalOpIdentityTest.java fails on SVE system with UseSVE=0 Reviewed-by: epeter, chagedorn ------------- PR: https://git.openjdk.org/jdk/pull/14533 From pli at openjdk.org Tue Jul 4 01:45:14 2023 From: pli at openjdk.org (Pengfei Li) Date: Tue, 4 Jul 2023 01:45:14 GMT Subject: RFR: 8308994: C2: Re-implement experimental post loop vectorization [v2] In-Reply-To: References: Message-ID: <4T2ExJCtPF7g-Os7mQ7cqG2cEXN-jILfitBb-kySlzY=.f57c136c-edc3-48a4-b653-3a5767dcb60a@github.com> On Tue, 27 Jun 2023 17:49:43 GMT, Emanuel Peter wrote: > General question: Do you have any tests with varying loop limit, and check that you stop exactly at the right iteration? Would be even more interesting with mixed type examples. Just to see that you do not over/under duplicate the vectors. 
Yes, we previously tested this with a lot of fuzzer tests. We did find issues before but they are all fixed now. (Previously we also supported reductions, and it's a bit tricky to duplicate reductions.) > src/hotspot/share/opto/vmaskloop.cpp line 403: > >> 401: int opc = node->Opcode(); >> 402: BasicType bt = elem_bt(node); >> 403: int vlen = Matcher::max_vector_size(bt); > > Theoretically, different `bt` can have different `Matcher::vector_width_in_bytes`. So `vlen` would not always correspond to `MaxVectorSize / element_size`. It just means that here you would end up checking for a shorter length than maybe expected? But maybe that is intended, it depends on how you generate the nodes later. I think it's good, at least for AArch64 SVE. Do you mean that other architecture may prefer using shorter vectors for better performances? (say, using 256-bit on AVX-512?) Does setting a smaller `MaxVectorSize` help? > src/hotspot/share/opto/vmaskloop.cpp line 442: > >> 440: // nodes to bail out for complex loops >> 441: bool VectorMaskedLoop::analyze_loop_body_nodes() { >> 442: VectorSet tracked(_arena); > > This is probably a good case where you could use `ResourceMark rm;` and just put the `VectorSet` on the default resource arena. Updated here, and in another place. thanks. > src/hotspot/share/opto/vmaskloop.cpp line 465: > >> 463: for (int idx = 0; idx < n_nodes; idx++) { >> 464: Node* node = _body_nodes.at(idx); >> 465: if ((node->is_Mem() && node->as_Mem()->is_Store())) { > > Suggestion: > > if ((node->is_Store())) { Done > src/hotspot/share/opto/vmaskloop.cpp line 474: > >> 472: if (!in_body(out)) { >> 473: trace_msg(node, "Node has out-of-loop user found"); >> 474: return false; > > Can this be handled in the future with a extract node? I guess you would have to extract it from a variable element, as the last iteration is not always the same. Probably, but I haven't tried it so far. Will do it in the future. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/14581#issuecomment-1619339468 PR Review Comment: https://git.openjdk.org/jdk/pull/14581#discussion_r1251383442 PR Review Comment: https://git.openjdk.org/jdk/pull/14581#discussion_r1251383769 PR Review Comment: https://git.openjdk.org/jdk/pull/14581#discussion_r1251383913 PR Review Comment: https://git.openjdk.org/jdk/pull/14581#discussion_r1251384357 From pli at openjdk.org Tue Jul 4 02:17:22 2023 From: pli at openjdk.org (Pengfei Li) Date: Tue, 4 Jul 2023 02:17:22 GMT Subject: RFR: 8308994: C2: Re-implement experimental post loop vectorization [v2] In-Reply-To: References: <0s9ixJcCQIRzJ3h4tpPwVeC7HmYbdDqhd3V6BWZDUTg=.f2b9dad5-73f2-4a34-b24d-639f4fe3de9e@github.com> Message-ID: On Wed, 28 Jun 2023 10:24:58 GMT, Emanuel Peter wrote: >> I have an example here: >> >> public class Test { >> static int RANGE = 1024; >> >> public static void main(String[] strArr) { >> byte a[] = new byte[RANGE]; >> long b[] = new long[RANGE]; >> test0(a, b); >> } >> >> static void test0(byte[] a, long[] b) { >> for (int i = 0; i < RANGE; i++) { >> a[i]++; >> b[i]++; >> } >> } >> } >> >> `./java -Xcomp -XX:-TieredCompilation -XX:+TraceNewVectors -XX:+TraceLoopOpts -XX:+UnlockExperimentalVMOptions -XX:+UseMaskedLoop -XX:+TraceMaskedLoop -XX:CompileCommand=compileonly,Test::test0 Test.java` >> This are the masks: >> >> Generated vector masks in vmask tree >> Lane_size = 1 >> 3710 LoopVectorMask === _ 367 26 [[ 3711 3712 ]] #vectormask[64]:{byte} >> Lane_size = 2 >> 3711 ExtractLowMask === _ 3710 [[ 3713 3714 ]] #vectormask[32]:{short} >> 3712 ExtractHighMask === _ 3710 [[ 3715 3716 ]] #vectormask[32]:{short} >> Lane_size = 4 >> 3713 ExtractLowMask === _ 3711 [[ 3717 3718 ]] #vectormask[16]:{int} >> 3714 ExtractHighMask === _ 3711 [[ 3719 3720 ]] #vectormask[16]:{int} >> 3715 ExtractLowMask === _ 3712 [[ 3721 3722 ]] #vectormask[16]:{int} >> 3716 ExtractHighMask === _ 3712 [[ 3723 3724 ]] #vectormask[16]:{int} >> Lane_size = 8 >> 
3717 ExtractLowMask === _ 3713 [[ ]] #vectormask[8]:{long} >> 3718 ExtractHighMask === _ 3713 [[ ]] #vectormask[8]:{long} >> 3719 ExtractLowMask === _ 3714 [[ ]] #vectormask[8]:{long} >> 3720 ExtractHighMask === _ 3714 [[ ]] #vectormask[8]:{long} >> 3721 ExtractLowMask === _ 3715 [[ ]] #vectormask[8]:{long} >> 3722 ExtractHighMask === _ 3715 [[ ]] #vectormask[8]:{long} >> 3723 ExtractLowMask === _ 3716 [[ ]] #vectormask[8]:{long} >> 3724 ExtractHighMask === _ 3716 [[ ]] #vectormask[8]:{long} >> >> That is indeed `15` masks. Hmm. Maybe that is the best one can do. And maybe it is not all that bad. But again, would be interesting to see the benchmarks for that case. > > Aha, maybe here we could just get away with 1 vmask for `byte`, and then directly extract 8 vmasks for `long`, since we do not need the ones in the middle? You'd have to generalize your `Extract(High/Low)Mask`. We just benchmarked this "byte + long" case and saw some performance regressions after vectorization. Yes, too many mask operations are expensive. GCC does this in a better way: For adjacent data sizes (larger = 2 * smaller), it extracts two halves of the vector mask, but for non-adjacent data sizes, it re-generates vector masks without extraction. 
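To put numbers on the mask counts discussed above, here is a minimal sketch. It is not HotSpot code: the class and method names are hypothetical, and it assumes power-of-two lane sizes and that a lane size N times the smallest one needs N masks. It compares a full extract tree (every intermediate lane size gets masks) with generating masks only for the lane sizes the loop actually uses.

```java
public class MaskCountSketch {
    // Masks in a full binary extract tree, one level per power-of-two
    // lane size from `smallest` up to `largest` (both in bytes).
    static int fullTreeMasks(int smallest, int largest) {
        int total = 0;
        for (int size = smallest; size <= largest; size *= 2) {
            total += size / smallest; // masks on this level
        }
        return total;
    }

    // Masks when only the lane sizes that appear in the loop get masks
    // (intermediate levels are skipped or regenerated directly).
    static int usedLevelMasks(int smallest, int... usedSizes) {
        int total = 0;
        for (int size : usedSizes) {
            total += size / smallest;
        }
        return total;
    }

    public static void main(String[] args) {
        // byte (1) + long (8): full tree vs. only the two used levels
        System.out.println(fullTreeMasks(1, 8));
        System.out.println(usedLevelMasks(1, 1, 8));
    }
}
```

For the "byte + long" loop, the full tree has 1 + 2 + 4 + 8 masks, while the two used levels alone need 1 + 8.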
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14581#discussion_r1251401597 From pli at openjdk.org Tue Jul 4 02:22:23 2023 From: pli at openjdk.org (Pengfei Li) Date: Tue, 4 Jul 2023 02:22:23 GMT Subject: RFR: 8308994: C2: Re-implement experimental post loop vectorization [v2] In-Reply-To: References: Message-ID: On Tue, 27 Jun 2023 17:34:42 GMT, Emanuel Peter wrote: >> Pengfei Li has updated the pull request incrementally with one additional commit since the last revision: >> >> Address part of comments from Emanuel > > src/hotspot/share/opto/vmaskloop.cpp line 735: > >> 733: vnode = new StoreVectorMaskedNode(ctrl, mem, addr, val, at, mask); >> 734: } >> 735: } else if (VectorNode::is_convert_opcode(opc)) { > > Ok, this does work for same size conversions: > `./java -Xcomp -XX:-TieredCompilation -XX:+TraceNewVectors -XX:+TraceLoopOpts -XX:+UnlockExperimentalVMOptions -XX:+UseMaskedLoop -XX:+TraceMaskedLoop -XX:CompileCommand=compileonly,Test::test0 -XX:+TraceSuperWord Test.java` > > public class Test { > static int RANGE = 1024; > > public static void main(String[] strArr) { > double a[] = new double[RANGE]; > long b[] = new long[RANGE]; > test0(a, b); > } > > static void test0(double[] a, long[] b) { > for (int i = 0; i < RANGE; i++) { > b[i] = (long)a[i]; > } > } > } > > Good to see some conversion is possible. But if I replace double with float, I get `Vector element size does not match`. Can that limitation be lifted? We tried to do that but found some obstacles (perhaps you are aware of). > If you started implementing type conversion for different size types, you'd have to extract_lo/hi or pack the vectors. That would be an invasive change to the current implementation. As you said in your overall feedback, conversions between types of different data sizes requires vector pack/unpack which has conflict with existing semantics of current C2 type conversion nodes. We are still considering how to do it. 
It would be good if you have better suggestions or can help us with this. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14581#discussion_r1251403951 From pli at openjdk.org Tue Jul 4 02:29:09 2023 From: pli at openjdk.org (Pengfei Li) Date: Tue, 4 Jul 2023 02:29:09 GMT Subject: RFR: 8308994: C2: Re-implement experimental post loop vectorization [v2] In-Reply-To: References: <0s9ixJcCQIRzJ3h4tpPwVeC7HmYbdDqhd3V6BWZDUTg=.f2b9dad5-73f2-4a34-b24d-639f4fe3de9e@github.com> Message-ID: On Tue, 27 Jun 2023 17:47:33 GMT, Emanuel Peter wrote: >> src/hotspot/share/opto/vmaskloop.cpp line 785: >> >>> 783: } >>> 784: >>> 785: // Duplicate vectorized operations with given vector element size >> >> Got to here today. There should probably be some comment higher up that you first replace scalars with one vector each, and then duplicate them for the larger types that need multiple vectors. >> >> I'm also concerned that there may be some platforms where the max vector width in bytes is not the same for all types. But maybe all platforms that support masked register ops also all have the same vector width in bytes for all types? > > Assume we only allow `32` bit registers for `int`, but `64` bits for doubles. Now you'd be assuming that there need to be double as many `double` vectors as `int` vectors. But actually, they need the same amount of vectors, because vectors of both sizes fit exactly `8` elements. More comments are added. I can only say this way is good on current AArch64. As we don't have enough knowledge of other architectures, we may need some help if we need to change this. 
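The concern about per-type vector widths can be illustrated with a small sketch. It is not HotSpot code: the names are hypothetical, and the widths are taken as bytes, which is an assumption about the quoted "32"/"64" example. It shows that the number of vectors per scalar node only doubles from `int` to `double` when both types share the same vector width in bytes.

```java
public class VectorCountSketch {
    // Number of elements that fit into one vector of the given width.
    static int lanes(int vectorWidthBytes, int elemBytes) {
        return vectorWidthBytes / elemBytes;
    }

    // Vectors needed to cover `elems` elements, rounded up.
    static int vectorsNeeded(int elems, int vectorWidthBytes, int elemBytes) {
        int perVector = lanes(vectorWidthBytes, elemBytes);
        return (elems + perVector - 1) / perVector;
    }

    public static void main(String[] args) {
        // Same width (64 bytes) for both types: double needs twice as many vectors.
        System.out.println(vectorsNeeded(16, 64, 4)); // int
        System.out.println(vectorsNeeded(16, 64, 8)); // double
        // Different widths (32 bytes for int, 64 for double): same lane count.
        System.out.println(lanes(32, 4));
        System.out.println(lanes(64, 8));
    }
}
```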
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14581#discussion_r1251405212 From pli at openjdk.org Tue Jul 4 02:29:14 2023 From: pli at openjdk.org (Pengfei Li) Date: Tue, 4 Jul 2023 02:29:14 GMT Subject: RFR: 8308994: C2: Re-implement experimental post loop vectorization [v2] In-Reply-To: References: Message-ID: On Wed, 28 Jun 2023 10:20:25 GMT, Emanuel Peter wrote: >> Pengfei Li has updated the pull request incrementally with one additional commit since the last revision: >> >> Address part of comments from Emanuel > > src/hotspot/share/opto/vmaskloop.cpp line 790: > >> 788: // Compute vector duplication count and the vmask tree level >> 789: int dup_cnt = lane_size / _size_stats.smallest_size(); >> 790: int level = exact_log2(dup_cnt); > > Rename `level` to something more expressive. Maybe just `vmask_tree_level`. Also in all other methods. Otherwise it is not quite clear what it is supposed to be. Renamed > src/hotspot/share/opto/vmaskloop.cpp line 798: > >> 796: if (type2aelembytes(statement_bottom_type(stmt)) != lane_size) { >> 797: continue; >> 798: } > > You could assert here, that the max vector size for bt is as expected. It's done. > src/hotspot/share/opto/vmaskloop.cpp line 874: > >> 872: void VectorMaskedLoop::adjust_vector_node(Node* vn, Node_List* vmask_tree, >> 873: int level, int mask_off) { >> 874: Node* vmask = vmask_tree->at((1 << level) + mask_off); > > Again, rename `level`. Maybe it could be `vmask_tree_level` and `vmask_tree_level_offset`? Here I finally understood what you mean by the two variables `level` and `mask_off`. Also renamed. 
> src/hotspot/share/opto/vmaskloop.cpp line 876: > >> 874: Node* vmask = vmask_tree->at((1 << level) + mask_off); >> 875: int lane_size = type2aelembytes(Matcher::vector_element_basic_type(vmask)); >> 876: uint vector_size_in_bytes = Matcher::max_vector_size(T_BYTE); > > Can you add an assert that this is the same as `Matcher::vector_width_in_bytes(Matcher::vector_element_basic_type(vmask))` ? Asserts added. > src/hotspot/share/opto/vmaskloop.cpp line 884: > >> 882: Node* ptr = vn->in(MemNode::Address); >> 883: Node* base = ptr->in(AddPNode::Base); >> 884: int mem_scale = Matcher::max_vector_size(T_BYTE); > > Duplicate of `vector_size_in_bytes`? Aha, it's duplicated now. Previously we did some strided access support so we added this. Removed now. > src/hotspot/share/opto/vmaskloop.cpp line 893: > >> 891: // 2) For populate index, update start index for non-zero mask offset >> 892: if (mask_off != 0) { >> 893: int v_stride = vector_size_in_bytes / lane_size * _cl->stride_con(); > > Is there any test for PopulateIndex with stride that is not `1`? For now I guess only `-1` would even be allowed. Good question, I will check it then. 
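The `(1 << level) + mask_off` lookup quoted in this review reads like heap-style indexing into a flat list with the root mask at index 1. A minimal sketch under that assumption follows; the class and method names are hypothetical and this is not the patch's code.

```java
public class VmaskTreeIndexSketch {
    // Tree level for a lane size: log2 of how many vectors of this lane
    // size correspond to one vector of the smallest lane size.
    static int level(int laneSize, int smallestSize) {
        int dupCnt = laneSize / smallestSize;
        if (Integer.bitCount(dupCnt) != 1) {
            throw new IllegalArgumentException("duplication count must be a power of two");
        }
        return Integer.numberOfTrailingZeros(dupCnt);
    }

    // Flat index of the mask at `maskOff` within `level`; root is index 1.
    static int index(int level, int maskOff) {
        return (1 << level) + maskOff;
    }

    public static void main(String[] args) {
        int lvl = level(8, 1);               // long lanes, byte is the smallest type
        System.out.println(index(0, 0));     // root mask
        System.out.println(index(lvl, 0));   // first mask for long
        System.out.println(index(lvl, 7));   // last mask for long
    }
}
```

With this layout, level `k` occupies indices `2^k` through `2^(k+1) - 1`, so no separate per-level offset table is needed.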
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14581#discussion_r1251405695 PR Review Comment: https://git.openjdk.org/jdk/pull/14581#discussion_r1251405307 PR Review Comment: https://git.openjdk.org/jdk/pull/14581#discussion_r1251405747 PR Review Comment: https://git.openjdk.org/jdk/pull/14581#discussion_r1251405983 PR Review Comment: https://git.openjdk.org/jdk/pull/14581#discussion_r1251406373 PR Review Comment: https://git.openjdk.org/jdk/pull/14581#discussion_r1251406879 From pli at openjdk.org Tue Jul 4 02:29:15 2023 From: pli at openjdk.org (Pengfei Li) Date: Tue, 4 Jul 2023 02:29:15 GMT Subject: RFR: 8308994: C2: Re-implement experimental post loop vectorization [v2] In-Reply-To: References: <0s9ixJcCQIRzJ3h4tpPwVeC7HmYbdDqhd3V6BWZDUTg=.f2b9dad5-73f2-4a34-b24d-639f4fe3de9e@github.com> Message-ID: On Wed, 28 Jun 2023 10:16:28 GMT, Emanuel Peter wrote: >> I just added some shorts, so that the int and float would be duplicated ;) > > Suggested solution: track the last memory state per slice, just like I recently did in `SuperWord::schedule_reorder_memops` with `current_state_in_slice`. I'm not quite familiar with memory slice. Will do more investigation and come back later. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14581#discussion_r1251405644 From pli at openjdk.org Tue Jul 4 02:40:15 2023 From: pli at openjdk.org (Pengfei Li) Date: Tue, 4 Jul 2023 02:40:15 GMT Subject: RFR: 8308994: C2: Re-implement experimental post loop vectorization [v2] In-Reply-To: References: Message-ID: On Wed, 28 Jun 2023 10:41:12 GMT, Emanuel Peter wrote: >> Pengfei Li has updated the pull request incrementally with one additional commit since the last revision: >> >> Address part of comments from Emanuel > > src/hotspot/share/opto/vmaskloop.cpp line 89: > >> 87: cl->mark_loop_vectorized(); >> 88: cl->mark_vector_masked(); >> 89: _phase->C->set_max_vector_size(MaxVectorSize); > > What is this for? 
On AArch64 with SVE, we pre-initialize an all-true register (p7) only in compiled methods where `MaxVectorSize` is set. So if a compiled method implicitly uses ptrue (because it's vectorized), we need to set the `MaxVectorSize` to guarantee p7 is initialized. This usage starts since the initial SVE patch (in 2020). We know this is not a perfect solution and @fg1417 is currently investigating if we have better solutions. > src/hotspot/share/opto/vmaskloop.cpp line 531: > >> 529: if (!addp->is_AddP() || !operates_on_array_of_type(addp, mem_type)) { >> 530: return nullptr; >> 531: } > > I guess this prevents you from having `Unsafe` use type mismatched loads/stores. But it also prevents vectorization in cases where one might just store shorts into an int array using `Unsafe`. This saves you a lot of headaches. You probably don't lose too much for not vectorizing those cases. Exactly. Is there any case in real applications that may store shorts into an int array? > src/hotspot/share/opto/vmaskloop.cpp line 642: > >> 640: >> 641: // Helper method for finding or creating a vector input at specified index >> 642: Node* VectorMaskedLoop::get_vector_input(Node* node, uint idx) { > > This is analogous to `SuperWord::vector_opd`. Can we not refactor things so that we can share the code? I like refactoring, but it may require big effort. Shall we discuss it later in our future conversations? > src/hotspot/share/opto/vmaskloop.cpp line 939: > >> 937: Node* root_vmask = vmask_tree->at(1); >> 938: >> 939: // Replace vectorization candidate nodes to vector nodes > > Expand explanation. Say that you are for now only generating a single vector node per scalar node. And that the duplication afterwards makes sure that all scalar nodes are "widened" to the same number of elements. The smalles type using a single vector, larger types using multiple (duplicated) vectors per scalar node. Thanks for suggestion. Done in commit 2. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14581#discussion_r1251410392 PR Review Comment: https://git.openjdk.org/jdk/pull/14581#discussion_r1251410844 PR Review Comment: https://git.openjdk.org/jdk/pull/14581#discussion_r1251411540 PR Review Comment: https://git.openjdk.org/jdk/pull/14581#discussion_r1251411689 From pli at openjdk.org Tue Jul 4 02:44:17 2023 From: pli at openjdk.org (Pengfei Li) Date: Tue, 4 Jul 2023 02:44:17 GMT Subject: RFR: 8308994: C2: Re-implement experimental post loop vectorization [v2] In-Reply-To: References: Message-ID: On Wed, 28 Jun 2023 10:58:02 GMT, Emanuel Peter wrote: >> Pengfei Li has updated the pull request incrementally with one additional commit since the last revision: >> >> Address part of comments from Emanuel > > test/hotspot/jtreg/compiler/vectorization/runner/ArrayCopyTest.java line 82: > >> 80: @IR(applyIfCPUFeature = {"sve", "true"}, >> 81: applyIf = {"UseMaskedLoop", "true"}, >> 82: counts = {IRNode.LOOP_VECTOR_MASK, ">0"}) > > We could also do this: > If the CPU features do not support the features for `UseMaskedLoop`, then just put it back to `false`. That way, we do not have to check for the required cpu features. Because when the flag it `true`, we know the platform must also support the corresponding masked instructions. Yes, thanks for this. I will clean up all the IR rules after JDK-8311130 and JDK-8309697 are done. > test/hotspot/jtreg/compiler/vectorization/runner/ArrayInvariantFillTest.java line 69: > >> 67: @Test >> 68: @IR(applyIfCPUFeatureOr = {"asimd", "true", "sse2", "true"}, >> 69: applyIf = {"OptimizeFill", "false"}, > > This seems unrelated. Why did you have to add this? Will cleanup this in JDK-8309697. 
> test/hotspot/jtreg/compiler/vectorization/runner/VectorizationTestRunner.java line 84: > >> 82: TestFramework irTest = new TestFramework(klass); >> 83: // Add extra VM options to enable more auto-vectorization chances >> 84: irTest.addFlags("-XX:-OptimizeFill"); > > Aha, you removed this too. Fair enough. But since the runner is currently requiring everything to be `flagless`, now I cannot actually force `-XX:-OptimizeFill` from the outside. And that means that potentially the tests are never actually run with `OptimizeFill` off, and we never actually can check the IR rules. We lose test coverage. That makes me a bit nervous. > > Suggestion: if tests actually require the flag off to execute the IR rule, then we should have two scenarios, one where the flag is on, and one when it is off. Again, will cleanup this in JDK-8309697. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14581#discussion_r1251412833 PR Review Comment: https://git.openjdk.org/jdk/pull/14581#discussion_r1251413032 PR Review Comment: https://git.openjdk.org/jdk/pull/14581#discussion_r1251413480 From duke at openjdk.org Tue Jul 4 04:19:03 2023 From: duke at openjdk.org (Swati Sharma) Date: Tue, 4 Jul 2023 04:19:03 GMT Subject: RFR: 8311178: JMH tests don't scale well when sharing output buffers In-Reply-To: <-bJ0RJ4qNzcZoXn6WmqPd7VbtBEqoi-iMmUy3D-MNJo=.13545140-598a-4691-a13c-9c559756c918@github.com> References: <-bJ0RJ4qNzcZoXn6WmqPd7VbtBEqoi-iMmUy3D-MNJo=.13545140-598a-4691-a13c-9c559756c918@github.com> Message-ID: On Sat, 1 Jul 2023 19:05:23 GMT, Sergey Tsypanov wrote: >> The below benchmark files have scaling issues due to cache contention and leads to poor scaling when run on multiple threads. 
The patch sets the scope from benchmark level to thread level to fix the issue: >> - org/openjdk/bench/java/io/DataOutputStreamTest.java >> - org/openjdk/bench/java/lang/ArrayCopyObject.java >> - org/openjdk/bench/java/lang/ArrayFiddle.java >> - org/openjdk/bench/java/time/format/DateTimeFormatterBench.java >> - org/openjdk/bench/jdk/incubator/vector/IndexInRangeBenchmark.java >> - org/openjdk/bench/jdk/incubator/vector/MemorySegmentVectorAccess.java >> - org/openjdk/bench/jdk/incubator/vector/StoreMaskedBenchmark.java >> - org/openjdk/bench/jdk/incubator/vector/StoreMaskedIOOBEBenchmark.java >> - org/openjdk/bench/vm/compiler/ArrayFill.java >> - org/openjdk/bench/vm/compiler/IndexVector.java >> >> Also removing the static scope for variables in org/openjdk/bench/jdk/incubator/vector/VectorFPtoIntCastOperations.java for better scaling. >> >> Please review and share your feedback. >> >> Thanks, >> Swati > > test/micro/org/openjdk/bench/java/lang/ArrayCopyObject.java line 64: > >> 62: } >> 63: >> 64: @State(Scope.Thread) > > Are you sure it makes sense as in `main()` method we set `fork(1)` so there's only one thread running the benchmark? AFAIK fork value specifies number of times harness should [fork](https://javadoc.io/doc/org.openjdk.jmh/jmh-core/0.6/org/openjdk/jmh/annotations/Fork.html). Also the change is setting scope to thread level not controlling the number of threads. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14746#discussion_r1251459004 From duke at openjdk.org Tue Jul 4 04:31:54 2023 From: duke at openjdk.org (sid8606) Date: Tue, 4 Jul 2023 04:31:54 GMT Subject: RFR: 8309889: [s390] Missing return statement after calling jump_to_native_invoker method in generate_method_handle_dispatch. 
In-Reply-To: References: Message-ID: On Mon, 26 Jun 2023 06:05:12 GMT, sid8606 wrote: > Missing return statement after calling jump_to_native_invoker method in generate_method_handle_dispatch, it leads to assert(is_valid()) failed: invalid register. > > Ran tier1 test cases passing with release, fastdebug and slowdebug. Thank you for the reviews! ------------- PR Comment: https://git.openjdk.org/jdk/pull/14647#issuecomment-1619462561 From duke at openjdk.org Tue Jul 4 04:38:03 2023 From: duke at openjdk.org (sid8606) Date: Tue, 4 Jul 2023 04:38:03 GMT Subject: Integrated: 8309889: [s390] Missing return statement after calling jump_to_native_invoker method in generate_method_handle_dispatch. In-Reply-To: References: Message-ID: On Mon, 26 Jun 2023 06:05:12 GMT, sid8606 wrote: > Missing return statement after calling jump_to_native_invoker method in generate_method_handle_dispatch, it leads to assert(is_valid()) failed: invalid register. > > Ran tier1 test cases passing with release, fastdebug and slowdebug. This pull request has now been integrated. Changeset: 514816ed Author: Sidraya Jayagond Committer: Amit Kumar URL: https://git.openjdk.org/jdk/commit/514816ed7d7dea1fb13d32b80aef89774bee13d3 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod 8309889: [s390] Missing return statement after calling jump_to_native_invoker method in generate_method_handle_dispatch. Reviewed-by: amitkumar, lucy ------------- PR: https://git.openjdk.org/jdk/pull/14647 From thartmann at openjdk.org Tue Jul 4 06:17:00 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 4 Jul 2023 06:17:00 GMT Subject: RFR: 8309531: Incorrect result with unwrapped iotaShuffle. 
[v6] In-Reply-To: <79P4C0c_nBrk5vF8IQkWhz3uALJPnLs-XE8BKnEC6Ho=.43391ac3-78ee-4a57-8042-6bf854a5ffb1@github.com> References: <0OY_OdZGlSeQhVmmSyYitp4XjtKSXziY1p0wKy6V68s=.68c8351c-7f6e-44ca-91dd-35fa83cfbb8d@github.com> <79P4C0c_nBrk5vF8IQkWhz3uALJPnLs-XE8BKnEC6Ho=.43391ac3-78ee-4a57-8042-6bf854a5ffb1@github.com> Message-ID: On Mon, 3 Jul 2023 07:51:20 GMT, Jatin Bhateja wrote: >> Patch fixes following two issues in iotaShuffle inline expander with unwrapped indices. >> 1) Disable intrinsification if effective index do not lie within byte value range. >> 2) Use GT predicate while computing comparison mask for all the indices above vector length. >> >> No performance degradation seen with existing slice/unslice operations which internally calls wrapped iotaShuffle. >> >> This interim patch addresses incorrectness around iotaShuffle till we introduce modified shuffle implementations >> with JDK-8310691. >> >> Kindly review and share feedback. >> >> Best Regards, >> Jatin > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Review comments resolution. All tests passed, I added some style comments. src/hotspot/share/opto/vectorIntrinsics.cpp line 595: > 593: const TypeInt* wrap = gvn().type(argument(6))->isa_int(); > 594: > 595: if (shuffle_klass == nullptr || shuffle_klass->const_oop() == nullptr || Suggestion: if (shuffle_klass == nullptr || shuffle_klass->const_oop() == nullptr || src/hotspot/share/opto/vectorIntrinsics.cpp line 620: > 618: > 619: if (!do_wrap && !effective_indices_in_range) { > 620: // FIXME: disable instrinsification for unwrapped shuffle iota Please don't add FIXME's to new code or at least file a follow-up RFE and reference it here. 
src/hotspot/share/opto/vectorIntrinsics.cpp line 641: > 639: > 640: bool step_multiply = !step_val->is_con() || !is_power_of_2(step_val->get_con()); > 641: if(step_multiply) { Suggestion: if (step_multiply) { src/hotspot/share/opto/vectorIntrinsics.cpp line 646: > 644: } > 645: } else { > 646: if (!arch_supports_vector(Op_LShiftVB, num_elem, elem_bt, VecMaskNotUsed)) { Can be converted to `else if` src/hotspot/share/opto/vectorIntrinsics.cpp line 659: > 657: Node* step = argument(5); > 658: > 659: if(step_multiply) { Suggestion: if (step_multiply) { ------------- PR Review: https://git.openjdk.org/jdk/pull/14700#pullrequestreview-1512123128 PR Review Comment: https://git.openjdk.org/jdk/pull/14700#discussion_r1251530503 PR Review Comment: https://git.openjdk.org/jdk/pull/14700#discussion_r1251529875 PR Review Comment: https://git.openjdk.org/jdk/pull/14700#discussion_r1251531381 PR Review Comment: https://git.openjdk.org/jdk/pull/14700#discussion_r1251532043 PR Review Comment: https://git.openjdk.org/jdk/pull/14700#discussion_r1251530845 From pli at openjdk.org Tue Jul 4 08:50:14 2023 From: pli at openjdk.org (Pengfei Li) Date: Tue, 4 Jul 2023 08:50:14 GMT Subject: RFR: 8308994: C2: Re-implement experimental post loop vectorization In-Reply-To: <8KPkr2loby3RVIrYQBiXWv3Ph2E0saSLVDBMFHi88LQ=.b1ffb28d-54a8-4dcc-9472-e53b055a72ee@github.com> References: <8KPkr2loby3RVIrYQBiXWv3Ph2E0saSLVDBMFHi88LQ=.b1ffb28d-54a8-4dcc-9472-e53b055a72ee@github.com> Message-ID: On Thu, 29 Jun 2023 10:54:29 GMT, Emanuel Peter wrote: >> Hi @eme64, >> >> I guess you have done your first round of review. @fg1417 and I really appreciate all your constructive inputs. By reading your comments, I believe you have reviewed this patch in very detail. Thanks again! >> >> What I am doing now: >> >> - I'm trying to fix the issues which I think can be fixed immediately. >> - I'm trying to answer all your simple questions ASAP. 
>> >> For your request of big refactoring work, I feel like I personally may not have enough time and capability to complete it in a short time. We may need some discussion about it. But it's great to know more about your "hybrid vectorizer" plan from your feedback. It looks like a grand plan, and requires significant effort and cooperation. I strongly agree that we need some conversation to discuss where we should move forward and how we can cooperate. Could you give us a moment to digest your idea before we schedule a conversation? >> >> BTW: What's your preferred time for a conversation? We are based in Shanghai (GMT+8) > > Hi @pfustc ! > > I'm glad you appreciate my review. > >> For your request of big refactoring work, I feel like I personally may not have enough time and capability to complete it in a short time. > > Are you under some time constraint? No pressure from my side, take the time you need. > > I would very much love to have a conversation over a video call with you. I think that would be beneficial for all of us. The problem from our side (Oracle) are intellectual property concerns. OpenJDK emails and PR's are all under the Oracle Contributor Agreement. So there I'm free to have conversations. I'm trying to figure out if we can have a similar frame for a video call, sadly it may take a few weeks or months to get that sorted, as many people are on summer vacation. > > Please take some time to digest the feedback. This is a big change set, it will take a while to be ready for integration at any rate. And again, I would really urge you to consider some refactoring of SuperWord in a separate RFE before this change here. > > I'm looking forward to more collaboration - over PR comments, emails, and hopefully eventually video calls as well. > Emanuel Hi @eme64, In commit 2, I have fixed all simple issues according to your comments and marked them "resolved". We may then spend more time on the remaining unresolved issues.
Now I'd like to answer more questions in your overall feedback. > You could not mask all instructions, just loads and stores. But do you really need to mask all other instructions too? I guess not if they do not have side-effects, right? Adding avx/avx2 would unlock this feature for many more intel machines. Besides loads and stores, vector reductions also need to be masked because they do have side-effect (only active lanes should be involved in reduction operations). But yes, reduction support is already excluded from this patch because of performance. Perhaps in the future we can consider transforming reductions to non-reductions (just like what you did recently in SuperWord) to get better performance. In commit 2, I have updated this code according to your suggestion. Thanks for it! Regarding avx/avx2 support, I'm afraid we don't have enough knowledge and test resources of x86. We may need Intel's help if we want to do this. > Indexing arrays: From my experiments, I have to conclude that you only allow simple indexing of the form a[i], no offset or scaling a[i*2 + 3]. Scaling support is excluded from this patch because they're actually "strided accesses". We cannot get better performance for them with current gather/scatter nodes in C2. But, offset support is in (except the only special case of `iv + stride` which you experimented). You may try other cases like `a[i + 2]` or `a[i - 3]` and see they are vectorized. > Why not use this for main-loops? As I have mentioned above, we have experimented more than what we do in this patch, including reductions, strided accesses, type conversions and for normal (unsplit) loops. Indeed, at the beginning, we used this for normal loops and did vectorization before C2's loop iteration split - this is the ideal SVE-style vectorization on AArch64 as the generated code is very elegant. 
But unfortunately, the performance was not as good as we expected, and it added a lot of complexity because we needed to make it compatible with C2's loop strip-mining. Later, we switched to using this for post loops. This removes a lot of complexity and shows better performance (at least on all SVE CPUs we have) now. Using this for main loops is still in our long-term plan, but not in the short term because current SuperWord can do it well. > What about a hybrid vectorizer? While working on this patch, we were also thinking about how to make this co-exist with SuperWord. That's a big question! But unfortunately, current SuperWord code is quite convoluted and carries a heavy historical burden, and very few people in the JDK community were interested in this before. And for a long time, we have been hoping someone would refactor it. But before you, nobody seems to have had the interest or ambition to do it. We are now very glad to know that you have such a "hybrid" plan. But I think how to refactor the current SuperWord code also depends on what the "hybrid vectorizer" eventually looks like. We may need more discussions in the future about the direction of refactoring. Looking forward to having a video call with you. Thanks, Pengfei ------------- PR Comment: https://git.openjdk.org/jdk/pull/14581#issuecomment-1619814602 From thartmann at openjdk.org Tue Jul 4 10:32:05 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 4 Jul 2023 10:32:05 GMT Subject: RFR: 8310308: IR Framework: check for type and size of vector nodes [v18] In-Reply-To: References: Message-ID: On Mon, 3 Jul 2023 09:37:22 GMT, Emanuel Peter wrote: >> For some changes to `SuperWord`, and maybe auto-vectorization in general, I want to strengthen the IR Framework. >> >> **Motivation** >> I want to not just find the relevant IR nodes, but also assert that they have the maximal length that they could have on the respective platform (given the CPU features and `MaxVectorSize`).
Without this verification it is possible that a future change leads to a regression where we still vectorize but at shorter vector widths as before - leading to performance loss. >> >> **How to use it** >> >> All `IRNode`s in `test/hotspot/jtreg/compiler/lib/ir_framework/IRNode.java` that are created with `vectorNode` are now all matched with their `type` and `size`. The regex might now look something like this: >> >> `"(\d+(\s){2}(VectorCastF2X.*)+(\s){2}===.*vector[A-Za-z][8]:{int})"` >> which would match with IR nodes dumped like that: >> `1150 VectorCastF2X === _ 1151 [[ 1146 ]] #vectory[8]:{int} ...` >> >> The goal was to keep it simple and straight forward. In most cases, you can just use the nodes as before, and implicitly we now check for maximal size automatically. However, in some cases we want to ensure there is no or only a limited number of nodes (`failOn` or comparison `<` or `<=` or `=0`) - in those cases we usually want to make sure there is not any node of any size, so we match with any size by default. The size can also explicitly be constrained using `IRNode.VECTOR_SIZE`. >> >> Some examples: >> 1. `@IR(counts = {IRNode.LOAD_VECTOR_I, " >0 "})` -> search for a `LoadVector` node with `type` `int`, and maximal `size` possible on the machine (limited by CPU features and `MaxVectorSize`). This is the most common use case. >> 2. `@IR(failOn = { IRNode.LOAD_VECTOR_L, IRNode.STORE_VECTOR })` -> fail if there is a `LoadVector` with type `long`, of `any` size. >> 3. `@IR(counts = { IRNode.XOR_VI, IRNode.VECTOR_SIZE_4, " > 0 "})` -> find at least one `XorV` node with type `int` and exactly `4` elements. Useful for VectorAPI when the vector species is fixed. >> 4. `@IR(counts = { IRNode.LOAD_VECTOR_D, IRNode.VECTOR_SIZE + "min(4, max_double)", " >0 " })` -> search for a `LoadVector` node with `type` `double`, and `size` exactly equals to `min(4, max_double)` (so 4 elements, or if the hardware allows fewer `doubles`, then that number). >> 5. 
`@IR(counts = { IRNode.ABS_VF, IRNode.VECTOR_SIZE + "min(LoopMaxUnroll, max_float)", ">= 1" })` -> find at least one `AbsV` nodes with type `float`, and the `size` exactly equals to the smaller of... > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > more for Christian's reviews Nice enhancement! I skimmed through the changes and it looks good to me. test/hotspot/jtreg/compiler/lib/ir_framework/driver/irmatching/irrule/constraint/raw/RawCountsConstraint.java line 62: > 60: switch (comparison.getComparator()) { > 61: case "<" -> { > 62: TestFormat.checkNoReport(comparison.getGivenValue() > 1, "Node count comparison \"<" + What if `comparison.getGivenValue()` is negative? test/hotspot/jtreg/compiler/lib/ir_framework/driver/irmatching/irrule/constraint/raw/RawCountsConstraint.java line 86: > 84: } > 85: case "!=" -> throw new TestFormatException("Not-equal comparator not supported for node count: \"" + > 86: comparison.getComparator() + "\". Please rewrite the rule."); No need to call `comparison.getComparator()` again here. test/hotspot/jtreg/compiler/lib/ir_framework/driver/irmatching/parser/VMInfo.java line 59: > 57: return Long.parseLong(getStringValue(key)); > 58: } catch (NumberFormatException e) { > 59: throw new TestFrameworkException("VMInfo value for \"" + key + "\" is not long, got \"" + getStringValue(key) + "\""); Suggestion: throw new TestFrameworkException("VMInfo value for \"" + key + "\" is not a long, got \"" + getStringValue(key) + "\""); ------------- Marked as reviewed by thartmann (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/14539#pullrequestreview-1512584953 PR Review Comment: https://git.openjdk.org/jdk/pull/14539#discussion_r1251833266 PR Review Comment: https://git.openjdk.org/jdk/pull/14539#discussion_r1251830428 PR Review Comment: https://git.openjdk.org/jdk/pull/14539#discussion_r1251834541 From jsjolen at openjdk.org Tue Jul 4 10:35:21 2023 From: jsjolen at openjdk.org (Johan Sjölen) Date: Tue, 4 Jul 2023 10:35:21 GMT Subject: RFR: 8311087: PhiNode::wait_for_region_igvn should break early [v2] In-Reply-To: References: Message-ID: <2AVJe88tw0HV9HbHk_T3GLo-oIvnMNAWEkpw9SLkTYc=.3d55a112-c9e1-4ff1-a076-9976234af8a5@github.com> > Hi, please consider this PR. > > Instead of continuing the loop we break after setting `delay = true`. I also flattened out the indentation by transforming `A && B` into `!A || !B` and `continue`ing if so. This makes the code a bit longer, but clearer imho. > > I've run `TestCastIIAfterUnrollingInOuterLoop`, which is the test that was added along with this method. I'm also running tier1.
> > Thanks, > Johan Johan Sjölen has updated the pull request incrementally with one additional commit since the last revision: Wrap continue in braces ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14707/files - new: https://git.openjdk.org/jdk/pull/14707/files/9dc51e1c..a327dd14 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14707&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14707&range=00-01 Stats: 4 lines in 1 file changed: 0 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/14707.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14707/head:pull/14707 PR: https://git.openjdk.org/jdk/pull/14707 From jsjolen at openjdk.org Tue Jul 4 10:35:21 2023 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Tue, 4 Jul 2023 10:35:21 GMT Subject: RFR: 8311087: PhiNode::wait_for_region_igvn should break early [v2] In-Reply-To: References: Message-ID: On Fri, 30 Jun 2023 06:29:02 GMT, Christian Hagedorn wrote: >> Johan Sjölen has updated the pull request incrementally with one additional commit since the last revision: >> >> Wrap continue in braces > > src/hotspot/share/opto/cfgnode.cpp line 1944: > >> 1942: Node* n = in(j); >> 1943: >> 1944: if (rc == nullptr || !rc->is_Proj()) continue; > > Maybe you could put braces around the `continue` statements. Sounds good, I didn't change indentation though.
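The loop shape discussed in this thread can be illustrated with a small stand-alone sketch. It is written in Java purely for illustration (the real code is C++ in `cfgnode.cpp`), and `Input`, `isProj` and `needsDelay` are hypothetical stand-ins, not HotSpot names. It only shows a nested `A && B` condition rewritten as `!A || !B` guards with a braced `continue`, plus a `break` as soon as `delay` is set:

```java
import java.util.List;

class BreakEarlySketch {
    // Hypothetical stand-in for one region input; not a HotSpot type.
    static final class Input {
        final boolean isProj;
        final boolean needsDelay;

        Input(boolean isProj, boolean needsDelay) {
            this.isProj = isProj;
            this.needsDelay = needsDelay;
        }
    }

    // Returns true as soon as one input requires a delay.
    static boolean waitForRegion(List<Input> inputs) {
        boolean delay = false;
        for (Input in : inputs) {
            // Flattened guard: replaces "if (in != null && in.isProj) { ... }".
            if (in == null || !in.isProj) {
                continue;
            }
            if (!in.needsDelay) {
                continue;
            }
            delay = true;
            break; // later inputs cannot change the result, so stop scanning
        }
        return delay;
    }

    public static void main(String[] args) {
        List<Input> inputs = List.of(
                new Input(false, true),
                new Input(true, true),
                new Input(true, false));
        System.out.println(waitForRegion(inputs));
    }
}
```

The guards make each reason for skipping an input visible on its own line, at the cost of a slightly longer loop body.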
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14707#discussion_r1251838476 From jsjolen at openjdk.org Tue Jul 4 10:35:48 2023 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Tue, 4 Jul 2023 10:35:48 GMT Subject: RFR: 8311087: PhiNode::wait_for_region_igvn should break early [v2] In-Reply-To: References: Message-ID: <0F7xhykjE9fU9icg9Nb0xzGvIIw4bFkJeba9fYgdz5M=.9b292f1b-8af1-4f9c-ae11-39be762d973d@github.com> On Thu, 29 Jun 2023 13:15:09 GMT, Christian Hagedorn wrote: >> Johan Sjölen has updated the pull request incrementally with one additional commit since the last revision: >> >> Wrap continue in braces > > src/hotspot/share/opto/cfgnode.cpp line 1966: > >> 1964: delay = true; >> 1965: break; >> 1966: } > > Just an idea, how about putting this into a separate method `should_delay()` (or something like that) and replacing `continue` with `return false` and `break` with `return true`? If `should_delay()` is true at some point, we can push `this` to the worklist and return true. > > But looks good either way. Yeah, it could be a lambda (this isn't used anywhere else), but not sure that it'd give us too much in terms of clarity here. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14707#discussion_r1251843018 From thartmann at openjdk.org Tue Jul 4 10:39:53 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 4 Jul 2023 10:39:53 GMT Subject: RFR: 8311023: assert(false) failed: EA: missing memory path In-Reply-To: References: Message-ID: On Mon, 3 Jul 2023 21:55:44 GMT, Jatin Bhateja wrote: > Handling missing cases for VectorizedHashCode while collecting memory nodes for propagating new type information through the graph. > > We associate new instance types with CheckCastPP nodes succeeding allocation IR, refresh connectivity of MemoryMerge slices at alias indices corresponding to new instance type and update the memory edges of user memory nodes in the ideal graph to ease out scalar replacements.
> > Please review and share feedback. > > Best Regards, > Jatin Looks good to me. Please change the test name to something more descriptive. ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/14764#pullrequestreview-1512610019 From epeter at openjdk.org Tue Jul 4 11:26:05 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 4 Jul 2023 11:26:05 GMT Subject: RFR: 8310308: IR Framework: check for type and size of vector nodes [v18] In-Reply-To: References: Message-ID: On Tue, 4 Jul 2023 10:23:39 GMT, Tobias Hartmann wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> more for Christian's reviews > > test/hotspot/jtreg/compiler/lib/ir_framework/driver/irmatching/irrule/constraint/raw/RawCountsConstraint.java line 62: > >> 60: switch (comparison.getComparator()) { >> 61: case "<" -> { >> 62: TestFormat.checkNoReport(comparison.getGivenValue() > 1, "Node count comparison \"<" + > > What if `comparison.getGivenValue()` is negative? A rule like that would always fail. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14539#discussion_r1251894841 From epeter at openjdk.org Tue Jul 4 11:40:24 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 4 Jul 2023 11:40:24 GMT Subject: RFR: 8310308: IR Framework: check for type and size of vector nodes [v19] In-Reply-To: References: Message-ID: <0GxMrQWf5GLtBfD1g20VtG5X2b4l6vDVBwJ_j65G3tg=.a754e2d5-c938-4599-8e50-585f0817d263@github.com> > For some changes to `SuperWord`, and maybe auto-vectorization in general, I want to strengthen the IR Framework. > > **Motivation** > I want to not just find the relevant IR nodes, but also assert that they have the maximal length that they could have on the respective platform (given the CPU features and `MaxVectorSize`). 
Without this verification it is possible that a future change leads to a regression where we still vectorize but at shorter vector widths as before - leading to performance loss. > > **How to use it** > > All `IRNode`s in `test/hotspot/jtreg/compiler/lib/ir_framework/IRNode.java` that are created with `vectorNode` are now all matched with their `type` and `size`. The regex might now look something like this: > > `"(\d+(\s){2}(VectorCastF2X.*)+(\s){2}===.*vector[A-Za-z][8]:{int})"` > which would match with IR nodes dumped like that: > `1150 VectorCastF2X === _ 1151 [[ 1146 ]] #vectory[8]:{int} ...` > > The goal was to keep it simple and straight forward. In most cases, you can just use the nodes as before, and implicitly we now check for maximal size automatically. However, in some cases we want to ensure there is no or only a limited number of nodes (`failOn` or comparison `<` or `<=` or `=0`) - in those cases we usually want to make sure there is not any node of any size, so we match with any size by default. The size can also explicitly be constrained using `IRNode.VECTOR_SIZE`. > > Some examples: > 1. `@IR(counts = {IRNode.LOAD_VECTOR_I, " >0 "})` -> search for a `LoadVector` node with `type` `int`, and maximal `size` possible on the machine (limited by CPU features and `MaxVectorSize`). This is the most common use case. > 2. `@IR(failOn = { IRNode.LOAD_VECTOR_L, IRNode.STORE_VECTOR })` -> fail if there is a `LoadVector` with type `long`, of `any` size. > 3. `@IR(counts = { IRNode.XOR_VI, IRNode.VECTOR_SIZE_4, " > 0 "})` -> find at least one `XorV` node with type `int` and exactly `4` elements. Useful for VectorAPI when the vector species is fixed. > 4. `@IR(counts = { IRNode.LOAD_VECTOR_D, IRNode.VECTOR_SIZE + "min(4, max_double)", " >0 " })` -> search for a `LoadVector` node with `type` `double`, and `size` exactly equals to `min(4, max_double)` (so 4 elements, or if the hardware allows fewer `doubles`, then that number). > 5. 
`@IR(counts = { IRNode.ABS_VF, IRNode.VECTOR_SIZE + "min(LoopMaxUnroll, max_float)", ">= 1" })` -> find at least one `AbsV` nodes with type `float`, and the `size` exactly equals to the smaller of `LoopMaxUnroll` or the maximal size allow... Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: Response to Tobias' review ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14539/files - new: https://git.openjdk.org/jdk/pull/14539/files/e1e7613c..1e160262 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14539&range=18 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14539&range=17-18 Stats: 7 lines in 2 files changed: 4 ins; 1 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/14539.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14539/head:pull/14539 PR: https://git.openjdk.org/jdk/pull/14539 From epeter at openjdk.org Tue Jul 4 11:40:24 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 4 Jul 2023 11:40:24 GMT Subject: RFR: 8310308: IR Framework: check for type and size of vector nodes [v18] In-Reply-To: References: Message-ID: On Tue, 4 Jul 2023 11:23:29 GMT, Emanuel Peter wrote: >> test/hotspot/jtreg/compiler/lib/ir_framework/driver/irmatching/irrule/constraint/raw/RawCountsConstraint.java line 62: >> >>> 60: switch (comparison.getComparator()) { >>> 61: case "<" -> { >>> 62: TestFormat.checkNoReport(comparison.getGivenValue() > 1, "Node count comparison \"<" + >> >> What if `comparison.getGivenValue()` is negative? > > A rule like that would always fail. Will add more checks to give more precise messages. 
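For readers following the negative-count question, here is a minimal sketch of what more precise checks could look like. It is not the IR framework's code: `CountConstraintSketch`, `validate` and the message texts are hypothetical, and the policy for `<` (given value must be greater than 1, as in the quoted `checkNoReport` call) is partly an assumption because the quoted message is truncated.

```java
class CountConstraintSketch {
    // Returns an error message for a malformed count rule, or null if the rule looks usable.
    static String validate(String comparator, int givenValue) {
        if (givenValue < 0) {
            // A node count can never be negative, so report that first.
            return "Node count cannot be negative: " + givenValue;
        }
        switch (comparator) {
            case "<":
                // "< 0" can never hold and "< 1" means "no such node" (assumed policy).
                return givenValue > 1
                        ? null
                        : "Node count comparison \"< " + givenValue + "\" is not useful";
            case "!=":
                return "Not-equal comparator not supported for node count";
            default:
                return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(validate("<", -1));
        System.out.println(validate("<", 1));
        System.out.println(validate(">=", 1));
    }
}
```

Checking the sign before the comparator-specific rules means a negative value gets a message about the value itself rather than about the comparator.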
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14539#discussion_r1251904030 From epeter at openjdk.org Tue Jul 4 11:44:47 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 4 Jul 2023 11:44:47 GMT Subject: RFR: 8310308: IR Framework: check for type and size of vector nodes [v20] In-Reply-To: References: Message-ID: > For some changes to `SuperWord`, and maybe auto-vectorization in general, I want to strengthen the IR Framework. > > **Motivation** > I want to not just find the relevant IR nodes, but also assert that they have the maximal length that they could have on the respective platform (given the CPU features and `MaxVectorSize`). Without this verification it is possible that a future change leads to a regression where we still vectorize but at shorter vector widths as before - leading to performance loss. > > **How to use it** > > All `IRNode`s in `test/hotspot/jtreg/compiler/lib/ir_framework/IRNode.java` that are created with `vectorNode` are now all matched with their `type` and `size`. The regex might now look something like this: > > `"(\d+(\s){2}(VectorCastF2X.*)+(\s){2}===.*vector[A-Za-z][8]:{int})"` > which would match with IR nodes dumped like that: > `1150 VectorCastF2X === _ 1151 [[ 1146 ]] #vectory[8]:{int} ...` > > The goal was to keep it simple and straight forward. In most cases, you can just use the nodes as before, and implicitly we now check for maximal size automatically. However, in some cases we want to ensure there is no or only a limited number of nodes (`failOn` or comparison `<` or `<=` or `=0`) - in those cases we usually want to make sure there is not any node of any size, so we match with any size by default. The size can also explicitly be constrained using `IRNode.VECTOR_SIZE`. > > Some examples: > 1. `@IR(counts = {IRNode.LOAD_VECTOR_I, " >0 "})` -> search for a `LoadVector` node with `type` `int`, and maximal `size` possible on the machine (limited by CPU features and `MaxVectorSize`). 
This is the most common use case. > 2. `@IR(failOn = { IRNode.LOAD_VECTOR_L, IRNode.STORE_VECTOR })` -> fail if there is a `LoadVector` with type `long`, of `any` size. > 3. `@IR(counts = { IRNode.XOR_VI, IRNode.VECTOR_SIZE_4, " > 0 "})` -> find at least one `XorV` node with type `int` and exactly `4` elements. Useful for VectorAPI when the vector species is fixed. > 4. `@IR(counts = { IRNode.LOAD_VECTOR_D, IRNode.VECTOR_SIZE + "min(4, max_double)", " >0 " })` -> search for a `LoadVector` node with `type` `double`, and `size` exactly equals to `min(4, max_double)` (so 4 elements, or if the hardware allows fewer `doubles`, then that number). > 5. `@IR(counts = { IRNode.ABS_VF, IRNode.VECTOR_SIZE + "min(LoopMaxUnroll, max_float)", ">= 1" })` -> find at least one `AbsV` nodes with type `float`, and the `size` exactly equals to the smaller of `LoopMaxUnroll` or the maximal size allow... Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 63 commits: - merge from master, manual merge for VectorLogicalOpIdentityTest.java - Response to Tobias' review - more for Christian's reviews - Apply suggestions from code review Co-authored-by: Christian Hagedorn - fix TestUnorderedReductionPartialVectorization.java - Fix 2 IR framework tests - fix merged tests - Merge branch 'master' into JDK-8310308 - whitespace fix - more refactoring for review - ... 
and 53 more: https://git.openjdk.org/jdk/compare/711cddd8...a86de32d ------------- Changes: https://git.openjdk.org/jdk/pull/14539/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=14539&range=19 Stats: 3317 lines in 66 files changed: 1244 ins; 21 del; 2052 mod Patch: https://git.openjdk.org/jdk/pull/14539.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14539/head:pull/14539 PR: https://git.openjdk.org/jdk/pull/14539 From epeter at openjdk.org Tue Jul 4 11:49:09 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 4 Jul 2023 11:49:09 GMT Subject: RFR: 8310308: IR Framework: check for type and size of vector nodes [v18] In-Reply-To: References: Message-ID: On Mon, 3 Jul 2023 10:42:51 GMT, Christian Hagedorn wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> more for Christian's reviews > > Looks good, thanks for doing all the updates! Thanks @chhagedorn for all the help with getting this to designed, debugged and reviewed! @TobiHartmann for the reviews and suggestions! I'm now running some last pre-integration tests and will integrate ASAP to avoid more merge conflicts. ------------- PR Comment: https://git.openjdk.org/jdk/pull/14539#issuecomment-1620095330 From epeter at openjdk.org Tue Jul 4 12:01:17 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 4 Jul 2023 12:01:17 GMT Subject: RFR: 8308994: C2: Re-implement experimental post loop vectorization [v2] In-Reply-To: References: Message-ID: <2Pgw8cvXHt5MbQOSiD9C_pIIyE5peaxJDbwI_w-9XJY=.1b09b1e6-e4be-48ea-b054-9d2dac2dbf30@github.com> On Mon, 3 Jul 2023 09:20:19 GMT, Pengfei Li wrote: >> Or is there a clear reason why the two are too different? > > We need more investigation and discussions about this. Will discuss with you later. Sounds good. >> Write: >> Vector element size does not match of the store in the statement. 
> > Yes, we have tried supporting type conversions (between different type sizes) but current solution is not mature and not included in this patch. So this limitation is added here. Ok, fine. Leave that for the future. >> Does that not prevent `a[i+1]` from being accepted? > > That's a really corner case. In C2's ideal graph, most loop statements eventually uses the loop induction variable `phi` node as a input. That's good. But, there is one exception that a loop statement has a sub-expression of `iv + stride`. In this kind of cases, IGVN may do common sub-expression elimination and the inputs may come from the loop increment node thereafter. As the final step of vector masked transformation replaces the loop increment node, the calculation for `iv + stride` will also be replaced as well and it causes mis-compilation. In current patch, I duplicate the loop increment pattern for update (that's why we have `is_loop_incr_pattern()`, see commit 2) to avoid this issue, but currently it only applies to the expression not in array indices, such as `a[i] = i + 1`. For the patterns like `a[i+1] = i`, I'm still looking for a better approach to handle. Ok yes. You will have to find a way to separate out all these kinds of cases: Using incr on backedge Using incr as "i + stride" inside loop Using incr outside/after loop -> final value + 1 -> count iterations BTW: what happens if you use the `phi` directly after/outside the loop? Do you properly count the final value? 
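As a source-level illustration of the cases listed above, the following sketch shows three loops. The class and method names are hypothetical, and whether C2 actually shares the loop increment node for the `i + 1` expressions is taken from the description in this thread, not verified here.

```java
import java.util.Arrays;

class LoopIncrementSketch {
    // "iv + stride" used as a stored value: a[i] = i + 1
    static int[] valueUse(int n) {
        int[] a = new int[n];
        for (int i = 0; i < n; i++) {
            a[i] = i + 1;
        }
        return a;
    }

    // "iv + stride" used inside the array index: a[i + 1] = i
    static int[] indexUse(int n) {
        int[] a = new int[n + 1];
        for (int i = 0; i < n; i++) {
            a[i + 1] = i;
        }
        return a;
    }

    // Induction variable used after the loop: its final value equals the trip count.
    static int afterLoop(int[] a) {
        int i = 0;
        for (; i < a.length; i++) {
            a[i] = i;
        }
        return i;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(valueUse(3)));
        System.out.println(Arrays.toString(indexUse(3)));
        System.out.println(afterLoop(new int[5]));
    }
}
```

Any transformation that replaces the loop increment has to keep all three results unchanged, which makes loops like these natural candidates for regression tests.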
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14581#discussion_r1251930727 PR Review Comment: https://git.openjdk.org/jdk/pull/14581#discussion_r1251930451 PR Review Comment: https://git.openjdk.org/jdk/pull/14581#discussion_r1251926315 From epeter at openjdk.org Tue Jul 4 12:01:20 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 4 Jul 2023 12:01:20 GMT Subject: RFR: 8308994: C2: Re-implement experimental post loop vectorization [v2] In-Reply-To: References: Message-ID: On Mon, 3 Jul 2023 09:33:18 GMT, Pengfei Li wrote: >> src/hotspot/share/opto/vmaskloop.cpp line 363: >> >>> 361: // Otherwise, use signed subword type or the statement's bottom type >>> 362: if (subword_stmt) { >>> 363: set_elem_bt(node, get_signed_subword_bt(stmt_bottom_type)); >> >> Why are you taking only the signed subword type, and not unsigned (eg for char you take short)? > > Current SuperWord also does in this way (see `SuperWord::container_type()`). A main reason is that some matching rules on some backends (like x86) only matches signed subword type. AFAICR, it's good to removing this for AArch64. Ok. This sounds like we should probably refactor the backend accordingly. That would simplify things for loop vectorizer / SuperWord. >> src/hotspot/share/opto/vmaskloop.cpp line 548: >> >>> 546: // Check supported memory access via SWPointer. It's not supported if >>> 547: // 1) The constructed SWPointer is invalid >>> 548: // 2) Address is growing down (index scale * loop stride < 0) >> >> Is that a limitation that could be removed in the future? > > Yes, at least on SVE2. For growing up memory accesses, we generate vector masks that indicate active lanes at lower parts of a vector. But it's opposite for growing down memory accesses where active lanes are at higher parts of a vector. Only SVE2 of AArch64 can generate vector masks in this way, current SVE(1) can not. I'm not sure whether x86 AVX-512 has the similar ability. There must surely be some way. 
The only question is what is the cheapest way to do it, ie with the fewest number of instructions. >> src/hotspot/share/opto/vmaskloop.cpp line 549: >> >>> 547: // 1) The constructed SWPointer is invalid >>> 548: // 2) Address is growing down (index scale * loop stride < 0) >>> 549: // 3) Memory access scale is different from data size >> >> I guess this could also be relaxed for strided accesses in the future? > > Exactly! I have tried supporting some basic strided accesses. The code is not included in this patch as it's not that beneficial on some CPUs and requires more C2 refactorings. Great, you should probably leave that to a future RFE anyway. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14581#discussion_r1251931841 PR Review Comment: https://git.openjdk.org/jdk/pull/14581#discussion_r1251929419 PR Review Comment: https://git.openjdk.org/jdk/pull/14581#discussion_r1251930025 From epeter at openjdk.org Tue Jul 4 12:05:18 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 4 Jul 2023 12:05:18 GMT Subject: RFR: 8308994: C2: Re-implement experimental post loop vectorization [v2] In-Reply-To: References: Message-ID: On Mon, 3 Jul 2023 09:44:53 GMT, Pengfei Li wrote: >> src/hotspot/share/opto/vmaskloop.cpp line 357: >> >>> 355: set_elem_bt(node, mem_type); >>> 356: } else { >>> 357: trace_msg(node, "Subword operand does not have precise type"); >> >> Not clear to me what this means. > > Precise type info about signedness means that we know exactly whether the data is signed or unsigned. For some operations, such as right shift, results are different for signed and unsigned operands, so C2 has to know the signedness. However, in any Java arithmetic operation, operands of Java subword types are promoted to int first. Sometimes, for example, if an intermediate result is a binary operation of both signed and unsigned, we don't have the precise type info, so we don't know how to vectorize it. 
(see below example where the signedness info is lost after a short and a char are added) > > for (int i = 0; i < SIZE; i++) { > shorts[i] = (shorts[i] + chars[i]) >> 10; > } That is annoying. Do you think we can do something about this in the future, or is this just a fundamental restriction of Java / C2? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14581#discussion_r1251934470 From epeter at openjdk.org Tue Jul 4 15:13:03 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 4 Jul 2023 15:13:03 GMT Subject: RFR: 8310308: IR Framework: check for type and size of vector nodes [v20] In-Reply-To: References: Message-ID: <-qqrbn3AEg8hWGR54mECSjSIzeuY_WgoNAT-sHtnXa0=.f99edc34-44e5-4724-b338-293f521d165b@github.com> On Tue, 4 Jul 2023 11:44:47 GMT, Emanuel Peter wrote: >> For some changes to `SuperWord`, and maybe auto-vectorization in general, I want to strengthen the IR Framework. >> >> **Motivation** >> I want to not just find the relevant IR nodes, but also assert that they have the maximal length that they could have on the respective platform (given the CPU features and `MaxVectorSize`). Without this verification it is possible that a future change leads to a regression where we still vectorize but at shorter vector widths as before - leading to performance loss. >> >> **How to use it** >> >> All `IRNode`s in `test/hotspot/jtreg/compiler/lib/ir_framework/IRNode.java` that are created with `vectorNode` are now all matched with their `type` and `size`. The regex might now look something like this: >> >> `"(\d+(\s){2}(VectorCastF2X.*)+(\s){2}===.*vector[A-Za-z][8]:{int})"` >> which would match with IR nodes dumped like that: >> `1150 VectorCastF2X === _ 1151 [[ 1146 ]] #vectory[8]:{int} ...` >> >> The goal was to keep it simple and straight forward. In most cases, you can just use the nodes as before, and implicitly we now check for maximal size automatically. 
However, in some cases we want to ensure there is no or only a limited number of nodes (`failOn` or comparison `<` or `<=` or `=0`) - in those cases we usually want to make sure there is not any node of any size, so we match with any size by default. The size can also explicitly be constrained using `IRNode.VECTOR_SIZE`. >> >> Some examples: >> 1. `@IR(counts = {IRNode.LOAD_VECTOR_I, " >0 "})` -> search for a `LoadVector` node with `type` `int`, and maximal `size` possible on the machine (limited by CPU features and `MaxVectorSize`). This is the most common use case. >> 2. `@IR(failOn = { IRNode.LOAD_VECTOR_L, IRNode.STORE_VECTOR })` -> fail if there is a `LoadVector` with type `long`, of `any` size. >> 3. `@IR(counts = { IRNode.XOR_VI, IRNode.VECTOR_SIZE_4, " > 0 "})` -> find at least one `XorV` node with type `int` and exactly `4` elements. Useful for VectorAPI when the vector species is fixed. >> 4. `@IR(counts = { IRNode.LOAD_VECTOR_D, IRNode.VECTOR_SIZE + "min(4, max_double)", " >0 " })` -> search for a `LoadVector` node with `type` `double`, and `size` exactly equals to `min(4, max_double)` (so 4 elements, or if the hardware allows fewer `doubles`, then that number). >> 5. `@IR(counts = { IRNode.ABS_VF, IRNode.VECTOR_SIZE + "min(LoopMaxUnroll, max_float)", ">= 1" })` -> find at least one `AbsV` nodes with type `float`, and the `size` exactly equals to the smaller of... > > Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 63 commits: > > - merge from master, manual merge for VectorLogicalOpIdentityTest.java > - Response to Tobias' review > - more for Christian's reviews > - Apply suggestions from code review > > Co-authored-by: Christian Hagedorn > - fix TestUnorderedReductionPartialVectorization.java > - Fix 2 IR framework tests > - fix merged tests > - Merge branch 'master' into JDK-8310308 > - whitespace fix > - more refactoring for review > - ... 
and 53 more: https://git.openjdk.org/jdk/compare/711cddd8...a86de32d Ah, it turns out on our testing infrastructure it looks all green, but on GitHub actions I have 36 failing tests. And I think they are almost all related. Will need to investigate. ------------- PR Comment: https://git.openjdk.org/jdk/pull/14539#issuecomment-1620423281 From jbhateja at openjdk.org Tue Jul 4 19:42:07 2023 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 4 Jul 2023 19:42:07 GMT Subject: RFR: 8309531: Incorrect result with unwrapped iotaShuffle. [v7] In-Reply-To: <0OY_OdZGlSeQhVmmSyYitp4XjtKSXziY1p0wKy6V68s=.68c8351c-7f6e-44ca-91dd-35fa83cfbb8d@github.com> References: <0OY_OdZGlSeQhVmmSyYitp4XjtKSXziY1p0wKy6V68s=.68c8351c-7f6e-44ca-91dd-35fa83cfbb8d@github.com> Message-ID: <1ScYh_SsRObWtf7ukAs-PFDeHQMy2hjLc5AWlt5w2qQ=.b6ff1285-f2b4-4fde-8208-f2d66f3dbb03@github.com> > Patch fixes following two issues in iotaShuffle inline expander with unwrapped indices. > 1) Disable intrinsification if effective index do not lie within byte value range. > 2) Use GT predicate while computing comparison mask for all the indices above vector length. > > No performance degradation seen with existing slice/unslice operations which internally calls wrapped iotaShuffle. > > This interim patch addresses incorrectness around iotaShuffle till we introduce modified shuffle implementations > with JDK-8310691. > > Kindly review and share feedback. > > Best Regards, > Jatin Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Review comments addressed. 
------------- Changes: - all: https://git.openjdk.org/jdk/pull/14700/files - new: https://git.openjdk.org/jdk/pull/14700/files/1a48af2a..c181fcf0 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14700&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14700&range=05-06 Stats: 14 lines in 1 file changed: 0 ins; 6 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/14700.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14700/head:pull/14700 PR: https://git.openjdk.org/jdk/pull/14700 From jbhateja at openjdk.org Tue Jul 4 19:42:09 2023 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 4 Jul 2023 19:42:09 GMT Subject: RFR: 8309531: Incorrect result with unwrapped iotaShuffle. [v6] In-Reply-To: References: <0OY_OdZGlSeQhVmmSyYitp4XjtKSXziY1p0wKy6V68s=.68c8351c-7f6e-44ca-91dd-35fa83cfbb8d@github.com> <79P4C0c_nBrk5vF8IQkWhz3uALJPnLs-XE8BKnEC6Ho=.43391ac3-78ee-4a57-8042-6bf854a5ffb1@github.com> Message-ID: On Tue, 4 Jul 2023 06:10:47 GMT, Tobias Hartmann wrote: > Please don't add FIXME's to new code or at least file a follow-up RFE and reference it here. JDK-8311305 filed to address this in a follow-up RFE. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14700#discussion_r1252307918 From jbhateja at openjdk.org Tue Jul 4 19:50:18 2023 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 4 Jul 2023 19:50:18 GMT Subject: RFR: 8311023: assert(false) failed: EA: missing memory path [v2] In-Reply-To: References: Message-ID: > Handling missing cases for VectorizedHashCode while collecting memory nodes for propagating new type information through the graph. > > We associate new instance types with CheckCastPP nodes succeeding allocation IR, refresh connectivity of MemoryMerge slices at alias indices corresponding to new instance type and update the memory edges of user memory nodes in the ideal graph to ease out scalar replacements. > > Please review and share feedback. 
> > Best Regards, > Jatin Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Rename test. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14764/files - new: https://git.openjdk.org/jdk/pull/14764/files/75e1642f..d56563ff Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14764&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14764&range=00-01 Stats: 0 lines in 1 file changed: 0 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/14764.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14764/head:pull/14764 PR: https://git.openjdk.org/jdk/pull/14764 From thartmann at openjdk.org Wed Jul 5 05:08:04 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 5 Jul 2023 05:08:04 GMT Subject: RFR: 8309531: Incorrect result with unwrapped iotaShuffle. [v7] In-Reply-To: <1ScYh_SsRObWtf7ukAs-PFDeHQMy2hjLc5AWlt5w2qQ=.b6ff1285-f2b4-4fde-8208-f2d66f3dbb03@github.com> References: <0OY_OdZGlSeQhVmmSyYitp4XjtKSXziY1p0wKy6V68s=.68c8351c-7f6e-44ca-91dd-35fa83cfbb8d@github.com> <1ScYh_SsRObWtf7ukAs-PFDeHQMy2hjLc5AWlt5w2qQ=.b6ff1285-f2b4-4fde-8208-f2d66f3dbb03@github.com> Message-ID: On Tue, 4 Jul 2023 19:42:07 GMT, Jatin Bhateja wrote: >> Patch fixes following two issues in iotaShuffle inline expander with unwrapped indices. >> 1) Disable intrinsification if effective index do not lie within byte value range. >> 2) Use GT predicate while computing comparison mask for all the indices above vector length. >> >> No performance degradation seen with existing slice/unslice operations which internally calls wrapped iotaShuffle. >> >> This interim patch addresses incorrectness around iotaShuffle till we introduce modified shuffle implementations >> with JDK-8310691. >> >> Kindly review and share feedback. >> >> Best Regards, >> Jatin > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Review comments addressed. Thanks. 
Looks good to me. ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/14700#pullrequestreview-1513663992 From jbhateja at openjdk.org Wed Jul 5 05:49:10 2023 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 5 Jul 2023 05:49:10 GMT Subject: Integrated: 8309531: Incorrect result with unwrapped iotaShuffle. In-Reply-To: <0OY_OdZGlSeQhVmmSyYitp4XjtKSXziY1p0wKy6V68s=.68c8351c-7f6e-44ca-91dd-35fa83cfbb8d@github.com> References: <0OY_OdZGlSeQhVmmSyYitp4XjtKSXziY1p0wKy6V68s=.68c8351c-7f6e-44ca-91dd-35fa83cfbb8d@github.com> Message-ID: <1FYaljEPzXt0pZ2Lx_3bMrWhKVwgdwkA3Zgcwrw6AD8=.b418754b-b286-4472-8a0a-15dbf5534228@github.com> On Wed, 28 Jun 2023 17:59:07 GMT, Jatin Bhateja wrote: > Patch fixes following two issues in iotaShuffle inline expander with unwrapped indices. > 1) Disable intrinsification if effective index do not lie within byte value range. > 2) Use GT predicate while computing comparison mask for all the indices above vector length. > > No performance degradation seen with existing slice/unslice operations which internally calls wrapped iotaShuffle. > > This interim patch addresses incorrectness around iotaShuffle till we introduce modified shuffle implementations > with JDK-8310691. > > Kindly review and share feedback. > > Best Regards, > Jatin This pull request has now been integrated. Changeset: d6578bff Author: Jatin Bhateja URL: https://git.openjdk.org/jdk/commit/d6578bff1c69ebc165fc9734e6503bd2d5d021c2 Stats: 158 lines in 2 files changed: 128 ins; 5 del; 25 mod 8309531: Incorrect result with unwrapped iotaShuffle. Reviewed-by: sviswanathan, xgong, thartmann ------------- PR: https://git.openjdk.org/jdk/pull/14700 From thartmann at openjdk.org Wed Jul 5 06:29:15 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 5 Jul 2023 06:29:15 GMT Subject: [jdk21] RFR: 8309531: Incorrect result with unwrapped iotaShuffle. 
Message-ID: Backport of [JDK-8309531](https://bugs.openjdk.java.net/browse/JDK-8309531). Applies cleanly. Thanks, Tobias ------------- Commit messages: - 8309531: Incorrect result with unwrapped iotaShuffle. Changes: https://git.openjdk.org/jdk21/pull/95/files Webrev: https://webrevs.openjdk.org/?repo=jdk21&pr=95&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8309531 Stats: 158 lines in 2 files changed: 128 ins; 5 del; 25 mod Patch: https://git.openjdk.org/jdk21/pull/95.diff Fetch: git fetch https://git.openjdk.org/jdk21.git pull/95/head:pull/95 PR: https://git.openjdk.org/jdk21/pull/95 From fgao at openjdk.org Wed Jul 5 06:39:08 2023 From: fgao at openjdk.org (Fei Gao) Date: Wed, 5 Jul 2023 06:39:08 GMT Subject: RFR: 8308340: C2: Idealize Fma nodes [v3] In-Reply-To: References: Message-ID: <0dNegp_gS_SFEX9RXNlxNQyC9BtXH3BFKtNJOJQ-1vY=.99ece825-d5cd-47d3-a38f-9022c54907d2@github.com> On Wed, 28 Jun 2023 08:17:37 GMT, Fei Gao wrote: >> Some platforms, like aarch64, ppc, and riscv, support fusing `Math.fma(-a, b, c)` or `Math.fma(a, -b, c)` by generating partially symmetric match rules like: >> >> >> match(Set dst (FmaF src3 (Binary (NegF src1) src2))); >> match(Set dst (FmaF src3 (Binary src1 (NegF src2)))); >> >> >> Since `Fma` is partially commutative, the patch is to convert `Math.fma(-a, b, c)` to `Math.fma(b, -a, c)` in gvn phase, making node patterns canonical. Then we can remove redundant rules. >> >> Also, we should guarantee that C2 generates `Fma` nodes only on platforms supporting `Fma` instructions before matcher, so we can remove all `predicate(UseFMA)` for all `Fma` rules. >> >> After the patch, the code size of libjvm.so on aarch64 platform decreased by 63.4k. >> >> The patch passed all tier 1 - 3 on aarch64 and x86 platforms. > > Fei Gao has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains four additional commits since the last revision: > > - Merge branch 'master' into fg8308340 > - Move check for UseFMA from c2compiler.cpp to Matcher::match_rule_supported in .ad files > - Merge branch 'master' into fg8308340 > - 8308340: C2: Idealize Fma nodes > > Some platforms, like aarch64, ppc, and riscv, support fusing > `Math.fma(-a, b, c)` or `Math.fma(a, -b, c)` by generating > partially symmetric match rules like: > > ``` > match(Set dst (FmaF src3 (Binary (NegF src1) src2))); > match(Set dst (FmaF src3 (Binary src1 (NegF src2)))); > ``` > > Since `Fma` is partially commutative, the patch is to convert > `Math.fma(-a, b, c)` to `Math.fma(b, -a, c)` in gvn phase, > making node patterns canonical. Then we can remove redundant > rules. > > Also, we should guarantee that C2 generates `Fma` nodes only on > platforms supporting `Fma` instructions before matcher, so we > can remove all `predicate(UseFMA)` for all `Fma` rules. > > After the patch, the code size of libjvm.so on aarch64 platform > decreased by 63.4k. > > The patch passed all tier 1 - 3 on aarch64 and x86 platforms. Can I get a second review, please? Thanks. ------------- PR Comment: https://git.openjdk.org/jdk/pull/14576#issuecomment-1621115459 From epeter at openjdk.org Wed Jul 5 07:13:31 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 5 Jul 2023 07:13:31 GMT Subject: RFR: 8310308: IR Framework: check for type and size of vector nodes [v21] In-Reply-To: References: Message-ID: > For some changes to `SuperWord`, and maybe auto-vectorization in general, I want to strengthen the IR Framework. > > **Motivation** > I want to not just find the relevant IR nodes, but also assert that they have the maximal length that they could have on the respective platform (given the CPU features and `MaxVectorSize`).
Without this verification it is possible that a future change leads to a regression where we still vectorize but at shorter vector widths as before - leading to performance loss. > > **How to use it** > > All `IRNode`s in `test/hotspot/jtreg/compiler/lib/ir_framework/IRNode.java` that are created with `vectorNode` are now all matched with their `type` and `size`. The regex might now look something like this: > > `"(\d+(\s){2}(VectorCastF2X.*)+(\s){2}===.*vector[A-Za-z][8]:{int})"` > which would match with IR nodes dumped like that: > `1150 VectorCastF2X === _ 1151 [[ 1146 ]] #vectory[8]:{int} ...` > > The goal was to keep it simple and straight forward. In most cases, you can just use the nodes as before, and implicitly we now check for maximal size automatically. However, in some cases we want to ensure there is no or only a limited number of nodes (`failOn` or comparison `<` or `<=` or `=0`) - in those cases we usually want to make sure there is not any node of any size, so we match with any size by default. The size can also explicitly be constrained using `IRNode.VECTOR_SIZE`. > > Some examples: > 1. `@IR(counts = {IRNode.LOAD_VECTOR_I, " >0 "})` -> search for a `LoadVector` node with `type` `int`, and maximal `size` possible on the machine (limited by CPU features and `MaxVectorSize`). This is the most common use case. > 2. `@IR(failOn = { IRNode.LOAD_VECTOR_L, IRNode.STORE_VECTOR })` -> fail if there is a `LoadVector` with type `long`, of `any` size. > 3. `@IR(counts = { IRNode.XOR_VI, IRNode.VECTOR_SIZE_4, " > 0 "})` -> find at least one `XorV` node with type `int` and exactly `4` elements. Useful for VectorAPI when the vector species is fixed. > 4. `@IR(counts = { IRNode.LOAD_VECTOR_D, IRNode.VECTOR_SIZE + "min(4, max_double)", " >0 " })` -> search for a `LoadVector` node with `type` `double`, and `size` exactly equals to `min(4, max_double)` (so 4 elements, or if the hardware allows fewer `doubles`, then that number). > 5. 
`@IR(counts = { IRNode.ABS_VF, IRNode.VECTOR_SIZE + "min(LoopMaxUnroll, max_float)", ">= 1" })` -> find at least one `AbsV` nodes with type `float`, and the `size` exactly equals to the smaller of `LoopMaxUnroll` or the maximal size allow... Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: print VMInfo from Test VM ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14539/files - new: https://git.openjdk.org/jdk/pull/14539/files/a86de32d..bbd4ce8f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14539&range=20 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14539&range=19-20 Stats: 4 lines in 1 file changed: 4 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/14539.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14539/head:pull/14539 PR: https://git.openjdk.org/jdk/pull/14539 From jbhateja at openjdk.org Wed Jul 5 07:44:17 2023 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 5 Jul 2023 07:44:17 GMT Subject: RFR: 8311023: assert(false) failed: EA: missing memory path [v3] In-Reply-To: References: Message-ID: > Handling missing cases for VectorizedHashCode while collecting memory nodes for propagating new type information through the graph. > > We associate new instance types with CheckCastPP nodes succeeding allocation IR, refresh connectivity of MemoryMerge slices at alias indices corresponding to new instance type and update the memory edges of user memory nodes in the ideal graph to ease out scalar replacements. > > Please review and share feedback. > > Best Regards, > Jatin Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8311023 - Missed test update. - Rename test. 
- 8311023: assert(false) failed: EA: missing memory path ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14764/files - new: https://git.openjdk.org/jdk/pull/14764/files/d56563ff..344fdf9b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14764&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14764&range=01-02 Stats: 279 lines in 29 files changed: 178 ins; 29 del; 72 mod Patch: https://git.openjdk.org/jdk/pull/14764.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14764/head:pull/14764 PR: https://git.openjdk.org/jdk/pull/14764 From chagedorn at openjdk.org Wed Jul 5 09:16:54 2023 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 5 Jul 2023 09:16:54 GMT Subject: RFR: 8311023: assert(false) failed: EA: missing memory path [v3] In-Reply-To: References: Message-ID: On Wed, 5 Jul 2023 07:44:17 GMT, Jatin Bhateja wrote: >> Handling missing cases for VectorizedHashCode while collecting memory nodes for propagating new type information through the graph. >> >> We associate new instance types with CheckCastPP nodes succeeding allocation IR, refresh connectivity of MemoryMerge slices at alias indices corresponding to new instance type and update the memory edges of user memory nodes in the ideal graph to ease out scalar replacements. >> >> Please review and share feedback. >> >> Best Regards, >> Jatin > > Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: > > - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8311023 > - Missed test update. > - Rename test. > - 8311023: assert(false) failed: EA: missing memory path Looks good! ------------- Marked as reviewed by chagedorn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/14764#pullrequestreview-1514052500 From chagedorn at openjdk.org Wed Jul 5 09:18:12 2023 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 5 Jul 2023 09:18:12 GMT Subject: RFR: 8311279: TestStressIGVNAndCCP.java failed with different IGVN traces for the same seed Message-ID: In `TestStressIGVNAndCCP`, we are executing a JVM twice and only want to compile and then run a single method `sum()`. The expectation is that we get the same output with `-XX:+TraceIterativeGVN`. However, our testing found a case where this did not match. When looking at the diff, I've noticed that with one JVM `sum()` had compile id 1, while the other JVM used compile id 2 for `sum()`. This makes a difference for the `debug_idx` which is printed for a dead node as "`compile id * 10000000000 + node index`": Compile id 1: 80 Phi === _ _ _ [[ ]] [10000000080] ... Compile id 2: 80 Phi === _ _ _ [[ ]] [20000000080] ... I was not able to reproduce the original report but my suspicion is that one JVM additionally compiled a native method wrapper or a method handle intrinsic for some reason but the other one did not. This would explain the different compile id because we should only be compiling `sum()` with the given `CompileOnly` JVM flag. A native compilation can be triggered, for example, by additionally passing `-esa` with `CompileOnly`. We get compile id 3 for `sum()`: 60 1 n java.lang.invoke.MethodHandle::invokeBasic()I (native) 60 2 n java.lang.invoke.MethodHandle::linkToSpecial(LL)I (native) (static) 69 3 b compiler.debug.TestStressIGVNAndCCP::sum (27 bytes) To fix this, I suggest to use the `-XX:+CICountNative` flag which uses a separate counter for native compilations.
Then, we'll always get compile id 1 for `sum()`: 50 1 n java.lang.invoke.MethodHandle::invokeBasic()I (native) 51 2 n java.lang.invoke.MethodHandle::linkToSpecial(LL)I (native) (static) 59 1 b compiler.debug.TestStressIGVNAndCCP::sum (27 bytes) This is the same approach as done in [JDK-8269342](https://bugs.openjdk.org/browse/JDK-8269342) to reliably crash with `-XX:CICrashAt=1` in the first non-native compilation. Thanks, Christian ------------- Commit messages: - 8311279: TestStressIGVNAndCCP.java failed with different IGVN traces for the same seed Changes: https://git.openjdk.org/jdk/pull/14771/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=14771&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8311279 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/14771.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14771/head:pull/14771 PR: https://git.openjdk.org/jdk/pull/14771 From thartmann at openjdk.org Wed Jul 5 10:26:00 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 5 Jul 2023 10:26:00 GMT Subject: RFR: 8311279: TestStressIGVNAndCCP.java failed with different IGVN traces for the same seed In-Reply-To: References: Message-ID: On Wed, 5 Jul 2023 09:10:48 GMT, Christian Hagedorn wrote: > In `TestStressIGVNAndCCP`, we are executing a JVM twice and only want to compile and then run a single method `sum()`. The expectation is that we get the same output with `-XX:+TraceIterativeGVN`. However, our testing found a case where this did not match. When looking at the diff, I've noticed that with one JVM `sum()` had compile id 1, while the other JVM used compile id 2 for `sum()`. This makes a difference for the `debug_idx` which is printed for a dead node as "`compile id * 10000000000 + node index`": > > Compile id 1: > > 80 Phi === _ _ _ [[ ]] [10000000080] ... > > Compile id 2: > > 80 Phi === _ _ _ [[ ]] [20000000080] ... 
> > > I was not able to reproduce the original report but my suspicion is that one JVM additionally compiled a native method wrapper or a method handle intrinsic for some reason but the other one did not. This would explain the different compile id because we should only be compiling `sum()` with the given `CompileOnly` JVM flag. A native compilation can be triggered, for example, by additionally passing `-esa` with `CompileOnly`. We get compile id 3 for `sum()`: > > 60 1 n java.lang.invoke.MethodHandle::invokeBasic()I (native) > 60 2 n java.lang.invoke.MethodHandle::linkToSpecial(LL)I (native) (static) > 69 3 b compiler.debug.TestStressIGVNAndCCP::sum (27 bytes) > > > To fix this, I suggest to use the `-XX:+CICountNative` flag which uses a separate counter for native compilations. Then, we'll always get compile id 1 for `sum()`: > > 50 1 n java.lang.invoke.MethodHandle::invokeBasic()I (native) > 51 2 n java.lang.invoke.MethodHandle::linkToSpecial(LL)I (native) (static) > 59 1 b compiler.debug.TestStressIGVNAndCCP::sum (27 bytes) > > > This is the same approach as done in [JDK-8269342](https://bugs.openjdk.org/browse/JDK-8269342) to reliably crash with `-XX:CICrashAt=1` in the first non-native compilation. > > Thanks, > Christian That looks good and trivial to me. ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/14771#pullrequestreview-1514186228 From chagedorn at openjdk.org Wed Jul 5 10:33:53 2023 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 5 Jul 2023 10:33:53 GMT Subject: RFR: 8311279: TestStressIGVNAndCCP.java failed with different IGVN traces for the same seed In-Reply-To: References: Message-ID: On Wed, 5 Jul 2023 09:10:48 GMT, Christian Hagedorn wrote: > In `TestStressIGVNAndCCP`, we are executing a JVM twice and only want to compile and then run a single method `sum()`. The expectation is that we get the same output with `-XX:+TraceIterativeGVN`.
However, our testing found a case where this did not match. When looking at the diff, I've noticed that with one JVM `sum()` had compile id 1, while the other JVM used compile id 2 for `sum()`. This makes a difference for the `debug_idx` which is printed for a dead node as "`compile id * 10000000000 + node index`": > > Compile id 1: > > 80 Phi === _ _ _ [[ ]] [10000000080] ... > > Compile id 2: > > 80 Phi === _ _ _ [[ ]] [20000000080] ... > > > I was not able to reproduce the original report but my suspicion is that one JVM additionally compiled a native method wrapper or a method handle intrinsic for some reason but the other one did not. This would explain the different compile id because we should only be compiling `sum()` with the given `CompileOnly` JVM flag. A native compilation can be triggered, for example, by additionally passing `-esa` with `CompileOnly`. We get compile id 3 for `sum()`: > > 60 1 n java.lang.invoke.MethodHandle::invokeBasic()I (native) > 60 2 n java.lang.invoke.MethodHandle::linkToSpecial(LL)I (native) (static) > 69 3 b compiler.debug.TestStressIGVNAndCCP::sum (27 bytes) > > > To fix this, I suggest to use the `-XX:+CICountNative` flag which uses a separate counter for native compilations. Then, we'll always get compile id 1 for `sum()`: > > 50 1 n java.lang.invoke.MethodHandle::invokeBasic()I (native) > 51 2 n java.lang.invoke.MethodHandle::linkToSpecial(LL)I (native) (static) > 59 1 b compiler.debug.TestStressIGVNAndCCP::sum (27 bytes) > > > This is the same approach as done in [JDK-8269342](https://bugs.openjdk.org/browse/JDK-8269342) to reliably crash with `-XX:CICrashAt=1` in the first non-native compilation. > > Thanks, > Christian Thanks Tobias for your review!
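The `debug_idx` arithmetic quoted above can be reproduced with a small stand-alone Java sketch. The class and method names here (`DebugIdxSketch`, `debugIdx`) are hypothetical and only mirror the formula from the message, "`compile id * 10000000000 + node index`"; this is not the HotSpot implementation.

```java
public class DebugIdxSketch {
    // Multiplier taken from the formula quoted in the message above.
    static final long COMPILE_ID_FACTOR = 10_000_000_000L;

    // Hypothetical helper: combine a compile id and a node index the way
    // the trace output appears to encode them.
    static long debugIdx(int compileId, int nodeIndex) {
        return compileId * COMPILE_ID_FACTOR + nodeIndex;
    }

    public static void main(String[] args) {
        // Node 80 compiled under compile id 1 and compile id 2, as in the two traces.
        System.out.println(debugIdx(1, 80)); // 10000000080
        System.out.println(debugIdx(2, 80)); // 20000000080
    }
}
```

Under this formula the same node index yields a different printed value as soon as the compile id differs, which is why an extra native compilation in one of the two JVMs is enough to make the traces diverge.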
------------- PR Comment: https://git.openjdk.org/jdk/pull/14771#issuecomment-1621484833 From chagedorn at openjdk.org Wed Jul 5 10:33:58 2023 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 5 Jul 2023 10:33:58 GMT Subject: RFR: 8311087: PhiNode::wait_for_region_igvn should break early [v2] In-Reply-To: <0F7xhykjE9fU9icg9Nb0xzGvIIw4bFkJeba9fYgdz5M=.9b292f1b-8af1-4f9c-ae11-39be762d973d@github.com> References: <0F7xhykjE9fU9icg9Nb0xzGvIIw4bFkJeba9fYgdz5M=.9b292f1b-8af1-4f9c-ae11-39be762d973d@github.com> Message-ID: On Tue, 4 Jul 2023 10:33:08 GMT, Johan Sjölen wrote: >> src/hotspot/share/opto/cfgnode.cpp line 1966: >> >>> 1964: delay = true; >>> 1965: break; >>> 1966: } >> >> Just an idea, how about putting this into a separate method `should_delay()` (or something like that) and replacing `continue` with `return false` and `break` with `return true`? If `should_delay()` is true at some point, we can push `this` to the worklist and return true. >> >> But looks good either way. > > Yeah, it could be a lambda (this isn't used anywhere else), but not sure that it'd give us too much in terms of clarity here. I don't have a strong opinion here - it's fine like that. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14707#discussion_r1252902117 From chagedorn at openjdk.org Wed Jul 5 10:33:56 2023 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 5 Jul 2023 10:33:56 GMT Subject: RFR: 8311087: PhiNode::wait_for_region_igvn should break early [v2] In-Reply-To: <2AVJe88tw0HV9HbHk_T3GLo-oIvnMNAWEkpw9SLkTYc=.3d55a112-c9e1-4ff1-a076-9976234af8a5@github.com> References: <2AVJe88tw0HV9HbHk_T3GLo-oIvnMNAWEkpw9SLkTYc=.3d55a112-c9e1-4ff1-a076-9976234af8a5@github.com> Message-ID: On Tue, 4 Jul 2023 10:35:21 GMT, Johan Sjölen wrote: >> Hi, please consider this PR. >> >> Instead of continuing the loop we break after setting `delay = true`. I also flattened out the indentation by transforming `A && B` into `!A || !B` and `continue`ing if so.
This makes the code a bit longer, but clearer imho. >> >> I've run `TestCastIIAfterUnrollingInOuterLoop`, which is the test that was added along with this method. I'm also running tier1. >> >> Thanks, >> Johan > > Johan Sjölen has updated the pull request incrementally with one additional commit since the last revision: > > Wrap continue in braces Looks good! ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/14707#pullrequestreview-1514199729 From fgao at openjdk.org Wed Jul 5 11:11:22 2023 From: fgao at openjdk.org (Fei Gao) Date: Wed, 5 Jul 2023 11:11:22 GMT Subject: RFR: 8308340: C2: Idealize Fma nodes [v4] In-Reply-To: References: Message-ID: > Some platforms, like aarch64, ppc, and riscv, support fusing `Math.fma(-a, b, c)` or `Math.fma(a, -b, c)` by generating partially symmetric match rules like: > > > match(Set dst (FmaF src3 (Binary (NegF src1) src2))); > match(Set dst (FmaF src3 (Binary src1 (NegF src2)))); > > > Since `Fma` is partially commutative, the patch is to convert `Math.fma(-a, b, c)` to `Math.fma(b, -a, c)` in gvn phase, making node patterns canonical. Then we can remove redundant rules. > > Also, we should guarantee that C2 generates `Fma` nodes only on platforms supporting `Fma` instructions before matcher, so we can remove all `predicate(UseFMA)` for all `Fma` rules. > > After the patch, the code size of libjvm.so on aarch64 platform decreased by 63.4k. > > The patch passed all tier 1 - 3 on aarch64 and x86 platforms. Fei Gao has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains five additional commits since the last revision: - Merge branch 'master' into fg8308340 - Merge branch 'master' into fg8308340 - Move check for UseFMA from c2compiler.cpp to Matcher::match_rule_supported in .ad files - Merge branch 'master' into fg8308340 - 8308340: C2: Idealize Fma nodes Some platforms, like aarch64, ppc, and riscv, support fusing `Math.fma(-a, b, c)` or `Math.fma(a, -b, c)` by generating partially symmetric match rules like: ``` match(Set dst (FmaF src3 (Binary (NegF src1) src2))); match(Set dst (FmaF src3 (Binary src1 (NegF src2)))); ``` Since `Fma` is partially commutative, the patch is to convert `Math.fma(-a, b, c)` to `Math.fma(b, -a, c)` in gvn phase, making node patterns canonical. Then we can remove redundant rules. Also, we should guarantee that C2 generates `Fma` nodes only on platforms supporting `Fma` instructions before matcher, so we can remove all `predicate(UseFMA)` for all `Fma` rules. After the patch, the code size of libjvm.so on aarch64 platform decreased by 63.4k. The patch passed all tier 1 - 3 on aarch64 and x86 platforms.
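The equivalence that the canonicalization relies on can be checked at the Java level. The following is only an illustrative sketch, not part of the patch: the class and helper names (`FmaOperandOrder`, `sameBits`, `negatedFactorFormsAgree`) are made up for this example, and it exercises `Math.fma` from plain Java rather than the C2 `Fma` nodes themselves. Because the product of the first two operands is formed exactly before the single rounding, negating either factor, in either operand position, should give the same result.

```java
public class FmaOperandOrder {
    // Compare by bit pattern so that +0.0f and -0.0f are distinguished;
    // floatToIntBits maps every NaN to one canonical bit pattern.
    static boolean sameBits(float x, float y) {
        return Float.floatToIntBits(x) == Float.floatToIntBits(y);
    }

    // Hypothetical helper: true if fma(-a, b, c), fma(a, -b, c) and
    // fma(b, -a, c) all produce the same float.
    static boolean negatedFactorFormsAgree(float a, float b, float c) {
        float negFirst = Math.fma(-a, b, c);
        float negSecond = Math.fma(a, -b, c);
        float swapped = Math.fma(b, -a, c);
        return sameBits(negFirst, negSecond) && sameBits(negFirst, swapped);
    }

    public static void main(String[] args) {
        float[] samples = {0.0f, -0.0f, 1.5f, -2.25f, 1.0e-30f, 3.0e38f,
                           Float.POSITIVE_INFINITY};
        for (float a : samples) {
            for (float b : samples) {
                for (float c : samples) {
                    if (!negatedFactorFormsAgree(a, b, c)) {
                        throw new AssertionError(a + ", " + b + ", " + c);
                    }
                }
            }
        }
        System.out.println("all operand orders agree");
    }
}
```

Since the three forms are interchangeable, picking one of them as the canonical shape is what allows a single match rule per platform to cover all of them.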
------------- Changes: - all: https://git.openjdk.org/jdk/pull/14576/files - new: https://git.openjdk.org/jdk/pull/14576/files/06162d88..8d6d98ec Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14576&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14576&range=02-03 Stats: 7589 lines in 496 files changed: 4438 ins; 1088 del; 2063 mod Patch: https://git.openjdk.org/jdk/pull/14576.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14576/head:pull/14576 PR: https://git.openjdk.org/jdk/pull/14576 From clanger at openjdk.org Wed Jul 5 13:53:58 2023 From: clanger at openjdk.org (Christoph Langer) Date: Wed, 5 Jul 2023 13:53:58 GMT Subject: RFR: JDK-8310550: Adjust references to rt.jar [v3] In-Reply-To: References: Message-ID: On Fri, 30 Jun 2023 11:37:10 GMT, Matthias Baesken wrote: >> There are a few references to rt.jar in comments and in the codebase itself. Some of them might be removed or adjusted. > > Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision: > > remove import Looks good overall. I made a few suggestions. test/hotspot/jtreg/vmTestbase/nsk/jvmti/AttachOnDemand/attach024/TestDescription.java line 40: > 38: * Agent's JAR file contains modified class java.util.TooManyListenersException (it is assumed > 39: * that this class isn't loaded before agent is loaded), agent instantiates TooManyListenersException > 40: * and checks that non-modified version of this class was loaded from jdk image (not from agent's JAR). "from the jdk image" test/jdk/com/sun/tools/attach/ProviderTest.java line 110: > 108: public static void main(String args[]) throws Exception { > 109: // deal with internal builds where classes are loaded from the > 110: // 'classes' directory rather than the image modules file "... 
rather than the runtime image" test/langtools/tools/javap/4798312/JavapShouldLoadClassesFromRTJarTest.java line 27: > 25: * @test > 26: * @bug 4798312 > 27: * @summary In Windows, javap doesn't load classes from image "... from the runtime image" ------------- Changes requested by clanger (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/14593#pullrequestreview-1514576016 PR Review Comment: https://git.openjdk.org/jdk/pull/14593#discussion_r1253140142 PR Review Comment: https://git.openjdk.org/jdk/pull/14593#discussion_r1253141204 PR Review Comment: https://git.openjdk.org/jdk/pull/14593#discussion_r1253142105 From clanger at openjdk.org Wed Jul 5 13:54:01 2023 From: clanger at openjdk.org (Christoph Langer) Date: Wed, 5 Jul 2023 13:54:01 GMT Subject: RFR: JDK-8310550: Adjust references to rt.jar [v3] In-Reply-To: References: Message-ID: On Thu, 22 Jun 2023 09:21:29 GMT, Matthias Baesken wrote: >> src/jdk.compiler/share/classes/com/sun/tools/javac/file/JavacFileManager.java line 196: >> >>> 194: >>> 195: /** >>> 196: * Set whether or not to use ct.sym as an alternate >> >> As an alternate to what? This needs something else. > > should "to the image modules files" be used instead ? maybe "... to the current runtime."? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14593#discussion_r1253139297 From ecaspole at openjdk.org Wed Jul 5 14:41:55 2023 From: ecaspole at openjdk.org (Eric Caspole) Date: Wed, 5 Jul 2023 14:41:55 GMT Subject: RFR: 8311178: JMH tests don't scale well when sharing output buffers In-Reply-To: References: Message-ID: On Sat, 1 Jul 2023 07:53:17 GMT, Swati Sharma wrote: > The below benchmark files have scaling issues due to cache contention and leads to poor scaling when run on multiple threads. 
The patch sets the scope from benchmark level to thread level to fix the issue: > - org/openjdk/bench/java/io/DataOutputStreamTest.java > - org/openjdk/bench/java/lang/ArrayCopyObject.java > - org/openjdk/bench/java/lang/ArrayFiddle.java > - org/openjdk/bench/java/time/format/DateTimeFormatterBench.java > - org/openjdk/bench/jdk/incubator/vector/IndexInRangeBenchmark.java > - org/openjdk/bench/jdk/incubator/vector/MemorySegmentVectorAccess.java > - org/openjdk/bench/jdk/incubator/vector/StoreMaskedBenchmark.java > - org/openjdk/bench/jdk/incubator/vector/StoreMaskedIOOBEBenchmark.java > - org/openjdk/bench/vm/compiler/ArrayFill.java > - org/openjdk/bench/vm/compiler/IndexVector.java > > Also removing the static scope for variables in org/openjdk/bench/jdk/incubator/vector/VectorFPtoIntCastOperations.java for better scaling. > > Please review and share your feedback. > > Thanks, > Swati LGTM. Eric ------------- Marked as reviewed by ecaspole (Committer). PR Review: https://git.openjdk.org/jdk/pull/14746#pullrequestreview-1514692914 From mbaesken at openjdk.org Wed Jul 5 15:07:15 2023 From: mbaesken at openjdk.org (Matthias Baesken) Date: Wed, 5 Jul 2023 15:07:15 GMT Subject: RFR: JDK-8310550: Adjust references to rt.jar [v4] In-Reply-To: References: Message-ID: > There are a few references to rt.jar in comments and in the codebase itself. Some of them might be removed or adjusted. 
Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision: Adjust comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14593/files - new: https://git.openjdk.org/jdk/pull/14593/files/9b2232a7..3a7b057a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14593&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14593&range=02-03 Stats: 4 lines in 4 files changed: 0 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/14593.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14593/head:pull/14593 PR: https://git.openjdk.org/jdk/pull/14593 From mbaesken at openjdk.org Wed Jul 5 15:07:17 2023 From: mbaesken at openjdk.org (Matthias Baesken) Date: Wed, 5 Jul 2023 15:07:17 GMT Subject: RFR: JDK-8310550: Adjust references to rt.jar [v3] In-Reply-To: References: Message-ID: On Fri, 30 Jun 2023 11:37:10 GMT, Matthias Baesken wrote: >> There are a few references to rt.jar in comments and in the codebase itself. Some of them might be removed or adjusted. > > Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision: > > remove import Hi Christoph, thanks for the suggestions, I added some changes. ------------- PR Comment: https://git.openjdk.org/jdk/pull/14593#issuecomment-1621939153 From clanger at openjdk.org Wed Jul 5 15:07:16 2023 From: clanger at openjdk.org (Christoph Langer) Date: Wed, 5 Jul 2023 15:07:16 GMT Subject: RFR: JDK-8310550: Adjust references to rt.jar [v4] In-Reply-To: References: Message-ID: On Wed, 5 Jul 2023 15:01:52 GMT, Matthias Baesken wrote: >> There are a few references to rt.jar in comments and in the codebase itself. Some of them might be removed or adjusted. > > Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision: > > Adjust comments Fine from my end now. Just one minor nit left.
src/jdk.compiler/share/classes/com/sun/tools/javac/file/JavacFileManager.java line 196: > 194: > 195: /** > 196: * Set whether or not to use ct.sym as an alternate to the current runtime You should bring back the period at the end of the sentence. ------------- Marked as reviewed by clanger (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/14593#pullrequestreview-1514740197 PR Review Comment: https://git.openjdk.org/jdk/pull/14593#discussion_r1253244166 From jbhateja at openjdk.org Wed Jul 5 15:40:15 2023 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 5 Jul 2023 15:40:15 GMT Subject: RFR: 8311023: assert(false) failed: EA: missing memory path [v3] In-Reply-To: References: Message-ID: On Wed, 5 Jul 2023 09:14:24 GMT, Christian Hagedorn wrote: >> Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: >> >> - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8311023 >> - Missed test update. >> - Rename test. >> - 8311023: assert(false) failed: EA: missing memory path > > Looks good! Thanks @chhagedorn , @TobiHartmann ------------- PR Comment: https://git.openjdk.org/jdk/pull/14764#issuecomment-1622015597 From jbhateja at openjdk.org Wed Jul 5 15:40:17 2023 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 5 Jul 2023 15:40:17 GMT Subject: Integrated: 8311023: assert(false) failed: EA: missing memory path In-Reply-To: References: Message-ID: On Mon, 3 Jul 2023 21:55:44 GMT, Jatin Bhateja wrote: > Handling missing cases for VectorizedHashCode while collecting memory nodes for propagating new type information through the graph. 
> > We associate new instance types with CheckCastPP nodes succeeding allocation IR, refresh connectivity of MemoryMerge slices at alias indices corresponding to new instance type and update the memory edges of user memory nodes in the ideal graph to ease out scalar replacements. > > Please review and share feedback. > > Best Regards, > Jatin This pull request has now been integrated. Changeset: 6ebb0e3b Author: Jatin Bhateja URL: https://git.openjdk.org/jdk/commit/6ebb0e3bd4ba3579c66cdc5a329e95df7bda5b95 Stats: 52 lines in 2 files changed: 51 ins; 0 del; 1 mod 8311023: assert(false) failed: EA: missing memory path Reviewed-by: thartmann, chagedorn ------------- PR: https://git.openjdk.org/jdk/pull/14764 From thartmann at openjdk.org Thu Jul 6 06:12:24 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 6 Jul 2023 06:12:24 GMT Subject: [jdk21] RFR: 8311023: assert(false) failed: EA: missing memory path Message-ID: Backport of [JDK-8311023](https://bugs.openjdk.java.net/browse/JDK-8311023). Applies cleanly. 
Thanks, Tobias ------------- Commit messages: - 8311023: assert(false) failed: EA: missing memory path Changes: https://git.openjdk.org/jdk21/pull/99/files Webrev: https://webrevs.openjdk.org/?repo=jdk21&pr=99&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8311023 Stats: 52 lines in 2 files changed: 51 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk21/pull/99.diff Fetch: git fetch https://git.openjdk.org/jdk21.git pull/99/head:pull/99 PR: https://git.openjdk.org/jdk21/pull/99 From chagedorn at openjdk.org Thu Jul 6 06:21:00 2023 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 6 Jul 2023 06:21:00 GMT Subject: [jdk21] RFR: 8311023: assert(false) failed: EA: missing memory path In-Reply-To: References: Message-ID: <5dXe212jBVGAoo1yThEQmIG7gCO2cOiykpS4cpg1gRc=.91879cd4-9275-4251-84d8-544168ba776c@github.com> On Thu, 6 Jul 2023 06:03:49 GMT, Tobias Hartmann wrote: > Backport of [JDK-8311023](https://bugs.openjdk.java.net/browse/JDK-8311023). Applies cleanly. > > Thanks, > Tobias Looks good. ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk21/pull/99#pullrequestreview-1515834155 From chagedorn at openjdk.org Thu Jul 6 06:22:05 2023 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 6 Jul 2023 06:22:05 GMT Subject: [jdk21] RFR: 8309531: Incorrect result with unwrapped iotaShuffle. In-Reply-To: References: Message-ID: On Wed, 5 Jul 2023 06:21:35 GMT, Tobias Hartmann wrote: > Backport of [JDK-8309531](https://bugs.openjdk.java.net/browse/JDK-8309531). Applies cleanly. > > Thanks, > Tobias Looks good. ------------- Marked as reviewed by chagedorn (Reviewer). 
PR Review: https://git.openjdk.org/jdk21/pull/95#pullrequestreview-1515835156 From thartmann at openjdk.org Thu Jul 6 06:30:54 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 6 Jul 2023 06:30:54 GMT Subject: [jdk21] RFR: 8311023: assert(false) failed: EA: missing memory path In-Reply-To: References: Message-ID: On Thu, 6 Jul 2023 06:03:49 GMT, Tobias Hartmann wrote: > Backport of [JDK-8311023](https://bugs.openjdk.java.net/browse/JDK-8311023). Applies cleanly. > > Thanks, > Tobias Thanks, Christian! ------------- PR Comment: https://git.openjdk.org/jdk21/pull/99#issuecomment-1623067247 From thartmann at openjdk.org Thu Jul 6 06:33:03 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 6 Jul 2023 06:33:03 GMT Subject: [jdk21] RFR: 8309531: Incorrect result with unwrapped iotaShuffle. In-Reply-To: References: Message-ID: On Wed, 5 Jul 2023 06:21:35 GMT, Tobias Hartmann wrote: > Backport of [JDK-8309531](https://bugs.openjdk.java.net/browse/JDK-8309531). Applies cleanly. > > Thanks, > Tobias Thanks, Christian! ------------- PR Comment: https://git.openjdk.org/jdk21/pull/95#issuecomment-1623066931 From thartmann at openjdk.org Thu Jul 6 06:33:04 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 6 Jul 2023 06:33:04 GMT Subject: [jdk21] Integrated: 8309531: Incorrect result with unwrapped iotaShuffle. In-Reply-To: References: Message-ID: On Wed, 5 Jul 2023 06:21:35 GMT, Tobias Hartmann wrote: > Backport of [JDK-8309531](https://bugs.openjdk.java.net/browse/JDK-8309531). Applies cleanly. > > Thanks, > Tobias This pull request has now been integrated. Changeset: 0ee169f1 Author: Tobias Hartmann URL: https://git.openjdk.org/jdk21/commit/0ee169f14b08b68441217eaa6f4d9d26305d858c Stats: 158 lines in 2 files changed: 128 ins; 5 del; 25 mod 8309531: Incorrect result with unwrapped iotaShuffle. 
Reviewed-by: chagedorn Backport-of: d6578bff1c69ebc165fc9734e6503bd2d5d021c2 ------------- PR: https://git.openjdk.org/jdk21/pull/95 From chagedorn at openjdk.org Thu Jul 6 06:35:13 2023 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 6 Jul 2023 06:35:13 GMT Subject: Integrated: 8311279: TestStressIGVNAndCCP.java failed with different IGVN traces for the same seed In-Reply-To: References: Message-ID: <4i6DIf7EI-eoakLPqpJPP8nn9nyYgH0qfBkDTdGocVA=.dc8cdfa9-46bf-482d-9e8f-6b37c32d61d6@github.com> On Wed, 5 Jul 2023 09:10:48 GMT, Christian Hagedorn wrote: > In `TestStressIGVNAndCCP`, we are executing a JVM twice and only want to compile and then run a single method `sum()`. The expectation is that we get the same output with `-XX:+TraceIterativeGVN`. However, our testing found a case where this did not match. When looking at the diff, I've noticed that with one JVM `sum()` had compile id 1, while the other JVM used compile id 2 for `sum()`. This makes a difference for the `debug_idx` which is printed for a dead node as "`compile id * 10000000000 + node index`": > > Compile id 1: > > 80 Phi === _ _ _ [[ ]] [10000000080] ... > > Compile id 2: > > 80 Phi === _ _ _ [[ ]] [20000000080] ... > > > I was not able to reproduce the original report but my suspicion is that one JVM additionally compiled a native method wrapper or a method handle intrinsic for some reason but the other one did not. This would explain the different compile id because we should only be compiling `sum()` with the given `CompileOnly` JVM flag. A native compilation can be triggered, for example, by additionally passing `-esa` with `CompileOnly`.
We get compile id 3 for `sum()`: > > 60 1 n java.lang.invoke.MethodHandle::invokeBasic()I (native) > 60 2 n java.lang.invoke.MethodHandle::linkToSpecial(LL)I (native) (static) > 69 3 b compiler.debug.TestStressIGVNAndCCP::sum (27 bytes) > > > To fix this, I suggest to use the `-XX:+CICountNative` flag which uses a separate counter for native compilations. Then, we'll always get compile id 1 for `sum()`: > > 50 1 n java.lang.invoke.MethodHandle::invokeBasic()I (native) > 51 2 n java.lang.invoke.MethodHandle::linkToSpecial(LL)I (native) (static) > 59 1 b compiler.debug.TestStressIGVNAndCCP::sum (27 bytes) > > > This is the same approach as done in [JDK-8269342](https://bugs.openjdk.org/browse/JDK-8269342) to reliably crash with `-XX:CICrashAt=1` in the first non-native compilation. > > Thanks, > Christian This pull request has now been integrated. Changeset: edb2be10 Author: Christian Hagedorn URL: https://git.openjdk.org/jdk/commit/edb2be10fb897834ed78ab4493d3a4f73dc2e140 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod 8311279: TestStressIGVNAndCCP.java failed with different IGVN traces for the same seed Reviewed-by: thartmann ------------- PR: https://git.openjdk.org/jdk/pull/14771 From epeter at openjdk.org Thu Jul 6 07:23:39 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 6 Jul 2023 07:23:39 GMT Subject: RFR: 8310308: IR Framework: check for type and size of vector nodes [v22] In-Reply-To: References: Message-ID: > For some changes to `SuperWord`, and maybe auto-vectorization in general, I want to strengthen the IR Framework. > > **Motivation** > I want to not just find the relevant IR nodes, but also assert that they have the maximal length that they could have on the respective platform (given the CPU features and `MaxVectorSize`). Without this verification it is possible that a future change leads to a regression where we still vectorize but at shorter vector widths as before - leading to performance loss. 
> > **How to use it** > > All `IRNode`s in `test/hotspot/jtreg/compiler/lib/ir_framework/IRNode.java` that are created with `vectorNode` are now all matched with their `type` and `size`. The regex might now look something like this: > > `"(\d+(\s){2}(VectorCastF2X.*)+(\s){2}===.*vector[A-Za-z][8]:{int})"` > which would match with IR nodes dumped like that: > `1150 VectorCastF2X === _ 1151 [[ 1146 ]] #vectory[8]:{int} ...` > > The goal was to keep it simple and straight forward. In most cases, you can just use the nodes as before, and implicitly we now check for maximal size automatically. However, in some cases we want to ensure there is no or only a limited number of nodes (`failOn` or comparison `<` or `<=` or `=0`) - in those cases we usually want to make sure there is not any node of any size, so we match with any size by default. The size can also explicitly be constrained using `IRNode.VECTOR_SIZE`. > > Some examples: > 1. `@IR(counts = {IRNode.LOAD_VECTOR_I, " >0 "})` -> search for a `LoadVector` node with `type` `int`, and maximal `size` possible on the machine (limited by CPU features and `MaxVectorSize`). This is the most common use case. > 2. `@IR(failOn = { IRNode.LOAD_VECTOR_L, IRNode.STORE_VECTOR })` -> fail if there is a `LoadVector` with type `long`, of `any` size. > 3. `@IR(counts = { IRNode.XOR_VI, IRNode.VECTOR_SIZE_4, " > 0 "})` -> find at least one `XorV` node with type `int` and exactly `4` elements. Useful for VectorAPI when the vector species is fixed. > 4. `@IR(counts = { IRNode.LOAD_VECTOR_D, IRNode.VECTOR_SIZE + "min(4, max_double)", " >0 " })` -> search for a `LoadVector` node with `type` `double`, and `size` exactly equals to `min(4, max_double)` (so 4 elements, or if the hardware allows fewer `doubles`, then that number). > 5. 
`@IR(counts = { IRNode.ABS_VF, IRNode.VECTOR_SIZE + "min(LoopMaxUnroll, max_float)", ">= 1" })` -> find at least one `AbsV` node with type `float`, and the `size` exactly equal to the smaller of `LoopMaxUnroll` or the maximal size allow... Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: TestSpillTheBeans.java ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14539/files - new: https://git.openjdk.org/jdk/pull/14539/files/bbd4ce8f..e1b452e5 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14539&range=21 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14539&range=20-21 Stats: 34 lines in 1 file changed: 34 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/14539.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14539/head:pull/14539 PR: https://git.openjdk.org/jdk/pull/14539 From mbaesken at openjdk.org Thu Jul 6 07:35:10 2023 From: mbaesken at openjdk.org (Matthias Baesken) Date: Thu, 6 Jul 2023 07:35:10 GMT Subject: RFR: JDK-8310550: Adjust references to rt.jar [v5] In-Reply-To: References: Message-ID: <98WmwQW2HwA0y6V4kHm-Mz75WifXcX1R6eKMq-jQyjU=.b07ce857-c2d1-46cb-9dc5-0dd075ad8dd4@github.com> > There are a few references to rt.jar in comments and in the codebase itself. Some of them might be removed or adjusted.
Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision: Adjust comment ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14593/files - new: https://git.openjdk.org/jdk/pull/14593/files/3a7b057a..f29c4019 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14593&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14593&range=03-04 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/14593.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14593/head:pull/14593 PR: https://git.openjdk.org/jdk/pull/14593 From mbaesken at openjdk.org Thu Jul 6 07:35:11 2023 From: mbaesken at openjdk.org (Matthias Baesken) Date: Thu, 6 Jul 2023 07:35:11 GMT Subject: RFR: JDK-8310550: Adjust references to rt.jar [v4] In-Reply-To: References: Message-ID: On Wed, 5 Jul 2023 15:07:15 GMT, Matthias Baesken wrote: >> There are a few references to rt.jar in comments and in the codebase itself. Some of them might be removed or adjusted. > > Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision: > > Adjust comments Hi Christoph, thanks for the review ! I added the '.' as suggested. Any objections to the latest revision? ------------- PR Comment: https://git.openjdk.org/jdk/pull/14593#issuecomment-1623132227 From dholmes at openjdk.org Thu Jul 6 07:35:12 2023 From: dholmes at openjdk.org (David Holmes) Date: Thu, 6 Jul 2023 07:35:12 GMT Subject: RFR: JDK-8310550: Adjust references to rt.jar [v5] In-Reply-To: References: Message-ID: On Thu, 22 Jun 2023 09:23:05 GMT, Matthias Baesken wrote: >> test/langtools/tools/javap/4798312/JavapShouldLoadClassesFromRTJarTest.java line 1: >> >>> 1: /* >> >> The name of this test includes RTJar. It needs to be changed too I think. Does this test actually still test something? > > It seems to start a javap. 
So I think it tests something, how important this is and what other tests might cover similar stuff, I do not know unfortunately . This is a trivial test for a trivial issue. javap will be tested much more thoroughly by other tests. I think this test can be deleted without any loss of coverage. Or it can just be left. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14593#discussion_r1254047816 From thartmann at openjdk.org Thu Jul 6 07:45:08 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 6 Jul 2023 07:45:08 GMT Subject: [jdk21] Integrated: 8311023: assert(false) failed: EA: missing memory path In-Reply-To: References: Message-ID: On Thu, 6 Jul 2023 06:03:49 GMT, Tobias Hartmann wrote: > Backport of [JDK-8311023](https://bugs.openjdk.java.net/browse/JDK-8311023). Applies cleanly. > > Thanks, > Tobias This pull request has now been integrated. Changeset: c86f4dea Author: Tobias Hartmann URL: https://git.openjdk.org/jdk21/commit/c86f4dea9529640cd3234c5cad2f36f3201b1385 Stats: 52 lines in 2 files changed: 51 ins; 0 del; 1 mod 8311023: assert(false) failed: EA: missing memory path Reviewed-by: chagedorn Backport-of: 6ebb0e3bd4ba3579c66cdc5a329e95df7bda5b95 ------------- PR: https://git.openjdk.org/jdk21/pull/99 From thartmann at openjdk.org Thu Jul 6 11:36:05 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 6 Jul 2023 11:36:05 GMT Subject: [jdk21] RFR: 8310425: [JVMCI] compiler/runtime/TestConstantDynamic: lookupConstant returned an object of incorrect type: null Message-ID: Backport of [JDK-8310425](https://bugs.openjdk.java.net/browse/JDK-8310425). Applies cleanly. 
Thanks, Tobias ------------- Commit messages: - 8310425: [JVMCI] compiler/runtime/TestConstantDynamic: lookupConstant returned an object of incorrect type: null Changes: https://git.openjdk.org/jdk21/pull/101/files Webrev: https://webrevs.openjdk.org/?repo=jdk21&pr=101&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8310425 Stats: 14 lines in 2 files changed: 10 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk21/pull/101.diff Fetch: git fetch https://git.openjdk.org/jdk21.git pull/101/head:pull/101 PR: https://git.openjdk.org/jdk21/pull/101 From jsjolen at openjdk.org Thu Jul 6 12:30:05 2023 From: jsjolen at openjdk.org (Johan Sjölen) Date: Thu, 6 Jul 2023 12:30:05 GMT Subject: RFR: 8311087: PhiNode::wait_for_region_igvn should break early [v2] In-Reply-To: <2AVJe88tw0HV9HbHk_T3GLo-oIvnMNAWEkpw9SLkTYc=.3d55a112-c9e1-4ff1-a076-9976234af8a5@github.com> References: <2AVJe88tw0HV9HbHk_T3GLo-oIvnMNAWEkpw9SLkTYc=.3d55a112-c9e1-4ff1-a076-9976234af8a5@github.com> Message-ID: On Tue, 4 Jul 2023 10:35:21 GMT, Johan Sjölen wrote: >> Hi, please consider this PR. >> >> Instead of continuing the loop we break after setting `delay = true`. I also flattened out the indentation by transforming `A && B` into `!A || !B` and `continue`ing if so. This makes the code a bit longer, but clearer imho. >> >> I've run `TestCastIIAfterUnrollingInOuterLoop`, which is the test that was added along with this method. I'm also running tier1. >> >> Thanks, >> Johan > > Johan Sjölen has updated the pull request incrementally with one additional commit since the last revision: > > Wrap continue in braces Thanks! Integrating as only change after this was a minor style issue.
------------- PR Comment: https://git.openjdk.org/jdk/pull/14707#issuecomment-1623591811 From jsjolen at openjdk.org Thu Jul 6 12:30:07 2023 From: jsjolen at openjdk.org (Johan Sjölen) Date: Thu, 6 Jul 2023 12:30:07 GMT Subject: Integrated: 8311087: PhiNode::wait_for_region_igvn should break early In-Reply-To: References: Message-ID: <_m8vJmKCJqIVBX3OpURtLlENK2bB49-6rkpy8UJhuV8=.5e5a3d53-eca5-4931-9c03-b6b9942d9ca2@github.com> On Thu, 29 Jun 2023 10:44:45 GMT, Johan Sjölen wrote: > Hi, please consider this PR. > > Instead of continuing the loop we break after setting `delay = true`. I also flattened out the indentation by transforming `A && B` into `!A || !B` and `continue`ing if so. This makes the code a bit longer, but clearer imho. > > I've run `TestCastIIAfterUnrollingInOuterLoop`, which is the test that was added along with this method. I'm also running tier1. > > Thanks, > Johan This pull request has now been integrated. Changeset: 97e99f01 Author: Johan Sjölen URL: https://git.openjdk.org/jdk/commit/97e99f01d4f317fce1a204c01874a68f5e25a051 Stats: 24 lines in 1 file changed: 4 ins; 0 del; 20 mod 8311087: PhiNode::wait_for_region_igvn should break early Reviewed-by: thartmann, chagedorn ------------- PR: https://git.openjdk.org/jdk/pull/14707 From chagedorn at openjdk.org Thu Jul 6 12:36:54 2023 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 6 Jul 2023 12:36:54 GMT Subject: [jdk21] RFR: 8310425: [JVMCI] compiler/runtime/TestConstantDynamic: lookupConstant returned an object of incorrect type: null In-Reply-To: References: Message-ID: On Thu, 6 Jul 2023 11:28:49 GMT, Tobias Hartmann wrote: > Backport of [JDK-8310425](https://bugs.openjdk.java.net/browse/JDK-8310425). Applies cleanly. > > Thanks, > Tobias Looks good. ------------- Marked as reviewed by chagedorn (Reviewer).
PR Review: https://git.openjdk.org/jdk21/pull/101#pullrequestreview-1516442177 From thartmann at openjdk.org Thu Jul 6 12:36:55 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 6 Jul 2023 12:36:55 GMT Subject: [jdk21] RFR: 8310425: [JVMCI] compiler/runtime/TestConstantDynamic: lookupConstant returned an object of incorrect type: null In-Reply-To: References: Message-ID: On Thu, 6 Jul 2023 11:28:49 GMT, Tobias Hartmann wrote: > Backport of [JDK-8310425](https://bugs.openjdk.java.net/browse/JDK-8310425). Applies cleanly. > > Thanks, > Tobias Thanks, Christian! ------------- PR Comment: https://git.openjdk.org/jdk21/pull/101#issuecomment-1623602886 From thartmann at openjdk.org Thu Jul 6 12:57:54 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 6 Jul 2023 12:57:54 GMT Subject: [jdk21] Integrated: 8310425: [JVMCI] compiler/runtime/TestConstantDynamic: lookupConstant returned an object of incorrect type: null In-Reply-To: References: Message-ID: On Thu, 6 Jul 2023 11:28:49 GMT, Tobias Hartmann wrote: > Backport of [JDK-8310425](https://bugs.openjdk.java.net/browse/JDK-8310425). Applies cleanly. > > Thanks, > Tobias This pull request has now been integrated. 
Changeset: 830279e0 Author: Tobias Hartmann URL: https://git.openjdk.org/jdk21/commit/830279e0540c01e2012c60b724857a7fe4a450b1 Stats: 14 lines in 2 files changed: 10 ins; 0 del; 4 mod 8310425: [JVMCI] compiler/runtime/TestConstantDynamic: lookupConstant returned an object of incorrect type: null Reviewed-by: chagedorn Backport-of: 15878360bf22c88a6e4038f05efa6db08d72b309 ------------- PR: https://git.openjdk.org/jdk21/pull/101 From cslucas at openjdk.org Thu Jul 6 13:06:30 2023 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Thu, 6 Jul 2023 13:06:30 GMT Subject: RFR: JDK-8287061: Support for rematerializing scalar replaced objects participating in allocation merges [v21] In-Reply-To: <7nqFW-lgT1FzuMHPMUQiCj1ATcV_bQtroolf4V_kCc4=.ccd12605-aad0-433e-ba44-5772d972f05d@github.com> References: <7nqFW-lgT1FzuMHPMUQiCj1ATcV_bQtroolf4V_kCc4=.ccd12605-aad0-433e-ba44-5772d972f05d@github.com> Message-ID: > Can I please get reviews for this PR? > > The most common and frequent use of NonEscaping Phis merging object allocations is for debugging information. The two graphs below show numbers for Renaissance and DaCapo benchmarks - similar results are obtained for all other applications that I tested. > > With what frequency does each IR node type occurs as an allocation merge user? I.e., if the same node type uses a Phi N times the counter is incremented by N: > > ![image](https://user-images.githubusercontent.com/2249648/222280517-4dcf5871-2564-4207-b49e-22aee47fa49d.png) > > What are the most common users of allocation merges? I.e., if the same node type uses a Phi N times the counter is incremented by 1: > > ![image](https://user-images.githubusercontent.com/2249648/222280608-ca742a4e-1622-4e69-a778-e4db6805ea02.png) > > This PR adds support scalar replacing allocations participating in merges used as debug information OR as a base for field loads. 
I plan to create subsequent PRs to enable scalar replacement of merges used by other node types (CmpP is next on the list) subsequently. > > The approach I used for _rematerialization_ is pretty straightforward. It consists basically of the following. 1) New IR node (suggested by V. Kozlov), named SafePointScalarMergeNode, to represent a set of SafePointScalarObjectNode; 2) Each scalar replaceable input participating in a merge will get a SafePointScalarObjectNode like if it weren't part of a merge. 3) Add a new Class to support the rematerialization of SR objects that are part of a merge; 4) Patch HotSpot to be able to serialize and deserialize debug information related to allocation merges; 5) Patch C2 to generate unique types for SR objects participating in some allocation merges. > > The approach I used for _enabling the scalar replacement of some of the inputs of the allocation merge_ is also pretty straightforward: call `MemNode::split_through_phi` to, well, split AddP->Load* through the merge which will render the Phi useless. > > I tested this with JTREG tests tier 1-4 (Windows, Linux, and Mac) and didn't see regression. I also experimented with several applications and didn't see any failure. I also ran tests with "-ea -esa -Xbatch -Xcomp -XX:+UnlockExperimentalVMOptions -XX:-TieredCompilation -server -XX:+IgnoreUnrecognizedVMOptions -XX:+UnlockDiagnosticVMOptions -XX:+StressLCM -XX:+StressGCM -XX:+StressCCP" and didn't observe any related failures. Cesar Soares Lucas has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 22 commits: - Merge branch 'openjdk:master' into rematerialization-of-merges - Addressing PR feedback. - Merge remote-tracking branch 'origin/master' into rematerialization-of-merges - Merge branch 'openjdk:master' into rematerialization-of-merges - Rome minor refactorings. - Merge remote-tracking branch 'origin/master' into rematerialization-of-merges Catching up with master. 
- Address PR review 6: debug format output & some refactoring. - Catching up with master branch. Merge remote-tracking branch 'origin/master' into rematerialization-of-merges - Address PR review 6: refactoring around rematerialization & improve test cases. - Address PR review 5: refactor on rematerialization & add tests. - ... and 12 more: https://git.openjdk.org/jdk/compare/97e99f01...25b683d6 ------------- Changes: https://git.openjdk.org/jdk/pull/12897/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=12897&range=20 Stats: 2733 lines in 26 files changed: 2485 ins; 108 del; 140 mod Patch: https://git.openjdk.org/jdk/pull/12897.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/12897/head:pull/12897 PR: https://git.openjdk.org/jdk/pull/12897 From cslucas at openjdk.org Thu Jul 6 13:06:30 2023 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Thu, 6 Jul 2023 13:06:30 GMT Subject: RFR: JDK-8287061: Support for rematerializing scalar replaced objects participating in allocation merges [v18] In-Reply-To: <72OcyhmFKGyTwDy8LQ0blp5HG5dg5l9OsU5dh9osVxo=.73b3a79e-ff24-4f41-b39b-650a9036ee76@github.com> References: <7nqFW-lgT1FzuMHPMUQiCj1ATcV_bQtroolf4V_kCc4=.ccd12605-aad0-433e-ba44-5772d972f05d@github.com> <-A7bd8C0q5o1WuRSeSkYYnUoApV4s9uijPmiNB2Wteo=.c5bc944c-88a3-4228-bd41-091ac6c8fb1d@github.com> <72OcyhmFKGyTwDy8LQ0blp5HG5dg5l9OsU5dh9osVxo=.73b3a79e-ff24-4f41-b39b-650a9036ee76@github.com> Message-ID: <1gB4pzC79wZ9fs7t5eWE4yTlyYkz4oK1K36wc7MWgBo=.cc02b3a7-3b4d-4909-8013-746008f50058@github.com> On Tue, 20 Jun 2023 16:44:28 GMT, Vladimir Ivanov wrote: >> Thank you once more for the comments @iwanowww . I'll address them asap. >> >> Can I ask what requirements are there for a product flag? > >> Can I ask what requirements are there for a product flag? > > Product flags are treated as part of public API of the JVM. So, changes in behavior have to go through CSR process.
Also, a product flag has to be deprecated/obsoleted first before it can be removed which takes multiple releases to happen. Better to avoid introducing new product flags unless it is well-justified or necessary. @iwanowww - I believe I've addressed all your comments so far. Is everything still looking good? ------------- PR Comment: https://git.openjdk.org/jdk/pull/12897#issuecomment-1623641674 From bulasevich at openjdk.org Thu Jul 6 14:43:55 2023 From: bulasevich at openjdk.org (Boris Ulasevich) Date: Thu, 6 Jul 2023 14:43:55 GMT Subject: RFR: 8293170: Improve encoding of the debuginfo nmethod section In-Reply-To: <47iDSIn_k8jXYisbGzwQOILoNh45xPO3kkj-k2-BZ1E=.2ea0570e-4055-4517-a128-2af1b62c7529@github.com> References: <47iDSIn_k8jXYisbGzwQOILoNh45xPO3kkj-k2-BZ1E=.2ea0570e-4055-4517-a128-2af1b62c7529@github.com> Message-ID: On Thu, 2 Feb 2023 12:54:06 GMT, Boris Ulasevich wrote: > This is another pull request to replace https://github.com/openjdk/jdk/pull/10025 change which was blocked as not acceptable (see https://github.com/openjdk/jdk/pull/10025#pullrequestreview-1228216330) > > The objections to change #10025 were: > - specialized algorithm for given data complicates things, makes it hard to learn, test and support > - algorithm is changed for DebugInfo, and the benefit is only for one type of data > - statistics of the debug info data can (will) change, breaking the optimization > > The suggestion was: > - don't change the core algorithm, but add one on top or underneath the existing one, or reuse off-the-shelf zero-reduction schemes such as Cap'n Proto > > With this change I propose a different approach. Instead of bit coding, the sequence of zeros in a data stream is encoded with a special character that normally never appears in Unsigned5 encoding, followed by a byte containing a number of zeros.
In this way the updated algorithm is a pure extension of the existing encoding algorithm: data encoded without the zero-reduction trick is unpacked in the same way as before. > > Currently there are several datasets affected by this change: Dependencies info, OopMap info, LineNumber info, Debug info. Only Debug info has a large number of zeros and gets a significant benefit. I experimented with the Cap'n Proto and lz4 algorithms on DebugInfo. The Unsigned5 algorithm has a better compression rate than these. > > DebugInfo data size is reduced by ~20% (actually, 10-30%, depending on the application). Total nmethod size reduction is ~3%. > > Performance impact: Renaissance and DaCapo benchmarks do not show any difference. kind reminder ------------- PR Comment: https://git.openjdk.org/jdk/pull/12387#issuecomment-1623804498 From cjplummer at openjdk.org Thu Jul 6 15:14:58 2023 From: cjplummer at openjdk.org (Chris Plummer) Date: Thu, 6 Jul 2023 15:14:58 GMT Subject: RFR: 8293170: Improve encoding of the debuginfo nmethod section In-Reply-To: <47iDSIn_k8jXYisbGzwQOILoNh45xPO3kkj-k2-BZ1E=.2ea0570e-4055-4517-a128-2af1b62c7529@github.com> References: <47iDSIn_k8jXYisbGzwQOILoNh45xPO3kkj-k2-BZ1E=.2ea0570e-4055-4517-a128-2af1b62c7529@github.com> Message-ID: <48GcHhDYvV1QpMaREDRwoMcmBGPTj0IVahSQZjuwLbc=.5166409e-e7b1-475f-a637-45be93e6c582@github.com> On Thu, 2 Feb 2023 12:54:06 GMT, Boris Ulasevich wrote: > This is another pull request to replace https://github.com/openjdk/jdk/pull/10025 change which was blocked as not acceptable (see https://github.com/openjdk/jdk/pull/10025#pullrequestreview-1228216330) > > The objections to change #10025 were: > - specialized algorithm for given data complicates things, makes it hard to learn, test and support > - algorithm is changed for DebugInfo, and the benefit is only for one type of data > - statistics of the debug info data can (will) change, breaking the optimization > > The suggestion was: > - don't change the core algorithm, but
add one on top or underneath the existing one, or reuse off-the-shelf zero-reduction schemes such as Cap'n Proto > > With this change I propose a different approach. Instead of bit coding, the sequence of zeros in a data stream is encoded with a special character that normally never appears in Unsigned5 encoding, followed by a byte containing a number of zeros. In this way the updated algorithm is a pure extension of the existing encoding algorithm: data encoded without the zero-reduction trick is unpacked in the same way as before. > > Currently there are several datasets affected by this change: Dependencies info, OopMap info, LineNumber info, Debug info. Only Debug info has a large number of zeros and gets a significant benefit. I experimented with the Cap'n Proto and lz4 algorithms on DebugInfo. The Unsigned5 algorithm has a better compression rate than these. > > DebugInfo data size is reduced by ~20% (actually, 10-30%, depending on the application). Total nmethod size reduction is ~3%. > > Performance impact: Renaissance and DaCapo benchmarks do not show any difference. src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/code/CompressedReadStream.java line 106: > 104: @Override > 105: public void setPosition(int position) { > 106: this.position = position; Maybe a call to `super()` would be better.
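For reference, the zero-run idea in the quoted description (a marker symbol that cannot occur in normal data, followed by a count of zeros) can be sketched roughly as below. This is only an illustration under stated assumptions, not the HotSpot implementation: the class name `ZeroRunSketch`, the marker value `-1`, the cap of 255 zeros per run, and the use of `int` lists instead of Unsigned5-encoded bytes are all hypothetical choices made to keep the sketch self-contained.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only; names and the marker value are hypothetical.
final class ZeroRunSketch {
    // Assumption: real values are never negative, so -1 is free to act as the marker.
    static final int MARKER = -1;
    // Assumption: the run length is stored in one byte, so a run holds at most 255 zeros.
    static final int MAX_RUN = 0xFF;

    // Replaces each run of two or more zeros with MARKER followed by the run length.
    static List<Integer> encode(int[] values) {
        List<Integer> out = new ArrayList<>();
        int i = 0;
        while (i < values.length) {
            if (values[i] != 0) {
                out.add(values[i]);
                i++;
                continue;
            }
            int run = 0;
            while (i < values.length && values[i] == 0 && run < MAX_RUN) {
                run++;
                i++;
            }
            if (run == 1) {
                out.add(0);        // a single zero is stored unchanged
            } else {
                out.add(MARKER);   // marker, then the number of zeros
                out.add(run);
            }
        }
        return out;
    }

    // Expands MARKER/count pairs; streams without markers decode exactly as before.
    static int[] decode(List<Integer> encoded) {
        List<Integer> out = new ArrayList<>();
        for (int i = 0; i < encoded.size(); i++) {
            int v = encoded.get(i);
            if (v == MARKER) {
                int run = encoded.get(++i);
                for (int k = 0; k < run; k++) {
                    out.add(0);
                }
            } else {
                out.add(v);
            }
        }
        int[] result = new int[out.size()];
        for (int i = 0; i < result.length; i++) {
            result[i] = out.get(i);
        }
        return result;
    }
}
```

In this sketch a run longer than 255 zeros is simply split into several marker/count pairs, which is one possible way to handle a count overflow; whether the actual patch does the same is not stated in the thread.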
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/12387#discussion_r1254582735 From chagedorn at openjdk.org Thu Jul 6 15:18:13 2023 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 6 Jul 2023 15:18:13 GMT Subject: RFR: 8311588: C2: RepeatCompilation compiler directive does not choose stress seed randomly Message-ID: <1Dg9_gR-lLe4YfZ2RKYXnaqYYpJOujts1ptGDfmETD8=.957a179f-5f27-4a28-bfde-fb0ce51efb8f@github.com> The compiler directive `RepeatCompilation` can be used to repeat compilations of specific methods, for example for `Test::test()`: -XX:CompileCommand=RepeatCompilation,Test::test,1000 When using this compiler directive in combination with `StressIGVN/CCP/GCM/LCM`, C2 only sets the stress seed once to a random value. Afterward, it keeps the seed for all repeated compilations. This should be changed to always select a random seed with each repetition (i.e. get the same behavior as the flag `RepeatCompilation`). Thanks, Christian ------------- Commit messages: - 8311588: C2: RepeatCompilation compiler directive does not choose stress seed randomly Changes: https://git.openjdk.org/jdk/pull/14786/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=14786&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8311588 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/14786.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14786/head:pull/14786 PR: https://git.openjdk.org/jdk/pull/14786 From sspitsyn at openjdk.org Thu Jul 6 22:35:55 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Thu, 6 Jul 2023 22:35:55 GMT Subject: RFR: 8293170: Improve encoding of the debuginfo nmethod section In-Reply-To: <47iDSIn_k8jXYisbGzwQOILoNh45xPO3kkj-k2-BZ1E=.2ea0570e-4055-4517-a128-2af1b62c7529@github.com> References: <47iDSIn_k8jXYisbGzwQOILoNh45xPO3kkj-k2-BZ1E=.2ea0570e-4055-4517-a128-2af1b62c7529@github.com> Message-ID: On Thu, 2 Feb 2023 12:54:06 GMT, Boris Ulasevich wrote: > This is another 
pull request to replace https://github.com/openjdk/jdk/pull/10025 change which was blocked as not acceptable (see https://github.com/openjdk/jdk/pull/10025#pullrequestreview-1228216330) > > The objections to change #10025 were: > - specialized algorithm for given data complicates things, makes it hard to learn, test and support > - algorithm is changed for DebugInfo, and the benefit is only for one type of data > - statistics of the debug info data can (will) change, breaking the optimization > > The suggestion was: > - don't change the core algorithm, but add one on top or underneath the existing one, or reuse off-the-shelf zero-reduction schemes such as Cap'n Proto > > With this change I propose a different approach. Instead of bit coding, the sequence of zeros in a data stream is encoded with a special character that normally never appears in Unsigned5 encoding, followed by a byte containing a number of zeros. In this way the updated algorithm is a pure extension of the existing encoding algorithm: data encoded without the zero-reduction trick is unpacked in the same way as before. > > Currently there are several datasets affected by this change: Dependencies info, OopMap info, LineNumber info, Debug info. Only Debug info has a large number of zeros and gets a significant benefit. I experimented with the Cap'n Proto and lz4 algorithms on DebugInfo. The Unsigned5 algorithm has a better compression rate than these. > > DebugInfo data size is reduced by ~20% (actually, 10-30%, depending on the application). Total nmethod size reduction is ~3%. > > Performance impact: Renaissance and DaCapo benchmarks do not show any difference. src/hotspot/share/code/compressedStream.hpp line 118: > 116: bool handle_zero(juint value) { > 117: if (value == 0) { > 118: _zero_count = (_zero_count == 0xFF) ? 0 : _zero_count; The case of `_zero_count` overflow is not clear. Apparently, I'm missing something here. Current code is just clearing the previously counted `_zero_count`.
I'd expect some action like storing the current number of zeros or advancing the `_position`. Do you have a test for this? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/12387#discussion_r1254988216 From duke at openjdk.org Thu Jul 6 23:35:09 2023 From: duke at openjdk.org (duke) Date: Thu, 6 Jul 2023 23:35:09 GMT Subject: Withdrawn: 8301991: Convert l10n properties resource bundles to UTF-8 native In-Reply-To: <0MB7FLFNfaGEWssr9X54UJ_iZNFWBJkxQ1yusP7fsuY=.3f9f3de5-fe84-48e6-9449-626cac42da0b@github.com> References: <0MB7FLFNfaGEWssr9X54UJ_iZNFWBJkxQ1yusP7fsuY=.3f9f3de5-fe84-48e6-9449-626cac42da0b@github.com> Message-ID: On Thu, 23 Feb 2023 09:04:23 GMT, Justin Lu wrote: > This PR converts Unicode sequences to UTF-8 native in .properties file. (Excluding the Unicode space and tab sequence). The conversion was done using native2ascii. > > In addition, the build logic is adjusted to support reading in the .properties files as UTF-8 during the conversion from .properties file to .java ListResourceBundle file. This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.org/jdk/pull/12726 From never at openjdk.org Fri Jul 7 06:20:04 2023 From: never at openjdk.org (Tom Rodriguez) Date: Fri, 7 Jul 2023 06:20:04 GMT Subject: RFR: 8311557: [JVMCI] deadlock with JVMTI thread suspension Message-ID: Java based JVMCI compiler threads are more like normal Java threads so they aren't `hidden_from_external_view` like the native compilers. This can lead to deadlocks if you use JVMTI to suspend all threads since this will block the compiler queue and can block execution if background compilation is disabled. It's reasonable to treat libgraal threads like native threads in this regard. Making jargraal threads hidden too would interfere with using profiling and debugging tools on them so I've left that alone but it might be worth changing the JVMTI suspend and resume functions to explicitly skip compiler threads as well.
------------- Commit messages: - 8311557: [JVMCI] deadlock with JVMTI thread suspension Changes: https://git.openjdk.org/jdk/pull/14799/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=14799&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8311557 Stats: 7 lines in 2 files changed: 5 ins; 1 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/14799.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14799/head:pull/14799 PR: https://git.openjdk.org/jdk/pull/14799 From haosun at openjdk.org Fri Jul 7 06:45:13 2023 From: haosun at openjdk.org (Hao Sun) Date: Fri, 7 Jul 2023 06:45:13 GMT Subject: RFR: 8311548: AArch64: [ZGC] Many tests fail with "assert(allocates2(pc)) failed: not in CodeBuffer memory" on some CPUs Message-ID: ### Problem: We got many ZGC related JTreg test failures on ThunderX2 CPU. The failure can be easily reproduced by the following command as well. `java -XX:+UseZGC -XX:+ZGenerational --version` Here shows the snippet of error message: A fatal error has been detected by the Java Runtime Environment: Internal Error (~/jdk_build/jdk_src/src/hotspot/share/asm/codeBuffer.hpp:200), pid=108369, tid=108373 assert(allocates2(pc)) failed: not in CodeBuffer memory: 0x0000ffff9c921100 <= 0x0000ffff9c934c04 <= 0x0000ffff9c934c00 JRE version: (22.0) (fastdebug build ) Java VM: OpenJDK 64-Bit Server VM (fastdebug 22-internal-git-0916e6a60, mixed mode, sharing, compressed class ptrs, z gc, linux-aarch64) Problematic frame: V [libjvm.so+0x45413c] Instruction_aarch64::~Instruction_aarch64()+0xbc ### Root cause: From the backtrace (See the Description section in the JBS [1]), we can see that it's an assembler failure and the failure occurred when the VM generated "final stubs". The root cause is that the buffer size of "final stubs" is too small for ThunderX2 CPU with ZGC on.
The reason that this failure only occurred on ThunderX2 CPU but not on CPUs like Neoverse N1/N2, is that 1) VM flag "AvoidUnalignedAccesses" is enabled by default on CPUs like ThunderX2 (See the code [2]), and 2) more instructions would be generated especially with ZGC (E.g., see the code [3]). Hence, the failure would also occur on CPUs like Neoverse N1/N2 if we pass VM option "-XX:+AvoidUnalignedAccesses", e.g., `java -XX:+UseZGC -XX:+ZGenerational -XX:+AvoidUnalignedAccesses --version` ### Fix: Increasing the buffer size for "final blobs", i.e. variable `_final_stubs_code_size`, would fix the failure. We manually computed the code size for "final stubs" on ThunderX2 CPU and the size is roughly "105568" bytes. In this patch, we increase the buffer size from "60000" to "100000" for ZGC_ONLY(). ### Test: Tier1~3 passed on Linux/ThunderX2 and Linux/Neoverse N1. [1] https://bugs.openjdk.org/browse/JDK-8311548 [2] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/aarch64/vm_version_aarch64.cpp#L154-L177 [3] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp#L919-L1109 ------------- Commit messages: - 8311548: AArch64: [ZGC] Many tests fail with "assert(allocates2(pc)) failed: not in CodeBuffer memory" on some CPUs Changes: https://git.openjdk.org/jdk/pull/14800/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=14800&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8311548 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/14800.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14800/head:pull/14800 PR: https://git.openjdk.org/jdk/pull/14800 From thartmann at openjdk.org Fri Jul 7 06:50:17 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Fri, 7 Jul 2023 06:50:17 GMT Subject: [jdk21] RFR: 8295210: IR framework should not whitelist -XX:-UseTLAB Message-ID: Backport of [JDK-8295210](https://bugs.openjdk.java.net/browse/JDK-8295210). Applies cleanly. 
------------- Commit messages: - 8295210: IR framework should not whitelist -XX:-UseTLAB Changes: https://git.openjdk.org/jdk21/pull/104/files Webrev: https://webrevs.openjdk.org/?repo=jdk21&pr=104&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8295210 Stats: 1 line in 1 file changed: 0 ins; 1 del; 0 mod Patch: https://git.openjdk.org/jdk21/pull/104.diff Fetch: git fetch https://git.openjdk.org/jdk21.git pull/104/head:pull/104 PR: https://git.openjdk.org/jdk21/pull/104 From aboldtch at openjdk.org Fri Jul 7 06:50:56 2023 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Fri, 7 Jul 2023 06:50:56 GMT Subject: RFR: 8311548: AArch64: [ZGC] Many tests fail with "assert(allocates2(pc)) failed: not in CodeBuffer memory" on some CPUs In-Reply-To: References: Message-ID: On Fri, 7 Jul 2023 06:38:10 GMT, Hao Sun wrote: > ### Problem: > > We got many ZGC related JTreg test failures on ThunderX2 CPU. The failure can be easily reproduced by the following command as well. > > `java -XX:+UseZGC -XX:+ZGenerational --version` > > Here shows the snippet of error message: > > > A fatal error has been detected by the Java Runtime Environment: > > Internal Error (~/jdk_build/jdk_src/src/hotspot/share/asm/codeBuffer.hpp:200), pid=108369, tid=108373 > assert(allocates2(pc)) failed: not in CodeBuffer memory: 0x0000ffff9c921100 <= 0x0000ffff9c934c04 <= 0x0000ffff9c934c00 > > JRE version: (22.0) (fastdebug build ) > Java VM: OpenJDK 64-Bit Server VM (fastdebug 22-internal-git-0916e6a60, mixed mode, sharing, compressed class ptrs, z gc, linux-aarch64) > Problematic frame: > V [libjvm.so+0x45413c] Instruction_aarch64::~Instruction_aarch64()+0xbc > > > ### Root cause: > > From the backtrace (See the Description session in the JBS [1]), we can see that it's an assembler failure and the failure occurred when the VM generated "final stubs". The root cause is that the buffer size of "final stubs" is too small for ThunderX2 CPU with ZGC on. 
> > The reason that this failure only occurred on ThunderX2 CPU but not on CPUs like Neoverse N1/N2, is that 1) VM flag "AvoidUnalignedAccesses" is enabled by default on CPUs like ThunderX2 (See the code [2]), and 2) more instructions would be generated especially with ZGC (E.g., see the code [3]). > > Hence, the failure would also occur on CPUs like Neoverse N1/N2 if we pass VM option "-XX:+AvoidUnalignedAccesses", e.g., > > `java -XX:+UseZGC -XX:+ZGenerational -XX:+AvoidUnalignedAccesses --version` > > ### Fix: > > Increasing the buffer size for "final blobs", i.e. variable `_final_stubs_code_size`, would fix the failure. > > We manually computed the code size for "final stubs" on ThunderX2 CPU and the size is roughly "105568" bytes. In this patch, we increase the buffer size from "60000" to "100000" for ZGC_ONLY(). > > ### Test: > > Tier1~3 passed on Linux/ThunderX2 and Linux/Neoverse N1. > > [1] https://bugs.openjdk.org/browse/JDK-8311548 > [2] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/aarch64/vm_version_aarch64.cpp#L154-L177 > [3] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp#L919-L1109 Marked as reviewed by aboldtch (Committer). ------------- PR Review: https://git.openjdk.org/jdk/pull/14800#pullrequestreview-1518055870 From mbaesken at openjdk.org Fri Jul 7 07:00:05 2023 From: mbaesken at openjdk.org (Matthias Baesken) Date: Fri, 7 Jul 2023 07:00:05 GMT Subject: Integrated: JDK-8310550: Adjust references to rt.jar In-Reply-To: References: Message-ID: <5z7tycp2SolizjphgpOZ9dewDyUSx4kL-Ad-D9_fKZE=.0f5bda5f-2d9c-41cf-b6cd-dd4ee866aaf9@github.com> On Wed, 21 Jun 2023 15:18:19 GMT, Matthias Baesken wrote: > There are a few references to rt.jar in comments and in the codebase itself. Some of them might be removed or adjusted. This pull request has now been integrated. 
Changeset: 25cbe85d Author: Matthias Baesken URL: https://git.openjdk.org/jdk/commit/25cbe85d6f46bed82c7f1266ce52c86943e29d60 Stats: 17 lines in 12 files changed: 0 ins; 8 del; 9 mod 8310550: Adjust references to rt.jar Reviewed-by: erikj, clanger ------------- PR: https://git.openjdk.org/jdk/pull/14593 From epeter at openjdk.org Fri Jul 7 07:05:52 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 7 Jul 2023 07:05:52 GMT Subject: [jdk21] RFR: 8295210: IR framework should not whitelist -XX:-UseTLAB In-Reply-To: References: Message-ID: On Fri, 7 Jul 2023 06:43:16 GMT, Tobias Hartmann wrote: > Backport of [JDK-8295210](https://bugs.openjdk.java.net/browse/JDK-8295210). Applies cleanly. Looks good ------------- Marked as reviewed by epeter (Reviewer). PR Review: https://git.openjdk.org/jdk21/pull/104#pullrequestreview-1518089483 From thartmann at openjdk.org Fri Jul 7 07:12:06 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Fri, 7 Jul 2023 07:12:06 GMT Subject: [jdk21] RFR: 8295210: IR framework should not whitelist -XX:-UseTLAB In-Reply-To: References: Message-ID: On Fri, 7 Jul 2023 06:43:16 GMT, Tobias Hartmann wrote: > Backport of [JDK-8295210](https://bugs.openjdk.java.net/browse/JDK-8295210). Applies cleanly. Thanks, Emanuel. ------------- PR Comment: https://git.openjdk.org/jdk21/pull/104#issuecomment-1624857303 From thartmann at openjdk.org Fri Jul 7 07:12:07 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Fri, 7 Jul 2023 07:12:07 GMT Subject: [jdk21] Integrated: 8295210: IR framework should not whitelist -XX:-UseTLAB In-Reply-To: References: Message-ID: On Fri, 7 Jul 2023 06:43:16 GMT, Tobias Hartmann wrote: > Backport of [JDK-8295210](https://bugs.openjdk.java.net/browse/JDK-8295210). Applies cleanly. This pull request has now been integrated. 
Changeset: 2d7ed189 Author: Tobias Hartmann URL: https://git.openjdk.org/jdk21/commit/2d7ed1898b7050ccf654c29c90ec93e36cd8fdad Stats: 1 line in 1 file changed: 0 ins; 1 del; 0 mod 8295210: IR framework should not whitelist -XX:-UseTLAB Reviewed-by: epeter Backport-of: 31dcda5d67c90ecd571b0a943bcedc0bfe3f1fba ------------- PR: https://git.openjdk.org/jdk21/pull/104 From pli at openjdk.org Fri Jul 7 07:20:18 2023 From: pli at openjdk.org (Pengfei Li) Date: Fri, 7 Jul 2023 07:20:18 GMT Subject: RFR: 8308994: C2: Re-implement experimental post loop vectorization [v2] In-Reply-To: References: Message-ID: <5KRqQ2tEGzQRaxk8cYsu7iPXjYjeACidrtHFwDqhxDw=.36a3f286-4c1a-4c59-966a-b79e5ec7a21b@github.com> On Mon, 3 Jul 2023 14:37:03 GMT, Emanuel Peter wrote: >> src/hotspot/share/opto/vmaskloop.cpp line 978: >> >>> 976: >>> 977: // Update loop increment/decrement to the vector mask true count >>> 978: Node* true_cnt = new VectorMaskTrueCountNode(root_vmask, TypeInt::INT); >> >> This seems expensive to have to use inside the loop. Is there a way we could move this outside the loop? Because if we do take the backedge then we know that we have to take the full `stride`, right? > > I guess you would have to separate out the loop-internal uses and the outside uses of the `incr`. The inside uses would use the `stride` (or is there an exception?) and the outside ones could use the `VectorMaskTrueCountNode`. > > Doing something like that could have better performance. > This seems expensive to have to use inside the loop. Is there a way we could move this outside the loop? Because if we do take the backedge then we know that we have to take the full stride, right? It's not completely right. We have tried using multiplied stride inside the loop and just handle out-of-loop uses of the `incr` node. Mis-compilation happens in some very corner cases where the loop limit value is very close to the max value of `int`, like in below case. for (int i = 2147483600; i < 2147483645; i++) { // ... 
} If we always take the full stride inside the vectorized loop, the induction variable may overflow and wrap around to a negative value before it reaches the loop limit. This causes the backedge to be taken forever, so the finite loop is compiled into an infinite loop. I see that for general counted loops, C2 inserts a limit check predicate in the counted loop construction phase to avoid this issue (it's implemented in `PhaseIdealLoop::insert_loop_limit_check_predicate()`). But I'm not sure if it is possible (and worthwhile) to add a similar limit check predicate for post loops. It looks like current C2 post loops have no place to add extra loop predicates. What's your suggestion for this? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14581#discussion_r1255363034 From pli at openjdk.org Fri Jul 7 07:53:15 2023 From: pli at openjdk.org (Pengfei Li) Date: Fri, 7 Jul 2023 07:53:15 GMT Subject: RFR: 8308994: C2: Re-implement experimental post loop vectorization In-Reply-To: <8KPkr2loby3RVIrYQBiXWv3Ph2E0saSLVDBMFHi88LQ=.b1ffb28d-54a8-4dcc-9472-e53b055a72ee@github.com> References: <8KPkr2loby3RVIrYQBiXWv3Ph2E0saSLVDBMFHi88LQ=.b1ffb28d-54a8-4dcc-9472-e53b055a72ee@github.com> Message-ID: On Thu, 29 Jun 2023 10:54:29 GMT, Emanuel Peter wrote: >> Hi @eme64, >> >> I guess you have done your first round of review. @fg1417 and I really appreciate all your constructive inputs. By reading your comments, I believe you have reviewed this patch in very detail. Thanks again! >> >> What I am doing now: >> >> - I'm trying to fix the issues which I think can be fixed immediately. >> - I'm trying to answer all your simple questions ASAP. >> >> For your request of big refactoring work, I feel like I personally may not have enough time and capability to complete it in a short time. We may need some discussion about it. But it's great to know more about your "hybrid vectorizer" plan from your feedback. It looks like a grand plan, and requires significant effort and cooperation.
I strongly agree that we need some conversation to discuss where we should move forward and what we can cooperate on. Could you give us a moment to digest your idea before we schedule a conversation? >> >> BTW: What's your preferred time for a conversation? We are based in Shanghai (GMT+8) > > Hi @pfustc ! > > I'm glad you appreciate my review. > >> For your request of big refactoring work, I feel like I personally may not have enough time and capability to complete it in a short time. > > Are you under some time constraint? No pressure from my side, take the time you need. > > I would very much love to have a conversation over a video call with you. I think that would be beneficial for all of us. The problem from our side (Oracle) are intellectual property concerns. OpenJDK emails and PR's are all under the Oracle Contributor Agreement. So there I'm free to have conversations. I'm trying to figure out if we can have a similar frame for a video call, sadly it may take a few weeks or months to get that sorted, as many people are on summer vacation. > > Please take some time to digest the feedback. This is a big change set, it will take a while to be ready for integration at any rate. And again, I would really urge you to consider some refactoring of SuperWord in a separate RFE before this change here. > > I'm looking forward to more collaboration - over PR comments, emails, and hopefully eventually video calls as well > Emanuel Hi @eme64, I just experimented with some initial SuperWord refactoring work but found the refactoring process may cause more crashes/bugs with `PostLoopMultiversioning`. It seems that nobody is currently using this experimental feature and/or is interested in maintaining it. If we have already reached a consensus that we will abandon it eventually, shall we propose a PR to remove it first before doing the refactoring? I think this way may speed up our refactoring process.
An alternative approach is keeping the legacy code in SuperWord for now but tolerating new bugs of `PostLoopMultiversioning`, which already has many bugs. What's your opinion on this? ------------- PR Comment: https://git.openjdk.org/jdk/pull/14581#issuecomment-1624936112 From mli at openjdk.org Fri Jul 7 08:31:56 2023 From: mli at openjdk.org (Hamlin Li) Date: Fri, 7 Jul 2023 08:31:56 GMT Subject: RFR: 8311178: JMH tests don't scale well when sharing output buffers In-Reply-To: References: Message-ID: On Sat, 1 Jul 2023 07:53:17 GMT, Swati Sharma wrote: > The below benchmark files have scaling issues due to cache contention and leads to poor scaling when run on multiple threads. The patch sets the scope from benchmark level to thread level to fix the issue: > - org/openjdk/bench/java/io/DataOutputStreamTest.java > - org/openjdk/bench/java/lang/ArrayCopyObject.java > - org/openjdk/bench/java/lang/ArrayFiddle.java > - org/openjdk/bench/java/time/format/DateTimeFormatterBench.java > - org/openjdk/bench/jdk/incubator/vector/IndexInRangeBenchmark.java > - org/openjdk/bench/jdk/incubator/vector/MemorySegmentVectorAccess.java > - org/openjdk/bench/jdk/incubator/vector/StoreMaskedBenchmark.java > - org/openjdk/bench/jdk/incubator/vector/StoreMaskedIOOBEBenchmark.java > - org/openjdk/bench/vm/compiler/ArrayFill.java > - org/openjdk/bench/vm/compiler/IndexVector.java > > Also removing the static scope for variables in org/openjdk/bench/jdk/incubator/vector/VectorFPtoIntCastOperations.java for better scaling. > > Please review and share your feedback. > > Thanks, > Swati Hi, I'm not sure if I understand this improvement correctly.
I'm not quite familiar with JMH and its annotations, but it seems to me that the change from `@State(Scope.Benchmark)` to `@State(Scope.Thread)` should not improve the performance by reducing cache contention, as the jmh doc says "State objects are usually injected into Benchmark methods as ***arguments***, and JMH takes care of their instantiation and sharing.". This seems to mean that @State only matters when the annotated class is used as a parameter of a @Benchmark method, but in the tests you modified, there seems to be no such use case. Please also check the sample usages at https://github.com/openjdk/jmh/blob/master/jmh-samples/src/main/java/org/openjdk/jmh/samples/JMHSample_03_States.java. ------------- PR Comment: https://git.openjdk.org/jdk/pull/14746#issuecomment-1625030990 From epeter at openjdk.org Fri Jul 7 15:16:30 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 7 Jul 2023 15:16:30 GMT Subject: RFR: 8308994: C2: Re-implement experimental post loop vectorization [v2] In-Reply-To: References: Message-ID: On Fri, 7 Jul 2023 15:11:20 GMT, Vladimir Kozlov wrote: >> Pengfei Li has updated the pull request incrementally with one additional commit since the last revision: >> >> Address part of comments from Emanuel > > Yes, you can remove old code first. And work on new implementation after that. Thanks for weighing in @vnkozlov . I think that is the best way as well. It makes refactoring much easier. ------------- PR Comment: https://git.openjdk.org/jdk/pull/14581#issuecomment-1625560727 From kvn at openjdk.org Fri Jul 7 15:16:30 2023 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 7 Jul 2023 15:16:30 GMT Subject: RFR: 8308994: C2: Re-implement experimental post loop vectorization [v2] In-Reply-To: References: Message-ID: On Mon, 3 Jul 2023 07:37:22 GMT, Pengfei Li wrote: >> ## TL;DR >> >> This patch completely re-implements C2's experimental post loop vectorization for better stability, maintainability and performance.
Compared with the original implementation, this new implementation adds a standalone loop phase in C2's ideal loop phases and can vectorize more post loops. The original implementation and all code related to multi-versioned post loops are deleted in this patch. More details about this patch can be found in the document replied in this pull request. > > Pengfei Li has updated the pull request incrementally with one additional commit since the last revision: > > Address part of comments from Emanuel Yes, you can remove old code first. And work on new implementation after that. ------------- PR Comment: https://git.openjdk.org/jdk/pull/14581#issuecomment-1625560237 From skuksenko at openjdk.org Fri Jul 7 16:05:54 2023 From: skuksenko at openjdk.org (Sergey Kuksenko) Date: Fri, 7 Jul 2023 16:05:54 GMT Subject: RFR: 8311178: JMH tests don't scale well when sharing output buffers In-Reply-To: References: Message-ID: On Fri, 7 Jul 2023 08:29:06 GMT, Hamlin Li wrote: >> The below benchmark files have scaling issues due to cache contention and leads to poor scaling when run on multiple threads. The patch sets the scope from benchmark level to thread level to fix the issue: >> - org/openjdk/bench/java/io/DataOutputStreamTest.java >> - org/openjdk/bench/java/lang/ArrayCopyObject.java >> - org/openjdk/bench/java/lang/ArrayFiddle.java >> - org/openjdk/bench/java/time/format/DateTimeFormatterBench.java >> - org/openjdk/bench/jdk/incubator/vector/IndexInRangeBenchmark.java >> - org/openjdk/bench/jdk/incubator/vector/MemorySegmentVectorAccess.java >> - org/openjdk/bench/jdk/incubator/vector/StoreMaskedBenchmark.java >> - org/openjdk/bench/jdk/incubator/vector/StoreMaskedIOOBEBenchmark.java >> - org/openjdk/bench/vm/compiler/ArrayFill.java >> - org/openjdk/bench/vm/compiler/IndexVector.java >> >> Also removing the static scope for variables in org/openjdk/bench/jdk/incubator/vector/VectorFPtoIntCastOperations.java for better scaling. >> >> Please review and share your feedback. 
>> >> Thanks, >> Swati > > Hi, > I'm not sure if I understand this improvement correctly. > I'm not quite familiar with JMH and it's annotations, but seems to me, the change from `@State(Scope.Benchmark)` to `@State(Scope.Thread)` should not improve the performance by reducing cache contention, as in the jmh doc it says "State objects are usually injected into Benchmark methods as ***arguments***, and JMH takes care of their instantiation and sharing.", this seems mean that @State only matters when the annotated class is used as a parameter of a @Benchmark method, but in the tests you modifed, seems there is no such use case. > Please also check the sample usages at https://github.com/openjdk/jmh/blob/master/jmh-samples/src/main/java/org/openjdk/jmh/samples/JMHSample_03_States.java. @Hamlin-Li The PR is fully correct. Don't forget, every Java instance method has a specific argument which is called "this". That is why the @State annotation works. ------------- PR Comment: https://git.openjdk.org/jdk/pull/14746#issuecomment-1625627033 From duke at openjdk.org Sat Jul 8 12:01:53 2023 From: duke at openjdk.org (Swati Sharma) Date: Sat, 8 Jul 2023 12:01:53 GMT Subject: RFR: 8311178: JMH tests don't scale well when sharing output buffers In-Reply-To: References: Message-ID: On Sat, 1 Jul 2023 07:53:17 GMT, Swati Sharma wrote: > The below benchmark files have scaling issues due to cache contention and leads to poor scaling when run on multiple threads.
The patch sets the scope from benchmark level to thread level to fix the issue: > - org/openjdk/bench/java/io/DataOutputStreamTest.java > - org/openjdk/bench/java/lang/ArrayCopyObject.java > - org/openjdk/bench/java/lang/ArrayFiddle.java > - org/openjdk/bench/java/time/format/DateTimeFormatterBench.java > - org/openjdk/bench/jdk/incubator/vector/IndexInRangeBenchmark.java > - org/openjdk/bench/jdk/incubator/vector/MemorySegmentVectorAccess.java > - org/openjdk/bench/jdk/incubator/vector/StoreMaskedBenchmark.java > - org/openjdk/bench/jdk/incubator/vector/StoreMaskedIOOBEBenchmark.java > - org/openjdk/bench/vm/compiler/ArrayFill.java > - org/openjdk/bench/vm/compiler/IndexVector.java > > Also removing the static scope for variables in org/openjdk/bench/jdk/incubator/vector/VectorFPtoIntCastOperations.java for better scaling. > > Please review and share your feedback. > > Thanks, > Swati > Hi, I'm not sure if I understand this improvement correctly. I'm not quite familiar with JMH and it's annotations, but seems to me, the change from `@State(Scope.Benchmark)` to `@State(Scope.Thread)` should not improve the performance by reducing cache contention, as in the jmh doc it says "State objects are usually injected into Benchmark methods as _**arguments**_, and JMH takes care of their instantiation and sharing.", this seems mean that @State only matters when the annotated class is used as a parameter of a @benchmark method, but in the tests you modifed, seems there is no such use case. Please also check the sample usages at https://github.com/openjdk/jmh/blob/master/jmh-samples/src/main/java/org/openjdk/jmh/samples/JMHSample_03_States.java. Please check the https://github.com/openjdk/jmh/blob/master/jmh-samples/src/main/java/org/openjdk/jmh/samples/JMHSample_04_DefaultState.java sample case where @State annotation applies to all instance methods. The PR proposal is to change default state from benchmark to thread. 
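The scope distinction discussed in this thread can be sketched in plain Java without depending on JMH. This is only an illustration under stated assumptions, not code from the PR and not JMH's implementation: `StateScopeSketch`, `BenchState` and `distinctBuffers` are hypothetical names. Sharing one instance among all worker threads stands in for `Scope.Benchmark`, and creating one instance per worker stands in for `Scope.Thread`; the worker reaches the output buffer through `this`, as the replies above point out for benchmark classes that carry the `@State` annotation themselves.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Plain-Java model of the two scopes; all names here are hypothetical, none come from JMH.
class StateScopeSketch {

    // Stand-in for a benchmark class that keeps its output buffer in an instance field.
    static final class BenchState {
        final int[] out = new int[1024];

        // Plays the role of a benchmark method: it reaches the buffer through "this".
        void fill(int value) {
            Arrays.fill(out, value);
        }
    }

    // Starts `threads` workers and returns how many distinct output buffers they wrote to.
    static int distinctBuffers(boolean perThread, int threads) {
        BenchState shared = new BenchState();
        // int[] uses identity equals/hashCode, so the set counts distinct array objects.
        Set<int[]> seen = ConcurrentHashMap.newKeySet();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int id = t;
            workers[t] = new Thread(() -> {
                // Shared instance models Scope.Benchmark; a fresh one models Scope.Thread.
                BenchState state = perThread ? new BenchState() : shared;
                state.fill(id);
                seen.add(state.out);
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return seen.size();
    }

    public static void main(String[] args) {
        System.out.println("one shared instance: " + distinctBuffers(false, 4) + " buffer(s)");
        System.out.println("one instance per thread: " + distinctBuffers(true, 4) + " buffer(s)");
    }
}
```

With a single shared instance, all four workers store into the same array, which is the sharing pattern the PR description names as the source of cache contention. With one instance per worker, each thread writes to its own array. The sketch only shows which buffers are touched; it makes no claim about the size of any performance difference.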
------------- PR Comment: https://git.openjdk.org/jdk/pull/14746#issuecomment-1627181265 From fyang at openjdk.org Mon Jul 10 03:23:04 2023 From: fyang at openjdk.org (Fei Yang) Date: Mon, 10 Jul 2023 03:23:04 GMT Subject: RFR: 8311548: AArch64: [ZGC] Many tests fail with "assert(allocates2(pc)) failed: not in CodeBuffer memory" on some CPUs In-Reply-To: References: Message-ID: On Fri, 7 Jul 2023 06:38:10 GMT, Hao Sun wrote: > ### Problem: > > We got many ZGC related JTreg test failures on ThunderX2 CPU. The failure can be easily reproduced by the following command as well. > > `java -XX:+UseZGC -XX:+ZGenerational --version` > > Here shows the snippet of error message: > > > A fatal error has been detected by the Java Runtime Environment: > > Internal Error (~/jdk_build/jdk_src/src/hotspot/share/asm/codeBuffer.hpp:200), pid=108369, tid=108373 > assert(allocates2(pc)) failed: not in CodeBuffer memory: 0x0000ffff9c921100 <= 0x0000ffff9c934c04 <= 0x0000ffff9c934c00 > > JRE version: (22.0) (fastdebug build ) > Java VM: OpenJDK 64-Bit Server VM (fastdebug 22-internal-git-0916e6a60, mixed mode, sharing, compressed class ptrs, z gc, linux-aarch64) > Problematic frame: > V [libjvm.so+0x45413c] Instruction_aarch64::~Instruction_aarch64()+0xbc > > > ### Root cause: > > From the backtrace (See the Description session in the JBS [1]), we can see that it's an assembler failure and the failure occurred when the VM generated "final stubs". The root cause is that the buffer size of "final stubs" is too small for ThunderX2 CPU with ZGC on. > > The reason that this failure only occurred on ThunderX2 CPU but not on CPUs like Neoverse N1/N2, is that 1) VM flag "AvoidUnalignedAccesses" is enabled by default on CPUs like ThunderX2 (See the code [2]), and 2) more instructions would be generated especially with ZGC (E.g., see the code [3]). 
> > Hence, the failure would also occur on CPUs like Neoverse N1/N2 if we pass VM option "-XX:+AvoidUnalignedAccesses", e.g., > > `java -XX:+UseZGC -XX:+ZGenerational -XX:+AvoidUnalignedAccesses --version` > > ### Fix: > > Increasing the buffer size for "final blobs", i.e. variable `_final_stubs_code_size`, would fix the failure. > > We manually computed the code size for "final stubs" on ThunderX2 CPU and the size is roughly "105568" bytes. In this patch, we increase the buffer size from "60000" to "100000" for ZGC_ONLY(). > > ### Test: > > Tier1~3 passed on Linux/ThunderX2 and Linux/Neoverse N1. > > [1] https://bugs.openjdk.org/browse/JDK-8311548 > [2] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/aarch64/vm_version_aarch64.cpp#L154-L177 > [3] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp#L919-L1109 LGTM. I could also reproduce this issue on HiSilicon TSV110 for which -XX:+AvoidUnalignedAccesses is true. Stub sizes after this change for reference: $ java -XX:+UseZGC -XX:+ZGenerational -Xlog:stubs -version [0.013s][info][stubs] StubRoutines (initial stubs) [0x0000ffff8f920080, 0x0000ffff8f922a10] used: 5408, free: 5232 [0.212s][info][stubs] StubRoutines (continuation stubs) [0x0000ffff8f923500, 0x0000ffff8f923f50] used: 648, free: 1992 [0.246s][info][stubs] StubRoutines (final stubs) [0x0000ffff8f96b480, 0x0000ffff8f988bc0] used: 103664, free: 16976 [0.270s][info][stubs] StubRoutines (compiler stubs) [0x0000ffff8fa1c580, 0x0000ffff8fa27ac0] used: 24552, free: 21848 ------------- Marked as reviewed by fyang (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/14800#pullrequestreview-1521155294 From thartmann at openjdk.org Mon Jul 10 05:19:01 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 10 Jul 2023 05:19:01 GMT Subject: RFR: 8311588: C2: RepeatCompilation compiler directive does not choose stress seed randomly In-Reply-To: <1Dg9_gR-lLe4YfZ2RKYXnaqYYpJOujts1ptGDfmETD8=.957a179f-5f27-4a28-bfde-fb0ce51efb8f@github.com> References: <1Dg9_gR-lLe4YfZ2RKYXnaqYYpJOujts1ptGDfmETD8=.957a179f-5f27-4a28-bfde-fb0ce51efb8f@github.com> Message-ID: On Thu, 6 Jul 2023 15:11:05 GMT, Christian Hagedorn wrote: > The compiler directive `RepeatCompilation` can be used to repeat compilations of specific methods, for example for `Test::test()`: > > -XX:CompileCommand=RepeatCompilation,Test::test,1000 > > When using this compiler directive in combination with `StressIGVN/CCP/GCM/LCM`, C2 only sets the stress seed once to a random value. Afterward, it keeps the seed for all repeated compilations. This should be changed to always select a random seed with each repetition (i.e. get the same behavior as the flag `RepeatCompilation`). > > Thanks, > Christian Looks good and trivial to me. ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/14786#pullrequestreview-1521230316 From alanb at openjdk.org Mon Jul 10 05:33:12 2023 From: alanb at openjdk.org (Alan Bateman) Date: Mon, 10 Jul 2023 05:33:12 GMT Subject: RFR: JDK-8310550: Adjust references to rt.jar [v5] In-Reply-To: References: Message-ID: On Fri, 30 Jun 2023 09:57:04 GMT, Matthias Baesken wrote: > Hi Alan, I adjusted the comment in DriverManager.java . Thanks, the update looks okay. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14593#discussion_r1257728322 From chagedorn at openjdk.org Mon Jul 10 07:47:11 2023 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 10 Jul 2023 07:47:11 GMT Subject: RFR: 8311588: C2: RepeatCompilation compiler directive does not choose stress seed randomly In-Reply-To: <1Dg9_gR-lLe4YfZ2RKYXnaqYYpJOujts1ptGDfmETD8=.957a179f-5f27-4a28-bfde-fb0ce51efb8f@github.com> References: <1Dg9_gR-lLe4YfZ2RKYXnaqYYpJOujts1ptGDfmETD8=.957a179f-5f27-4a28-bfde-fb0ce51efb8f@github.com> Message-ID: <6ZGusVV47Fc1yL337MjR_GlYd9lOhLvmSBI1c3eodsg=.10dc2756-859d-44c7-97e7-4f77eaa3247f@github.com> On Thu, 6 Jul 2023 15:11:05 GMT, Christian Hagedorn wrote: > The compiler directive `RepeatCompilation` can be used to repeat compilations of specific methods, for example for `Test::test()`: > > -XX:CompileCommand=RepeatCompilation,Test::test,1000 > > When using this compiler directive in combination with `StressIGVN/CCP/GCM/LCM`, C2 only sets the stress seed once to a random value. Afterward, it keeps the seed for all repeated compilations. This should be changed to always select a random seed with each repetition (i.e. get the same behavior as the flag `RepeatCompilation`). > > Thanks, > Christian Thanks Tobias for your review! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/14786#issuecomment-1628411624 From chagedorn at openjdk.org Mon Jul 10 07:47:11 2023 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 10 Jul 2023 07:47:11 GMT Subject: Integrated: 8311588: C2: RepeatCompilation compiler directive does not choose stress seed randomly In-Reply-To: <1Dg9_gR-lLe4YfZ2RKYXnaqYYpJOujts1ptGDfmETD8=.957a179f-5f27-4a28-bfde-fb0ce51efb8f@github.com> References: <1Dg9_gR-lLe4YfZ2RKYXnaqYYpJOujts1ptGDfmETD8=.957a179f-5f27-4a28-bfde-fb0ce51efb8f@github.com> Message-ID: On Thu, 6 Jul 2023 15:11:05 GMT, Christian Hagedorn wrote: > The compiler directive `RepeatCompilation` can be used to repeat compilations of specific methods, for example for `Test::test()`: > > -XX:CompileCommand=RepeatCompilation,Test::test,1000 > > When using this compiler directive in combination with `StressIGVN/CCP/GCM/LCM`, C2 only sets the stress seed once to a random value. Afterward, it keeps the seed for all repeated compilations. This should be changed to always select a random seed with each repetition (i.e. get the same behavior as the flag `RepeatCompilation`). > > Thanks, > Christian This pull request has now been integrated. Changeset: 06a1a15d Author: Christian Hagedorn URL: https://git.openjdk.org/jdk/commit/06a1a15d014f5ca48f62f5f0c8e8682086c4ae0b Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod 8311588: C2: RepeatCompilation compiler directive does not choose stress seed randomly Reviewed-by: thartmann ------------- PR: https://git.openjdk.org/jdk/pull/14786 From mli at openjdk.org Mon Jul 10 08:20:56 2023 From: mli at openjdk.org (Hamlin Li) Date: Mon, 10 Jul 2023 08:20:56 GMT Subject: RFR: 8311178: JMH tests don't scale well when sharing output buffers In-Reply-To: References: Message-ID: On Fri, 7 Jul 2023 16:02:57 GMT, Sergey Kuksenko wrote: >> Hi, >> I'm not sure if I understand this improvement correctly. 
>> I'm not quite familiar with JMH and it's annotations, but seems to me, the change from `@State(Scope.Benchmark)` to `@State(Scope.Thread)` should not improve the performance by reducing cache contention, as in the jmh doc it says "State objects are usually injected into Benchmark methods as ***arguments***, and JMH takes care of their instantiation and sharing.", this seems mean that @State only matters when the annotated class is used as a parameter of a @Benchmark method, but in the tests you modifed, seems there is no such use case. >> Please also check the sample usages at https://github.com/openjdk/jmh/blob/master/jmh-samples/src/main/java/org/openjdk/jmh/samples/JMHSample_03_States.java. > > @Hamlin-Li > The PR is fully correct. > Don't forget, every Java instance method has a specific argument which called "this". That is why @State annotation is working. @kuksenko @swati-sha Thanks for explanation. I can understand what you said. But I'm still not quite sure, as I remember jmh does some code manipulation or instrumentation at source code (or bytecode level?), so the jmh test code you write or see might not be the exact code to be executed at runtime. It's better to be reviewed further by some one more familiar with jmh, or could you add some data comparing the performance difference, so we can tell it easily? ------------- PR Comment: https://git.openjdk.org/jdk/pull/14746#issuecomment-1628469975 From kbarrett at openjdk.org Mon Jul 10 08:37:53 2023 From: kbarrett at openjdk.org (Kim Barrett) Date: Mon, 10 Jul 2023 08:37:53 GMT Subject: RFR: 8311548: AArch64: [ZGC] Many tests fail with "assert(allocates2(pc)) failed: not in CodeBuffer memory" on some CPUs In-Reply-To: References: Message-ID: On Fri, 7 Jul 2023 06:38:10 GMT, Hao Sun wrote: > ### Problem: > > We got many ZGC related JTreg test failures on ThunderX2 CPU. The failure can be easily reproduced by the following command as well. 
> > `java -XX:+UseZGC -XX:+ZGenerational --version` > > Here shows the snippet of error message: > > > A fatal error has been detected by the Java Runtime Environment: > > Internal Error (~/jdk_build/jdk_src/src/hotspot/share/asm/codeBuffer.hpp:200), pid=108369, tid=108373 > assert(allocates2(pc)) failed: not in CodeBuffer memory: 0x0000ffff9c921100 <= 0x0000ffff9c934c04 <= 0x0000ffff9c934c00 > > JRE version: (22.0) (fastdebug build ) > Java VM: OpenJDK 64-Bit Server VM (fastdebug 22-internal-git-0916e6a60, mixed mode, sharing, compressed class ptrs, z gc, linux-aarch64) > Problematic frame: > V [libjvm.so+0x45413c] Instruction_aarch64::~Instruction_aarch64()+0xbc > > > ### Root cause: > > From the backtrace (See the Description session in the JBS [1]), we can see that it's an assembler failure and the failure occurred when the VM generated "final stubs". The root cause is that the buffer size of "final stubs" is too small for ThunderX2 CPU with ZGC on. > > The reason that this failure only occurred on ThunderX2 CPU but not on CPUs like Neoverse N1/N2, is that 1) VM flag "AvoidUnalignedAccesses" is enabled by default on CPUs like ThunderX2 (See the code [2]), and 2) more instructions would be generated especially with ZGC (E.g., see the code [3]). > > Hence, the failure would also occur on CPUs like Neoverse N1/N2 if we pass VM option "-XX:+AvoidUnalignedAccesses", e.g., > > `java -XX:+UseZGC -XX:+ZGenerational -XX:+AvoidUnalignedAccesses --version` > > ### Fix: > > Increasing the buffer size for "final blobs", i.e. variable `_final_stubs_code_size`, would fix the failure. > > We manually computed the code size for "final stubs" on ThunderX2 CPU and the size is roughly "105568" bytes. In this patch, we increase the buffer size from "60000" to "100000" for ZGC_ONLY(). > > ### Test: > > Tier1~3 passed on Linux/ThunderX2 and Linux/Neoverse N1. 
> > [1] https://bugs.openjdk.org/browse/JDK-8311548 > [2] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/aarch64/vm_version_aarch64.cpp#L154-L177 > [3] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp#L919-L1109 Looks good. ------------- Marked as reviewed by kbarrett (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/14800#pullrequestreview-1521558465 From tschatzl at openjdk.org Mon Jul 10 08:53:53 2023 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Mon, 10 Jul 2023 08:53:53 GMT Subject: RFR: 8311548: AArch64: [ZGC] Many tests fail with "assert(allocates2(pc)) failed: not in CodeBuffer memory" on some CPUs In-Reply-To: References: Message-ID: On Fri, 7 Jul 2023 06:38:10 GMT, Hao Sun wrote: > ### Problem: > > We got many ZGC related JTreg test failures on ThunderX2 CPU. The failure can be easily reproduced by the following command as well. > > `java -XX:+UseZGC -XX:+ZGenerational --version` > > Here shows the snippet of error message: > > > A fatal error has been detected by the Java Runtime Environment: > > Internal Error (~/jdk_build/jdk_src/src/hotspot/share/asm/codeBuffer.hpp:200), pid=108369, tid=108373 > assert(allocates2(pc)) failed: not in CodeBuffer memory: 0x0000ffff9c921100 <= 0x0000ffff9c934c04 <= 0x0000ffff9c934c00 > > JRE version: (22.0) (fastdebug build ) > Java VM: OpenJDK 64-Bit Server VM (fastdebug 22-internal-git-0916e6a60, mixed mode, sharing, compressed class ptrs, z gc, linux-aarch64) > Problematic frame: > V [libjvm.so+0x45413c] Instruction_aarch64::~Instruction_aarch64()+0xbc > > > ### Root cause: > > From the backtrace (See the Description session in the JBS [1]), we can see that it's an assembler failure and the failure occurred when the VM generated "final stubs". The root cause is that the buffer size of "final stubs" is too small for ThunderX2 CPU with ZGC on. 
> > The reason that this failure only occurred on ThunderX2 CPU but not on CPUs like Neoverse N1/N2, is that 1) VM flag "AvoidUnalignedAccesses" is enabled by default on CPUs like ThunderX2 (See the code [2]), and 2) more instructions would be generated especially with ZGC (E.g., see the code [3]). > > Hence, the failure would also occur on CPUs like Neoverse N1/N2 if we pass VM option "-XX:+AvoidUnalignedAccesses", e.g., > > `java -XX:+UseZGC -XX:+ZGenerational -XX:+AvoidUnalignedAccesses --version` > > ### Fix: > > Increasing the buffer size for "final blobs", i.e. variable `_final_stubs_code_size`, would fix the failure. > > We manually computed the code size for "final stubs" on ThunderX2 CPU and the size is roughly "105568" bytes. In this patch, we increase the buffer size from "60000" to "100000" for ZGC_ONLY(). > > ### Test: > > Tier1~3 passed on Linux/ThunderX2 and Linux/Neoverse N1. > > [1] https://bugs.openjdk.org/browse/JDK-8311548 > [2] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/aarch64/vm_version_aarch64.cpp#L154-L177 > [3] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp#L919-L1109 Marked as reviewed by tschatzl (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/14800#pullrequestreview-1521590087 From epeter at openjdk.org Mon Jul 10 12:52:20 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 10 Jul 2023 12:52:20 GMT Subject: RFR: 8308504: C2: "malformed control flow" after JDK-8303466 In-Reply-To: References: <0WzrNqoWqx9ZGrELdwQhQ8ykMEQGcEYVJabKHOTXoQg=.c8a2b21c-82c8-4f7f-bdea-515fcba745be@github.com> Message-ID: On Thu, 22 Jun 2023 13:54:37 GMT, Roland Westrelin wrote: >>> > It doesn't seem to be true that the loop incr never overflows in the general case. See this example: >>> >>> But don't we check that the limit is small enough at runtime, such that there cannot be an overflow? We do that with `check_stride_overflow` and `insert_loop_limit_check_predicate`. 
And if it does overflow, we do not go into the counted-loop, but we uncommon trap, I think. Or are you sure that we actually enter the counted-loop with your example? Or just some peeled version? >> >> Ran with: >> >> -XX:-BackgroundCompilation -XX:-UseOnStackReplacement -XX:-TieredCompilation -XX:CompileOnly=TestOverflowCountedLoopIncr::test -XX:CompileCommand=quiet -XX:LoopMaxUnroll=0 -XX:+UseCountedLoopSafepoints >> >> I see a single counted loop, no uncommon trap in the IR. > >> @rwestrel so you think the `incr` can indeed overflow, and that is ok? Or would that be a bug? Why do we even have the loop limit check in the first place, if overflow is allowed? > > To guarantee no overflow requires init < limit (for a loop going up). Nothing guarantees that when c2 pattern matches a counted loop. Whether overflow is a problem or not would require taking a closer look at individual optimizations. @rwestrel I finally looked into your overflow-example. Ran it like this: `./java -XX:-BackgroundCompilation -XX:-UseOnStackReplacement -XX:-TieredCompilation -XX:CompileOnly=TestOverflowCountedLoopIncr::test -XX:CompileCommand=quiet -XX:LoopMaxUnroll=0 -XX:+UseCountedLoopSafepoints -XX:+PrintIdeal -Xbatch TestOverflowCountedLoopIncr.java` This is the graph I get: ![image](https://github.com/openjdk/jdk/assets/32593061/1f5f02dd-2c79-4f76-a823-0d1478b1b874) These are the constants: 1 Con === 0 [[ ]] #top 28 ConI === 0 [[ 152 ]] #int:1 153 ConI === 0 [[ 173 181 ]] #int:min+100 157 ConL === 0 [[ 158 ]] #long:1176 189 ConF === 0 [[ 187 ]] #ftcon:2.000000 ------------- PR Comment: https://git.openjdk.org/jdk/pull/14331#issuecomment-1628893068 From epeter at openjdk.org Mon Jul 10 13:20:23 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 10 Jul 2023 13:20:23 GMT Subject: RFR: 8308504: C2: "malformed control flow" after JDK-8303466 [v2] In-Reply-To: References: Message-ID: <9dUIdG_dLmwzZ8rt4UfCJm5XnZzkD3pZR-tygN_-La8=.812fb9b9-af3e-4077-bcd3-a0d5d2c6e4b4@github.com> > This 
is another case where imprecise type computation leads to corrupted control flow. > > The loop incr `AddI` never overflows at runtime; this is guaranteed by the loop limit check (we ensure that adding the incr value to the limit would not overflow). However, until now the type computation often overflowed and returned type `int`. The loop tripcount `Phi` does not have such type overflow. So we had cases where the Phi had type `minint...7`, but the AddI had type `int`, since it adds `-3`, which makes the lower bound underflow (even though the runtime value never would). If we had known that an underflow is impossible because of the limit check, then we could have had type `minint...4` for the incr AddI. > > Since JDK-8303466, the type of the limit can no longer overflow and is more precise. In the example, the limit is `8...maxint`. The CastII after the zero-trip-guard thus concludes that the type must be `9...maxint` (we have a decrement loop). > > Since the main-loop knows that the trip-count Phi has type `minint...7`, we get an empty range (intersection of `9...maxint` and `minint...7`), and the main-loop (in this case one of the assertion predicates) starts to corrode away because of TOP inputs. This creates malformed control flow: we have an if with only one projection. > > **Why did this arise with JDK-8303466?** We made the loop-limit type more precise, forbidding overflow. Before that change, I think the loop limit just had a type that overflowed and was therefore `int`. That would not have allowed the main-loop internals to detect that it could never be reached. The main-loop would not have died at all, but it would still never have been entered. > > **Solution** Since the AddI (incr) of the tripcount Phi cannot overflow/underflow, **we should also prevent the type of the incr from overflowing/underflowing**. That means that it basically has the same type as the Phi, if not even a bit better.
And this means that the type of the incr used to compare against the loop-limit is consistent with the type information in the main-loop. > > I also had to improve the notification in IGVN: I encountered a case where an incr node was replaced via Identity, so the new incr node needed its type to be tightened (no overflow). > > **Risk** Just like JDK-8303466, this fix makes types more precise. That can always mean that somewhere else we also need more precise types than we currently have. It is difficult to know which such bugs are still lurking. > > **Testing** Attached one regre... Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: revert bad fix ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14331/files - new: https://git.openjdk.org/jdk/pull/14331/files/a3e8da27..e6b1424b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14331&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14331&range=00-01 Stats: 73 lines in 3 files changed: 8 ins; 51 del; 14 mod Patch: https://git.openjdk.org/jdk/pull/14331.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14331/head:pull/14331 PR: https://git.openjdk.org/jdk/pull/14331 From skuksenko at openjdk.org Mon Jul 10 14:06:13 2023 From: skuksenko at openjdk.org (Sergey Kuksenko) Date: Mon, 10 Jul 2023 14:06:13 GMT Subject: RFR: 8311178: JMH tests don't scale well when sharing output buffers In-Reply-To: References: Message-ID: On Mon, 10 Jul 2023 08:17:59 GMT, Hamlin Li wrote: >> @Hamlin-Li >> The PR is fully correct. >> Don't forget, every Java instance method has a specific argument which is called "this". That is why the @State annotation works. > > @kuksenko @swati-sha Thanks for the explanation. I understand what you said.
> But I'm still not quite sure: as I remember, JMH does some code manipulation or instrumentation at the source code (or bytecode?) level, so the JMH test code you write or see might not be the exact code that is executed at runtime. > It would be better to have this reviewed further by someone more familiar with JMH, or could you add some data comparing the performance, so the difference is easy to see? @Hamlin-Li I am one of JMH's authors. I know how it works. There is no need for tests. ------------- PR Comment: https://git.openjdk.org/jdk/pull/14746#issuecomment-1629033698 From chagedorn at openjdk.org Mon Jul 10 15:24:39 2023 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 10 Jul 2023 15:24:39 GMT Subject: RFR: 8305636: Expand and clean up predicate classes and move them into separate files Message-ID: This is the third clean-up PR towards fixing issues with Assertion Predicates ([JDK-8288981](https://bugs.openjdk.org/browse/JDK-8288981)). This patch does not change anything in the way the old Assertion Predicates work. After collecting and moving the predicate code in the last clean-up PR https://github.com/openjdk/jdk/pull/14017 to the classes `Predicates/ParsePredicates`, I'm now completely moving the code to separate `predicates.cpp/hpp` files. While doing so, I also updated the predicate description and some names. Since this description is also moved to the new files, I've committed the description update separately to better reflect these changes. Changes include: - Moved `Predicates/ParsePredicates` classes to new files `predicates.cpp/hpp`. - Turning the `Predicates` utility class into a real class to represent all predicates: - Contains three `PredicateBlock` fields, one for each Predicate Block (see description of `Predicate Block`). - The `PredicateBlock` class offers methods to query the presence of predicates and to access them (e.g. get the Parse Predicate projection).
- In the process, the `ParsePredicates` could be removed as the Parse Predicates are now covered by the `PredicateBlock` class. - New `AssertionPredicatesWithHalt` class to skip over Assertion Predicates (will be further cleaned up later with the complete fix in JDK-8288981). - Updated predicate description and moved to `predicates.hpp`. - Small clean-ups such as variable renaming or code move. Not included: - Refactoring predicate traversal to clone/copy/initialize predicates for loop unswitching, pre/main/post, loop peeling etc. (this is only done in the actual fix in JDK-8288981 which requires some updates anyways - so this refactoring is not done here (yet)). Testing: Tier1-7, hs-precheckin-comp, hs-comp-stress Thanks, Christian ------------- Commit messages: - 8305636: Expand and clean up predicate classes and move them into separate files - Update description Changes: https://git.openjdk.org/jdk/pull/14814/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=14814&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8305636 Stats: 1172 lines in 10 files changed: 568 ins; 470 del; 134 mod Patch: https://git.openjdk.org/jdk/pull/14814.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14814/head:pull/14814 PR: https://git.openjdk.org/jdk/pull/14814 From jjiang at openjdk.org Mon Jul 10 16:11:13 2023 From: jjiang at openjdk.org (John Jiang) Date: Mon, 10 Jul 2023 16:11:13 GMT Subject: RFR: 8311178: JMH tests don't scale well when sharing output buffers In-Reply-To: References: Message-ID: On Sat, 1 Jul 2023 07:53:17 GMT, Swati Sharma wrote: > The below benchmark files have scaling issues due to cache contention and leads to poor scaling when run on multiple threads. 
The patch sets the scope from benchmark level to thread level to fix the issue: > - org/openjdk/bench/java/io/DataOutputStreamTest.java > - org/openjdk/bench/java/lang/ArrayCopyObject.java > - org/openjdk/bench/java/lang/ArrayFiddle.java > - org/openjdk/bench/java/time/format/DateTimeFormatterBench.java > - org/openjdk/bench/jdk/incubator/vector/IndexInRangeBenchmark.java > - org/openjdk/bench/jdk/incubator/vector/MemorySegmentVectorAccess.java > - org/openjdk/bench/jdk/incubator/vector/StoreMaskedBenchmark.java > - org/openjdk/bench/jdk/incubator/vector/StoreMaskedIOOBEBenchmark.java > - org/openjdk/bench/vm/compiler/ArrayFill.java > - org/openjdk/bench/vm/compiler/IndexVector.java > > Also removing the static scope for variables in org/openjdk/bench/jdk/incubator/vector/VectorFPtoIntCastOperations.java for better scaling. > > Please review and share your feedback. > > Thanks, > Swati This is not a review of the PR, just a question. Should a JMH test, at least in the JDK repo, always use `@State(Scope.Thread)`, even if it uses only one thread? I looked through those JMH tests and found that none of them, for example the two below, specifies the number of threads via `@Threads`. org/openjdk/bench/java/io/DataOutputStreamTest.java org/openjdk/bench/java/lang/ArrayCopyObject.java I suppose the default number of threads is 1. Maybe the default is overridden on the command line when these JMH tests are run in bulk (?)
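The scope question discussed in this thread can be illustrated with a minimal plain-Java sketch. It is not JMH code and does not use any JMH API; `StateScopeSketch`, `BenchState` and `distinctStates` are hypothetical names invented for illustration. It only models the documented difference between the two scopes: with `Scope.Benchmark` all worker threads share one state instance (and therefore one output buffer), while with `Scope.Thread` each worker gets its own.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Plain-Java illustration only; this is NOT JMH code and uses no JMH API.
class StateScopeSketch {

    // Hypothetical stand-in for a benchmark class that owns an output buffer.
    static final class BenchState {
        final byte[] out = new byte[1024];

        void write(int value) {
            // With a shared instance, every worker stores into the same
            // array and therefore into the same cache lines.
            for (int i = 0; i < out.length; i++) {
                out[i] = (byte) value;
            }
        }
    }

    // Runs one task per worker and returns how many distinct BenchState
    // instances the tasks used. sharedAcrossThreads=true models
    // Scope.Benchmark, false models Scope.Thread.
    static int distinctStates(int workers, boolean sharedAcrossThreads) {
        BenchState shared = new BenchState();
        Set<BenchState> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Future<BenchState>> results = new ArrayList<>();
            for (int t = 0; t < workers; t++) {
                final int id = t;
                results.add(pool.submit(() -> {
                    BenchState state = sharedAcrossThreads ? shared : new BenchState();
                    state.write(id);
                    return state;
                }));
            }
            for (Future<BenchState> f : results) {
                seen.add(f.get());
            }
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
        return seen.size();
    }

    public static void main(String[] args) {
        System.out.println("shared:     " + distinctStates(4, true));   // prints "shared:     1"
        System.out.println("per-thread: " + distinctStates(4, false));  // prints "per-thread: 4"
    }
}
```

With a single worker both variants use exactly one instance, so in this model the scope only starts to matter once the thread count is raised, for example through JMH's `-t` command-line option.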
------------- PR Comment: https://git.openjdk.org/jdk/pull/14746#issuecomment-1629261727 From vlivanov at openjdk.org Mon Jul 10 17:04:30 2023 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Mon, 10 Jul 2023 17:04:30 GMT Subject: RFR: JDK-8287061: Support for rematerializing scalar replaced objects participating in allocation merges [v21] In-Reply-To: References: <7nqFW-lgT1FzuMHPMUQiCj1ATcV_bQtroolf4V_kCc4=.ccd12605-aad0-433e-ba44-5772d972f05d@github.com> Message-ID: <1s2gGfT_bsyz2mtAr3UFbXKlXniyiK2Hk4lZmBm_Crk=.89639816-872b-436f-9863-d5044e4a9ea5@github.com> On Thu, 6 Jul 2023 13:06:30 GMT, Cesar Soares Lucas wrote: >> Can I please get reviews for this PR? >> >> The most common and frequent use of NonEscaping Phis merging object allocations is for debugging information. The two graphs below show numbers for Renaissance and DaCapo benchmarks - similar results are obtained for all other applications that I tested. >> >> With what frequency does each IR node type occurs as an allocation merge user? I.e., if the same node type uses a Phi N times the counter is incremented by N: >> >> ![image](https://user-images.githubusercontent.com/2249648/222280517-4dcf5871-2564-4207-b49e-22aee47fa49d.png) >> >> What are the most common users of allocation merges? I.e., if the same node type uses a Phi N times the counter is incremented by 1: >> >> ![image](https://user-images.githubusercontent.com/2249648/222280608-ca742a4e-1622-4e69-a778-e4db6805ea02.png) >> >> This PR adds support scalar replacing allocations participating in merges used as debug information OR as a base for field loads. I plan to create subsequent PRs to enable scalar replacement of merges used by other node types (CmpP is next on the list) subsequently. >> >> The approach I used for _rematerialization_ is pretty straightforward. It consists basically of the following. 1) New IR node (suggested by V. 
Kozlov), named SafePointScalarMergeNode, to represent a set of SafePointScalarObjectNode; 2) Each scalar replaceable input participating in a merge will get a SafePointScalarObjectNode like if it weren't part of a merge. 3) Add a new Class to support the rematerialization of SR objects that are part of a merge; 4) Patch HotSpot to be able to serialize and deserialize debug information related to allocation merges; 5) Patch C2 to generate unique types for SR objects participating in some allocation merges. >> >> The approach I used for _enabling the scalar replacement of some of the inputs of the allocation merge_ is also pretty straightforward: call `MemNode::split_through_phi` to, well, split AddP->Load* through the merge which will render the Phi useless. >> >> I tested this with JTREG tests tier 1-4 (Windows, Linux, and Mac) and didn't see regression. I also experimented with several applications and didn't see any failure. I also ran tests with "-ea -esa -Xbatch -Xcomp -XX:+UnlockExperimentalVMOptions -XX:-TieredCompilation -server -XX:+IgnoreUnrecognizedVMOptions -XX:+UnlockDiagnosticVMOptions -XX:+StressLCM -XX:+StressGCM -XX:+StressCCP" and didn't observe any related failures. > > Cesar Soares Lucas has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 22 commits: > > - Merge branch 'openjdk:master' into rematerialization-of-merges > - Addressing PR feedback. > - Merge remote-tracking branch 'origin/master' into rematerialization-of-merges > - Merge branch 'openjdk:master' into rematerialization-of-merges > - Rome minor refactorings. > - Merge remote-tracking branch 'origin/master' into rematerialization-of-merges > Catching up with master. > - Address PR review 6: debug format output & some refactoring. > - Catching up with master branch. > > Merge remote-tracking branch 'origin/master' into rematerialization-of-merges > - Address PR review 6: refactoring around rematerialization & improve test cases. 
> - Address PR review 5: refactor on rematerialization & add tests. > - ... and 12 more: https://git.openjdk.org/jdk/compare/97e99f01...25b683d6 The patch looks good. I resubmitted testing with the latest version and the results are clean. ------------- Marked as reviewed by vlivanov (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/12897#pullrequestreview-1522564439 From kvn at openjdk.org Mon Jul 10 19:55:07 2023 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 10 Jul 2023 19:55:07 GMT Subject: RFR: JDK-8287061: Support for rematerializing scalar replaced objects participating in allocation merges [v21] In-Reply-To: References: <7nqFW-lgT1FzuMHPMUQiCj1ATcV_bQtroolf4V_kCc4=.ccd12605-aad0-433e-ba44-5772d972f05d@github.com> Message-ID: On Thu, 6 Jul 2023 13:06:30 GMT, Cesar Soares Lucas wrote: >> Can I please get reviews for this PR? >> >> The most common and frequent use of NonEscaping Phis merging object allocations is for debugging information. The two graphs below show numbers for Renaissance and DaCapo benchmarks - similar results are obtained for all other applications that I tested. >> >> With what frequency does each IR node type occurs as an allocation merge user? I.e., if the same node type uses a Phi N times the counter is incremented by N: >> >> ![image](https://user-images.githubusercontent.com/2249648/222280517-4dcf5871-2564-4207-b49e-22aee47fa49d.png) >> >> What are the most common users of allocation merges? I.e., if the same node type uses a Phi N times the counter is incremented by 1: >> >> ![image](https://user-images.githubusercontent.com/2249648/222280608-ca742a4e-1622-4e69-a778-e4db6805ea02.png) >> >> This PR adds support scalar replacing allocations participating in merges used as debug information OR as a base for field loads. I plan to create subsequent PRs to enable scalar replacement of merges used by other node types (CmpP is next on the list) subsequently. 
>> >> The approach I used for _rematerialization_ is pretty straightforward. It consists basically of the following. 1) New IR node (suggested by V. Kozlov), named SafePointScalarMergeNode, to represent a set of SafePointScalarObjectNode; 2) Each scalar replaceable input participating in a merge will get a SafePointScalarObjectNode like if it weren't part of a merge. 3) Add a new Class to support the rematerialization of SR objects that are part of a merge; 4) Patch HotSpot to be able to serialize and deserialize debug information related to allocation merges; 5) Patch C2 to generate unique types for SR objects participating in some allocation merges. >> >> The approach I used for _enabling the scalar replacement of some of the inputs of the allocation merge_ is also pretty straightforward: call `MemNode::split_through_phi` to, well, split AddP->Load* through the merge which will render the Phi useless. >> >> I tested this with JTREG tests tier 1-4 (Windows, Linux, and Mac) and didn't see regression. I also experimented with several applications and didn't see any failure. I also ran tests with "-ea -esa -Xbatch -Xcomp -XX:+UnlockExperimentalVMOptions -XX:-TieredCompilation -server -XX:+IgnoreUnrecognizedVMOptions -XX:+UnlockDiagnosticVMOptions -XX:+StressLCM -XX:+StressGCM -XX:+StressCCP" and didn't observe any related failures. > > Cesar Soares Lucas has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 22 commits: > > - Merge branch 'openjdk:master' into rematerialization-of-merges > - Addressing PR feedback. > - Merge remote-tracking branch 'origin/master' into rematerialization-of-merges > - Merge branch 'openjdk:master' into rematerialization-of-merges > - Rome minor refactorings. > - Merge remote-tracking branch 'origin/master' into rematerialization-of-merges > Catching up with master. > - Address PR review 6: debug format output & some refactoring. > - Catching up with master branch. 
> > Merge remote-tracking branch 'origin/master' into rematerialization-of-merges > - Address PR review 6: refactoring around rematerialization & improve test cases. > - Address PR review 5: refactor on rematerialization & add tests. > - ... and 12 more: https://git.openjdk.org/jdk/compare/97e99f01...25b683d6 The final version looks good to me too. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/12897#pullrequestreview-1522912018 From sviswanathan at openjdk.org Mon Jul 10 21:10:16 2023 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Mon, 10 Jul 2023 21:10:16 GMT Subject: RFR: 8311178: JMH tests don't scale well when sharing output buffers In-Reply-To: References: Message-ID: On Sat, 1 Jul 2023 07:53:17 GMT, Swati Sharma wrote: > The below benchmark files have scaling issues due to cache contention and leads to poor scaling when run on multiple threads. The patch sets the scope from benchmark level to thread level to fix the issue: > - org/openjdk/bench/java/io/DataOutputStreamTest.java > - org/openjdk/bench/java/lang/ArrayCopyObject.java > - org/openjdk/bench/java/lang/ArrayFiddle.java > - org/openjdk/bench/java/time/format/DateTimeFormatterBench.java > - org/openjdk/bench/jdk/incubator/vector/IndexInRangeBenchmark.java > - org/openjdk/bench/jdk/incubator/vector/MemorySegmentVectorAccess.java > - org/openjdk/bench/jdk/incubator/vector/StoreMaskedBenchmark.java > - org/openjdk/bench/jdk/incubator/vector/StoreMaskedIOOBEBenchmark.java > - org/openjdk/bench/vm/compiler/ArrayFill.java > - org/openjdk/bench/vm/compiler/IndexVector.java > > Also removing the static scope for variables in org/openjdk/bench/jdk/incubator/vector/VectorFPtoIntCastOperations.java for better scaling. > > Please review and share your feedback. > > Thanks, > Swati Looks good to me. ------------- Marked as reviewed by sviswanathan (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/14746#pullrequestreview-1523049432 From haosun at openjdk.org Mon Jul 10 22:04:18 2023 From: haosun at openjdk.org (Hao Sun) Date: Mon, 10 Jul 2023 22:04:18 GMT Subject: RFR: 8311548: AArch64: [ZGC] Many tests fail with "assert(allocates2(pc)) failed: not in CodeBuffer memory" on some CPUs In-Reply-To: References: Message-ID: On Fri, 7 Jul 2023 06:38:10 GMT, Hao Sun wrote: > ### Problem: > > We got many ZGC related JTreg test failures on ThunderX2 CPU. The failure can be easily reproduced by the following command as well. > > `java -XX:+UseZGC -XX:+ZGenerational --version` > > Here shows the snippet of error message: > > > A fatal error has been detected by the Java Runtime Environment: > > Internal Error (~/jdk_build/jdk_src/src/hotspot/share/asm/codeBuffer.hpp:200), pid=108369, tid=108373 > assert(allocates2(pc)) failed: not in CodeBuffer memory: 0x0000ffff9c921100 <= 0x0000ffff9c934c04 <= 0x0000ffff9c934c00 > > JRE version: (22.0) (fastdebug build ) > Java VM: OpenJDK 64-Bit Server VM (fastdebug 22-internal-git-0916e6a60, mixed mode, sharing, compressed class ptrs, z gc, linux-aarch64) > Problematic frame: > V [libjvm.so+0x45413c] Instruction_aarch64::~Instruction_aarch64()+0xbc > > > ### Root cause: > > From the backtrace (See the Description session in the JBS [1]), we can see that it's an assembler failure and the failure occurred when the VM generated "final stubs". The root cause is that the buffer size of "final stubs" is too small for ThunderX2 CPU with ZGC on. > > The reason that this failure only occurred on ThunderX2 CPU but not on CPUs like Neoverse N1/N2, is that 1) VM flag "AvoidUnalignedAccesses" is enabled by default on CPUs like ThunderX2 (See the code [2]), and 2) more instructions would be generated especially with ZGC (E.g., see the code [3]). 
> > Hence, the failure would also occur on CPUs like Neoverse N1/N2 if we pass VM option "-XX:+AvoidUnalignedAccesses", e.g., > > `java -XX:+UseZGC -XX:+ZGenerational -XX:+AvoidUnalignedAccesses --version` > > ### Fix: > > Increasing the buffer size for "final blobs", i.e. variable `_final_stubs_code_size`, would fix the failure. > > We manually computed the code size for "final stubs" on ThunderX2 CPU and the size is roughly "105568" bytes. In this patch, we increase the buffer size from "60000" to "100000" for ZGC_ONLY(). > > ### Test: > > Tier1~3 passed on Linux/ThunderX2 and Linux/Neoverse N1. > > [1] https://bugs.openjdk.org/browse/JDK-8311548 > [2] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/aarch64/vm_version_aarch64.cpp#L154-L177 > [3] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp#L919-L1109 Thanks for your reviews. The GHA tests are green. Let me integrate it. ------------- PR Comment: https://git.openjdk.org/jdk/pull/14800#issuecomment-1629788576 From haosun at openjdk.org Mon Jul 10 22:04:19 2023 From: haosun at openjdk.org (Hao Sun) Date: Mon, 10 Jul 2023 22:04:19 GMT Subject: Integrated: 8311548: AArch64: [ZGC] Many tests fail with "assert(allocates2(pc)) failed: not in CodeBuffer memory" on some CPUs In-Reply-To: References: Message-ID: On Fri, 7 Jul 2023 06:38:10 GMT, Hao Sun wrote: > ### Problem: > > We got many ZGC related JTreg test failures on ThunderX2 CPU. The failure can be easily reproduced by the following command as well. 
> > `java -XX:+UseZGC -XX:+ZGenerational --version` > > Here shows the snippet of error message: > > > A fatal error has been detected by the Java Runtime Environment: > > Internal Error (~/jdk_build/jdk_src/src/hotspot/share/asm/codeBuffer.hpp:200), pid=108369, tid=108373 > assert(allocates2(pc)) failed: not in CodeBuffer memory: 0x0000ffff9c921100 <= 0x0000ffff9c934c04 <= 0x0000ffff9c934c00 > > JRE version: (22.0) (fastdebug build ) > Java VM: OpenJDK 64-Bit Server VM (fastdebug 22-internal-git-0916e6a60, mixed mode, sharing, compressed class ptrs, z gc, linux-aarch64) > Problematic frame: > V [libjvm.so+0x45413c] Instruction_aarch64::~Instruction_aarch64()+0xbc > > > ### Root cause: > > From the backtrace (See the Description session in the JBS [1]), we can see that it's an assembler failure and the failure occurred when the VM generated "final stubs". The root cause is that the buffer size of "final stubs" is too small for ThunderX2 CPU with ZGC on. > > The reason that this failure only occurred on ThunderX2 CPU but not on CPUs like Neoverse N1/N2, is that 1) VM flag "AvoidUnalignedAccesses" is enabled by default on CPUs like ThunderX2 (See the code [2]), and 2) more instructions would be generated especially with ZGC (E.g., see the code [3]). > > Hence, the failure would also occur on CPUs like Neoverse N1/N2 if we pass VM option "-XX:+AvoidUnalignedAccesses", e.g., > > `java -XX:+UseZGC -XX:+ZGenerational -XX:+AvoidUnalignedAccesses --version` > > ### Fix: > > Increasing the buffer size for "final blobs", i.e. variable `_final_stubs_code_size`, would fix the failure. > > We manually computed the code size for "final stubs" on ThunderX2 CPU and the size is roughly "105568" bytes. In this patch, we increase the buffer size from "60000" to "100000" for ZGC_ONLY(). > > ### Test: > > Tier1~3 passed on Linux/ThunderX2 and Linux/Neoverse N1. 
> > [1] https://bugs.openjdk.org/browse/JDK-8311548 > [2] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/aarch64/vm_version_aarch64.cpp#L154-L177 > [3] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp#L919-L1109 This pull request has now been integrated. Changeset: 4b1403d0 Author: Hao Sun URL: https://git.openjdk.org/jdk/commit/4b1403d06b99b91ddd89ad6e54669b0595f1f8e5 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod 8311548: AArch64: [ZGC] Many tests fail with "assert(allocates2(pc)) failed: not in CodeBuffer memory" on some CPUs Reviewed-by: aboldtch, fyang, kbarrett, tschatzl ------------- PR: https://git.openjdk.org/jdk/pull/14800 From haosun at openjdk.org Mon Jul 10 22:07:01 2023 From: haosun at openjdk.org (Hao Sun) Date: Mon, 10 Jul 2023 22:07:01 GMT Subject: RFR: 8311548: AArch64: [ZGC] Many tests fail with "assert(allocates2(pc)) failed: not in CodeBuffer memory" on some CPUs In-Reply-To: References: Message-ID: On Fri, 7 Jul 2023 06:38:10 GMT, Hao Sun wrote: > ### Problem: > > We got many ZGC related JTreg test failures on ThunderX2 CPU. The failure can be easily reproduced by the following command as well. 
> > `java -XX:+UseZGC -XX:+ZGenerational --version` > > Here shows the snippet of error message: > > > A fatal error has been detected by the Java Runtime Environment: > > Internal Error (~/jdk_build/jdk_src/src/hotspot/share/asm/codeBuffer.hpp:200), pid=108369, tid=108373 > assert(allocates2(pc)) failed: not in CodeBuffer memory: 0x0000ffff9c921100 <= 0x0000ffff9c934c04 <= 0x0000ffff9c934c00 > > JRE version: (22.0) (fastdebug build ) > Java VM: OpenJDK 64-Bit Server VM (fastdebug 22-internal-git-0916e6a60, mixed mode, sharing, compressed class ptrs, z gc, linux-aarch64) > Problematic frame: > V [libjvm.so+0x45413c] Instruction_aarch64::~Instruction_aarch64()+0xbc > > > ### Root cause: > > From the backtrace (See the Description session in the JBS [1]), we can see that it's an assembler failure and the failure occurred when the VM generated "final stubs". The root cause is that the buffer size of "final stubs" is too small for ThunderX2 CPU with ZGC on. > > The reason that this failure only occurred on ThunderX2 CPU but not on CPUs like Neoverse N1/N2, is that 1) VM flag "AvoidUnalignedAccesses" is enabled by default on CPUs like ThunderX2 (See the code [2]), and 2) more instructions would be generated especially with ZGC (E.g., see the code [3]). > > Hence, the failure would also occur on CPUs like Neoverse N1/N2 if we pass VM option "-XX:+AvoidUnalignedAccesses", e.g., > > `java -XX:+UseZGC -XX:+ZGenerational -XX:+AvoidUnalignedAccesses --version` > > ### Fix: > > Increasing the buffer size for "final blobs", i.e. variable `_final_stubs_code_size`, would fix the failure. > > We manually computed the code size for "final stubs" on ThunderX2 CPU and the size is roughly "105568" bytes. In this patch, we increase the buffer size from "60000" to "100000" for ZGC_ONLY(). > > ### Test: > > Tier1~3 passed on Linux/ThunderX2 and Linux/Neoverse N1. 
> > [1] https://bugs.openjdk.org/browse/JDK-8311548 > [2] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/aarch64/vm_version_aarch64.cpp#L154-L177 > [3] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp#L919-L1109 As described in [JBS-8311548](https://bugs.openjdk.org/browse/JDK-8311548?focusedCommentId=14594147&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-14594147), this bug was introduced in jdk21 and also affected jdk21. Besides, it's a P2 bug. I'd like to backport this patch to jdk21. ------------- PR Comment: https://git.openjdk.org/jdk/pull/14800#issuecomment-1629793470 From haosun at openjdk.org Mon Jul 10 22:16:26 2023 From: haosun at openjdk.org (Hao Sun) Date: Mon, 10 Jul 2023 22:16:26 GMT Subject: [jdk21] RFR: 8311548: AArch64: [ZGC] Many tests fail with "assert(allocates2(pc)) failed: not in CodeBuffer memory" on some CPUs Message-ID: <-LGMh8VX4oMdYfsPxFFL5tb7dCyPzhUltbLF2M8FuDQ=.22248b92-cba4-4e47-b125-9a1d18ff7c8f@github.com> Hi all, This pull request contains a backport of commit [4b1403d0](https://github.com/openjdk/jdk/commit/4b1403d06b99b91ddd89ad6e54669b0595f1f8e5) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository. The commit being backported was authored by Hao Sun on 10 Jul 2023 and was reviewed by Axel Boldt-Christmas, Fei Yang, Kim Barrett and Thomas Schatzl. Thanks! 
------------- Commit messages: - Backport 4b1403d06b99b91ddd89ad6e54669b0595f1f8e5 Changes: https://git.openjdk.org/jdk21/pull/108/files Webrev: https://webrevs.openjdk.org/?repo=jdk21&pr=108&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8311548 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk21/pull/108.diff Fetch: git fetch https://git.openjdk.org/jdk21.git pull/108/head:pull/108 PR: https://git.openjdk.org/jdk21/pull/108 From coleenp at openjdk.org Tue Jul 11 01:34:39 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Tue, 11 Jul 2023 01:34:39 GMT Subject: RFR: 8311847: Fix -Wconversion for assembler.hpp emit_int8, 16 callers Message-ID: Please review changes to fix -Wconversion warnings that come from assembler_.cpp by adding narrow_casts to the emit_int8,16,24, and 32 functions. And some other fixups with checked_cast. Ran tier1 on Oracle platforms, and tier1-4 on linux-x64-debug, linux-aarch64-debug, windows-x64-debug. ------------- Commit messages: - Fix -Wconversion for assembler.hpp emit_int8,16 callers Changes: https://git.openjdk.org/jdk/pull/14822/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=14822&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8311847 Stats: 47 lines in 4 files changed: 20 ins; 1 del; 26 mod Patch: https://git.openjdk.org/jdk/pull/14822.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14822/head:pull/14822 PR: https://git.openjdk.org/jdk/pull/14822 From dlong at openjdk.org Tue Jul 11 02:14:15 2023 From: dlong at openjdk.org (Dean Long) Date: Tue, 11 Jul 2023 02:14:15 GMT Subject: RFR: 8311847: Fix -Wconversion for assembler.hpp emit_int8,16 callers In-Reply-To: References: Message-ID: On Tue, 11 Jul 2023 01:26:44 GMT, Coleen Phillimore wrote: > Please review changes to fix -Wconversion warnings that come from assembler_.cpp by adding narrow_casts to the emit_int8,16,24, and 32 functions. And some other fixups with checked_cast. 
> > Ran tier1 on Oracle platforms, and tier1-4 on linux-x64-debug, linux-aarch64-debug, windows-x64-debug. src/hotspot/cpu/aarch64/assembler_aarch64.hpp line 224: > 222: unsigned target = *(unsigned *)a; > 223: target &= ~mask; > 224: target |= checked_cast(val); Any value that doesn't fit in 32 bits is going to fail, so it's tempting to force the callers to pass 32-bit types, but that's a bigger change. How about something like this: static ALWAYSINLINE void patch(address a, int msb, int lsb, uint32_t val) { /* original code, no additional checked_cast needed */ } static ALWAYSINLINE void patch(address a, int msb, int lsb, uint64_t val) { patch(a, msb, lsb, checked_cast(val)); } src/hotspot/cpu/aarch64/assembler_aarch64.hpp line 239: > 237: unsigned target = *(unsigned *)a; > 238: target &= ~mask; > 239: target |= checked_cast(uval); Same suggestion as patch() above. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14822#discussion_r1259094106 PR Review Comment: https://git.openjdk.org/jdk/pull/14822#discussion_r1259094782 From dlong at openjdk.org Tue Jul 11 02:18:15 2023 From: dlong at openjdk.org (Dean Long) Date: Tue, 11 Jul 2023 02:18:15 GMT Subject: RFR: 8311847: Fix -Wconversion for assembler.hpp emit_int8,16 callers In-Reply-To: References: Message-ID: <3IzSseKzc1au1HBwc6jZCV15Qmqdu_A1_O9FTUHzx5Y=.be7154c1-8488-4d12-ba66-cf476178b5c7@github.com> On Tue, 11 Jul 2023 01:26:44 GMT, Coleen Phillimore wrote: > Please review changes to fix -Wconversion warnings that come from assembler_.cpp by adding narrow_casts to the emit_int8,16,24, and 32 functions. And some other fixups with checked_cast. > > Ran tier1 on Oracle platforms, and tier1-4 on linux-x64-debug, linux-aarch64-debug, windows-x64-debug. 
src/hotspot/cpu/aarch64/assembler_aarch64.hpp line 265: > 263: int64_t chk = val >> (nbits - 1); > 264: guarantee (chk == -1 || chk == 0, "Field too big for insn"); > 265: uint64_t uval = val; Suggestion: unsigned uval = checked_cast((uint64_t)val); src/hotspot/cpu/aarch64/assembler_aarch64.hpp line 268: > 266: unsigned mask = checked_cast(right_n_bits(nbits)); > 267: uval &= mask; > 268: f(checked_cast(uval), lsb + nbits - 1, lsb); Suggestion: f(uval, lsb + nbits - 1, lsb); ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14822#discussion_r1259096829 PR Review Comment: https://git.openjdk.org/jdk/pull/14822#discussion_r1259097036 From dlong at openjdk.org Tue Jul 11 02:34:02 2023 From: dlong at openjdk.org (Dean Long) Date: Tue, 11 Jul 2023 02:34:02 GMT Subject: RFR: 8311847: Fix -Wconversion for assembler.hpp emit_int8,16 callers In-Reply-To: <3IzSseKzc1au1HBwc6jZCV15Qmqdu_A1_O9FTUHzx5Y=.be7154c1-8488-4d12-ba66-cf476178b5c7@github.com> References: <3IzSseKzc1au1HBwc6jZCV15Qmqdu_A1_O9FTUHzx5Y=.be7154c1-8488-4d12-ba66-cf476178b5c7@github.com> Message-ID: On Tue, 11 Jul 2023 02:14:58 GMT, Dean Long wrote: >> Please review changes to fix -Wconversion warnings that come from assembler_.cpp by adding narrow_casts to the emit_int8,16,24, and 32 functions. And some other fixups with checked_cast. >> >> Ran tier1 on Oracle platforms, and tier1-4 on linux-x64-debug, linux-aarch64-debug, windows-x64-debug. > > src/hotspot/cpu/aarch64/assembler_aarch64.hpp line 268: > >> 266: unsigned mask = checked_cast(right_n_bits(nbits)); >> 267: uval &= mask; >> 268: f(checked_cast(uval), lsb + nbits - 1, lsb); > > Suggestion: > > f(uval, lsb + nbits - 1, lsb); See my comment below about not trusting checked_cast to do the right thing for int64_t --> unsigned. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14822#discussion_r1259104769 From dlong at openjdk.org Tue Jul 11 02:34:05 2023 From: dlong at openjdk.org (Dean Long) Date: Tue, 11 Jul 2023 02:34:05 GMT Subject: RFR: 8311847: Fix -Wconversion for assembler.hpp emit_int8,16 callers In-Reply-To: References: Message-ID: On Tue, 11 Jul 2023 01:26:44 GMT, Coleen Phillimore wrote: > Please review changes to fix -Wconversion warnings that come from assembler_.cpp by adding narrow_casts to the emit_int8,16,24, and 32 functions. And some other fixups with checked_cast. > > Ran tier1 on Oracle platforms, and tier1-4 on linux-x64-debug, linux-aarch64-debug, windows-x64-debug. src/hotspot/cpu/aarch64/assembler_aarch64.hpp line 556: > 554: } else { > 555: i->f(0b01, 25, 24); > 556: i->f(checked_cast(offset() >> size), 21, 10); I remember there being issues with checked_cast and sign extension. When going from int64_t to unsigned and back, I think we need to do int64_t --> int32_t --> unsigned, and not int64_t --> uint64_t --> unsigned. Is that what checked_cast will do? To be safe, or at least make it easier to understand, shouldn't we use checked_cast only to change the size or sign, but not both? So going from int64_t to unsigned would require two checked_casts.
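The size-then-sign idea in the comment above can be sketched as follows. This is only an illustration under stated assumptions: `narrow_size`, `change_sign`, `to_unsigned32` and `one_step_round_trips` are hypothetical names, not HotSpot's `checked_cast`, and the round-trip test is just one way such a single-step check might be written.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical size-only narrowing: the value must be representable as
// int32_t, so the sign of the value is preserved by the conversion.
inline int32_t narrow_size(int64_t v) {
  assert(v >= std::numeric_limits<int32_t>::min() &&
         v <= std::numeric_limits<int32_t>::max());
  return static_cast<int32_t>(v);
}

// Hypothetical sign-only change at a fixed width of 32 bits.
// The conversion is well-defined and keeps the two's-complement bit pattern.
inline uint32_t change_sign(int32_t v) {
  return static_cast<uint32_t>(v);
}

// int64_t --> int32_t --> uint32_t, one property changed per step.
inline uint32_t to_unsigned32(int64_t v) {
  return change_sign(narrow_size(v));
}

// A single-step int64_t --> uint32_t conversion verified by converting back.
// Negative inputs fail this check because the result widens to a large
// positive value instead of the original negative one.
inline bool one_step_round_trips(int64_t v) {
  return static_cast<int64_t>(static_cast<uint32_t>(v)) == v;
}
```

With these helpers, `to_unsigned32(-1)` yields the all-ones 32-bit pattern, whereas `one_step_round_trips(-1)` reports a mismatch, which is the kind of surprise that splitting the conversion into two steps is meant to avoid.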
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14822#discussion_r1259103473 From dlong at openjdk.org Tue Jul 11 02:37:11 2023 From: dlong at openjdk.org (Dean Long) Date: Tue, 11 Jul 2023 02:37:11 GMT Subject: RFR: 8311847: Fix -Wconversion for assembler.hpp emit_int8,16 callers In-Reply-To: <3IzSseKzc1au1HBwc6jZCV15Qmqdu_A1_O9FTUHzx5Y=.be7154c1-8488-4d12-ba66-cf476178b5c7@github.com> References: <3IzSseKzc1au1HBwc6jZCV15Qmqdu_A1_O9FTUHzx5Y=.be7154c1-8488-4d12-ba66-cf476178b5c7@github.com> Message-ID: On Tue, 11 Jul 2023 02:14:37 GMT, Dean Long wrote: >> Please review changes to fix -Wconversion warnings that come from assembler_.cpp by adding narrow_casts to the emit_int8,16,24, and 32 functions. And some other fixups with checked_cast. >> >> Ran tier1 on Oracle platforms, and tier1-4 on linux-x64-debug, linux-aarch64-debug, windows-x64-debug. > > src/hotspot/cpu/aarch64/assembler_aarch64.hpp line 265: > >> 263: int64_t chk = val >> (nbits - 1); >> 264: guarantee (chk == -1 || chk == 0, "Field too big for insn"); >> 265: uint64_t uval = val; > > Suggestion: > > int32_t val32 = checked_cast(val); > unsigned uval = checked_cast(val32); uint64_t uval64 = val; unsigned uval = checked_cast(uval64); This won't work for negative values, right? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14822#discussion_r1259106251 From dlong at openjdk.org Tue Jul 11 02:59:12 2023 From: dlong at openjdk.org (Dean Long) Date: Tue, 11 Jul 2023 02:59:12 GMT Subject: RFR: 8311847: Fix -Wconversion for assembler.hpp emit_int8,16 callers In-Reply-To: References: Message-ID: On Tue, 11 Jul 2023 01:26:44 GMT, Coleen Phillimore wrote: > Please review changes to fix -Wconversion warnings that come from assembler_.cpp by adding narrow_casts to the emit_int8,16,24, and 32 functions. And some other fixups with checked_cast. > > Ran tier1 on Oracle platforms, and tier1-4 on linux-x64-debug, linux-aarch64-debug, windows-x64-debug. 
src/hotspot/share/asm/assembler.hpp line 305: > 303: > 304: > 305: public: I don't think we need this. See below. src/hotspot/share/asm/assembler.hpp line 328: > 326: narrow_cast(x2), > 327: narrow_cast(x3), > 328: narrow_cast(x4)); } I'd rather add new alternative entry points for "int", so the existing callers using uint8_t don't need to perform unnecessary widening and narrowing. Instead of narrow_cast(x), how about (uint8_t)checked_cast(x)? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14822#discussion_r1259116145 PR Review Comment: https://git.openjdk.org/jdk/pull/14822#discussion_r1259115553 From dlong at openjdk.org Tue Jul 11 02:59:14 2023 From: dlong at openjdk.org (Dean Long) Date: Tue, 11 Jul 2023 02:59:14 GMT Subject: RFR: 8311847: Fix -Wconversion for assembler.hpp emit_int8,16 callers In-Reply-To: References: Message-ID: On Tue, 11 Jul 2023 02:52:52 GMT, Dean Long wrote: >> Please review changes to fix -Wconversion warnings that come from assembler_.cpp by adding narrow_casts to the emit_int8,16,24, and 32 functions. And some other fixups with checked_cast. >> >> Ran tier1 on Oracle platforms, and tier1-4 on linux-x64-debug, linux-aarch64-debug, windows-x64-debug. > > src/hotspot/share/asm/assembler.hpp line 328: > >> 326: narrow_cast(x2), >> 327: narrow_cast(x3), >> 328: narrow_cast(x4)); } > > I'd rather add new alternative entry points for "int", so the existing callers using uint8_t don't need to perform unnecessary widening and narrowing. > Instead of narrow_cast(x), how about (uint8_t)checked_cast(x)? How many callers are passing in negative values and actually need these convenience functions? 
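For readers following the question about negative values, here is a minimal sketch of a byte-narrowing helper that accepts both signed and unsigned byte ranges. The name `narrow_to_u8` and the accepted range are assumptions made for illustration; this is not the `narrow_cast` from the patch under review.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: narrow an int to a single byte for emission.
// Accepts the union of the int8_t and uint8_t ranges, i.e. [-128, 255],
// so callers may pass either a signed displacement or a raw opcode byte.
inline uint8_t narrow_to_u8(int x) {
  assert(x >= -128 && x <= 255 && "value does not fit in 8 bits");
  // Conversion to an unsigned type reduces modulo 256, so -1 becomes 0xFF.
  return static_cast<uint8_t>(x);
}
```

Under this sketch `narrow_to_u8(-1)` and `narrow_to_u8(255)` produce the same byte, which is acceptable for an emitter that only cares about the bit pattern but would hide a caller that passes a negative value by mistake.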
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14822#discussion_r1259116990 From mli at openjdk.org Tue Jul 11 06:58:05 2023 From: mli at openjdk.org (Hamlin Li) Date: Tue, 11 Jul 2023 06:58:05 GMT Subject: RFR: 8311178: JMH tests don't scale well when sharing output buffers In-Reply-To: References: Message-ID: On Mon, 10 Jul 2023 08:17:59 GMT, Hamlin Li wrote: >> @Hamlin-Li >> The PR is fully correct. >> Don't forget, every Java instance method has a specific argument which called "this". That is why @State annotation is working. > > @kuksenko @swati-sha Thanks for explanation. I can understand what you said. > But I'm still not quite sure, as I remember jmh does some code manipulation or instrumentation at source code (or bytecode level?), so the jmh test code you write or see might not be the exact code to be executed at runtime. > It's better to be reviewed further by some one more familiar with jmh, or could you add some data comparing the performance difference, so we can tell it easily? > @Hamlin-Li I am one of JMH's authors. I know how it works. There is no need for tests. @kuksenko Thanks for the confirmation. Then the benchmark modification looks good to me. ------------- PR Comment: https://git.openjdk.org/jdk/pull/14746#issuecomment-1630252431 From mli at openjdk.org Tue Jul 11 06:58:02 2023 From: mli at openjdk.org (Hamlin Li) Date: Tue, 11 Jul 2023 06:58:02 GMT Subject: RFR: 8311178: JMH tests don't scale well when sharing output buffers In-Reply-To: References: Message-ID: On Sat, 1 Jul 2023 07:53:17 GMT, Swati Sharma wrote: > The below benchmark files have scaling issues due to cache contention and leads to poor scaling when run on multiple threads. 
The patch sets the scope from benchmark level to thread level to fix the issue: > - org/openjdk/bench/java/io/DataOutputStreamTest.java > - org/openjdk/bench/java/lang/ArrayCopyObject.java > - org/openjdk/bench/java/lang/ArrayFiddle.java > - org/openjdk/bench/java/time/format/DateTimeFormatterBench.java > - org/openjdk/bench/jdk/incubator/vector/IndexInRangeBenchmark.java > - org/openjdk/bench/jdk/incubator/vector/MemorySegmentVectorAccess.java > - org/openjdk/bench/jdk/incubator/vector/StoreMaskedBenchmark.java > - org/openjdk/bench/jdk/incubator/vector/StoreMaskedIOOBEBenchmark.java > - org/openjdk/bench/vm/compiler/ArrayFill.java > - org/openjdk/bench/vm/compiler/IndexVector.java > > Also removing the static scope for variables in org/openjdk/bench/jdk/incubator/vector/VectorFPtoIntCastOperations.java for better scaling. > > Please review and share your feedback. > > Thanks, > Swati Marked as reviewed by mli (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/14746#pullrequestreview-1523547091 From dlong at openjdk.org Tue Jul 11 07:50:06 2023 From: dlong at openjdk.org (Dean Long) Date: Tue, 11 Jul 2023 07:50:06 GMT Subject: RFR: 8311847: Fix -Wconversion for assembler.hpp emit_int8,16 callers In-Reply-To: References: Message-ID: <_LzEAPWveKpz9XieL79V0W_czkOfo2LevXdNPkt73wc=.7fd3df3a-a8bd-4ef2-8776-c0b20e04cf7d@github.com> On Tue, 11 Jul 2023 02:54:13 GMT, Dean Long wrote: >> Please review changes to fix -Wconversion warnings that come from assembler_.cpp by adding narrow_casts to the emit_int8,16,24, and 32 functions. And some other fixups with checked_cast. >> >> Ran tier1 on Oracle platforms, and tier1-4 on linux-x64-debug, linux-aarch64-debug, windows-x64-debug. > > src/hotspot/share/asm/assembler.hpp line 305: > >> 303: >> 304: >> 305: public: > > I don't think we need this. See below. Nevermind, I tried my alternative idea below and it didn't work. 
For these particular cases where we only care about going to uint8_t, we could check is8bit(). Another trick I've seen is checking if (val >> width) is 0 or -1. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14822#discussion_r1259328766 From tschatzl at openjdk.org Tue Jul 11 07:54:18 2023 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Tue, 11 Jul 2023 07:54:18 GMT Subject: [jdk21] RFR: 8311548: AArch64: [ZGC] Many tests fail with "assert(allocates2(pc)) failed: not in CodeBuffer memory" on some CPUs In-Reply-To: <-LGMh8VX4oMdYfsPxFFL5tb7dCyPzhUltbLF2M8FuDQ=.22248b92-cba4-4e47-b125-9a1d18ff7c8f@github.com> References: <-LGMh8VX4oMdYfsPxFFL5tb7dCyPzhUltbLF2M8FuDQ=.22248b92-cba4-4e47-b125-9a1d18ff7c8f@github.com> Message-ID: On Mon, 10 Jul 2023 22:08:41 GMT, Hao Sun wrote: > Hi all, > > This pull request contains a backport of commit [4b1403d0](https://github.com/openjdk/jdk/commit/4b1403d06b99b91ddd89ad6e54669b0595f1f8e5) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository. > > The commit being backported was authored by Hao Sun on 10 Jul 2023 and was reviewed by Axel Boldt-Christmas, Fei Yang, Kim Barrett and Thomas Schatzl. > > Thanks! Lgtm. Ship it. ------------- Marked as reviewed by tschatzl (Reviewer). PR Review: https://git.openjdk.org/jdk21/pull/108#pullrequestreview-1523656640 From epeter at openjdk.org Tue Jul 11 08:00:13 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 11 Jul 2023 08:00:13 GMT Subject: RFR: 8310308: IR Framework: check for type and size of vector nodes [v22] In-Reply-To: References: Message-ID: On Thu, 6 Jul 2023 07:23:39 GMT, Emanuel Peter wrote: >> For some changes to `SuperWord`, and maybe auto-vectorization in general, I want to strengthen the IR Framework. >> >> **Motivation** >> I want to not just find the relevant IR nodes, but also assert that they have the maximal length that they could have on the respective platform (given the CPU features and `MaxVectorSize`). 
Without this verification it is possible that a future change leads to a regression where we still vectorize but at shorter vector widths as before - leading to performance loss. >> >> **How to use it** >> >> All `IRNode`s in `test/hotspot/jtreg/compiler/lib/ir_framework/IRNode.java` that are created with `vectorNode` are now all matched with their `type` and `size`. The regex might now look something like this: >> >> `"(\d+(\s){2}(VectorCastF2X.*)+(\s){2}===.*vector[A-Za-z][8]:{int})"` >> which would match with IR nodes dumped like that: >> `1150 VectorCastF2X === _ 1151 [[ 1146 ]] #vectory[8]:{int} ...` >> >> The goal was to keep it simple and straight forward. In most cases, you can just use the nodes as before, and implicitly we now check for maximal size automatically. However, in some cases we want to ensure there is no or only a limited number of nodes (`failOn` or comparison `<` or `<=` or `=0`) - in those cases we usually want to make sure there is not any node of any size, so we match with any size by default. The size can also explicitly be constrained using `IRNode.VECTOR_SIZE`. >> >> Some examples: >> 1. `@IR(counts = {IRNode.LOAD_VECTOR_I, " >0 "})` -> search for a `LoadVector` node with `type` `int`, and maximal `size` possible on the machine (limited by CPU features and `MaxVectorSize`). This is the most common use case. >> 2. `@IR(failOn = { IRNode.LOAD_VECTOR_L, IRNode.STORE_VECTOR })` -> fail if there is a `LoadVector` with type `long`, of `any` size. >> 3. `@IR(counts = { IRNode.XOR_VI, IRNode.VECTOR_SIZE_4, " > 0 "})` -> find at least one `XorV` node with type `int` and exactly `4` elements. Useful for VectorAPI when the vector species is fixed. >> 4. `@IR(counts = { IRNode.LOAD_VECTOR_D, IRNode.VECTOR_SIZE + "min(4, max_double)", " >0 " })` -> search for a `LoadVector` node with `type` `double`, and `size` exactly equals to `min(4, max_double)` (so 4 elements, or if the hardware allows fewer `doubles`, then that number). >> 5. 
`@IR(counts = { IRNode.ABS_VF, IRNode.VECTOR_SIZE + "min(LoopMaxUnroll, max_float)", ">= 1" })` -> find at least one `AbsV` nodes with type `float`, and the `size` exactly equals to the smaller of... > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > TestSpillTheBeans.java The issue is that CascadeLake has `MaxVectorSize=64`, but internally only uses 32 bytes. Except if you explicitly set `-XX:MaxVectorSize=64` (now it is no longer "default_cascade_lake"), then it actually uses 64 bytes. This makes it impossible to just read `MaxVectorSize`. If the value is 64, the VM may now use 32 bytes (default) or 64 bytes (if explicitly set). My strategy will be to detect when we are on Cascade Lake, and just avoid doing the vector width checks on that platform - or at least make them weaker. ------------- PR Comment: https://git.openjdk.org/jdk/pull/14539#issuecomment-1630333989 From haosun at openjdk.org Tue Jul 11 08:01:22 2023 From: haosun at openjdk.org (Hao Sun) Date: Tue, 11 Jul 2023 08:01:22 GMT Subject: [jdk21] RFR: 8311548: AArch64: [ZGC] Many tests fail with "assert(allocates2(pc)) failed: not in CodeBuffer memory" on some CPUs In-Reply-To: References: <-LGMh8VX4oMdYfsPxFFL5tb7dCyPzhUltbLF2M8FuDQ=.22248b92-cba4-4e47-b125-9a1d18ff7c8f@github.com> Message-ID: On Tue, 11 Jul 2023 07:51:24 GMT, Thomas Schatzl wrote: >> Hi all, >> >> This pull request contains a backport of commit [4b1403d0](https://github.com/openjdk/jdk/commit/4b1403d06b99b91ddd89ad6e54669b0595f1f8e5) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository. >> >> The commit being backported was authored by Hao Sun on 10 Jul 2023 and was reviewed by Axel Boldt-Christmas, Fei Yang, Kim Barrett and Thomas Schatzl. >> >> Thanks! > > Lgtm. Ship it. Thanks for your review @tschatzl I don't think the GHA failure is related to this patch. Hence, let me integrate it.
------------- PR Comment: https://git.openjdk.org/jdk21/pull/108#issuecomment-1630337419 From haosun at openjdk.org Tue Jul 11 08:01:23 2023 From: haosun at openjdk.org (Hao Sun) Date: Tue, 11 Jul 2023 08:01:23 GMT Subject: [jdk21] Integrated: 8311548: AArch64: [ZGC] Many tests fail with "assert(allocates2(pc)) failed: not in CodeBuffer memory" on some CPUs In-Reply-To: <-LGMh8VX4oMdYfsPxFFL5tb7dCyPzhUltbLF2M8FuDQ=.22248b92-cba4-4e47-b125-9a1d18ff7c8f@github.com> References: <-LGMh8VX4oMdYfsPxFFL5tb7dCyPzhUltbLF2M8FuDQ=.22248b92-cba4-4e47-b125-9a1d18ff7c8f@github.com> Message-ID: <9ReFswWK3pV410ITuB5Xuxw36DUzYY1Ja9h2wsWSBD8=.9a4cbc80-11cb-4a61-bdad-acb1bea475a2@github.com> On Mon, 10 Jul 2023 22:08:41 GMT, Hao Sun wrote: > Hi all, > > This pull request contains a backport of commit [4b1403d0](https://github.com/openjdk/jdk/commit/4b1403d06b99b91ddd89ad6e54669b0595f1f8e5) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository. > > The commit being backported was authored by Hao Sun on 10 Jul 2023 and was reviewed by Axel Boldt-Christmas, Fei Yang, Kim Barrett and Thomas Schatzl. > > Thanks! This pull request has now been integrated. Changeset: 8808ec3f Author: Hao Sun URL: https://git.openjdk.org/jdk21/commit/8808ec3fbcab8ec9db22d25e508b89fe8db18b97 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod 8311548: AArch64: [ZGC] Many tests fail with "assert(allocates2(pc)) failed: not in CodeBuffer memory" on some CPUs Reviewed-by: tschatzl Backport-of: 4b1403d06b99b91ddd89ad6e54669b0595f1f8e5 ------------- PR: https://git.openjdk.org/jdk21/pull/108 From duke at openjdk.org Tue Jul 11 08:07:13 2023 From: duke at openjdk.org (Swati Sharma) Date: Tue, 11 Jul 2023 08:07:13 GMT Subject: RFR: 8311178: JMH tests don't scale well when sharing output buffers In-Reply-To: References: Message-ID: On Mon, 10 Jul 2023 16:07:44 GMT, John Jiang wrote: > Not review this PR, but just raise a question. 
Should a JMH test, at least in JDK repo, always use `@State(Scope.Thread)`, even though it uses only one thread? > > I just looked through those JMH tests, and found all of them, like the ones below, don't specify the number of threads via `@Threads`. > > ``` > org/openjdk/bench/java/io/DataOutputStreamTest.java > org/openjdk/bench/java/lang/ArrayCopyObject.java > ``` > > I suppose the default number of threads is 1. Maybe the default value will be overridden via the commands when running these JMH tests in bulk (?) Not always, but the state should be set to thread level when the class variables would otherwise be shared by multiple threads; the variables should also be non-static. While running these benchmarks with multiple threads, we observed scaling issues because the variables are shared when the scope is set to benchmark level. ------------- PR Comment: https://git.openjdk.org/jdk/pull/14746#issuecomment-1630347091 From duke at openjdk.org Tue Jul 11 08:11:03 2023 From: duke at openjdk.org (Swati Sharma) Date: Tue, 11 Jul 2023 08:11:03 GMT Subject: RFR: 8311178: JMH tests don't scale well when sharing output buffers In-Reply-To: References: Message-ID: On Wed, 5 Jul 2023 14:39:08 GMT, Eric Caspole wrote: >> The below benchmark files have scaling issues due to cache contention and leads to poor scaling when run on multiple threads.
The patch sets the scope from benchmark level to thread level to fix the issue: >> - org/openjdk/bench/java/io/DataOutputStreamTest.java >> - org/openjdk/bench/java/lang/ArrayCopyObject.java >> - org/openjdk/bench/java/lang/ArrayFiddle.java >> - org/openjdk/bench/java/time/format/DateTimeFormatterBench.java >> - org/openjdk/bench/jdk/incubator/vector/IndexInRangeBenchmark.java >> - org/openjdk/bench/jdk/incubator/vector/MemorySegmentVectorAccess.java >> - org/openjdk/bench/jdk/incubator/vector/StoreMaskedBenchmark.java >> - org/openjdk/bench/jdk/incubator/vector/StoreMaskedIOOBEBenchmark.java >> - org/openjdk/bench/vm/compiler/ArrayFill.java >> - org/openjdk/bench/vm/compiler/IndexVector.java >> >> Also removing the static scope for variables in org/openjdk/bench/jdk/incubator/vector/VectorFPtoIntCastOperations.java for better scaling. >> >> Please review and share your feedback. >> >> Thanks, >> Swati > > LGTM. > Eric Thanks all for reviewing this ! @ericcaspole , @sviswa7 , @Hamlin-Li ------------- PR Comment: https://git.openjdk.org/jdk/pull/14746#issuecomment-1630351831 From roland at openjdk.org Tue Jul 11 08:39:29 2023 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 11 Jul 2023 08:39:29 GMT Subject: RFR: 8303279: C2: crash in SubTypeCheckNode::sub() at IGVN split if [v4] In-Reply-To: References: Message-ID: <78Wpe9Xv51ncdMoHf9X5L-5v5SZf1L-jUum3_G-OFvw=.a40aa522-6435-4965-a5aa-f7ae5995380b@github.com> > The crash occurs because at split if during IGVN, a `SubTypeCheck` is > created with null as input. That happens because the control path the > `SubTypeCheck` is cloned for is dead. To fix that I propose delaying > split if until dead paths are collapsed. > > I added an assert to check a nullable first input to `SubTypeCheck` > nodes (which should be impossible because it should be null > checked). When I ran testing, a number of cases showed up with known > non null values non properly marked as non null. I fixed them. 
Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: review ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14678/files - new: https://git.openjdk.org/jdk/pull/14678/files/017d60b1..b0e3cf65 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14678&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14678&range=02-03 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/14678.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14678/head:pull/14678 PR: https://git.openjdk.org/jdk/pull/14678 From roland at openjdk.org Tue Jul 11 08:39:29 2023 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 11 Jul 2023 08:39:29 GMT Subject: RFR: 8303279: C2: crash in SubTypeCheckNode::sub() at IGVN split if [v4] In-Reply-To: References: Message-ID: <3Dd1FhILH88RPC26b9J-VtWqSSsrTQgyFTe3HYHcu5E=.081610c1-7ead-40e0-8022-578729cc4cd7@github.com> On Tue, 27 Jun 2023 16:31:15 GMT, Vladimir Kozlov wrote: >> Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: >> >> review > > Looks reasonable. @vnkozlov @iwanowww @simonis @TobiHartmann thanks for the reviews. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/14678#issuecomment-1630392539 From roland at openjdk.org Tue Jul 11 08:39:29 2023 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 11 Jul 2023 08:39:29 GMT Subject: RFR: 8303279: C2: crash in SubTypeCheckNode::sub() at IGVN split if [v3] In-Reply-To: References: <5Oi18HUJNORp0_wYO2nD_aUGaoyvJCuyT4YmppEWvgA=.f40a1a10-7ee2-483d-bb6c-2fb7ec3218ad@github.com> Message-ID: On Mon, 3 Jul 2023 05:15:47 GMT, Tobias Hartmann wrote: >> Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: >> >> review > > test/hotspot/jtreg/compiler/splitif/TestCrashAtIGVNSplitIfSubType.java line 28: > >> 26: * @bug 8303279 >> 27: * @summary C2: crash in SubTypeCheckNode::sub() at IGVN split if >> 28: * @run main/othervm -XX:-TieredCompilation -XX:-BackgroundCompilation -XX:+UnlockDiagnosticVMOptions -XX:+StressIGVN -XX:StressSeed=598200189 TestCrashAtIGVNSplitIfSubType > > Maybe add a `@run` without a fixed seed to give this a chance to still trigger in the future. Right. I just made that change. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14678#discussion_r1259397707 From dlong at openjdk.org Tue Jul 11 08:42:14 2023 From: dlong at openjdk.org (Dean Long) Date: Tue, 11 Jul 2023 08:42:14 GMT Subject: RFR: 8311847: Fix -Wconversion for assembler.hpp emit_int8,16 callers In-Reply-To: References: Message-ID: On Tue, 11 Jul 2023 01:26:44 GMT, Coleen Phillimore wrote: > Please review changes to fix -Wconversion warnings that come from assembler_.cpp by adding narrow_casts to the emit_int8,16,24, and 32 functions. And some other fixups with checked_cast. > > Ran tier1 on Oracle platforms, and tier1-4 on linux-x64-debug, linux-aarch64-debug, windows-x64-debug. 
src/hotspot/share/asm/assembler.hpp line 297: > 295: constexpr T narrow_cast(int x) const { > 296: if (x < 0) { > 297: assert(x > -(std::numeric_limits::max() - 1), "too negative"); Suggestion: assert(x >= -std::numeric_limits::max(), "too negative"); // >= -256 for 8 bits ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14822#discussion_r1259405009 From thartmann at openjdk.org Tue Jul 11 08:44:03 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 11 Jul 2023 08:44:03 GMT Subject: RFR: 8303279: C2: crash in SubTypeCheckNode::sub() at IGVN split if [v4] In-Reply-To: <78Wpe9Xv51ncdMoHf9X5L-5v5SZf1L-jUum3_G-OFvw=.a40aa522-6435-4965-a5aa-f7ae5995380b@github.com> References: <78Wpe9Xv51ncdMoHf9X5L-5v5SZf1L-jUum3_G-OFvw=.a40aa522-6435-4965-a5aa-f7ae5995380b@github.com> Message-ID: <6jfyGj1W5cCBbixyltY6-uJNgMbNkEJekq2sC32qtzg=.ecc63950-acf8-49ff-abc5-491d16f0340e@github.com> On Tue, 11 Jul 2023 08:39:29 GMT, Roland Westrelin wrote: >> The crash occurs because at split if during IGVN, a `SubTypeCheck` is >> created with null as input. That happens because the control path the >> `SubTypeCheck` is cloned for is dead. To fix that I propose delaying >> split if until dead paths are collapsed. >> >> I added an assert to check a nullable first input to `SubTypeCheck` >> nodes (which should be impossible because it should be null >> checked). When I ran testing, a number of cases showed up with known >> non null values non properly marked as non null. I fixed them. > > Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: > > review Marked as reviewed by thartmann (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/14678#pullrequestreview-1523761744 From pli at openjdk.org Tue Jul 11 10:04:19 2023 From: pli at openjdk.org (Pengfei Li) Date: Tue, 11 Jul 2023 10:04:19 GMT Subject: RFR: 8311691: C2: Remove legacy code related to PostLoopMultiversioning Message-ID: As discussed in JDK-8308994, we are working on a re-implementation of post loop vectorization and planning to refactor the current SuperWord code. As nobody is using or maintaining the old implementation now, to make the refactoring work easier, we propose to remove the legacy code of the old implementation first. This patch removes all code related to `PostLoopMultiversioning` inside and outside SuperWord. After the removal, `SLP_extract()` in SuperWord should only work on main loops. So we also removed all `is_main_loop()` checks inside and added assertions instead. Tested with hotspot::hotspot_all_no_apps, jdk tier1~3, langtools tier1 and 100k fuzzer tests on x86 and AArch64; no issues were found. ------------- Commit messages: - 8311691: C2: Remove legacy code related to PostLoopMultiversioning Changes: https://git.openjdk.org/jdk/pull/14824/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=14824&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8311691 Stats: 608 lines in 9 files changed: 7 ins; 564 del; 37 mod Patch: https://git.openjdk.org/jdk/pull/14824.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14824/head:pull/14824 PR: https://git.openjdk.org/jdk/pull/14824 From pli at openjdk.org Tue Jul 11 10:06:36 2023 From: pli at openjdk.org (Pengfei Li) Date: Tue, 11 Jul 2023 10:06:36 GMT Subject: RFR: 8308994: C2: Re-implement experimental post loop vectorization [v2] In-Reply-To: References: Message-ID: On Fri, 7 Jul 2023 15:11:20 GMT, Vladimir Kozlov wrote: >> Pengfei Li has updated the pull request incrementally with one additional commit since the last revision: >> >> Address part of comments from Emanuel > > Yes, you can remove old code first.
And work on new implementation after that. Thanks @vnkozlov and @eme64, I just created https://github.com/openjdk/jdk/pull/14824 for the legacy code cleanup. ------------- PR Comment: https://git.openjdk.org/jdk/pull/14581#issuecomment-1630536093 From coleenp at openjdk.org Tue Jul 11 12:19:12 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Tue, 11 Jul 2023 12:19:12 GMT Subject: RFR: 8311847: Fix -Wconversion for assembler.hpp emit_int8,16 callers In-Reply-To: References: Message-ID: On Tue, 11 Jul 2023 08:38:56 GMT, Dean Long wrote: >> Please review changes to fix -Wconversion warnings that come from assembler_.cpp by adding narrow_casts to the emit_int8,16,24, and 32 functions. And some other fixups with checked_cast. >> >> Ran tier1 on Oracle platforms, and tier1-4 on linux-x64-debug, linux-aarch64-debug, windows-x64-debug. > > src/hotspot/share/asm/assembler.hpp line 297: > >> 295: constexpr T narrow_cast(int x) const { >> 296: if (x < 0) { >> 297: assert(x > -(std::numeric_limits::max() - 1), "too negative"); > > Suggestion: > > assert(x >= -std::numeric_limits::max() - 1, "too negative"); // >= -256 for 8 bits > > The old code would have checked > -254 I think, when it should be > -257 or >= -256 Thank you - I meant to check these values. 
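To make the lower-bound discussion above concrete, here is a minimal standalone sketch. It is not the HotSpot code: `old_lower_bound_ok` and `suggested_lower_bound_ok` are hypothetical names, and the template argument `T` is assumed to be a narrow unsigned type such as `uint8_t`. It only compares which negative values each of the two assert conditions quoted in the review would accept.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical stand-ins for the two assert conditions quoted in the review.
// T is assumed to be narrower than int (e.g. uint8_t, uint16_t) so that
// numeric_limits<T>::max() fits in an int.
template <typename T>
constexpr bool old_lower_bound_ok(int x) {
  static_assert(sizeof(T) < sizeof(int), "sketch only covers narrow types");
  // Original condition: x > -(max - 1), i.e. x > -254 for uint8_t.
  return x > -(static_cast<int>(std::numeric_limits<T>::max()) - 1);
}

template <typename T>
constexpr bool suggested_lower_bound_ok(int x) {
  static_assert(sizeof(T) < sizeof(int), "sketch only covers narrow types");
  // Suggested condition: x >= -max - 1, i.e. x >= -256 for uint8_t.
  return x >= -static_cast<int>(std::numeric_limits<T>::max()) - 1;
}
```

With `T = uint8_t`, the original condition rejects -254 while the suggested one accepts everything down to -256, which matches the numbers mentioned in the review comment.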
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14822#discussion_r1259649487 From coleenp at openjdk.org Tue Jul 11 12:19:13 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Tue, 11 Jul 2023 12:19:13 GMT Subject: RFR: 8311847: Fix -Wconversion for assembler.hpp emit_int8,16 callers In-Reply-To: References: Message-ID: <7-pI5PdfVQN7cONgjHkI1JhsDj1L3bovH9RUL-54TdI=.a50593ff-7171-4fef-9dfa-4f58c2596966@github.com> On Tue, 11 Jul 2023 02:56:01 GMT, Dean Long wrote: >> src/hotspot/share/asm/assembler.hpp line 328: >> >>> 326: narrow_cast(x2), >>> 327: narrow_cast(x3), >>> 328: narrow_cast(x4)); } >> >> I'd rather add new alternative entry points for "int", so the existing callers using uint8_t don't need to perform unnecessary widening and narrowing. >> Instead of narrow_cast(x), how about (uint8_t)checked_cast(x)? > > How many callers are passing in negative values and actually need these convenience functions? The overloading was really unhappy with the version of the functions that pass uint8_t for all the arguments. The callers might pass a couple uint8_t but then also a random selection of int and for one or more of the other parameters. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14822#discussion_r1259648687 From coleenp at openjdk.org Tue Jul 11 12:32:12 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Tue, 11 Jul 2023 12:32:12 GMT Subject: RFR: 8311847: Fix -Wconversion for assembler.hpp emit_int8,16 callers In-Reply-To: References: <3IzSseKzc1au1HBwc6jZCV15Qmqdu_A1_O9FTUHzx5Y=.be7154c1-8488-4d12-ba66-cf476178b5c7@github.com> Message-ID: On Tue, 11 Jul 2023 02:34:14 GMT, Dean Long wrote: >> src/hotspot/cpu/aarch64/assembler_aarch64.hpp line 265: >> >>> 263: int64_t chk = val >> (nbits - 1); >>> 264: guarantee (chk == -1 || chk == 0, "Field too big for insn"); >>> 265: uint64_t uval = val; >> >> Suggestion: >> >> int32_t val32 = checked_cast(val); >> unsigned uval = checked_cast(val32); > > uint64_t uval64 = val; > unsigned uval = checked_cast(uval64); > This won't work for negative values, right? val comes in signed, so we want to just chop off the sign. checked_cast(signed val) will assert. checked_cast<> doesn't do sign conversion. We don't have a cast that does sign conversion. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14822#discussion_r1259663449 From coleenp at openjdk.org Tue Jul 11 12:41:57 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Tue, 11 Jul 2023 12:41:57 GMT Subject: RFR: 8311847: Fix -Wconversion for assembler.hpp emit_int8,16 callers In-Reply-To: References: Message-ID: On Tue, 11 Jul 2023 02:28:07 GMT, Dean Long wrote: >> Please review changes to fix -Wconversion warnings that come from assembler_.cpp by adding narrow_casts to the emit_int8,16,24, and 32 functions. And some other fixups with checked_cast. >> >> Ran tier1 on Oracle platforms, and tier1-4 on linux-x64-debug, linux-aarch64-debug, windows-x64-debug. 
> > src/hotspot/cpu/aarch64/assembler_aarch64.hpp line 556: > >> 554: } else { >> 555: i->f(0b01, 25, 24); >> 556: i->f(checked_cast(offset() >> size), 21, 10); > > I remember there being issues with checked_cast and sign extension. When going from int64_t to unsigned and back, I think we need to do int64_t --> int32_t --> unsigned, and not int64_t --> uint64_t --> unsigned. Is that what checked_cast will do? To be safe, or at least make it easier to understand, shouldn't we use checked_cast only to change the size or sign, but not both? So going from int64_t to unsigned would require two checked_casts. The assignment does the sign conversion first. The mask removes the top half with sign extension (right_n_bits is a macro that somehow returns intptr_t). So the checked_cast<> just converts unsigned 64 bit to unsigned 32, which shouldn't be necessary since we just chopped off the top bits. uint64_t uval = val; unsigned mask = checked_cast(right_n_bits(nbits)); uval &= mask; f(checked_cast(uval), lsb + nbits - 1, lsb); ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14822#discussion_r1259674845 From coleenp at openjdk.org Tue Jul 11 12:59:02 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Tue, 11 Jul 2023 12:59:02 GMT Subject: RFR: 8311847: Fix -Wconversion for assembler.hpp emit_int8,16 callers In-Reply-To: References: Message-ID: On Tue, 11 Jul 2023 01:26:44 GMT, Coleen Phillimore wrote: > Please review changes to fix -Wconversion warnings that come from assembler_.cpp by adding narrow_casts to the emit_int8,16,24, and 32 functions. And some other fixups with checked_cast. > > Ran tier1 on Oracle platforms, and tier1-4 on linux-x64-debug, linux-aarch64-debug, windows-x64-debug. Thanks for looking at this and your help. I think I replied to all your comments but I think there's more work to do to make this safer.
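The "change the sign first, then mask, then change the width" sequence described above can be illustrated with a small standalone sketch. This is only an illustration under stated assumptions, not the HotSpot implementation: `encode_signed_field` is a hypothetical helper, plain `static_cast` stands in for `checked_cast`, and an exception stands in for `guarantee`.

```cpp
#include <cstdint>
#include <stdexcept>

// Hypothetical helper: pack a signed value into the low `nbits` bits of a
// 32-bit field, for 1 <= nbits <= 32.
inline uint32_t encode_signed_field(int64_t val, int nbits) {
  // Range check: every bit above bit (nbits - 1) must be a copy of the sign.
  // (Right shift of a negative value is arithmetic on mainstream compilers.)
  int64_t chk = val >> (nbits - 1);
  if (chk != 0 && chk != -1) {
    throw std::out_of_range("field too big for insn");
  }
  // Step 1: change the sign only, keeping the width (wraps modulo 2^64).
  uint64_t uval = static_cast<uint64_t>(val);
  // Step 2: mask off the sign-extension bits above the field.
  uint64_t mask = (uint64_t{1} << nbits) - 1;
  uval &= mask;
  // Step 3: change the width only; lossless because uval < 2^32 here.
  return static_cast<uint32_t>(uval);
}
```

For example, -1 in an 8-bit field encodes as 0xFF, and 128 is rejected because it does not fit a signed 8-bit field. Keeping each conversion to either a sign change or a width change, never both, is the property the review comment asks for.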
------------- PR Review: https://git.openjdk.org/jdk/pull/14822#pullrequestreview-1524185447 From coleenp at openjdk.org Tue Jul 11 12:59:05 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Tue, 11 Jul 2023 12:59:05 GMT Subject: RFR: 8311847: Fix -Wconversion for assembler.hpp emit_int8,16 callers In-Reply-To: References: Message-ID: On Tue, 11 Jul 2023 02:09:40 GMT, Dean Long wrote: >> Please review changes to fix -Wconversion warnings that come from assembler_.cpp by adding narrow_casts to the emit_int8,16,24, and 32 functions. And some other fixups with checked_cast. >> >> Ran tier1 on Oracle platforms, and tier1-4 on linux-x64-debug, linux-aarch64-debug, windows-x64-debug. > > src/hotspot/cpu/aarch64/assembler_aarch64.hpp line 224: > >> 222: unsigned target = *(unsigned *)a; >> 223: target &= ~mask; >> 224: target |= checked_cast(val); > > Any value that doesn't fit in 32 bits is going to fail, so it's tempting to force the callers to pass 32-bit types, but that's a bigger change. How about something like this: > > static ALWAYSINLINE void patch(address a, int msb, int lsb, uint32_t val) { > /* original code, no additional checked_cast needed */ > } > > static ALWAYSINLINE void patch(address a, int msb, int lsb, uint64_t val) { > patch(a, msb, lsb, checked_cast(val)); > } I'll try this suggestion. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14822#discussion_r1259690565 From coleenp at openjdk.org Tue Jul 11 12:59:07 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Tue, 11 Jul 2023 12:59:07 GMT Subject: RFR: 8311847: Fix -Wconversion for assembler.hpp emit_int8,16 callers In-Reply-To: References: <3IzSseKzc1au1HBwc6jZCV15Qmqdu_A1_O9FTUHzx5Y=.be7154c1-8488-4d12-ba66-cf476178b5c7@github.com> Message-ID: On Tue, 11 Jul 2023 12:29:09 GMT, Coleen Phillimore wrote: >> uint64_t uval64 = val; >> unsigned uval = checked_cast(uval64); >> This won't work for negative values, right? 
> > val comes in signed, so we want to just chop off the sign. checked_cast(signed val) will assert. checked_cast<> doesn't do sign conversion. We don't have a cast that does sign conversion. Not sure I trust this either. I'm going to write a gtest for this. >> How many callers are passing in negative values and actually need these convenience functions? > > The overloading was really unhappy with the version of the functions that pass uint8_t for all the arguments. The callers might pass a couple uint8_t but then also a random selection of int and for one or more of the other parameters. > How many callers? Actually quite a lot for the emit_int16/24/32 ones. They pass an int imm8 value and have some (value | encode) parameter where encode is an int passed in. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14822#discussion_r1259682829 PR Review Comment: https://git.openjdk.org/jdk/pull/14822#discussion_r1259685864 From coleenp at openjdk.org Tue Jul 11 12:59:11 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Tue, 11 Jul 2023 12:59:11 GMT Subject: RFR: 8311847: Fix -Wconversion for assembler.hpp emit_int8,16 callers In-Reply-To: References: <3IzSseKzc1au1HBwc6jZCV15Qmqdu_A1_O9FTUHzx5Y=.be7154c1-8488-4d12-ba66-cf476178b5c7@github.com> Message-ID: On Tue, 11 Jul 2023 02:31:00 GMT, Dean Long wrote: >> src/hotspot/cpu/aarch64/assembler_aarch64.hpp line 268: >> >>> 266: unsigned mask = checked_cast(right_n_bits(nbits)); >>> 267: uval &= mask; >>> 268: f(checked_cast(uval), lsb + nbits - 1, lsb); >> >> Suggestion: >> >> f(uval, lsb + nbits - 1, lsb); > > See my comment below about not trusting checked_cast to do the right thing for int64_t --> unsigned. This one seems to mask off the top half so the checked_cast<> will succeed, ie just change the type. >> src/hotspot/share/asm/assembler.hpp line 305: >> >>> 303: >>> 304: >>> 305: public: >> >> I don't think we need this. See below. 
> > Never mind, I tried my alternative idea below and it didn't work. For these particular cases where we only care about going to uint8_t, we could check is8bit(). Another trick I've seen is checking if (val >> width) is 0 or -1. There is an is8bit() test that precedes these casts. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14822#discussion_r1259682275 PR Review Comment: https://git.openjdk.org/jdk/pull/14822#discussion_r1259684184 From roland at openjdk.org Tue Jul 11 14:10:51 2023 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 11 Jul 2023 14:10:51 GMT Subject: RFR: 8308869: C2: use profile data in subtype checks when profile has more than one class [v8] In-Reply-To: <0t_NDTIV7WMVq09RidCDQ3hh0sW3FEDGhwne8_7GD_E=.adf6a3df-2782-4975-b59a-85bd0b89d9d4@github.com> References: <0t_NDTIV7WMVq09RidCDQ3hh0sW3FEDGhwne8_7GD_E=.adf6a3df-2782-4975-b59a-85bd0b89d9d4@github.com> Message-ID: > In this simple micro benchmark: > > https://github.com/franz1981/java-puzzles/blob/main/src/main/java/red/hat/puzzles/polymorphism/RequireNonNullCheckcastScalability.java#L70 > > Performance drops sharply with polluted profile: > > > Benchmark (typePollution) Mode Cnt Score Error Units > RequireNonNullCheckcastScalability.isDuplicated1 false thrpt 10 1453.372 ± 24.919 ops/us > > > to: > > > Benchmark (typePollution) Mode Cnt Score Error Units > RequireNonNullCheckcastScalability.isDuplicated1 true thrpt 10 28.579 ± 2.280 ops/us > > > The test has 2 type checks to 2 different interfaces so caching with > `secondary_super_cache` doesn't help. > > The micro-benchmark only uses 2 different concrete classes > (`DuplicatedContext` and `NonDuplicatedContext`) and they are recorded > in profile data at the type checks. But C2 only takes advantage of > profile data at type checks if they report a single class. > > What I propose is that the full blown type check expanded in > `Phase::gen_subtype_check()` takes advantage of profile data.
So in > the case of the micro benchmark, before checking the > `secondary_super_cache`, generated code checks whether the object > being type checked is a `DuplicatedContext` or a > `NonDuplicatedContext`. > > This works fairly well on this micro benchmark: > > > Benchmark (typePollution) Mode Cnt Score Error Units > RequireNonNullCheckcastScalability.isDuplicated1 true thrpt 10 871.224 ± 20.750 ops/us > > > It also scales much better if there are multiple threads running the > same test (`secondary_super_cache` doesn't scale well: see > JDK-8180450). > > Now if the micro-benchmark is changed according to the comment: > > https://github.com/franz1981/java-puzzles/blob/d2d60af3d0dfe7a2567807395138edcb1d1c24f5/src/main/java/red/hat/puzzles/polymorphism/RequireNonNullCheckcastScalability.java#L62 > > so the type check hits in the `secondary_super_cache`, the current > code performs much better: > > > Benchmark (typePollution) Mode Cnt Score Error Units > RequireNonNullCheckcastScalability.isDuplicated1 true thrpt 10 871.224 ± 20.750 ops/us > > > but leveraging profiling as explained above performs even better: > > > Benchmark (typePollution) Mode Cnt Score Error Units > RequireNonNullCheckcastScalability.isDuplic...
Roland Westrelin has updated the pull request incrementally with two additional commits since the last revision: - never common SubTypeCheckNode nodes - keep both ways of doing profile ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14375/files - new: https://git.openjdk.org/jdk/pull/14375/files/101399eb..4072e7ea Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14375&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14375&range=06-07 Stats: 31 lines in 10 files changed: 21 ins; 3 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/14375.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14375/head:pull/14375 PR: https://git.openjdk.org/jdk/pull/14375 From roland at openjdk.org Tue Jul 11 14:11:29 2023 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 11 Jul 2023 14:11:29 GMT Subject: RFR: 8308869: C2: use profile data in subtype checks when profile has more than one class [v7] In-Reply-To: References: <0t_NDTIV7WMVq09RidCDQ3hh0sW3FEDGhwne8_7GD_E=.adf6a3df-2782-4975-b59a-85bd0b89d9d4@github.com> Message-ID: On Fri, 30 Jun 2023 17:30:42 GMT, Vladimir Ivanov wrote: >> Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 11 additional commits since the last revision: >> >> - whitespace >> - reworked change >> - Merge branch 'master' into JDK-8308869 >> - more test failures >> - Merge branch 'master' into JDK-8308869 >> - whitespaces >> - test failures >> - review >> - 32 bit fix >> - white spaces >> - ... 
and 1 more: https://git.openjdk.org/jdk/compare/67d6bdee...101399eb > > test/hotspot/jtreg/compiler/c2/irTests/ProfileAtTypeCheck.java line 44: > >> 42: flags.add("-XX:TypeProfileSubTypeCheckCommonThreshold=90"); >> 43: if (!Platform.is32bit()) { >> 44: flags.add("-XX:-UseCompressedClassPointers"); > > What's the purpose of `-XX:-UseCompressedClassPointers` on 64-bit platforms? Make it easier to match the IR? It is to make it easier to match the IR. I could update the rules so it's not required. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14375#discussion_r1259796098 From roland at openjdk.org Tue Jul 11 14:17:10 2023 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 11 Jul 2023 14:17:10 GMT Subject: RFR: 8308869: C2: use profile data in subtype checks when profile has more than one class [v6] In-Reply-To: References: <0t_NDTIV7WMVq09RidCDQ3hh0sW3FEDGhwne8_7GD_E=.adf6a3df-2782-4975-b59a-85bd0b89d9d4@github.com> Message-ID: On Fri, 23 Jun 2023 18:47:59 GMT, Vladimir Ivanov wrote: >> Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains eight additional commits since the last revision: >> >> - more test failures >> - Merge branch 'master' into JDK-8308869 >> - whitespaces >> - test failures >> - review >> - 32 bit fix >> - white spaces >> - fix & test > > Overall, I'd prefer to leave commoning considerations for a separate enhancement. > > Embedding `ciCallProfile` looks to me much cleaner than exploding its content into node inputs. > > Having profile info explicitly fed into `SubTypeCheck` as node inputs in practice defeats any possible sharing unless the nodes are constructed from the very same profile data. The types, their order, and frequencies have to perfectly match in order for commoning to happen. 
You already have `IfNode::same_condition()` to alleviate some of the effects of broken sharing. > > When you embed profiling info you are left with a choice how to common nodes (whether to take profiling info into account or not). But if you simply ignore it until macro expansion, the behavior will stay the same as it is now. > > I prefer the patch to be focused on slow path case (reduce the frequency of secondary super cache checks & updates) and leave the rest for future considerations. > > As an example, it's still an open question for me should `IfNode::search_identical()` take profile info into account. Current patch ignores profile-related info (`IfNode::same_condition()` check), but maybe it is worth merging the profiles instead? > > >> Some SubTypeCheck nodes have no profile data associated with them. > > I don't consider footprint as an issue here. `SubTypeCheck`s are relatively rare and `ciCallProfile` size is quite small for any practical morphism limits. Additional profiling may introduce more about 1-2 additional slots (rather than 10s or 100s) and the main footprint hit will be on runtime side (in MDOs). > >> Is that out of concern that getting the code done on all platforms will be too complicated? > > It does look like an excessive requirement, but I'm not too much concerned about it. If you think it's better to get the full support all at once, I'm perfectly fine with that. It just seems cleaner to refine profiling part separately. There are open questions which may be well out of scope for the proposed enhancement. > > For example, while `checkcast`/`aastore` behave very similarly to `invokevirtual`/`invokeinterface` (very low rate of failures), `instanceof` is different and can expose very high rates of failures (esp. in case of chained `instanceof` checks). Should we continue profiling for that? (Can C2 benefit from such info? I believe so: we could skip SSC check if failure rate is too high.) 
> > Also, I refrained from commenting on naming, but `ciCallProfile` does look confusing when it comes to `checkcas... @iwanowww thanks for looking at this some more. > I don't see much value in special handling for nodes without associated bytecode location info. Right. I removed it. > What's the plan if we agree on adjusting profile collection? Should all the platforms be updated all at once? If not, how is it intended to work during transition period? I thought it would be better to update profile collection on all platforms as part of this PR initially so no platforms are left behind. But that's likely unrealistic. I pushed a new commit that I think would support both ways of doing profiling so there's no need to have all platforms in sync. What do you think? ------------- PR Comment: https://git.openjdk.org/jdk/pull/14375#issuecomment-1630912653 From roland at openjdk.org Tue Jul 11 14:21:14 2023 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 11 Jul 2023 14:21:14 GMT Subject: RFR: 8308103: Massive (up to ~30x) increase in C2 compilation time since JDK 17 In-Reply-To: References: Message-ID: On Fri, 30 Jun 2023 17:28:03 GMT, Vladimir Kozlov wrote: >> A long chain of nodes are sunk out of a loop. Every time a node is >> moved out of the loop, a cast is created to pin the node out of the >> loop. When its input is next sunk, the cast is removed (the cast is >> replaced by its input) and a new cast is created. Some nodes on the >> chain have 2 other nodes in the chain as uses. When such a node is >> sunk, 2 cast nodes are created, one for each use. So as the compiler >> moves forward in the chain, the number of cast to remove grows. From >> some profiling, removing those casts is what takes a lot of time. >> >> The fix I propose is, when a node is processed, to check whether a >> cast at the out of loop control was already created for that node and >> to reuse it. 
>> >> The test case takes 6 minutes when I run it without the fix and 3 >> seconds with it. > > src/hotspot/share/opto/loopopts.cpp line 1704: > >> 1702: cast = prev; >> 1703: } else { >> 1704: register_new_node(cast, x_ctrl); > > Can you move creation of `cast` here so you don't need to destroy it in case of previous cast existence? > Or it is possible that `ConstraintCastNode::make_cast_for_type()` can return `null`? Thanks for looking at this, Vladimir. I'm not sure I understand what you're suggesting. Is it to not allocate a new node so it doesn't have to be destroyed if an identical node exists? But without a node it's not possible to rely on IGVN hashing? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14732#discussion_r1259809938 From eliu at openjdk.org Tue Jul 11 14:21:37 2023 From: eliu at openjdk.org (Eric Liu) Date: Tue, 11 Jul 2023 14:21:37 GMT Subject: RFR: 8309893: Integrate ReplicateB/S/I/L/F/D nodes to Replicate node Message-ID: <4zQZ1W7GpPyOY0TGusvqNKUoCORK1WUEwSxRnWC4JVE=.127f84f6-a406-43d2-98e7-52b4fa0b5f3d@github.com> This patch creates ReplicateNode to replace ReplicateB/S/I/L/F/DNode, like other vector nodes introduced recently, e.g., PopulateIndexNode and ReverseVNode, etc. This refers from: https://mail.openjdk.org/pipermail/panama-dev/2020-April/008484.html After merging these nodes, code will be easier to maintain. E.g., matching rules can be simplified. Besides AArch64, this patch tries to keep other ad files as the same before, only supplies some necessary predicate. E.g., for matching rules using ReplicateB before, they are now matching Replicate with a new predicate "Matcher::vector_element_basic_type(n) == T_BYTE". This would be easy for review and lower risks. [TEST] x86: Tested with option "-XX:UseAVX=0/1/2/3". AArch64: Tested on SVE machine and Neon machine. Full jtreg passed without new issue.
------------- Commit messages: - 8309893: Integrate ReplicateB/S/I/L/F/D nodes to Replicate node Changes: https://git.openjdk.org/jdk/pull/14830/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=14830&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8309893 Stats: 938 lines in 20 files changed: 171 ins; 391 del; 376 mod Patch: https://git.openjdk.org/jdk/pull/14830.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14830/head:pull/14830 PR: https://git.openjdk.org/jdk/pull/14830 From roland at openjdk.org Tue Jul 11 15:18:55 2023 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 11 Jul 2023 15:18:55 GMT Subject: RFR: 8275202: C2: optimize out more redundant conditions In-Reply-To: <978cgwy3Nb_x7yU6jZz0f6zhTBZfphstisAkBf1Vktc=.283d06eb-4f79-40cf-b8dd-a9c230e59902@github.com> References: <978cgwy3Nb_x7yU6jZz0f6zhTBZfphstisAkBf1Vktc=.283d06eb-4f79-40cf-b8dd-a9c230e59902@github.com> Message-ID: On Wed, 21 Jun 2023 12:47:26 GMT, Roland Westrelin wrote: > This change adds a new loop opts pass to optimize redundant conditions > such as the second one in: > > > if (i < 10) { > if (i < 42) { > > > In the branch of the first if, the type of i can be narrowed down to > [min_jint, 9] which can then be used to constant fold the second > condition. > > The compiler already keeps track of type[n] for every node in the > current compilation unit. That's not sufficient to optimize the > snippet above though because the type of i can only be narrowed in > some sections of the control flow (that is a subset of all > controls). The solution is to build a new table that tracks the type > of n at every control c > > > type'[n, root] = type[n] // initialized from igvn's type table > type'[n, c] = type[n, idom(c)] > > > This pass iterates over the CFG looking for conditions such as: > > > if (i < 10) { > > > that allows narrowing the type of i and updates the type' table > accordingly. 
> > At a region r: > > > type'[n, r] = meet(type'[n, r->in(1)], type'[n, r->in(2)]...) > > > For a Phi phi at a region r: > > > type'[phi, r] = meet(type'[phi->in(1), r->in(1)], type'[phi->in(2), r->in(2)]...) > > > Once a type is narrowed, uses are enqueued and their types are > computed by calling the Value() methods. If a use's type is narrowed, > it's recorded at c in the type' table. Value() methods retrieve types > from the type table, not the type' table. To address that issue while > leaving Value() methods unchanged, before calling Value() at c, the > type table is updated so: > > > type[n] = type'[n, c] > > > An exception is for Phi::Value which needs to retrieve the type of > nodes are various controls: there, a new type(Node* n, Node* c) > method is used. > > For most n and c, type'[n, c] is likely the same as type[n], the type > recorded in the global igvn table (that is there shouldn't be many > nodes at only a few control for which we can narrow the type down). As > a consequence, the types'[n, c] table is implemented with: > > - At c, narrowed down types are stored in a GrowableArray. Each entry > records the previous type at idom(c) and the narrowed down type at > c. > > - The GrowableArray of type updates is recorded in a hash table > indexed by c. If there's no update at c, there's no entry in the > hash table. > > This pass operates in 2 steps: > > - it first iterates over the graph looking for conditions that narrow > the types of some nodes and propagate type updates to uses until a > fix point. > > - it transforms the graph so newly found constant nodes are folded. > > > The new pass is run on every loop opts. There are a couple rea... I realize this is complicated to review and that it's a risky change. Maybe first steps could be to: - agree on whether this is worth pursuing - decide whether to integrate c2 changes that are required for this to be stable but not directly related to this change separately. 
I could start preparing PRs for them then. ------------- PR Comment: https://git.openjdk.org/jdk/pull/14586#issuecomment-1631018714 From roland at openjdk.org Tue Jul 11 16:02:27 2023 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 11 Jul 2023 16:02:27 GMT Subject: Integrated: 8303279: C2: crash in SubTypeCheckNode::sub() at IGVN split if In-Reply-To: References: Message-ID: On Tue, 27 Jun 2023 14:40:49 GMT, Roland Westrelin wrote: > The crash occurs because at split if during IGVN, a `SubTypeCheck` is > created with null as input. That happens because the control path the > `SubTypeCheck` is cloned for is dead. To fix that I propose delaying > split if until dead paths are collapsed. > > I added an assert to check a nullable first input to `SubTypeCheck` > nodes (which should be impossible because it should be null > checked). When I ran testing, a number of cases showed up with known > non null values non properly marked as non null. I fixed them. This pull request has now been integrated. Changeset: caadad4f Author: Roland Westrelin URL: https://git.openjdk.org/jdk/commit/caadad4fdc78799dab2d492dba9b9f74b22d036e Stats: 110 lines in 7 files changed: 94 ins; 4 del; 12 mod 8303279: C2: crash in SubTypeCheckNode::sub() at IGVN split if Reviewed-by: kvn, vlivanov, thartmann, simonis ------------- PR: https://git.openjdk.org/jdk/pull/14678 From kvn at openjdk.org Tue Jul 11 16:07:55 2023 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 11 Jul 2023 16:07:55 GMT Subject: RFR: 8308103: Massive (up to ~30x) increase in C2 compilation time since JDK 17 In-Reply-To: References: Message-ID: On Fri, 30 Jun 2023 13:23:38 GMT, Roland Westrelin wrote: > A long chain of nodes are sunk out of a loop. Every time a node is > moved out of the loop, a cast is created to pin the node out of the > loop. When its input is next sunk, the cast is removed (the cast is > replaced by its input) and a new cast is created. 
Some nodes on the > chain have 2 other nodes in the chain as uses. When such a node is > sunk, 2 cast nodes are created, one for each use. So as the compiler > moves forward in the chain, the number of cast to remove grows. From > some profiling, removing those casts is what takes a lot of time. > > The fix I propose is, when a node is processed, to check whether a > cast at the out of loop control was already created for that node and > to reuse it. > > The test case takes 6 minutes when I run it without the fix and 3 > seconds with it. Good. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/14732#pullrequestreview-1524633607 From kvn at openjdk.org Tue Jul 11 16:07:56 2023 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 11 Jul 2023 16:07:56 GMT Subject: RFR: 8308103: Massive (up to ~30x) increase in C2 compilation time since JDK 17 In-Reply-To: References: Message-ID: On Tue, 11 Jul 2023 14:18:09 GMT, Roland Westrelin wrote: >> src/hotspot/share/opto/loopopts.cpp line 1704: >> >>> 1702: cast = prev; >>> 1703: } else { >>> 1704: register_new_node(cast, x_ctrl); >> >> Can you move creation of `cast` here so you don't need to destroy it in case of previous cast existence? >> Or it is possible that `ConstraintCastNode::make_cast_for_type()` can return `null`? > > Thanks for looking at this, Vladimir. > I'm not sure I understand what you're suggesting. Is it to not allocate a new node so it doesn't have to be destroyed if an identical node exists? But without a node it's not possible to rely on IGVN hashing? You are right. Using hash to look for existing node is smart.
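The "reuse an existing cast instead of creating a new one" idea discussed above is a form of hash-consing. The following is a minimal sketch of that general technique only. It is not C2 code: `CastTable`, `CastKey` and `CastNode` are hypothetical, and the sketch looks up by a separate (input, control) key, whereas in C2 the candidate node itself is hashed, which is why a node has to be allocated before the lookup.

```cpp
#include <cstddef>
#include <functional>
#include <memory>
#include <unordered_map>
#include <utility>

// Hypothetical identity of a pinning cast: which node it casts and at which
// control it is pinned.
struct CastKey {
  int input_id;
  int ctrl_id;
  bool operator==(const CastKey& o) const {
    return input_id == o.input_id && ctrl_id == o.ctrl_id;
  }
};

struct CastKeyHash {
  std::size_t operator()(const CastKey& k) const {
    return std::hash<int>()(k.input_id) * 31u + std::hash<int>()(k.ctrl_id);
  }
};

struct CastNode {
  int input_id;
  int ctrl_id;
};

class CastTable {
 public:
  // Returns the cast already registered for (input, ctrl), or registers and
  // returns a new one if none exists yet.
  CastNode* find_or_register(int input_id, int ctrl_id) {
    CastKey key{input_id, ctrl_id};
    auto it = table_.find(key);
    if (it != table_.end()) {
      return it->second.get();  // reuse: no second cast for the same pair
    }
    std::unique_ptr<CastNode> node(new CastNode{input_id, ctrl_id});
    CastNode* raw = node.get();
    table_.emplace(key, std::move(node));
    return raw;
  }

  std::size_t size() const { return table_.size(); }

 private:
  std::unordered_map<CastKey, std::unique_ptr<CastNode>, CastKeyHash> table_;
};
```

With such a table, sinking a node that has two uses at the same out-of-loop control yields one cast rather than two, so the number of casts to remove later no longer grows along the chain.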
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14732#discussion_r1259956220 From duke at openjdk.org Tue Jul 11 16:18:31 2023 From: duke at openjdk.org (Swati Sharma) Date: Tue, 11 Jul 2023 16:18:31 GMT Subject: Integrated: 8311178: JMH tests don't scale well when sharing output buffers In-Reply-To: References: Message-ID: <1B7eIPZb3Ih_Ep0FEkvR-GRisJ3ZytMXCixJ_YB63ik=.913c26b7-41f4-49a3-a8d2-9b42cdd668d6@github.com> On Sat, 1 Jul 2023 07:53:17 GMT, Swati Sharma wrote: > The below benchmark files have scaling issues due to cache contention and leads to poor scaling when run on multiple threads. The patch sets the scope from benchmark level to thread level to fix the issue: > - org/openjdk/bench/java/io/DataOutputStreamTest.java > - org/openjdk/bench/java/lang/ArrayCopyObject.java > - org/openjdk/bench/java/lang/ArrayFiddle.java > - org/openjdk/bench/java/time/format/DateTimeFormatterBench.java > - org/openjdk/bench/jdk/incubator/vector/IndexInRangeBenchmark.java > - org/openjdk/bench/jdk/incubator/vector/MemorySegmentVectorAccess.java > - org/openjdk/bench/jdk/incubator/vector/StoreMaskedBenchmark.java > - org/openjdk/bench/jdk/incubator/vector/StoreMaskedIOOBEBenchmark.java > - org/openjdk/bench/vm/compiler/ArrayFill.java > - org/openjdk/bench/vm/compiler/IndexVector.java > > Also removing the static scope for variables in org/openjdk/bench/jdk/incubator/vector/VectorFPtoIntCastOperations.java for better scaling. > > Please review and share your feedback. > > Thanks, > Swati This pull request has now been integrated. 
Changeset: a03a3a43 Author: Swati Sharma Committer: Sandhya Viswanathan URL: https://git.openjdk.org/jdk/commit/a03a3a43bb16ddc7df78f64e07db823224bde6fb Stats: 17 lines in 12 files changed: 0 ins; 0 del; 17 mod 8311178: JMH tests don't scale well when sharing output buffers Co-authored-by: Vladimir Ivanov Reviewed-by: ecaspole, sviswanathan, mli ------------- PR: https://git.openjdk.org/jdk/pull/14746 From kvn at openjdk.org Tue Jul 11 16:37:12 2023 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 11 Jul 2023 16:37:12 GMT Subject: RFR: 8275202: C2: optimize out more redundant conditions In-Reply-To: <978cgwy3Nb_x7yU6jZz0f6zhTBZfphstisAkBf1Vktc=.283d06eb-4f79-40cf-b8dd-a9c230e59902@github.com> References: <978cgwy3Nb_x7yU6jZz0f6zhTBZfphstisAkBf1Vktc=.283d06eb-4f79-40cf-b8dd-a9c230e59902@github.com> Message-ID: On Wed, 21 Jun 2023 12:47:26 GMT, Roland Westrelin wrote: > This change adds a new loop opts pass to optimize redundant conditions > such as the second one in: > > > if (i < 10) { > if (i < 42) { > > > In the branch of the first if, the type of i can be narrowed down to > [min_jint, 9] which can then be used to constant fold the second > condition. > > The compiler already keeps track of type[n] for every node in the > current compilation unit. That's not sufficient to optimize the > snippet above though because the type of i can only be narrowed in > some sections of the control flow (that is a subset of all > controls). The solution is to build a new table that tracks the type > of n at every control c > > > type'[n, root] = type[n] // initialized from igvn's type table > type'[n, c] = type[n, idom(c)] > > > This pass iterates over the CFG looking for conditions such as: > > > if (i < 10) { > > > that allows narrowing the type of i and updates the type' table > accordingly. > > At a region r: > > > type'[n, r] = meet(type'[n, r->in(1)], type'[n, r->in(2)]...) 
> > > For a Phi phi at a region r: > > > type'[phi, r] = meet(type'[phi->in(1), r->in(1)], type'[phi->in(2), r->in(2)]...) > > > Once a type is narrowed, uses are enqueued and their types are > computed by calling the Value() methods. If a use's type is narrowed, > it's recorded at c in the type' table. Value() methods retrieve types > from the type table, not the type' table. To address that issue while > leaving Value() methods unchanged, before calling Value() at c, the > type table is updated so: > > > type[n] = type'[n, c] > > > An exception is for Phi::Value which needs to retrieve the type of > nodes are various controls: there, a new type(Node* n, Node* c) > method is used. > > For most n and c, type'[n, c] is likely the same as type[n], the type > recorded in the global igvn table (that is there shouldn't be many > nodes at only a few control for which we can narrow the type down). As > a consequence, the types'[n, c] table is implemented with: > > - At c, narrowed down types are stored in a GrowableArray. Each entry > records the previous type at idom(c) and the narrowed down type at > c. > > - The GrowableArray of type updates is recorded in a hash table > indexed by c. If there's no update at c, there's no entry in the > hash table. > > This pass operates in 2 steps: > > - it first iterates over the graph looking for conditions that narrow > the types of some nodes and propagate type updates to uses until a > fix point. > > - it transforms the graph so newly found constant nodes are folded. > > > The new pass is run on every loop opts. There are a couple rea... 1. Yes, it is worth pursuing. This is obvious missing optimization opportunity. 2. Yes, not directly related changes could be pushed separately. Do you know why CCP phase can't help for simple cases (not in loop)? Can we fix CCP to handle simple cases? 
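For readers who want a concrete picture of the `type'[n, c]` side table described in the quoted text, here is a small sketch. It is not the proposed C2 pass: `NarrowingSketch`, `Range`, `Ctrl` and the string node names are hypothetical, and the lookup walks the dominator chain instead of recording the previous type per entry as the PR description says the real implementation does.

```java
import java.util.HashMap;
import java.util.Map;

class NarrowingSketch {
    /** Closed integer interval [lo, hi], standing in for an int type. */
    static final class Range {
        final int lo;
        final int hi;

        Range(int lo, int hi) {
            this.lo = lo;
            this.hi = hi;
        }

        Range intersect(Range o) {
            return new Range(Math.max(lo, o.lo), Math.min(hi, o.hi));
        }
    }

    /** A control point with its immediate dominator and the types narrowed at it. */
    static final class Ctrl {
        final Ctrl idom;
        final Map<String, Range> updates = new HashMap<>();

        Ctrl(Ctrl idom) {
            this.idom = idom;
        }
    }

    /** type'[n, c]: nearest update on the dominator chain, else the global type[n]. */
    static Range typeAt(String n, Ctrl c, Map<String, Range> global) {
        for (Ctrl k = c; k != null; k = k.idom) {
            Range r = k.updates.get(n);
            if (r != null) {
                return r;
            }
        }
        return global.get(n);
    }

    /** Records the narrowing implied by "n < bound" on its true branch (assumes bound > min_jint). */
    static void narrowLessThan(String n, int bound, Ctrl branch, Map<String, Range> global) {
        Range cur = typeAt(n, branch, global);
        branch.updates.put(n, cur.intersect(new Range(Integer.MIN_VALUE, bound - 1)));
    }

    /** True if "n < bound" is known to hold at c, i.e. the condition can be folded. */
    static boolean alwaysLessThan(String n, int bound, Ctrl c, Map<String, Range> global) {
        return typeAt(n, c, global).hi < bound;
    }
}
```

In the `if (i < 10) { if (i < 42) {` example, narrowing at the first branch gives `i` the range [min_jint, 9] there, so the second test folds to true inside that branch, while the global type of `i` is left untouched.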
------------- PR Comment: https://git.openjdk.org/jdk/pull/14586#issuecomment-1631140471 From kvn at openjdk.org Tue Jul 11 17:42:02 2023 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 11 Jul 2023 17:42:02 GMT Subject: RFR: 8311691: C2: Remove legacy code related to PostLoopMultiversioning In-Reply-To: References: Message-ID: On Tue, 11 Jul 2023 09:56:51 GMT, Pengfei Li wrote: > As discussed in JDK-8308994, we are working on re-implementation of post loop vectorization and planning to refactor current SuperWord code. As nobody is using or maintaining the old implementation now, to make the refactoring work easier, we propose to remove the legacy code of the old implementation first. > > This patch removes all code related to `PostLoopMultiversioning` inside and outside SuperWord. After the removal, `SLP_extract()` in SuperWord should only work on main loops. So we also removed all `is_main_loop()` checks inside and added assertions instead. > > Tested with hotspot::hotspot_all_no_apps, jdk tier1~3, langtools tier1 and 100k fuzzer tests on x86 and AArch64, no issue is found. Good. Thank you for cleaning this. ------------- Marked as reviewed by kvn (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/14824#pullrequestreview-1524814976 From duke at openjdk.org Tue Jul 11 20:20:30 2023 From: duke at openjdk.org (Chad Rakoczy) Date: Tue, 11 Jul 2023 20:20:30 GMT Subject: RFR: 8311813: C1: Uninitialized PhiResolver::_loop field Message-ID: [JDK-8311813](https://bugs.openjdk.org/browse/JDK-8311813) Initialize `PhiResolver::_loop` field to `nullptr` Additional testing: - [x] Linux x86_64 fastdebug `tier2` - [x] Linux x86_64 release `tier2` - [x] Linux x86_64 fastdebug `gtest:all` - [x] Linux x86_64 release `gtest:all` - [x] Linux x86_64 fastdebug `test/hotspot/jtreg/compiler/c1` - [x] Linux x86_64 release `test/hotspot/jtreg/compiler/c1` ------------- Commit messages: - 8311813: C1: Initialize PhiResolver::_loop field to nullptr Changes: https://git.openjdk.org/jdk/pull/14819/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=14819&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8311813 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/14819.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14819/head:pull/14819 PR: https://git.openjdk.org/jdk/pull/14819 From pli at openjdk.org Wed Jul 12 02:04:59 2023 From: pli at openjdk.org (Pengfei Li) Date: Wed, 12 Jul 2023 02:04:59 GMT Subject: RFR: 8311691: C2: Remove legacy code related to PostLoopMultiversioning In-Reply-To: References: Message-ID: On Tue, 11 Jul 2023 09:56:51 GMT, Pengfei Li wrote: > As discussed in JDK-8308994, we are working on re-implementation of post loop vectorization and planning to refactor current SuperWord code. As nobody is using or maintaining the old implementation now, to make the refactoring work easier, we propose to remove the legacy code of the old implementation first. > > This patch removes all code related to `PostLoopMultiversioning` inside and outside SuperWord. After the removal, `SLP_extract()` in SuperWord should only work on main loops.
So we also removed all `is_main_loop()` checks inside and added assertions instead. > > Tested with hotspot::hotspot_all_no_apps, jdk tier1~3, langtools tier1 and 100k fuzzer tests on x86 and AArch64, no issue is found. Thanks Vladimir for approving this. May I have another review? ------------- PR Comment: https://git.openjdk.org/jdk/pull/14824#issuecomment-1631737551 From thartmann at openjdk.org Wed Jul 12 05:28:54 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 12 Jul 2023 05:28:54 GMT Subject: RFR: 8308103: Massive (up to ~30x) increase in C2 compilation time since JDK 17 In-Reply-To: References: Message-ID: On Fri, 30 Jun 2023 13:23:38 GMT, Roland Westrelin wrote: > A long chain of nodes are sunk out of a loop. Every time a node is > moved out of the loop, a cast is created to pin the node out of the > loop. When its input is next sunk, the cast is removed (the cast is > replaced by its input) and a new cast is created. Some nodes on the > chain have 2 other nodes in the chain as uses. When such a node is > sunk, 2 cast nodes are created, one for each use. So as the compiler > moves forward in the chain, the number of cast to remove grows. From > some profiling, removing those casts is what takes a lot of time. > > The fix I propose is, when a node is processed, to check whether a > cast at the out of loop control was already created for that node and > to reuse it. > > The test case takes 6 minutes when I run it without the fix and 3 > seconds with it. The fix looks good to me but I still have a hard time to comprehend that this leads to a 30x increase in compilation time. And I'm worried that we have similar issues in other code. As a follow-up, could we have `PhaseIdealLoop::register_new_node` check the hash and return an existing node? ------------- Marked as reviewed by thartmann (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/14732#pullrequestreview-1525540555 From thartmann at openjdk.org Wed Jul 12 05:29:57 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 12 Jul 2023 05:29:57 GMT Subject: RFR: 8311813: C1: Uninitialized PhiResolver::_loop field In-Reply-To: References: Message-ID: <8VXeeyMEBms5t-JL05DoHghpBdrUkrG0b3ALB2dWf98=.a0962e45-2e39-42bb-8f2c-29ebd975dbe9@github.com> On Mon, 10 Jul 2023 23:47:56 GMT, Chad Rakoczy wrote: > [JDK-8311813](https://bugs.openjdk.org/browse/JDK-8311813) > > Initialize `PhiResolver::_loop` field to `nullptr` > > Additional testing: > - [x] Linux x86_64 fastdebug `tier2` > - [x] Linux x86_64 release `tier2` > - [x] Linux x86_64 fastdebug `gtest:all` > - [x] Linux x86_64 release `gtest:all` > - [x] Linux x86_64 fastdebug `test/hotspot/jtreg/compiler/c1` > - [x] Linux x86_64 release `test/hotspot/jtreg/compiler/c1` Looks good and trivial. ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/14819#pullrequestreview-1525541458 From shade at openjdk.org Wed Jul 12 07:30:04 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 12 Jul 2023 07:30:04 GMT Subject: RFR: 8311813: C1: Uninitialized PhiResolver::_loop field In-Reply-To: References: Message-ID: On Mon, 10 Jul 2023 23:47:56 GMT, Chad Rakoczy wrote: > [JDK-8311813](https://bugs.openjdk.org/browse/JDK-8311813) > > Initialize `PhiResolver::_loop` field to `nullptr` > > Additional testing: > - [x] Linux x86_64 fastdebug `tier2` > - [x] Linux x86_64 release `tier2` > - [x] Linux x86_64 fastdebug `gtest:all` > - [x] Linux x86_64 release `gtest:all` > - [x] Linux x86_64 fastdebug `test/hotspot/jtreg/compiler/c1` > - [x] Linux x86_64 release `test/hotspot/jtreg/compiler/c1` Marked as reviewed by shade (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/14819#pullrequestreview-1525692733 From gcao at openjdk.org Wed Jul 12 08:45:34 2023 From: gcao at openjdk.org (Gui Cao) Date: Wed, 12 Jul 2023 08:45:34 GMT Subject: RFR: 8311923: TestIRMatching.java fails on RISC-V Message-ID: Hi, we are experiencing test failures in test/hotspot/jtreg/testlibrary_tests/ir_framework/tests/TestIRMatching.java using fastdebug: One or more @IR rules failed: Failed IR Rules (1) of Methods (1) ---------------------------------- 1) Method "public java.lang.Object[] ir_framework.tests.CheckCastArray.arrayCopy(java.lang.Object[],java.lang.Class)" - [Failed IR rules: 1]: * @IR rule 1: "@compiler.lib.ir_framework.IR(phase={DEFAULT}, applyIfCPUFeatureAnd={}, applyIf={}, applyIfCPUFeatureOr={}, applyIfCPUFeature={}, counts={}, failOn={"_#CHECKCAST_ARRAYCOPY#_"}, applyIfAnd={}, applyIfOr={}, applyIfNot={})" > Phase "PrintOptoAssembly": - failOn: Graph contains forbidden nodes: * Constraint 1: "(.*((?i:call_leaf_nofp,runtime)|CALL,\\s?runtime leaf nofp|BCTRL.*.leaf call).*checkcast_arraycopy.*)" - Matched forbidden node: * 15a + CALL, runtime leaf nofp 0x0000003f7fbd9600 #@CallLeafNoFPDirect checkcast_arraycopy >>> Check stdout for compilation output of the failed methods After troubleshooting, the problem is related to the definition of the matching rules for the CHECKCAST_ARRAY, CHECKCAST_ARRAY_OF fields in test/hotspot/jtreg/compiler/lib/ir_framework/IRNode.java: public static final String CHECKCAST_ARRAY = PREFIX + "CHECKCAST_ARRAY" + POSTFIX; static { String regex = "(((?i:cmp|CLFI|CLR).*precise \[.*:|.*(?i:mov|or).*precise \[.*:.*\\R.*(cmp|CMP|CLR))" + END; optoOnly(CHECKCAST_ARRAY, regex); } public static final String CHECKCAST_ARRAY_OF = COMPOSITE_PREFIX + "CHECKCAST_ARRAY_OF" + POSTFIX; static { String regex = "(((?i:cmp|CLFI|CLR).*precise \[.*" + IS_REPLACED + ":|.*(?i:mov|or).*precise \[.*" + IS_REPLACED + ":.*\\R.*(cmp|CMP|CLR))" + END; optoOnly(CHECKCAST_ARRAY_OF, regex); } 
This rule is used to match the Opto compilation log of the `array()/arrayCopy()` method of the subclass CheckCastArray in test/hotspot/jtreg/testlibrary_tests/ir_framework/tests/TestIRMatching.java. If the match does not work, the above test fails. The `array()` part of the Opto log for CheckCastArray on the aarch64 platform is as follows: 06c B2: # out( B5 B3 ) <- in( B1 ) Freq: 0.999999 06c + mov R12, narrowklass: precise [ir_framework/tests/MyClass: 0x0000ffff58420808 * (java/lang/Cloneable,java/io/Serializable): :Constant:exact * # compressed klass ptr 074 + cmp R10, R12 // compressed ptr 078 bne B5 # unsigned P=0.100000 C=-1.000000 The `array()` part of the Opto log for CheckCastArray on the riscv platform is as follows: 054 B2: # out( B5 B3 ) <- in( B1 ) Freq: 0.999999 054 + mv R29, narrowklass: precise [ir_framework/tests/MyClass: 0x0000003f34437df8 * (java/lang/Cloneable,java/io/Serializable): :Constant:exact * # compressed klass ptr, #@loadConNKlass 062 + bne R7, R29, B5 #@cmpN_branch P=0.100000 C=-1.000000 From the above Opto log, we can see that the match rule of CHECKCAST_ARRAY in test/hotspot/jtreg/compiler/lib/ir_framework/IRNode.java can be matched with aarch64. However, on the riscv platform, since the mv instruction is generated, it does not correspond to the matching rule, so the test case fails. To solve this problem, we modified the match rules defined by CHECKCAST_ARRAY, CHECKCAST_ARRAY_OF in test/hotspot/jtreg/compiler/lib/ir_framework/IRNode.java to include the mv instruction in the match rules.
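The mismatch described above can be reproduced with plain `java.util.regex`. The sketch below uses a simplified stand-in for the second alternative of the CHECKCAST_ARRAY rule (the real rule has a further alternative and an `END` suffix that are left out here) and shortened copies of the two log excerpts; the class and method names are made up for illustration.

```java
import java.util.regex.Pattern;

class CheckCastRegexSketch {
    // Simplified form of the existing rule: accepts "mov" or "or" before "precise [".
    static final Pattern WITHOUT_MV =
        Pattern.compile(".*(?i:mov|or).*precise \\[.*:.*\\R.*(cmp|CMP|CLR)");

    // Same rule with "mv" added, as the proposed change does for RISC-V.
    static final Pattern WITH_MV =
        Pattern.compile(".*(?i:mov|mv|or).*precise \\[.*:.*\\R.*(cmp|CMP|CLR)");

    // Shortened excerpts of the Opto logs quoted in the mail.
    static final String AARCH64 =
        "06c + mov R12, narrowklass: precise [ir_framework/tests/MyClass: 0x0000ffff58420808 # compressed klass ptr\n"
        + "074 + cmp R10, R12 // compressed ptr";

    static final String RISCV =
        "054 + mv R29, narrowklass: precise [ir_framework/tests/MyClass: 0x0000003f34437df8 # compressed klass ptr, #@loadConNKlass\n"
        + "062 + bne R7, R29, B5 #@cmpN_branch P=0.100000 C=-1.000000";

    /** True if the pattern occurs anywhere in the log excerpt. */
    static boolean matches(Pattern rule, String log) {
        return rule.matcher(log).find();
    }
}
```

Note that on RISC-V the second line only satisfies the `(cmp|CMP|CLR)` part because of the `#@cmpN_branch` annotation, so the rule depends on that annotation staying in the PrintOptoAssembly output.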
Same problem with ALLOC_ARRAY, ALLOC_ARRAY_OF in test/hotspot/jtreg/compiler/lib/ir_framework/IRNode.java public static final String ALLOC_ARRAY = PREFIX + "ALLOC_ARRAY" + POSTFIX; static { String optoRegex = "(.*precise \[.*\\R((.*(?i:mov|xor|nop|spill).*|\\s*|.*(LGHI|LI).*)\\R)*.*(?i:call,static).*wrapper for: _new_array_Java" + END; allocNodes(ALLOC_ARRAY, "AllocateArray", optoRegex); } public static final String ALLOC_ARRAY_OF = COMPOSITE_PREFIX + "ALLOC_ARRAY_OF" + POSTFIX; static { String regex = "(.*precise \[.*" + IS_REPLACED + ":.*\\R((.*(?i:mov|xorl|nop|spill).*|\\s*|.*(LGHI|LI).*)\\R)*.*(?i:call,static).*wrapper for: _new_array_Java" + END; optoOnly(ALLOC_ARRAY_OF, regex); } This rule is used to match the compilation log of the `allocArray()` method of the subclass AllocArray in test/hotspot/jtreg/testlibrary_tests/ir_framework/tests/TestIRMatching.java, and if it doesn't match it also fails. test fails. The `allocArray()` part of the Opto log for AllocArray on the aarch64 platform is as follows: 1a8 B14: # out( B17 B15 ) <- in( B1 ) Freq: 0.000100017 1a8 + mov R1, precise [ir_framework/tests/MyClass: 0x0000ffff244beb58 * (java/lang/Cloneable,java/io/Serializable): :Constant:exact * # ptr 1b4 call,static 0x0000ffff6fb9c540 // ==> wrapper for: _new_array_Java # ir_framework.tests.AllocArray::allocArray @ bci:2 (line 930) L[0]=_ STK[0]=R29 # OopMap {rfp=Oop off=440/0x1b8} The `allocArray()` part of the Opto log for AllocArray on the riscv platform is as follows: 16a B14: # out( B17 B15 ) <- in( B1 ) Freq: 0.000100017 16a + mv R11, precise [ir_framework/tests/MyClass: 0x00007fff4c34c018 * (java/lang/Cloneable,java/io/Serializable): :Constant:exact * # ptr, #@loadConP 182 + li R12, #2 # int, #@loadConI 184 CALL,static 0x00007fff9bb74840 #@CallStaticJavaDirect wrapper for: _new_array_Java # ir_framework.tests.AllocArray::allocArray @ bci:2 (line 930) L[0]=_ STK[0]=R8 # OopMap {fp=Oop off=392/0x188} As we can see in the above logs, the aarch64 log matches the 
empty line under precise, which also matches the ALLOC_ARRAY_OF rule in test/hotspot/jtreg/compiler/lib/ir_framework/IRNode.java, but in the riscv log. precise is followed by the li, which doesn't match the rule and therefore also causes the test case to fail. To solve this problem, we firstly modified the loadConI, loadConL nodes in riscv.ad, and changed the log message of li into mv, and secondly, we modified the ALLOC_ARRAY and ALLOC_ARRAY_OF match rule in test/hotspot/jtreg/compiler/lib/ir_framework/IRNode.java, and added the mv directive to the match rule. ## Testing: qemu system and unmatched board: - [x] test/hotspot/jtreg/testlibrary_tests/ir_framework/tests/TestIRMatching.java (fastdebug) ------------- Commit messages: - 8311923: TestIRMatching.java fails on RISC-V Changes: https://git.openjdk.org/jdk/pull/14848/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=14848&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8311923 Stats: 9 lines in 2 files changed: 0 ins; 0 del; 9 mod Patch: https://git.openjdk.org/jdk/pull/14848.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14848/head:pull/14848 PR: https://git.openjdk.org/jdk/pull/14848 From cslucas at openjdk.org Wed Jul 12 15:12:23 2023 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Wed, 12 Jul 2023 15:12:23 GMT Subject: RFR: JDK-8287061: Support for rematerializing scalar replaced objects participating in allocation merges [v21] In-Reply-To: References: <7nqFW-lgT1FzuMHPMUQiCj1ATcV_bQtroolf4V_kCc4=.ccd12605-aad0-433e-ba44-5772d972f05d@github.com> Message-ID: On Thu, 6 Jul 2023 13:06:30 GMT, Cesar Soares Lucas wrote: >> Can I please get reviews for this PR? >> >> The most common and frequent use of NonEscaping Phis merging object allocations is for debugging information. The two graphs below show numbers for Renaissance and DaCapo benchmarks - similar results are obtained for all other applications that I tested. 
>> >> With what frequency does each IR node type occurs as an allocation merge user? I.e., if the same node type uses a Phi N times the counter is incremented by N: >> >> ![image](https://user-images.githubusercontent.com/2249648/222280517-4dcf5871-2564-4207-b49e-22aee47fa49d.png) >> >> What are the most common users of allocation merges? I.e., if the same node type uses a Phi N times the counter is incremented by 1: >> >> ![image](https://user-images.githubusercontent.com/2249648/222280608-ca742a4e-1622-4e69-a778-e4db6805ea02.png) >> >> This PR adds support scalar replacing allocations participating in merges used as debug information OR as a base for field loads. I plan to create subsequent PRs to enable scalar replacement of merges used by other node types (CmpP is next on the list) subsequently. >> >> The approach I used for _rematerialization_ is pretty straightforward. It consists basically of the following. 1) New IR node (suggested by V. Kozlov), named SafePointScalarMergeNode, to represent a set of SafePointScalarObjectNode; 2) Each scalar replaceable input participating in a merge will get a SafePointScalarObjectNode like if it weren't part of a merge. 3) Add a new Class to support the rematerialization of SR objects that are part of a merge; 4) Patch HotSpot to be able to serialize and deserialize debug information related to allocation merges; 5) Patch C2 to generate unique types for SR objects participating in some allocation merges. >> >> The approach I used for _enabling the scalar replacement of some of the inputs of the allocation merge_ is also pretty straightforward: call `MemNode::split_through_phi` to, well, split AddP->Load* through the merge which will render the Phi useless. >> >> I tested this with JTREG tests tier 1-4 (Windows, Linux, and Mac) and didn't see regression. I also experimented with several applications and didn't see any failure. 
I also ran tests with "-ea -esa -Xbatch -Xcomp -XX:+UnlockExperimentalVMOptions -XX:-TieredCompilation -server -XX:+IgnoreUnrecognizedVMOptions -XX:+UnlockDiagnosticVMOptions -XX:+StressLCM -XX:+StressGCM -XX:+StressCCP" and didn't observe any related failures. > > Cesar Soares Lucas has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 22 commits: > > - Merge branch 'openjdk:master' into rematerialization-of-merges > - Addressing PR feedback. > - Merge remote-tracking branch 'origin/master' into rematerialization-of-merges > - Merge branch 'openjdk:master' into rematerialization-of-merges > - Rome minor refactorings. > - Merge remote-tracking branch 'origin/master' into rematerialization-of-merges > Catching up with master. > - Address PR review 6: debug format output & some refactoring. > - Catching up with master branch. > > Merge remote-tracking branch 'origin/master' into rematerialization-of-merges > - Address PR review 6: refactoring around rematerialization & improve test cases. > - Address PR review 5: refactor on rematerialization & add tests. > - ... and 12 more: https://git.openjdk.org/jdk/compare/97e99f01...25b683d6 Thank you all for reviewing this PR! Your feedback made it much better! ------------- PR Comment: https://git.openjdk.org/jdk/pull/12897#issuecomment-1632721475 From epeter at openjdk.org Wed Jul 12 15:24:43 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 12 Jul 2023 15:24:43 GMT Subject: RFR: 8310308: IR Framework: check for type and size of vector nodes [v23] In-Reply-To: References: Message-ID: > For some changes to `SuperWord`, and maybe auto-vectorization in general, I want to strengthen the IR Framework. > > **Motivation** > I want to not just find the relevant IR nodes, but also assert that they have the maximal length that they could have on the respective platform (given the CPU features and `MaxVectorSize`). 
Without this verification it is possible that a future change leads to a regression where we still vectorize but at shorter vector widths as before - leading to performance loss. > > **How to use it** > > All `IRNode`s in `test/hotspot/jtreg/compiler/lib/ir_framework/IRNode.java` that are created with `vectorNode` are now all matched with their `type` and `size`. The regex might now look something like this: > > `"(\d+(\s){2}(VectorCastF2X.*)+(\s){2}===.*vector[A-Za-z][8]:{int})"` > which would match with IR nodes dumped like that: > `1150 VectorCastF2X === _ 1151 [[ 1146 ]] #vectory[8]:{int} ...` > > The goal was to keep it simple and straight forward. In most cases, you can just use the nodes as before, and implicitly we now check for maximal size automatically. However, in some cases we want to ensure there is no or only a limited number of nodes (`failOn` or comparison `<` or `<=` or `=0`) - in those cases we usually want to make sure there is not any node of any size, so we match with any size by default. The size can also explicitly be constrained using `IRNode.VECTOR_SIZE`. > > Some examples: > 1. `@IR(counts = {IRNode.LOAD_VECTOR_I, " >0 "})` -> search for a `LoadVector` node with `type` `int`, and maximal `size` possible on the machine (limited by CPU features and `MaxVectorSize`). This is the most common use case. > 2. `@IR(failOn = { IRNode.LOAD_VECTOR_L, IRNode.STORE_VECTOR })` -> fail if there is a `LoadVector` with type `long`, of `any` size. > 3. `@IR(counts = { IRNode.XOR_VI, IRNode.VECTOR_SIZE_4, " > 0 "})` -> find at least one `XorV` node with type `int` and exactly `4` elements. Useful for VectorAPI when the vector species is fixed. > 4. `@IR(counts = { IRNode.LOAD_VECTOR_D, IRNode.VECTOR_SIZE + "min(4, max_double)", " >0 " })` -> search for a `LoadVector` node with `type` `double`, and `size` exactly equals to `min(4, max_double)` (so 4 elements, or if the hardware allows fewer `doubles`, then that number). > 5. 
`@IR(counts = { IRNode.ABS_VF, IRNode.VECTOR_SIZE + "min(LoopMaxUnroll, max_float)", ">= 1" })` -> find at least one `AbsV` nodes with type `float`, and the `size` exactly equals to the smaller of `LoopMaxUnroll` or the maximal size allow... Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 67 commits: - Merge branch 'master' into JDK-8310308 - Fix with canTrustVectorSize for Cascade Lake - TestSpillTheBeans.java - print VMInfo from Test VM - merge from master, manual merge for VectorLogicalOpIdentityTest.java - Response to Tobias' review - more for Christian's reviews - Apply suggestions from code review Co-authored-by: Christian Hagedorn - fix TestUnorderedReductionPartialVectorization.java - Fix 2 IR framework tests - ... and 57 more: https://git.openjdk.org/jdk/compare/aa7367f1...78c3c5cd ------------- Changes: https://git.openjdk.org/jdk/pull/14539/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=14539&range=22 Stats: 3409 lines in 67 files changed: 1336 ins; 21 del; 2052 mod Patch: https://git.openjdk.org/jdk/pull/14539.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14539/head:pull/14539 PR: https://git.openjdk.org/jdk/pull/14539 From roland at openjdk.org Wed Jul 12 15:57:14 2023 From: roland at openjdk.org (Roland Westrelin) Date: Wed, 12 Jul 2023 15:57:14 GMT Subject: RFR: 8275202: C2: optimize out more redundant conditions In-Reply-To: References: <978cgwy3Nb_x7yU6jZz0f6zhTBZfphstisAkBf1Vktc=.283d06eb-4f79-40cf-b8dd-a9c230e59902@github.com> Message-ID: On Tue, 11 Jul 2023 16:34:15 GMT, Vladimir Kozlov wrote: > Do you know why CCP phase can't help for simple cases (not in loop)? Can we fix CCP to handle simple cases? CCP and IGVN keep track of types in a global table. 
In the example above: if (i < 10) { if (i < 42) { `i`'s type is `[min, 9]` only in the if branch so that `[min,9]` can't be stored in a global table that contains types that must be valid everywhere in the IR graph. The way c2 handles control dependent types is with cast nodes. The change I propose is equivalent to having cast nodes on every projection of every `CmpI`/`CmpU`/`CmpL`/`CmpL` for both inputs of the cmp node and then have logic in igvn to narrow the type of the cast. Instead of using cast nodes, that change keeps track of types in side tables. I think it's much better than trying to make things work with cast nodes because it's a lot less invasive and disruptive to the rest of c2. It's a single new standalone pass contained in a couple files that runs during loop opts. What's your opinion on the compile time overhead for this? Do you find the number I mention too high? ------------- PR Comment: https://git.openjdk.org/jdk/pull/14586#issuecomment-1632797335 From qamai at openjdk.org Wed Jul 12 16:08:32 2023 From: qamai at openjdk.org (Quan Anh Mai) Date: Wed, 12 Jul 2023 16:08:32 GMT Subject: RFR: 8282365: Optimize divideUnsigned and remainderUnsigned for constants [v16] In-Reply-To: References: Message-ID: On Mon, 3 Jul 2023 10:35:06 GMT, Emanuel Peter wrote: >> Quan Anh Mai has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 50 commits: >> >> - missing java_negate >> - Merge branch 'master' into unsignedDiv >> - whitespace >> - move asserts to use sites >> - windows complaints >> - compiler complaints >> - undefined internal linkage >> - add tests, special casing large shift >> - draft >> - Merge branch 'master' into unsignedDiv >> - ... 
and 40 more: https://git.openjdk.org/jdk/compare/5b147eb5...eb1f5dd9 > > src/hotspot/share/opto/divnode.cpp line 39: > >> 37: #include "utilities/powerOfTwo.hpp" >> 38: >> 39: // Portions of code courtesy of Clifford Click > > Not sure if this line should be removed? I will revert that line > src/hotspot/share/opto/divnode.cpp line 188: > >> 186: max_dividend = max_juint; >> 187: } >> 188: if (julong(magic_const) <= max_julong / max_dividend) { > > Could `max_dividend` ever be `zero`? I guess only if the dividend was exactly `zero`, in which case we should probably not end up here, or is that somehow possible? I will change this to `max_dividend == 0 || julong(magic_const) <= max_julong / max_dividend` to be safe. > src/hotspot/share/opto/divnode.cpp line 191: > >> 189: // No overflow here, just do the transformation >> 190: if (shift_const == 32) { >> 191: q = phase->intcon(0); > > Would it not be nicer to handle this special case directly in the `URShiftLNode`? Just replace it during `Value` with zero, if the shift constant is too large. No, because `x >> 64 == x` in int64 arithmetic, while the semantics we need here is integer arithmetic, so we need special handling for this case. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/9947#discussion_r1261396413 PR Review Comment: https://git.openjdk.org/jdk/pull/9947#discussion_r1261400137 PR Review Comment: https://git.openjdk.org/jdk/pull/9947#discussion_r1261405044 From qamai at openjdk.org Wed Jul 12 16:14:23 2023 From: qamai at openjdk.org (Quan Anh Mai) Date: Wed, 12 Jul 2023 16:14:23 GMT Subject: RFR: 8282365: Optimize divideUnsigned and remainderUnsigned for constants [v16] In-Reply-To: References: Message-ID: On Mon, 3 Jul 2023 12:22:53 GMT, Emanuel Peter wrote: >> src/hotspot/share/opto/divnode.cpp line 441: >> >>> 439: jlong magic_const; >>> 440: jint shift_const; >>> 441: bool magic_const_ovf; >> >> `does_magic_const_overflow` Would that work too? 
> > I'm not sure exactly what this boolean means, and it is making it difficult to understand the logic below The constant is a u65, this boolean indicates whether the constant does not fit into a u64. We need to compute `[x * M / 2**s]`. If `M` is not a u64 then it is `M1 + 2**64`, which results in the mathematical transformation I wrote below around line 465. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/9947#discussion_r1261413399 From qamai at openjdk.org Wed Jul 12 16:20:35 2023 From: qamai at openjdk.org (Quan Anh Mai) Date: Wed, 12 Jul 2023 16:20:35 GMT Subject: RFR: 8282365: Optimize divideUnsigned and remainderUnsigned for constants [v16] In-Reply-To: References: Message-ID: On Mon, 3 Jul 2023 12:25:04 GMT, Emanuel Peter wrote: >> Quan Anh Mai has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 50 commits: >> >> - missing java_negate >> - Merge branch 'master' into unsignedDiv >> - whitespace >> - move asserts to use sites >> - windows complaints >> - compiler complaints >> - undefined internal linkage >> - add tests, special casing large shift >> - draft >> - Merge branch 'master' into unsignedDiv >> - ... and 40 more: https://git.openjdk.org/jdk/compare/5b147eb5...eb1f5dd9 > > src/hotspot/share/opto/divnode.cpp line 462: > >> 460: } >> 461: >> 462: // Just do the minimum for now > > Minimum of what? Not sure what you mean I mean we can do better, we are trying to prove that the multiplication does not overflow a u128, since the constant is a u65 if the dividend is a u63 then the inequality holds, we can have a more strict bound since the constant is known already. > src/hotspot/share/opto/divnode.cpp line 931: > >> 929: const Type* t = phase->type(in(2)); >> 930: if(t == TypeInt::ONE) { // Identity? >> 931: return nullptr; // Skip it > > Does `Value` handle this? `x / 1 = x` so I don't think `Value` can handle it.
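The case where the magic constant is one bit wider than the word (the u65 constant mentioned above) has a 32-bit analogue that is easy to check in Java. The sketch below is not taken from the patch: it hard-codes unsigned division by 7, where the magic constant `M = ceil(2**35 / 7)` needs 33 bits, and splits it as `M = M1 + 2**32`, so that `x * M / 2**35 = ((x * M1) / 2**32 + x) / 2**3`. The names `M1`, `SHIFT` and `divideBy7` are hypothetical.

```java
class MagicDivisionSketch {
    // ceil(2^35 / 7) = 0x1_2492_4925 does not fit in 32 bits; M1 is the part below 2^32.
    static final long M1 = 0x24924925L;
    static final int SHIFT = 35;

    /** Unsigned x / 7 computed with a multiply, an add and shifts only. */
    static int divideBy7(int x) {
        long ux = x & 0xFFFFFFFFL;          // dividend as an unsigned 32-bit value
        long high = (ux * M1) >>> 32;       // (x * M1) / 2^32; the product is below 2^62
        long sum = high + ux;               // the "+ x" accounts for the missing 2^32 term
        return (int) (sum >>> (SHIFT - 32)); // divide by the remaining 2^(s - 32)
    }
}
```

Using `long` for the intermediate sum sidesteps the overflow that a 32-bit implementation has to handle explicitly; the 64-bit case discussed in the review has the same structure with 2**64 in place of 2**32 and needs a 128-bit product.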
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/9947#discussion_r1261418367 PR Review Comment: https://git.openjdk.org/jdk/pull/9947#discussion_r1261422585 From qamai at openjdk.org Wed Jul 12 16:20:36 2023 From: qamai at openjdk.org (Quan Anh Mai) Date: Wed, 12 Jul 2023 16:20:36 GMT Subject: RFR: 8282365: Optimize divideUnsigned and remainderUnsigned for constants [v16] In-Reply-To: References: Message-ID: On Wed, 12 Jul 2023 16:15:17 GMT, Quan Anh Mai wrote: >> src/hotspot/share/opto/divnode.cpp line 462: >> >>> 460: } >>> 461: >>> 462: // Just do the minimum for now >> >> Minimum of what? Not sure what you mean > > I mean we can do better, we are trying to prove that the multiplication does not overflow a u128, since the constant is a u65 if the dividend is a u63 then the inequality holds, we can have a more strict bound since the constant is known already. I will add more clarification to this in the comment, basically the operation is `x * (M1 + 2**64) / 2**s = ((x * M1) / 2**64 + x) / 2**(s - 64)`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/9947#discussion_r1261420568 From qamai at openjdk.org Wed Jul 12 16:28:29 2023 From: qamai at openjdk.org (Quan Anh Mai) Date: Wed, 12 Jul 2023 16:28:29 GMT Subject: RFR: 8282365: Optimize divideUnsigned and remainderUnsigned for constants [v16] In-Reply-To: References: Message-ID: On Mon, 3 Jul 2023 12:28:49 GMT, Emanuel Peter wrote: >> Quan Anh Mai has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 50 commits: >> >> - missing java_negate >> - Merge branch 'master' into unsignedDiv >> - whitespace >> - move asserts to use sites >> - windows complaints >> - compiler complaints >> - undefined internal linkage >> - add tests, special casing large shift >> - draft >> - Merge branch 'master' into unsignedDiv >> - ... 
and 40 more: https://git.openjdk.org/jdk/compare/5b147eb5...eb1f5dd9 > > src/hotspot/share/opto/divnode.cpp line 906: > >> 904: } >> 905: >> 906: // TODO: Improve Value inference of both signed and unsigned division > > Did you miss a `TODO` here? Currently, we only do `Value` for constant folding, we can restrict the value range of the division in a more rigorous way, e.g `min_jint / 3 < x / y < max_jint / 3` if `y > 3`. > src/hotspot/share/opto/divnode.cpp line 1419: > >> 1417: } >> 1418: juint con = ti->get_con(); >> 1419: const Type* u = phase->type(in(1)); > > This is a constant foldable bailout? Why do you do it earlier here? > > Generally, I'm starting to wonder if all this code duplication makes sense in all the `Ideal` methods? Yes we bailout to avoid having to do all the calculations. Let me think how to avoid all these duplications. > src/hotspot/share/opto/divnode.cpp line 1428: > >> 1426: return new AndINode(in(1), phase->intcon(con - 1)); >> 1427: } >> 1428: // TODO: This can be calculated directly, see https://arxiv.org/abs/1902.01961 > > Stranded `TODO`? A modulus can be calculated directly without going through the division transformation. Which reduces the cost a little bit. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/9947#discussion_r1261426552 PR Review Comment: https://git.openjdk.org/jdk/pull/9947#discussion_r1261428309 PR Review Comment: https://git.openjdk.org/jdk/pull/9947#discussion_r1261430069 From sviswanathan at openjdk.org Wed Jul 12 16:56:55 2023 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Wed, 12 Jul 2023 16:56:55 GMT Subject: RFR: 8311691: C2: Remove legacy code related to PostLoopMultiversioning In-Reply-To: References: Message-ID: On Tue, 11 Jul 2023 09:56:51 GMT, Pengfei Li wrote: > As discussed in JDK-8308994, we are working on re-implementation of post loop vectorization and planning to refactor current SuperWord code. 
As nobody is using or maintaining the old implementation now, to make the refactoring work easier, we propose to remove the legacy code of the old implementation first. > > This patch removes all code related to `PostLoopMultiversioning` inside and outside SuperWord. After the removal, `SLP_extract()` in SuperWord should only work on main loops. So we also removed all `is_main_loop()` checks inside and added assertions instead. > > Tested with hotspot::hotspot_all_no_apps, jdk tier1~3, langtools tier1 and 100k fuzzer tests on x86 and AArch64, no issue is found. Looks good to me as well. ------------- Marked as reviewed by sviswanathan (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/14824#pullrequestreview-1526807176 From qamai at openjdk.org Wed Jul 12 17:50:13 2023 From: qamai at openjdk.org (Quan Anh Mai) Date: Wed, 12 Jul 2023 17:50:13 GMT Subject: RFR: 8282365: Optimize divideUnsigned and remainderUnsigned for constants [v15] In-Reply-To: References: Message-ID: On Mon, 3 Jul 2023 13:55:48 GMT, Emanuel Peter wrote: >> May I have a second review for this patch, please? > > @merykitty I just discussed the testing with @TobiHartmann. He just came across this test: > `test/hotspot/jtreg/compiler/c2/TestUnsignedByteCompare1.java`. > The cool thing is that you can "simulate" constants with `MethodHandles.constant`. At runtime apparently the invocation speculate-and-traps it to a constant value. That means you can just set a new value, it deopts, and hopefully eventually re-compiles with the next constants. > > You could easily set up one of these tests per node. And maybe throw in some interesting ranges for the `dividend`. > > An interesting experiment would be to have an IR test that works with a random constant, and then have an IR rule that fails if we find a `div` node. At least for those cases where that should work. And then you can easily compare the div results with a non-compiled method that computes the same value.
@eme64 Thanks a lot for taking a look at this patch, I will address your remaining comments soon. The basic idea of the transformation in `javaArithmetic.hpp` is to find `M` and `s` such that `x / c = floor(x * M / 2**s)` for every interesting value of `x`. The remaining transformation in `divnode.cpp` is to convert this calculation from integer arithmetic to modular arithmetic. This is easy if the representative in the congruence class of an operand is always equal to itself, in which case we can do the calculation directly. For other cases, we have to do additional calculation to take into consideration the difference between arithmetic calculations in 2 domains. ------------- PR Comment: https://git.openjdk.org/jdk/pull/9947#issuecomment-1632957417 From kvn at openjdk.org Wed Jul 12 18:57:20 2023 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 12 Jul 2023 18:57:20 GMT Subject: RFR: 8275202: C2: optimize out more redundant conditions In-Reply-To: <978cgwy3Nb_x7yU6jZz0f6zhTBZfphstisAkBf1Vktc=.283d06eb-4f79-40cf-b8dd-a9c230e59902@github.com> References: <978cgwy3Nb_x7yU6jZz0f6zhTBZfphstisAkBf1Vktc=.283d06eb-4f79-40cf-b8dd-a9c230e59902@github.com> Message-ID: On Wed, 21 Jun 2023 12:47:26 GMT, Roland Westrelin wrote: > This change adds a new loop opts pass to optimize redundant conditions > such as the second one in: > > > if (i < 10) { > if (i < 42) { > > > In the branch of the first if, the type of i can be narrowed down to > [min_jint, 9] which can then be used to constant fold the second > condition. > > The compiler already keeps track of type[n] for every node in the > current compilation unit. That's not sufficient to optimize the > snippet above though because the type of i can only be narrowed in > some sections of the control flow (that is a subset of all > controls). 
The solution is to build a new table that tracks the type > of n at every control c > > > type'[n, root] = type[n] // initialized from igvn's type table > type'[n, c] = type[n, idom(c)] > > > This pass iterates over the CFG looking for conditions such as: > > > if (i < 10) { > > > that allows narrowing the type of i and updates the type' table > accordingly. > > At a region r: > > > type'[n, r] = meet(type'[n, r->in(1)], type'[n, r->in(2)]...) > > > For a Phi phi at a region r: > > > type'[phi, r] = meet(type'[phi->in(1), r->in(1)], type'[phi->in(2), r->in(2)]...) > > > Once a type is narrowed, uses are enqueued and their types are > computed by calling the Value() methods. If a use's type is narrowed, > it's recorded at c in the type' table. Value() methods retrieve types > from the type table, not the type' table. To address that issue while > leaving Value() methods unchanged, before calling Value() at c, the > type table is updated so: > > > type[n] = type'[n, c] > > > An exception is for Phi::Value which needs to retrieve the type of > nodes are various controls: there, a new type(Node* n, Node* c) > method is used. > > For most n and c, type'[n, c] is likely the same as type[n], the type > recorded in the global igvn table (that is there shouldn't be many > nodes at only a few control for which we can narrow the type down). As > a consequence, the types'[n, c] table is implemented with: > > - At c, narrowed down types are stored in a GrowableArray. Each entry > records the previous type at idom(c) and the narrowed down type at > c. > > - The GrowableArray of type updates is recorded in a hash table > indexed by c. If there's no update at c, there's no entry in the > hash table. > > This pass operates in 2 steps: > > - it first iterates over the graph looking for conditions that narrow > the types of some nodes and propagate type updates to uses until a > fix point. > > - it transforms the graph so newly found constant nodes are folded. 
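The per-control table described above can be sketched in a few lines of Java. This is a simplified model, not the data structure from the PR: types are plain integer intervals, controls and nodes are identified by strings, and a lookup walks the dominator chain instead of storing the previous type in each entry. All names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

public class ControlTypeTable {
    // Closed integer interval [lo, hi], standing in for an integer type.
    static final class Range {
        final int lo, hi;
        Range(int lo, int hi) { this.lo = lo; this.hi = hi; }
        // meet: smallest interval covering both inputs (used at regions).
        Range meet(Range o) { return new Range(Math.min(lo, o.lo), Math.max(hi, o.hi)); }
        // join: intersection of both inputs (used when a condition narrows a type).
        Range join(Range o) { return new Range(Math.max(lo, o.lo), Math.min(hi, o.hi)); }
    }

    final Map<String, String> idom = new HashMap<>();                 // control -> immediate dominator
    final Map<String, Range> global = new HashMap<>();                // type[n]
    final Map<String, Map<String, Range>> updates = new HashMap<>();  // control -> narrowed types at that control

    // type'[n, c]: nearest update on the dominator chain of c, else the global type[n].
    Range typeAt(String n, String c) {
        for (String ctl = c; ctl != null; ctl = idom.get(ctl)) {
            Map<String, Range> at = updates.get(ctl);
            if (at != null && at.containsKey(n)) {
                return at.get(n);
            }
        }
        return global.get(n);
    }

    // Record that a condition controlling c restricts n to r.
    void narrow(String n, String c, Range r) {
        Range narrowed = typeAt(n, c).join(r);
        updates.computeIfAbsent(c, k -> new HashMap<>()).put(n, narrowed);
    }

    // type'[n, r] at a region: meet over the region's control inputs.
    Range atRegion(String n, String... inputs) {
        Range result = typeAt(n, inputs[0]);
        for (int i = 1; i < inputs.length; i++) {
            result = result.meet(typeAt(n, inputs[i]));
        }
        return result;
    }

    // TRUE or FALSE if "n < bound" is decided at control c, null if it is not.
    Boolean foldLessThan(String n, String c, int bound) {
        Range t = typeAt(n, c);
        if (t.hi < bound) return Boolean.TRUE;
        if (t.lo >= bound) return Boolean.FALSE;
        return null;
    }

    public static void main(String[] args) {
        ControlTypeTable table = new ControlTypeTable();
        table.global.put("i", new Range(Integer.MIN_VALUE, Integer.MAX_VALUE));
        table.idom.put("then1", "root");
        table.narrow("i", "then1", new Range(Integer.MIN_VALUE, 9));   // inside if (i < 10)
        System.out.println(table.foldLessThan("i", "then1", 42));     // second condition, inside the branch
        System.out.println(table.foldLessThan("i", "root", 42));      // same condition, outside the branch
    }
}
```

In this model only controls with a narrowed type have an entry, which mirrors the observation that most `type'[n, c]` values equal `type[n]`.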
> > > The new pass is run on every loop opts. There are a couple rea... Thank you for explanation. Yes, time increase is concerning. But I agree that it should run to fixed point. May be look on the case which consume a lot of time and investigate why. May be you have false positives when you look for narrowing type conditions. ------------- PR Comment: https://git.openjdk.org/jdk/pull/14586#issuecomment-1633047075 From duke at openjdk.org Wed Jul 12 19:27:14 2023 From: duke at openjdk.org (Chad Rakoczy) Date: Wed, 12 Jul 2023 19:27:14 GMT Subject: Integrated: 8311813: C1: Uninitialized PhiResolver::_loop field In-Reply-To: References: Message-ID: On Mon, 10 Jul 2023 23:47:56 GMT, Chad Rakoczy wrote: > [JDK-8311813](https://bugs.openjdk.org/browse/JDK-8311813) > > Initialize `PhiResolver::_loop` field to `nullptr` > > Additional testing: > - [x] Linux x86_64 fastdebug `tier2` > - [x] Linux x86_64 release `tier2` > - [x] Linux x86_64 fastdebug `gtest:all` > - [x] Linux x86_64 release `gtest:all` > - [x] Linux x86_64 fastdebug `test/hotspot/jtreg/compiler/c1` > - [x] Linux x86_64 release `test/hotspot/jtreg/compiler/c1` This pull request has now been integrated. Changeset: 489a32fe Author: Chad Rakoczy Committer: Aleksey Shipilev URL: https://git.openjdk.org/jdk/commit/489a32fe40e2a2c539296d51d4ffc0abc036d33c Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod 8311813: C1: Uninitialized PhiResolver::_loop field Reviewed-by: thartmann, shade ------------- PR: https://git.openjdk.org/jdk/pull/14819 From pli at openjdk.org Thu Jul 13 01:51:18 2023 From: pli at openjdk.org (Pengfei Li) Date: Thu, 13 Jul 2023 01:51:18 GMT Subject: Integrated: 8311691: C2: Remove legacy code related to PostLoopMultiversioning In-Reply-To: References: Message-ID: On Tue, 11 Jul 2023 09:56:51 GMT, Pengfei Li wrote: > As discussed in JDK-8308994, we are working on re-implementation of post loop vectorization and planning to refactor current SuperWord code. 
As nobody is using or maintaining the old implementation now, to make the refactoring work easier, we propose to remove the legacy code of the old implementation first. > > This patch removes all code related to `PostLoopMultiversioning` inside and outside SuperWord. After the removal, `SLP_extract()` in SuperWord should only work on main loops. So we also removed all `is_main_loop()` checks inside and added assertions instead. > > Tested with hotspot::hotspot_all_no_apps, jdk tier1~3, langtools tier1 and 100k fuzzer tests on x86 and AArch64, no issue is found. This pull request has now been integrated. Changeset: a38582e9 Author: Pengfei Li URL: https://git.openjdk.org/jdk/commit/a38582e941c0234e76d1dbea60c731c83d2c9977 Stats: 608 lines in 9 files changed: 7 ins; 564 del; 37 mod 8311691: C2: Remove legacy code related to PostLoopMultiversioning Reviewed-by: kvn, sviswanathan ------------- PR: https://git.openjdk.org/jdk/pull/14824 From thartmann at openjdk.org Thu Jul 13 05:54:39 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 13 Jul 2023 05:54:39 GMT Subject: [jdk21] RFR: 8303279: C2: crash in SubTypeCheckNode::sub() at IGVN split if Message-ID: Backport of [JDK-8303279](https://bugs.openjdk.java.net/browse/JDK-8303279). Applies cleanly.
Thanks, Tobias ------------- Commit messages: - 8303279: C2: crash in SubTypeCheckNode::sub() at IGVN split if Changes: https://git.openjdk.org/jdk21/pull/119/files Webrev: https://webrevs.openjdk.org/?repo=jdk21&pr=119&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8303279 Stats: 110 lines in 7 files changed: 94 ins; 4 del; 12 mod Patch: https://git.openjdk.org/jdk21/pull/119.diff Fetch: git fetch https://git.openjdk.org/jdk21.git pull/119/head:pull/119 PR: https://git.openjdk.org/jdk21/pull/119 From chagedorn at openjdk.org Thu Jul 13 06:15:17 2023 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 13 Jul 2023 06:15:17 GMT Subject: [jdk21] RFR: 8303279: C2: crash in SubTypeCheckNode::sub() at IGVN split if In-Reply-To: References: Message-ID: On Thu, 13 Jul 2023 05:47:04 GMT, Tobias Hartmann wrote: > Backport of [JDK-8303279](https://bugs.openjdk.java.net/browse/JDK-8303279). Applies cleanly. > > Thanks, > Tobias Looks good. ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk21/pull/119#pullrequestreview-1527649487 From thartmann at openjdk.org Thu Jul 13 06:26:09 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 13 Jul 2023 06:26:09 GMT Subject: [jdk21] RFR: 8303279: C2: crash in SubTypeCheckNode::sub() at IGVN split if In-Reply-To: References: Message-ID: On Thu, 13 Jul 2023 05:47:04 GMT, Tobias Hartmann wrote: > Backport of [JDK-8303279](https://bugs.openjdk.java.net/browse/JDK-8303279). Applies cleanly. > > Thanks, > Tobias Thanks, Christian! 
------------- PR Comment: https://git.openjdk.org/jdk21/pull/119#issuecomment-1633624184 From thartmann at openjdk.org Thu Jul 13 08:34:25 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 13 Jul 2023 08:34:25 GMT Subject: [jdk21] Integrated: 8303279: C2: crash in SubTypeCheckNode::sub() at IGVN split if In-Reply-To: References: Message-ID: On Thu, 13 Jul 2023 05:47:04 GMT, Tobias Hartmann wrote: > Backport of [JDK-8303279](https://bugs.openjdk.java.net/browse/JDK-8303279). Applies cleanly. > > Thanks, > Tobias This pull request has now been integrated. Changeset: f7924758 Author: Tobias Hartmann URL: https://git.openjdk.org/jdk21/commit/f79247584ecf9012618ec87637c6b4b213a90e6d Stats: 110 lines in 7 files changed: 94 ins; 4 del; 12 mod 8303279: C2: crash in SubTypeCheckNode::sub() at IGVN split if Reviewed-by: chagedorn Backport-of: caadad4fdc78799dab2d492dba9b9f74b22d036e ------------- PR: https://git.openjdk.org/jdk21/pull/119 From roland at openjdk.org Thu Jul 13 15:45:02 2023 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 13 Jul 2023 15:45:02 GMT Subject: RFR: 8308103: Massive (up to ~30x) increase in C2 compilation time since JDK 17 In-Reply-To: References: Message-ID: On Wed, 12 Jul 2023 05:25:46 GMT, Tobias Hartmann wrote: > The fix looks good to me but I still have a hard time to comprehend that this leads to a 30x increase in compilation time. And I'm worried that we have similar issues in other code. As a follow-up, could we have `PhaseIdealLoop::register_new_node` check the hash and return an existing node? There's an exponential increase of the number of casts that are created. 
First node to be sunk is:

386 RShiftI === _ 385 316 [[ 441 417 429 ]] !orig=[615] !jvms: TestSinkingNodesCausesLongCompilation::mainTest @ bci:104 (line 46)

It has 3 uses so 3 casts are created:

1137 CastII === 426 385 [[ ]] #int unconditional dependency
1139 CastII === 414 385 [[ ]] #int unconditional dependency
1141 CastII === 438 385 [[ ]] #int unconditional dependency

Next:

385 AddI === _ 384 605 [[ 1141 1137 1139 ]] !jvms: TestSinkingNodesCausesLongCompilation::mainTest @ bci:99 (line 45)

(input of previous one) The 3 previous casts are removed and new ones are created:

1143 CastII === 414 384 [[ ]] #int unconditional dependency
1145 CastII === 426 384 [[ ]] #int unconditional dependency
1147 CastII === 438 384 [[ ]] #int unconditional dependency

Next:

384 LShiftI === _ 605 383 [[ 1147 1143 1145 ]] !jvms: TestSinkingNodesCausesLongCompilation::mainTest @ bci:99 (line 45)

(input of previous one) Same as the step before: the 3 just created casts are removed and new ones are created:

1149 CastII === 426 605 [[ ]] #int unconditional dependency
1151 CastII === 414 605 [[ ]] #int unconditional dependency
1153 CastII === 438 605 [[ ]] #int unconditional dependency

Next:

605 RShiftI === _ 606 316 [[ 1146 1153 1142 1144 1149 1151 ]] !orig=[386],[615] !jvms: TestSinkingNodesCausesLongCompilation::mainTest @ bci:104 (line 46)

which was input to both 384 and 385 just sunk. It has 6 uses: the 3 casts above and 3 clones of 385. So 3 casts are removed and 6 casts are created:

1155 CastII === 414 606 [[ ]] #int unconditional dependency
1157 CastII === 426 606 [[ ]] #int unconditional dependency
1159 CastII === 426 606 [[ ]] #int unconditional dependency
1161 CastII === 414 606 [[ ]] #int unconditional dependency
1163 CastII === 438 606 [[ ]] #int unconditional dependency
1165 CastII === 438 606 [[ ]] #int unconditional dependency

The same sequence of nodes repeats: every 3 nodes there's a RShiftI and the number of clones doubles. At the last step, the number of casts is 12288.
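The growth can be checked with a few lines of Java. This is only a model of the counting argument, assuming 3 casts after the first sunk node and a doubling at every later RShiftI (the 4th, 7th, 10th, ... sunk node); the class and method names are made up for the illustration.

```java
public class CastGrowth {
    // Number of live casts after `sunkNodes` nodes have been sunk, under the
    // assumption that the count starts at 3 and doubles at nodes 4, 7, 10, ...
    static long castsAfter(int sunkNodes) {
        long casts = 3;
        for (int node = 2; node <= sunkNodes; node++) {
            if ((node - 1) % 3 == 0) {   // every third node is another RShiftI
                casts *= 2;
            }
        }
        return casts;
    }

    public static void main(String[] args) {
        System.out.println(castsAfter(4));    // 6, as in the dump for node 605
        System.out.println(castsAfter(39));   // 12288
    }
}
```

Under this model the count is `3 * 2**12 = 12288` once 39 nodes have been processed.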
That happens after 39 nodes are sunk. All of this seems pretty specific to that transformation so a local fix seems good enough to me. ------------- PR Comment: https://git.openjdk.org/jdk/pull/14732#issuecomment-1634476079 From vlivanov at openjdk.org Thu Jul 13 16:30:06 2023 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Thu, 13 Jul 2023 16:30:06 GMT Subject: RFR: 8308869: C2: use profile data in subtype checks when profile has more than one class [v8] In-Reply-To: References: <0t_NDTIV7WMVq09RidCDQ3hh0sW3FEDGhwne8_7GD_E=.adf6a3df-2782-4975-b59a-85bd0b89d9d4@github.com> Message-ID: On Tue, 11 Jul 2023 14:10:51 GMT, Roland Westrelin wrote: >> In this simple micro benchmark: >> >> https://github.com/franz1981/java-puzzles/blob/main/src/main/java/red/hat/puzzles/polymorphism/RequireNonNullCheckcastScalability.java#L70 >> >> Performance drops sharply with polluted profile: >> >> >> Benchmark (typePollution) Mode Cnt Score Error Units >> RequireNonNullCheckcastScalability.isDuplicated1 false thrpt 10 1453.372 ? 24.919 ops/us >> >> >> to: >> >> >> Benchmark (typePollution) Mode Cnt Score Error Units >> RequireNonNullCheckcastScalability.isDuplicated1 true thrpt 10 28.579 ? 2.280 ops/us >> >> >> The test has 2 type checks to 2 different interfaces so caching with >> `secondary_super_cache` doesn't help. >> >> The micro-benchmark only uses 2 different concrete classes >> (`DuplicatedContext` and `NonDuplicatedContext`) and they are recorded >> in profile data at the type checks. But c2 only take advantage of >> profile data at type checks if they report a single class. >> >> What I propose is that the full blown type check expanded in >> `Phase::gen_subtype_check()` takes advantage of profile data. So in >> the case of the micro benchmark, before checking the >> `secondary_super_cache`, generated code checks whether the object >> being type checked is a `DuplicatedContext` or a >> `NonDuplicatedContext`. 
>> >> This works fairly well on this micro benchmark: >> >> >> Benchmark (typePollution) Mode Cnt Score Error Units >> RequireNonNullCheckcastScalability.isDuplicated1 true thrpt 10 871.224 ? 20.750 ops/us >> >> >> It also scales much better if there are multiple threads running the >> same test (`secondary_super_cache` doesn't scale well: see >> JDK-8180450). >> >> Now if the micro-benchmark is changed according to the comment: >> >> https://github.com/franz1981/java-puzzles/blob/d2d60af3d0dfe7a2567807395138edcb1d1c24f5/src/main/java/red/hat/puzzles/polymorphism/RequireNonNullCheckcastScalability.java#L62 >> >> so the type check hits in the `secondary_super_cache`, the current >> code performs much better: >> >> >> Benchmark (typePollution) Mode Cnt Score Error Units >> RequireNonNullCheckcastScalability.isDuplicated1 true thrpt 10 871.224 ? 20.750 ops/us >> >> >> but leveraging profiling as explained above performs even better: >> >> >> Benchmark ... > > Roland Westrelin has updated the pull request incrementally with two additional commits since the last revision: > > - never common SubTypeCheckNode nodes > - keep both ways of doing profile Looks good. I assume SA code is not sensitive to receiver profiling details and the changes there are JVMCI-specific. src/hotspot/share/opto/c2_globals.hpp line 775: > 773: "Verify receiver types at runtime") \ > 774: \ > 775: product(intx, TypeProfileSubTypeCheckCommonThreshold, 50, \ Is it better to declare it diagnostic? ------------- Marked as reviewed by vlivanov (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/14375#pullrequestreview-1528818294 PR Review Comment: https://git.openjdk.org/jdk/pull/14375#discussion_r1262792322 From vlivanov at openjdk.org Thu Jul 13 16:30:09 2023 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Thu, 13 Jul 2023 16:30:09 GMT Subject: RFR: 8308869: C2: use profile data in subtype checks when profile has more than one class [v7] In-Reply-To: References: <0t_NDTIV7WMVq09RidCDQ3hh0sW3FEDGhwne8_7GD_E=.adf6a3df-2782-4975-b59a-85bd0b89d9d4@github.com> Message-ID: On Tue, 11 Jul 2023 14:08:53 GMT, Roland Westrelin wrote: >> test/hotspot/jtreg/compiler/c2/irTests/ProfileAtTypeCheck.java line 44: >> >>> 42: flags.add("-XX:TypeProfileSubTypeCheckCommonThreshold=90"); >>> 43: if (!Platform.is32bit()) { >>> 44: flags.add("-XX:-UseCompressedClassPointers"); >> >> What's the purpose of `-XX:-UseCompressedClassPointers` on 64-bit platforms? Make it easier to match the IR? > > It is to make it easier to match the IR. I could update the rules so it's not required. I'm fine with it either way. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14375#discussion_r1262796420 From xxinliu at amazon.com Thu Jul 13 19:42:31 2023 From: xxinliu at amazon.com (Liu, Xin) Date: Thu, 13 Jul 2023 12:42:31 -0700 Subject: Update on PEA in C2 (Episode 4) Message-ID: <2264bc72-edd7-0a3c-fedb-81dceff42e99@amazon.com> Hi, We would like to give an update on what we have done in C2 PEA in the last couple of months. We root-caused some runtime errors. There are 2 reasons. 1) we need to replace the old object with the materialized object in SafePointNode, or we will end up with wrong objects after deoptimisation. 2) we need to replace the old object with the materialized object at Parse::do_exits. We have to track allocation state inter-procedurally when the method is inlined. GraphKit::backfill_materialized() scans the inputs of a SafePointNode and does the replacement.
By fixing the runtime error, C2 PEA starts running non-trivial Java programs. We looked into 2 examples from the Graal website: https://www.graalvm.org/22.1/examples/java-performance-examples/ Blender.java is the kernel of sunflow. Sunflow is a ray tracer in Java. C2 PEA makes it 38.58% faster due to allocation reduction. Blender.java with C2 PEA still has a 14% performance gap compared with Graal CE. Graal PEA features a memory Read/Write replacement and can simplify a double modulo to an integer modulo. We filed a JBS issue (JDK-8309636) but don't want to be sidetracked by it. In dacapo/sunflow, we measure the same execution time. The geomean of the allocation rate drops from 6716.596 Mb/s to 5755.249 Mb/s, or 14.31%. The average allocation rate drops from 7141.490 Mb/s to 6080.981 Mb/s, or 14.85%. CountUppercase.java is a typical Java program using the stream API. We found that C2 PEA has 30% more allocation than default. The problem comes from object composition. I will explain it later. For the hotspot:tier-1 tests, we still have 12 known failures. 3 of them are due to object composition as well. 7 are locked up due to AbstractQueuedSynchronizer.
==============================
   TEST                                              TOTAL  PASS  FAIL ERROR
>> jtreg:test/hotspot/jtreg:tier1                     2227  2210     4     8 <<
==============================
Remaining problem: object composition An object may contain fields of other objects. Those objects form a directed graph that may contain cycles. One revelation is that it's impossible to get an object materialized individually. We believe the minimal unit of materialization is a strongly connected component of the object graph. Besides correctness, it is also a problem for EA/SR. If we can't clone the entire strongly connected component, the original object will retain the connection to those materialized objects. We materialize those objects because they escape. The escapement will propagate to the original object over Field(-F>).
As result, the original object can't be eliminated or scalar replaced. We have added an option 'PEAParanoid' to detect this issue. Graal PEA has a node called CommitAllocationNode which groups all relevant VirtualObject nodes and processes them in 2 passes. https://github.com/oracle/graal/blob/2f3a8d5ab0cd538bd323fa29812509873e6f7807/compiler/src/jdk.internal.vm.compiler/src/org/graalvm/compiler/replacements/DefaultJavaLoweringProvider.java#L900 We plan to materialize an object using DFS. It traverses all other virtual objects through fields. We expect to fix the performance issue of CountUppercase.java and some regression failures with this feature. We also refactored the implementation. The goal is to align the key data structure 'aliases' to Graal PEA. 'aliases' maps one node to a virtual object, so we can recognize some nodes are aliases of virtual objects in DFS. By moving almost all merging logic to MergeProcessor, it is now less intrusive in merge_common. Here is the PR: https://github.com/navyxliu/jdk/pull/55 thanks, --lx From vladimir.kozlov at oracle.com Fri Jul 14 06:10:41 2023 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 13 Jul 2023 23:10:41 -0700 Subject: Update on PEA in C2 (Episode 4) In-Reply-To: <2264bc72-edd7-0a3c-fedb-81dceff42e99@amazon.com> References: <2264bc72-edd7-0a3c-fedb-81dceff42e99@amazon.com> Message-ID: Thank you for update. Vladimir K On 7/13/23 12:42 PM, Liu, Xin wrote: > ?Hi? > > We would like to update what we have done in C2 PEA in the last couple of months. > > We rootcaused some runtime errors. There are 2 reasons. > 1) we need to replace the old object with the materialized object in SafePointNode, or we will end up with > wrong objects after deoptimisation. > > 2) we need to replace the old object with the materialized object at Parse::do_exits. We have to track > allocation state inter-procedurally when the method is inlined. 
> > GraphKit::backfill_materialized() scans the inputs of a SafePointNode and do the replacement. By fixing the > runtime error, C2 PEA starts running non-trivial Java programs. > > We look into 2 examples from Graal website: https://www.graalvm.org/22.1/examples/java-performance-examples/ > > blender.java is the kernel of sunflow. Sunflow is a ray tracer in Java. C2 PEA makes it 38.58% faster due to > allocation reduction.? Bender.java with C2 PEA still has 14% performance gap comparing with Graal CE.? Graal > > PEA features a memory Read/Write replacement and can simplify a double modulo to an integer modulo. We file a > JBS issue (JDK-8309636) but don't want to sidetracked by it. > > In dacapo/sunflow, we measure the same execution time . The Geomean of allocation rate reduces from > ?6716.596Mb/s to 5755.249 Mb/s , or 14.31%. Average of allocation rate reduces from 7141.490 Mb/s to 6080.981 > ?Mb/s , or 14.85%. > > CountUppercase.java is a typical java program with stream API. We found that C2 PEA has 30% more allocation than > default. The problem comes from object composition. I will explain it later. > > For hotspot:tier-1 test, we still have 12 known failures. 3 of them are due to object composition as well. 7 > are locked up due to AbstractQueuedSynchronizer. > > ============================== > ?? TEST????????????????????????????????????????????? TOTAL? PASS FAIL ERROR > >> jtreg:test/hotspot/jtreg:tier1???????????????????? 2227 2210 4???? 8 << > ============================== > > Remain problem: object composition > > An object may contain fields of other objects. Those objects form a directed cyclic graph. One revelation is > that it's impossible to get an object materialized individually. We believe the minimal unit of > materialization is a strongly connected component of object graph. > > Besides correctness, it also has problem for EA/SR. 
If we can't clone the entire strongly connected > componenet, the original object will retain the connection of those materialized objects. We materialize those > objects because they escape. The escapement will proprogate to the original object over Field(-F>). As result, > the original object can't be eliminated or scalar replaced. We have added an option 'PEAParanoid' to detect > this issue. > > Graal PEA has a node called CommitAllocationNode which groups all relevant VirtualObject nodes and processes > them in 2 passes. > https://github.com/oracle/graal/blob/2f3a8d5ab0cd538bd323fa29812509873e6f7807/compiler/src/jdk.internal.vm.compiler/src/org/graalvm/compiler/replacements/DefaultJavaLoweringProvider.java#L900 > > We plan to materialize an object using DFS. It traverses all other virtual objects through fields. We > expect to fix the performance issue of CountUppercase.java and some regression failures with this feature. > > We also refactored the implementation. The goal is to align the key data structure 'aliases' to Graal > PEA. 'aliases' maps one node to a virtual object, so we can recognize some nodes are aliases of virtual > objects in DFS. By moving almost all merging logic to MergeProcessor, it is now less intrusive in > merge_common. Here is the PR: > https://github.com/navyxliu/jdk/pull/55 > > thanks, > > --lx > From epeter at openjdk.org Fri Jul 14 14:01:44 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 14 Jul 2023 14:01:44 GMT Subject: RFR: 8310308: IR Framework: check for type and size of vector nodes [v24] In-Reply-To: References: Message-ID: > For some changes to `SuperWord`, and maybe auto-vectorization in general, I want to strengthen the IR Framework. > > **Motivation** > I want to not just find the relevant IR nodes, but also assert that they have the maximal length that they could have on the respective platform (given the CPU features and `MaxVectorSize`). 
Without this verification it is possible that a future change leads to a regression where we still vectorize but at shorter vector widths as before - leading to performance loss. > > **How to use it** > > All `IRNode`s in `test/hotspot/jtreg/compiler/lib/ir_framework/IRNode.java` that are created with `vectorNode` are now all matched with their `type` and `size`. The regex might now look something like this: > > `"(\d+(\s){2}(VectorCastF2X.*)+(\s){2}===.*vector[A-Za-z][8]:{int})"` > which would match with IR nodes dumped like that: > `1150 VectorCastF2X === _ 1151 [[ 1146 ]] #vectory[8]:{int} ...` > > The goal was to keep it simple and straight forward. In most cases, you can just use the nodes as before, and implicitly we now check for maximal size automatically. However, in some cases we want to ensure there is no or only a limited number of nodes (`failOn` or comparison `<` or `<=` or `=0`) - in those cases we usually want to make sure there is not any node of any size, so we match with any size by default. The size can also explicitly be constrained using `IRNode.VECTOR_SIZE`. > > Some examples: > 1. `@IR(counts = {IRNode.LOAD_VECTOR_I, " >0 "})` -> search for a `LoadVector` node with `type` `int`, and maximal `size` possible on the machine (limited by CPU features and `MaxVectorSize`). This is the most common use case. > 2. `@IR(failOn = { IRNode.LOAD_VECTOR_L, IRNode.STORE_VECTOR })` -> fail if there is a `LoadVector` with type `long`, of `any` size. > 3. `@IR(counts = { IRNode.XOR_VI, IRNode.VECTOR_SIZE_4, " > 0 "})` -> find at least one `XorV` node with type `int` and exactly `4` elements. Useful for VectorAPI when the vector species is fixed. > 4. `@IR(counts = { IRNode.LOAD_VECTOR_D, IRNode.VECTOR_SIZE + "min(4, max_double)", " >0 " })` -> search for a `LoadVector` node with `type` `double`, and `size` exactly equals to `min(4, max_double)` (so 4 elements, or if the hardware allows fewer `doubles`, then that number). > 5. 
`@IR(counts = { IRNode.ABS_VF, IRNode.VECTOR_SIZE + "min(LoopMaxUnroll, max_float)", ">= 1" })` -> find at least one `AbsV` nodes with type `float`, and the `size` exactly equals to the smaller of `LoopMaxUnroll` or the maximal size allow... Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 69 commits: - Merge branch 'master' into JDK-8310308 - Duplicated =1 counts for vector nodes in compiler/vectorapi/reshape/tests/TestVectorCast.java - Merge branch 'master' into JDK-8310308 - Fix with canTrustVectorSize for Cascade Lake - TestSpillTheBeans.java - print VMInfo from Test VM - merge from master, manual merge for VectorLogicalOpIdentityTest.java - Response to Tobias' review - more for Christian's reviews - Apply suggestions from code review Co-authored-by: Christian Hagedorn - ... and 59 more: https://git.openjdk.org/jdk/compare/167d1c18...b77c1317 ------------- Changes: https://git.openjdk.org/jdk/pull/14539/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=14539&range=23 Stats: 3553 lines in 67 files changed: 1480 ins; 21 del; 2052 mod Patch: https://git.openjdk.org/jdk/pull/14539.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14539/head:pull/14539 PR: https://git.openjdk.org/jdk/pull/14539 From cslucas at openjdk.org Fri Jul 14 15:16:25 2023 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Fri, 14 Jul 2023 15:16:25 GMT Subject: Withdrawn: 8306625 - Missing instructions on IR-based test framework ALLOC Regex In-Reply-To: References: Message-ID: On Tue, 25 Apr 2023 01:18:39 GMT, Cesar Soares Lucas wrote: > On AArch64 with -XX:-UseTLAB, C2 can add an `add`, `mulw` or `addw` around the method call to allocate an object/array. When this happens the current Regex of the IR-based test framework will NOT recognize the instruction sequence as an allocation and the result will be a false-negative test results. 
> > This PR is to adjust the four regexes to account for those possible instructions. This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.org/jdk/pull/13631 From chagedorn at openjdk.org Fri Jul 14 15:21:17 2023 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Fri, 14 Jul 2023 15:21:17 GMT Subject: RFR: 8306625 - Missing instructions on IR-based test framework ALLOC Regex In-Reply-To: References: Message-ID: On Tue, 25 Apr 2023 01:18:39 GMT, Cesar Soares Lucas wrote: > On AArch64 with -XX:-UseTLAB, C2 can add an `add`, `mulw` or `addw` around the method call to allocate an object/array. When this happens the current Regex of the IR-based test framework will NOT recognize the instruction sequence as an allocation and the result will be a false-negative test result. > > This PR is to adjust the four regexes to account for those possible instructions. I guess this is not a problem anymore since the integration of https://github.com/openjdk/jdk/pull/14583 which removed `UseTLAB` from the whitelist. We will not automatically perform any IR matching anymore when passing `-XX:-UseTLAB` as additional `javaoption/vmoption`. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/13631#issuecomment-1636012787 From vlivanov at openjdk.org Fri Jul 14 21:37:16 2023 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Fri, 14 Jul 2023 21:37:16 GMT Subject: RFR: 8308869: C2: use profile data in subtype checks when profile has more than one class [v8] In-Reply-To: References: <0t_NDTIV7WMVq09RidCDQ3hh0sW3FEDGhwne8_7GD_E=.adf6a3df-2782-4975-b59a-85bd0b89d9d4@github.com> Message-ID: On Tue, 11 Jul 2023 14:10:51 GMT, Roland Westrelin wrote: >> In this simple micro benchmark: >> >> https://github.com/franz1981/java-puzzles/blob/main/src/main/java/red/hat/puzzles/polymorphism/RequireNonNullCheckcastScalability.java#L70 >> >> Performance drops sharply with polluted profile: >> >> >> Benchmark (typePollution) Mode Cnt Score Error Units >> RequireNonNullCheckcastScalability.isDuplicated1 false thrpt 10 1453.372 ± 24.919 ops/us >> >> >> to: >> >> >> Benchmark (typePollution) Mode Cnt Score Error Units >> RequireNonNullCheckcastScalability.isDuplicated1 true thrpt 10 28.579 ± 2.280 ops/us >> >> >> The test has 2 type checks to 2 different interfaces so caching with >> `secondary_super_cache` doesn't help. >> >> The micro-benchmark only uses 2 different concrete classes >> (`DuplicatedContext` and `NonDuplicatedContext`) and they are recorded >> in profile data at the type checks. But c2 only take advantage of >> profile data at type checks if they report a single class. >> >> What I propose is that the full blown type check expanded in >> `Phase::gen_subtype_check()` takes advantage of profile data. So in >> the case of the micro benchmark, before checking the >> `secondary_super_cache`, generated code checks whether the object >> being type checked is a `DuplicatedContext` or a >> `NonDuplicatedContext`. >> >> This works fairly well on this micro benchmark: >> >> >> Benchmark (typePollution) Mode Cnt Score Error Units >> RequireNonNullCheckcastScalability.isDuplicated1 true thrpt 10 871.224 ± 
20.750 ops/us >> >> >> It also scales much better if there are multiple threads running the >> same test (`secondary_super_cache` doesn't scale well: see >> JDK-8180450). >> >> Now if the micro-benchmark is changed according to the comment: >> >> https://github.com/franz1981/java-puzzles/blob/d2d60af3d0dfe7a2567807395138edcb1d1c24f5/src/main/java/red/hat/puzzles/polymorphism/RequireNonNullCheckcastScalability.java#L62 >> >> so the type check hits in the `secondary_super_cache`, the current >> code performs much better: >> >> >> Benchmark (typePollution) Mode Cnt Score Error Units >> RequireNonNullCheckcastScalability.isDuplicated1 true thrpt 10 871.224 ± 20.750 ops/us >> >> >> but leveraging profiling as explained above performs even better: >> >> >> Benchmark ... > > Roland Westrelin has updated the pull request incrementally with two additional commits since the last revision: > > - never common SubTypeCheckNode nodes > - keep both ways of doing profile test/hotspot/jtreg/compiler/c2/irTests/ProfileAtTypeCheck.java line 47: > 45: } > 46: flags.add("-XX:+IgnoreUnrecognizedVMOptions"); > 47: flags.add("-XX:+UseParallelGC"); The test fails when another GC is specified externally. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14375#discussion_r1264192068 From epeter at openjdk.org Sat Jul 15 15:12:03 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Sat, 15 Jul 2023 15:12:03 GMT Subject: RFR: 8310308: IR Framework: check for type and size of vector nodes [v25] In-Reply-To: References: Message-ID: > For some changes to `SuperWord`, and maybe auto-vectorization in general, I want to strengthen the IR Framework. > > **Motivation** > I want to not just find the relevant IR nodes, but also assert that they have the maximal length that they could have on the respective platform (given the CPU features and `MaxVectorSize`). 
Without this verification it is possible that a future change leads to a regression where we still vectorize but at shorter vector widths than before - leading to performance loss. > > **How to use it** > > All `IRNode`s in `test/hotspot/jtreg/compiler/lib/ir_framework/IRNode.java` that are created with `vectorNode` are now all matched with their `type` and `size`. The regex might now look something like this: > > `"(\d+(\s){2}(VectorCastF2X.*)+(\s){2}===.*vector[A-Za-z][8]:{int})"` > which would match with IR nodes dumped like this: > `1150 VectorCastF2X === _ 1151 [[ 1146 ]] #vectory[8]:{int} ...` > > The goal was to keep it simple and straightforward. In most cases, you can just use the nodes as before, and implicitly we now check for maximal size automatically. However, in some cases we want to ensure there is no or only a limited number of nodes (`failOn` or comparison `<` or `<=` or `=0`) - in those cases we usually want to make sure there is not any node of any size, so we match with any size by default. The size can also explicitly be constrained using `IRNode.VECTOR_SIZE`. > > Some examples: > 1. `@IR(counts = {IRNode.LOAD_VECTOR_I, " >0 "})` -> search for a `LoadVector` node with `type` `int`, and maximal `size` possible on the machine (limited by CPU features and `MaxVectorSize`). This is the most common use case. > 2. `@IR(failOn = { IRNode.LOAD_VECTOR_L, IRNode.STORE_VECTOR })` -> fail if there is a `LoadVector` with type `long`, of `any` size. > 3. `@IR(counts = { IRNode.XOR_VI, IRNode.VECTOR_SIZE_4, " > 0 "})` -> find at least one `XorV` node with type `int` and exactly `4` elements. Useful for VectorAPI when the vector species is fixed. > 4. `@IR(counts = { IRNode.LOAD_VECTOR_D, IRNode.VECTOR_SIZE + "min(4, max_double)", " >0 " })` -> search for a `LoadVector` node with `type` `double`, and `size` exactly equal to `min(4, max_double)` (so 4 elements, or if the hardware allows fewer `doubles`, then that number). > 5. 
`@IR(counts = { IRNode.ABS_VF, IRNode.VECTOR_SIZE + "min(LoopMaxUnroll, max_float)", ">= 1" })` -> find at least one `AbsV` node with type `float`, and the `size` exactly equal to the smaller of `LoopMaxUnroll` or the maximal size allow... Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: duplicate rules in VectorLogicalOpIdentityTest.java ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14539/files - new: https://git.openjdk.org/jdk/pull/14539/files/b77c1317..5226f570 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14539&range=24 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14539&range=23-24 Stats: 40 lines in 1 file changed: 20 ins; 0 del; 20 mod Patch: https://git.openjdk.org/jdk/pull/14539.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14539/head:pull/14539 PR: https://git.openjdk.org/jdk/pull/14539 From fyang at openjdk.org Mon Jul 17 03:51:21 2023 From: fyang at openjdk.org (Fei Yang) Date: Mon, 17 Jul 2023 03:51:21 GMT Subject: RFR: 8311923: TestIRMatching.java fails on RISC-V In-Reply-To: References: Message-ID: On Wed, 12 Jul 2023 08:39:11 GMT, Gui Cao wrote: > Hi, we are experiencing test failures in test/hotspot/jtreg/testlibrary_tests/ir_framework/tests/TestIRMatching.java using fastdebug: > > > One or more @IR rules failed: > > Failed IR Rules (1) of Methods (1) > ---------------------------------- > 1) Method "public java.lang.Object[] ir_framework.tests.CheckCastArray.arrayCopy(java.lang.Object[],java.lang.Class)" - [Failed IR rules: 1]: > * @IR rule 1: "@compiler.lib.ir_framework.IR(phase={DEFAULT}, applyIfCPUFeatureAnd={}, applyIf={}, applyIfCPUFeatureOr={}, applyIfCPUFeature={}, counts={}, failOn={"_#CHECKCAST_ARRAYCOPY#_"}, applyIfAnd={}, applyIfOr={}, applyIfNot={})" > > Phase "PrintOptoAssembly": > - failOn: Graph contains forbidden nodes: > * Constraint 1: "(.*((?i:call_leaf_nofp,runtime)|CALL,\\s?runtime leaf nofp|BCTRL.*.leaf 
call).*checkcast_arraycopy.*)" > - Matched forbidden node: > * 15a + CALL, runtime leaf nofp 0x0000003f7fbd9600 #@CallLeafNoFPDirect checkcast_arraycopy > >>>> Check stdout for compilation output of the failed methods > > Through the description of the problem in the JBS issue, to fix this, we modified the matching rules in test/hotspot/jtreg/compiler/lib/ir_framework/IRNode.java and added the mv directive to the matching rules. > > ## Testing: > qemu system and unmatched board: > - [x] test/hotspot/jtreg/testlibrary_tests/ir_framework/tests/TestIRMatching.java (fastdebug) test/hotspot/jtreg/compiler/lib/ir_framework/IRNode.java line 210: > 208: optoOnly(ALLOC_OF, regex); > 209: } > 210: Suggestion: I think it will be safer to update regex for `ALLOC` and `ALLOC_OF` adding matching for RISC-V `mv` at the same time. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14848#discussion_r1264814840 From gcao at openjdk.org Mon Jul 17 04:15:25 2023 From: gcao at openjdk.org (Gui Cao) Date: Mon, 17 Jul 2023 04:15:25 GMT Subject: RFR: 8311923: TestIRMatching.java fails on RISC-V [v2] In-Reply-To: References: Message-ID: > Hi, we are experiencing test failures in test/hotspot/jtreg/testlibrary_tests/ir_framework/tests/TestIRMatching.java using fastdebug: > > > One or more @IR rules failed: > > Failed IR Rules (1) of Methods (1) > ---------------------------------- > 1) Method "public java.lang.Object[] ir_framework.tests.CheckCastArray.arrayCopy(java.lang.Object[],java.lang.Class)" - [Failed IR rules: 1]: > * @IR rule 1: "@compiler.lib.ir_framework.IR(phase={DEFAULT}, applyIfCPUFeatureAnd={}, applyIf={}, applyIfCPUFeatureOr={}, applyIfCPUFeature={}, counts={}, failOn={"_#CHECKCAST_ARRAYCOPY#_"}, applyIfAnd={}, applyIfOr={}, applyIfNot={})" > > Phase "PrintOptoAssembly": > - failOn: Graph contains forbidden nodes: > * Constraint 1: "(.*((?i:call_leaf_nofp,runtime)|CALL,\\s?runtime leaf nofp|BCTRL.*.leaf call).*checkcast_arraycopy.*)" > - Matched forbidden 
node: > * 15a + CALL, runtime leaf nofp 0x0000003f7fbd9600 #@CallLeafNoFPDirect checkcast_arraycopy > >>>> Check stdout for compilation output of the failed methods > > Through the description of the problem in the JBS issue, to fix this, we modified the matching rules in test/hotspot/jtreg/compiler/lib/ir_framework/IRNode.java and added the mv directive to the matching rules. > > ## Testing: > qemu system and unmatched board: > - [x] test/hotspot/jtreg/testlibrary_tests/ir_framework/tests/TestIRMatching.java (fastdebug) Gui Cao has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains one commit: 8311923: TestIRMatching.java fails on RISC-V ------------- Changes: https://git.openjdk.org/jdk/pull/14848/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=14848&range=01 Stats: 11 lines in 2 files changed: 0 ins; 0 del; 11 mod Patch: https://git.openjdk.org/jdk/pull/14848.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14848/head:pull/14848 PR: https://git.openjdk.org/jdk/pull/14848 From gcao at openjdk.org Mon Jul 17 04:15:25 2023 From: gcao at openjdk.org (Gui Cao) Date: Mon, 17 Jul 2023 04:15:25 GMT Subject: RFR: 8311923: TestIRMatching.java fails on RISC-V [v2] In-Reply-To: References: Message-ID: On Mon, 17 Jul 2023 03:48:07 GMT, Fei Yang wrote: > Suggestion: I think it will be safer and more consistent to update regex for `ALLOC` and `ALLOC_OF` adding matching for RISC-V `mv` at the same time. Thanks for your review, Fixed. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14848#discussion_r1264823806 From thartmann at openjdk.org Mon Jul 17 05:42:16 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 17 Jul 2023 05:42:16 GMT Subject: RFR: 8308103: Massive (up to ~30x) increase in C2 compilation time since JDK 17 In-Reply-To: References: Message-ID: On Fri, 30 Jun 2023 13:23:38 GMT, Roland Westrelin wrote: > A long chain of nodes are sunk out of a loop. 
Every time a node is > moved out of the loop, a cast is created to pin the node out of the > loop. When its input is next sunk, the cast is removed (the cast is > replaced by its input) and a new cast is created. Some nodes on the > chain have 2 other nodes in the chain as uses. When such a node is > sunk, 2 cast nodes are created, one for each use. So as the compiler > moves forward in the chain, the number of cast to remove grows. From > some profiling, removing those casts is what takes a lot of time. > > The fix I propose is, when a node is processed, to check whether a > cast at the out of loop control was already created for that node and > to reuse it. > > The test case takes 6 minutes when I run it without the fix and 3 > seconds with it. Makes sense. Thanks for the details, Roland. ------------- PR Comment: https://git.openjdk.org/jdk/pull/14732#issuecomment-1637411399 From fyang at openjdk.org Mon Jul 17 06:40:18 2023 From: fyang at openjdk.org (Fei Yang) Date: Mon, 17 Jul 2023 06:40:18 GMT Subject: RFR: 8311923: TestIRMatching.java fails on RISC-V [v2] In-Reply-To: References: Message-ID: On Mon, 17 Jul 2023 04:15:25 GMT, Gui Cao wrote: >> Hi, we are experiencing test failures in test/hotspot/jtreg/testlibrary_tests/ir_framework/tests/TestIRMatching.java using fastdebug: >> >> >> One or more @IR rules failed: >> >> Failed IR Rules (1) of Methods (1) >> ---------------------------------- >> 1) Method "public java.lang.Object[] ir_framework.tests.CheckCastArray.arrayCopy(java.lang.Object[],java.lang.Class)" - [Failed IR rules: 1]: >> * @IR rule 1: "@compiler.lib.ir_framework.IR(phase={DEFAULT}, applyIfCPUFeatureAnd={}, applyIf={}, applyIfCPUFeatureOr={}, applyIfCPUFeature={}, counts={}, failOn={"_#CHECKCAST_ARRAYCOPY#_"}, applyIfAnd={}, applyIfOr={}, applyIfNot={})" >> > Phase "PrintOptoAssembly": >> - failOn: Graph contains forbidden nodes: >> * Constraint 1: "(.*((?i:call_leaf_nofp,runtime)|CALL,\\s?runtime leaf nofp|BCTRL.*.leaf 
call).*checkcast_arraycopy.*)" >> - Matched forbidden node: >> * 15a + CALL, runtime leaf nofp 0x0000003f7fbd9600 #@CallLeafNoFPDirect checkcast_arraycopy >> >>>>> Check stdout for compilation output of the failed methods >> >> Through the description of the problem in the JBS issue, to fix this, we modified the matching rules in test/hotspot/jtreg/compiler/lib/ir_framework/IRNode.java and added the mv directive to the matching rules. >> >> ## Testing: >> qemu system and unmatched board: >> - [x] test/hotspot/jtreg/testlibrary_tests/ir_framework/tests/TestIRMatching.java (fastdebug) > > Gui Cao has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains one additional commit since the last revision: > > 8311923: TestIRMatching.java fails on RISC-V LGTM. ------------- Marked as reviewed by fyang (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/14848#pullrequestreview-1532079853 From rrich at openjdk.org Mon Jul 17 14:31:18 2023 From: rrich at openjdk.org (Richard Reingruber) Date: Mon, 17 Jul 2023 14:31:18 GMT Subject: RFR: 8308340: C2: Idealize Fma nodes [v4] In-Reply-To: References: Message-ID: On Wed, 5 Jul 2023 11:11:22 GMT, Fei Gao wrote: >> Some platforms, like aarch64, ppc, and riscv, support fusing `Math.fma(-a, b, c)` or `Math.fma(a, -b, c)` by generating partially symmetric match rules like: >> >> >> match(Set dst (FmaF src3 (Binary (NegF src1) src2))); >> match(Set dst (FmaF src3 (Binary src1 (NegF src2)))); >> >> >> Since `Fma` is partially commutative, the patch is to convert `Math.fma(-a, b, c)` to `Math.fma(b, -a, c)` in gvn phase, making node patterns canonical. Then we can remove redundant rules. >> >> Also, we should guarantee that C2 generates `Fma` nodes only on platforms supporting `Fma` instructions before matcher, so we can remove all `predicate(UseFMA)` for all `Fma` rules. 
>> >> After the patch, the code size of libjvm.so on aarch64 platform decreased by 63.4k. >> >> The patch passed all tier 1 - 3 on aarch64 and x86 platforms. > > Fei Gao has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: > > - Merge branch 'master' into fg8308340 > - Merge branch 'master' into fg8308340 > - Move check for UseFMA from c2compiler.cpp to Matcher::match_rule_supported in .ad files > - Merge branch 'master' into fg8308340 > - 8308340: C2: Idealize Fma nodes > > Some platforms, like aarch64, ppc, and riscv, support fusing > `Math.fma(-a, b, c)` or `Math.fma(a, -b, c)` by generating > partially symmetric match rules like: > > ``` > match(Set dst (FmaF src3 (Binary (NegF src1) src2))); > match(Set dst (FmaF src3 (Binary src1 (NegF src2)))); > ``` > > Since `Fma` is partially commutative, the patch is to convert > `Math.fma(-a, b, c)` to `Math.fma(b, -a, c)` in gvn phase, > making node patterns canonical. Then we can remove redundant > rules. > > Also, we should guarantee that C2 generates `Fma` nodes only on > platforms supporting `Fma` instructions before matcher, so we > can remove all `predicate(UseFMA)` for all `Fma` rules. > > After the patch, the code size of libjvm.so on aarch64 platform > decreased by 63.4k. > > The patch passed all tier 1 - 3 on aarch64 and x86 platforms. src/hotspot/share/opto/vectornode.cpp line 1879: > 1877: > 1878: Node* FmaVNode::Ideal(PhaseGVN* phase, bool can_reshape) { > 1879: // We canonicalize the node by converting "(-a)*b+c" into "b*(-a)+c" Could you explain a little bit more please? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14576#discussion_r1265452868 From rrich at openjdk.org Mon Jul 17 14:41:18 2023 From: rrich at openjdk.org (Richard Reingruber) Date: Mon, 17 Jul 2023 14:41:18 GMT Subject: RFR: 8308340: C2: Idealize Fma nodes [v4] In-Reply-To: References: Message-ID: On Wed, 5 Jul 2023 11:11:22 GMT, Fei Gao wrote: >> Some platforms, like aarch64, ppc, and riscv, support fusing `Math.fma(-a, b, c)` or `Math.fma(a, -b, c)` by generating partially symmetric match rules like: >> >> >> match(Set dst (FmaF src3 (Binary (NegF src1) src2))); >> match(Set dst (FmaF src3 (Binary src1 (NegF src2)))); >> >> >> Since `Fma` is partially commutative, the patch is to convert `Math.fma(-a, b, c)` to `Math.fma(b, -a, c)` in gvn phase, making node patterns canonical. Then we can remove redundant rules. >> >> Also, we should guarantee that C2 generates `Fma` nodes only on platforms supporting `Fma` instructions before matcher, so we can remove all `predicate(UseFMA)` for all `Fma` rules. >> >> After the patch, the code size of libjvm.so on aarch64 platform decreased by 63.4k. >> >> The patch passed all tier 1 - 3 on aarch64 and x86 platforms. > > Fei Gao has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains five additional commits since the last revision: > > - Merge branch 'master' into fg8308340 > - Merge branch 'master' into fg8308340 > - Move check for UseFMA from c2compiler.cpp to Matcher::match_rule_supported in .ad files > - Merge branch 'master' into fg8308340 > - 8308340: C2: Idealize Fma nodes > > Some platforms, like aarch64, ppc, and riscv, support fusing > `Math.fma(-a, b, c)` or `Math.fma(a, -b, c)` by generating > partially symmetric match rules like: > > ``` > match(Set dst (FmaF src3 (Binary (NegF src1) src2))); > match(Set dst (FmaF src3 (Binary src1 (NegF src2)))); > ``` > > Since `Fma` is partially commutative, the patch is to convert > `Math.fma(-a, b, c)` to `Math.fma(b, -a, c)` in gvn phase, > making node patterns canonical. Then we can remove redundant > rules. > > Also, we should guarantee that C2 generates `Fma` nodes only on > platforms supporting `Fma` instructions before matcher, so we > can remove all `predicate(UseFMA)` for all `Fma` rules. > > After the patch, the code size of libjvm.so on aarch64 platform > decreased by 63.4k. > > The patch passed all tier 1 - 3 on aarch64 and x86 platforms. I've added the patch to our nightly testing about 10 days ago and forgot about it (sorry). So it passed several iterations of tier1-4 of hotspot and jdk, all of langtools and jaxp, renaissance benchmarks as functional tests. All testing with fastdebug and release builds on the main platforms and also on Linux/PPC64le. PPC changes do look good to me. I'm not the greatest C2 expert though. So I'd suggest to get another review. 
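For readers following the quoted description: the canonicalization rests on the multiplication inside a scalar `Math.fma` being commutative, so `Math.fma(-a, b, c)` and `Math.fma(b, -a, c)` produce the same bits. The sketch below only illustrates that property at the Java level; it is not code from the patch, and the class name `FmaOperandSwapDemo` and the helper `swapIsSafe` are hypothetical names chosen for this example.

```java
public class FmaOperandSwapDemo {
    /**
     * Returns true when the three operand arrangements discussed in the PR
     * produce bit-identical results for the given inputs.
     */
    static boolean swapIsSafe(float a, float b, float c) {
        float negFirst  = Math.fma(-a, b, c);  // (-a)*b + c
        float swapped   = Math.fma(b, -a, c);  // b*(-a) + c, the canonical form
        float negSecond = Math.fma(a, -b, c);  // a*(-b) + c
        // Compare raw bit patterns so that signed zeros are distinguished
        // (floatToIntBits maps every NaN to one canonical pattern).
        int bits = Float.floatToIntBits(negFirst);
        return bits == Float.floatToIntBits(swapped)
            && bits == Float.floatToIntBits(negSecond);
    }

    public static void main(String[] args) {
        float[][] samples = {
            {1.5f, 2.0f, 3.0f},
            {0.1f, 0.2f, 0.3f},
            {-7.25f, 1e-3f, 1e10f},
        };
        for (float[] s : samples) {
            System.out.println(swapIsSafe(s[0], s[1], s[2]));
        }
    }
}
```

This covers only the scalar, unmasked case; it says nothing about masked vector operations, where operand order can matter.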
------------- PR Review: https://git.openjdk.org/jdk/pull/14576#pullrequestreview-1532962008 From cslucas at openjdk.org Mon Jul 17 21:50:21 2023 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Mon, 17 Jul 2023 21:50:21 GMT Subject: RFR: JDK-8287061: Support for rematerializing scalar replaced objects participating in allocation merges [v18] In-Reply-To: <72OcyhmFKGyTwDy8LQ0blp5HG5dg5l9OsU5dh9osVxo=.73b3a79e-ff24-4f41-b39b-650a9036ee76@github.com> References: <7nqFW-lgT1FzuMHPMUQiCj1ATcV_bQtroolf4V_kCc4=.ccd12605-aad0-433e-ba44-5772d972f05d@github.com> <-A7bd8C0q5o1WuRSeSkYYnUoApV4s9uijPmiNB2Wteo=.c5bc944c-88a3-4228-bd41-091ac6c8fb1d@github.com> <72OcyhmFKGyTwDy8LQ0blp5HG5dg5l9OsU5dh9osVxo=.73b3a79e-ff24-4f41-b39b-650a9036ee76@github.com> Message-ID: On Tue, 20 Jun 2023 16:44:28 GMT, Vladimir Ivanov wrote: >> Thank you once more for the comments @iwanowww . I'll address them asap. >> >> Can I ask what requirements are there for a product flag? > >> Can I ask what requirements are there for a product flag? > > Product flags are treated as part of public API of the JVM. So, changes in behavior have to go through CSR process. Also, a product flag has to be deprecated/obsoleted first before it can be removed which takes multiple releases to happen. Better to avoid introducing new product flags unless it is well-justified or necessary. @iwanowww @vnkozlov - can I ask one of you to sponsor this PR? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/12897#issuecomment-1638931020 From cslucas at openjdk.org Mon Jul 17 23:05:29 2023 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Mon, 17 Jul 2023 23:05:29 GMT Subject: Integrated: JDK-8287061: Support for rematerializing scalar replaced objects participating in allocation merges In-Reply-To: <7nqFW-lgT1FzuMHPMUQiCj1ATcV_bQtroolf4V_kCc4=.ccd12605-aad0-433e-ba44-5772d972f05d@github.com> References: <7nqFW-lgT1FzuMHPMUQiCj1ATcV_bQtroolf4V_kCc4=.ccd12605-aad0-433e-ba44-5772d972f05d@github.com> Message-ID: On Tue, 7 Mar 2023 01:40:48 GMT, Cesar Soares Lucas wrote: > Can I please get reviews for this PR? > > The most common and frequent use of NonEscaping Phis merging object allocations is for debugging information. The two graphs below show numbers for Renaissance and DaCapo benchmarks - similar results are obtained for all other applications that I tested. > > With what frequency does each IR node type occur as an allocation merge user? I.e., if the same node type uses a Phi N times the counter is incremented by N: > > ![image](https://user-images.githubusercontent.com/2249648/222280517-4dcf5871-2564-4207-b49e-22aee47fa49d.png) > > What are the most common users of allocation merges? I.e., if the same node type uses a Phi N times the counter is incremented by 1: > > ![image](https://user-images.githubusercontent.com/2249648/222280608-ca742a4e-1622-4e69-a778-e4db6805ea02.png) > > This PR adds support for scalar replacing allocations participating in merges used as debug information OR as a base for field loads. I plan to create subsequent PRs to enable scalar replacement of merges used by other node types (CmpP is next on the list). > > The approach I used for _rematerialization_ is pretty straightforward. It consists basically of the following. 1) New IR node (suggested by V. 
Kozlov), named SafePointScalarMergeNode, to represent a set of SafePointScalarObjectNode; 2) Each scalar replaceable input participating in a merge will get a SafePointScalarObjectNode like if it weren't part of a merge. 3) Add a new Class to support the rematerialization of SR objects that are part of a merge; 4) Patch HotSpot to be able to serialize and deserialize debug information related to allocation merges; 5) Patch C2 to generate unique types for SR objects participating in some allocation merges. > > The approach I used for _enabling the scalar replacement of some of the inputs of the allocation merge_ is also pretty straightforward: call `MemNode::split_through_phi` to, well, split AddP->Load* through the merge which will render the Phi useless. > > I tested this with JTREG tests tier 1-4 (Windows, Linux, and Mac) and didn't see regression. I also experimented with several applications and didn't see any failure. I also ran tests with "-ea -esa -Xbatch -Xcomp -XX:+UnlockExperimentalVMOptions -XX:-TieredCompilation -server -XX:+IgnoreUnrecognizedVMOptions -XX:+UnlockDiagnosticVMOptions -XX:+StressLCM -XX:+StressGCM -XX:+StressCCP" and didn't observe any related failures. This pull request has now been integrated. 
Changeset: a53345ad Author: Cesar Soares Lucas Committer: Vladimir Kozlov URL: https://git.openjdk.org/jdk/commit/a53345ad03e07ab2a990721a506ebc25eed0f7c9 Stats: 2733 lines in 26 files changed: 2485 ins; 108 del; 140 mod 8287061: Support for rematerializing scalar replaced objects participating in allocation merges Reviewed-by: kvn, vlivanov ------------- PR: https://git.openjdk.org/jdk/pull/12897 From gcao at openjdk.org Tue Jul 18 01:37:06 2023 From: gcao at openjdk.org (Gui Cao) Date: Tue, 18 Jul 2023 01:37:06 GMT Subject: RFR: 8311923: TestIRMatching.java fails on RISC-V [v2] In-Reply-To: References: Message-ID: On Mon, 17 Jul 2023 06:36:51 GMT, Fei Yang wrote: >> Gui Cao has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains one additional commit since the last revision: >> >> 8311923: TestIRMatching.java fails on RISC-V > > LGTM. @RealFYang Thanks for your review. ------------- PR Comment: https://git.openjdk.org/jdk/pull/14848#issuecomment-1639143007 From fgao at openjdk.org Tue Jul 18 01:44:05 2023 From: fgao at openjdk.org (Fei Gao) Date: Tue, 18 Jul 2023 01:44:05 GMT Subject: RFR: 8308340: C2: Idealize Fma nodes [v4] In-Reply-To: References: Message-ID: <2mha0SwQTlnWaTLdEMX6RQ1KqhasbfWKz503bmkxKhw=.eb3a1229-b222-4d02-b2f6-35c0e50e5766@github.com> On Mon, 17 Jul 2023 14:37:52 GMT, Richard Reingruber wrote: > I've added the patch to our nightly testing about 10 days ago and forgot about it (sorry). So it passed several iterations of tier1-4 of hotspot and jdk, all of langtools and jaxp, renaissance benchmarks as functional tests. All testing with fastdebug and release builds on the main platforms and also on Linux/PPC64le. > > PPC changes do look good to me. I'm not the greatest C2 expert though. So I'd suggest to get another review. Thanks a lot for your review and test work @reinrich! 
> src/hotspot/share/opto/vectornode.cpp line 1879: > >> 1877: >> 1878: Node* FmaVNode::Ideal(PhaseGVN* phase, bool can_reshape) { >> 1879: // We canonicalize the node by converting "(-a)*b+c" into "b*(-a)+c" > > Could you explain a little bit more please? Thanks for your review! For vectorapi masked operations, like `av.neg().lanewise(VectorOperators.FMA, bv, cv, mask)`, the inactive lanes of the output should save the first input of the node, so the inactive lanes of the output should be equal to lane values in `av.neg()`. If we exchange the inputs, the inactive lanes will be equal to `bv`, which is incorrect. So we shouldn't swap edges for masked nodes. The newly added testcases in `jdk/test/hotspot/jtreg/compiler/vectorapi/VectorFusedMultiplyAddSubTest.java` can cover this. Fortunately, there is no such constraint for non-masked vector nodes. ------------- PR Comment: https://git.openjdk.org/jdk/pull/14576#issuecomment-1639146994 PR Review Comment: https://git.openjdk.org/jdk/pull/14576#discussion_r1266078495 From fyang at openjdk.org Tue Jul 18 01:45:15 2023 From: fyang at openjdk.org (Fei Yang) Date: Tue, 18 Jul 2023 01:45:15 GMT Subject: RFR: 8308869: C2: use profile data in subtype checks when profile has more than one class [v8] In-Reply-To: References: <0t_NDTIV7WMVq09RidCDQ3hh0sW3FEDGhwne8_7GD_E=.adf6a3df-2782-4975-b59a-85bd0b89d9d4@github.com> Message-ID: On Tue, 11 Jul 2023 14:10:51 GMT, Roland Westrelin wrote: >> In this simple micro benchmark: >> >> https://github.com/franz1981/java-puzzles/blob/main/src/main/java/red/hat/puzzles/polymorphism/RequireNonNullCheckcastScalability.java#L70 >> >> Performance drops sharply with polluted profile: >> >> >> Benchmark (typePollution) Mode Cnt Score Error Units >> RequireNonNullCheckcastScalability.isDuplicated1 false thrpt 10 1453.372 ± 24.919 ops/us >> >> >> to: >> >> >> Benchmark (typePollution) Mode Cnt Score Error Units >> RequireNonNullCheckcastScalability.isDuplicated1 true thrpt 10 28.579 ± 
2.280 ops/us >> >> >> The test has 2 type checks to 2 different interfaces so caching with >> `secondary_super_cache` doesn't help. >> >> The micro-benchmark only uses 2 different concrete classes >> (`DuplicatedContext` and `NonDuplicatedContext`) and they are recorded >> in profile data at the type checks. But c2 only take advantage of >> profile data at type checks if they report a single class. >> >> What I propose is that the full blown type check expanded in >> `Phase::gen_subtype_check()` takes advantage of profile data. So in >> the case of the micro benchmark, before checking the >> `secondary_super_cache`, generated code checks whether the object >> being type checked is a `DuplicatedContext` or a >> `NonDuplicatedContext`. >> >> This works fairly well on this micro benchmark: >> >> >> Benchmark (typePollution) Mode Cnt Score Error Units >> RequireNonNullCheckcastScalability.isDuplicated1 true thrpt 10 871.224 ± 20.750 ops/us >> >> >> It also scales much better if there are multiple threads running the >> same test (`secondary_super_cache` doesn't scale well: see >> JDK-8180450). >> >> Now if the micro-benchmark is changed according to the comment: >> >> https://github.com/franz1981/java-puzzles/blob/d2d60af3d0dfe7a2567807395138edcb1d1c24f5/src/main/java/red/hat/puzzles/polymorphism/RequireNonNullCheckcastScalability.java#L62 >> >> so the type check hits in the `secondary_super_cache`, the current >> code performs much better: >> >> >> Benchmark (typePollution) Mode Cnt Score Error Units >> RequireNonNullCheckcastScalability.isDuplicated1 true thrpt 10 871.224 ± 20.750 ops/us >> >> >> but leveraging profiling as explained above performs even better: >> >> >> Benchmark ... > > Roland Westrelin has updated the pull request incrementally with two additional commits since the last revision: > > - never common SubTypeCheckNode nodes > - keep both ways of doing profile Hello, we witnessed the same problem on linux-riscv64 platform. 
So I prepared changes for this platform by referencing the aarch64 port. [14375-riscv-v4.diff.txt](https://github.com/openjdk/jdk/files/12074890/14375-riscv-v4.diff.txt) Tier1-3 tested and this also passed the newly added test by this PR. Could you please add this? Thanks. ------------- PR Comment: https://git.openjdk.org/jdk/pull/14375#issuecomment-1639148430 From jsjolen at openjdk.org Tue Jul 18 11:41:28 2023 From: jsjolen at openjdk.org (Johan Sjölen) Date: Tue, 18 Jul 2023 11:41:28 GMT Subject: RFR: 8312200: Fix Parse::catch_call_exceptions memory leak Message-ID: Hi, We used to allocate some `GrowableArray`s onto the `node_arena` in `Parse::catch_call_exceptions`. This leaves the allocated memory until the compilation has finished, potentially increasing the maximum memory usage. I've fixed this by allocating the memory on a temporary `Arena` instead. I also switched the `GrowableArray`s themselves to being stack allocated. Also a few stylistic issues were addressed. ------------- Commit messages: - Fix memory leak Changes: https://git.openjdk.org/jdk/pull/14921/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=14921&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8312200 Stats: 20 lines in 1 file changed: 1 ins; 0 del; 19 mod Patch: https://git.openjdk.org/jdk/pull/14921.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14921/head:pull/14921 PR: https://git.openjdk.org/jdk/pull/14921 From jsjolen at openjdk.org Tue Jul 18 11:41:28 2023 From: jsjolen at openjdk.org (Johan Sjölen) Date: Tue, 18 Jul 2023 11:41:28 GMT Subject: RFR: 8312200: Fix Parse::catch_call_exceptions memory leak In-Reply-To: References: Message-ID: On Tue, 18 Jul 2023 11:33:38 GMT, Johan Sjölen wrote: > Hi, > > We used to allocate some `GrowableArray`s onto the `node_arena` in `Parse::catch_call_exceptions`. This leaves the allocated memory until the compilation has finished, potentially increasing the maximum memory usage.
I've fixed this by allocating the memory on a temporary `Arena` instead. I also switched the `GrowableArray`s themselves to being stack allocated. > > Also a few stylistic issues were addressed. src/hotspot/share/opto/doCall.cpp line 802: > 800: ciExceptionHandler* h = handlers.handler(); > 801: int h_bci = h->handler_bci(); > 802: ciInstanceKlass* h_klass = h->is_catch_all() ? env()->Throwable_klass() : h->catch_klass(); Reduced the alignment here by 1, I think that `h_extype` used to be put here, which is what caused an unnecessary extra space. src/hotspot/share/opto/doCall.cpp line 814: > 812: } > 813: } > 814: const Type* h_extype = TypeOopPtr::make_from_klass(h_klass); Removed the alignment here, as it's far away from the previously declared variables in this scope. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14921#discussion_r1266641979 PR Review Comment: https://git.openjdk.org/jdk/pull/14921#discussion_r1266642512 From jsjolen at openjdk.org Tue Jul 18 13:14:06 2023 From: jsjolen at openjdk.org (Johan Sjölen) Date: Tue, 18 Jul 2023 13:14:06 GMT Subject: RFR: 8312200: Fix Parse::catch_call_exceptions memory leak In-Reply-To: References: Message-ID: On Tue, 18 Jul 2023 11:33:38 GMT, Johan Sjölen wrote: > Hi, > > We used to allocate some `GrowableArray`s onto the `node_arena` in `Parse::catch_call_exceptions`. This leaves the allocated memory until the compilation has finished, potentially increasing the maximum memory usage. I've fixed this by allocating the memory on a temporary `Arena` instead. I also switched the `GrowableArray`s themselves to being stack allocated. > > Also a few stylistic issues were addressed. Passes tier1.
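For readers following the thread, the lifetime difference described in 8312200 can be sketched outside HotSpot. This is only a toy illustration under stated assumptions, not the actual patch: `ToyArena`, `fill_and_sum`, `sum_with_long_lived_arena` and `sum_with_temporary_arena` are hypothetical names, and HotSpot's real `Arena` and `GrowableArray` classes have different interfaces.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <vector>

// Hypothetical stand-in for an arena (NOT HotSpot's Arena): it hands out
// memory and releases all of it at once when the arena object is destroyed.
class ToyArena {
 public:
  ToyArena() = default;
  ToyArena(const ToyArena&) = delete;
  ToyArena& operator=(const ToyArena&) = delete;
  ~ToyArena() {
    for (void* block : _blocks) std::free(block);
  }

  void* allocate(std::size_t bytes) {
    void* block = std::malloc(bytes == 0 ? 1 : bytes);
    if (block == nullptr) std::abort();
    _blocks.push_back(block);
    _bytes_live += bytes;
    return block;
  }

  // Bytes handed out and not yet released (released only on destruction).
  std::size_t bytes_live() const { return _bytes_live; }

 private:
  std::vector<void*> _blocks;
  std::size_t _bytes_live = 0;
};

// Shared scratch work: store 0..n-1 in arena memory and sum the values.
static int fill_and_sum(ToyArena& arena, int n) {
  assert(n >= 0);
  int* scratch = static_cast<int*>(
      arena.allocate(static_cast<std::size_t>(n) * sizeof(int)));
  int sum = 0;
  for (int i = 0; i < n; i++) {
    scratch[i] = i;
    sum += scratch[i];
  }
  return sum;
}

// Scratch memory lands in a long-lived arena and stays allocated after the
// call returns, which models memory being kept until compilation finishes.
int sum_with_long_lived_arena(ToyArena& compile_arena, int n) {
  return fill_and_sum(compile_arena, n);
}

// Scratch memory lands in a stack-allocated temporary arena and is released
// as soon as this function returns.
int sum_with_temporary_arena(int n) {
  ToyArena temp;
  return fill_and_sum(temp, n);
}
```

Both functions compute the same result; only the first leaves `bytes_live()` of the long-lived arena larger after the call, which is the effect the change is meant to avoid.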
------------- PR Comment: https://git.openjdk.org/jdk/pull/14921#issuecomment-1640204416 From gcao at openjdk.org Tue Jul 18 13:39:30 2023 From: gcao at openjdk.org (Gui Cao) Date: Tue, 18 Jul 2023 13:39:30 GMT Subject: RFR: 8311923: TestIRMatching.java fails on RISC-V [v3] In-Reply-To: References: Message-ID: > Hi, we are experiencing test failures in test/hotspot/jtreg/testlibrary_tests/ir_framework/tests/TestIRMatching.java using fastdebug: > > > One or more @IR rules failed: > > Failed IR Rules (1) of Methods (1) > ---------------------------------- > 1) Method "public java.lang.Object[] ir_framework.tests.CheckCastArray.arrayCopy(java.lang.Object[],java.lang.Class)" - [Failed IR rules: 1]: > * @IR rule 1: "@compiler.lib.ir_framework.IR(phase={DEFAULT}, applyIfCPUFeatureAnd={}, applyIf={}, applyIfCPUFeatureOr={}, applyIfCPUFeature={}, counts={}, failOn={"_#CHECKCAST_ARRAYCOPY#_"}, applyIfAnd={}, applyIfOr={}, applyIfNot={})" > > Phase "PrintOptoAssembly": > - failOn: Graph contains forbidden nodes: > * Constraint 1: "(.*((?i:call_leaf_nofp,runtime)|CALL,\\s?runtime leaf nofp|BCTRL.*.leaf call).*checkcast_arraycopy.*)" > - Matched forbidden node: > * 15a + CALL, runtime leaf nofp 0x0000003f7fbd9600 #@CallLeafNoFPDirect checkcast_arraycopy > >>>> Check stdout for compilation output of the failed methods > > Through the description of the problem in the JBS issue, to fix this, we modified the matching rules in test/hotspot/jtreg/compiler/lib/ir_framework/IRNode.java and added the mv directive to the matching rules. > > ## Testing: > qemu system and unmatched board: > - [x] test/hotspot/jtreg/testlibrary_tests/ir_framework/tests/TestIRMatching.java (fastdebug) Gui Cao has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains two additional commits since the last revision: - Merge branch 'master' into JDK-8311923 - 8311923: TestIRMatching.java fails on RISC-V ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14848/files - new: https://git.openjdk.org/jdk/pull/14848/files/f41ed12e..56ebf190 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14848&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14848&range=01-02 Stats: 5647 lines in 156 files changed: 4583 ins; 541 del; 523 mod Patch: https://git.openjdk.org/jdk/pull/14848.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14848/head:pull/14848 PR: https://git.openjdk.org/jdk/pull/14848 From rrich at openjdk.org Tue Jul 18 16:04:18 2023 From: rrich at openjdk.org (Richard Reingruber) Date: Tue, 18 Jul 2023 16:04:18 GMT Subject: RFR: 8308340: C2: Idealize Fma nodes [v4] In-Reply-To: <2mha0SwQTlnWaTLdEMX6RQ1KqhasbfWKz503bmkxKhw=.eb3a1229-b222-4d02-b2f6-35c0e50e5766@github.com> References: <2mha0SwQTlnWaTLdEMX6RQ1KqhasbfWKz503bmkxKhw=.eb3a1229-b222-4d02-b2f6-35c0e50e5766@github.com> Message-ID: On Tue, 18 Jul 2023 01:34:48 GMT, Fei Gao wrote: >> src/hotspot/share/opto/vectornode.cpp line 1879: >> >>> 1877: >>> 1878: Node* FmaVNode::Ideal(PhaseGVN* phase, bool can_reshape) { >>> 1879: // We canonicalize the node by converting "(-a)*b+c" into "b*(-a)+c" >> >> Could you explain a little bit more please? > > Thanks for your review! > > For vectorapi masked operations, like `av.neg().lanewise(VectorOperators.FMA, bv, cv, mask)`, the inactive lanes of the output should save the first input of the node, so the inactive lanes of the output should be equal to lane values in `av.neg()`. If we exchange the inputs, the inactive lanes will be equal to `bv`, which is incorrect. So we shouldn't swap edges for masked nodes. The newly added testcases in ` jdk/test/hotspot/jtreg/compiler/vectorapi/VectorFusedMultiplyAddSubTest.java` can cover this. 
Fortunately, there is no such constraint for non-masked vector nodes. Thanks for the explanation. I think I understood it to some degree. What happens with the subgraphs that are not canonicalized? They will have extra vector operations, right? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14576#discussion_r1266990548 From kvn at openjdk.org Tue Jul 18 16:09:13 2023 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 18 Jul 2023 16:09:13 GMT Subject: RFR: 8312200: Fix Parse::catch_call_exceptions memory leak In-Reply-To: References: Message-ID: On Tue, 18 Jul 2023 11:33:38 GMT, Johan Sjölen wrote: > Hi, > > We used to allocate some `GrowableArray`s onto the `node_arena` in `Parse::catch_call_exceptions`. This leaves the allocated memory until the compilation has finished, potentially increasing the maximum memory usage. I've fixed this by allocating the memory on a temporary `Arena` instead. I also switched the `GrowableArray`s themselves to being stack allocated. > > Also a few stylistic issues were addressed. Looks good. ------------- Marked as reviewed by kvn (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/14921#pullrequestreview-1535407217 From gcao at openjdk.org Wed Jul 19 02:18:14 2023 From: gcao at openjdk.org (Gui Cao) Date: Wed, 19 Jul 2023 02:18:14 GMT Subject: RFR: 8311923: TestIRMatching.java fails on RISC-V [v4] In-Reply-To: References: Message-ID: > Hi, we are experiencing test failures in test/hotspot/jtreg/testlibrary_tests/ir_framework/tests/TestIRMatching.java using fastdebug: > > > One or more @IR rules failed: > > Failed IR Rules (1) of Methods (1) > ---------------------------------- > 1) Method "public java.lang.Object[] ir_framework.tests.CheckCastArray.arrayCopy(java.lang.Object[],java.lang.Class)" - [Failed IR rules: 1]: > * @IR rule 1: "@compiler.lib.ir_framework.IR(phase={DEFAULT}, applyIfCPUFeatureAnd={}, applyIf={}, applyIfCPUFeatureOr={}, applyIfCPUFeature={}, counts={}, failOn={"_#CHECKCAST_ARRAYCOPY#_"}, applyIfAnd={}, applyIfOr={}, applyIfNot={})" > > Phase "PrintOptoAssembly": > - failOn: Graph contains forbidden nodes: > * Constraint 1: "(.*((?i:call_leaf_nofp,runtime)|CALL,\\s?runtime leaf nofp|BCTRL.*.leaf call).*checkcast_arraycopy.*)" > - Matched forbidden node: > * 15a + CALL, runtime leaf nofp 0x0000003f7fbd9600 #@CallLeafNoFPDirect checkcast_arraycopy > >>>> Check stdout for compilation output of the failed methods > > Through the description of the problem in the JBS issue, to fix this, we modified the matching rules in test/hotspot/jtreg/compiler/lib/ir_framework/IRNode.java and added the mv directive to the matching rules. > > ## Testing: > qemu system and unmatched board: > - [x] test/hotspot/jtreg/testlibrary_tests/ir_framework/tests/TestIRMatching.java (fastdebug) Gui Cao has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains three additional commits since the last revision: - Merge remote-tracking branch 'upstream/master' into JDK-8311923 - Merge branch 'master' into JDK-8311923 - 8311923: TestIRMatching.java fails on RISC-V ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14848/files - new: https://git.openjdk.org/jdk/pull/14848/files/56ebf190..da48fe4d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14848&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14848&range=02-03 Stats: 1160 lines in 66 files changed: 806 ins; 117 del; 237 mod Patch: https://git.openjdk.org/jdk/pull/14848.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14848/head:pull/14848 PR: https://git.openjdk.org/jdk/pull/14848 From fgao at openjdk.org Wed Jul 19 02:24:44 2023 From: fgao at openjdk.org (Fei Gao) Date: Wed, 19 Jul 2023 02:24:44 GMT Subject: RFR: 8308340: C2: Idealize Fma nodes [v4] In-Reply-To: References: <2mha0SwQTlnWaTLdEMX6RQ1KqhasbfWKz503bmkxKhw=.eb3a1229-b222-4d02-b2f6-35c0e50e5766@github.com> Message-ID: On Tue, 18 Jul 2023 15:58:01 GMT, Richard Reingruber wrote: >> Thanks for your review! >> >> For vectorapi masked operations, like `av.neg().lanewise(VectorOperators.FMA, bv, cv, mask)`, the inactive lanes of the output should save the first input of the node, so the inactive lanes of the output should be equal to lane values in `av.neg()`. If we exchange the inputs, the inactive lanes will be equal to `bv`, which is incorrect. So we shouldn't swap edges for masked nodes. The newly added testcases in ` jdk/test/hotspot/jtreg/compiler/vectorapi/VectorFusedMultiplyAddSubTest.java` can cover this. Fortunately, there is no such constraint for non-masked vector nodes. > > Thanks for the explanation. I think I understood it to some degree. > What happens with the subgraphs that are not canonicalized? They will have extra vector operations, right? Yes. 
For `av.neg().lanewise(VectorOperators.FMA, bv, cv, mask)`, the subgraph is like: `match (Set dst (FmaV (Binary (NegV src1) src2) (Binary src3 pg)));`, almost all platforms don't support fuse it directly, so it should be split into two vector operations: `NegV` + `FmaV`. I suppose the `NegV` is what you called as "the extra vector operation", right? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14576#discussion_r1267462815 From cslucas at openjdk.org Wed Jul 19 04:32:44 2023 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Wed, 19 Jul 2023 04:32:44 GMT Subject: RFR: 8309893: Integrate ReplicateB/S/I/L/F/D nodes to Replicate node In-Reply-To: <4zQZ1W7GpPyOY0TGusvqNKUoCORK1WUEwSxRnWC4JVE=.127f84f6-a406-43d2-98e7-52b4fa0b5f3d@github.com> References: <4zQZ1W7GpPyOY0TGusvqNKUoCORK1WUEwSxRnWC4JVE=.127f84f6-a406-43d2-98e7-52b4fa0b5f3d@github.com> Message-ID: On Tue, 11 Jul 2023 14:10:25 GMT, Eric Liu wrote: > This patch creates ReplicateNode to replace ReplicateB/S/I/L/F/DNode, like other vector nodes introduced recently, e.g., PopulateIndexNode and ReverseVNode, etc. This refers from: > https://mail.openjdk.org/pipermail/panama-dev/2020-April/008484.html > > After merging these nodes, code will be easier to maintain. E.g., matching rules can be simplified. > > Besides AArch64, this patch tries to keep other ad files as the same before, only supplies some necessary predicate. E.g., for matching rules using ReplicateB before, they are now matching Replicate with a new predicate "Matcher::vector_element_basic_type(n) == T_BYTE". This would be easy for review and lower risks. > > [TEST] > x86: Tested with option "-XX:UseAVX=0/1/2/3". > AArch64: Tested on SVE machine and Neon machine. > > Full jtreg passed without new issue. test/hotspot/jtreg/compiler/vectorization/runner/ArrayInvariantFillTest.java line 53: > 51: private static final int SIZE = 543; > 52: > 53: private boolean booleanInv; Looks like this field is not used anywhere here. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14830#discussion_r1267522820 From thartmann at openjdk.org Wed Jul 19 05:32:41 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 19 Jul 2023 05:32:41 GMT Subject: RFR: 8312200: Fix Parse::catch_call_exceptions memory leak In-Reply-To: References: Message-ID: <4P5Jg9DHJneGV_cBmatRFTvAO8UlowvNAOAFnb1YRhM=.893b4e9a-d47c-4b00-a83e-29d485d3b6df@github.com> On Tue, 18 Jul 2023 11:33:38 GMT, Johan Sjölen wrote: > Hi, > > We used to allocate some `GrowableArray`s onto the `node_arena` in `Parse::catch_call_exceptions`. This leaves the allocated memory until the compilation has finished, potentially increasing the maximum memory usage. I've fixed this by allocating the memory on a temporary `Arena` instead. I also switched the `GrowableArray`s themselves to being stack allocated. > > Also a few stylistic issues were addressed. Looks good to me. ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/14921#pullrequestreview-1536274771 From thartmann at openjdk.org Wed Jul 19 05:44:45 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 19 Jul 2023 05:44:45 GMT Subject: RFR: 8311557: [JVMCI] deadlock with JVMTI thread suspension In-Reply-To: References: Message-ID: <27vQ3Kz7luCnGXMnNJv_ezhr4sPY-oCLDXsI56T4f4g=.d2884744-540f-4569-bd13-d381b6518d20@github.com> On Fri, 7 Jul 2023 06:13:21 GMT, Tom Rodriguez wrote: > Java based JVMCI compiler threads are more like normal Java threads so they aren't `hidden_from_external_view` like the native compilers. This can lead to deadlocks if you use JVMTI to suspend all threads since this will block the compiler queue and can block execution if background compilation is disabled. It's reasonable to treat libgraal threads like native threads in this regard.
Making jargraal threads hidden too would interfere with using profiling and debugging tool on them so I've left that alone but it might be worth changing the JVMTI suspend and resume functions to explicitly skip compiler threads as well. That looks reasonable to me. > Making jargraal threads hidden too would interfere with using profiling and debugging tool on them so I've left that alone but it might be worth changing the JVMTI suspend and resume functions to explicitly skip compiler threads as well. Please file a follow-up issue for that. ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/14799#pullrequestreview-1536293628 From gcao at openjdk.org Wed Jul 19 05:48:57 2023 From: gcao at openjdk.org (Gui Cao) Date: Wed, 19 Jul 2023 05:48:57 GMT Subject: RFR: 8311923: TestIRMatching.java fails on RISC-V [v5] In-Reply-To: References: Message-ID: <3WfJ6dMgFAqfrcdmHlUPOr3ZdZY32qTsQnyA5TetFWo=.f3e79099-ba21-4cda-8165-5aa25766d38f@github.com> > Hi, we are experiencing test failures in test/hotspot/jtreg/testlibrary_tests/ir_framework/tests/TestIRMatching.java using fastdebug: > > > One or more @IR rules failed: > > Failed IR Rules (1) of Methods (1) > ---------------------------------- > 1) Method "public java.lang.Object[] ir_framework.tests.CheckCastArray.arrayCopy(java.lang.Object[],java.lang.Class)" - [Failed IR rules: 1]: > * @IR rule 1: "@compiler.lib.ir_framework.IR(phase={DEFAULT}, applyIfCPUFeatureAnd={}, applyIf={}, applyIfCPUFeatureOr={}, applyIfCPUFeature={}, counts={}, failOn={"_#CHECKCAST_ARRAYCOPY#_"}, applyIfAnd={}, applyIfOr={}, applyIfNot={})" > > Phase "PrintOptoAssembly": > - failOn: Graph contains forbidden nodes: > * Constraint 1: "(.*((?i:call_leaf_nofp,runtime)|CALL,\\s?runtime leaf nofp|BCTRL.*.leaf call).*checkcast_arraycopy.*)" > - Matched forbidden node: > * 15a + CALL, runtime leaf nofp 0x0000003f7fbd9600 #@CallLeafNoFPDirect checkcast_arraycopy > >>>> Check stdout for compilation output of 
the failed methods > > Through the description of the problem in the JBS issue, to fix this, we modified the matching rules in test/hotspot/jtreg/compiler/lib/ir_framework/IRNode.java and added the mv directive to the matching rules. > > ## Testing: > qemu system and unmatched board: > - [x] test/hotspot/jtreg/testlibrary_tests/ir_framework/tests/TestIRMatching.java (fastdebug) Gui Cao has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: - Merge branch 'master' into JDK-8311923 - Merge remote-tracking branch 'upstream/master' into JDK-8311923 - Merge branch 'master' into JDK-8311923 - 8311923: TestIRMatching.java fails on RISC-V ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14848/files - new: https://git.openjdk.org/jdk/pull/14848/files/da48fe4d..bb46f77e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14848&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14848&range=03-04 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/14848.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14848/head:pull/14848 PR: https://git.openjdk.org/jdk/pull/14848 From thartmann at openjdk.org Wed Jul 19 06:44:47 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 19 Jul 2023 06:44:47 GMT Subject: RFR: 8305636: Expand and clean up predicate classes and move them into separate files In-Reply-To: References: Message-ID: On Mon, 10 Jul 2023 15:17:37 GMT, Christian Hagedorn wrote: > This is the third clean-up PR towards fixing issues with Assertion Predicates ([JDK-8288981](https://bugs.openjdk.org/browse/JDK-8288981)). This patch does not change anything in the way the old Assertion Predicates work. 
> > After collecting and moving the predicate code in the last clean-up PR https://github.com/openjdk/jdk/pull/14017 to the classes `Predicates/ParsePredicates`, I'm now completely moving the code to separate `predicates.cpp/hpp` files. By doing so, I also updated the predicate description and updated some namings. Since this description is also moved to the new files, I've committed the description update separately to better reflect these changes. > > Changes include: > - Moved `Predicates/ParsePredicates` classes to new files `predicates.cpp/hpp`. > - Turning the `Predicates` utility class into a real class to represent all predicates: > - Contains three `PredicateBlock` fields for each Predicate Block (see description of `Predicate Block`). > - The `PredicateBlock` class offers methods to query the presence of predicates and to access them (e.g. get the Parse Predicate projection). > - In the process, the `ParsePredicates` could be removed as the Parse Predicates are now covered by the `PredicateBlock` class. > - New `AssertionPredicatesWithHalt` class to skip over Assertion Predicates (will be further cleaned up later with the complete fix in JDK-8288981). > - Updated predicate description and moved to `predicates.hpp`. > - While testing a prototype fix of JDK-8288981, I've came to the conclusion that we should not move all Assertion Predicates to a separate block below the Parse and Hoisted Predicates, because it prevented further application of Loop Predication due to pins of data nodes to these Assertion Predicates while the Hoisted Predicates needed them above the Assertion Predicates (i.e. dominance problems leading to bad graph assertions). I've removed that part of the description that gave a heads-up about that change. > - Small clean-ups such as variable renaming or code move. > > Not included: > - Refactoring predicate traversal to clone/copy/initialize predicates for loop unswitching, pre/main/post, loop peeling etc. 
(this is only done in the actual fix in JDK-8288981 which requires some updates anyways - so this refactoring is not done here (yet)). > > Testing: Tier1-7, hs-precheckin-comp, hs-comp-stress > > Thanks, > Christian Nice refactoring. Looks good to me otherwise! src/hotspot/share/opto/predicates.hpp line 52: > 50: * "a[i*scale + offset]", where scale and offset are loop-invariant, out of > 51: * a counted loop. Each Hoisted Range Check Predicate is accompanied by > 52: * additional Assertion Predicates. As we discussed offline, I would suggest to remove this part. src/hotspot/share/opto/predicates.hpp line 54: > 52: * additional Assertion Predicates. > 53: * - Loop Predicate: This Hoisted Predicate is created to hoist a loop-invariant check a range check of the > 54: * form "a[i*scale + offset]", where scale and offset are loop-invariant, out of a This needs re-phrasing: "a loop-invariant check a range check" src/hotspot/share/opto/predicates.hpp line 72: > 70: * - Assertion Predicate: An always true predicate which will never fail (its range is already covered by an earlier > 71: * Hoisted Predicate or the main-loop entry guard) but is required in order to fold away a dead > 72: * sub loop inside which some data could be proven to be dead (by the type system) and replaced This needs re-phrasing: "a dead sub loop inside which some data could be" src/hotspot/share/opto/predicates.hpp line 136: > 134: * other iterations of the main-loop in-between by implication. > 135: * Note that Range Check Elimination could remove additional range > 136: * checks which we were not possible to remove with Loop Predication Suggestion: * checks which were not possible to remove with Loop Predication src/hotspot/share/opto/predicates.hpp line 211: > 209: AssertionPredicatesWithHalt(Node* assertion_predicate_proj) : _entry(find_entry(assertion_predicate_proj)) {} > 210: > 211: // Returns the control input node into the first assertion predicate If. 
If there are no assertion predicates, it. Suggestion: // Returns the control input node into the first assertion predicate If. If there are no assertion predicates, it ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/14814#pullrequestreview-1536307400 PR Review Comment: https://git.openjdk.org/jdk/pull/14814#discussion_r1267597982 PR Review Comment: https://git.openjdk.org/jdk/pull/14814#discussion_r1267573771 PR Review Comment: https://git.openjdk.org/jdk/pull/14814#discussion_r1267578402 PR Review Comment: https://git.openjdk.org/jdk/pull/14814#discussion_r1267602419 PR Review Comment: https://git.openjdk.org/jdk/pull/14814#discussion_r1267605184 From chagedorn at openjdk.org Wed Jul 19 07:12:05 2023 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 19 Jul 2023 07:12:05 GMT Subject: RFR: 8305636: Expand and clean up predicate classes and move them into separate files [v2] In-Reply-To: References: Message-ID: > This is the third clean-up PR towards fixing issues with Assertion Predicates ([JDK-8288981](https://bugs.openjdk.org/browse/JDK-8288981)). This patch does not change anything in the way the old Assertion Predicates work. > > After collecting and moving the predicate code in the last clean-up PR https://github.com/openjdk/jdk/pull/14017 to the classes `Predicates/ParsePredicates`, I'm now completely moving the code to separate `predicates.cpp/hpp` files. By doing so, I also updated the predicate description and updated some namings. Since this description is also moved to the new files, I've committed the description update separately to better reflect these changes. > > Changes include: > - Moved `Predicates/ParsePredicates` classes to new files `predicates.cpp/hpp`. > - Turning the `Predicates` utility class into a real class to represent all predicates: > - Contains three `PredicateBlock` fields for each Predicate Block (see description of `Predicate Block`). 
> - The `PredicateBlock` class offers methods to query the presence of predicates and to access them (e.g. get the Parse Predicate projection). > - In the process, the `ParsePredicates` could be removed as the Parse Predicates are now covered by the `PredicateBlock` class. > - New `AssertionPredicatesWithHalt` class to skip over Assertion Predicates (will be further cleaned up later with the complete fix in JDK-8288981). > - Updated predicate description and moved to `predicates.hpp`. > - While testing a prototype fix of JDK-8288981, I've came to the conclusion that we should not move all Assertion Predicates to a separate block below the Parse and Hoisted Predicates, because it prevented further application of Loop Predication due to pins of data nodes to these Assertion Predicates while the Hoisted Predicates needed them above the Assertion Predicates (i.e. dominance problems leading to bad graph assertions). I've removed that part of the description that gave a heads-up about that change. > - Small clean-ups such as variable renaming or code move. > > Not included: > - Refactoring predicate traversal to clone/copy/initialize predicates for loop unswitching, pre/main/post, loop peeling etc. (this is only done in the actual fix in JDK-8288981 which requires some updates anyways - so this refactoring is not done here (yet)). 
> > Testing: Tier1-7, hs-precheckin-comp, hs-comp-stress > > Thanks, > Christian Christian Hagedorn has updated the pull request incrementally with two additional commits since the last revision: - Update src/hotspot/share/opto/predicates.hpp Co-authored-by: Tobias Hartmann - Renaming Hoisted Predicate -> Hoisted Check Predicate in description and comments as discussed offline with Tobias, fixing additional typos in description ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14814/files - new: https://git.openjdk.org/jdk/pull/14814/files/4b5f0a3e..020cff3c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14814&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14814&range=00-01 Stats: 54 lines in 2 files changed: 2 ins; 4 del; 48 mod Patch: https://git.openjdk.org/jdk/pull/14814.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14814/head:pull/14814 PR: https://git.openjdk.org/jdk/pull/14814 From chagedorn at openjdk.org Wed Jul 19 07:12:05 2023 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 19 Jul 2023 07:12:05 GMT Subject: RFR: 8305636: Expand and clean up predicate classes and move them into separate files In-Reply-To: References: Message-ID: On Mon, 10 Jul 2023 15:17:37 GMT, Christian Hagedorn wrote: > This is the third clean-up PR towards fixing issues with Assertion Predicates ([JDK-8288981](https://bugs.openjdk.org/browse/JDK-8288981)). This patch does not change anything in the way the old Assertion Predicates work. > > After collecting and moving the predicate code in the last clean-up PR https://github.com/openjdk/jdk/pull/14017 to the classes `Predicates/ParsePredicates`, I'm now completely moving the code to separate `predicates.cpp/hpp` files. By doing so, I also updated the predicate description and updated some namings. Since this description is also moved to the new files, I've committed the description update separately to better reflect these changes. 
> > Changes include: > - Moved `Predicates/ParsePredicates` classes to new files `predicates.cpp/hpp`. > - Turning the `Predicates` utility class into a real class to represent all predicates: > - Contains three `PredicateBlock` fields for each Predicate Block (see description of `Predicate Block`). > - The `PredicateBlock` class offers methods to query the presence of predicates and to access them (e.g. get the Parse Predicate projection). > - In the process, the `ParsePredicates` could be removed as the Parse Predicates are now covered by the `PredicateBlock` class. > - New `AssertionPredicatesWithHalt` class to skip over Assertion Predicates (will be further cleaned up later with the complete fix in JDK-8288981). > - Updated predicate description and moved to `predicates.hpp`. > - While testing a prototype fix of JDK-8288981, I've came to the conclusion that we should not move all Assertion Predicates to a separate block below the Parse and Hoisted Predicates, because it prevented further application of Loop Predication due to pins of data nodes to these Assertion Predicates while the Hoisted Predicates needed them above the Assertion Predicates (i.e. dominance problems leading to bad graph assertions). I've removed that part of the description that gave a heads-up about that change. > - Small clean-ups such as variable renaming or code move. > > Not included: > - Refactoring predicate traversal to clone/copy/initialize predicates for loop unswitching, pre/main/post, loop peeling etc. (this is only done in the actual fix in JDK-8288981 which requires some updates anyways - so this refactoring is not done here (yet)). > > Testing: Tier1-7, hs-precheckin-comp, hs-comp-stress > > Thanks, > Christian Thanks a lot Tobias for your careful review! I've pushed the discussed changes. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/14814#issuecomment-1641533619 From roland at openjdk.org Wed Jul 19 07:27:15 2023 From: roland at openjdk.org (Roland Westrelin) Date: Wed, 19 Jul 2023 07:27:15 GMT Subject: RFR: 8308103: Massive (up to ~30x) increase in C2 compilation time since JDK 17 [v2] In-Reply-To: References: Message-ID: > A long chain of nodes are sunk out of a loop. Every time a node is > moved out of the loop, a cast is created to pin the node out of the > loop. When its input is next sunk, the cast is removed (the cast is > replaced by its input) and a new cast is created. Some nodes on the > chain have 2 other nodes in the chain as uses. When such a node is > sunk, 2 cast nodes are created, one for each use. So as the compiler > moves forward in the chain, the number of cast to remove grows. From > some profiling, removing those casts is what takes a lot of time. > > The fix I propose is, when a node is processed, to check whether a > cast at the out of loop control was already created for that node and > to reuse it. > > The test case takes 6 minutes when I run it without the fix and 3 > seconds with it. Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains three additional commits since the last revision: - review - Merge branch 'master' into JDK-8308103 - fix & test ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14732/files - new: https://git.openjdk.org/jdk/pull/14732/files/cdb24902..c3ba2887 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14732&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14732&range=00-01 Stats: 63051 lines in 970 files changed: 11697 ins; 46140 del; 5214 mod Patch: https://git.openjdk.org/jdk/pull/14732.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14732/head:pull/14732 PR: https://git.openjdk.org/jdk/pull/14732 From roland at openjdk.org Wed Jul 19 07:27:15 2023 From: roland at openjdk.org (Roland Westrelin) Date: Wed, 19 Jul 2023 07:27:15 GMT Subject: RFR: 8308103: Massive (up to ~30x) increase in C2 compilation time since JDK 17 [v2] In-Reply-To: <8BHYhGbLecJ2CGb5QSI1L-FJ8Ju74GBwXk39tD9f3as=.d7ffb74c-0269-420c-b7e2-e713ba3b92ca@github.com> References: <8BHYhGbLecJ2CGb5QSI1L-FJ8Ju74GBwXk39tD9f3as=.d7ffb74c-0269-420c-b7e2-e713ba3b92ca@github.com> Message-ID: On Mon, 3 Jul 2023 06:05:59 GMT, Christian Hagedorn wrote: >> Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: >> >> - review >> - Merge branch 'master' into JDK-8308103 >> - fix & test > > test/hotspot/jtreg/compiler/loopopts/TestSinkingNodesCausesLongCompilation.java line 58: > >> 56: public static void main(String[] strArr) { >> 57: TestSinkingNodesCausesLongCompilation _instance = new TestSinkingNodesCausesLongCompilation(); >> 58: for (int i = 0; i < 10; i++ ) { > > Suggestion: > > for (int i = 0; i < 10; i++) { Thanks for the comments. I updated the change. 
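The fix described in this thread (reuse a cast that was already created at the out-of-loop control instead of creating a new one each time) can be illustrated with a small memoization sketch. This is only a toy model: the real change works on C2 IR nodes in C++, and how it locates an existing cast is not shown in these messages. The class `CastReuseSketch`, the method `pinOutOfLoop` and the string-based "nodes" are hypothetical names introduced for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model (hypothetical): nodes and controls are plain strings, not C2 IR nodes.
class CastReuseSketch {
    // Remembers the cast already created for a given (node, out-of-loop control) pair.
    private final Map<String, String> castAtControl = new HashMap<>();
    int castsCreated = 0;

    // Returns the cast that pins 'node' at 'outOfLoopControl', creating it only once.
    String pinOutOfLoop(String node, String outOfLoopControl) {
        String key = node + "@" + outOfLoopControl;
        String existing = castAtControl.get(key);
        if (existing != null) {
            return existing;            // reuse: no new cast, nothing extra to remove later
        }
        castsCreated++;
        String cast = "Cast(" + key + ")";
        castAtControl.put(key, cast);
        return cast;
    }

    public static void main(String[] args) {
        CastReuseSketch sketch = new CastReuseSketch();
        // Two uses of the same node at the same control share a single cast.
        sketch.pinOutOfLoop("n1", "loopExit");
        sketch.pinOutOfLoop("n1", "loopExit");
        System.out.println("casts created: " + sketch.castsCreated);
    }
}
```

In this model a node with two uses outside the loop gets one cast rather than two, which is the property the description credits for the compile-time improvement.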
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14732#discussion_r1267650043 From chagedorn at openjdk.org Wed Jul 19 07:33:45 2023 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 19 Jul 2023 07:33:45 GMT Subject: RFR: 8308103: Massive (up to ~30x) increase in C2 compilation time since JDK 17 [v2] In-Reply-To: References: Message-ID: On Wed, 19 Jul 2023 07:27:15 GMT, Roland Westrelin wrote: >> A long chain of nodes are sunk out of a loop. Every time a node is >> moved out of the loop, a cast is created to pin the node out of the >> loop. When its input is next sunk, the cast is removed (the cast is >> replaced by its input) and a new cast is created. Some nodes on the >> chain have 2 other nodes in the chain as uses. When such a node is >> sunk, 2 cast nodes are created, one for each use. So as the compiler >> moves forward in the chain, the number of cast to remove grows. From >> some profiling, removing those casts is what takes a lot of time. >> >> The fix I propose is, when a node is processed, to check whether a >> cast at the out of loop control was already created for that node and >> to reuse it. >> >> The test case takes 6 minutes when I run it without the fix and 3 >> seconds with it. > > Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: > > - review > - Merge branch 'master' into JDK-8308103 > - fix & test Looks good to me, too! ------------- Marked as reviewed by chagedorn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/14732#pullrequestreview-1536435561 From roland at openjdk.org Wed Jul 19 07:33:47 2023 From: roland at openjdk.org (Roland Westrelin) Date: Wed, 19 Jul 2023 07:33:47 GMT Subject: RFR: 8308103: Massive (up to ~30x) increase in C2 compilation time since JDK 17 [v2] In-Reply-To: References: Message-ID: On Tue, 11 Jul 2023 16:05:35 GMT, Vladimir Kozlov wrote: >> Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: >> >> - review >> - Merge branch 'master' into JDK-8308103 >> - fix & test > > Good. @vnkozlov @TobiHartmann @chhagedorn thanks for the reviews. ------------- PR Comment: https://git.openjdk.org/jdk/pull/14732#issuecomment-1641565398 From roland at openjdk.org Wed Jul 19 07:44:44 2023 From: roland at openjdk.org (Roland Westrelin) Date: Wed, 19 Jul 2023 07:44:44 GMT Subject: RFR: 8308869: C2: use profile data in subtype checks when profile has more than one class [v8] In-Reply-To: References: <0t_NDTIV7WMVq09RidCDQ3hh0sW3FEDGhwne8_7GD_E=.adf6a3df-2782-4975-b59a-85bd0b89d9d4@github.com> Message-ID: On Thu, 13 Jul 2023 16:24:09 GMT, Vladimir Ivanov wrote: >> Roland Westrelin has updated the pull request incrementally with two additional commits since the last revision: >> >> - never common SubTypeCheckNode nodes >> - keep both ways of doing profile > > src/hotspot/share/opto/c2_globals.hpp line 775: > >> 773: "Verify receiver types at runtime") \ >> 774: \ >> 775: product(intx, TypeProfileSubTypeCheckCommonThreshold, 50, \ > > Is it better to declare it diagnostic? I don't have a strong opinion but the similar `TypeProfileMajorReceiverPercent` is product. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14375#discussion_r1267671141 From thartmann at openjdk.org Wed Jul 19 09:04:54 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 19 Jul 2023 09:04:54 GMT Subject: RFR: 8305636: Expand and clean up predicate classes and move them into separate files [v2] In-Reply-To: References: Message-ID: On Wed, 19 Jul 2023 07:12:05 GMT, Christian Hagedorn wrote: >> This is the third clean-up PR towards fixing issues with Assertion Predicates ([JDK-8288981](https://bugs.openjdk.org/browse/JDK-8288981)). This patch does not change anything in the way the old Assertion Predicates work. >> >> After collecting and moving the predicate code in the last clean-up PR https://github.com/openjdk/jdk/pull/14017 to the classes `Predicates/ParsePredicates`, I'm now completely moving the code to separate `predicates.cpp/hpp` files. By doing so, I also updated the predicate description and updated some namings. Since this description is also moved to the new files, I've committed the description update separately to better reflect these changes. >> >> Changes include: >> - Moved `Predicates/ParsePredicates` classes to new files `predicates.cpp/hpp`. >> - Turning the `Predicates` utility class into a real class to represent all predicates: >> - Contains three `PredicateBlock` fields for each Predicate Block (see description of `Predicate Block`). >> - The `PredicateBlock` class offers methods to query the presence of predicates and to access them (e.g. get the Parse Predicate projection). >> - In the process, the `ParsePredicates` could be removed as the Parse Predicates are now covered by the `PredicateBlock` class. >> - New `AssertionPredicatesWithHalt` class to skip over Assertion Predicates (will be further cleaned up later with the complete fix in JDK-8288981). >> - Updated predicate description and moved to `predicates.hpp`. 
>> - While testing a prototype fix of JDK-8288981, I've come to the conclusion that we should not move all Assertion Predicates to a separate block below the Parse and Hoisted Predicates, because it prevented further application of Loop Predication due to pins of data nodes to these Assertion Predicates while the Hoisted Predicates needed them above the Assertion Predicates (i.e. dominance problems leading to bad graph assertions). I've removed that part of the description that gave a heads-up about that change. >> - Small clean-ups such as variable renaming or code move. >> >> Not included: >> - Refactoring predicate traversal to clone/copy/initialize predicates for loop unswitching, pre/main/post, loop peeling etc. (this is only done in the actual fix in JDK-8288981 which requires some updates anyways - so this refactoring is not done here (yet)). >> >> Testing: Tier1-7, hs-precheckin-comp, hs-comp-stress >> >> Thanks, >> Christian > > Christian Hagedorn has updated the pull request incrementally with two additional commits since the last revision: > > - Update src/hotspot/share/opto/predicates.hpp > > Co-authored-by: Tobias Hartmann > - Renaming Hoisted Predicate -> Hoisted Check Predicate in description and comments as discussed offline with Tobias, fixing additional typos in description Looks good, thanks for updating. ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/14814#pullrequestreview-1536622791 From jsjolen at openjdk.org Wed Jul 19 09:06:49 2023 From: jsjolen at openjdk.org (Johan Sjölen) Date: Wed, 19 Jul 2023 09:06:49 GMT Subject: RFR: 8312200: Fix Parse::catch_call_exceptions memory leak In-Reply-To: References: Message-ID: <27HHYKhzou_AWmy4-j9gnMg8b-9UFN1WOa8eJna89w0=.61e02ff1-8e00-4dae-a12e-5ea120b91313@github.com> On Tue, 18 Jul 2023 11:33:38 GMT, Johan Sjölen wrote: > Hi, > > We used to allocate some `GrowableArray`s onto the `node_arena` in `Parse::catch_call_exceptions`.
This leaves the allocated memory until the compilation has finished, potentially increasing the maximum memory usage. I've fixed this by allocating the memory on a temporary `Arena` instead. I also switched the `GrowableArray`s themselves to being stack allocated. > > Also a few stylistic issues were addressed. Thank you. ------------- PR Comment: https://git.openjdk.org/jdk/pull/14921#issuecomment-1641706410 From jsjolen at openjdk.org Wed Jul 19 09:06:50 2023 From: jsjolen at openjdk.org (Johan Sjölen) Date: Wed, 19 Jul 2023 09:06:50 GMT Subject: Integrated: 8312200: Fix Parse::catch_call_exceptions memory leak In-Reply-To: References: Message-ID: On Tue, 18 Jul 2023 11:33:38 GMT, Johan Sjölen wrote: > Hi, > > We used to allocate some `GrowableArray`s onto the `node_arena` in `Parse::catch_call_exceptions`. This leaves the allocated memory until the compilation has finished, potentially increasing the maximum memory usage. I've fixed this by allocating the memory on a temporary `Arena` instead. I also switched the `GrowableArray`s themselves to being stack allocated. > > Also a few stylistic issues were addressed. This pull request has now been integrated.
Changeset: d33e8e6f Author: Johan Sjölen URL: https://git.openjdk.org/jdk/commit/d33e8e6f93d7b0806e1d0087c3c0a11fe1bc8e21 Stats: 20 lines in 1 file changed: 1 ins; 0 del; 19 mod 8312200: Fix Parse::catch_call_exceptions memory leak Reviewed-by: kvn, thartmann ------------- PR: https://git.openjdk.org/jdk/pull/14921 From gcao at openjdk.org Wed Jul 19 09:22:08 2023 From: gcao at openjdk.org (Gui Cao) Date: Wed, 19 Jul 2023 09:22:08 GMT Subject: RFR: 8311923: TestIRMatching.java fails on RISC-V [v6] In-Reply-To: References: Message-ID: <_SzmOHC7KTtbDPu9J__J9svNDl9VL-CNY6gi1vkyzT4=.33310e90-88de-4723-904e-fabc5ef8d91e@github.com> > Hi, we are experiencing test failures in test/hotspot/jtreg/testlibrary_tests/ir_framework/tests/TestIRMatching.java using fastdebug: > > > One or more @IR rules failed: > > Failed IR Rules (1) of Methods (1) > ---------------------------------- > 1) Method "public java.lang.Object[] ir_framework.tests.CheckCastArray.arrayCopy(java.lang.Object[],java.lang.Class)" - [Failed IR rules: 1]: > * @IR rule 1: "@compiler.lib.ir_framework.IR(phase={DEFAULT}, applyIfCPUFeatureAnd={}, applyIf={}, applyIfCPUFeatureOr={}, applyIfCPUFeature={}, counts={}, failOn={"_#CHECKCAST_ARRAYCOPY#_"}, applyIfAnd={}, applyIfOr={}, applyIfNot={})" > > Phase "PrintOptoAssembly": > - failOn: Graph contains forbidden nodes: > * Constraint 1: "(.*((?i:call_leaf_nofp,runtime)|CALL,\\s?runtime leaf nofp|BCTRL.*.leaf call).*checkcast_arraycopy.*)" > - Matched forbidden node: > * 15a + CALL, runtime leaf nofp 0x0000003f7fbd9600 #@CallLeafNoFPDirect checkcast_arraycopy > >>>> Check stdout for compilation output of the failed methods > > Through the description of the problem in the JBS issue, to fix this, we modified the matching rules in test/hotspot/jtreg/compiler/lib/ir_framework/IRNode.java and added the mv directive to the matching rules.
> > ## Testing: > qemu system and unmatched board: > - [x] test/hotspot/jtreg/testlibrary_tests/ir_framework/tests/TestIRMatching.java (fastdebug) Gui Cao has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: - Merge remote-tracking branch 'upstream/master' into JDK-8311923 - Merge branch 'master' into JDK-8311923 - Merge remote-tracking branch 'upstream/master' into JDK-8311923 - Merge branch 'master' into JDK-8311923 - 8311923: TestIRMatching.java fails on RISC-V ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14848/files - new: https://git.openjdk.org/jdk/pull/14848/files/bb46f77e..0c88b1dd Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14848&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14848&range=04-05 Stats: 24 lines in 3 files changed: 1 ins; 0 del; 23 mod Patch: https://git.openjdk.org/jdk/pull/14848.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14848/head:pull/14848 PR: https://git.openjdk.org/jdk/pull/14848 From rrich at openjdk.org Wed Jul 19 10:33:46 2023 From: rrich at openjdk.org (Richard Reingruber) Date: Wed, 19 Jul 2023 10:33:46 GMT Subject: RFR: 8308340: C2: Idealize Fma nodes [v4] In-Reply-To: References: <2mha0SwQTlnWaTLdEMX6RQ1KqhasbfWKz503bmkxKhw=.eb3a1229-b222-4d02-b2f6-35c0e50e5766@github.com> Message-ID: On Wed, 19 Jul 2023 02:22:14 GMT, Fei Gao wrote: >> Thanks for the explanation. I think I understood it to some degree. >> What happens with the subgraphs that are not canonicalized? They will have extra vector operations, right? > > Yes. For `av.neg().lanewise(VectorOperators.FMA, bv, cv, mask)`, the subgraph is like: > `match (Set dst (FmaV (Binary (NegV src1) src2) (Binary src3 pg)));`, almost no platform supports fusing it directly, so it should be split into two vector operations: `NegV` + `FmaV`. 
I suppose the `NegV` is what you called as "the extra vector operation", right? Yes that's what I meant. Thanks. Now on PPC, my understanding would be that with the symmetrical match-rules (removed with this pr) the `NegV` wouldn't be generated. Is my understanding correct? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14576#discussion_r1267885380 From roland at openjdk.org Wed Jul 19 11:34:56 2023 From: roland at openjdk.org (Roland Westrelin) Date: Wed, 19 Jul 2023 11:34:56 GMT Subject: Integrated: 8308103: Massive (up to ~30x) increase in C2 compilation time since JDK 17 In-Reply-To: References: Message-ID: On Fri, 30 Jun 2023 13:23:38 GMT, Roland Westrelin wrote: > A long chain of nodes are sunk out of a loop. Every time a node is > moved out of the loop, a cast is created to pin the node out of the > loop. When its input is next sunk, the cast is removed (the cast is > replaced by its input) and a new cast is created. Some nodes on the > chain have 2 other nodes in the chain as uses. When such a node is > sunk, 2 cast nodes are created, one for each use. So as the compiler > moves forward in the chain, the number of cast to remove grows. From > some profiling, removing those casts is what takes a lot of time. > > The fix I propose is, when a node is processed, to check whether a > cast at the out of loop control was already created for that node and > to reuse it. > > The test case takes 6 minutes when I run it without the fix and 3 > seconds with it. This pull request has now been integrated. 
Changeset: c6ab9c29 Author: Roland Westrelin URL: https://git.openjdk.org/jdk/commit/c6ab9c2905203e1ec897b3404f9179ff975d0054 Stats: 70 lines in 2 files changed: 69 ins; 0 del; 1 mod 8308103: Massive (up to ~30x) increase in C2 compilation time since JDK 17 Reviewed-by: kvn, thartmann, chagedorn ------------- PR: https://git.openjdk.org/jdk/pull/14732 From gcao at openjdk.org Wed Jul 19 13:16:53 2023 From: gcao at openjdk.org (Gui Cao) Date: Wed, 19 Jul 2023 13:16:53 GMT Subject: Integrated: 8311923: TestIRMatching.java fails on RISC-V In-Reply-To: References: Message-ID: On Wed, 12 Jul 2023 08:39:11 GMT, Gui Cao wrote: > Hi, we are experiencing test failures in test/hotspot/jtreg/testlibrary_tests/ir_framework/tests/TestIRMatching.java using fastdebug: > > > One or more @IR rules failed: > > Failed IR Rules (1) of Methods (1) > ---------------------------------- > 1) Method "public java.lang.Object[] ir_framework.tests.CheckCastArray.arrayCopy(java.lang.Object[],java.lang.Class)" - [Failed IR rules: 1]: > * @IR rule 1: "@compiler.lib.ir_framework.IR(phase={DEFAULT}, applyIfCPUFeatureAnd={}, applyIf={}, applyIfCPUFeatureOr={}, applyIfCPUFeature={}, counts={}, failOn={"_#CHECKCAST_ARRAYCOPY#_"}, applyIfAnd={}, applyIfOr={}, applyIfNot={})" > > Phase "PrintOptoAssembly": > - failOn: Graph contains forbidden nodes: > * Constraint 1: "(.*((?i:call_leaf_nofp,runtime)|CALL,\\s?runtime leaf nofp|BCTRL.*.leaf call).*checkcast_arraycopy.*)" > - Matched forbidden node: > * 15a + CALL, runtime leaf nofp 0x0000003f7fbd9600 #@CallLeafNoFPDirect checkcast_arraycopy > >>>> Check stdout for compilation output of the failed methods > > Through the description of the problem in the JBS issue, to fix this, we modified the matching rules in test/hotspot/jtreg/compiler/lib/ir_framework/IRNode.java and added the mv directive to the matching rules. 
> > ## Testing: > qemu system and unmatched board: > - [x] test/hotspot/jtreg/testlibrary_tests/ir_framework/tests/TestIRMatching.java (fastdebug) This pull request has now been integrated. Changeset: e7adbdb1 Author: Gui Cao Committer: Fei Yang URL: https://git.openjdk.org/jdk/commit/e7adbdb1f1506b82392907f7e4a5c8882d3198eb Stats: 11 lines in 2 files changed: 0 ins; 0 del; 11 mod 8311923: TestIRMatching.java fails on RISC-V Reviewed-by: fyang ------------- PR: https://git.openjdk.org/jdk/pull/14848 From roland at openjdk.org Wed Jul 19 13:36:27 2023 From: roland at openjdk.org (Roland Westrelin) Date: Wed, 19 Jul 2023 13:36:27 GMT Subject: RFR: 8308869: C2: use profile data in subtype checks when profile has more than one class [v9] In-Reply-To: <0t_NDTIV7WMVq09RidCDQ3hh0sW3FEDGhwne8_7GD_E=.adf6a3df-2782-4975-b59a-85bd0b89d9d4@github.com> References: <0t_NDTIV7WMVq09RidCDQ3hh0sW3FEDGhwne8_7GD_E=.adf6a3df-2782-4975-b59a-85bd0b89d9d4@github.com> Message-ID: <-ssaBgw9bGq2MyUaNq_LfEONlBAhkOedksLfu1J0Jbo=.bce452bf-3953-4242-91ba-c7a4baf3bdf4@github.com> > In this simple micro benchmark: > > https://github.com/franz1981/java-puzzles/blob/main/src/main/java/red/hat/puzzles/polymorphism/RequireNonNullCheckcastScalability.java#L70 > > Performance drops sharply with polluted profile: > > > Benchmark (typePollution) Mode Cnt Score Error Units > RequireNonNullCheckcastScalability.isDuplicated1 false thrpt 10 1453.372 ± 24.919 ops/us > > > to: > > > Benchmark (typePollution) Mode Cnt Score Error Units > RequireNonNullCheckcastScalability.isDuplicated1 true thrpt 10 28.579 ± 2.280 ops/us > > > The test has 2 type checks to 2 different interfaces so caching with > `secondary_super_cache` doesn't help. > > The micro-benchmark only uses 2 different concrete classes > (`DuplicatedContext` and `NonDuplicatedContext`) and they are recorded > in profile data at the type checks. But c2 only take advantage of > profile data at type checks if they report a single class.
> > What I propose is that the full blown type check expanded in > `Phase::gen_subtype_check()` takes advantage of profile data. So in > the case of the micro benchmark, before checking the > `secondary_super_cache`, generated code checks whether the object > being type checked is a `DuplicatedContext` or a > `NonDuplicatedContext`. > > This works fairly well on this micro benchmark: > > > Benchmark (typePollution) Mode Cnt Score Error Units > RequireNonNullCheckcastScalability.isDuplicated1 true thrpt 10 871.224 ± 20.750 ops/us > > > It also scales much better if there are multiple threads running the > same test (`secondary_super_cache` doesn't scale well: see > JDK-8180450). > > Now if the micro-benchmark is changed according to the comment: > > https://github.com/franz1981/java-puzzles/blob/d2d60af3d0dfe7a2567807395138edcb1d1c24f5/src/main/java/red/hat/puzzles/polymorphism/RequireNonNullCheckcastScalability.java#L62 > > so the type check hits in the `secondary_super_cache`, the current > code performs much better: > > > Benchmark (typePollution) Mode Cnt Score Error Units > RequireNonNullCheckcastScalability.isDuplicated1 true thrpt 10 871.224 ± 20.750 ops/us > > > but leveraging profiling as explained above performs even better: > > > Benchmark (typePollution) Mode Cnt Score Error Units > RequireNonNullCheckcastScalability.isDuplic... Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 16 additional commits since the last revision: - riscv support - improvements to test - Merge branch 'master' into JDK-8308869 - never common SubTypeCheckNode nodes - keep both ways of doing profile - whitespace - reworked change - Merge branch 'master' into JDK-8308869 - more test failures - Merge branch 'master' into JDK-8308869 - ...
and 6 more: https://git.openjdk.org/jdk/compare/1f1cbe10...8d9a08d1 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14375/files - new: https://git.openjdk.org/jdk/pull/14375/files/4072e7ea..8d9a08d1 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14375&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14375&range=07-08 Stats: 63244 lines in 973 files changed: 11759 ins; 46227 del; 5258 mod Patch: https://git.openjdk.org/jdk/pull/14375.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14375/head:pull/14375 PR: https://git.openjdk.org/jdk/pull/14375 From roland at openjdk.org Wed Jul 19 13:36:33 2023 From: roland at openjdk.org (Roland Westrelin) Date: Wed, 19 Jul 2023 13:36:33 GMT Subject: RFR: 8308869: C2: use profile data in subtype checks when profile has more than one class [v8] In-Reply-To: References: <0t_NDTIV7WMVq09RidCDQ3hh0sW3FEDGhwne8_7GD_E=.adf6a3df-2782-4975-b59a-85bd0b89d9d4@github.com> Message-ID: On Fri, 14 Jul 2023 21:34:17 GMT, Vladimir Ivanov wrote: >> Roland Westrelin has updated the pull request incrementally with two additional commits since the last revision: >> >> - never common SubTypeCheckNode nodes >> - keep both ways of doing profile > > test/hotspot/jtreg/compiler/c2/irTests/ProfileAtTypeCheck.java line 47: > >> 45: } >> 46: flags.add("-XX:+IgnoreUnrecognizedVMOptions"); >> 47: flags.add("-XX:+UseParallelGC"); > > The test fails when another GC is specified externally. Should be fixed in the new commit. I also improved the test a bit (added one test case with array store checks, duplicated the rules so there's no need to run with -XX:-UseCompressedClassPointers, run tests with only c1 profile collection and only interpreter profile collection to exercise new profile collection code paths independently). 
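The idea proposed for JDK-8308869 (compare the object's class against the classes recorded in the profile before falling back to the full subtype check) can be sketched in plain Java. This is a conceptual model only, not the code C2 generates: `ProfiledSubtypeCheck`, `isInstance` and `slowSubtypeCheck` are hypothetical names, and the two context classes are toy stand-ins named after the benchmark classes mentioned in the thread.

```java
// Conceptual model (hypothetical names); the real logic is emitted by C2, not written in Java.
class ProfiledSubtypeCheck {
    interface Context {}
    static final class DuplicatedContext implements Context {}
    static final class NonDuplicatedContext implements Context {}

    // Counts how often the expensive generic check is reached.
    static int slowPathCalls = 0;

    // Stands in for the full subtype check (including the secondary supers lookup).
    static boolean slowSubtypeCheck(Object o, Class<?> target) {
        slowPathCalls++;
        return target.isInstance(o);
    }

    // 'profiledSubtypes' are classes seen in the profile that are known to be subtypes of 'target'.
    static boolean isInstance(Object o, Class<?> target, Class<?>... profiledSubtypes) {
        if (o == null) {
            return false;
        }
        Class<?> actual = o.getClass();
        for (Class<?> profiled : profiledSubtypes) {
            if (actual == profiled) {
                return true;            // cheap exact-class comparison, no shared cache touched
            }
        }
        return slowSubtypeCheck(o, target);   // profile miss: fall back to the generic check
    }

    public static void main(String[] args) {
        Object o = new NonDuplicatedContext();
        boolean hit = isInstance(o, Context.class, DuplicatedContext.class, NonDuplicatedContext.class);
        System.out.println("result=" + hit + ", slow path calls=" + slowPathCalls);
    }
}
```

Because the fast path only compares class pointers, objects of the profiled classes never reach the shared cache, which is consistent with the scaling argument made in the description.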
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14375#discussion_r1268077722 From roland at openjdk.org Wed Jul 19 13:36:29 2023 From: roland at openjdk.org (Roland Westrelin) Date: Wed, 19 Jul 2023 13:36:29 GMT Subject: RFR: 8308869: C2: use profile data in subtype checks when profile has more than one class [v8] In-Reply-To: References: <0t_NDTIV7WMVq09RidCDQ3hh0sW3FEDGhwne8_7GD_E=.adf6a3df-2782-4975-b59a-85bd0b89d9d4@github.com> Message-ID: On Tue, 18 Jul 2023 01:42:07 GMT, Fei Yang wrote: > Hello, we witnessed the same problem on linux-riscv64 platform. So I prepared changes for this platform by referencing the aarch64 port. [14375-riscv-v4.diff.txt](https://github.com/openjdk/jdk/files/12074890/14375-riscv-v4.diff.txt) Tier1-3 tested and this also passed the newly added test by this PR. Could you please add this? Thanks. Added. Thanks for contributing riscv support. ------------- PR Comment: https://git.openjdk.org/jdk/pull/14375#issuecomment-1642087362 From dnsimon at openjdk.org Wed Jul 19 13:49:54 2023 From: dnsimon at openjdk.org (Doug Simon) Date: Wed, 19 Jul 2023 13:49:54 GMT Subject: RFR: 8312235: [JVMCI] need version of ConstantPool.lookupConstant without eager resolution Message-ID: <9zF7nYvZ2ZU7gIquOEdKlAhyyX2AQ3pVmnwKh9Yz4aI=.192df7cb-66aa-43e3-8d3d-58ffa18b8617@github.com> The existing `jdk.vm.ci.meta.ConstantPool.lookupConstant(int cpi)` method forces eager resolving of constants. For `DynamicConstant`, `MethodHandle` and `MethodType`, this can mean invoking bootstrap methods, something that should not be done during JIT compilation. To avoid this, this PR adds the following to `jdk.vm.ci.meta.ConstantPool`: /** * Looks up a constant at the specified index. * * If {@code resolve == false} and the denoted constant is of type * {@code JVM_CONSTANT_Dynamic}, {@code JVM_CONSTANT_MethodHandle} or * {@code JVM_CONSTANT_MethodType} and it's not yet resolved then * {@code null} is returned. 
* * @param cpi the constant pool index * @return the {@code Constant} or {@code JavaType} instance representing the constant pool * entry */ Object lookupConstant(int cpi, boolean resolve); --------- ### Progress - [ ] Change must be properly reviewed (1 review required, with at least 1 [Reviewer](https://openjdk.org/bylaws#reviewer)) - [x] Change must not contain extraneous whitespace - [x] Commit message must refer to an issue ### Reviewing
Using git Checkout this PR locally: \ `$ git fetch https://git.openjdk.org/jdk.git pull/14927/head:pull/14927` \ `$ git checkout pull/14927` Update a local copy of the PR: \ `$ git checkout pull/14927` \ `$ git pull https://git.openjdk.org/jdk.git pull/14927/head`
Using Skara CLI tools Checkout this PR locally: \ `$ git pr checkout 14927` View PR using the GUI difftool: \ `$ git pr show -t 14927`
Using diff file Download this PR as a diff file: \ https://git.openjdk.org/jdk/pull/14927.diff
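A caller-side sketch of the proposed `lookupConstant(int cpi, boolean resolve)` behaviour is shown below. It does not use the real `jdk.vm.ci.meta.ConstantPool`; the `ConstantLookup` interface and the `ToyPool` class are hypothetical stand-ins that only mimic the contract quoted above (return `null` for an unresolved entry when `resolve == false`).

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the single method under discussion; not the JVMCI interface itself.
interface ConstantLookup {
    Object lookupConstant(int cpi, boolean resolve);
}

class LazyConstantDemo {
    // Toy pool: an entry only exists once something has asked for it with resolve == true.
    static final class ToyPool implements ConstantLookup {
        private final Map<Integer, Object> resolved = new HashMap<>();
        int bootstrapCalls = 0;

        @Override
        public Object lookupConstant(int cpi, boolean resolve) {
            Object constant = resolved.get(cpi);
            if (constant == null && resolve) {
                bootstrapCalls++;                 // stands in for invoking a bootstrap method
                constant = "constant#" + cpi;
                resolved.put(cpi, constant);
            }
            return constant;                      // null if unresolved and resolve == false
        }
    }

    // What a JIT-side caller might do: never force resolution, handle null explicitly.
    static String describe(ConstantLookup pool, int cpi) {
        Object constant = pool.lookupConstant(cpi, false);
        return constant == null ? "unresolved" : constant.toString();
    }

    public static void main(String[] args) {
        ToyPool pool = new ToyPool();
        System.out.println(describe(pool, 7));    // nothing resolved yet
        pool.lookupConstant(7, true);             // e.g. the interpreter resolves the entry
        System.out.println(describe(pool, 7));
    }
}
```

The point of the sketch is that `describe` never increments `bootstrapCalls`, mirroring the stated goal of not running bootstrap methods during JIT compilation.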
------------- Commit messages: - update TestDynamicConstant to test new ConstantPool.lookupConstant method - add ConstantPool.lookupConstant(int cpi, boolean resolve) Changes: https://git.openjdk.org/jdk/pull/14927/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=14927&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8312235 Stats: 358 lines in 8 files changed: 195 ins; 152 del; 11 mod Patch: https://git.openjdk.org/jdk/pull/14927.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14927/head:pull/14927 PR: https://git.openjdk.org/jdk/pull/14927 From duke at openjdk.org Wed Jul 19 22:38:52 2023 From: duke at openjdk.org (Joshua Cao) Date: Wed, 19 Jul 2023 22:38:52 GMT Subject: RFR: 8312420: Integrate Graal's blender micro benchmark Message-ID: <5sj7hpmmChUitKVYH-je8xq3AAA_GkjcFXJl6uGnGQc=.59e78f75-12a8-4b7b-8817-627f65149718@github.com> We would like to integrate Graal's blender micro benchmark from https://www.graalvm.org/22.1/examples/java-performance-examples/. We have been using this benchmark to test our partial escape analysis work (https://mail.openjdk.org/pipermail/hotspot-compiler-dev/2023-July/066670.html). This test can exist independently of the project. 
example command to run test:

make run-test TEST=micro:org.openjdk.bench.vm.compiler.pea.Blender MICRO="FORK=1;OPTIONS=-prof gc -gc true"

example output (not complete):

Benchmark                               (iteration)  Mode  Cnt          Score  Error   Units
Blender.initialize                                1  avgt       227997775.000          ns/op
Blender.initialize:·gc.alloc.rate                 1  avgt             167.192         MB/sec
Blender.initialize:·gc.alloc.rate.norm            1  avgt        40000081.600           B/op
Blender.initialize:·gc.count                      1  avgt               4.000         counts
Blender.initialize:·gc.time                       1  avgt              65.000             ms
Blender.initialize                                2  avgt       226255767.800          ns/op
Blender.initialize:·gc.alloc.rate                 2  avgt             168.466         MB/sec
Blender.initialize:·gc.alloc.rate.norm            2  avgt        40000081.600           B/op
Blender.initialize:·gc.count                      2  avgt               4.000         counts
Blender.initialize:·gc.time                       2  avgt              58.000             ms
Blender.initialize                                3  avgt       225596324.600          ns/op
Blender.initialize:·gc.alloc.rate                 3  avgt             168.960         MB/sec
Blender.initialize:·gc.alloc.rate.norm            3  avgt        40000081.600           B/op
Blender.initialize:·gc.count                      3  avgt               4.000         counts
Blender.initialize:·gc.time                       3  avgt              55.000             ms
Blender.initialize                                4  avgt       224856811.000          ns/op
Blender.initialize:·gc.alloc.rate                 4  avgt             169.520         MB/sec
Blender.initialize:·gc.alloc.rate.norm            4  avgt        40000081.600           B/op
Blender.initialize:·gc.count                      4  avgt               4.000         counts
Blender.initialize:·gc.time                       4  avgt              55.000             ms
Blender.initialize                                5  avgt       225413704.400          ns/op
Blender.initialize:·gc.alloc.rate                 5  avgt             169.126         MB/sec
Blender.initialize:·gc.alloc.rate.norm            5  avgt        40000081.600           B/op
Blender.initialize:·gc.count                      5  avgt               4.000         counts
Blender.initialize:·gc.time                       5  avgt              58.000             ms
Blender.initialize                                6  avgt       224426973.800          ns/op
Blender.initialize:·gc.alloc.rate                 6  avgt             169.867         MB/sec
Blender.initialize:·gc.alloc.rate.norm            6  avgt        40000081.600           B/op
Blender.initialize:·gc.count                      6  avgt               4.000         counts
Blender.initialize:·gc.time                       6  avgt              58.000             ms
Blender.initialize                                7  avgt       225148411.800          ns/op
Blender.initialize:·gc.alloc.rate                 7  avgt             169.308         MB/sec
Blender.initialize:·gc.alloc.rate.norm            7  avgt        40000081.600           B/op
Blender.initialize:·gc.count                      7  avgt               4.000         counts

------------- Commit messages: - 8312420: Integrate Graal's blender micro benchmark Changes: https://git.openjdk.org/jdk/pull/14941/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=14941&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8312420 Stats: 103 lines in 1 file changed: 103 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/14941.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14941/head:pull/14941 PR: https://git.openjdk.org/jdk/pull/14941 From fgao at openjdk.org Thu Jul 20 02:39:43 2023 From: fgao at openjdk.org (Fei Gao) Date: Thu, 20 Jul 2023 02:39:43 GMT Subject: RFR: 8308340: C2: Idealize Fma nodes [v4] In-Reply-To: References: <2mha0SwQTlnWaTLdEMX6RQ1KqhasbfWKz503bmkxKhw=.eb3a1229-b222-4d02-b2f6-35c0e50e5766@github.com> Message-ID: <25TmMNCogoj1jgszVmsFMDfkBVov20V_zM9G0x8cqDQ=.5502aded-0509-4afb-b5ad-47084dbfa430@github.com> On Wed, 19 Jul 2023 10:31:09 GMT, Richard Reingruber wrote: >> Yes. For `av.neg().lanewise(VectorOperators.FMA, bv, cv, mask)`, the subgraph is like: >> `match (Set dst (FmaV (Binary (NegV src1) src2) (Binary src3 pg)));`, almost no platform supports fusing it directly, so it should be split into two vector operations: `NegV` + `FmaV`. I suppose the `NegV` is what you called as "the extra vector operation", right? > > Yes that's what I meant. Thanks. Now on PPC, my understanding would be that with the symmetrical match-rules (removed with this pr) the `NegV` wouldn't be generated. Is my understanding correct? Hi @reinrich, I'm sorry that I didn't explain the transformation here clearly enough to lead to your misunderstanding. Let's revisit the comment here. The example `av.neg().lanewise(VectorOperators.FMA, bv, cv, mask)` I gave here was used to explain the latter part "// except vectorapi masked nodes, since the inactive lanes should // save the first input of the masked node." It means the pr has no real impact on subgraph or codegen for `FmaV` nodes **with mask**, certainly including `PPC`.
Since it doesn't apply the change to vector nodes with mask (`is_predicated_vector()`), and the pr doesn't remove any rules **with mask**. > Now on PPC, my understanding would be that with the symmetrical match-rules (removed with this pr) the NegV wouldn't be generated. Is my understanding correct? The symmetrical match-rules removed with this pr works only for `FmaV` nodes **without mask**. The reason why we can remove them is that here we apply the transformation to these non-masked vector nodes. For example, on `PPC` backend, we removed ` match(Set dst (FmaVF dst (Binary (NegVF src1) src2)));` and kept `match(Set dst (FmaVF dst (Binary src1 (NegVF src2))));`, because all `(-a)*b+c` should be converted into `b*(-a)+c` here. Therefore, even without these removed symmetrical match-rules, whether `(-a)*b+c` or `a*(-b)+c` can be fused and the `NegV` won't be generated, given that the backend supports it. Thanks. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14576#discussion_r1268866696 From thartmann at openjdk.org Thu Jul 20 06:33:53 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 20 Jul 2023 06:33:53 GMT Subject: [jdk21] RFR: 8308103: Massive (up to ~30x) increase in C2 compilation time since JDK 17 Message-ID: Backport of [JDK-8308103](https://bugs.openjdk.java.net/browse/JDK-8308103). Applies cleanly. 
Thanks, Tobias ------------- Commit messages: - 8308103: Massive (up to ~30x) increase in C2 compilation time since JDK 17 Changes: https://git.openjdk.org/jdk21/pull/141/files Webrev: https://webrevs.openjdk.org/?repo=jdk21&pr=141&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8308103 Stats: 70 lines in 2 files changed: 69 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk21/pull/141.diff Fetch: git fetch https://git.openjdk.org/jdk21.git pull/141/head:pull/141 PR: https://git.openjdk.org/jdk21/pull/141 From chagedorn at openjdk.org Thu Jul 20 06:39:49 2023 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 20 Jul 2023 06:39:49 GMT Subject: [jdk21] RFR: 8308103: Massive (up to ~30x) increase in C2 compilation time since JDK 17 In-Reply-To: References: Message-ID: <7p0kGIhXrDjBm6QRhKXCdEdYB--u1B0l2HPH5gD-zgY=.790a63e3-3101-4da5-a958-6d47b7cc9c7b@github.com> On Thu, 20 Jul 2023 06:26:59 GMT, Tobias Hartmann wrote: > Backport of [JDK-8308103](https://bugs.openjdk.java.net/browse/JDK-8308103). Applies cleanly. > > Thanks, > Tobias Looks good. ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk21/pull/141#pullrequestreview-1538537011 From thartmann at openjdk.org Thu Jul 20 06:54:40 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 20 Jul 2023 06:54:40 GMT Subject: [jdk21] RFR: 8308103: Massive (up to ~30x) increase in C2 compilation time since JDK 17 In-Reply-To: References: Message-ID: On Thu, 20 Jul 2023 06:26:59 GMT, Tobias Hartmann wrote: > Backport of [JDK-8308103](https://bugs.openjdk.java.net/browse/JDK-8308103). Applies cleanly. > > Thanks, > Tobias Thanks, Christian! 
------------- PR Comment: https://git.openjdk.org/jdk21/pull/141#issuecomment-1643370211 From yzheng at openjdk.org Thu Jul 20 07:06:56 2023 From: yzheng at openjdk.org (Yudi Zheng) Date: Thu, 20 Jul 2023 07:06:56 GMT Subject: RFR: 8295698: AArch64: test/jdk/sun/security/ec/ed/EdDSATest.java failed with -XX:+UseSHA3Intrinsics In-Reply-To: References: Message-ID: On Wed, 2 Nov 2022 03:06:21 GMT, Dong Bo wrote:
> In JDK-8252204, when the SHA3 intrinsics were implemented, we used `digest_length` to differentiate SHA3-224, SHA3-256, SHA3-384 and SHA3-512, and calculated `block_size` as `block_size = 200 - 2 * digest_length`.
> However, there are two extra SHA3 instances, SHAKE256 and SHAKE128, that allow an arbitrary `digest_length`:
>
>            digest_length  block_size
> SHA3-224   28             144
> SHA3-256   32             136
> SHA3-384   48             104
> SHA3-512   64             72
> SHAKE128   variable       168
> SHAKE256   variable       136
>
> This causes a SIGSEGV crash or a hash code mismatch in `test/jdk/sun/security/ec/ed/EdDSATest.java`. The test calls `SHAKE256` in `Ed448`.
>
> The main idea of the patch is to pass the `block_size` to differentiate SHA3 instances.
> Tests `test/jdk/sun/security/ec/ed/EdDSATest.java` and `./test/jdk/sun/security/provider/MessageDigest/SHA3.java` both passed.
> Tier 1~3 also passed on hardware that supports SHA3.
>
> The SHA3 intrinsics still deliver 20%~40% performance improvement on our pre-silicon simulated platform.
> The latency and throughput of crypto SHA3 ops are designed to be 1 cpu cycle and 2 execution pipes respectively.
>
> Compared with the mainline code, the performance changes with this patch are negligible on real hardware and on the simulation platform.
> Based on the JMH results of the SHA3 intrinsics, performance can be improved by ~50% on some hardware, while some hardware shows a ~30% regression.
> These performance details are available in the comments of the issue page.
> I guess the performance benefit of SHA3 intrinsics is dependent on the micro architecture, it should be switched on/off based on the running platform. src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 3910: > 3908: > 3909: // block_size == 136, bit4 == 0 and bit5 == 0, SHA3-256 or SHAKE256 > 3910: __ andw(c_rarg5, block_size, 48); does `c_rarg5` serve as a temporary register here? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/10939#discussion_r1269021039 From pli at openjdk.org Thu Jul 20 08:17:48 2023 From: pli at openjdk.org (Pengfei Li) Date: Thu, 20 Jul 2023 08:17:48 GMT Subject: [jdk21] RFR: 8308103: Massive (up to ~30x) increase in C2 compilation time since JDK 17 In-Reply-To: References: Message-ID: On Thu, 20 Jul 2023 06:26:59 GMT, Tobias Hartmann wrote: > Backport of [JDK-8308103](https://bugs.openjdk.java.net/browse/JDK-8308103). Applies cleanly. > > Thanks, > Tobias This patch causes assertion failure. See https://bugs.openjdk.org/browse/JDK-8312440 ------------- PR Comment: https://git.openjdk.org/jdk21/pull/141#issuecomment-1643481243 From thartmann at openjdk.org Thu Jul 20 08:34:49 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 20 Jul 2023 08:34:49 GMT Subject: [jdk21] RFR: 8308103: Massive (up to ~30x) increase in C2 compilation time since JDK 17 In-Reply-To: References: Message-ID: On Thu, 20 Jul 2023 06:26:59 GMT, Tobias Hartmann wrote: > Backport of [JDK-8308103](https://bugs.openjdk.java.net/browse/JDK-8308103). Applies cleanly. > > Thanks, > Tobias Thanks! I'll close this backport for now then. 
------------- PR Comment: https://git.openjdk.org/jdk21/pull/141#issuecomment-1643505010 From thartmann at openjdk.org Thu Jul 20 08:34:49 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 20 Jul 2023 08:34:49 GMT Subject: [jdk21] Withdrawn: 8308103: Massive (up to ~30x) increase in C2 compilation time since JDK 17 In-Reply-To: References: Message-ID: On Thu, 20 Jul 2023 06:26:59 GMT, Tobias Hartmann wrote: > Backport of [JDK-8308103](https://bugs.openjdk.java.net/browse/JDK-8308103). Applies cleanly. > > Thanks, > Tobias This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.org/jdk21/pull/141 From rrich at openjdk.org Thu Jul 20 09:10:42 2023 From: rrich at openjdk.org (Richard Reingruber) Date: Thu, 20 Jul 2023 09:10:42 GMT Subject: RFR: 8308340: C2: Idealize Fma nodes [v4] In-Reply-To: <25TmMNCogoj1jgszVmsFMDfkBVov20V_zM9G0x8cqDQ=.5502aded-0509-4afb-b5ad-47084dbfa430@github.com> References: <2mha0SwQTlnWaTLdEMX6RQ1KqhasbfWKz503bmkxKhw=.eb3a1229-b222-4d02-b2f6-35c0e50e5766@github.com> <25TmMNCogoj1jgszVmsFMDfkBVov20V_zM9G0x8cqDQ=.5502aded-0509-4afb-b5ad-47084dbfa430@github.com> Message-ID: On Thu, 20 Jul 2023 02:35:47 GMT, Fei Gao wrote: >> Yes that's what I meant. Thanks. Now on PPC, my understanding would be that with the symmetrical match-rules (removed with this pr) the `NegV` wouldn't be generated. Is my understanding correct? > > Hi @reinrich, I'm sorry that I didn't explain the transformation here clearly enough to lead to your misunderstanding. Let's revisit the comment here. > > The example `av.neg().lanewise(VectorOperators.FMA, bv, cv, mask)` I gave here was used to explain the latter part > "// except vectorapi masked nodes, since the inactive lanes should > // save the first input of the masked node." > > It means the pr has no real impact on subgraph or codegen for `FmaV` nodes **with mask**, certainly including `PPC`. 
Since it doesn't apply the change to vector nodes with mask (`is_predicated_vector()`), and the pr doesn't remove any rules **with mask**. > >> Now on PPC, my understanding would be that with the symmetrical match-rules (removed with this pr) the NegV wouldn't be generated. Is my understanding correct? > > The symmetrical match-rules removed with this pr works only for `FmaV` nodes **without mask**. The reason why we can remove them is that here we apply the transformation to these non-masked vector nodes. For example, on `PPC` backend, we removed ` match(Set dst (FmaVF dst (Binary (NegVF src1) src2)));` and kept `match(Set dst (FmaVF dst (Binary src1 (NegVF src2))));`, because all `(-a)*b+c` should be converted into `b*(-a)+c` here. Therefore, even without these removed symmetrical match-rules, whether `(-a)*b+c` or `a*(-b)+c` can be fused and the `NegV` won't be generated, given that the backend supports it. > > Thanks. Thanks. How are `FmaV` nodes with mask handled then? Are they transformed into equivalent nodes without mask? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14576#discussion_r1269172591 From fgao at openjdk.org Thu Jul 20 09:37:46 2023 From: fgao at openjdk.org (Fei Gao) Date: Thu, 20 Jul 2023 09:37:46 GMT Subject: RFR: 8308340: C2: Idealize Fma nodes [v4] In-Reply-To: References: <2mha0SwQTlnWaTLdEMX6RQ1KqhasbfWKz503bmkxKhw=.eb3a1229-b222-4d02-b2f6-35c0e50e5766@github.com> <25TmMNCogoj1jgszVmsFMDfkBVov20V_zM9G0x8cqDQ=.5502aded-0509-4afb-b5ad-47084dbfa430@github.com> Message-ID: On Thu, 20 Jul 2023 09:07:29 GMT, Richard Reingruber wrote: >> Hi @reinrich, I'm sorry that I didn't explain the transformation here clearly enough to lead to your misunderstanding. Let's revisit the comment here. >> >> The example `av.neg().lanewise(VectorOperators.FMA, bv, cv, mask)` I gave here was used to explain the latter part >> "// except vectorapi masked nodes, since the inactive lanes should >> // save the first input of the masked node." 
>> >> It means the pr has no real impact on subgraph or codegen for `FmaV` nodes **with mask**, certainly including `PPC`. Since it doesn't apply the change to vector nodes with mask (`is_predicated_vector()`), and the pr doesn't remove any rules **with mask**. >> >>> Now on PPC, my understanding would be that with the symmetrical match-rules (removed with this pr) the NegV wouldn't be generated. Is my understanding correct? >> >> The symmetrical match-rules removed with this pr works only for `FmaV` nodes **without mask**. The reason why we can remove them is that here we apply the transformation to these non-masked vector nodes. For example, on `PPC` backend, we removed ` match(Set dst (FmaVF dst (Binary (NegVF src1) src2)));` and kept `match(Set dst (FmaVF dst (Binary src1 (NegVF src2))));`, because all `(-a)*b+c` should be converted into `b*(-a)+c` here. Therefore, even without these removed symmetrical match-rules, whether `(-a)*b+c` or `a*(-b)+c` can be fused and the `NegV` won't be generated, given that the backend supports it. >> >> Thanks. > > Thanks. How are `FmaV` nodes with mask handled then? Are they transformed into equivalent nodes without mask? Actually, there is no handling on `FmaV` nodes **with mask** in this patch, whether in the C2 mid-end or codegen backend. The gvn transformation just skips them. And I suppose `FmaV` nodes with mask can't be transformed into nodes **without mask**, except that C2 can guarantee that the mask is all true (this transformation has not been supported by current C2). Thanks. 
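For readers following the non-masked case, the scalar identity behind the canonicalization discussed in this thread can be checked directly from Java. The following is only a minimal sketch: the class and method names are made up for illustration, and it says nothing about which nodes or instructions C2 actually produces. It merely checks that `Math.fma(-a, b, c)`, `Math.fma(b, -a, c)` and `Math.fma(a, -b, c)` return the same bits, which is what makes a single canonical operand order sufficient.

```java
// Hypothetical illustration class, not part of the JDK or of this PR.
final class FmaCanonicalForm {

    // Compares two doubles by bit pattern; doubleToLongBits collapses all NaNs
    // to one canonical NaN, so NaN results compare as equal.
    static boolean sameBits(double x, double y) {
        return Double.doubleToLongBits(x) == Double.doubleToLongBits(y);
    }

    // True if the three ways of writing "-(a*b) + c" give bit-identical results.
    static boolean negatedFormsAgree(double a, double b, double c) {
        double r1 = Math.fma(-a, b, c);
        double r2 = Math.fma(b, -a, c);
        double r3 = Math.fma(a, -b, c);
        return sameBits(r1, r2) && sameBits(r1, r3);
    }

    // Counts the (a, b, c) triples drawn from samples for which the forms disagree.
    static int countMismatches(double[] samples) {
        int mismatches = 0;
        for (double a : samples) {
            for (double b : samples) {
                for (double c : samples) {
                    if (!negatedFormsAgree(a, b, c)) {
                        mismatches++;
                    }
                }
            }
        }
        return mismatches;
    }

    public static void main(String[] args) {
        double[] samples = {
            0.0, -0.0, 1.5, -2.25, 1e308, Double.MIN_VALUE,
            Double.POSITIVE_INFINITY, Double.NaN
        };
        System.out.println("mismatches: " + countMismatches(samples));
    }
}
```

The masked case is different, as explained above: the inactive lanes take their value from the first input of the masked node, so the operands cannot simply be swapped there.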
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14576#discussion_r1269205784 From fyang at openjdk.org Thu Jul 20 09:46:46 2023 From: fyang at openjdk.org (Fei Yang) Date: Thu, 20 Jul 2023 09:46:46 GMT Subject: RFR: 8308340: C2: Idealize Fma nodes [v3] In-Reply-To: References: Message-ID: On Tue, 27 Jun 2023 16:05:54 GMT, Vladimir Kozlov wrote: >> Fei Gao has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: >> >> - Merge branch 'master' into fg8308340 >> - Move check for UseFMA from c2compiler.cpp to Matcher::match_rule_supported in .ad files >> - Merge branch 'master' into fg8308340 >> - 8308340: C2: Idealize Fma nodes >> >> Some platforms, like aarch64, ppc, and riscv, support fusing >> `Math.fma(-a, b, c)` or `Math.fma(a, -b, c)` by generating >> partially symmetric match rules like: >> >> ``` >> match(Set dst (FmaF src3 (Binary (NegF src1) src2))); >> match(Set dst (FmaF src3 (Binary src1 (NegF src2)))); >> ``` >> >> Since `Fma` is partially communitive, the patch is to convert >> `Math.fma(-a, b, c)` to `Math.fma(b, -a, c)` in gvn phase, >> making node patterns canonical. Then we can remove redundant >> rules. >> >> Also, we should guarantee that C2 generates `Fma` nodes only on >> platforms supporting `Fma` instructions before matcher, so we >> can remove all `predicate(UseFMA)` for all `Fma` rules. >> >> After the patch, the code size of libjvm.so on aarch64 platform >> decreased by 63.4k. >> >> The patch passed all tier 1 - 3 on aarch64 and x86 platforms. > > Looks good to me. > You need second review. > Thanks for your review @vnkozlov . > > I would appreciate it very much if some expert on ppc or riscv could help review it! Perhaps @RealFYang @reinrich Hello, the RISC-V part looks fine from what this PR is supposed to do. 
And this has passed tier1-3 tests on linux-riscv64 platform. Note that I didn't check the shared code changes. ------------- PR Comment: https://git.openjdk.org/jdk/pull/14576#issuecomment-1643608041 From fgao at openjdk.org Thu Jul 20 09:46:50 2023 From: fgao at openjdk.org (Fei Gao) Date: Thu, 20 Jul 2023 09:46:50 GMT Subject: RFR: 8308340: C2: Idealize Fma nodes [v4] In-Reply-To: References: Message-ID: <4SQrLthjIMgRbYC2PQ5ykhqBDCfDKCALk6TvADgkZHE=.1cf8c650-7bf6-4234-bfb0-8a49277b820b@github.com> On Wed, 5 Jul 2023 11:11:22 GMT, Fei Gao wrote: >> Some platforms, like aarch64, ppc, and riscv, support fusing `Math.fma(-a, b, c)` or `Math.fma(a, -b, c)` by generating partially symmetric match rules like: >> >> >> match(Set dst (FmaF src3 (Binary (NegF src1) src2))); >> match(Set dst (FmaF src3 (Binary src1 (NegF src2)))); >> >> >> Since `Fma` is partially commutative, the patch is to convert `Math.fma(-a, b, c)` to `Math.fma(b, -a, c)` in gvn phase, making node patterns canonical. Then we can remove redundant rules. >> >> Also, we should guarantee that C2 generates `Fma` nodes only on platforms supporting `Fma` instructions before matcher, so we can remove all `predicate(UseFMA)` for all `Fma` rules. >> >> After the patch, the code size of libjvm.so on aarch64 platform decreased by 63.4k. >> >> The patch passed all tier 1 - 3 on aarch64 and x86 platforms. > > Fei Gao has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains five additional commits since the last revision: > > - Merge branch 'master' into fg8308340 > - Merge branch 'master' into fg8308340 > - Move check for UseFMA from c2compiler.cpp to Matcher::match_rule_supported in .ad files > - Merge branch 'master' into fg8308340 > - 8308340: C2: Idealize Fma nodes > > Some platforms, like aarch64, ppc, and riscv, support fusing > `Math.fma(-a, b, c)` or `Math.fma(a, -b, c)` by generating > partially symmetric match rules like: > > ``` > match(Set dst (FmaF src3 (Binary (NegF src1) src2))); > match(Set dst (FmaF src3 (Binary src1 (NegF src2)))); > ``` > > Since `Fma` is partially communitive, the patch is to convert > `Math.fma(-a, b, c)` to `Math.fma(b, -a, c)` in gvn phase, > making node patterns canonical. Then we can remove redundant > rules. > > Also, we should guarantee that C2 generates `Fma` nodes only on > platforms supporting `Fma` instructions before matcher, so we > can remove all `predicate(UseFMA)` for all `Fma` rules. > > After the patch, the code size of libjvm.so on aarch64 platform > decreased by 63.4k. > > The patch passed all tier 1 - 3 on aarch64 and x86 platforms. > Hello, the RISC-V part looks fine from what this PR is supposed to do. And this has passed tier1-3 tests on linux-riscv64 platform. I didn't check the shared code changes. @RealFYang Thanks for your review and test work! ------------- PR Comment: https://git.openjdk.org/jdk/pull/14576#issuecomment-1643610013 From rrich at openjdk.org Thu Jul 20 09:57:44 2023 From: rrich at openjdk.org (Richard Reingruber) Date: Thu, 20 Jul 2023 09:57:44 GMT Subject: RFR: 8308340: C2: Idealize Fma nodes [v4] In-Reply-To: References: <2mha0SwQTlnWaTLdEMX6RQ1KqhasbfWKz503bmkxKhw=.eb3a1229-b222-4d02-b2f6-35c0e50e5766@github.com> <25TmMNCogoj1jgszVmsFMDfkBVov20V_zM9G0x8cqDQ=.5502aded-0509-4afb-b5ad-47084dbfa430@github.com> Message-ID: On Thu, 20 Jul 2023 09:34:27 GMT, Fei Gao wrote: >> Thanks. How are `FmaV` nodes with mask handled then? 
Are they transformed into equivalent nodes without mask? > > Actually, there is no handling on `FmaV` nodes **with mask** in this patch, whether in the C2 mid-end or codegen backend. The gvn transformation just skips them. And I suppose `FmaV` nodes with mask can't be transformed into nodes **without mask**, except that C2 can guarantee that the mask is all true (this transformation has not been supported by current C2). Thanks. I guess I would have to dig deeper into the vector api implementation to really understand how it works. Thanks for your patience. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14576#discussion_r1269230335 From dfenacci at openjdk.org Thu Jul 20 12:26:09 2023 From: dfenacci at openjdk.org (Damon Fenacci) Date: Thu, 20 Jul 2023 12:26:09 GMT Subject: RFR: JDK-8304954: SegmentedCodeCache fails when using large pages Message-ID: # Issue When large pages are enabled and segmented code cache is used, the VM tries to use one page for each segment. If the amount of reserved code cache is limited, this can make the total size of the code cache bigger than the reserved size, which in turn makes the VM fail, claiming that there is not enough space (e.g. this happens when running `java -XX:+UseLargePages -XX:+SegmentedCodeCache -XX:InitialCodeCacheSize=2g -XX:ReservedCodeCacheSize=2g -XX:LargePageSizeInBytes=1g -Xlog:pagesize*=debug -version`). This behaviour is not correct as the VM should fall back and try with a smaller page size (and print a warning). # Solution Right before reserving heap space for code segments we introduce a test (if we are using large pages) that checks for the total size of aligned code segments and selects smaller and smaller page sizes until the total size fits in the reserved size for the code cache. # Test The regression test runs a new VM with `-XX:+UseLargePages -XX:LargePageSizeInBytes=1g`. The `main` method then checks if the two flags have been "taken". 
If so, another process is started that checks for a specific output, otherwise the test passes (i.e. the current system doesn't allow 1GB large pages) ------------- Commit messages: - JDK-8304954: exclude ZGC from test - JDK-8304954: adding test - JDK-8304954: fix syntax - JDK-8304954: print warning only when page size actually changes - JDK-8304954: use loop to find smaller page in case there are multiple failing large pages - JDK-8304954: update warning message - JDK-8304954: SegmentedCodeCache fails when using large pages Changes: https://git.openjdk.org/jdk/pull/14903/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=14903&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8304954 Stats: 100 lines in 3 files changed: 94 ins; 1 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/14903.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14903/head:pull/14903 PR: https://git.openjdk.org/jdk/pull/14903 From stuefe at openjdk.org Thu Jul 20 14:06:42 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 20 Jul 2023 14:06:42 GMT Subject: RFR: JDK-8304954: SegmentedCodeCache fails when using large pages In-Reply-To: References: Message-ID: On Mon, 17 Jul 2023 13:34:55 GMT, Damon Fenacci wrote: > # Issue > > When large pages are enabled and segmented code cache is used, the VM tries to use one page for each segment. If the amount of reserved code cache is limited, this can make the total size of the code cache bigger than the reserved size, which in turn makes the VM fail, claiming that there is not enough space (e.g. this happens when running `java -XX:+UseLargePages -XX:+SegmentedCodeCache -XX:InitialCodeCacheSize=2g -XX:ReservedCodeCacheSize=2g -XX:LargePageSizeInBytes=1g -Xlog:pagesize*=debug -version`). > This behaviour is not correct as the VM should fall back and try with a smaller page size (and print a warning). 
> > # Solution > > Right before reserving heap space for code segments we introduce a test (if we are using large pages) that checks for the total size of aligned code segments and selects smaller and smaller page sizes until the total size fits in the reserved size for the code cache. > > # Test > > The regression test runs a new VM with `-XX:+UseLargePages -XX:LargePageSizeInBytes=1g`. The `main` method then checks if the two flags have been "taken". If so, another process is started that checks for a specific output, otherwise the test passes (i.e. the current system doesn't allow 1GB large pages) Hi @dafedafe, good catch! May I propose a much simpler fix, though? The problem is that we don't pass the correct value for `min_pages` to `CodeCache::page_size`. `CodeCache::page_size()` calls either one of `os::page_size_for_region_aligned()` or `os::page_size_for_region_unaligned()`, which already tries to fit the given memory region by iterating through page sizes, exactly like you do. If you change `CodeCache::reserve_heap_memory()` to pass in a minimum number of pages of "8" (as we pass "8" in the earlier call that calculates the alignment of the segments), the error goes away. Arguably, one should calculate the page size just once and then always use that calculated value, instead of recalculating it differently each time. 
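To make the effect of `min_pages` concrete, here is a minimal Java sketch of the selection rule as I understand it: pick the largest usable page size that still lets the region span at least `min_pages` pages. This is not the HotSpot code. The class, the method name `pageSizeForRegion` and the hard-coded page-size list (4k, 2M, 1G) are assumptions for illustration only.

```java
import java.util.List;

// Hypothetical model of the page-size choice; names are illustrative only.
final class PageSizeSketch {
    static final long K = 1024L;
    static final long M = K * K;
    static final long G = K * M;

    // Returns the largest page size (list is ordered largest first) for which
    // regionSize covers at least minPages pages; falls back to the smallest size.
    static long pageSizeForRegion(List<Long> pageSizesDescending, long regionSize, long minPages) {
        long maxPageSize = regionSize / minPages;
        for (long pageSize : pageSizesDescending) {
            if (pageSize <= maxPageSize) {
                return pageSize;
            }
        }
        return pageSizesDescending.get(pageSizesDescending.size() - 1);
    }

    public static void main(String[] args) {
        List<Long> usable = List.of(G, 2 * M, 4 * K);
        long reserved = 2 * G; // ReservedCodeCacheSize=2g

        // min_pages = 1: a 1G page "fits", which is the failing configuration.
        System.out.println(pageSizeForRegion(usable, reserved, 1) / M + "M");
        // min_pages = 8: 2g / 8 = 256M, so the choice drops to 2M pages.
        System.out.println(pageSizeForRegion(usable, reserved, 8) / M + "M");
    }
}
```

With a minimum of 8 pages the 2g region can no longer use 1G pages, so the 2M size is chosen.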

diff --git a/src/hotspot/share/code/codeCache.cpp b/src/hotspot/share/code/codeCache.cpp
index 2ea72a1fcbd..7a30bfb1783 100644
--- a/src/hotspot/share/code/codeCache.cpp
+++ b/src/hotspot/share/code/codeCache.cpp
@@ -356,7 +356,7 @@ size_t CodeCache::page_size(bool aligned, size_t min_pages) {
 ReservedCodeSpace CodeCache::reserve_heap_memory(size_t size) {
   // Align and reserve space for code cache
-  const size_t rs_ps = page_size();
+  const size_t rs_ps = page_size(false, 8);
   const size_t rs_align = MAX2(rs_ps, os::vm_allocation_granularity());
   const size_t rs_size = align_up(size, rs_align);
   ReservedCodeSpace rs(rs_size, rs_align, rs_ps);

On my machine with both 1G and 2M pages configured, with the fix, we now don't crash but use 2 MB pages for the code cache:

thomas at starfish$ ./images/jdk/bin/java -XX:+UseLargePages -XX:+SegmentedCodeCache -XX:InitialCodeCacheSize=2g -XX:ReservedCodeCacheSize=2g -XX:LargePageSizeInBytes=1g -Xlog:pagesize*=debug -version
[0.001s][info][pagesize] Static hugepage support:
[0.001s][info][pagesize] hugepage size: 2M, nr_hugepages: 2000, nr_overcommit_hugepages: 0
[0.001s][info][pagesize] hugepage size: 1G, nr_hugepages: 5, nr_overcommit_hugepages: 0
[0.001s][info][pagesize] default hugepage size: 2M
[0.001s][info][pagesize] Transparent hugepage (THP) support:
[0.001s][info][pagesize] THP mode: always
[0.001s][info][pagesize] THP pagesize: 2M
[0.001s][info][pagesize] Overriding default large page size (2M) using LargePageSizeInBytes: 1G
[0.001s][info][pagesize] UseLargePages=1, UseTransparentHugePages=0, UseHugeTLBFS=1, UseSHM=0
[0.001s][info][pagesize] Large page support enabled. Usable page sizes: 4k, 2M, 1G. Default large page size: 1G.
...
[0.002s][info ][pagesize] CodeHeap 'non-nmethods': min=8M max=8M base=0x00007f38d3400000 size=8M page_size=2M [0.003s][info ][pagesize] CodeHeap 'profiled nmethods': min=1020M max=1020M base=0x00007f3893800000 size=1020M page_size=2M [0.092s][info ][pagesize] CodeHeap 'non-profiled nmethods': min=1020M max=1020M base=0x00007f38d3c00000 size=1020M page_size=2M Cheers, Thomas ------------- PR Review: https://git.openjdk.org/jdk/pull/14903#pullrequestreview-1539349817 From dnsimon at openjdk.org Thu Jul 20 18:05:07 2023 From: dnsimon at openjdk.org (Doug Simon) Date: Thu, 20 Jul 2023 18:05:07 GMT Subject: RFR: [JVMCI] ConstantPool should not force eager resolution [v2] In-Reply-To: <9zF7nYvZ2ZU7gIquOEdKlAhyyX2AQ3pVmnwKh9Yz4aI=.192df7cb-66aa-43e3-8d3d-58ffa18b8617@github.com> References: <9zF7nYvZ2ZU7gIquOEdKlAhyyX2AQ3pVmnwKh9Yz4aI=.192df7cb-66aa-43e3-8d3d-58ffa18b8617@github.com> Message-ID: <37fawNo9anSeBjPYP7tn-wcCUFp4UobyMq39dHvBJAg=.d90c0d6f-5c2b-407e-a730-d44dd7a7d4b5@github.com> > The existing `jdk.vm.ci.meta.ConstantPool.lookupConstant(int cpi)` method forces eager resolving of constants. For `DynamicConstant`, `MethodHandle` and `MethodType`, this can mean invoking bootstrap methods, something that should not be done during JIT compilation. To avoid this, this PR adds the following to `jdk.vm.ci.meta.ConstantPool`: > > > /** > * Looks up a constant at the specified index. > * > * If {@code resolve == false} and the denoted constant is of type > * {@code JVM_CONSTANT_Dynamic}, {@code JVM_CONSTANT_MethodHandle} or > * {@code JVM_CONSTANT_MethodType} and it's not yet resolved then > * {@code null} is returned. > * > * @param cpi the constant pool index > * @return the {@code Constant} or {@code JavaType} instance representing the constant pool > * entry > */ > Object lookupConstant(int cpi, boolean resolve); > > > Likewise, `jdk.vm.ci.meta.ConstantPool.lookupBootstrapMethodInvocation` has been fixed to no longer invoke the associated bootstrap method. 
Doug Simon has updated the pull request incrementally with one additional commit since the last revision: avoid bootstrap method invocation by lookupBootstrapMethodInfo ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14927/files - new: https://git.openjdk.org/jdk/pull/14927/files/54c54047..4621c13a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14927&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14927&range=00-01 Stats: 126 lines in 5 files changed: 76 ins; 19 del; 31 mod Patch: https://git.openjdk.org/jdk/pull/14927.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14927/head:pull/14927 PR: https://git.openjdk.org/jdk/pull/14927 From dnsimon at openjdk.org Thu Jul 20 18:17:45 2023 From: dnsimon at openjdk.org (Doug Simon) Date: Thu, 20 Jul 2023 18:17:45 GMT Subject: RFR: [JVMCI] ConstantPool should not force eager resolution [v2] In-Reply-To: References: <9zF7nYvZ2ZU7gIquOEdKlAhyyX2AQ3pVmnwKh9Yz4aI=.192df7cb-66aa-43e3-8d3d-58ffa18b8617@github.com> <37fawNo9anSeBjPYP7tn-wcCUFp4UobyMq39dHvBJAg=.d90c0d6f-5c2b-407e-a730-d44dd7a7d4b5@github.com> Message-ID: On Thu, 20 Jul 2023 18:08:16 GMT, Tom Rodriguez wrote: >> Doug Simon has updated the pull request incrementally with one additional commit since the last revision: >> >> avoid bootstrap method invocation by lookupBootstrapMethodInfo > > src/jdk.internal.vm.ci/share/classes/jdk/vm/ci/hotspot/HotSpotConstantPool.java line 666: > >> 664: * entries do not have a symbol in the constant pool slot. >> 665: */ >> 666: return compilerToVM().lookupConstantInPool(this, cpi, true); > > This is true because it's always safe to do so? I don't really know what these "pseudo strings" are. In any case, I don't think resolving them involves calling any Java code. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14927#discussion_r1269807593 From never at openjdk.org Thu Jul 20 18:17:43 2023 From: never at openjdk.org (Tom Rodriguez) Date: Thu, 20 Jul 2023 18:17:43 GMT Subject: RFR: [JVMCI] ConstantPool should not force eager resolution [v2] In-Reply-To: <37fawNo9anSeBjPYP7tn-wcCUFp4UobyMq39dHvBJAg=.d90c0d6f-5c2b-407e-a730-d44dd7a7d4b5@github.com> References: <9zF7nYvZ2ZU7gIquOEdKlAhyyX2AQ3pVmnwKh9Yz4aI=.192df7cb-66aa-43e3-8d3d-58ffa18b8617@github.com> <37fawNo9anSeBjPYP7tn-wcCUFp4UobyMq39dHvBJAg=.d90c0d6f-5c2b-407e-a730-d44dd7a7d4b5@github.com> Message-ID: On Thu, 20 Jul 2023 18:05:07 GMT, Doug Simon wrote: >> The existing `jdk.vm.ci.meta.ConstantPool.lookupConstant(int cpi)` method forces eager resolving of constants. For `DynamicConstant`, `MethodHandle` and `MethodType`, this can mean invoking bootstrap methods, something that should not be done during JIT compilation. To avoid this, this PR adds the following to `jdk.vm.ci.meta.ConstantPool`: >> >> >> /** >> * Looks up a constant at the specified index. >> * >> * If {@code resolve == false} and the denoted constant is of type >> * {@code JVM_CONSTANT_Dynamic}, {@code JVM_CONSTANT_MethodHandle} or >> * {@code JVM_CONSTANT_MethodType} and it's not yet resolved then >> * {@code null} is returned. >> * >> * @param cpi the constant pool index >> * @return the {@code Constant} or {@code JavaType} instance representing the constant pool >> * entry >> */ >> Object lookupConstant(int cpi, boolean resolve); >> >> >> Likewise, `jdk.vm.ci.meta.ConstantPool.lookupBootstrapMethodInvocation` has been fixed to no longer invoke the associated bootstrap method. >> >> --------- >> ### Progress >> - [ ] Change must be properly reviewed (1 review required, with at least 1 [Reviewer](https://openjdk.org/bylaws#reviewer)) >> - [x] Change must not contain extraneous whitespace >> - [x] Commit message must refer to an issue >> >> >> >> ### Reviewing >>
Using git >> >> Checkout this PR locally: \ >> `$ git fetch https://git.openjdk.org/jdk.git pull/14927/head:pull/14927` \ >> `$ git checkout pull/14927` >> >> Update a local copy of the PR: \ >> `$ git checkout pull/14927` \ >> `$ git pull https://git.openjdk.org/jdk.git pull/14927/head` >> >>
>>
Using Skara CLI tools >> >> Checkout this PR locally: \ >> `$ git pr checkout 14927` >> >> View PR using the GUI difftool: \ >> `$ git pr show -t 14927` >> >>
>>
Using diff file >> >> Download this PR as a diff file: \ >> https://git.openjdk.org/jdk/pull/14927.diff >> >>
> > Doug Simon has updated the pull request incrementally with one additional commit since the last revision: > > avoid bootstrap method invocation by lookupBootstrapMethodInfo Marked as reviewed by never (Reviewer). src/jdk.internal.vm.ci/share/classes/jdk/vm/ci/hotspot/HotSpotConstantPool.java line 666: > 664: * entries do not have a symbol in the constant pool slot. > 665: */ > 666: return compilerToVM().lookupConstantInPool(this, cpi, true); This is true because it's always safe to do so? ------------- PR Review: https://git.openjdk.org/jdk/pull/14927#pullrequestreview-1539822479 PR Review Comment: https://git.openjdk.org/jdk/pull/14927#discussion_r1269804156 From never at openjdk.org Thu Jul 20 18:17:46 2023 From: never at openjdk.org (Tom Rodriguez) Date: Thu, 20 Jul 2023 18:17:46 GMT Subject: RFR: [JVMCI] ConstantPool should not force eager resolution [v2] In-Reply-To: References: <9zF7nYvZ2ZU7gIquOEdKlAhyyX2AQ3pVmnwKh9Yz4aI=.192df7cb-66aa-43e3-8d3d-58ffa18b8617@github.com> <37fawNo9anSeBjPYP7tn-wcCUFp4UobyMq39dHvBJAg=.d90c0d6f-5c2b-407e-a730-d44dd7a7d4b5@github.com> Message-ID: <6V-Tbii4tAcrEFtxbp5zHQD5RpmxyAPqOulUrJ1It2Y=.15b3339a-3219-4de2-9f60-cd8ce3038a89@github.com> On Thu, 20 Jul 2023 18:12:10 GMT, Doug Simon wrote: >> src/jdk.internal.vm.ci/share/classes/jdk/vm/ci/hotspot/HotSpotConstantPool.java line 666: >> >>> 664: * entries do not have a symbol in the constant pool slot. >>> 665: */ >>> 666: return compilerToVM().lookupConstantInPool(this, cpi, true); >> >> This is true because it's always safe to do so? > > I don't really know what these "pseudo strings" are. In any case, I don't think resolving them involves calling any Java code. Sounds good. It's not currently causing any problems and we can investigate further if we start restricting when we `can_call_java` and problems show up here. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14927#discussion_r1269809593 From never at openjdk.org Thu Jul 20 19:14:43 2023 From: never at openjdk.org (Tom Rodriguez) Date: Thu, 20 Jul 2023 19:14:43 GMT Subject: RFR: 8311557: [JVMCI] deadlock with JVMTI thread suspension In-Reply-To: References: Message-ID: On Fri, 7 Jul 2023 06:13:21 GMT, Tom Rodriguez wrote: > Java-based JVMCI compiler threads are more like normal Java threads so they aren't `hidden_from_external_view` like the native compilers. This can lead to deadlocks if you use JVMTI to suspend all threads since this will block the compiler queue and can block execution if background compilation is disabled. It's reasonable to treat libgraal threads like native threads in this regard. Making jargraal threads hidden too would interfere with using profiling and debugging tools on them, so I've left that alone, but it might be worth changing the JVMTI suspend and resume functions to explicitly skip compiler threads as well. Thanks for the review. Part of the problem is that it's not clear what the relationship between things like `can_call_java` and `hidden_from_external_view` is and whether making some of these methods safer for JVMCI threads will confuse users of JVMTI. There will likely be some follow-up issues related to this as we navigate these distinctions. ------------- PR Comment: https://git.openjdk.org/jdk/pull/14799#issuecomment-1644458471 From duke at openjdk.org Thu Jul 20 21:16:45 2023 From: duke at openjdk.org (nitinsingh130519) Date: Thu, 20 Jul 2023 21:16:45 GMT Subject: RFR: 8311964 : Some jtreg tests failing on x86 with error 'unrecognized VM options' (C2 flags) Message-ID: This commit addresses the issue of failing jtreg tests on the x86 platform. The tests were failing due to unrecognized VM options 'LoopUnswitching', 'OptimizeFill' and 'StressCCP' which are not available in the x86 binary. The following tests were affected: 1.
TestInfiniteLoopWithUnmergedBackedgesMain.java - Affected by 'LoopUnswitching' 2. TestBackedgeLoadArrayFillMain.java - Affected by 'OptimizeFill' 3. TestRangeCheckCmpUOverflowVsSub.java - Affected by 'StressCCP' Changes have been made to these tests to avoid the use of these flags when testing on x86. These changes maintain the integrity of the tests while ensuring compatibility across different platforms. Related commits: - [ee63f83ed705c9cd3c49316fc4936668744f415d](https://github.com/microsoft/openjdk-jdk17u/commit/ee63f83ed705c9cd3c49316fc4936668744f415d#diff-ac9e408e8f32ed8c6260b005b8c386bc7a0a8738a8b8d2fe91c82b66f9f6ab7e) - [d21597aec91bbd41960923385f6a1feb31f14a0c](https://github.com/microsoft/openjdk-jdk17u/commit/d21597aec91bbd41960923385f6a1feb31f14a0c) - [e6c27925d23fe283a23c6adbe263658909c3739d](https://github.com/microsoft/openjdk-jdk17u/commit/e6c27925d23fe283a23c6adbe263658909c3739d#diff-36a07bd10ab86c032882232a1d18b96fb6a399c01dc05739c24ea12e9abc2d55) ------------- Commit messages: - Fix failing jtreg tests due to missing c2 flags on x86 Changes: https://git.openjdk.org/jdk/pull/14942/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=14942&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8311964 Stats: 3 lines in 3 files changed: 3 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/14942.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14942/head:pull/14942 PR: https://git.openjdk.org/jdk/pull/14942 From dhanalla at openjdk.org Thu Jul 20 22:50:39 2023 From: dhanalla at openjdk.org (Dhamoder Nalla) Date: Thu, 20 Jul 2023 22:50:39 GMT Subject: RFR: 8311964 : Some jtreg tests failing on x86 with error 'unrecognized VM options' (C2 flags) In-Reply-To: References: Message-ID: On Wed, 19 Jul 2023 23:19:28 GMT, nitinsingh130519 wrote: > This commit addresses the issue of failing jtreg tests on the x86 platform. 
The tests were failing due to unrecognized VM options 'LoopUnswitching', 'OptimizeFill' and 'StressCCP' which are not available in x86 binary. > > Following tests were affected: > > 1. TestInfiniteLoopWithUnmergedBackedgesMain.java - Affected by 'LoopUnswitching' > 2. TestBackedgeLoadArrayFillMain.java - Affected by 'OptimizeFill' > 3. TestRangeCheckCmpUOverflowVsSub.java - Affected by 'StressCCP' > > Changes have been made to these tests to avoid the use of these flags when testing on x86. > > These changes maintain the integrity of the tests while ensuring compatibility across different platforms. > > Related commits: > > - [ee63f83ed705c9cd3c49316fc4936668744f415d](https://github.com/microsoft/openjdk-jdk17u/commit/ee63f83ed705c9cd3c49316fc4936668744f415d#diff-ac9e408e8f32ed8c6260b005b8c386bc7a0a8738a8b8d2fe91c82b66f9f6ab7e) > - [d21597aec91bbd41960923385f6a1feb31f14a0c](https://github.com/microsoft/openjdk-jdk17u/commit/d21597aec91bbd41960923385f6a1feb31f14a0c) > - [e6c27925d23fe283a23c6adbe263658909c3739d](https://github.com/microsoft/openjdk-jdk17u/commit/e6c27925d23fe283a23c6adbe263658909c3739d#diff-36a07bd10ab86c032882232a1d18b96fb6a399c01dc05739c24ea12e9abc2d55) Marked as reviewed by dhanalla (Author). ------------- PR Review: https://git.openjdk.org/jdk/pull/14942#pullrequestreview-1540220930 From thartmann at openjdk.org Fri Jul 21 05:14:39 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Fri, 21 Jul 2023 05:14:39 GMT Subject: RFR: 8312420: Integrate Graal's blender micro benchmark In-Reply-To: <5sj7hpmmChUitKVYH-je8xq3AAA_GkjcFXJl6uGnGQc=.59e78f75-12a8-4b7b-8817-627f65149718@github.com> References: <5sj7hpmmChUitKVYH-je8xq3AAA_GkjcFXJl6uGnGQc=.59e78f75-12a8-4b7b-8817-627f65149718@github.com> Message-ID: On Wed, 19 Jul 2023 22:31:40 GMT, Joshua Cao wrote: > We would like to integrate Graal's blender micro benchmark from https://www.graalvm.org/22.1/examples/java-performance-examples/. 
We have been using this benchmark to test our partial escape analysis work (https://mail.openjdk.org/pipermail/hotspot-compiler-dev/2023-July/066670.html). This test can exist independently of the project. > > > example command to run test: > > > make run-test TEST=micro:org.openjdk.bench.vm.compiler.pea.Blender MICRO="FORK=1;OPTIONS=-prof gc -gc true" > > > example output (not complete): > > > Benchmark (iteration) Mode Cnt Score Error Units [29/1913] > Blender.initialize 1 avgt 227997775.000 ns/op > Blender.initialize:?gc.alloc.rate 1 avgt 167.192 MB/sec > Blender.initialize:?gc.alloc.rate.norm 1 avgt 40000081.600 B/op > Blender.initialize:?gc.count 1 avgt 4.000 counts > Blender.initialize:?gc.time 1 avgt 65.000 ms > Blender.initialize 2 avgt 226255767.800 ns/op > Blender.initialize:?gc.alloc.rate 2 avgt 168.466 MB/sec > Blender.initialize:?gc.alloc.rate.norm 2 avgt 40000081.600 B/op > Blender.initialize:?gc.count 2 avgt 4.000 counts > Blender.initialize:?gc.time 2 avgt 58.000 ms > Blender.initialize 3 avgt 225596324.600 ns/op > Blender.initialize:?gc.alloc.rate 3 avgt 168.960 MB/sec > Blender.initialize:?gc.alloc.rate.norm 3 avgt 40000081.600 B/op > Blender.initialize:?gc.count 3 avgt 4.000 counts > Blender.initialize:?gc.time 3 avgt 55.000 ms > Blender.initialize 4 avgt 224856811.000 ns/op > Blender.initialize:?gc.alloc.rate 4 avgt 169.520 MB/sec > Blender.initialize:?gc.alloc.rate.norm 4 avgt 40000081.600 B/op > Blender.initialize:?gc.count 4 avgt 4.000 counts > Blender.initialize:?gc.time ... The copyright situation is unclear to me. Who owns this code? @dougxc could you shed some light on this? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/14941#issuecomment-1644991399 From thartmann at openjdk.org Fri Jul 21 06:57:44 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Fri, 21 Jul 2023 06:57:44 GMT Subject: RFR: 8311964 : Some jtreg tests failing on x86 with error 'unrecognized VM options' (C2 flags) In-Reply-To: References: Message-ID: On Wed, 19 Jul 2023 23:19:28 GMT, nitinsingh130519 wrote: > This commit addresses the issue of failing jtreg tests on the x86 platform. The tests were failing due to unrecognized VM options 'LoopUnswitching', 'OptimizeFill' and 'StressCCP' which are not available in x86 binary. > > Following tests were affected: > > 1. TestInfiniteLoopWithUnmergedBackedgesMain.java - Affected by 'LoopUnswitching' > 2. TestBackedgeLoadArrayFillMain.java - Affected by 'OptimizeFill' > 3. TestRangeCheckCmpUOverflowVsSub.java - Affected by 'StressCCP' > > Changes have been made to these tests to avoid the use of these flags when testing on x86. > > These changes maintain the integrity of the tests while ensuring compatibility across different platforms. > > Related commits: > > - [ee63f83ed705c9cd3c49316fc4936668744f415d](https://github.com/microsoft/openjdk-jdk17u/commit/ee63f83ed705c9cd3c49316fc4936668744f415d#diff-ac9e408e8f32ed8c6260b005b8c386bc7a0a8738a8b8d2fe91c82b66f9f6ab7e) > - [d21597aec91bbd41960923385f6a1feb31f14a0c](https://github.com/microsoft/openjdk-jdk17u/commit/d21597aec91bbd41960923385f6a1feb31f14a0c) > - [e6c27925d23fe283a23c6adbe263658909c3739d](https://github.com/microsoft/openjdk-jdk17u/commit/e6c27925d23fe283a23c6adbe263658909c3739d#diff-36a07bd10ab86c032882232a1d18b96fb6a399c01dc05739c24ea12e9abc2d55) Looks good to me. ------------- Marked as reviewed by thartmann (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/14942#pullrequestreview-1540558275 From dnsimon at openjdk.org Fri Jul 21 07:11:39 2023 From: dnsimon at openjdk.org (Doug Simon) Date: Fri, 21 Jul 2023 07:11:39 GMT Subject: RFR: 8312420: Integrate Graal's blender micro benchmark In-Reply-To: <5sj7hpmmChUitKVYH-je8xq3AAA_GkjcFXJl6uGnGQc=.59e78f75-12a8-4b7b-8817-627f65149718@github.com> References: <5sj7hpmmChUitKVYH-je8xq3AAA_GkjcFXJl6uGnGQc=.59e78f75-12a8-4b7b-8817-627f65149718@github.com> Message-ID: On Wed, 19 Jul 2023 22:31:40 GMT, Joshua Cao wrote: > We would like to integrate Graal's blender micro benchmark from https://www.graalvm.org/22.1/examples/java-performance-examples/. We have been using this benchmark to test our partial escape analysis work (https://mail.openjdk.org/pipermail/hotspot-compiler-dev/2023-July/066670.html). This test can exist independently of the project. > > > example command to run test: > > > make run-test TEST=micro:org.openjdk.bench.vm.compiler.pea.Blender MICRO="FORK=1;OPTIONS=-prof gc -gc true" > > > example output (not complete): > > > Benchmark (iteration) Mode Cnt Score Error Units [29/1913] > Blender.initialize 1 avgt 227997775.000 ns/op > Blender.initialize:?gc.alloc.rate 1 avgt 167.192 MB/sec > Blender.initialize:?gc.alloc.rate.norm 1 avgt 40000081.600 B/op > Blender.initialize:?gc.count 1 avgt 4.000 counts > Blender.initialize:?gc.time 1 avgt 65.000 ms > Blender.initialize 2 avgt 226255767.800 ns/op > Blender.initialize:?gc.alloc.rate 2 avgt 168.466 MB/sec > Blender.initialize:?gc.alloc.rate.norm 2 avgt 40000081.600 B/op > Blender.initialize:?gc.count 2 avgt 4.000 counts > Blender.initialize:?gc.time 2 avgt 58.000 ms > Blender.initialize 3 avgt 225596324.600 ns/op > Blender.initialize:?gc.alloc.rate 3 avgt 168.960 MB/sec > Blender.initialize:?gc.alloc.rate.norm 3 avgt 40000081.600 B/op > Blender.initialize:?gc.count 3 avgt 4.000 counts > Blender.initialize:?gc.time 3 avgt 55.000 ms > Blender.initialize 4 avgt 224856811.000 
ns/op > Blender.initialize:?gc.alloc.rate 4 avgt 169.520 MB/sec > Blender.initialize:?gc.alloc.rate.norm 4 avgt 40000081.600 B/op > Blender.initialize:?gc.count 4 avgt 4.000 counts > Blender.initialize:?gc.time ... This is code I wrote for the purpose of demonstrating the advantage of Graal's PEA. As such, the copyright is owned by Oracle. ------------- PR Comment: https://git.openjdk.org/jdk/pull/14941#issuecomment-1645085904 From duke at openjdk.org Fri Jul 21 07:46:40 2023 From: duke at openjdk.org (Joshua Cao) Date: Fri, 21 Jul 2023 07:46:40 GMT Subject: RFR: 8312420: Integrate Graal's blender micro benchmark In-Reply-To: <5sj7hpmmChUitKVYH-je8xq3AAA_GkjcFXJl6uGnGQc=.59e78f75-12a8-4b7b-8817-627f65149718@github.com> References: <5sj7hpmmChUitKVYH-je8xq3AAA_GkjcFXJl6uGnGQc=.59e78f75-12a8-4b7b-8817-627f65149718@github.com> Message-ID: <3Tt1Oj75h-pOB0gIKdkQIuugSWz0hodGdb7YZmNtZ6g=.f065670a-abc5-4e71-96d5-f935656f66bd@github.com> On Wed, 19 Jul 2023 22:31:40 GMT, Joshua Cao wrote: > We would like to integrate Graal's blender micro benchmark from https://www.graalvm.org/22.1/examples/java-performance-examples/. We have been using this benchmark to test our partial escape analysis work (https://mail.openjdk.org/pipermail/hotspot-compiler-dev/2023-July/066670.html). This test can exist independently of the project. 
> > > example command to run test: > > > make run-test TEST=micro:org.openjdk.bench.vm.compiler.pea.Blender MICRO="FORK=1;OPTIONS=-prof gc -gc true" > > > example output (not complete): > > > Benchmark (iteration) Mode Cnt Score Error Units [29/1913] > Blender.initialize 1 avgt 227997775.000 ns/op > Blender.initialize:?gc.alloc.rate 1 avgt 167.192 MB/sec > Blender.initialize:?gc.alloc.rate.norm 1 avgt 40000081.600 B/op > Blender.initialize:?gc.count 1 avgt 4.000 counts > Blender.initialize:?gc.time 1 avgt 65.000 ms > Blender.initialize 2 avgt 226255767.800 ns/op > Blender.initialize:?gc.alloc.rate 2 avgt 168.466 MB/sec > Blender.initialize:?gc.alloc.rate.norm 2 avgt 40000081.600 B/op > Blender.initialize:?gc.count 2 avgt 4.000 counts > Blender.initialize:?gc.time 2 avgt 58.000 ms > Blender.initialize 3 avgt 225596324.600 ns/op > Blender.initialize:?gc.alloc.rate 3 avgt 168.960 MB/sec > Blender.initialize:?gc.alloc.rate.norm 3 avgt 40000081.600 B/op > Blender.initialize:?gc.count 3 avgt 4.000 counts > Blender.initialize:?gc.time 3 avgt 55.000 ms > Blender.initialize 4 avgt 224856811.000 ns/op > Blender.initialize:?gc.alloc.rate 4 avgt 169.520 MB/sec > Blender.initialize:?gc.alloc.rate.norm 4 avgt 40000081.600 B/op > Blender.initialize:?gc.count 4 avgt 4.000 counts > Blender.initialize:?gc.time ... Can we still merge this into OpenJDK? For example, I can close this PR, leave the JBS issue open, and let someone at Oracle author the patch. Would folks at Oracle want to integrate this benchmark into OpenJDK? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/14941#issuecomment-1645130498 From dnsimon at openjdk.org Fri Jul 21 07:54:40 2023 From: dnsimon at openjdk.org (Doug Simon) Date: Fri, 21 Jul 2023 07:54:40 GMT Subject: RFR: 8312420: Integrate Graal's blender micro benchmark In-Reply-To: <5sj7hpmmChUitKVYH-je8xq3AAA_GkjcFXJl6uGnGQc=.59e78f75-12a8-4b7b-8817-627f65149718@github.com> References: <5sj7hpmmChUitKVYH-je8xq3AAA_GkjcFXJl6uGnGQc=.59e78f75-12a8-4b7b-8817-627f65149718@github.com> Message-ID: On Wed, 19 Jul 2023 22:31:40 GMT, Joshua Cao wrote: > We would like to integrate Graal's blender micro benchmark from https://www.graalvm.org/22.1/examples/java-performance-examples/. We have been using this benchmark to test our partial escape analysis work (https://mail.openjdk.org/pipermail/hotspot-compiler-dev/2023-July/066670.html). This test can exist independently of the project. > > > example command to run test: > > > make run-test TEST=micro:org.openjdk.bench.vm.compiler.pea.Blender MICRO="FORK=1;OPTIONS=-prof gc -gc true" > > > example output (not complete): > > > Benchmark (iteration) Mode Cnt Score Error Units [29/1913] > Blender.initialize 1 avgt 227997775.000 ns/op > Blender.initialize:?gc.alloc.rate 1 avgt 167.192 MB/sec > Blender.initialize:?gc.alloc.rate.norm 1 avgt 40000081.600 B/op > Blender.initialize:?gc.count 1 avgt 4.000 counts > Blender.initialize:?gc.time 1 avgt 65.000 ms > Blender.initialize 2 avgt 226255767.800 ns/op > Blender.initialize:?gc.alloc.rate 2 avgt 168.466 MB/sec > Blender.initialize:?gc.alloc.rate.norm 2 avgt 40000081.600 B/op > Blender.initialize:?gc.count 2 avgt 4.000 counts > Blender.initialize:?gc.time 2 avgt 58.000 ms > Blender.initialize 3 avgt 225596324.600 ns/op > Blender.initialize:?gc.alloc.rate 3 avgt 168.960 MB/sec > Blender.initialize:?gc.alloc.rate.norm 3 avgt 40000081.600 B/op > Blender.initialize:?gc.count 3 avgt 4.000 counts > Blender.initialize:?gc.time 3 avgt 55.000 ms > Blender.initialize 4 avgt 
224856811.000 ns/op > Blender.initialize:?gc.alloc.rate 4 avgt 169.520 MB/sec > Blender.initialize:?gc.alloc.rate.norm 4 avgt 40000081.600 B/op > Blender.initialize:?gc.count 4 avgt 4.000 counts > Blender.initialize:?gc.time ... Simply updating the copyright message in this PR should be fine to integrate it. ------------- PR Comment: https://git.openjdk.org/jdk/pull/14941#issuecomment-1645161033 From dfenacci at openjdk.org Fri Jul 21 09:02:41 2023 From: dfenacci at openjdk.org (Damon Fenacci) Date: Fri, 21 Jul 2023 09:02:41 GMT Subject: RFR: JDK-8304954: SegmentedCodeCache fails when using large pages In-Reply-To: References: Message-ID: On Thu, 20 Jul 2023 14:03:39 GMT, Thomas Stuefe wrote: >> # Issue >> >> When large pages are enabled and segmented code cache is used, the VM tries to use one page for each segment. If the amount of reserved code cache is limited, this can make the total size of the code cache bigger than the reserved size, which in turn makes the VM fail, claiming that there is not enough space (e.g. this happens when running `java -XX:+UseLargePages -XX:+SegmentedCodeCache -XX:InitialCodeCacheSize=2g -XX:ReservedCodeCacheSize=2g -XX:LargePageSizeInBytes=1g -Xlog:pagesize*=debug -version`). >> This behaviour is not correct as the VM should fall back and try with a smaller page size (and print a warning). >> >> # Solution >> >> Right before reserving heap space for code segments we introduce a test (if we are using large pages) that checks for the total size of aligned code segments and selects smaller and smaller page sizes until the total size fits in the reserved size for the code cache. >> >> # Test >> >> The regression test runs a new VM with `-XX:+UseLargePages -XX:LargePageSizeInBytes=1g`. The `main` method then checks if the two flags have been "taken". If so, another process is started that checks for a specific output, otherwise the test passes (i.e. the current system doesn't allow 1GB large pages) > > Hi @dafedafe, > > good catch! 
May I propose a much simpler fix, though? > > The problem is that we don't pass the correct value for `min_pages` to `CodeCache::page_size`. > > `CodeCache::page_size()` calls either one of `os::page_size_for_region_aligned()` or `os::page_size_for_region_unaligned()`, which already tries to fit the given memory region by iterating through page sizes, exactly like you do. > > If you change `CodeCache::reserve_heap_memory()` to pass in a minimum number of pages of "8" (as we pass "8" in the earlier call that calculates the alignment of the segments), the error goes away. > > Arguably, one should calculate the page size just once and then always use that calculated value, instead of recalculating it differently each time. > > > diff --git a/src/hotspot/share/code/codeCache.cpp b/src/hotspot/share/code/codeCache.cpp > index 2ea72a1fcbd..7a30bfb1783 100644 > --- a/src/hotspot/share/code/codeCache.cpp > +++ b/src/hotspot/share/code/codeCache.cpp > @@ -356,7 +356,7 @@ size_t CodeCache::page_size(bool aligned, size_t min_pages) { > > ReservedCodeSpace CodeCache::reserve_heap_memory(size_t size) { > // Align and reserve space for code cache > - const size_t rs_ps = page_size(); > + const size_t rs_ps = page_size(false, 8); > const size_t rs_align = MAX2(rs_ps, os::vm_allocation_granularity()); > const size_t rs_size = align_up(size, rs_align); > ReservedCodeSpace rs(rs_size, rs_align, rs_ps); > > > On my machine with both 1G and 2M pages configured, with the fix, we now don't crash but use 2 MB pages for the code cache: > > > thomas at starfish$ ./images/jdk/bin/java -XX:+UseLargePages -XX:+SegmentedCodeCache -XX:InitialCodeCacheSize=2g -XX:ReservedCodeCacheSize=2g -XX:LargePageSizeInBytes=1g -Xlog:pagesize*=debug -version > [0.001s][info][pagesize] Static hugepage support: > [0.001s][info][pagesize] hugepage size: 2M, nr_hugepages: 2000, nr_overcommit_hugepages: 0 > [0.001s][info][pagesize] hugepage size: 1G, nr_hugepages: 5, nr_overcommit_hugepages: 0 > 
[0.001s][info][pagesize] default hugepage size: 2M > [0.001s][info][pagesize] Transparent hugepage (THP) support: > [0.001s][info][pagesize] THP mode: always > [0.001s][info][pagesize] THP pagesize: 2M > [0.001s][info][pagesize] Overriding default large page size (2M) using LargePageSizeInBytes: 1G > [0.001s][info][pagesize] UseLargePages=1, UseTransparentHugePages=0, UseHugeTLBFS=1, UseSHM=0 > [0.001s][info][pagesize] Large page support enabled. Usable page sizes: 4k, 2M, 1G. Default large page size: 1G. > ... > [0.002s][info ][pag... Thanks a lot for your review @tstuefe > The problem is that we don't pass the correct value for `min_pages` to `CodeCache::page_size`. > > `CodeCache::page_size()` calls either one of `os::page_size_for_region_aligned()` or `os::page_size_for_region_unaligned()`, which already tries to fit the given memory region by iterating through page sizes, exactly like you do. > > If you change `CodeCache::reserve_heap_memory()` to pass in a minimum number of pages of "8" (as we pass "8" in the earlier call that calculates the alignment of the segments), the error goes away. You're right, it is much simpler! I noticed the 8 passed earlier but I came to this solution since I was wondering if we really want to give a minimum of 8 pages, especially if we use large pages (on the other hand I didn't want to change the earlier code as it is used for non large pages as well). But I'm not sure this makes sense as I might not have a full picture. > On my machine with both 1G and 2M pages configured, with the fix, we now don't crash but use 2 MB pages for the code cache: You're right, with that machine (which has the same large page configuration as the one I used to test) the result is the same but I was wondering if this is always the case (e.g. with a large page of 1/2G). 
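The effect of `min_pages` discussed above can be illustrated with a rough sketch. This is not the HotSpot code: `pick_page_size` and the constants are hypothetical names, and the only assumption carried over from the thread is that the lookup walks the usable page sizes from largest to smallest and takes the first one that still lets the region span at least `min_pages` pages.

```cpp
#include <cstdint>
#include <vector>

constexpr std::uint64_t K = 1024;
constexpr std::uint64_t M = 1024 * K;
constexpr std::uint64_t G = 1024 * M;

// Hypothetical helper (not the HotSpot API): return the largest usable page
// size such that the region still covers at least `min_pages` pages.
// Assumes `usable_page_sizes` is non-empty, sorted ascending, and starts with
// the base page size; assumes `min_pages` is greater than zero.
std::uint64_t pick_page_size(const std::vector<std::uint64_t>& usable_page_sizes,
                             std::uint64_t region_size,
                             std::uint64_t min_pages) {
  const std::uint64_t max_page_size = region_size / min_pages;
  // Walk from the largest page size down to the smallest.
  for (auto it = usable_page_sizes.rbegin(); it != usable_page_sizes.rend(); ++it) {
    if (*it <= max_page_size) {
      return *it;
    }
  }
  // Nothing fits: fall back to the base page size.
  return usable_page_sizes.front();
}
```

Under these assumptions, with usable page sizes of 4k, 2M and 1G and a 2g code cache, `min_pages = 1` selects the 1G page while `min_pages = 8` selects the 2M page, which matches the behaviour reported in the log above. Whether the same holds for other large page sizes depends on the real lookup, which this sketch only approximates.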
------------- PR Comment: https://git.openjdk.org/jdk/pull/14903#issuecomment-1645247576 From stuefe at openjdk.org Fri Jul 21 09:32:40 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Fri, 21 Jul 2023 09:32:40 GMT Subject: RFR: JDK-8304954: SegmentedCodeCache fails when using large pages In-Reply-To: References: Message-ID: On Thu, 20 Jul 2023 14:03:39 GMT, Thomas Stuefe wrote: >> # Issue >> >> When large pages are enabled and segmented code cache is used, the VM tries to use one page for each segment. If the amount of reserved code cache is limited, this can make the total size of the code cache bigger than the reserved size, which in turn makes the VM fail, claiming that there is not enough space (e.g. this happens when running `java -XX:+UseLargePages -XX:+SegmentedCodeCache -XX:InitialCodeCacheSize=2g -XX:ReservedCodeCacheSize=2g -XX:LargePageSizeInBytes=1g -Xlog:pagesize*=debug -version`). >> This behaviour is not correct as the VM should fall back and try with a smaller page size (and print a warning). >> >> # Solution >> >> Right before reserving heap space for code segments we introduce a test (if we are using large pages) that checks for the total size of aligned code segments and selects smaller and smaller page sizes until the total size fits in the reserved size for the code cache. >> >> # Test >> >> The regression test runs a new VM with `-XX:+UseLargePages -XX:LargePageSizeInBytes=1g`. The `main` method then checks if the two flags have been "taken". If so, another process is started that checks for a specific output, otherwise the test passes (i.e. the current system doesn't allow 1GB large pages) > > Hi @dafedafe, > > good catch! May I propose a much simpler fix, though? > > The problem is that we don't pass the correct value for `min_pages` to `CodeCache::page_size`. 
> > `CodeCache::page_size()` calls either one of `os::page_size_for_region_aligned()` or `os::page_size_for_region_unaligned()`, which already tries to fit the given memory region by iterating through page sizes, exactly like you do. > > If you change `CodeCache::reserve_heap_memory()` to pass in a minimum number of pages of "8" (as we pass "8" in the earlier call that calculates the alignment of the segments), the error goes away. > > Arguably, one should calculate the page size just once and then always use that calculated value, instead of recalculating it differently each time. > > > diff --git a/src/hotspot/share/code/codeCache.cpp b/src/hotspot/share/code/codeCache.cpp > index 2ea72a1fcbd..7a30bfb1783 100644 > --- a/src/hotspot/share/code/codeCache.cpp > +++ b/src/hotspot/share/code/codeCache.cpp > @@ -356,7 +356,7 @@ size_t CodeCache::page_size(bool aligned, size_t min_pages) { > > ReservedCodeSpace CodeCache::reserve_heap_memory(size_t size) { > // Align and reserve space for code cache > - const size_t rs_ps = page_size(); > + const size_t rs_ps = page_size(false, 8); > const size_t rs_align = MAX2(rs_ps, os::vm_allocation_granularity()); > const size_t rs_size = align_up(size, rs_align); > ReservedCodeSpace rs(rs_size, rs_align, rs_ps); > > > On my machine with both 1G and 2M pages configured, with the fix, we now don't crash but use 2 MB pages for the code cache: > > > thomas at starfish$ ./images/jdk/bin/java -XX:+UseLargePages -XX:+SegmentedCodeCache -XX:InitialCodeCacheSize=2g -XX:ReservedCodeCacheSize=2g -XX:LargePageSizeInBytes=1g -Xlog:pagesize*=debug -version > [0.001s][info][pagesize] Static hugepage support: > [0.001s][info][pagesize] hugepage size: 2M, nr_hugepages: 2000, nr_overcommit_hugepages: 0 > [0.001s][info][pagesize] hugepage size: 1G, nr_hugepages: 5, nr_overcommit_hugepages: 0 > [0.001s][info][pagesize] default hugepage size: 2M > [0.001s][info][pagesize] Transparent hugepage (THP) support: > [0.001s][info][pagesize] THP mode: always 
> [0.001s][info][pagesize] THP pagesize: 2M > [0.001s][info][pagesize] Overriding default large page size (2M) using LargePageSizeInBytes: 1G > [0.001s][info][pagesize] UseLargePages=1, UseTransparentHugePages=0, UseHugeTLBFS=1, UseSHM=0 > [0.001s][info][pagesize] Large page support enabled. Usable page sizes: 4k, 2M, 1G. Default large page size: 1G. > ... > [0.002s][info ][pag... > Thanks a lot for your review @tstuefe > > > The problem is that we don't pass the correct value for `min_pages` to `CodeCache::page_size`. > > `CodeCache::page_size()` calls either one of `os::page_size_for_region_aligned()` or `os::page_size_for_region_unaligned()`, which already tries to fit the given memory region by iterating through page sizes, exactly like you do. > > If you change `CodeCache::reserve_heap_memory()` to pass in a minimum number of pages of "8" (as we pass "8" in the earlier call that calculates the alignment of the segments), the error goes away. > > You're right, it is much simpler! I noticed the 8 passed earlier but I came to this solution since I was wondering if we really want to give a minimum of 8 pages, especially if we use large pages (on the other hand I didn't want to change the earlier code as it is used for non large pages as well). But I'm not sure this makes sense as I might not have a full picture. I think we need not 8, but at least 3 if the codecache size is already 3*pagesize, otherwise we need at least 6: https://github.com/openjdk/jdk/blob/55aa122462c34d8f4cafa58f4d1f2d900449c83e/src/hotspot/share/code/codeCache.cpp#L315-L318 for all these align operations to end up with non-zero results for every segment. But I would keep it at 8. Just change one thing at a time (and I may overlook another reason for that minimum page number). That solves the immediate problem. 
Thinking further, I believe there is a case for keeping all segments in one 1G page: - if we don't plan on uncommitting the heap, there is no need to align the segments to page boundaries. So we could run with a single static 1 GB hugepage. - if we plan to uncommit the heap, each segment should be larger than 1 page, since we probably never will be able to uncommit a segment fully. But that's for a future RFE. The solution I proposed has the advantage that it's easy to downport, since it's minimally invasive. > > > On my machine with both 1G and 2M pages configured, with the fix, we now don't crash but use 2 MB pages for the code cache: > > You're right, with that machine (which has the same large page configuration as the one I used to test) the result is the same but I was wondering if this is always the case (e.g. with a large page of 1/2G). I believe as long as the number of pages is > 6 this should always work. ------------- PR Comment: https://git.openjdk.org/jdk/pull/14903#issuecomment-1645290375 From bulasevich at openjdk.org Fri Jul 21 11:03:33 2023 From: bulasevich at openjdk.org (Boris Ulasevich) Date: Fri, 21 Jul 2023 11:03:33 GMT Subject: RFR: 8293170: Improve encoding of the debuginfo nmethod section [v2] In-Reply-To: <47iDSIn_k8jXYisbGzwQOILoNh45xPO3kkj-k2-BZ1E=.2ea0570e-4055-4517-a128-2af1b62c7529@github.com> References: <47iDSIn_k8jXYisbGzwQOILoNh45xPO3kkj-k2-BZ1E=.2ea0570e-4055-4517-a128-2af1b62c7529@github.com> Message-ID: > This is another pull request to replace https://github.com/openjdk/jdk/pull/10025 change which was blocked as not acceptable (see https://github.com/openjdk/jdk/pull/10025#pullrequestreview-1228216330) > > The objections to change #10025 were: > - specialized algorithm for given data complicates things, makes it hard to learn, test and support > - algorithm is changed for DebugInfo, and the benefit is only for one type of data > - statistics of the debug info data can (will) change, breaking the optimization > > The suggestion
was: > - don't change the core algorithm, but add one on top or underneath the existing one, or reuse off-the-shelf zero-reduction schemes such as Cap'n Proto > > With this change I propose a different approach. Instead of bit coding, the sequence of zeros in a data stream is encoded with a special character that normally never appears in Unsigned5 encoding, followed by a byte containing the number of zeros. In this way the updated algorithm is a pure extension of the existing encoding algorithm: data encoded without the zero-reduction trick is unpacked in the same way as before. > > Currently there are several datasets affected by this change: Dependencies info, OopMap info, LineNumber info, Debug info. Only Debug info has a large number of zeros and gets a significant benefit. I experimented with the Cap'n Proto and lz4 algorithms on DebugInfo. The Unsigned5 algorithm has a better compression rate than these. > > DebugInfo data size is reduced by ~20% (actually, 10-30%, depending on the application). Total nmethod size reduction is ~3%. > > Performance impact: Renaissance and DaCapo benchmarks do not show any difference.
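The zero-run idea in the description above can be sketched as follows. This is a simplified illustration, not the patch itself: the function names are hypothetical, the per-value encoding is a stand-in (each value `v` in 0..254 is written as the single byte `v + 1`, so the byte 0x00 is never produced for ordinary values), and the choice to compress only runs of two or more zeros is an assumption made for the example.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumed marker: a byte the plain stand-in encoding never emits.
constexpr std::uint8_t kZeroRunMarker = 0x00;
// A run count is stored in one byte, so a single run covers at most 255 zeros.
constexpr std::size_t kMaxRun = 0xFF;

// Encode values (each must be <= 254). Runs of two or more zeros become
// {marker, run length}; everything else is written as value + 1.
std::vector<std::uint8_t> encode_with_zero_runs(const std::vector<std::uint8_t>& values) {
  std::vector<std::uint8_t> out;
  std::size_t i = 0;
  while (i < values.size()) {
    if (values[i] == 0) {
      std::size_t run = 0;
      while (i + run < values.size() && values[i + run] == 0 && run < kMaxRun) {
        ++run;
      }
      if (run >= 2) {
        out.push_back(kZeroRunMarker);
        out.push_back(static_cast<std::uint8_t>(run));
        i += run;
        continue;
      }
    }
    // Ordinary value (including a lone zero): plain stand-in encoding.
    out.push_back(static_cast<std::uint8_t>(values[i] + 1));
    ++i;
  }
  return out;
}

// Decode the stream produced above. Data without any marker byte decodes
// exactly as it would with the plain stand-in encoding.
std::vector<std::uint8_t> decode_with_zero_runs(const std::vector<std::uint8_t>& bytes) {
  std::vector<std::uint8_t> out;
  std::size_t i = 0;
  while (i < bytes.size()) {
    if (bytes[i] == kZeroRunMarker) {
      if (i + 1 >= bytes.size()) {
        break;  // truncated input: marker without a count
      }
      out.insert(out.end(), static_cast<std::size_t>(bytes[i + 1]), std::uint8_t{0});
      i += 2;
    } else {
      out.push_back(static_cast<std::uint8_t>(bytes[i] - 1));
      ++i;
    }
  }
  return out;
}
```

The property the description emphasises is visible here: the decoder only takes a different path when it sees the marker byte, so streams written without the zero-run trick are read back unchanged. A run longer than 255 zeros is simply split into several marker/count pairs in this sketch; the real patch may handle that boundary differently.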
Boris Ulasevich has updated the pull request incrementally with one additional commit since the last revision: addressing review comments: super call, clarifying, and adding tests ------------- Changes: - all: https://git.openjdk.org/jdk/pull/12387/files - new: https://git.openjdk.org/jdk/pull/12387/files/0148f580..0632c968 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=12387&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12387&range=00-01 Stats: 276 lines in 4 files changed: 274 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/12387.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/12387/head:pull/12387 PR: https://git.openjdk.org/jdk/pull/12387 From bulasevich at openjdk.org Fri Jul 21 11:03:35 2023 From: bulasevich at openjdk.org (Boris Ulasevich) Date: Fri, 21 Jul 2023 11:03:35 GMT Subject: RFR: 8293170: Improve encoding of the debuginfo nmethod section [v2] In-Reply-To: References: <47iDSIn_k8jXYisbGzwQOILoNh45xPO3kkj-k2-BZ1E=.2ea0570e-4055-4517-a128-2af1b62c7529@github.com> Message-ID: On Thu, 6 Jul 2023 22:32:41 GMT, Serguei Spitsyn wrote: >> Boris Ulasevich has updated the pull request incrementally with one additional commit since the last revision: >> >> addressing review comments: super call, clarifying, and adding tests > > src/hotspot/share/code/compressedStream.hpp line 118: > >> 116: bool handle_zero(juint value) { >> 117: if (value == 0) { >> 118: _zero_count = (_zero_count == 0xFF) ? 0 : _zero_count; > > The case of `_zero_count` overflow is not clear. Apparently, I'm missing something here. > Current code is just clearing the previously counted `_zero_count`. > I'd expect some action like storing the current number of zeros or advancing the `_position`. > Do you have a test for this? Good question. It works like this. On each write(), values are stored in the array, either normally or as a chain optimization (two bytes: zero byte and a number of zeros). When we get another zero value, the chain is updated. 
When the chain is full, we exit this function and the value is written by UNSIGNED5::write_uint_grow(). To make things clear, I have updated this code:
-      _zero_count = (_zero_count == 0xFF) ? 0 : _zero_count;
+      if (_zero_count == 0xFF) { // biggest zero chain length is 255
+        _zero_count = 1;
+        // for now, write it as an ordinary value (UNSIGNED5 encodes zero int as a single byte)
+        // the new zero sequence is started if there are more than two zero values in a row
+        return false;
+      }
Plus, I have added gtests to check this. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/12387#discussion_r1270543845 From bulasevich at openjdk.org Fri Jul 21 11:06:37 2023 From: bulasevich at openjdk.org (Boris Ulasevich) Date: Fri, 21 Jul 2023 11:06:37 GMT Subject: RFR: 8293170: Improve encoding of the debuginfo nmethod section [v3] In-Reply-To: <47iDSIn_k8jXYisbGzwQOILoNh45xPO3kkj-k2-BZ1E=.2ea0570e-4055-4517-a128-2af1b62c7529@github.com> References: <47iDSIn_k8jXYisbGzwQOILoNh45xPO3kkj-k2-BZ1E=.2ea0570e-4055-4517-a128-2af1b62c7529@github.com> Message-ID: <6cnOjY0DOPL9E7tgZDZXpkSbOOi9HwNczxkjvo_eVvs=.e9e614cd-dc34-49f9-a4a6-2bf551c70469@github.com> > This is another pull request to replace https://github.com/openjdk/jdk/pull/10025 change which was blocked as not acceptable (see https://github.com/openjdk/jdk/pull/10025#pullrequestreview-1228216330) > > The objections to change #10025 were: > - specialized algorithm for given data complicates things, makes it hard to learn, test and support > - algorithm is changed for DebugInfo, and the benefit is only for one type of data > - statistics of the debug info data can (will) change, breaking the optimization > > The suggestion was: > - don't change the core algorithm, but add one on top or underneath the existing one, or reuse off-the-shelf zero-reduction schemes such as Cap'n Proto > > With this change I propose a different approach.
Instead of bit coding, the sequence of zeros in a data stream is encoded with a special character that normally never appears in Unsinged5 encoding, followed by a byte containing a number of zeros. In this way the updated algorithm is a pure extension of the existing encoding algorithm: data encoded without the zero-reduction trick is unpacked in the same way as before. > > Currently there are several datasets affected by this change: Dependencies info, OopMap info, LineNumber info, Debug info. Only Debug info has a large number of zeros and gets a significant benefit. I experimented with the Cap'n Proto and lz4 algorithms on DebugInfo. The Unsinged5 algorithm has a better compression rate than these. > > DebugInfo data size is reduced by ~20% (actually, 10-30%, depending on the application). Total nmethod size reduction is ~3%. > > Performance impact: Renaisance and DaCapo benchmarks do not show any difference. Boris Ulasevich has updated the pull request incrementally with one additional commit since the last revision: whitespace error ------------- Changes: - all: https://git.openjdk.org/jdk/pull/12387/files - new: https://git.openjdk.org/jdk/pull/12387/files/0632c968..b6b80ae5 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=12387&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12387&range=01-02 Stats: 3 lines in 2 files changed: 0 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/12387.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/12387/head:pull/12387 PR: https://git.openjdk.org/jdk/pull/12387 From bulasevich at openjdk.org Fri Jul 21 11:06:59 2023 From: bulasevich at openjdk.org (Boris Ulasevich) Date: Fri, 21 Jul 2023 11:06:59 GMT Subject: RFR: 8293170: Improve encoding of the debuginfo nmethod section [v3] In-Reply-To: <48GcHhDYvV1QpMaREDRwoMcmBGPTj0IVahSQZjuwLbc=.5166409e-e7b1-475f-a637-45be93e6c582@github.com> References: <47iDSIn_k8jXYisbGzwQOILoNh45xPO3kkj-k2-BZ1E=.2ea0570e-4055-4517-a128-2af1b62c7529@github.com> 
<48GcHhDYvV1QpMaREDRwoMcmBGPTj0IVahSQZjuwLbc=.5166409e-e7b1-475f-a637-45be93e6c582@github.com> Message-ID: On Thu, 6 Jul 2023 15:11:50 GMT, Chris Plummer wrote: >> Boris Ulasevich has updated the pull request incrementally with one additional commit since the last revision: >> >> whitespace error > > src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/code/CompressedReadStream.java line 106: > >> 104: @Override >> 105: public void setPosition(int position) { >> 106: this.position = position; > > Maybe a call to `super()` would be better. ok. thanks! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/12387#discussion_r1270550701 From roland at openjdk.org Fri Jul 21 12:14:46 2023 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 21 Jul 2023 12:14:46 GMT Subject: RFR: 8308504: C2: "malformed control flow" after JDK-8303466 Message-ID: <7Vy9q8XEuzC40Za_CK-nc6G3EAZocEJ0T8-gdWTi9So=.d9aece29-c2d4-49c7-9a2c-22abc56920e8@github.com> I took that bug over from Emanuel because he's away: https://github.com/openjdk/jdk/pull/14331 I tried adding a `CastII` to narrow the limit of the loop as I suggested in a comment on the PR but I found that doesn't work in all cases: if the type of the initial value for the loop variable is not narrow enough, then the narrower type for the limit doesn't help narrow the loop phi type. What I propose instead is to add an assert predicate that catches when the main loop is unreachable but the zero trip count doesn't constant fold. For that to work, the order of predicates must be preserved when they are copied or updated. I had to make some small changes to guarantee that. 
------------- Commit messages: - fix and test Changes: https://git.openjdk.org/jdk/pull/14973/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=14973&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8308504 Stats: 195 lines in 4 files changed: 154 ins; 16 del; 25 mod Patch: https://git.openjdk.org/jdk/pull/14973.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14973/head:pull/14973 PR: https://git.openjdk.org/jdk/pull/14973 From dcubed at openjdk.org Fri Jul 21 16:17:44 2023 From: dcubed at openjdk.org (Daniel D. Daugherty) Date: Fri, 21 Jul 2023 16:17:44 GMT Subject: RFR: JDK-8310316: Failing HotSpot Compiler directives are too verbose In-Reply-To: References: Message-ID: <_w6wxQzahKg7fwk4wLuer3-q1CIZSWTgLNItlmQiiZM=.2dae2ecd-94b7-419b-9c60-f9335e77159a@github.com> On Thu, 20 Jul 2023 15:04:28 GMT, Eric Nothum wrote: > Previously jcmd printed the whole file if a compiler directive was added that was not in json format. This example illustrates the issue: > > > ./jcmd 331311 Compiler.directives_add ./example.txt > 331311: > Syntax error on line 1 byte 1: Json must start with an object or an array. > At 'This'. > This is my very interesting text, > followed by some more exciting text. > > Parsing of compiler directives failed > Could not load file: ./example.txt > > The json error message is not printed if the silent field is set in the `DirectivesParser` object. > The proposed change adds a boolean parameter silent that is propagated from `CompilerDirectivesAddDCmd::execute` to the `DirectivesParser` constructor. The default value for the new parameter is set to false, which represents the original behavior. In case where a compiler directive is added, the parameter is set to true and the error message will be reduced. 
> > The proposed change reduces the error message to: > > > ./jcmd 335703 Compiler.directives_add ./example.txt > 335703: > Parsing of compiler directives failed > Could not load file: ./example.txt Redirecting to the proper label: ------------- PR Comment: https://git.openjdk.org/jdk/pull/14957#issuecomment-1645881127 From rrich at openjdk.org Fri Jul 21 17:05:48 2023 From: rrich at openjdk.org (Richard Reingruber) Date: Fri, 21 Jul 2023 17:05:48 GMT Subject: RFR: 8312495: assert(0 <= i && i < _len) failed: illegal index after JDK-8287061 on big endian platforms Message-ID: <51ZON6Vdj1yrENrGYzmMEndXIRvJ1d7Nk72a4ikx3rw=.ee657823-f69f-42ce-886a-25a7f6d71840@github.com> On big endian platforms `jint` values are stored in the high part of `StackValue` values. Therefore the `StackValue` cannot be cast directly to `jint`. More details why this has to be like this are given in the JBS issue. This is a common pattern. See also https://github.com/openjdk/jdk/blob/8d29329138d44800ee4c0c02dacc01a06097de66/src/hotspot/share/runtime/deoptimization.cpp#L1386-L1387 https://github.com/openjdk/jdk/blob/8d29329138d44800ee4c0c02dacc01a06097de66/src/hotspot/share/runtime/deoptimization.cpp#L1513-L1514 ### Testing Many iterations of vmTestbase/vm/mlvm/meth/stress/compiler/sequences/Test.java. JTReg tests: tier1-4 of hotspot and jdk. All of Langtools and jaxp. Renaissance benchmarks as functional tests. All testing was done with fastdebug and release builds on the main platforms and also on Linux/PPC64le.
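
To illustrate why a truncating cast loses the value, here is a minimal sketch that simulates the slot layout with explicit bytes, so it behaves the same on any host. It assumes a 64-bit slot whose first four bytes hold the 32-bit value, which is the high half of the word on a big endian machine. All function names are hypothetical and are not HotSpot APIs.

```cpp
#include <cassert>
#include <cstdint>

// Assemble a 64-bit word from 8 bytes in big-endian order
// (byte 0 is the most significant byte).
uint64_t word_from_big_endian_bytes(const unsigned char bytes[8]) {
  uint64_t word = 0;
  for (int i = 0; i < 8; i++) {
    word = (word << 8) | bytes[i];
  }
  return word;
}

// Simulated big-endian slot (assumption of this sketch): the 32-bit value
// is written into the first four bytes, the remaining four bytes stay zero.
uint64_t simulated_big_endian_slot(int32_t value) {
  uint32_t u = static_cast<uint32_t>(value);
  unsigned char bytes[8] = {
    static_cast<unsigned char>(u >> 24),
    static_cast<unsigned char>(u >> 16),
    static_cast<unsigned char>(u >> 8),
    static_cast<unsigned char>(u),
    0, 0, 0, 0
  };
  return word_from_big_endian_bytes(bytes);
}

// What a direct truncating cast of the slot would keep: the low 32 bits.
int32_t low_half(uint64_t word) {
  return static_cast<int32_t>(static_cast<uint32_t>(word));
}

// Where the value actually lives in the simulated slot: the high 32 bits.
int32_t high_half(uint64_t word) {
  return static_cast<int32_t>(static_cast<uint32_t>(word >> 32));
}
```

For a simulated slot holding 42, `low_half` yields 0 while `high_half` yields 42, which matches the failure mode described for the direct cast.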
------------- Commit messages: - Fix Changes: https://git.openjdk.org/jdk/pull/14976/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=14976&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8312495 Stats: 4 lines in 1 file changed: 3 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/14976.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14976/head:pull/14976 PR: https://git.openjdk.org/jdk/pull/14976 From dnsimon at openjdk.org Fri Jul 21 20:32:57 2023 From: dnsimon at openjdk.org (Doug Simon) Date: Fri, 21 Jul 2023 20:32:57 GMT Subject: RFR: 8312524: [JVMCI] serviceability/dcmd/compiler/CompilerQueueTest.java fails Message-ID: <5FDrZ2dpXh882r6ye9CJkICOq1VMFwZ4Y9gK17swnYE=.302db85d-fc8a-4ddd-96fd-def99ba745a4@github.com> This PR adds logic to the CompileBroker for implementing `WhiteBox.lockCompilation()` when `UseJVMCICompiler` is true. ------------- Commit messages: - implement whitebox compilation locking for JVMCI in CompileBroker Changes: https://git.openjdk.org/jdk/pull/14979/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=14979&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8312524 Stats: 18 lines in 1 file changed: 14 ins; 3 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/14979.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14979/head:pull/14979 PR: https://git.openjdk.org/jdk/pull/14979 From never at openjdk.org Fri Jul 21 20:36:41 2023 From: never at openjdk.org (Tom Rodriguez) Date: Fri, 21 Jul 2023 20:36:41 GMT Subject: RFR: 8312524: [JVMCI] serviceability/dcmd/compiler/CompilerQueueTest.java fails In-Reply-To: <5FDrZ2dpXh882r6ye9CJkICOq1VMFwZ4Y9gK17swnYE=.302db85d-fc8a-4ddd-96fd-def99ba745a4@github.com> References: <5FDrZ2dpXh882r6ye9CJkICOq1VMFwZ4Y9gK17swnYE=.302db85d-fc8a-4ddd-96fd-def99ba745a4@github.com> Message-ID: <9zR0zH5Su5vzWi6S7F21x8plCCZNUF9ZDlXdtJX8wZU=.99f2bebd-0c7d-4ad7-8d52-36b31522dc45@github.com> On Fri, 21 Jul 2023 20:23:31 GMT, Doug Simon wrote: > This PR adds logic to the 
CompileBroker for implementing `WhiteBox.lockCompilation()` when `UseJVMCICompiler` is true. Looks good. ------------- Marked as reviewed by never (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/14979#pullrequestreview-1541819411 From clanger at openjdk.org Fri Jul 21 21:26:43 2023 From: clanger at openjdk.org (Christoph Langer) Date: Fri, 21 Jul 2023 21:26:43 GMT Subject: RFR: 8312495: assert(0 <= i && i < _len) failed: illegal index after JDK-8287061 on big endian platforms In-Reply-To: <51ZON6Vdj1yrENrGYzmMEndXIRvJ1d7Nk72a4ikx3rw=.ee657823-f69f-42ce-886a-25a7f6d71840@github.com> References: <51ZON6Vdj1yrENrGYzmMEndXIRvJ1d7Nk72a4ikx3rw=.ee657823-f69f-42ce-886a-25a7f6d71840@github.com> Message-ID: On Fri, 21 Jul 2023 14:24:35 GMT, Richard Reingruber wrote: > On big endian platforms `jint` values are stored in the high part of `StackValue` values. Therefore the `StackValue` cannot be cast directly to `jint`. More details why this has to be like this are given in the JBS issue. > > This is a common pattern. See also > > https://github.com/openjdk/jdk/blob/8d29329138d44800ee4c0c02dacc01a06097de66/src/hotspot/share/runtime/deoptimization.cpp#L1386-L1387 > https://github.com/openjdk/jdk/blob/8d29329138d44800ee4c0c02dacc01a06097de66/src/hotspot/share/runtime/deoptimization.cpp#L1513-L1514 > > ### Testing > Many iterations of vmTestbase/vm/mlvm/meth/stress/compiler/sequences/Test.java. > > JTReg tests: tier1-4 of hotspot and jdk. All of Langtools and jaxp. Renaissance benchmarks as functional tests. > > All testing was done with fastdebug and release builds on the main platforms and also on Linux/PPC64le. Looks good. Minor formatting comment. src/hotspot/share/code/debugInfo.cpp line 248: > 246: intptr_t val = sv_selector->get_int(); > 247: jint selector = (jint)*((jint*)&val); > 248: not necessary to add a new line here ------------- Marked as reviewed by clanger (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/14976#pullrequestreview-1541864001 PR Review Comment: https://git.openjdk.org/jdk/pull/14976#discussion_r1271120352 From dlong at openjdk.org Fri Jul 21 22:15:40 2023 From: dlong at openjdk.org (Dean Long) Date: Fri, 21 Jul 2023 22:15:40 GMT Subject: RFR: 8312495: assert(0 <= i && i < _len) failed: illegal index after JDK-8287061 on big endian platforms In-Reply-To: <51ZON6Vdj1yrENrGYzmMEndXIRvJ1d7Nk72a4ikx3rw=.ee657823-f69f-42ce-886a-25a7f6d71840@github.com> References: <51ZON6Vdj1yrENrGYzmMEndXIRvJ1d7Nk72a4ikx3rw=.ee657823-f69f-42ce-886a-25a7f6d71840@github.com> Message-ID: On Fri, 21 Jul 2023 14:24:35 GMT, Richard Reingruber wrote: > On big endian platforms `jint` values are stored in the high part of `StackValue` values. Therefore the `StackValue` cannot be cast directly to `jint`. More details why this has to be like this are given in the JBS issue. > > This is a common pattern. See also > > https://github.com/openjdk/jdk/blob/8d29329138d44800ee4c0c02dacc01a06097de66/src/hotspot/share/runtime/deoptimization.cpp#L1386-L1387 > https://github.com/openjdk/jdk/blob/8d29329138d44800ee4c0c02dacc01a06097de66/src/hotspot/share/runtime/deoptimization.cpp#L1513-L1514 > > ### Testing > Many iterations of vmTestbase/vm/mlvm/meth/stress/compiler/sequences/Test.java. > > JTReg tests: tier1-4 of hotspot and jdk. All of Langtools and jaxp. Renaissance benchmarks as functional tests. > > All testing was done with fastdebug and release builds on the main platforms and also on Linux/PPC64le. How about introducing something like `jint Value::get_jint()` that can be used in place of this error-prone pattern?
------------- PR Comment: https://git.openjdk.org/jdk/pull/14976#issuecomment-1646285828 From kvn at openjdk.org Fri Jul 21 22:26:41 2023 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 21 Jul 2023 22:26:41 GMT Subject: RFR: 8312495: assert(0 <= i && i < _len) failed: illegal index after JDK-8287061 on big endian platforms In-Reply-To: <51ZON6Vdj1yrENrGYzmMEndXIRvJ1d7Nk72a4ikx3rw=.ee657823-f69f-42ce-886a-25a7f6d71840@github.com> References: <51ZON6Vdj1yrENrGYzmMEndXIRvJ1d7Nk72a4ikx3rw=.ee657823-f69f-42ce-886a-25a7f6d71840@github.com> Message-ID: On Fri, 21 Jul 2023 14:24:35 GMT, Richard Reingruber wrote: > On big endian platforms `jint` values are stored in the high part of `StackValue` values. Therefore the `StackValue` cannot be cast directly to `jint`. More details why this has to be like this are given in the JBS issue. > > This is a common pattern. See also > > https://github.com/openjdk/jdk/blob/8d29329138d44800ee4c0c02dacc01a06097de66/src/hotspot/share/runtime/deoptimization.cpp#L1386-L1387 > https://github.com/openjdk/jdk/blob/8d29329138d44800ee4c0c02dacc01a06097de66/src/hotspot/share/runtime/deoptimization.cpp#L1513-L1514 > > ### Testing > Many iterations of vmTestbase/vm/mlvm/meth/stress/compiler/sequences/Test.java. > > JTReg tests: tier1-4 of hotspot and jdk. All of Langtools and jaxp. Renaissance benchmarks as functional tests. > > All testing was done with fastdebug and release builds on the main platforms and also on Linux/PPC64le. src/hotspot/share/code/debugInfo.cpp line 247: > 245: // On big endian platforms the jint is in the high part of the StackValue > 246: intptr_t val = sv_selector->get_int(); > 247: jint selector = (jint)*((jint*)&val); You don't need outer cast but you may need `()` around `(&val)`.
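
A minimal sketch of what such an accessor could look like follows. `SlotValue`, `from_jint`, and `jint_t` are hypothetical stand-ins invented for illustration; only the accessor names `get_jint()` and `get_intptr()` are taken from suggestions in this thread, and the real HotSpot classes may well differ. The sketch uses `std::memcpy` rather than the pointer cast to sidestep aliasing concerns; that is a choice made here, not something taken from the patch.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for HotSpot's jint.
using jint_t = int32_t;

// Hypothetical slot wrapper, not HotSpot's StackValue.
class SlotValue {
 public:
  // Assumption of this sketch: a 32-bit value is written at the slot's
  // base address, and the rest of the slot stays zero.
  static SlotValue from_jint(jint_t v) {
    SlotValue s;
    std::memcpy(&s._raw, &v, sizeof(v));
    return s;
  }

  // Raw slot contents; the numeric value depends on host endianness.
  intptr_t get_intptr() const { return _raw; }

  // Read the 32-bit value back from the slot's base address, so callers
  // do not have to repeat the cast pattern at every use site.
  jint_t get_jint() const {
    jint_t v;
    std::memcpy(&v, &_raw, sizeof(v));
    return v;
  }

 private:
  intptr_t _raw = 0;
};
```

Because the write and the read both go through the slot's base address, `get_jint()` round-trips the value on either byte order, whereas a truncating cast of `get_intptr()` would only do so on little endian hosts.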
See the example: https://github.com/openjdk/jdk/blob/master/src/hotspot/share/runtime/stackValueCollection.cpp#L29 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14976#discussion_r1271147128 From kvn at openjdk.org Fri Jul 21 22:29:39 2023 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 21 Jul 2023 22:29:39 GMT Subject: RFR: 8312495: assert(0 <= i && i < _len) failed: illegal index after JDK-8287061 on big endian platforms In-Reply-To: References: <51ZON6Vdj1yrENrGYzmMEndXIRvJ1d7Nk72a4ikx3rw=.ee657823-f69f-42ce-886a-25a7f6d71840@github.com> Message-ID: On Fri, 21 Jul 2023 22:13:09 GMT, Dean Long wrote: > How about introducing something like `jint Value::get_jint()` that can be used in place of this error-prone pattern? And may be rename current method `get_int()` to `get_intptr()` to match type of return value. ------------- PR Comment: https://git.openjdk.org/jdk/pull/14976#issuecomment-1646298211 From jwaters at openjdk.org Sat Jul 22 02:57:11 2023 From: jwaters at openjdk.org (Julian Waters) Date: Sat, 22 Jul 2023 02:57:11 GMT Subject: RFR: 8312502: Mass migrate HotSpot attributes to the correct location Message-ID: Someone had to do it, so I did. Moves attributes to the correct place as specified in the HotSpot Style Guide once and for all ------------- Commit messages: - arguments.hpp - arguments.hpp - globalDefinitions_gcc.hpp - assembler_aarch64.hpp - macroAssembler_aarch64.cpp - vmError.cpp - vmError.cpp - macroAssembler_aarch64.cpp - assembler_aarch64.hpp - os_linux.cpp - ... 
and 29 more: https://git.openjdk.org/jdk/compare/8cd43bff...58b52fce Changes: https://git.openjdk.org/jdk/pull/14969/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=14969&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8312502 Stats: 170 lines in 34 files changed: 60 ins; 0 del; 110 mod Patch: https://git.openjdk.org/jdk/pull/14969.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14969/head:pull/14969 PR: https://git.openjdk.org/jdk/pull/14969 From rrich at openjdk.org Sat Jul 22 06:13:56 2023 From: rrich at openjdk.org (Richard Reingruber) Date: Sat, 22 Jul 2023 06:13:56 GMT Subject: RFR: 8312495: assert(0 <= i && i < _len) failed: illegal index after JDK-8287061 on big endian platforms In-Reply-To: <51ZON6Vdj1yrENrGYzmMEndXIRvJ1d7Nk72a4ikx3rw=.ee657823-f69f-42ce-886a-25a7f6d71840@github.com> References: <51ZON6Vdj1yrENrGYzmMEndXIRvJ1d7Nk72a4ikx3rw=.ee657823-f69f-42ce-886a-25a7f6d71840@github.com> Message-ID: On Fri, 21 Jul 2023 14:24:35 GMT, Richard Reingruber wrote: > On big endian platforms `jint` values are stored in the high part of `StackValue` values. Therefore the `StackValue` cannot be cast directly to `jint`. More details why this has to be like this are given in the JBS issue. > > This is a common pattern. See also > > https://github.com/openjdk/jdk/blob/8d29329138d44800ee4c0c02dacc01a06097de66/src/hotspot/share/runtime/deoptimization.cpp#L1386-L1387 > https://github.com/openjdk/jdk/blob/8d29329138d44800ee4c0c02dacc01a06097de66/src/hotspot/share/runtime/deoptimization.cpp#L1513-L1514 > > ### Testing > Many iterations of vmTestbase/vm/mlvm/meth/stress/compiler/sequences/Test.java. > > JTReg tests: tier1-4 of hotspot and jdk. All of Langtools and jaxp. Renaissance benchmarks as functional tests. > > All testing was done with fastdebug and release builds on the main platforms and also on Linux/PPC64le. Thanks for the feedback!
> > How about introducing something like `jint Value::get_jint()` that can be used in place of this error-prone pattern? > > And may be rename current method `get_int()` to `get_intptr()` to match type of return value. I agree. I thought it should be done in a dedicated RFE. But I can do it in this PR. ------------- PR Comment: https://git.openjdk.org/jdk/pull/14976#issuecomment-1646500655 From dlong at openjdk.org Sat Jul 22 08:14:40 2023 From: dlong at openjdk.org (Dean Long) Date: Sat, 22 Jul 2023 08:14:40 GMT Subject: RFR: 8312495: assert(0 <= i && i < _len) failed: illegal index after JDK-8287061 on big endian platforms In-Reply-To: References: <51ZON6Vdj1yrENrGYzmMEndXIRvJ1d7Nk72a4ikx3rw=.ee657823-f69f-42ce-886a-25a7f6d71840@github.com> Message-ID: On Sat, 22 Jul 2023 06:10:48 GMT, Richard Reingruber wrote: > I agree. I thought it should be done in a dedicated RFE. But I can do it in this PR. It seems reasonable to me to allow small enhancements along with bug fixes. If it's a separate RFE then you have to either push that first, or do a temporary bug fix that doesn't use the enhancement. What are the chances that we would want to back-port one without the other? ------------- PR Comment: https://git.openjdk.org/jdk/pull/14976#issuecomment-1646523419 From jwaters at openjdk.org Sun Jul 23 05:50:58 2023 From: jwaters at openjdk.org (Julian Waters) Date: Sun, 23 Jul 2023 05:50:58 GMT Subject: RFR: 8312502: Mass migrate HotSpot attributes to the correct location [v2] In-Reply-To: References: Message-ID: > Someone had to do it, so I did. Moves attributes to the correct place as specified in the HotSpot Style Guide once and for all Julian Waters has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains 40 additional commits since the last revision: - Merge branch 'openjdk:master' into patch-6 - arguments.hpp - arguments.hpp - globalDefinitions_gcc.hpp - assembler_aarch64.hpp - macroAssembler_aarch64.cpp - vmError.cpp - vmError.cpp - macroAssembler_aarch64.cpp - assembler_aarch64.hpp - ... and 30 more: https://git.openjdk.org/jdk/compare/c685c4b6...afff56f2 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14969/files - new: https://git.openjdk.org/jdk/pull/14969/files/58b52fce..afff56f2 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14969&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14969&range=00-01 Stats: 18185 lines in 210 files changed: 7290 ins; 10195 del; 700 mod Patch: https://git.openjdk.org/jdk/pull/14969.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14969/head:pull/14969 PR: https://git.openjdk.org/jdk/pull/14969 From kbarrett at openjdk.org Sun Jul 23 11:53:59 2023 From: kbarrett at openjdk.org (Kim Barrett) Date: Sun, 23 Jul 2023 11:53:59 GMT Subject: RFR: 8312502: Mass migrate HotSpot attributes to the correct location [v2] In-Reply-To: References: Message-ID: <6TM5USxRwOvQTy8muWykwuq_o0b-2nCM_8-fFoLGVIg=.ae329b8d-8781-407b-ba4a-fb8d8abe685c@github.com> On Sun, 23 Jul 2023 05:50:58 GMT, Julian Waters wrote: >> Someone had to do it, so I did. Moves attributes to the correct place as specified in the HotSpot Style Guide once and for all > > Julian Waters has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 40 additional commits since the last revision: > > - Merge branch 'openjdk:master' into patch-6 > - arguments.hpp > - arguments.hpp > - globalDefinitions_gcc.hpp > - assembler_aarch64.hpp > - macroAssembler_aarch64.cpp > - vmError.cpp > - vmError.cpp > - macroAssembler_aarch64.cpp > - assembler_aarch64.hpp > - ... 
and 30 more: https://git.openjdk.org/jdk/compare/65e01d8a...afff56f2 Why? What is the benefit from this that makes the resulting code churn worthwhile? We already discussed this kind of code churn a bit circa https://github.com/openjdk/jdk/pull/11081#issuecomment-1313274792 and didn't like it then. I don't see anything to change that. The style guide only talks about the new C++ `[[attribute...]]` syntax, which has a couple of valid locations. These are all gcc `__attribute__` and MSVC `__declspec`, and are often located in places where the new syntax isn't permitted. Moving the non-standard "attributes" around has the potential to change semantics. I don't know that it does, but this PR should contain discussion and references to documentation showing it doesn't. *If* it is to be done, there are some former one-liners that have been made multi-line by moving an attribute macro, where there were multiple in a cluster with no blank lines between them. (Unwritten) HotSpot style only elides whitespace between declarations when they are all one-liners. I think I prefer preceding attributes to be on their own line, rather than on the same line as the declaration, so I can mostly skip over them as I'm reading the code. But that might be just me; I don't think that's been discussed by the Group. It probably should have been brought up when permitting attributes was added to the style guide, but it doesn't look like that happened. src/hotspot/cpu/aarch64/assembler_aarch64.hpp line 38: > 36: #ifdef __GNUC__ > 37: > 38: // ISO C++ asm is always implicitly volatile I can find no evidence for this claim, and it seems to me likely incorrect. This is also way outside the described scope of this PR. src/hotspot/share/c1/c1_CFGPrinter.hpp line 66: > 64: void dec_indent(); > 65: ATTRIBUTE_PRINTF(2, 3) > 66: void print(const char* format, ...); This is an example where rearranging the attributes is out of character with usual practice. And I think it makes it harder to read. 
src/hotspot/share/compiler/compileLog.hpp line 75: > 73: > 74: ATTRIBUTE_PRINTF(2, 3) > 75: void set_context(const char* format, ...); Whitespace between return type and function name is pretty pointless here. Also above. src/hotspot/share/utilities/globalDefinitions_gcc.hpp line 161: > 159: #define NOINLINE [[gnu::noinline]] > 160: #define ALWAYSINLINE [[gnu::always_inline]] inline > 161: #define ATTRIBUTE_FLATTEN [[gnu::flatten]] This is way beyond the described scope of this PR. src/hotspot/share/utilities/xmlstream.hpp line 149: > 147: void text(const char* format, ...); > 148: ATTRIBUTE_PRINTF(2, 0) > 149: void va_text(const char* format, va_list ap) { This file is a particularly bad (to me) example of what happens without whitespace between a declaration and the attributes for the next declaration. I find this really hard to parse. And the extra whitespace following return types makes it even worse for me. ------------- Changes requested by kbarrett (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/14969#pullrequestreview-1542234010 PR Review Comment: https://git.openjdk.org/jdk/pull/14969#discussion_r1271430790 PR Review Comment: https://git.openjdk.org/jdk/pull/14969#discussion_r1271431103 PR Review Comment: https://git.openjdk.org/jdk/pull/14969#discussion_r1271431994 PR Review Comment: https://git.openjdk.org/jdk/pull/14969#discussion_r1271432593 PR Review Comment: https://git.openjdk.org/jdk/pull/14969#discussion_r1271433180 From jwaters at openjdk.org Sun Jul 23 12:38:47 2023 From: jwaters at openjdk.org (Julian Waters) Date: Sun, 23 Jul 2023 12:38:47 GMT Subject: RFR: 8312502: Mass migrate HotSpot attributes to the correct location [v2] In-Reply-To: <6TM5USxRwOvQTy8muWykwuq_o0b-2nCM_8-fFoLGVIg=.ae329b8d-8781-407b-ba4a-fb8d8abe685c@github.com> References: <6TM5USxRwOvQTy8muWykwuq_o0b-2nCM_8-fFoLGVIg=.ae329b8d-8781-407b-ba4a-fb8d8abe685c@github.com> Message-ID: 
<2f0Rc29Pp42L2yJD43ynG-z0bZ65EyovRNU8D1IiC5o=.d9d98e2e-5405-4430-ab42-4570435a5468@github.com> On Sun, 23 Jul 2023 11:32:55 GMT, Kim Barrett wrote: >> Julian Waters has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 40 additional commits since the last revision: >> >> - Merge branch 'openjdk:master' into patch-6 >> - arguments.hpp >> - arguments.hpp >> - globalDefinitions_gcc.hpp >> - assembler_aarch64.hpp >> - macroAssembler_aarch64.cpp >> - vmError.cpp >> - vmError.cpp >> - macroAssembler_aarch64.cpp >> - assembler_aarch64.hpp >> - ... and 30 more: https://git.openjdk.org/jdk/compare/67e2bb59...afff56f2 > > src/hotspot/cpu/aarch64/assembler_aarch64.hpp line 38: > >> 36: #ifdef __GNUC__ >> 37: >> 38: // ISO C++ asm is always implicitly volatile > > I can find no evidence for this claim, and it seems to me likely incorrect. This is also way outside the > described scope of this PR. Hi Kim, it's actually listed under https://gcc.gnu.org/onlinedocs/gcc/Basic-Asm.html as follows: > Qualifiers volatile The optional volatile qualifier has no effect. All basic asm blocks are implicitly volatile. I'll take this outside of this PR though ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14969#discussion_r1271440784 From jwaters at openjdk.org Sun Jul 23 12:46:59 2023 From: jwaters at openjdk.org (Julian Waters) Date: Sun, 23 Jul 2023 12:46:59 GMT Subject: RFR: 8312502: Mass migrate HotSpot attributes to the correct location [v2] In-Reply-To: <6TM5USxRwOvQTy8muWykwuq_o0b-2nCM_8-fFoLGVIg=.ae329b8d-8781-407b-ba4a-fb8d8abe685c@github.com> References: <6TM5USxRwOvQTy8muWykwuq_o0b-2nCM_8-fFoLGVIg=.ae329b8d-8781-407b-ba4a-fb8d8abe685c@github.com> Message-ID: On Sun, 23 Jul 2023 11:45:25 GMT, Kim Barrett wrote: >> Julian Waters has updated the pull request with a new target base due to a merge or a rebase. 
The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 40 additional commits since the last revision: >> >> - Merge branch 'openjdk:master' into patch-6 >> - arguments.hpp >> - arguments.hpp >> - globalDefinitions_gcc.hpp >> - assembler_aarch64.hpp >> - macroAssembler_aarch64.cpp >> - vmError.cpp >> - vmError.cpp >> - macroAssembler_aarch64.cpp >> - assembler_aarch64.hpp >> - ... and 30 more: https://git.openjdk.org/jdk/compare/624faab5...afff56f2 > > src/hotspot/share/utilities/globalDefinitions_gcc.hpp line 161: > >> 159: #define NOINLINE [[gnu::noinline]] >> 160: #define ALWAYSINLINE [[gnu::always_inline]] inline >> 161: #define ATTRIBUTE_FLATTEN [[gnu::flatten]] > > This is way beyond the described scope of this PR. I changed these to the standard attributes so the compilers would concretely enforce the checks of which areas the attributes appertained to, as opposed to the regular __attribute__ syntax which don't perform such checks (Same reasoning for the ones in compilerWarnings as well) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14969#discussion_r1271441981 From aph at openjdk.org Sun Jul 23 16:54:59 2023 From: aph at openjdk.org (Andrew Haley) Date: Sun, 23 Jul 2023 16:54:59 GMT Subject: RFR: 8312502: Mass migrate HotSpot attributes to the correct location [v2] In-Reply-To: References: Message-ID: On Sun, 23 Jul 2023 05:50:58 GMT, Julian Waters wrote: >> Someone had to do it, so I did. Moves attributes to the correct place as specified in the HotSpot Style Guide once and for all > > Julian Waters has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains 40 additional commits since the last revision: > > - Merge branch 'openjdk:master' into patch-6 > - arguments.hpp > - arguments.hpp > - globalDefinitions_gcc.hpp > - assembler_aarch64.hpp > - macroAssembler_aarch64.cpp > - vmError.cpp > - vmError.cpp > - macroAssembler_aarch64.cpp > - assembler_aarch64.hpp > - ... and 30 more: https://git.openjdk.org/jdk/compare/57f455e2...afff56f2 > Why? What is the benefit from this that makes the resulting code churn worthwhile? > > We already discussed this kind of code churn a bit circa [#11081 (comment)](https://github.com/openjdk/jdk/pull/11081#issuecomment-1313274792) and didn't like it then. I don't see anything to change that. I agree. Such changes don't much help maintainers, and speaking as the lead of both the 8u and 11u projects, really won't help backporting. I'd say no. ------------- PR Comment: https://git.openjdk.org/jdk/pull/14969#issuecomment-1646888896 From dholmes at openjdk.org Sun Jul 23 21:45:00 2023 From: dholmes at openjdk.org (David Holmes) Date: Sun, 23 Jul 2023 21:45:00 GMT Subject: RFR: 8312502: Mass migrate HotSpot attributes to the correct location [v2] In-Reply-To: References: <6TM5USxRwOvQTy8muWykwuq_o0b-2nCM_8-fFoLGVIg=.ae329b8d-8781-407b-ba4a-fb8d8abe685c@github.com> Message-ID: On Sun, 23 Jul 2023 12:43:51 GMT, Julian Waters wrote: >> src/hotspot/share/utilities/globalDefinitions_gcc.hpp line 161: >> >>> 159: #define NOINLINE [[gnu::noinline]] >>> 160: #define ALWAYSINLINE [[gnu::always_inline]] inline >>> 161: #define ATTRIBUTE_FLATTEN [[gnu::flatten]] >> >> This is way beyond the described scope of this PR. > > I changed these to the standard attributes so the compilers would concretely enforce the checks of which areas the attributes appertained to, as opposed to the regular __attribute__ syntax which don't perform such checks (Same reasoning for the ones in compilerWarnings as well) Again I agree with Kim, this is not simply moving an attribute to a new location. 
It is a change that has to be validated. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14969#discussion_r1271561284 From dholmes at openjdk.org Sun Jul 23 21:44:57 2023 From: dholmes at openjdk.org (David Holmes) Date: Sun, 23 Jul 2023 21:44:57 GMT Subject: RFR: 8312502: Mass migrate HotSpot attributes to the correct location [v2] In-Reply-To: References: Message-ID: On Sun, 23 Jul 2023 05:50:58 GMT, Julian Waters wrote: >> Someone had to do it, so I did. Moves attributes to the correct place as specified in the HotSpot Style Guide once and for all > > Julian Waters has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 40 additional commits since the last revision: > > - Merge branch 'openjdk:master' into patch-6 > - arguments.hpp > - arguments.hpp > - globalDefinitions_gcc.hpp > - assembler_aarch64.hpp > - macroAssembler_aarch64.cpp > - vmError.cpp > - vmError.cpp > - macroAssembler_aarch64.cpp > - assembler_aarch64.hpp > - ... and 30 more: https://git.openjdk.org/jdk/compare/f058acfd...afff56f2 > Someone had to do it, so I did. Why did someone _have_ to do it? Is it incorrect? Unless there is some semantic significance to this then it is just unnecessary churn and I really don't like the chosen style. Sorry. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/14969#issuecomment-1646966208 From dholmes at openjdk.org Sun Jul 23 21:44:59 2023 From: dholmes at openjdk.org (David Holmes) Date: Sun, 23 Jul 2023 21:44:59 GMT Subject: RFR: 8312502: Mass migrate HotSpot attributes to the correct location [v2] In-Reply-To: <6TM5USxRwOvQTy8muWykwuq_o0b-2nCM_8-fFoLGVIg=.ae329b8d-8781-407b-ba4a-fb8d8abe685c@github.com> References: <6TM5USxRwOvQTy8muWykwuq_o0b-2nCM_8-fFoLGVIg=.ae329b8d-8781-407b-ba4a-fb8d8abe685c@github.com> Message-ID: On Sun, 23 Jul 2023 11:35:19 GMT, Kim Barrett wrote: >> Julian Waters has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 40 additional commits since the last revision: >> >> - Merge branch 'openjdk:master' into patch-6 >> - arguments.hpp >> - arguments.hpp >> - globalDefinitions_gcc.hpp >> - assembler_aarch64.hpp >> - macroAssembler_aarch64.cpp >> - vmError.cpp >> - vmError.cpp >> - macroAssembler_aarch64.cpp >> - assembler_aarch64.hpp >> - ... and 30 more: https://git.openjdk.org/jdk/compare/f058acfd...afff56f2 > > src/hotspot/share/c1/c1_CFGPrinter.hpp line 66: > >> 64: void dec_indent(); >> 65: ATTRIBUTE_PRINTF(2, 3) >> 66: void print(const char* format, ...); > > This is an example where rearranging the attributes is out of character with usual practice. > And I think it makes it harder to read. I agree with Kim, I do not like this style of using a new line for the attribute. I also prefer to see these attributes in their original location where I can generally ignore them while reading the code. 
> src/hotspot/share/utilities/xmlstream.hpp line 149: > >> 147: void text(const char* format, ...); >> 148: ATTRIBUTE_PRINTF(2, 0) >> 149: void va_text(const char* format, va_list ap) { > > This file is a particularly bad (to me) example of what happens without whitespace between a declaration > and the attributes for the next declaration. I find this really hard to parse. And the extra whitespace following > return types makes it even worse for me. +1 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14969#discussion_r1271561053 PR Review Comment: https://git.openjdk.org/jdk/pull/14969#discussion_r1271561299 From eliu at openjdk.org Mon Jul 24 01:15:46 2023 From: eliu at openjdk.org (Eric Liu) Date: Mon, 24 Jul 2023 01:15:46 GMT Subject: RFR: 8309893: Integrate ReplicateB/S/I/L/F/D nodes to Replicate node In-Reply-To: References: <4zQZ1W7GpPyOY0TGusvqNKUoCORK1WUEwSxRnWC4JVE=.127f84f6-a406-43d2-98e7-52b4fa0b5f3d@github.com> Message-ID: On Wed, 19 Jul 2023 04:29:57 GMT, Cesar Soares Lucas wrote: >> This patch creates ReplicateNode to replace ReplicateB/S/I/L/F/DNode, like other vector nodes introduced recently, e.g., PopulateIndexNode and ReverseVNode, etc. This refers from: >> https://mail.openjdk.org/pipermail/panama-dev/2020-April/008484.html >> >> After merging these nodes, code will be easier to maintain. E.g., matching rules can be simplified. >> >> Besides AArch64, this patch tries to keep other ad files as the same before, only supplies some necessary predicate. E.g., for matching rules using ReplicateB before, they are now matching Replicate with a new predicate "Matcher::vector_element_basic_type(n) == T_BYTE". This would be easy for review and lower risks. >> >> [TEST] >> x86: Tested with option "-XX:UseAVX=0/1/2/3". >> AArch64: Tested on SVE machine and Neon machine. >> >> Full jtreg passed without new issue. 
> > test/hotspot/jtreg/compiler/vectorization/runner/ArrayInvariantFillTest.java line 53: > >> 51: private static final int SIZE = 543; >> 52: >> 53: private boolean booleanInv; > > Looks like this field is not used anywhere here. Thanks, I will fix it. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14830#discussion_r1271620890 From kbarrett at openjdk.org Mon Jul 24 02:36:45 2023 From: kbarrett at openjdk.org (Kim Barrett) Date: Mon, 24 Jul 2023 02:36:45 GMT Subject: RFR: 8312502: Mass migrate HotSpot attributes to the correct location [v2] In-Reply-To: References: Message-ID: On Sun, 23 Jul 2023 05:50:58 GMT, Julian Waters wrote: >> Someone had to do it, so I did. Moves attributes to the correct place as specified in the HotSpot Style Guide once and for all > > Julian Waters has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 40 additional commits since the last revision: > > - Merge branch 'openjdk:master' into patch-6 > - arguments.hpp > - arguments.hpp > - globalDefinitions_gcc.hpp > - assembler_aarch64.hpp > - macroAssembler_aarch64.cpp > - vmError.cpp > - vmError.cpp > - macroAssembler_aarch64.cpp > - assembler_aarch64.hpp > - ... and 30 more: https://git.openjdk.org/jdk/compare/c63a77e5...afff56f2 See also: https://openjdk.org/guide/#things-to-consider-before-proposing-changes-to-openjdk-code This change looks like a case of pure "Modernizing", which isn't looked on particularly favorably for its own sake. There generally needs to be some additional benefits. 
-------------

PR Comment: https://git.openjdk.org/jdk/pull/14969#issuecomment-1647108763

From fgao at openjdk.org Mon Jul 24 04:12:02 2023
From: fgao at openjdk.org (Fei Gao)
Date: Mon, 24 Jul 2023 04:12:02 GMT
Subject: RFR: 8308340: C2: Idealize Fma nodes [v5]
In-Reply-To:
References:
Message-ID:

> Some platforms, like aarch64, ppc, and riscv, support fusing `Math.fma(-a, b, c)` or `Math.fma(a, -b, c)` by generating partially symmetric match rules like:
>
>
> match(Set dst (FmaF src3 (Binary (NegF src1) src2)));
> match(Set dst (FmaF src3 (Binary src1 (NegF src2))));
>
>
> Since `Fma` is partially commutative, the patch is to convert `Math.fma(-a, b, c)` to `Math.fma(b, -a, c)` in gvn phase, making node patterns canonical. Then we can remove redundant rules.
>
> Also, we should guarantee that C2 generates `Fma` nodes only on platforms supporting `Fma` instructions before matcher, so we can remove all `predicate(UseFMA)` for all `Fma` rules.
>
> After the patch, the code size of libjvm.so on aarch64 platform decreased by 63.4k.
>
> The patch passed all tier 1 - 3 on aarch64 and x86 platforms.

Fei Gao has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains six commits:

- Merge branch 'master' into fg8308340
- Merge branch 'master' into fg8308340
- Merge branch 'master' into fg8308340
- Move check for UseFMA from c2compiler.cpp to Matcher::match_rule_supported in .ad files
- Merge branch 'master' into fg8308340
- 8308340: C2: Idealize Fma nodes

Some platforms, like aarch64, ppc, and riscv, support fusing `Math.fma(-a, b, c)` or `Math.fma(a, -b, c)` by generating partially symmetric match rules like:

```
match(Set dst (FmaF src3 (Binary (NegF src1) src2)));
match(Set dst (FmaF src3 (Binary src1 (NegF src2))));
```

Since `Fma` is partially commutative, the patch is to convert `Math.fma(-a, b, c)` to `Math.fma(b, -a, c)` in gvn phase, making node patterns canonical. Then we can remove redundant rules.
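
As a rough illustration of the property this canonicalization relies on (a sketch only, not part of the patch; the class name `FmaOperandOrderDemo` and the method `sameResult` are made up for this example): the two multiplicands of a fused multiply-add can be swapped, and the negation can sit on either of them, without changing the result, because the product is formed exactly before the single rounding step.

```java
public class FmaOperandOrderDemo {
    // Returns true when the three spellings of the negated fused multiply-add
    // give bit-identical float results for the given inputs.
    static boolean sameResult(float a, float b, float c) {
        int negFirst = Float.floatToIntBits(Math.fma(-a, b, c));
        int swapped = Float.floatToIntBits(Math.fma(b, -a, c));
        int negSecond = Float.floatToIntBits(Math.fma(a, -b, c));
        return negFirst == swapped && negFirst == negSecond;
    }

    public static void main(String[] args) {
        // Arbitrary sample inputs chosen for illustration.
        System.out.println(sameResult(1.5f, -2.25f, 0.125f));
        System.out.println(sameResult(3.0e-3f, 7.0e5f, -1.0f));
    }
}
```

This only demonstrates the arithmetic identity at the Java level; how C2 reorders the inputs of the `Fma` node is described in the pull request itself.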
Also, we should guarantee that C2 generates `Fma` nodes only on platforms supporting `Fma` instructions before matcher, so we can remove all `predicate(UseFMA)` for all `Fma` rules. After the patch, the code size of libjvm.so on aarch64 platform decreased by 63.4k. The patch passed all tier 1 - 3 on aarch64 and x86 platforms. ------------- Changes: https://git.openjdk.org/jdk/pull/14576/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=14576&range=04 Stats: 608 lines in 20 files changed: 389 ins; 118 del; 101 mod Patch: https://git.openjdk.org/jdk/pull/14576.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14576/head:pull/14576 PR: https://git.openjdk.org/jdk/pull/14576 From thartmann at openjdk.org Mon Jul 24 05:02:50 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 24 Jul 2023 05:02:50 GMT Subject: RFR: JDK-8310316: Failing HotSpot Compiler directives are too verbose In-Reply-To: References: Message-ID: On Thu, 20 Jul 2023 15:04:28 GMT, Eric Nothum wrote: > Previously jcmd printed the whole file if a compiler directive was added that was not in json format. This example illustrates the issue: > > > ./jcmd 331311 Compiler.directives_add ./example.txt > 331311: > Syntax error on line 1 byte 1: Json must start with an object or an array. > At 'This'. > This is my very interesting text, > followed by some more exciting text. > > Parsing of compiler directives failed > Could not load file: ./example.txt > > The json error message is not printed if the silent field is set in the `DirectivesParser` object. > The proposed change adds a boolean parameter silent that is propagated from `CompilerDirectivesAddDCmd::execute` to the `DirectivesParser` constructor. The default value for the new parameter is set to false, which represents the original behavior. In case where a compiler directive is added, the parameter is set to true and the error message will be reduced. 
> > The proposed change reduces the error message to: > > > ./jcmd 335703 Compiler.directives_add ./example.txt > 335703: > Parsing of compiler directives failed > Could not load file: ./example.txt Looks good to me. ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/14957#pullrequestreview-1542680930 From thartmann at openjdk.org Mon Jul 24 05:05:39 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 24 Jul 2023 05:05:39 GMT Subject: RFR: 8312524: [JVMCI] serviceability/dcmd/compiler/CompilerQueueTest.java fails In-Reply-To: <5FDrZ2dpXh882r6ye9CJkICOq1VMFwZ4Y9gK17swnYE=.302db85d-fc8a-4ddd-96fd-def99ba745a4@github.com> References: <5FDrZ2dpXh882r6ye9CJkICOq1VMFwZ4Y9gK17swnYE=.302db85d-fc8a-4ddd-96fd-def99ba745a4@github.com> Message-ID: On Fri, 21 Jul 2023 20:23:31 GMT, Doug Simon wrote: > This PR adds logic to the CompileBroker for implementing `WhiteBox.lockCompilation()` when `UseJVMCICompiler` is true. Looks good to me too. ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/14979#pullrequestreview-1542682643 From jwaters at openjdk.org Mon Jul 24 05:34:04 2023 From: jwaters at openjdk.org (Julian Waters) Date: Mon, 24 Jul 2023 05:34:04 GMT Subject: RFR: 8312502: Mass migrate HotSpot attributes to the correct location [v3] In-Reply-To: References: Message-ID: > Someone had to do it, so I did. Moves attributes to the correct place as specified in the HotSpot Style Guide once and for all Julian Waters has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains 41 additional commits since the last revision: - Merge branch 'openjdk:master' into patch-6 - Merge branch 'openjdk:master' into patch-6 - arguments.hpp - arguments.hpp - globalDefinitions_gcc.hpp - assembler_aarch64.hpp - macroAssembler_aarch64.cpp - vmError.cpp - vmError.cpp - macroAssembler_aarch64.cpp - ... and 31 more: https://git.openjdk.org/jdk/compare/1a8dd18b...d60d8923 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14969/files - new: https://git.openjdk.org/jdk/pull/14969/files/afff56f2..d60d8923 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14969&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14969&range=01-02 Stats: 2 lines in 1 file changed: 2 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/14969.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14969/head:pull/14969 PR: https://git.openjdk.org/jdk/pull/14969 From thartmann at openjdk.org Mon Jul 24 06:05:48 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 24 Jul 2023 06:05:48 GMT Subject: RFR: 8308504: C2: "malformed control flow" after JDK-8303466 In-Reply-To: <7Vy9q8XEuzC40Za_CK-nc6G3EAZocEJ0T8-gdWTi9So=.d9aece29-c2d4-49c7-9a2c-22abc56920e8@github.com> References: <7Vy9q8XEuzC40Za_CK-nc6G3EAZocEJ0T8-gdWTi9So=.d9aece29-c2d4-49c7-9a2c-22abc56920e8@github.com> Message-ID: <_SgyPQU-1E5dnJ-ef1J79WxumfrU-Ph_Nfob-ALKs8Q=.e9ba8273-92b3-4201-9e2b-aac915d37d75@github.com> On Fri, 21 Jul 2023 12:04:12 GMT, Roland Westrelin wrote: > I took that bug over from Emanuel because he's away: > https://github.com/openjdk/jdk/pull/14331 > > I tried adding a `CastII` to narrow the limit of the loop as I > suggested in a comment on the PR but I found that doesn't work in all > cases: if the type of the initial value for the loop variable is not > narrow enough, then the narrower type for the limit doesn't help > narrow the loop phi type. 
>
> What I propose instead is to add an assert predicate that catches when
> the main loop is unreachable but the zero trip count doesn't constant
> fold. For that to work, the order of predicates must be preserved when
> they are copied or updated. I had to make some small changes to
> guarantee that.

src/hotspot/share/opto/loopPredicate.cpp line 1519:

> 1517: // init + (current stride - initial stride) is within the loop so narrow its type by leveraging the type of the iv Phi
> 1518: iv_phi_assertion_predicate_condition(loop->_head->as_CountedLoop(), new_proj, opaque_init, max_value);
> 1519: new_proj = add_template_assertion_predicate_helper(predicate_proj, reason, new_proj, bol, Op_If);

Shouldn't this use the `bol` returned by `iv_phi_assertion_predicate_condition`?

src/hotspot/share/opto/loopTransform.cpp line 1345:

> 1343: predicate = predicate->in(0)->in(0);
> 1344: }
> 1345: while(predicates.size() > 0) {

Suggestion:

    while (predicates.size() > 0) {

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/14973#discussion_r1271786307
PR Review Comment: https://git.openjdk.org/jdk/pull/14973#discussion_r1271752307

From chagedorn at openjdk.org Mon Jul 24 07:16:42 2023
From: chagedorn at openjdk.org (Christian Hagedorn)
Date: Mon, 24 Jul 2023 07:16:42 GMT
Subject: RFR: JDK-8310316: Failing HotSpot Compiler directives are too verbose
In-Reply-To:
References:
Message-ID: <511-K51kJvjcSspuunOOMt5EXttP0FugNdYJdgdjgW8=.7c169acd-0b90-4b99-ad35-f330bd1911eb@github.com>

On Thu, 20 Jul 2023 15:04:28 GMT, Eric Nothum wrote:

> Previously jcmd printed the whole file if a compiler directive was added that was not in json format. This example illustrates the issue:
>
>
> ./jcmd 331311 Compiler.directives_add ./example.txt
> 331311:
> Syntax error on line 1 byte 1: Json must start with an object or an array.
> At 'This'.
> This is my very interesting text,
> followed by some more exciting text.
>
> Parsing of compiler directives failed
> Could not load file: ./example.txt
>
> The json error message is not printed if the silent field is set in the `DirectivesParser` object.
> The proposed change adds a boolean parameter silent that is propagated from `CompilerDirectivesAddDCmd::execute` to the `DirectivesParser` constructor. The default value for the new parameter is set to false, which represents the original behavior. In case where a compiler directive is added, the parameter is set to true and the error message will be reduced.
>
> The proposed change reduces the error message to:
>
>
> ./jcmd 335703 Compiler.directives_add ./example.txt
> 335703:
> Parsing of compiler directives failed
> Could not load file: ./example.txt

Looks good to me, too!

-------------

Marked as reviewed by chagedorn (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/14957#pullrequestreview-1542833380

From dongbo at openjdk.org Mon Jul 24 08:53:08 2023
From: dongbo at openjdk.org (Dong Bo)
Date: Mon, 24 Jul 2023 08:53:08 GMT
Subject: RFR: 8295698: AArch64: test/jdk/sun/security/ec/ed/EdDSATest.java failed with -XX:+UseSHA3Intrinsics
In-Reply-To:
References:
Message-ID:

On Thu, 20 Jul 2023 07:04:10 GMT, Yudi Zheng wrote:

>> In JDK-8252204, when implemented SHA3 intrinsics, we use `digest_length` to differentiate SHA3-224, SHA3-256, SHA3-384, SHA3-512 and calculate `block_size` with `block_size = 200 - 2 * digest_length`.
>> However, there are two extra SHA3 instances, SHAKE256 and SHAKE128, allowing an arbitrary `digest_length`:
>>
>>            digest_length   block_size
>> SHA3-224   28              144
>> SHA3-256   32              136
>> SHA3-384   48              104
>> SHA3-512   64              72
>> SHAKE128   variable        168
>> SHAKE256   variable        136
>>
>>
>> This causes SIGSEGV crash or hash code mismatch with `test/jdk/sun/security/ec/ed/EdDSATest.java`. The test calls `SHAKE256` in `Ed448`.
>>
>> The main idea of the patch is to pass the `block_size` to differentiate SHA3 instances.
>> Tests `test/jdk/sun/security/ec/ed/EdDSATest.java` and `./test/jdk/sun/security/provider/MessageDigest/SHA3.java` both passed.
>> And tier1~3 passed on SHA3 supported hardware.
>>
>> The SHA3 intrinsics still deliver 20%~40% performance improvement on our pre-silicon simulated platform.
>> The latency and throughput of crypto SHA3 ops are designed to be 1 cpu cycle and 2 execution pipes respectively.
>>
>> Compared with the mainline code, the performance changes with this patch are negligible on real hardware and on the simulation platform.
>> Based on the JMH results of the SHA3 intrinsics, performance can be improved by ~50% on some hardware, while other hardware shows a ~30% regression.
>> These performance details are available in the comments of the issue page.
>> I guess the performance benefit of SHA3 intrinsics is dependent on the microarchitecture, so it should be switched on/off based on the running platform.
>
> src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 3910:
>
>> 3908:
>> 3909: // block_size == 136, bit4 == 0 and bit5 == 0, SHA3-256 or SHAKE256
>> 3910: __ andw(c_rarg5, block_size, 48);
>
> does `c_rarg5` serve as a temporary register here?

Yes, it is a save-on-call register. If it was active, it should have been saved.
Similar usages can be found in other intrinsics, such as https://github.com/openjdk/jdk/blob/ab821aa24f248e042d367ccd908fc1f68ebe8333/src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp#L3468.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/10939#discussion_r1271944466

From duke at openjdk.org Mon Jul 24 09:11:45 2023
From: duke at openjdk.org (Eric Nothum)
Date: Mon, 24 Jul 2023 09:11:45 GMT
Subject: RFR: JDK-8310316: Failing HotSpot Compiler directives are too verbose
In-Reply-To:
References:
Message-ID:

On Thu, 20 Jul 2023 15:04:28 GMT, Eric Nothum wrote:

> Previously jcmd printed the whole file if a compiler directive was added that was not in json format.
This example illustrates the issue: > > > ./jcmd 331311 Compiler.directives_add ./example.txt > 331311: > Syntax error on line 1 byte 1: Json must start with an object or an array. > At 'This'. > This is my very interesting text, > followed by some more exciting text. > > Parsing of compiler directives failed > Could not load file: ./example.txt > > The json error message is not printed if the silent field is set in the `DirectivesParser` object. > The proposed change adds a boolean parameter silent that is propagated from `CompilerDirectivesAddDCmd::execute` to the `DirectivesParser` constructor. The default value for the new parameter is set to false, which represents the original behavior. In case where a compiler directive is added, the parameter is set to true and the error message will be reduced. > > The proposed change reduces the error message to: > > > ./jcmd 335703 Compiler.directives_add ./example.txt > 335703: > Parsing of compiler directives failed > Could not load file: ./example.txt Thanks everyone for the reviews :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/14957#issuecomment-1647517197 From duke at openjdk.org Mon Jul 24 09:19:48 2023 From: duke at openjdk.org (Eric Nothum) Date: Mon, 24 Jul 2023 09:19:48 GMT Subject: Integrated: JDK-8310316: Failing HotSpot Compiler directives are too verbose In-Reply-To: References: Message-ID: On Thu, 20 Jul 2023 15:04:28 GMT, Eric Nothum wrote: > Previously jcmd printed the whole file if a compiler directive was added that was not in json format. This example illustrates the issue: > > > ./jcmd 331311 Compiler.directives_add ./example.txt > 331311: > Syntax error on line 1 byte 1: Json must start with an object or an array. > At 'This'. > This is my very interesting text, > followed by some more exciting text. > > Parsing of compiler directives failed > Could not load file: ./example.txt > > The json error message is not printed if the silent field is set in the `DirectivesParser` object. 
> The proposed change adds a boolean parameter silent that is propagated from `CompilerDirectivesAddDCmd::execute` to the `DirectivesParser` constructor. The default value for the new parameter is set to false, which represents the original behavior. In case where a compiler directive is added, the parameter is set to true and the error message will be reduced. > > The proposed change reduces the error message to: > > > ./jcmd 335703 Compiler.directives_add ./example.txt > 335703: > Parsing of compiler directives failed > Could not load file: ./example.txt This pull request has now been integrated. Changeset: 04f39e1f Author: Eric Nothum Committer: Tobias Hartmann URL: https://git.openjdk.org/jdk/commit/04f39e1f1e0e6c8adf75f59792f4f5b2496f7a31 Stats: 10 lines in 3 files changed: 0 ins; 0 del; 10 mod 8310316: Failing HotSpot Compiler directives are too verbose Reviewed-by: thartmann, chagedorn ------------- PR: https://git.openjdk.org/jdk/pull/14957 From dfenacci at openjdk.org Mon Jul 24 09:31:12 2023 From: dfenacci at openjdk.org (Damon Fenacci) Date: Mon, 24 Jul 2023 09:31:12 GMT Subject: RFR: JDK-8304954: SegmentedCodeCache fails when using large pages [v2] In-Reply-To: References: Message-ID: <_Tyx02bUkVaQ8lihHOLaIxITHK2TYkSKG98tBdh4g-o=.fc99e5ee-fc94-4dc2-90c1-0fc21610028f@github.com> > # Issue > > When large pages are enabled and segmented code cache is used, the VM tries to use one page for each segment. If the amount of reserved code cache is limited, this can make the total size of the code cache bigger than the reserved size, which in turn makes the VM fail, claiming that there is not enough space (e.g. this happens when running `java -XX:+UseLargePages -XX:+SegmentedCodeCache -XX:InitialCodeCacheSize=2g -XX:ReservedCodeCacheSize=2g -XX:LargePageSizeInBytes=1g -Xlog:pagesize*=debug -version`). > This behaviour is not correct as the VM should fall back and try with a smaller page size (and print a warning). 
>
> # Solution
>
> When reserving heap space for code cache we give a minimum of 8 pages. Since the page size is already calculated just before, for the segment sizes, it is saved and passed as an argument instead.
>
> https://github.com/openjdk/jdk/blob/67fbd87378a9b3861f1676977f9f2b36052add29/src/hotspot/share/code/codeCache.cpp#L315
>
> Additionally a warning is printed if large pages are enabled and we end up using a smaller page size for code caching.
>
> # Test
>
> The regression test runs a new VM with `-XX:+UseLargePages -XX:LargePageSizeInBytes=1g`. The `main` method then checks if the two flags have been "taken". If so, another process is started that checks for a specific output, otherwise the test passes (i.e. the current system doesn't allow 1GB large pages)

Damon Fenacci has updated the pull request incrementally with eight additional commits since the last revision:

- JDK-8304954: avoid calculating page_size twice
- JDK-8304954: fix checking output in test
- JDK-8304954: add warning if large pages cannot be used
- Revert "JDK-8304954: SegmentedCodeCache fails when using large pages"

  This reverts commit c19d6d45bee3db8f3d51bbf66bc4e6d2616b57e0.
- Revert "JDK-8304954: update warning message"

  This reverts commit cfd433d6ba4edc3d0cd9c183a05f9f8514316d52.
- Revert "JDK-8304954: use loop to find smaller page in case there are multiple failing large pages"

  This reverts commit ed2b5efc48e34205bb136d1751d4a4544a9915aa.
- Revert "JDK-8304954: print warning only when page size actually changes"

  This reverts commit b218957c27556acd3e3c8106b7a2dea65f3664c7.
- Revert "JDK-8304954: fix syntax"

  This reverts commit a5ff639918d71185f5b8a7ce946263580803b4f6.
-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/14903/files
  - new: https://git.openjdk.org/jdk/pull/14903/files/876f098b..070d87da

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=14903&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14903&range=00-01

  Stats: 51 lines in 3 files changed: 14 ins; 30 del; 7 mod
  Patch: https://git.openjdk.org/jdk/pull/14903.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/14903/head:pull/14903

PR: https://git.openjdk.org/jdk/pull/14903

From dfenacci at openjdk.org Mon Jul 24 11:04:12 2023
From: dfenacci at openjdk.org (Damon Fenacci)
Date: Mon, 24 Jul 2023 11:04:12 GMT
Subject: RFR: JDK-8304954: SegmentedCodeCache fails when using large pages [v3]
In-Reply-To:
References:
Message-ID: <7fyrQsivcH5kmVMJW7M3Mgs9K0h7epyGc0cDAMXcFCM=.74edbd46-74fd-430d-bd46-cf7f82fd6c8f@github.com>

> # Issue
>
> When large pages are enabled and segmented code cache is used, the VM tries to use one page for each segment. If the amount of reserved code cache is limited, this can make the total size of the code cache bigger than the reserved size, which in turn makes the VM fail, claiming that there is not enough space (e.g. this happens when running `java -XX:+UseLargePages -XX:+SegmentedCodeCache -XX:InitialCodeCacheSize=2g -XX:ReservedCodeCacheSize=2g -XX:LargePageSizeInBytes=1g -Xlog:pagesize*=debug -version`).
> This behaviour is not correct as the VM should fall back and try with a smaller page size (and print a warning).
>
> # Solution
>
> When reserving heap space for code cache we give a minimum of 8 pages. Since the page size is already calculated just before, for the segment sizes, it is saved and passed as an argument instead.
>
> https://github.com/openjdk/jdk/blob/67fbd87378a9b3861f1676977f9f2b36052add29/src/hotspot/share/code/codeCache.cpp#L315
>
> Additionally a warning is printed if large pages are enabled and we end up using a smaller page size for code caching.
> > # Test > > The regression test runs a new VM with `-XX:+UseLargePages -XX:LargePageSizeInBytes=1g`. The `main` method then checks if the two flags have been "taken". If so, another process is started that checks for a specific output, otherwise the test passes (i.e. the current system doesn't allow 1GB large pages) Damon Fenacci has updated the pull request incrementally with one additional commit since the last revision: JDK-8304954: move warning out of reserve heap method ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14903/files - new: https://git.openjdk.org/jdk/pull/14903/files/070d87da..cb089a4b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14903&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14903&range=01-02 Stats: 29 lines in 1 file changed: 15 ins; 14 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/14903.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14903/head:pull/14903 PR: https://git.openjdk.org/jdk/pull/14903 From roland at openjdk.org Mon Jul 24 11:05:15 2023 From: roland at openjdk.org (Roland Westrelin) Date: Mon, 24 Jul 2023 11:05:15 GMT Subject: RFR: 8308504: C2: "malformed control flow" after JDK-8303466 [v2] In-Reply-To: <7Vy9q8XEuzC40Za_CK-nc6G3EAZocEJ0T8-gdWTi9So=.d9aece29-c2d4-49c7-9a2c-22abc56920e8@github.com> References: <7Vy9q8XEuzC40Za_CK-nc6G3EAZocEJ0T8-gdWTi9So=.d9aece29-c2d4-49c7-9a2c-22abc56920e8@github.com> Message-ID: > I took that bug over from Emanuel because he's away: > https://github.com/openjdk/jdk/pull/14331 > > I tried adding a `CastII` to narrow the limit of the loop as I > suggested in a comment on the PR but I found that doesn't work in all > cases: if the type of the initial value for the loop variable is not > narrow enough, then the narrower type for the limit doesn't help > narrow the loop phi type. > > What I propose instead is to add an assert predicate that catches when > the main loop is unreachable but the zero trip count doesn't constant > fold. 
For that to work, the order of predicates must be preserved when > they are copied or updated. I had to make some small changes to > guarantee that. Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: Update src/hotspot/share/opto/loopTransform.cpp Co-authored-by: Tobias Hartmann ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14973/files - new: https://git.openjdk.org/jdk/pull/14973/files/46c60fc3..c65b4bf1 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14973&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14973&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/14973.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14973/head:pull/14973 PR: https://git.openjdk.org/jdk/pull/14973 From roland at openjdk.org Mon Jul 24 11:18:10 2023 From: roland at openjdk.org (Roland Westrelin) Date: Mon, 24 Jul 2023 11:18:10 GMT Subject: RFR: 8308504: C2: "malformed control flow" after JDK-8303466 [v3] In-Reply-To: <7Vy9q8XEuzC40Za_CK-nc6G3EAZocEJ0T8-gdWTi9So=.d9aece29-c2d4-49c7-9a2c-22abc56920e8@github.com> References: <7Vy9q8XEuzC40Za_CK-nc6G3EAZocEJ0T8-gdWTi9So=.d9aece29-c2d4-49c7-9a2c-22abc56920e8@github.com> Message-ID: <8MfyIWOwZcFRmz3pogkoUpYBA2nOCMaWegkSDJVL1j4=.aaf39cb9-09af-4aa7-8d63-6cf38f8c1d34@github.com> > I took that bug over from Emanuel because he's away: > https://github.com/openjdk/jdk/pull/14331 > > I tried adding a `CastII` to narrow the limit of the loop as I > suggested in a comment on the PR but I found that doesn't work in all > cases: if the type of the initial value for the loop variable is not > narrow enough, then the narrower type for the limit doesn't help > narrow the loop phi type. > > What I propose instead is to add an assert predicate that catches when > the main loop is unreachable but the zero trip count doesn't constant > fold. 
For that to work, the order of predicates must be preserved when
> they are copied or updated. I had to make some small changes to
> guarantee that.

Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision:

  review

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/14973/files
  - new: https://git.openjdk.org/jdk/pull/14973/files/c65b4bf1..5aa8a8e1

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=14973&range=02
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14973&range=01-02

  Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/14973.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/14973/head:pull/14973

PR: https://git.openjdk.org/jdk/pull/14973

From roland at openjdk.org Mon Jul 24 11:18:11 2023
From: roland at openjdk.org (Roland Westrelin)
Date: Mon, 24 Jul 2023 11:18:11 GMT
Subject: RFR: 8308504: C2: "malformed control flow" after JDK-8303466 [v3]
In-Reply-To: <_SgyPQU-1E5dnJ-ef1J79WxumfrU-Ph_Nfob-ALKs8Q=.e9ba8273-92b3-4201-9e2b-aac915d37d75@github.com>
References: <7Vy9q8XEuzC40Za_CK-nc6G3EAZocEJ0T8-gdWTi9So=.d9aece29-c2d4-49c7-9a2c-22abc56920e8@github.com> <_SgyPQU-1E5dnJ-ef1J79WxumfrU-Ph_Nfob-ALKs8Q=.e9ba8273-92b3-4201-9e2b-aac915d37d75@github.com>
Message-ID:

On Mon, 24 Jul 2023 06:02:56 GMT, Tobias Hartmann wrote:

>> Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision:
>>
>> review
>
> src/hotspot/share/opto/loopPredicate.cpp line 1519:
>
>> 1517: // init + (current stride - initial stride) is within the loop so narrow its type by leveraging the type of the iv Phi
>> 1518: iv_phi_assertion_predicate_condition(loop->_head->as_CountedLoop(), new_proj, opaque_init, max_value);
>> 1519: new_proj = add_template_assertion_predicate_helper(predicate_proj, reason, new_proj, bol, Op_If);
>
> Shouldn't this use the `bol` returned by `iv_phi_assertion_predicate_condition`?
Thank you for looking at this. And good catch! Some last minute refactoring gone wrong. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14973#discussion_r1272119796 From dfenacci at openjdk.org Mon Jul 24 11:29:42 2023 From: dfenacci at openjdk.org (Damon Fenacci) Date: Mon, 24 Jul 2023 11:29:42 GMT Subject: RFR: JDK-8304954: SegmentedCodeCache fails when using large pages [v3] In-Reply-To: References: Message-ID: On Fri, 21 Jul 2023 09:28:42 GMT, Thomas Stuefe wrote: >> Hi @dafedafe, >> >> good catch! May I propose a much simpler fix, though? >> >> The problem is that we don't pass the correct value for `min_pages` to `CodeCache::page_size`. >> >> `CodeCache::page_size()` calls either one of `os::page_size_for_region_aligned()` or `os::page_size_for_region_unaligned()`, which already tries to fit the given memory region by iterating through page sizes, exactly like you do. >> >> If you change `CodeCache::reserve_heap_memory()` to pass in a minimum number of pages of "8" (as we pass "8" in the earlier call that calculates the alignment of the segments), the error goes away. >> >> Arguably, one should calculate the page size just once and then always use that calculated value, instead of recalculating it differently each time. 
>> >> >> diff --git a/src/hotspot/share/code/codeCache.cpp b/src/hotspot/share/code/codeCache.cpp >> index 2ea72a1fcbd..7a30bfb1783 100644 >> --- a/src/hotspot/share/code/codeCache.cpp >> +++ b/src/hotspot/share/code/codeCache.cpp >> @@ -356,7 +356,7 @@ size_t CodeCache::page_size(bool aligned, size_t min_pages) { >> >> ReservedCodeSpace CodeCache::reserve_heap_memory(size_t size) { >> // Align and reserve space for code cache >> - const size_t rs_ps = page_size(); >> + const size_t rs_ps = page_size(false, 8); >> const size_t rs_align = MAX2(rs_ps, os::vm_allocation_granularity()); >> const size_t rs_size = align_up(size, rs_align); >> ReservedCodeSpace rs(rs_size, rs_align, rs_ps); >> >> >> On my machine with both 1G and 2M pages configured, with the fix, we now don't crash but use 2 MB pages for the code cache: >> >> >> thomas at starfish$ ./images/jdk/bin/java -XX:+UseLargePages -XX:+SegmentedCodeCache -XX:InitialCodeCacheSize=2g -XX:ReservedCodeCacheSize=2g -XX:LargePageSizeInBytes=1g -Xlog:pagesize*=debug -version >> [0.001s][info][pagesize] Static hugepage support: >> [0.001s][info][pagesize] hugepage size: 2M, nr_hugepages: 2000, nr_overcommit_hugepages: 0 >> [0.001s][info][pagesize] hugepage size: 1G, nr_hugepages: 5, nr_overcommit_hugepages: 0 >> [0.001s][info][pagesize] default hugepage size: 2M >> [0.001s][info][pagesize] Transparent hugepage (THP) support: >> [0.001s][info][pagesize] THP mode: always >> [0.001s][info][pagesize] THP pagesize: 2M >> [0.001s][info][pagesize] Overriding default large page size (2M) using LargePageSizeInBytes: 1G >> [0.001s][info][pagesize] UseLargePages=1, UseTransparentHugePages=0, UseHugeTLBFS=1, UseSHM=0 >> [0.001s][info][pagesize] Large page support enabled.... > >> Thanks a lot for your review @tstuefe >> >> > The problem is that we don't pass the correct value for `min_pages` to `CodeCache::page_size`. 
>> > `CodeCache::page_size()` calls either one of `os::page_size_for_region_aligned()` or `os::page_size_for_region_unaligned()`, which already tries to fit the given memory region by iterating through page sizes, exactly like you do. >> > If you change `CodeCache::reserve_heap_memory()` to pass in a minimum number of pages of "8" (as we pass "8" in the earlier call that calculates the alignment of the segments), the error goes away. >> >> You're right, it is much simpler! I noticed the 8 passed earlier but I came to this solution since I was wondering if we really want to give a minimum of 8 pages, especially if we use large pages (on the other hand I didn't want to change the earlier code as it is used for non large pages as well). But I'm not sure this makes sense as I might not have a full picture. > > I think we need not 8, but at least 3 if the codecache size is already 3*pagesize, otherwise we need at least 6: > > https://github.com/openjdk/jdk/blob/55aa122462c34d8f4cafa58f4d1f2d900449c83e/src/hotspot/share/code/codeCache.cpp#L315-L318 > > for all these align operations to end up with non-zero results for every segment. > > But I would keep it at 8. Just change one thing at a time (and I may overlook another reason for that minimum page number). That solves the immediate problem. > > Thinking further, I believe there is a case for keeping all segments in one 1G page: > - if we don't plan on uncommitting the heap, there is no need to align the segments to page boundaries. So we could run with a single static 1 GB hugepage. > - if we plan to uncommit the heap, each segment should be larger than 1 page, since we probably never will be able to uncommit a segment fully. > > But that's for a future RFE. The solution I proposed has the advantage that its easy to downport, since its minimally invasive. 
> >> >> > On my machine with both 1G and 2M pages configured, with the fix, we now don't crash but use 2 MB pages for the code cache: >> >> You're right, with that machine (which has the same large page configuration as the one I used to test) the result is the same but I was wondering if this is always the case (e.g. with a large page of 1/2G). > > I believe as long as the number of pages > 6 this should always work. Hi @tstuefe > But I would keep it at 8. Just change one thing at a time (and I may overlook another reason for that minimum page number). That solves the immediate problem. Fair enough: I've changed the code to use a minimum of 8 pages (and calculated only once). I haven't made it the default since it is also used here to enable `SegmentedCodeCache`: https://github.com/openjdk/jdk/blob/67fbd87378a9b3861f1676977f9f2b36052add29/src/hotspot/share/compiler/compilerDefinitions.cpp#L318-L323 I've also left the warning for when we use large pages and we cannot reserve pages of that size. Thanks a lot! ------------- PR Comment: https://git.openjdk.org/jdk/pull/14903#issuecomment-1647728413 From dfenacci at openjdk.org Mon Jul 24 11:46:54 2023 From: dfenacci at openjdk.org (Damon Fenacci) Date: Mon, 24 Jul 2023 11:46:54 GMT Subject: RFR: JDK-8304954: SegmentedCodeCache fails when using large pages [v4] In-Reply-To: References: Message-ID: > # Issue > > When large pages are enabled and segmented code cache is used, the VM tries to use one page for each segment. If the amount of reserved code cache is limited, this can make the total size of the code cache bigger than the reserved size, which in turn makes the VM fail, claiming that there is not enough space (e.g. this happens when running `java -XX:+UseLargePages -XX:+SegmentedCodeCache -XX:InitialCodeCacheSize=2g -XX:ReservedCodeCacheSize=2g -XX:LargePageSizeInBytes=1g -Xlog:pagesize*=debug -version`). 
> This behaviour is not correct as the VM should fall back and try with a smaller page size (and print a warning). > > # Solution > > When reserving heap space for code cache we give a minimum of 8 pages. Since the page size is already calculated right before for segment sizes, it is saved and passed as an argument instead. > > https://github.com/openjdk/jdk/blob/67fbd87378a9b3861f1676977f9f2b36052add29/src/hotspot/share/code/codeCache.cpp#L315 > > Additionally a warning is printed if large pages are enabled and we end up using a smaller page sizes for code caching. > > # Test > > The regression test runs a new VM with `-XX:+UseLargePages -XX:LargePageSizeInBytes=1g`. The `main` method then checks if the two flags have been "taken". If so, another process is started that checks for a specific output, otherwise the test passes (i.e. the current system doesn't allow 1GB large pages) Damon Fenacci has updated the pull request incrementally with one additional commit since the last revision: JDK-8304954: merge ifs checking when to print warning ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14903/files - new: https://git.openjdk.org/jdk/pull/14903/files/cb089a4b..e1b09a9f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14903&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14903&range=02-03 Stats: 3 lines in 1 file changed: 0 ins; 2 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/14903.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14903/head:pull/14903 PR: https://git.openjdk.org/jdk/pull/14903 From thartmann at openjdk.org Mon Jul 24 11:50:44 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 24 Jul 2023 11:50:44 GMT Subject: RFR: JDK-8304954: SegmentedCodeCache fails when using large pages [v4] In-Reply-To: References: Message-ID: On Mon, 24 Jul 2023 11:46:54 GMT, Damon Fenacci wrote: >> # Issue >> >> When large pages are enabled and segmented code cache is used, the VM tries to use one page for each 
segment. If the amount of reserved code cache is limited, this can make the total size of the code cache bigger than the reserved size, which in turn makes the VM fail, claiming that there is not enough space (e.g. this happens when running `java -XX:+UseLargePages -XX:+SegmentedCodeCache -XX:InitialCodeCacheSize=2g -XX:ReservedCodeCacheSize=2g -XX:LargePageSizeInBytes=1g -Xlog:pagesize*=debug -version`). >> This behaviour is not correct as the VM should fall back and try with a smaller page size (and print a warning). >> >> # Solution >> >> When reserving heap space for code cache we give a minimum of 8 pages. Since the page size is already calculated right before for segment sizes, it is saved and passed as an argument instead. >> >> https://github.com/openjdk/jdk/blob/67fbd87378a9b3861f1676977f9f2b36052add29/src/hotspot/share/code/codeCache.cpp#L315 >> >> Additionally a warning is printed if large pages are enabled and we end up using a smaller page sizes for code caching. >> >> # Test >> >> The regression test runs a new VM with `-XX:+UseLargePages -XX:LargePageSizeInBytes=1g`. The `main` method then checks if the two flags have been "taken". If so, another process is started that checks for a specific output, otherwise the test passes (i.e. the current system doesn't allow 1GB large pages) > > Damon Fenacci has updated the pull request incrementally with one additional commit since the last revision: > > JDK-8304954: merge ifs checking when to print warning Looks good to me! ------------- Marked as reviewed by thartmann (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/14903#pullrequestreview-1543310982 From stuefe at openjdk.org Mon Jul 24 13:02:44 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Mon, 24 Jul 2023 13:02:44 GMT Subject: RFR: JDK-8304954: SegmentedCodeCache fails when using large pages [v4] In-Reply-To: References: Message-ID: On Mon, 24 Jul 2023 11:46:54 GMT, Damon Fenacci wrote: >> # Issue >> >> When large pages are enabled and segmented code cache is used, the VM tries to use one page for each segment. If the amount of reserved code cache is limited, this can make the total size of the code cache bigger than the reserved size, which in turn makes the VM fail, claiming that there is not enough space (e.g. this happens when running `java -XX:+UseLargePages -XX:+SegmentedCodeCache -XX:InitialCodeCacheSize=2g -XX:ReservedCodeCacheSize=2g -XX:LargePageSizeInBytes=1g -Xlog:pagesize*=debug -version`). >> This behaviour is not correct as the VM should fall back and try with a smaller page size (and print a warning). >> >> # Solution >> >> When reserving heap space for code cache we give a minimum of 8 pages. Since the page size is already calculated right before for segment sizes, it is saved and passed as an argument instead. >> >> https://github.com/openjdk/jdk/blob/67fbd87378a9b3861f1676977f9f2b36052add29/src/hotspot/share/code/codeCache.cpp#L315 >> >> Additionally a warning is printed if large pages are enabled and we end up using a smaller page sizes for code caching. >> >> # Test >> >> The regression test runs a new VM with `-XX:+UseLargePages -XX:LargePageSizeInBytes=1g`. The `main` method then checks if the two flags have been "taken". If so, another process is started that checks for a specific output, otherwise the test passes (i.e. 
the current system doesn't allow 1GB large pages) > > Damon Fenacci has updated the pull request incrementally with one additional commit since the last revision: > > JDK-8304954: merge ifs checking when to print warning Basically good, some remarks inline. src/hotspot/share/code/codeCache.cpp line 321: > 319: "Reverting to smaller page size (" SIZE_FORMAT "%s).", > 320: byte_size_in_exact_unit(LargePageSizeInBytes), exact_unit_for_byte_size(LargePageSizeInBytes), > 321: byte_size_in_exact_unit(ps), exact_unit_for_byte_size(ps)); You could use EXACTFMT and EXACTFMTARGS to shorten this, see globalDefinitions.hpp src/hotspot/share/code/codeCache.cpp line 324: > 322: log_warning(codecache)("%s", msg); > 323: warning("%s", msg); > 324: } This "warn if page size is not as expected" topic is more complex. For a start, the text is misleading. You don't reserve anything here. "reserve" has a very clear meaning. Here, you only calculated a page size that fits CodeCacheSize geometry, and then found it is smaller than what the user probably wanted, and you want to warn the user about this. Makes sense. We also have the following cases: - code cache could, in theory, be satisfied with large pages, but its size is not aligned with large page size. E.g. 2MB pages and CodeCacheSize=101m, would result in code cache using 4KB pages. - The user may not have LPSIB specified, so it's 0, but he specified +UseLargePages. Then, he may not have a specific page size in mind but may still want to know if the page size of the code cache is not a large page. I think if you warn here when we divert from planned page size for geometry reasons, you should warn for these cases too. 
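The geometry cases above can be sketched with a small model. This is only a minimal illustration, not the HotSpot code: `pick_page_size`, the hard-coded page-size list (1G, 2M, 4K) and the `must_be_aligned` switch are assumptions made for this example, and `min_pages` is assumed to be at least 1.

```cpp
#include <cstddef>

// Simplified, hypothetical model of page-size selection for a memory region.
// Not the HotSpot implementation; all names here are made up for illustration.
constexpr std::size_t K = 1024;
constexpr std::size_t M = 1024 * K;
constexpr std::size_t G = 1024 * M;

// Assumed set of supported page sizes, largest first.
constexpr std::size_t kPageSizes[] = {G, 2 * M, 4 * K};
constexpr std::size_t kNumPageSizes = sizeof(kPageSizes) / sizeof(kPageSizes[0]);

// Pick the largest page size so that the region spans at least `min_pages`
// pages. If `must_be_aligned` is true, the region size must also be a
// multiple of the page size.
constexpr std::size_t pick_page_size(std::size_t region_size,
                                     std::size_t min_pages,
                                     bool must_be_aligned) {
  const std::size_t max_page_size = region_size / min_pages;
  for (std::size_t i = 0; i < kNumPageSizes; i++) {
    const std::size_t ps = kPageSizes[i];
    if (ps <= max_page_size && (!must_be_aligned || region_size % ps == 0)) {
      return ps;
    }
  }
  return kPageSizes[kNumPageSizes - 1];  // fall back to the smallest page size
}

// 2G code cache: one page is enough -> 1G pages; at least 8 pages -> 2M pages.
static_assert(pick_page_size(2 * G, 1, false) == G, "1G page fits once");
static_assert(pick_page_size(2 * G, 8, false) == 2 * M, "8 pages force 2M");
// 101M code cache with at least 8 pages: 2M unless alignment is required.
static_assert(pick_page_size(101 * M, 8, false) == 2 * M, "unaligned: 2M");
static_assert(pick_page_size(101 * M, 8, true) == 4 * K, "aligned: 4K");
```

Under these assumptions, a 2G code cache with a minimum of 1 page picks 1G pages, a minimum of 8 pages picks 2M pages, and a 101M code cache falls back to 4K pages only when the region size has to be a multiple of the page size.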
An acceptable minimum would be "ps < os::default_large_page_size()"

test/hotspot/jtreg/compiler/codecache/CheckLargePages.java line 27:

> 25:  * @test
> 26:  * @bug 8304954
> 27:  * @summary Test checks that if using large pages and code cache gets above the limit it tries to revert to smaller pages instead of failing

Proposal: "Code cache reservation should gracefully downgrade to using smaller pages if the code cache size is too small to host the requested page size."

test/hotspot/jtreg/compiler/codecache/CheckLargePages.java line 28:

> 26:  * @bug 8304954
> 27:  * @summary Test checks that if using large pages and code cache gets above the limit it tries to revert to smaller pages instead of failing
> 28:  * @requires vm.gc != "Z"

Why not Z?

test/hotspot/jtreg/compiler/codecache/CheckLargePages.java line 54:

> 52:                     "-XX:ReservedCodeCacheSize=2g",
> 53:                     "-XX:LargePageSizeInBytes=1g",
> 54:                     "-Xlog:pagesize*=debug",

If all you scan for is the "Failed to reserve" (please change the text :-), then you should *not* specify Xlog, since it's a warning, and we expect to see this warning unconditionally (UL: everything above "log_info" is printed unconditionally if no Log options are present).

test/hotspot/jtreg/compiler/codecache/CheckLargePages.java line 61:

> 59:         } else {
> 60:             System.out.println("1GB large pages not supported: UseLargePages=" + largePages +
> 61:                     (largePages ? ", largePageSize=" + largePageSize : "") + ". Skipping");

It would be nice to have an actual test for the downgrade. E.g. if the system supports 1g, 2m and 4k pages, check that we downgrade to 2m.

-------------

Changes requested by stuefe (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/14903#pullrequestreview-1543381861 PR Review Comment: https://git.openjdk.org/jdk/pull/14903#discussion_r1272221249 PR Review Comment: https://git.openjdk.org/jdk/pull/14903#discussion_r1272196328 PR Review Comment: https://git.openjdk.org/jdk/pull/14903#discussion_r1272217249 PR Review Comment: https://git.openjdk.org/jdk/pull/14903#discussion_r1272197970 PR Review Comment: https://git.openjdk.org/jdk/pull/14903#discussion_r1272213945 PR Review Comment: https://git.openjdk.org/jdk/pull/14903#discussion_r1272219700 From stuefe at openjdk.org Mon Jul 24 13:02:45 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Mon, 24 Jul 2023 13:02:45 GMT Subject: RFR: JDK-8304954: SegmentedCodeCache fails when using large pages [v4] In-Reply-To: References: Message-ID: On Mon, 24 Jul 2023 12:30:44 GMT, Thomas Stuefe wrote: >> Damon Fenacci has updated the pull request incrementally with one additional commit since the last revision: >> >> JDK-8304954: merge ifs checking when to print warning > > test/hotspot/jtreg/compiler/codecache/CheckLargePages.java line 28: > >> 26: * @bug 8304954 >> 27: * @summary Test checks that if using large pages and code cache gets above the limit it tries to revert to smaller pages instead of failing >> 28: * @requires vm.gc != "Z" > > Why not Z? I'd limit this to linux only. No need to run on OSes that don't support large pages. 
(If we want to be really correct, windows too, but I'm not so sure how stable and well-tested large page support there is) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14903#discussion_r1272199486 From stuefe at openjdk.org Mon Jul 24 13:15:46 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Mon, 24 Jul 2023 13:15:46 GMT Subject: RFR: JDK-8304954: SegmentedCodeCache fails when using large pages [v4] In-Reply-To: References: Message-ID: On Mon, 24 Jul 2023 11:46:54 GMT, Damon Fenacci wrote: >> # Issue >> >> When large pages are enabled and segmented code cache is used, the VM tries to use one page for each segment. If the amount of reserved code cache is limited, this can make the total size of the code cache bigger than the reserved size, which in turn makes the VM fail, claiming that there is not enough space (e.g. this happens when running `java -XX:+UseLargePages -XX:+SegmentedCodeCache -XX:InitialCodeCacheSize=2g -XX:ReservedCodeCacheSize=2g -XX:LargePageSizeInBytes=1g -Xlog:pagesize*=debug -version`). >> This behaviour is not correct as the VM should fall back and try with a smaller page size (and print a warning). >> >> # Solution >> >> When reserving heap space for code cache we give a minimum of 8 pages. Since the page size is already calculated right before for segment sizes, it is saved and passed as an argument instead. >> >> https://github.com/openjdk/jdk/blob/67fbd87378a9b3861f1676977f9f2b36052add29/src/hotspot/share/code/codeCache.cpp#L315 >> >> Additionally a warning is printed if large pages are enabled and we end up using a smaller page sizes for code caching. >> >> # Test >> >> The regression test runs a new VM with `-XX:+UseLargePages -XX:LargePageSizeInBytes=1g`. The `main` method then checks if the two flags have been "taken". If so, another process is started that checks for a specific output, otherwise the test passes (i.e. 
the current system doesn't allow 1GB large pages) > > Damon Fenacci has updated the pull request incrementally with one additional commit since the last revision: > > JDK-8304954: merge ifs checking when to print warning src/hotspot/share/code/codeCache.cpp line 322: > 320: byte_size_in_exact_unit(LargePageSizeInBytes), exact_unit_for_byte_size(LargePageSizeInBytes), > 321: byte_size_in_exact_unit(ps), exact_unit_for_byte_size(ps)); > 322: log_warning(codecache)("%s", msg); Also, no need to print into a temp buffer. log_xxx() accepts var-args, so you can pass the format string + args directly. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14903#discussion_r1272246036 From dfenacci at openjdk.org Mon Jul 24 13:33:44 2023 From: dfenacci at openjdk.org (Damon Fenacci) Date: Mon, 24 Jul 2023 13:33:44 GMT Subject: RFR: JDK-8304954: SegmentedCodeCache fails when using large pages [v4] In-Reply-To: References: Message-ID: <-JV81fieOqw0DmOOiez1dFPBWTe_B79CXVzhdLYq_JI=.0d205e43-c162-422b-9e52-54f6ae769034@github.com> On Mon, 24 Jul 2023 12:29:04 GMT, Thomas Stuefe wrote: >> Damon Fenacci has updated the pull request incrementally with one additional commit since the last revision: >> >> JDK-8304954: merge ifs checking when to print warning > > src/hotspot/share/code/codeCache.cpp line 324: > >> 322: log_warning(codecache)("%s", msg); >> 323: warning("%s", msg); >> 324: } > > This "warn if page size is not as expected" topic is more complex. > > For a start, the text is misleading. You don't reserve anything here. "reserve" has a very clear meaning. Here, you only calculated a page size that fits CodeCacheSize geometry, and then found it is smaller than what the user probably wanted, and you want to warn the user about this. Makes sense. > > We also have the following cases: > > - code cache could, in theory, be satisfied with large pages, but its size is not aligned with large page size. E.g. 
2MB pages and CodeCacheSize=101m, would result in code cache using 4KB pages.
> - The user may not have LPSIB specified, so it's 0, but he specified +UseLargePages. Then, he may not have a specific page size in mind but may still want to know if the page size of the code cache is not a large page.
>
> I think if you warn here when we divert from planned page size for geometry reasons, you should warn for these cases too. An acceptable minimum would be "ps < os::default_large_page_size()"

Right, thanks! There is just one point that is not completely clear to me:

> * code cache could, in theory, be satisfied with large pages, but its size is not aligned with large page size. E.g. 2MB pages and CodeCacheSize=101m, would result in code cache using 4KB pages.

Do you mean that with such a configuration (2MB pages and CodeCacheSize=101m) the code cache would use 4KB pages anyway later on? Wouldn't `CodeCache::page_size` return 2MB pages (and possibly align the code cache to 102MB)? (I guess I'm missing something here)

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/14903#discussion_r1272268249

From duke at openjdk.org Mon Jul 24 13:41:51 2023
From: duke at openjdk.org (Tobias Hotz)
Date: Mon, 24 Jul 2023 13:41:51 GMT
Subject: RFR: 8312213: Remove unnecessary TEST instructions on x86 when flags reg will already be set
Message-ID:

This patch adds peephole rules to remove TEST instructions that operate on the result of, and come right after, an AND, XOR or OR instruction.
This pattern can emerge if the result of the AND is compared against two values where one of the values is zero. The matcher does not have the capability to know that the instructions mentioned above also set the flags register to the same value.
According to https://www.felixcloutier.com/x86/and, https://www.felixcloutier.com/x86/xor, https://www.felixcloutier.com/x86/or and https://www.felixcloutier.com/x86/test the flags are set to the same values for TEST, AND, XOR and OR, so this should be safe.
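The flag argument can be illustrated with a small model. The sketch below is only an illustration, assuming 32-bit operands and modelling just ZF, SF, PF, CF and OF (AF is left out because it is undefined after these instructions); `Flags` and all helper names are made up for this example and are not part of the patch.

```cpp
#include <cstdint>

// Hypothetical model of the status flags written by the x86 logical
// instructions. Names are made up for illustration.
struct Flags {
  bool zf;  // zero flag: result == 0
  bool sf;  // sign flag: most significant bit of the result
  bool pf;  // parity flag: low byte has an even number of 1 bits
  bool cf;  // carry flag: cleared by logical instructions
  bool of;  // overflow flag: cleared by logical instructions
};

constexpr bool same_flags(Flags a, Flags b) {
  return a.zf == b.zf && a.sf == b.sf && a.pf == b.pf &&
         a.cf == b.cf && a.of == b.of;
}

// Flags derived from a 32-bit logical result.
constexpr Flags logic_flags(std::uint32_t result) {
  std::uint32_t low = result & 0xFFu;
  low ^= low >> 4;
  low ^= low >> 2;
  low ^= low >> 1;  // bit 0 now holds the XOR of all eight low bits
  return Flags{result == 0, (result >> 31) != 0, (low & 1u) == 0, false, false};
}

constexpr Flags flags_after_and(std::uint32_t a, std::uint32_t b) { return logic_flags(a & b); }
constexpr Flags flags_after_or(std::uint32_t a, std::uint32_t b)  { return logic_flags(a | b); }
constexpr Flags flags_after_xor(std::uint32_t a, std::uint32_t b) { return logic_flags(a ^ b); }

// TEST computes a & b only to set the flags and discards the result.
constexpr Flags flags_after_test(std::uint32_t a, std::uint32_t b) { return logic_flags(a & b); }

// After `AND r1, r2`, r1 holds a & b, so `TEST r1, r1` recomputes the same flags.
constexpr bool test_redundant_after_and(std::uint32_t a, std::uint32_t b) {
  return same_flags(flags_after_and(a, b), flags_after_test(a & b, a & b));
}
constexpr bool test_redundant_after_or(std::uint32_t a, std::uint32_t b) {
  return same_flags(flags_after_or(a, b), flags_after_test(a | b, a | b));
}
constexpr bool test_redundant_after_xor(std::uint32_t a, std::uint32_t b) {
  return same_flags(flags_after_xor(a, b), flags_after_test(a ^ b, a ^ b));
}

static_assert(test_redundant_after_and(0xF0u, 0x0Fu), "AND then TEST r, r");
static_assert(test_redundant_after_or(0x80000000u, 1u), "OR then TEST r, r");
static_assert(test_redundant_after_xor(0x1234u, 0x1234u), "XOR then TEST r, r");
```

In this model the equality holds by construction, because `TEST r, r` computes `r & r == r`; that is the whole reason the TEST is redundant when it directly follows the instruction that produced `r`.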
By adding peephole rules to remove the TEST instructions, the resulting assembly code can be shortened and a small speedup can be observed:

Results on Intel Core i5-8250U CPU

Before this patch:

Benchmark                                              Mode  Cnt    Score   Error  Units
TestRemovalPeephole.benchmarkAndTestFusableInt         avgt    8  182.353 ± 1.751  ns/op
TestRemovalPeephole.benchmarkAndTestFusableIntSingle   avgt    8    1.110 ± 0.002  ns/op
TestRemovalPeephole.benchmarkAndTestFusableLong        avgt    8  212.836 ± 0.310  ns/op
TestRemovalPeephole.benchmarkAndTestFusableLongSingle  avgt    8    2.072 ± 0.002  ns/op
TestRemovalPeephole.benchmarkOrTestFusableInt          avgt    8   72.052 ± 0.215  ns/op
TestRemovalPeephole.benchmarkOrTestFusableIntSingle    avgt    8    1.406 ± 0.002  ns/op
TestRemovalPeephole.benchmarkOrTestFusableLong         avgt    8  113.396 ± 0.666  ns/op
TestRemovalPeephole.benchmarkOrTestFusableLongSingle   avgt    8    1.183 ± 0.001  ns/op
TestRemovalPeephole.benchmarkXorTestFusableInt         avgt    8   88.683 ± 2.034  ns/op
TestRemovalPeephole.benchmarkXorTestFusableIntSingle   avgt    8    1.406 ± 0.002  ns/op
TestRemovalPeephole.benchmarkXorTestFusableLong        avgt    8  113.271 ± 0.602  ns/op
TestRemovalPeephole.benchmarkXorTestFusableLongSingle  avgt    8    1.183 ± 0.001  ns/op

After this patch:

Benchmark                                              Mode  Cnt    Score   Error  Units  Change
TestRemovalPeephole.benchmarkAndTestFusableInt         avgt    8  141.615 ± 4.747  ns/op  ~29% faster
TestRemovalPeephole.benchmarkAndTestFusableIntSingle   avgt    8    1.110 ± 0.002  ns/op  (unchanged)
TestRemovalPeephole.benchmarkAndTestFusableLong        avgt    8  213.249 ± 1.094  ns/op  (unchanged)
TestRemovalPeephole.benchmarkAndTestFusableLongSingle  avgt    8    2.074 ± 0.011  ns/op  (unchanged)
TestRemovalPeephole.benchmarkOrTestFusableInt          avgt    8   91.500 ± 0.142  ns/op  ~27% slower
TestRemovalPeephole.benchmarkOrTestFusableIntSingle    avgt    8    0.814 ± 0.002  ns/op  ~45% faster
TestRemovalPeephole.benchmarkOrTestFusableLong         avgt    8  113.691 ± 0.839  ns/op  (unchanged)
TestRemovalPeephole.benchmarkOrTestFusableLongSingle   avgt    8    1.185 ± 0.001  ns/op  (unchanged)
TestRemovalPeephole.benchmarkXorTestFusableInt         avgt    8   84.953 ± 0.766  ns/op  ~4% faster
TestRemovalPeephole.benchmarkXorTestFusableIntSingle   avgt    8    0.814 ± 0.001  ns/op  ~72% faster
TestRemovalPeephole.benchmarkXorTestFusableLong        avgt    8  113.158 ± 0.629  ns/op  (unchanged)
TestRemovalPeephole.benchmarkXorTestFusableLongSingle  avgt    8    1.184 ± 0.002  ns/op  (unchanged)

Results from my i5 13600k:

Before this patch:

Benchmark                                              Mode  Cnt    Score   Error  Units
TestRemovalPeephole.benchmarkAndTestFusableInt         avgt    8   71,368 ± 0,857  ns/op
TestRemovalPeephole.benchmarkAndTestFusableIntSingle   avgt    8    0,430 ± 0,001  ns/op
TestRemovalPeephole.benchmarkAndTestFusableLong        avgt    8   66,734 ± 0,128  ns/op
TestRemovalPeephole.benchmarkAndTestFusableLongSingle  avgt    8    0,489 ± 0,001  ns/op
TestRemovalPeephole.benchmarkOrTestFusableInt          avgt    8   57,290 ± 0,200  ns/op
TestRemovalPeephole.benchmarkOrTestFusableIntSingle    avgt    8    0,390 ± 0,001  ns/op
TestRemovalPeephole.benchmarkOrTestFusableLong         avgt    8   70,606 ± 1,595  ns/op
TestRemovalPeephole.benchmarkOrTestFusableLongSingle   avgt    8    0,390 ± 0,001  ns/op
TestRemovalPeephole.benchmarkXorTestFusableInt         avgt    8   62,514 ± 0,735  ns/op
TestRemovalPeephole.benchmarkXorTestFusableIntSingle   avgt    8    0,391 ± 0,001  ns/op
TestRemovalPeephole.benchmarkXorTestFusableLong        avgt    8   70,886 ± 0,572  ns/op
TestRemovalPeephole.benchmarkXorTestFusableLongSingle  avgt    8    0,391 ± 0,001  ns/op

After this patch:

Benchmark                                              Mode  Cnt    Score   Error  Units  Change
TestRemovalPeephole.benchmarkAndTestFusableInt         avgt    8   65,083 ± 0,246  ns/op  ~10% faster
TestRemovalPeephole.benchmarkAndTestFusableIntSingle   avgt    8    0,429 ± 0,001  ns/op  (unchanged)
TestRemovalPeephole.benchmarkAndTestFusableLong        avgt    8   66,533 ± 0,096  ns/op  (unchanged)
TestRemovalPeephole.benchmarkAndTestFusableLongSingle  avgt    8    0,488 ± 0,001  ns/op  (unchanged)
TestRemovalPeephole.benchmarkOrTestFusableInt          avgt    8   74,042 ± 0,412  ns/op  ~10% slower
TestRemovalPeephole.benchmarkOrTestFusableIntSingle    avgt    8    0,391 ± 0,002  ns/op  (unchanged)
TestRemovalPeephole.benchmarkOrTestFusableLong         avgt    8   70,850 ± 0,342  ns/op  (unchanged)
TestRemovalPeephole.benchmarkOrTestFusableLongSingle   avgt    8    0,390 ± 0,002  ns/op  (unchanged)
TestRemovalPeephole.benchmarkXorTestFusableInt         avgt    8   56,626 ± 2,274  ns/op  ~10% faster
TestRemovalPeephole.benchmarkXorTestFusableIntSingle   avgt    8    0,392 ± 0,002  ns/op  (unchanged)
TestRemovalPeephole.benchmarkXorTestFusableLong        avgt    8   69,942 ± 1,380  ns/op  (unchanged)
TestRemovalPeephole.benchmarkXorTestFusableLongSingle  avgt    8    0,391 ± 0,001  ns/op  (unchanged)

As you can see, there are some cases where the tests run slower with this PR, but especially on older architectures, the speed gains outweigh the cases where some performance is lost by a lot. Also, the emitted instruction sequence is still two bytes shorter per removed TEST instruction.
The long variants never see any speedup whatsoever, but as other architectures might observe a speedup here as well and the resulting assembly is at least 2 bytes shorter, I think the rules for long still make sense.

-------------

Commit messages:
 - Use a new approach by telling the peephole which rules set and clear which flags
 - Merge remote-tracking branch 'upstream/master' into testPeephole
 - Remove the old peepreplace empty block - we didn't use them
 - Add more benchmark cases
 - Add new benchmarks
 - Merge remote-tracking branch 'upstream/master' into testPeephole
 - Add new matching rules for xor/or
 - Do not use stdlib in peephole func
 - Add comment indication how the new peepprocedures work
 - Merge peephole rules into two large peephole operations
 - ...
   and 5 more: https://git.openjdk.org/jdk/compare/2e12a123...c434ade8

Changes: https://git.openjdk.org/jdk/pull/14172/files
 Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=14172&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8312213
  Stats: 528 lines in 12 files changed: 522 ins; 0 del; 6 mod
  Patch: https://git.openjdk.org/jdk/pull/14172.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/14172/head:pull/14172

PR: https://git.openjdk.org/jdk/pull/14172

From jkarthikeyan at openjdk.org Mon Jul 24 13:41:52 2023
From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan)
Date: Mon, 24 Jul 2023 13:41:52 GMT
Subject: RFR: 8312213: Remove unnecessary TEST instructions on x86 when flags reg will already be set
In-Reply-To:
References:
Message-ID: <4rMFy9Kv7BP8lJIiw2f6D5aT0eHJz9quBe7FYoX-eEE=.eccfa8ff-ea8f-4626-b6b3-6ac0d510e8ad@github.com>

On Fri, 26 May 2023 10:32:00 GMT, Tobias Hotz wrote:

> This patch adds peephole rules to remove TEST instructions that operate on the result of, and come right after, an AND, XOR or OR instruction.
> This pattern can emerge if the result of the AND is compared against two values where one of the values is zero. The matcher does not have the capability to know that the instructions mentioned above also set the flags register to the same value.
> According to https://www.felixcloutier.com/x86/and, https://www.felixcloutier.com/x86/xor, https://www.felixcloutier.com/x86/or and https://www.felixcloutier.com/x86/test the flags are set to the same values for TEST, AND, XOR and OR, so this should be safe.
> By adding peephole rules to remove the TEST instructions, the resulting assembly code can be shortened and a small speedup can be observed:
> Results on Intel Core i5-8250U CPU
> Before this patch:
>
> Benchmark                                              Mode  Cnt    Score   Error  Units
> TestRemovalPeephole.benchmarkAndTestFusableInt         avgt    8  182.353 ± 1.751  ns/op
> TestRemovalPeephole.benchmarkAndTestFusableIntSingle   avgt    8    1.110 ± 0.002  ns/op
> TestRemovalPeephole.benchmarkAndTestFusableLong        avgt    8  212.836 ± 0.310  ns/op
> TestRemovalPeephole.benchmarkAndTestFusableLongSingle  avgt    8    2.072 ± 0.002  ns/op
> TestRemovalPeephole.benchmarkOrTestFusableInt          avgt    8   72.052 ± 0.215  ns/op
> TestRemovalPeephole.benchmarkOrTestFusableIntSingle    avgt    8    1.406 ± 0.002  ns/op
> TestRemovalPeephole.benchmarkOrTestFusableLong         avgt    8  113.396 ± 0.666  ns/op
> TestRemovalPeephole.benchmarkOrTestFusableLongSingle   avgt    8    1.183 ± 0.001  ns/op
> TestRemovalPeephole.benchmarkXorTestFusableInt         avgt    8   88.683 ± 2.034  ns/op
> TestRemovalPeephole.benchmarkXorTestFusableIntSingle   avgt    8    1.406 ± 0.002  ns/op
> TestRemovalPeephole.benchmarkXorTestFusableLong        avgt    8  113.271 ± 0.602  ns/op
> TestRemovalPeephole.benchmarkXorTestFusableLongSingle  avgt    8    1.183 ± 0.001  ns/op
>
> After this patch:
>
> Benchmark                                              Mode  Cnt    Score   Error  Units  Change
> TestRemovalPeephole.benchmarkAndTestFusableInt         avgt    8  141.615 ± 4.747  ns/op  ~29% faster
> TestRemovalPeephole.benchmarkAndTestFusableIntSingle   avgt    8    1.110 ± 0.002  ns/op  (unchanged)
> TestRemovalPeephole.benchmarkAndTestFusableLong        avgt    8  213.249 ± 1.094  ns/op  (unchanged)
> TestRemovalPeephole.benchmarkAndTestFusableLongSingle  avgt    8    2.074 ± 0.011...

The changes to the ad file are nice! It's a lot more readable to me now. Is it possible to add some more entries to the JMH benchmark to show the results of the other `and` matches, such as the ones with memory? And do you need a JBS entry for this PR?

Hmm, is it possible to add multiple `peepmatch` attributes to a single peephole currently? Since the `peepprocedure`s and `peepreplace`s in here are the same across data types, it may be nice to consolidate them in that way for maintainability, especially if more rules are added for the other operations as well.
src/hotspot/cpu/x86/peephole_x86_64.cpp line 137: > 135: // This function removes the TEST instruction when it detected shapes likes AND r1, r2; TEST r1, r1 > 136: bool test_may_remove_helper(Block* block, int block_index, PhaseCFG* cfg_, PhaseRegAlloc* ra_, > 137: MachNode* (*new_root)(), uint inst0_rule, std::initializer_list rules_to_match) { HotSpot code avoids using the standard library; the [style guide](https://github.com/openjdk/jdk/blob/master/doc/hotspot-style.md) has more information and rationale. src/hotspot/cpu/x86/peephole_x86_64.cpp line 142: > 140: > 141: Node* inst1 = inst0->in(1); > 142: // Only remove test if the block order is inst1 -> MachProjNode (cause AND specifies KILL cr) -> inst0 Suggestion: // Only remove test if the block order is inst1 -> MachProjNode (because AND specifies KILL cr) -> inst0 ------------- PR Review: https://git.openjdk.org/jdk/pull/14172#pullrequestreview-1460885752 PR Comment: https://git.openjdk.org/jdk/pull/14172#issuecomment-1566338728 PR Review Comment: https://git.openjdk.org/jdk/pull/14172#discussion_r1216158482 PR Review Comment: https://git.openjdk.org/jdk/pull/14172#discussion_r1216144334 From jvernee at openjdk.org Mon Jul 24 13:41:53 2023 From: jvernee at openjdk.org (Jorn Vernee) Date: Mon, 24 Jul 2023 13:41:53 GMT Subject: RFR: 8312213: Remove unnecessary TEST instructions on x86 when flags reg will already be set In-Reply-To: References: Message-ID: On Fri, 26 May 2023 10:32:00 GMT, Tobias Hotz wrote: > This patch adds peephole rules to remove TEST instructions that operate on the result and right after an AND, XOR or OR instruction. > This pattern can emerge if the result of the and is compared against two values where one of the values is zero. The matcher does not have the capability to know that the instruction mentioned above also sets the flag register to the same value.
> According to https://www.felixcloutier.com/x86/and, https://www.felixcloutier.com/x86/xor, https://www.felixcloutier.com/x86/or and https://www.felixcloutier.com/x86/test the flags are set to the same values for TEST, AND, XOR and OR, so this should be safe. > By adding peephole rules to remove the TEST instructions, the resulting assembly code can be shortened and a small speedup can be observed: > Results on Intel Core i5-8250U CPU > Before this patch: > > Benchmark Mode Cnt Score Error Units > TestRemovalPeephole.benchmarkAndTestFusableInt avgt 8 182.353 ± 1.751 ns/op > TestRemovalPeephole.benchmarkAndTestFusableIntSingle avgt 8 1.110 ± 0.002 ns/op > TestRemovalPeephole.benchmarkAndTestFusableLong avgt 8 212.836 ± 0.310 ns/op > TestRemovalPeephole.benchmarkAndTestFusableLongSingle avgt 8 2.072 ± 0.002 ns/op > TestRemovalPeephole.benchmarkOrTestFusableInt avgt 8 72.052 ± 0.215 ns/op > TestRemovalPeephole.benchmarkOrTestFusableIntSingle avgt 8 1.406 ± 0.002 ns/op > TestRemovalPeephole.benchmarkOrTestFusableLong avgt 8 113.396 ± 0.666 ns/op > TestRemovalPeephole.benchmarkOrTestFusableLongSingle avgt 8 1.183 ± 0.001 ns/op > TestRemovalPeephole.benchmarkXorTestFusableInt avgt 8 88.683 ± 2.034 ns/op > TestRemovalPeephole.benchmarkXorTestFusableIntSingle avgt 8 1.406 ± 0.002 ns/op > TestRemovalPeephole.benchmarkXorTestFusableLong avgt 8 113.271 ± 0.602 ns/op > TestRemovalPeephole.benchmarkXorTestFusableLongSingle avgt 8 1.183 ± 0.001 ns/op > > After this patch: > > Benchmark Mode Cnt Score Error Units Change > TestRemovalPeephole.benchmarkAndTestFusableInt avgt 8 141.615 ± 4.747 ns/op ~29% faster > TestRemovalPeephole.benchmarkAndTestFusableIntSingle avgt 8 1.110 ± 0.002 ns/op (unchanged) > TestRemovalPeephole.benchmarkAndTestFusableLong avgt 8 213.249 ± 1.094 ns/op (unchanged) > TestRemovalPeephole.benchmarkAndTestFusableLongSingle avgt 8 2.074 ± 0.011... > Also, can this also be applied to or, xor, not, andn, popcnt, etc?
I think it can also be applied to add and sub if the output of the test only reads ZF. I agree there's potentially more that can be done here. I think one issue is that we don't know which of the flags are actually used by the instruction downstream of the `test` (e.g. `jl`, `jg`, whatever). So, I think we should match the peephole rule on the downstream instructions instead, basically any instruction that accepts a `cmpOp` operator (or a variant of that). We can track per instruction/rule which flags they set (for all instructions that `KILL cr`), and check whether the flags required by the downstream instruction match the flags set by the instruction feeding into the `test`. src/hotspot/cpu/x86/x86_64.ad line 13796: > 13794: ins_pipe(ialu_reg); > 13795: %} > 13796: In `test_may_remove_5` the `new_root` function points to the constructor of one of these instructions (what's passed to `peepreplace`). Since the function pointer is unused, these are really not needed. (for `peepreplace` you can just pass `testI_reg` or `testL_reg`, it doesn't really matter). ------------- PR Review: https://git.openjdk.org/jdk/pull/14172#pullrequestreview-1528697844 PR Review Comment: https://git.openjdk.org/jdk/pull/14172#discussion_r1262715419 From qamai at openjdk.org Mon Jul 24 13:41:54 2023 From: qamai at openjdk.org (Quan Anh Mai) Date: Mon, 24 Jul 2023 13:41:54 GMT Subject: RFR: 8312213: Remove unnecessary TEST instructions on x86 when flags reg will already be set In-Reply-To: References: Message-ID: On Fri, 26 May 2023 10:32:00 GMT, Tobias Hotz wrote: > This patch adds peephole rules to remove TEST instructions that operate on the result and right after an AND, XOR or OR instruction. > This pattern can emerge if the result of the and is compared against two values where one of the values is zero. The matcher does not have the capability to know that the instruction mentioned above also sets the flag register to the same value.
> According to https://www.felixcloutier.com/x86/and, https://www.felixcloutier.com/x86/xor, https://www.felixcloutier.com/x86/or and https://www.felixcloutier.com/x86/test the flags are set to the same values for TEST, AND, XOR and OR, so this should be safe. > By adding peephole rules to remove the TEST instructions, the resulting assembly code can be shortened and a small speedup can be observed: > Results on Intel Core i5-8250U CPU > Before this patch: > > Benchmark Mode Cnt Score Error Units > TestRemovalPeephole.benchmarkAndTestFusableInt avgt 8 182.353 ± 1.751 ns/op > TestRemovalPeephole.benchmarkAndTestFusableIntSingle avgt 8 1.110 ± 0.002 ns/op > TestRemovalPeephole.benchmarkAndTestFusableLong avgt 8 212.836 ± 0.310 ns/op > TestRemovalPeephole.benchmarkAndTestFusableLongSingle avgt 8 2.072 ± 0.002 ns/op > TestRemovalPeephole.benchmarkOrTestFusableInt avgt 8 72.052 ± 0.215 ns/op > TestRemovalPeephole.benchmarkOrTestFusableIntSingle avgt 8 1.406 ± 0.002 ns/op > TestRemovalPeephole.benchmarkOrTestFusableLong avgt 8 113.396 ± 0.666 ns/op > TestRemovalPeephole.benchmarkOrTestFusableLongSingle avgt 8 1.183 ± 0.001 ns/op > TestRemovalPeephole.benchmarkXorTestFusableInt avgt 8 88.683 ± 2.034 ns/op > TestRemovalPeephole.benchmarkXorTestFusableIntSingle avgt 8 1.406 ± 0.002 ns/op > TestRemovalPeephole.benchmarkXorTestFusableLong avgt 8 113.271 ± 0.602 ns/op > TestRemovalPeephole.benchmarkXorTestFusableLongSingle avgt 8 1.183 ± 0.001 ns/op > > After this patch: > > Benchmark Mode Cnt Score Error Units Change > TestRemovalPeephole.benchmarkAndTestFusableInt avgt 8 141.615 ± 4.747 ns/op ~29% faster > TestRemovalPeephole.benchmarkAndTestFusableIntSingle avgt 8 1.110 ± 0.002 ns/op (unchanged) > TestRemovalPeephole.benchmarkAndTestFusableLong avgt 8 213.249 ± 1.094 ns/op (unchanged) > TestRemovalPeephole.benchmarkAndTestFusableLongSingle avgt 8 2.074 ± 0.011...
Hi, the `and` nodes have a `MachProj` output, which is their flag output. The matcher does not have such a capability, but after peephole there is only the local bundle scheduler. As a result, you can connect the outputs of the `test` directly to the `MachProj` output of the `and`. Also, can this also be applied to `or`, `xor`, `not`, `andn`, `popcnt`, etc? I think it can also be applied to `add` and `sub` if the output of the `test` only reads ZF. Thanks. @JornVernee I saw you created [JDK-8311969](https://bugs.openjdk.org/projects/JDK/issues/JDK-8311969), you may want to take a look at this. ------------- PR Comment: https://git.openjdk.org/jdk/pull/14172#issuecomment-1565432346 PR Comment: https://git.openjdk.org/jdk/pull/14172#issuecomment-1633970695 From duke at openjdk.org Mon Jul 24 13:41:54 2023 From: duke at openjdk.org (Tobias Hotz) Date: Mon, 24 Jul 2023 13:41:54 GMT Subject: RFR: 8312213: Remove unnecessary TEST instructions on x86 when flags reg will already be set In-Reply-To: References: Message-ID: On Fri, 26 May 2023 10:32:00 GMT, Tobias Hotz wrote: > This patch adds peephole rules to remove TEST instructions that operate on the result and right after an AND, XOR or OR instruction. > This pattern can emerge if the result of the and is compared against two values where one of the values is zero. The matcher does not have the capability to know that the instruction mentioned above also sets the flag register to the same value. > According to https://www.felixcloutier.com/x86/and, https://www.felixcloutier.com/x86/xor, https://www.felixcloutier.com/x86/or and https://www.felixcloutier.com/x86/test the flags are set to the same values for TEST, AND, XOR and OR, so this should be safe.
> By adding peephole rules to remove the TEST instructions, the resulting assembly code can be shortened and a small speedup can be observed: > Results on Intel Core i5-8250U CPU > Before this patch: > > Benchmark Mode Cnt Score Error Units > TestRemovalPeephole.benchmarkAndTestFusableInt avgt 8 182.353 ± 1.751 ns/op > TestRemovalPeephole.benchmarkAndTestFusableIntSingle avgt 8 1.110 ± 0.002 ns/op > TestRemovalPeephole.benchmarkAndTestFusableLong avgt 8 212.836 ± 0.310 ns/op > TestRemovalPeephole.benchmarkAndTestFusableLongSingle avgt 8 2.072 ± 0.002 ns/op > TestRemovalPeephole.benchmarkOrTestFusableInt avgt 8 72.052 ± 0.215 ns/op > TestRemovalPeephole.benchmarkOrTestFusableIntSingle avgt 8 1.406 ± 0.002 ns/op > TestRemovalPeephole.benchmarkOrTestFusableLong avgt 8 113.396 ± 0.666 ns/op > TestRemovalPeephole.benchmarkOrTestFusableLongSingle avgt 8 1.183 ± 0.001 ns/op > TestRemovalPeephole.benchmarkXorTestFusableInt avgt 8 88.683 ± 2.034 ns/op > TestRemovalPeephole.benchmarkXorTestFusableIntSingle avgt 8 1.406 ± 0.002 ns/op > TestRemovalPeephole.benchmarkXorTestFusableLong avgt 8 113.271 ± 0.602 ns/op > TestRemovalPeephole.benchmarkXorTestFusableLongSingle avgt 8 1.183 ± 0.001 ns/op > > After this patch: > > Benchmark Mode Cnt Score Error Units Change > TestRemovalPeephole.benchmarkAndTestFusableInt avgt 8 141.615 ± 4.747 ns/op ~29% faster > TestRemovalPeephole.benchmarkAndTestFusableIntSingle avgt 8 1.110 ± 0.002 ns/op (unchanged) > TestRemovalPeephole.benchmarkAndTestFusableLong avgt 8 213.249 ± 1.094 ns/op (unchanged) > TestRemovalPeephole.benchmarkAndTestFusableLongSingle avgt 8 2.074 ± 0.011... Thanks for the feedback. I will definitely look into other instructions such as OR where this might be feasible. Regarding the multiple peepmatchs: It looks like this is not possible with the current system, but I've just updated how the peepmatch block is interpreted in the test_may_remove peepprocedures. This way we can keep the rules concise. Thanks for the feedback!
I double checked, and it seems that flags other than ZF are used as well. I've added some new cases to cover XOR and OR in addition to the existing AND case, as they all set the same flags as TEST does. @jaskarth Yeah I still need a JBS entry. If you can create one, that would be nice, otherwise I'm going to use the webbug form again. I benchmarked the new peephole rules and got the following results: OLD: Benchmark Mode Cnt Score Error Units TestRemovalPeephole.benchmarkAndTestFusableInt avgt 8 71,351 ± 0,176 ns/op TestRemovalPeephole.benchmarkAndTestFusableLong avgt 8 66,763 ± 0,104 ns/op TestRemovalPeephole.benchmarkOrTestFusableInt avgt 8 57,555 ± 0,341 ns/op TestRemovalPeephole.benchmarkOrTestFusableLong avgt 8 72,158 ± 0,337 ns/op TestRemovalPeephole.benchmarkXorTestFusableInt avgt 8 61,551 ± 0,346 ns/op TestRemovalPeephole.benchmarkXorTestFusableLong avgt 8 70,554 ± 0,208 ns/op NEW: TestRemovalPeephole.benchmarkAndTestFusableInt avgt 8 65,061 ± 0,096 ns/op (10%) TestRemovalPeephole.benchmarkAndTestFusableLong avgt 8 66,855 ± 0,119 ns/op (unchanged) TestRemovalPeephole.benchmarkOrTestFusableInt avgt 8 73,707 ± 0,385 ns/op (-20%) TestRemovalPeephole.benchmarkOrTestFusableLong avgt 8 71,062 ± 1,190 ns/op (unchanged) TestRemovalPeephole.benchmarkXorTestFusableInt avgt 8 56,420 ± 1,441 ns/op (10%) TestRemovalPeephole.benchmarkXorTestFusableLong avgt 8 69,878 ± 1,318 ns/op (unchanged) I am currently not quite sure why there is a massive regression in benchmarkOrTestFusableInt, but it seems to be reproducible at least on my machine. I plan to test another machine soon to see if this behaviour extends to other processors as well. The resulting assembly looks sane, though, and the XOR case also behaves as expected. I've added a few more benchmarks that are basically the same as the old ones and reran the benchmark on my i5 13600k and my i5 8250U.
Here are the results: Results on Intel Core i5-8250U CPU Before this patch: Benchmark Mode Cnt Score Error Units TestRemovalPeephole.benchmarkAndTestFusableInt avgt 8 182.353 ± 1.751 ns/op TestRemovalPeephole.benchmarkAndTestFusableIntSingle avgt 8 1.110 ± 0.002 ns/op TestRemovalPeephole.benchmarkAndTestFusableLong avgt 8 212.836 ± 0.310 ns/op TestRemovalPeephole.benchmarkAndTestFusableLongSingle avgt 8 2.072 ± 0.002 ns/op TestRemovalPeephole.benchmarkOrTestFusableInt avgt 8 72.052 ± 0.215 ns/op TestRemovalPeephole.benchmarkOrTestFusableIntSingle avgt 8 1.406 ± 0.002 ns/op TestRemovalPeephole.benchmarkOrTestFusableLong avgt 8 113.396 ± 0.666 ns/op TestRemovalPeephole.benchmarkOrTestFusableLongSingle avgt 8 1.183 ± 0.001 ns/op TestRemovalPeephole.benchmarkXorTestFusableInt avgt 8 88.683 ± 2.034 ns/op TestRemovalPeephole.benchmarkXorTestFusableIntSingle avgt 8 1.406 ± 0.002 ns/op TestRemovalPeephole.benchmarkXorTestFusableLong avgt 8 113.271 ± 0.602 ns/op TestRemovalPeephole.benchmarkXorTestFusableLongSingle avgt 8 1.183 ± 0.001 ns/op After this patch: Benchmark Mode Cnt Score Error Units Change TestRemovalPeephole.benchmarkAndTestFusableInt avgt 8 141.615 ± 4.747 ns/op ~29% faster TestRemovalPeephole.benchmarkAndTestFusableIntSingle avgt 8 1.110 ± 0.002 ns/op (unchanged) TestRemovalPeephole.benchmarkAndTestFusableLong avgt 8 213.249 ± 1.094 ns/op (unchanged) TestRemovalPeephole.benchmarkAndTestFusableLongSingle avgt 8 2.074 ± 0.011 ns/op (unchanged) TestRemovalPeephole.benchmarkOrTestFusableInt avgt 8 91.500 ± 0.142 ns/op ~27% slower TestRemovalPeephole.benchmarkOrTestFusableIntSingle avgt 8 0.814 ± 0.002 ns/op ~45% faster TestRemovalPeephole.benchmarkOrTestFusableLong avgt 8 113.691 ± 0.839 ns/op (unchanged) TestRemovalPeephole.benchmarkOrTestFusableLongSingle avgt 8 1.185 ± 0.001 ns/op (unchanged) TestRemovalPeephole.benchmarkXorTestFusableInt avgt 8 84.953 ± 0.766 ns/op ~4% faster TestRemovalPeephole.benchmarkXorTestFusableIntSingle avgt 8 0.814 ±
0.001 ns/op ~72% faster TestRemovalPeephole.benchmarkXorTestFusableLong avgt 8 113.158 ± 0.629 ns/op (unchanged) TestRemovalPeephole.benchmarkXorTestFusableLongSingle avgt 8 1.184 ± 0.002 ns/op (unchanged) Results from my i5 13600k: Before this patch: Benchmark Mode Cnt Score Error Units TestRemovalPeephole.benchmarkAndTestFusableInt avgt 8 71,368 ± 0,857 ns/op TestRemovalPeephole.benchmarkAndTestFusableIntSingle avgt 8 0,430 ± 0,001 ns/op TestRemovalPeephole.benchmarkAndTestFusableLong avgt 8 66,734 ± 0,128 ns/op TestRemovalPeephole.benchmarkAndTestFusableLongSingle avgt 8 0,489 ± 0,001 ns/op TestRemovalPeephole.benchmarkOrTestFusableInt avgt 8 57,290 ± 0,200 ns/op TestRemovalPeephole.benchmarkOrTestFusableIntSingle avgt 8 0,390 ± 0,001 ns/op TestRemovalPeephole.benchmarkOrTestFusableLong avgt 8 70,606 ± 1,595 ns/op TestRemovalPeephole.benchmarkOrTestFusableLongSingle avgt 8 0,390 ± 0,001 ns/op TestRemovalPeephole.benchmarkXorTestFusableInt avgt 8 62,514 ± 0,735 ns/op TestRemovalPeephole.benchmarkXorTestFusableIntSingle avgt 8 0,391 ± 0,001 ns/op TestRemovalPeephole.benchmarkXorTestFusableLong avgt 8 70,886 ± 0,572 ns/op TestRemovalPeephole.benchmarkXorTestFusableLongSingle avgt 8 0,391 ± 0,001 ns/op After this patch: Benchmark Mode Cnt Score Error Units Change TestRemovalPeephole.benchmarkAndTestFusableInt avgt 8 65,083 ± 0,246 ns/op ~10% faster TestRemovalPeephole.benchmarkAndTestFusableIntSingle avgt 8 0,429 ± 0,001 ns/op (unchanged) TestRemovalPeephole.benchmarkAndTestFusableLong avgt 8 66,533 ± 0,096 ns/op (unchanged) TestRemovalPeephole.benchmarkAndTestFusableLongSingle avgt 8 0,488 ± 0,001 ns/op (unchanged) TestRemovalPeephole.benchmarkOrTestFusableInt avgt 8 74,042 ± 0,412 ns/op ~10% slower TestRemovalPeephole.benchmarkOrTestFusableIntSingle avgt 8 0,391 ± 0,002 ns/op (unchanged) TestRemovalPeephole.benchmarkOrTestFusableLong avgt 8 70,850 ± 0,342 ns/op (unchanged) TestRemovalPeephole.benchmarkOrTestFusableLongSingle avgt 8 0,390 ±
0,002 ns/op (unchanged) TestRemovalPeephole.benchmarkXorTestFusableInt avgt 8 56,626 ± 2,274 ns/op ~10% faster TestRemovalPeephole.benchmarkXorTestFusableIntSingle avgt 8 0,392 ± 0,002 ns/op (unchanged) TestRemovalPeephole.benchmarkXorTestFusableLong avgt 8 69,942 ± 1,380 ns/op (unchanged) TestRemovalPeephole.benchmarkXorTestFusableLongSingle avgt 8 0,391 ± 0,001 ns/op (unchanged) As you can see, there are still some cases where the tests run slower with this PR, but especially on older architectures, the speed gains far outweigh the cases where some performance is lost. Also, the emitted instruction sequence is still two bytes shorter per removed test instruction. Thanks for the feedback! I agree that this is still far from a perfect solution and tracking the required flags would be ideal, but there are many rules that use the flags register, and all of these would need to be tracked. If you are interested in that, feel free to take this PR as a starting point, but I'm currently not interested in pursuing that path. Otherwise we can still keep this PR in its current form IMO as a first step to removing redundant test instructions. ------------- PR Comment: https://git.openjdk.org/jdk/pull/14172#issuecomment-1574151203 PR Comment: https://git.openjdk.org/jdk/pull/14172#issuecomment-1608411081 PR Comment: https://git.openjdk.org/jdk/pull/14172#issuecomment-1630763334 PR Comment: https://git.openjdk.org/jdk/pull/14172#issuecomment-1635551701 From jvernee at openjdk.org Mon Jul 24 13:41:55 2023 From: jvernee at openjdk.org (Jorn Vernee) Date: Mon, 24 Jul 2023 13:41:55 GMT Subject: RFR: 8312213: Remove unnecessary TEST instructions on x86 when flags reg will already be set In-Reply-To: References: Message-ID: On Thu, 13 Jul 2023 10:22:08 GMT, Quan Anh Mai wrote: >> This patch adds peephole rules to remove TEST instructions that operate on the result and right after an AND, XOR or OR instruction.
>> This pattern can emerge if the result of the and is compared against two values where one of the values is zero. The matcher does not have the capability to know that the instruction mentioned above also sets the flag register to the same value. >> According to https://www.felixcloutier.com/x86/and, https://www.felixcloutier.com/x86/xor, https://www.felixcloutier.com/x86/or and https://www.felixcloutier.com/x86/test the flags are set to the same values for TEST, AND, XOR and OR, so this should be safe. >> By adding peephole rules to remove the TEST instructions, the resulting assembly code can be shortened and a small speedup can be observed: >> Results on Intel Core i5-8250U CPU >> Before this patch: >> >> Benchmark Mode Cnt Score Error Units >> TestRemovalPeephole.benchmarkAndTestFusableInt avgt 8 182.353 ± 1.751 ns/op >> TestRemovalPeephole.benchmarkAndTestFusableIntSingle avgt 8 1.110 ± 0.002 ns/op >> TestRemovalPeephole.benchmarkAndTestFusableLong avgt 8 212.836 ± 0.310 ns/op >> TestRemovalPeephole.benchmarkAndTestFusableLongSingle avgt 8 2.072 ± 0.002 ns/op >> TestRemovalPeephole.benchmarkOrTestFusableInt avgt 8 72.052 ± 0.215 ns/op >> TestRemovalPeephole.benchmarkOrTestFusableIntSingle avgt 8 1.406 ± 0.002 ns/op >> TestRemovalPeephole.benchmarkOrTestFusableLong avgt 8 113.396 ± 0.666 ns/op >> TestRemovalPeephole.benchmarkOrTestFusableLongSingle avgt 8 1.183 ± 0.001 ns/op >> TestRemovalPeephole.benchmarkXorTestFusableInt avgt 8 88.683 ± 2.034 ns/op >> TestRemovalPeephole.benchmarkXorTestFusableIntSingle avgt 8 1.406 ± 0.002 ns/op >> TestRemovalPeephole.benchmarkXorTestFusableLong avgt 8 113.271 ± 0.602 ns/op >> TestRemovalPeephole.benchmarkXorTestFusableLongSingle avgt 8 1.183 ± 0.001 ns/op >> >> After this patch: >> >> Benchmark Mode Cnt Score Error Units Change >> TestRemovalPeephole.benchmarkAndTestFusableInt avgt 8 141.615 ± 4.747 ns/op ~29% faster >> TestRemovalPeephole.benchmarkAndTestFusableIntSingle avgt 8 1.110 ±
0.002 ns/op (unchanged) >> TestRemovalPeephole.benchmarkAndTestFusableLong avgt 8 213.249 ± 1.094 ns/op (unchanged) >> TestRemovalPeephole.bench... > > @JornVernee I saw you created [JDK-8311969](https://bugs.openjdk.org/projects/JDK/issues/JDK-8311969), you may want to take a look at this. @merykitty Thanks for the pointer! ------------- PR Comment: https://git.openjdk.org/jdk/pull/14172#issuecomment-1634290033 From qamai at openjdk.org Mon Jul 24 13:41:56 2023 From: qamai at openjdk.org (Quan Anh Mai) Date: Mon, 24 Jul 2023 13:41:56 GMT Subject: RFR: 8312213: Remove unnecessary TEST instructions on x86 when flags reg will already be set In-Reply-To: References: Message-ID: On Fri, 14 Jul 2023 09:10:24 GMT, Tobias Hotz wrote: >> This patch adds peephole rules to remove TEST instructions that operate on the result and right after an AND, XOR or OR instruction. >> This pattern can emerge if the result of the and is compared against two values where one of the values is zero. The matcher does not have the capability to know that the instruction mentioned above also sets the flag register to the same value. >> According to https://www.felixcloutier.com/x86/and, https://www.felixcloutier.com/x86/xor, https://www.felixcloutier.com/x86/or and https://www.felixcloutier.com/x86/test the flags are set to the same values for TEST, AND, XOR and OR, so this should be safe. >> By adding peephole rules to remove the TEST instructions, the resulting assembly code can be shortened and a small speedup can be observed: >> Results on Intel Core i5-8250U CPU >> Before this patch: >> >> Benchmark Mode Cnt Score Error Units >> TestRemovalPeephole.benchmarkAndTestFusableInt avgt 8 182.353 ± 1.751 ns/op >> TestRemovalPeephole.benchmarkAndTestFusableIntSingle avgt 8 1.110 ± 0.002 ns/op >> TestRemovalPeephole.benchmarkAndTestFusableLong avgt 8 212.836 ± 0.310 ns/op >> TestRemovalPeephole.benchmarkAndTestFusableLongSingle avgt 8 2.072 ±
0.002 ns/op >> TestRemovalPeephole.benchmarkOrTestFusableInt avgt 8 72.052 ± 0.215 ns/op >> TestRemovalPeephole.benchmarkOrTestFusableIntSingle avgt 8 1.406 ± 0.002 ns/op >> TestRemovalPeephole.benchmarkOrTestFusableLong avgt 8 113.396 ± 0.666 ns/op >> TestRemovalPeephole.benchmarkOrTestFusableLongSingle avgt 8 1.183 ± 0.001 ns/op >> TestRemovalPeephole.benchmarkXorTestFusableInt avgt 8 88.683 ± 2.034 ns/op >> TestRemovalPeephole.benchmarkXorTestFusableIntSingle avgt 8 1.406 ± 0.002 ns/op >> TestRemovalPeephole.benchmarkXorTestFusableLong avgt 8 113.271 ± 0.602 ns/op >> TestRemovalPeephole.benchmarkXorTestFusableLongSingle avgt 8 1.183 ± 0.001 ns/op >> >> After this patch: >> >> Benchmark Mode Cnt Score Error Units Change >> TestRemovalPeephole.benchmarkAndTestFusableInt avgt 8 141.615 ± 4.747 ns/op ~29% faster >> TestRemovalPeephole.benchmarkAndTestFusableIntSingle avgt 8 1.110 ± 0.002 ns/op (unchanged) >> TestRemovalPeephole.benchmarkAndTestFusableLong avgt 8 213.249 ± 1.094 ns/op (unchanged) >> TestRemovalPeephole.bench... > > Thanks for the feedback! > I agree that this is still far from a perfect solution and tracking the required flags would be ideal, but there are many rules that use the flags register, and all of these would need to be tracked. If you are interested in that, feel free to take this PR as a starting point, but I'm currently not interested in pursuing that path. Otherwise we can still keep this PR in its current form IMO as a first step to removing redundant test instructions. @ichttt Yes, I don't think tracking all the nodes that use flags is elegant. You can get the use from the test node, then iterate over the operands of the use to find the `cmpOp` operand, which tells you which flag it is using. For most nodes, this will be enough, and for `add`/`sub` the transformation can be applied if `cmpOp->ccode() == Assembler::zero || cmpOp->ccode() == Assembler::notZero`.
I think trying to do this for `add`/`sub` would be beneficial, since they are more common and can be macro-fused with `jcc`. ------------- PR Comment: https://git.openjdk.org/jdk/pull/14172#issuecomment-1636001729 From jvernee at openjdk.org Mon Jul 24 13:41:56 2023 From: jvernee at openjdk.org (Jorn Vernee) Date: Mon, 24 Jul 2023 13:41:56 GMT Subject: RFR: 8312213: Remove unnecessary TEST instructions on x86 when flags reg will already be set In-Reply-To: References: Message-ID: On Fri, 14 Jul 2023 15:09:33 GMT, Quan Anh Mai wrote: > I don't think tracking all the nodes that use flags is elegant. You can get the use from the test node, then iterate over the operands of the use to find the cmpOp operand, which tells you which flag it is using. Yeah, that sounds better. I think in that case we only need 2 peephole rules, one for `testI_reg` and one for `testL_reg`. In the peep procedure, the rule of the prior instruction can be mapped to the flags that it sets (e.g. a simple `switch` from rule -> mask of flags). Then, if the prior instruction sets all the flags needed by the user of the `test`, we can eliminate the `test` instruction. In the future, ADL could be extended to allow setting which flags are set or used by an instruction, and then we don't need the ad-hoc mapping of rules -> flags. ------------- PR Comment: https://git.openjdk.org/jdk/pull/14172#issuecomment-1636129101 From chagedorn at openjdk.org Mon Jul 24 13:41:58 2023 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 24 Jul 2023 13:41:58 GMT Subject: RFR: 8312213: Remove unnecessary TEST instructions on x86 when flags reg will already be set In-Reply-To: References: Message-ID: On Fri, 26 May 2023 10:32:00 GMT, Tobias Hotz wrote: > This patch adds peephole rules to remove TEST instructions that operate on the result and right after an AND, XOR or OR instruction. > This pattern can emerge if the result of the and is compared against two values where one of the values is zero.
The matcher does not have the capability to know that the instruction mentioned above also sets the flag register to the same value. > According to https://www.felixcloutier.com/x86/and, https://www.felixcloutier.com/x86/xor, https://www.felixcloutier.com/x86/or and https://www.felixcloutier.com/x86/test the flags are set to the same values for TEST, AND, XOR and OR, so this should be safe. > By adding peephole rules to remove the TEST instructions, the resulting assembly code can be shortened and a small speedup can be observed: > Results on Intel Core i5-8250U CPU > Before this patch: > > Benchmark Mode Cnt Score Error Units > TestRemovalPeephole.benchmarkAndTestFusableInt avgt 8 182.353 ± 1.751 ns/op > TestRemovalPeephole.benchmarkAndTestFusableIntSingle avgt 8 1.110 ± 0.002 ns/op > TestRemovalPeephole.benchmarkAndTestFusableLong avgt 8 212.836 ± 0.310 ns/op > TestRemovalPeephole.benchmarkAndTestFusableLongSingle avgt 8 2.072 ± 0.002 ns/op > TestRemovalPeephole.benchmarkOrTestFusableInt avgt 8 72.052 ± 0.215 ns/op > TestRemovalPeephole.benchmarkOrTestFusableIntSingle avgt 8 1.406 ± 0.002 ns/op > TestRemovalPeephole.benchmarkOrTestFusableLong avgt 8 113.396 ± 0.666 ns/op > TestRemovalPeephole.benchmarkOrTestFusableLongSingle avgt 8 1.183 ± 0.001 ns/op > TestRemovalPeephole.benchmarkXorTestFusableInt avgt 8 88.683 ± 2.034 ns/op > TestRemovalPeephole.benchmarkXorTestFusableIntSingle avgt 8 1.406 ± 0.002 ns/op > TestRemovalPeephole.benchmarkXorTestFusableLong avgt 8 113.271 ± 0.602 ns/op > TestRemovalPeephole.benchmarkXorTestFusableLongSingle avgt 8 1.183 ± 0.001 ns/op > > After this patch: > > Benchmark Mode Cnt Score Error Units Change > TestRemovalPeephole.benchmarkAndTestFusableInt avgt 8 141.615 ± 4.747 ns/op ~29% faster > TestRemovalPeephole.benchmarkAndTestFusableIntSingle avgt 8 1.110 ± 0.002 ns/op (unchanged) > TestRemovalPeephole.benchmarkAndTestFusableLong avgt 8 213.249 ±
1.094 ns/op (unchanged) > TestRemovalPeephole.benchmarkAndTestFusableLongSingle avgt 8 2.074 ± 0.011... JBS issue: https://bugs.openjdk.org/browse/JDK-8312213 ------------- PR Comment: https://git.openjdk.org/jdk/pull/14172#issuecomment-1639607590 From jvernee at openjdk.org Mon Jul 24 13:42:00 2023 From: jvernee at openjdk.org (Jorn Vernee) Date: Mon, 24 Jul 2023 13:42:00 GMT Subject: RFR: 8312213: Remove unnecessary TEST instructions on x86 when flags reg will already be set In-Reply-To: References: Message-ID: On Thu, 13 Jul 2023 15:17:25 GMT, Jorn Vernee wrote: >> This patch adds peephole rules to remove TEST instructions that operate on the result and right after an AND, XOR or OR instruction. >> This pattern can emerge if the result of the and is compared against two values where one of the values is zero. The matcher does not have the capability to know that the instruction mentioned above also sets the flag register to the same value. >> According to https://www.felixcloutier.com/x86/and, https://www.felixcloutier.com/x86/xor, https://www.felixcloutier.com/x86/or and https://www.felixcloutier.com/x86/test the flags are set to the same values for TEST, AND, XOR and OR, so this should be safe. >> By adding peephole rules to remove the TEST instructions, the resulting assembly code can be shortened and a small speedup can be observed: >> Results on Intel Core i5-8250U CPU >> Before this patch: >> >> Benchmark Mode Cnt Score Error Units >> TestRemovalPeephole.benchmarkAndTestFusableInt avgt 8 182.353 ± 1.751 ns/op >> TestRemovalPeephole.benchmarkAndTestFusableIntSingle avgt 8 1.110 ± 0.002 ns/op >> TestRemovalPeephole.benchmarkAndTestFusableLong avgt 8 212.836 ± 0.310 ns/op >> TestRemovalPeephole.benchmarkAndTestFusableLongSingle avgt 8 2.072 ± 0.002 ns/op >> TestRemovalPeephole.benchmarkOrTestFusableInt avgt 8 72.052 ± 0.215 ns/op >> TestRemovalPeephole.benchmarkOrTestFusableIntSingle avgt 8 1.406 ±
0.002 ns/op >> TestRemovalPeephole.benchmarkOrTestFusableLong avgt 8 113.396 ± 0.666 ns/op >> TestRemovalPeephole.benchmarkOrTestFusableLongSingle avgt 8 1.183 ± 0.001 ns/op >> TestRemovalPeephole.benchmarkXorTestFusableInt avgt 8 88.683 ± 2.034 ns/op >> TestRemovalPeephole.benchmarkXorTestFusableIntSingle avgt 8 1.406 ± 0.002 ns/op >> TestRemovalPeephole.benchmarkXorTestFusableLong avgt 8 113.271 ± 0.602 ns/op >> TestRemovalPeephole.benchmarkXorTestFusableLongSingle avgt 8 1.183 ± 0.001 ns/op >> >> After this patch: >> >> Benchmark Mode Cnt Score Error Units Change >> TestRemovalPeephole.benchmarkAndTestFusableInt avgt 8 141.615 ± 4.747 ns/op ~29% faster >> TestRemovalPeephole.benchmarkAndTestFusableIntSingle avgt 8 1.110 ± 0.002 ns/op (unchanged) >> TestRemovalPeephole.benchmarkAndTestFusableLong avgt 8 213.249 ± 1.094 ns/op (unchanged) >> TestRemovalPeephole.bench... > > src/hotspot/cpu/x86/x86_64.ad line 13796: > >> 13794: ins_pipe(ialu_reg); >> 13795: %} >> 13796: > > In `test_may_remove_5` the `new_root` function points to the constructor of one of these instructions (what's passed to `peepreplace`). Since the function pointer is unused, these are really not needed. (for `peepreplace` you can just pass `testI_reg` or `testL_reg`, it doesn't really matter).
Potentially we could modify ADLC to make `peepreplace` optional, and then pass `nullptr` to the peepprocedure if there's no `peepreplace`: diff --git a/src/hotspot/share/adlc/output_c.cpp b/src/hotspot/share/adlc/output_c.cpp index 5276987eec4..05328453f73 100644 --- a/src/hotspot/share/adlc/output_c.cpp +++ b/src/hotspot/share/adlc/output_c.cpp @@ -1469,10 +1469,14 @@ void ArchDesc::definePeephole(FILE *fp, InstructForm *node) { // End of scope for this peephole's constraints fprintf(fp, " }\n"); } else { - const char* replace_inst = nullptr; - preplace->next_instruction(replace_inst); - // Generate the target instruction - fprintf(fp, " auto replacing = [](){ return static_cast(new %sNode()); };\n", replace_inst); + if (preplace != nullptr) { + const char* replace_inst = nullptr; + preplace->next_instruction(replace_inst); + // Generate the target instruction + fprintf(fp, " auto replacing = [](){ return static_cast(new %sNode()); };\n", replace_inst); + } else { + fprintf(fp, " auto replacing = nullptr;\n"); + } // Call the precedure fprintf(fp, " bool replacement = Peephole::%s(block, block_index, cfg_, ra_, replacing", pprocedure->name()); ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14172#discussion_r1262719608 From duke at openjdk.org Mon Jul 24 13:44:44 2023 From: duke at openjdk.org (Tobias Hotz) Date: Mon, 24 Jul 2023 13:44:44 GMT Subject: RFR: 8312213: Remove unnecessary TEST instructions on x86 when flags reg will already be set In-Reply-To: References: Message-ID: On Fri, 14 Jul 2023 16:58:58 GMT, Jorn Vernee wrote: >> @ichttt Yes, I don't think tracking all the nodes that use flags is elegant, you can get the use from the test node, then iterate the operands of the use to find the `cmpOp` operand which you can know which flag it is using. For most nodes, this will be enough, and for `add`/`sub` the transformation can be applied if `cmpOp->ccode() == Assembler::zero || cmpOp->ccode() == Assembler::notZero`. 
I think trying to do this for `add/sub` would be beneficial since they are more common and they can be macro-fused with `jcc`. > >> I don't think tracking all the nodes that use flags is elegant, you can get the use from the test node, then iterate the operands of the use to find the cmpOp operand, from which you can tell which flag it is using. > > Yeah, that sounds better. I think in that case we only need 2 peephole rules, one for `testI_reg` and one for `testL_reg`. In the peep procedure, the rule of the prior instruction can be mapped to the flags that it sets (e.g. a simple `switch` from rule -> mask of flags). Then, if the prior instruction sets all the flags needed by the user of the `test`, we can eliminate the `test` instruction. > > In the future, ADL could be extended to allow setting which flags are set or used by an instruction, and then we don't need the ad-hoc mapping of rules -> flags. I've updated the PR to follow the approach suggested by @JornVernee. The AD files now contain the information regarding which flag from the EFLAGS register is actually set or cleared. The only downside of this I currently see is that it uses a lot of space from the flags register, which now only has space for three more flags in the future. Not all the flags that are set in the AD file are currently used, but I thought it might be cleaner to list all of them instead of only those needed for the test removal. I've tested this locally and it seems to work as intended. We also now have less overhead in the peephole phase as there are only two peepholes remaining (one for testI_reg and one for testL_reg) instead of 6. 
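The rule -> flag-mask idea discussed in this thread can be sketched as a small standalone example. This is only an illustration under stated assumptions: the `Flag` and `Op` enums and both helper functions are hypothetical names, not the HotSpot or ADLC code. It encodes what the thread states: AND, OR and XOR leave the flags in the same state as a following TEST would, while for ADD/SUB only the zero flag is treated as reusable, following the `zero`/`notZero` suggestion above.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical flag bits, for illustration only (not HotSpot's encoding).
enum Flag : uint32_t {
  CF = 1u << 0, PF = 1u << 1, ZF = 1u << 2, SF = 1u << 3, OF = 1u << 4
};

// Hypothetical stand-in for "the rule of the prior instruction".
enum class Op { And, Or, Xor, Add, Sub, Mov };

// Flags that already hold the value a following `test r, r` would produce.
inline uint32_t flags_equivalent_to_test(Op prior) {
  switch (prior) {
    case Op::And: case Op::Or: case Op::Xor:
      // CF and OF are cleared, the rest come from the result, as with TEST.
      return CF | PF | ZF | SF | OF;
    case Op::Add: case Op::Sub:
      // Conservative assumption: only rely on ZF, since CF and OF
      // reflect carry/overflow here, which TEST would have cleared.
      return ZF;
    default:
      return 0;  // e.g. MOV does not write the flags at all
  }
}

// The TEST can go if the prior instruction covers every flag the user reads.
inline bool can_remove_test(Op prior, uint32_t flags_needed_by_user) {
  return (flags_equivalent_to_test(prior) & flags_needed_by_user) == flags_needed_by_user;
}
```

With this shape, a `jl`-style user (needs SF and OF) after an ADD keeps its TEST, while a `je`/`jne`-style user (needs ZF only) does not.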
------------- PR Comment: https://git.openjdk.org/jdk/pull/14172#issuecomment-1647940525 From stuefe at openjdk.org Mon Jul 24 13:56:43 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Mon, 24 Jul 2023 13:56:43 GMT Subject: RFR: JDK-8304954: SegmentedCodeCache fails when using large pages [v4] In-Reply-To: <-JV81fieOqw0DmOOiez1dFPBWTe_B79CXVzhdLYq_JI=.0d205e43-c162-422b-9e52-54f6ae769034@github.com> References: <-JV81fieOqw0DmOOiez1dFPBWTe_B79CXVzhdLYq_JI=.0d205e43-c162-422b-9e52-54f6ae769034@github.com> Message-ID: On Mon, 24 Jul 2023 13:30:39 GMT, Damon Fenacci wrote: >> src/hotspot/share/code/codeCache.cpp line 324: >> >>> 322: log_warning(codecache)("%s", msg); >>> 323: warning("%s", msg); >>> 324: } >> >> This "warn if page size is not as expected" topic is more complex. >> >> For a start, the text is misleading. You don't reserve anything here. "reserve" has a very clear meaning. Here, you only calculated a page size that fits CodeCacheSize geometry, and then found it is smaller than what the user probably wanted, and you want to warn the user about this. Makes sense. >> >> We also have the following cases: >> >> - code cache could, in theory, be satisfied with large pages, but its size is not aligned with large page size. E.g. 2MB pages and CodeCacheSize=101m, would result in code cache using 4KB pages. >> - The user may not have LPSIB specified, so it's 0, but he specified +UseLargePages. Then, he may not have a specific page size in mind but may still want to know if the page size of the code cache is not a large page. >> >> I think if you warn here when we divert from planned page size for geometry reasons, you should warn for these cases too. An acceptable minimum would be "ps < os::default_large_page_size()" > > Right, thanks! > > There is just this point that it is not completely clear to me: >> * code cache could, in theory, be satisfied with large pages, but its size is not aligned with large page size. E.g. 
2MB pages and CodeCacheSize=101m, would result in code cache using 4KB pages. > > Do you mean that with such a configuration (2MB pages and CodeCacheSize=101m) would the code cache use 4KB pages anyway later on? Wouldn't `CodeCache::page_size` return 2MB pages (and possibly align the code cache to 102MB)? (I guess I'm missing something here) I may be wrong too. Looking closer, I think it aligns up. So, I think you can forget the first of the two points. Sorry for the confusion. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14903#discussion_r1272298952 From dfenacci at openjdk.org Mon Jul 24 14:18:46 2023 From: dfenacci at openjdk.org (Damon Fenacci) Date: Mon, 24 Jul 2023 14:18:46 GMT Subject: RFR: JDK-8304954: SegmentedCodeCache fails when using large pages [v4] In-Reply-To: References: Message-ID: On Mon, 24 Jul 2023 12:32:06 GMT, Thomas Stuefe wrote: >> test/hotspot/jtreg/compiler/codecache/CheckLargePages.java line 28: >> >>> 26: * @bug 8304954 >>> 27: * @summary Test checks that if using large pages and code cache gets above the limit it tries to revert to smaller pages instead of failing >>> 28: * @requires vm.gc != "Z" >> >> Why not Z? > > I'd limit this to linux only. No need to run on OSes that don't support large pages. > > (If we want to be really correct, windows too, but I'm not so sure how stable and well-tested large page support there is) > Why not Z? It seems that ZGC requires large pages of specific given size (2MB. The test fails with *Incompatible -XX:LargePageSizeInBytes, only 2M large pages are supported by ZGC* otherwise). 
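The page-size question raised earlier in this thread (2MB pages with CodeCacheSize=101m, and warning when the chosen page size is smaller than the default large page size) can be illustrated with a small standalone sketch. All names here are hypothetical, and the "at least `min_pages` pages per region" rule is an assumption made for the example, not a statement about what `CodeCache::page_size` actually does.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

constexpr uint64_t K = 1024, M = K * K, G = M * K;

// Round size up to a multiple of alignment (alignment must be a power of two).
constexpr uint64_t align_up(uint64_t size, uint64_t alignment) {
  return (size + alignment - 1) & ~(alignment - 1);
}

// Pick the largest candidate page size that still leaves the region with at
// least min_pages pages. `candidates` must be non-empty, sorted largest first.
inline uint64_t pick_page_size(uint64_t region_size, uint64_t min_pages,
                               const std::vector<uint64_t>& candidates) {
  for (uint64_t ps : candidates) {
    if (ps * min_pages <= region_size) {
      return ps;
    }
  }
  return candidates.back();  // fall back to the smallest page size
}

// The minimal warning condition suggested in the review: ps < default large page size.
inline bool should_warn(uint64_t chosen, uint64_t default_large_page_size) {
  return chosen < default_large_page_size;
}
```

Under these assumptions a 101MB region still gets 2MB pages and is aligned up to 102MB, whereas a region too small for the required number of large pages falls back to 4KB pages and triggers the warning.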
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14903#discussion_r1272328246 From stuefe at openjdk.org Mon Jul 24 14:23:43 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Mon, 24 Jul 2023 14:23:43 GMT Subject: RFR: JDK-8304954: SegmentedCodeCache fails when using large pages [v4] In-Reply-To: References: Message-ID: <8VdpC_umflzJpGbJjz5zr4Hy_ArYVMPoSG8osiKeM4E=.1fdc1258-8521-4beb-84ed-fd495eb9a2a9@github.com> On Mon, 24 Jul 2023 14:15:38 GMT, Damon Fenacci wrote: >> I'd limit this to linux only. No need to run on OSes that don't support large pages. >> >> (If we want to be really correct, windows too, but I'm not so sure how stable and well-tested large page support there is) > >> Why not Z? > > It seems that ZGC requires large pages of specific given size (2MB. The test fails with *Incompatible -XX:LargePageSizeInBytes, only 2M large pages are supported by ZGC* otherwise). Oh right. But this issue is not strictly speaking limited to 1G. It could happen on 2M, too, if you were to allocate a code cache size < 8*2, e.g. 15M. If one gets the JVM to run with that little code cache. Up to you, though. I'm fine with leaving Zgc out too. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14903#discussion_r1272335437 From roland at openjdk.org Mon Jul 24 14:33:51 2023 From: roland at openjdk.org (Roland Westrelin) Date: Mon, 24 Jul 2023 14:33:51 GMT Subject: RFR: 8312440: assert(cast != nullptr) failed: must have added a cast to pin the node Message-ID: The test has a loop nest with 2 loops. A node is sunk out of the inner loop. A cast is created to pin the node out of loop. The logic added by 8308103 looks for an existing cast with the same inputs. It finds one and uses that one. That existing cast was created before the current round of loop opts. 
Even though it has the right out of inner loop (but in outer loop) control input, it was assigned a control out of both loops (the control is legal, it just happens that no use keeps the cast in the outer loop). Next, the same node is sunk out of the outer loop. The logic for sinking nodes looks for an input in the outer loop (otherwise why would the node be in the outer loop) and finds none even though there should be one. The assert fires. The reason is that the cast input has control out of loop. The fix I propose is to not use an existing cast if it doesn't have the expected control. ------------- Commit messages: - fix & test Changes: https://git.openjdk.org/jdk/pull/14999/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=14999&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8312440 Stats: 58 lines in 2 files changed: 57 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/14999.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14999/head:pull/14999 PR: https://git.openjdk.org/jdk/pull/14999 From rrich at openjdk.org Mon Jul 24 14:54:57 2023 From: rrich at openjdk.org (Richard Reingruber) Date: Mon, 24 Jul 2023 14:54:57 GMT Subject: RFR: 8312495: assert(0 <= i && i < _len) failed: illegal index after JDK-8287061 on big endian platforms [v2] In-Reply-To: <51ZON6Vdj1yrENrGYzmMEndXIRvJ1d7Nk72a4ikx3rw=.ee657823-f69f-42ce-886a-25a7f6d71840@github.com> References: <51ZON6Vdj1yrENrGYzmMEndXIRvJ1d7Nk72a4ikx3rw=.ee657823-f69f-42ce-886a-25a7f6d71840@github.com> Message-ID: <24Hws5aSMzWWUvGj_p6KqOgL_gFzo8JRPfHAq-YXkQg=.e58f202e-6b90-4956-a2a3-2473bff63b22@github.com> > On big endian platforms `jint` values are stored in the high part of `StackValue` values. Therefore the `StackValue` cannot be cast directly to `jint`. More details on why this has to be like this are given in the JBS issue. > > This is a common pattern. 
See also > > https://github.com/openjdk/jdk/blob/8d29329138d44800ee4c0c02dacc01a06097de66/src/hotspot/share/runtime/deoptimization.cpp#L1386-L1387 > https://github.com/openjdk/jdk/blob/8d29329138d44800ee4c0c02dacc01a06097de66/src/hotspot/share/runtime/deoptimization.cpp#L1513-L1514 > > ### Testing > Many iterations of vmTestbase/vm/mlvm/meth/stress/compiler/sequences/Test.java. > > JTReg tests: tier1-4 of hotspot and jdk. All of Langtools and jaxp. Renaissance benchmarks as functional tests. > > All testing was done with fastdebug and release builds on the main platforms and also on Linux/PPC64le. Richard Reingruber has updated the pull request incrementally with one additional commit since the last revision: Encapsulate endianess dependencies in StackValue::get_jint and set_jint ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14976/files - new: https://git.openjdk.org/jdk/pull/14976/files/224a65f7..9be56893 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14976&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14976&range=00-01 Stats: 90 lines in 7 files changed: 14 ins; 21 del; 55 mod Patch: https://git.openjdk.org/jdk/pull/14976.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14976/head:pull/14976 PR: https://git.openjdk.org/jdk/pull/14976 From rrich at openjdk.org Mon Jul 24 14:59:41 2023 From: rrich at openjdk.org (Richard Reingruber) Date: Mon, 24 Jul 2023 14:59:41 GMT Subject: RFR: 8312495: assert(0 <= i && i < _len) failed: illegal index after JDK-8287061 on big endian platforms [v2] In-Reply-To: <24Hws5aSMzWWUvGj_p6KqOgL_gFzo8JRPfHAq-YXkQg=.e58f202e-6b90-4956-a2a3-2473bff63b22@github.com> References: <51ZON6Vdj1yrENrGYzmMEndXIRvJ1d7Nk72a4ikx3rw=.ee657823-f69f-42ce-886a-25a7f6d71840@github.com> <24Hws5aSMzWWUvGj_p6KqOgL_gFzo8JRPfHAq-YXkQg=.e58f202e-6b90-4956-a2a3-2473bff63b22@github.com> Message-ID: On Mon, 24 Jul 2023 14:54:57 GMT, Richard Reingruber wrote: >> On big endian platforms `jint` values are stored in 
the high part of `StackValue` values. Therefore the `StackValue` cannot be cast directly to `jint`. More details on why this has to be like this are given in the JBS issue. >> >> This is a common pattern. See also >> >> https://github.com/openjdk/jdk/blob/8d29329138d44800ee4c0c02dacc01a06097de66/src/hotspot/share/runtime/deoptimization.cpp#L1386-L1387 >> https://github.com/openjdk/jdk/blob/8d29329138d44800ee4c0c02dacc01a06097de66/src/hotspot/share/runtime/deoptimization.cpp#L1513-L1514 >> >> ### Testing >> Many iterations of vmTestbase/vm/mlvm/meth/stress/compiler/sequences/Test.java. >> >> JTReg tests: tier1-4 of hotspot and jdk. All of Langtools and jaxp. Renaissance benchmarks as functional tests. >> >> All testing was done with fastdebug and release builds on the main platforms and also on Linux/PPC64le. > > Richard Reingruber has updated the pull request incrementally with one additional commit since the last revision: > > Encapsulate endianess dependencies in StackValue::get_jint and set_jint I've done the refactoring. There are a few places where I wasn't sure what to do: * differences in the handling of `Location::dbl` and `lng` in `reassign_fields_by_klass` and `reassign_type_array_elements` * Should the logic of `StackValueCollection::float_at` and similar be moved to `StackValue`? * What to do with `byte_array_put`? * Should `get_jint` be implemented as a simple (jint) cast of `_integer_value` on little endian platforms? 
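To illustrate why a plain cast is not portable here, the following is a minimal standalone sketch. The `Slot` struct and the helper functions are hypothetical stand-ins, not the actual `StackValue` implementation. It assumes the 32-bit value is written into the first four bytes of a pointer-sized slot, which on a 64-bit big-endian machine is the high half of the word.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for the integer payload of a stack value.
struct Slot {
  intptr_t raw = 0;

  // Write the 32-bit value into the first four bytes of the slot.
  void set_jint(int32_t v) { std::memcpy(&raw, &v, sizeof v); }

  // Read it back from the same bytes, independent of byte order.
  int32_t get_jint() const {
    int32_t v;
    std::memcpy(&v, &raw, sizeof v);
    return v;
  }
};

inline bool host_is_little_endian() {
  const uint16_t probe = 1;
  unsigned char first_byte;
  std::memcpy(&first_byte, &probe, 1);
  return first_byte == 1;
}

// Accessor-based round trip: expected to work on every platform.
inline int32_t roundtrip(int32_t v) {
  Slot s;
  s.set_jint(v);
  return s.get_jint();
}

// What a direct cast of the whole slot would yield. On a 64-bit big-endian
// host this reads the low half of the word, which the store never touched.
inline bool plain_cast_matches(int32_t v) {
  Slot s;
  s.set_jint(v);
  return static_cast<int32_t>(s.raw) == v;
}
```

Keeping both directions behind one pair of accessors means callers never need to know which half of the slot holds the value, which is the point of encapsulating the endianness dependency.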
------------- PR Comment: https://git.openjdk.org/jdk/pull/14976#issuecomment-1648074874 From chagedorn at openjdk.org Mon Jul 24 15:36:44 2023 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 24 Jul 2023 15:36:44 GMT Subject: RFR: 8308504: C2: "malformed control flow" after JDK-8303466 [v3] In-Reply-To: <8MfyIWOwZcFRmz3pogkoUpYBA2nOCMaWegkSDJVL1j4=.aaf39cb9-09af-4aa7-8d63-6cf38f8c1d34@github.com> References: <7Vy9q8XEuzC40Za_CK-nc6G3EAZocEJ0T8-gdWTi9So=.d9aece29-c2d4-49c7-9a2c-22abc56920e8@github.com> <8MfyIWOwZcFRmz3pogkoUpYBA2nOCMaWegkSDJVL1j4=.aaf39cb9-09af-4aa7-8d63-6cf38f8c1d34@github.com> Message-ID: On Mon, 24 Jul 2023 11:18:10 GMT, Roland Westrelin wrote: >> I took that bug over from Emanuel because he's away: >> https://github.com/openjdk/jdk/pull/14331 >> >> I tried adding a `CastII` to narrow the limit of the loop as I >> suggested in a comment on the PR but I found that doesn't work in all >> cases: if the type of the initial value for the loop variable is not >> narrow enough, then the narrower type for the limit doesn't help >> narrow the loop phi type. >> >> What I propose instead is to add an assert predicate that catches when >> the main loop is unreachable but the zero trip count doesn't constant >> fold. For that to work, the order of predicates must be preserved when >> they are copied or updated. I had to make some small changes to >> guarantee that. > > Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: > > review > I tried adding a CastII to narrow the limit of the loop as I suggested in a comment on the PR but I found that doesn't work in all cases: if the type of the initial value for the loop variable is not narrow enough, then the narrower type for the limit doesn't help narrow the loop phi type. I hoped that this would be enough. Can you show an example where this failed? 
src/hotspot/share/opto/loopPredicate.cpp line 1505: > 1503: (stride > 0) != (scale > 0), overflow); > 1504: add_template_assertion_predicate_helper(predicate_proj, reason, upper_bound_proj, bol, > 1505: overflow ? Op_If : iff->Opcode()); Is this supposed to replace L1506-1510? src/hotspot/share/opto/loopPredicate.cpp line 1524: > 1522: bol = rc_predicate(loop, new_proj, scale, offset, max_value, limit, stride, rng, (stride > 0) != (scale > 0), > 1523: overflow); > 1524: add_template_assertion_predicate_helper(predicate_proj, reason, new_proj, bol, overflow ? Op_If : iff->Opcode()); You probably need to assign the result to `new_proj`? ------------- PR Review: https://git.openjdk.org/jdk/pull/14973#pullrequestreview-1543608845 PR Review Comment: https://git.openjdk.org/jdk/pull/14973#discussion_r1272338663 PR Review Comment: https://git.openjdk.org/jdk/pull/14973#discussion_r1272369875 From kvn at openjdk.org Mon Jul 24 16:51:42 2023 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 24 Jul 2023 16:51:42 GMT Subject: RFR: 8312495: assert(0 <= i && i < _len) failed: illegal index after JDK-8287061 on big endian platforms [v2] In-Reply-To: References: <51ZON6Vdj1yrENrGYzmMEndXIRvJ1d7Nk72a4ikx3rw=.ee657823-f69f-42ce-886a-25a7f6d71840@github.com> <24Hws5aSMzWWUvGj_p6KqOgL_gFzo8JRPfHAq-YXkQg=.e58f202e-6b90-4956-a2a3-2473bff63b22@github.com> Message-ID: On Mon, 24 Jul 2023 14:56:27 GMT, Richard Reingruber wrote: > I've done the the refactoring. There are a few places were I wasn't sure what to do: Good. > * differences in the handling of `Location::dbl` and `lng` in `reassign_fields_by_klass` and `reassign_type_array_elements` What difference you talking about? > * Should the logic of `StackValueCollection::float_at` and similar be moved to `StackValue`? I think this is the case for separate RFE. > * what to do with `byte_array_put`? Explain please > * Should `get_jint` be implemented as a simple (jint) cast of `_integer_value` on little endian platforms? No. 
The code you suggest works for all cases - there is no need to complicate it. ------------- PR Comment: https://git.openjdk.org/jdk/pull/14976#issuecomment-1648262280 From duke at openjdk.org Mon Jul 24 18:22:56 2023 From: duke at openjdk.org (Ashutosh Mehra) Date: Mon, 24 Jul 2023 18:22:56 GMT Subject: RFR: 8312596: Potential null pointer access in Compile::TracePhase::~TracePhase after JDK-8311976 Message-ID: Please review this PR to fix a potential null pointer access when using `_compile`. Updated the code to unconditionally initialize `_compile` and added an assert (similar to C1's `PhaseTraceTime` constructor) for it to be non-null. ------------- Commit messages: - 8312596: Potential null pointer access in Compile::TracePhase::~TracePhase after JDK-8311976 Changes: https://git.openjdk.org/jdk/pull/15002/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=15002&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8312596 Stats: 7 lines in 1 file changed: 4 ins; 1 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/15002.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15002/head:pull/15002 PR: https://git.openjdk.org/jdk/pull/15002 From gbarany at openjdk.org Mon Jul 24 19:03:45 2023 From: gbarany at openjdk.org (Gergö Barany) Date: Mon, 24 Jul 2023 19:03:45 GMT Subject: RFR: 8312579: [JVMCI] JVMCI support for virtual Vector API objects Message-ID: Optimized Vector API values are represented as raw values in SIMD registers. When deoptimizing with such a value in the state, a Java heap object must be recreated. HotSpot has a special "Location::vector" location type to mark Vector API values, and it knows how to materialize such values. Extend the JVMCI code installer to mark the appropriate values as vectors so that JVMCI-compiled code can also deoptimize with Vector API values in SIMD registers. 
------------- Commit messages: - 8312579: [JVMCI] JVMCI support for virtual Vector API objects Changes: https://git.openjdk.org/jdk/pull/15003/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=15003&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8312579 Stats: 23 lines in 4 files changed: 16 ins; 0 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/15003.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15003/head:pull/15003 PR: https://git.openjdk.org/jdk/pull/15003 From rrich at openjdk.org Mon Jul 24 19:48:43 2023 From: rrich at openjdk.org (Richard Reingruber) Date: Mon, 24 Jul 2023 19:48:43 GMT Subject: RFR: 8312495: assert(0 <= i && i < _len) failed: illegal index after JDK-8287061 on big endian platforms [v2] In-Reply-To: <24Hws5aSMzWWUvGj_p6KqOgL_gFzo8JRPfHAq-YXkQg=.e58f202e-6b90-4956-a2a3-2473bff63b22@github.com> References: <51ZON6Vdj1yrENrGYzmMEndXIRvJ1d7Nk72a4ikx3rw=.ee657823-f69f-42ce-886a-25a7f6d71840@github.com> <24Hws5aSMzWWUvGj_p6KqOgL_gFzo8JRPfHAq-YXkQg=.e58f202e-6b90-4956-a2a3-2473bff63b22@github.com> Message-ID: On Mon, 24 Jul 2023 14:54:57 GMT, Richard Reingruber wrote: >> On big endian platforms `jint` values are stored in the high part of `StackValue` values. Therefore the `StackValue` cannot be cast directly to `jint`. More details on why this has to be like this are given in the JBS issue. >> >> This is a common pattern. See also >> >> https://github.com/openjdk/jdk/blob/8d29329138d44800ee4c0c02dacc01a06097de66/src/hotspot/share/runtime/deoptimization.cpp#L1386-L1387 >> https://github.com/openjdk/jdk/blob/8d29329138d44800ee4c0c02dacc01a06097de66/src/hotspot/share/runtime/deoptimization.cpp#L1513-L1514 >> >> ### Testing >> Many iterations of vmTestbase/vm/mlvm/meth/stress/compiler/sequences/Test.java. >> >> JTReg tests: tier1-4 of hotspot and jdk. All of Langtools and jaxp. Renaissance benchmarks as functional tests. 
>> >> All testing was done with fastdebug and release builds on the main platforms and also on Linux/PPC64le. > > Richard Reingruber has updated the pull request incrementally with one additional commit since the last revision: > > Encapsulate endianess dependencies in StackValue::get_jint and set_jint The langtools/tier1 failure on linux-x86 looks unrelated # Internal Error (g1ConcurrentMark.cpp:1671), pid=37061, tid=37071 # fatal error: Overflow during reference processing, can not continue. Current mark stack depth: 65472, MarkStackSize: 65536, MarkStackSizeMax: 4194304. Please increase MarkStackSize and/or MarkStackSizeMax and restart. ------------- PR Comment: https://git.openjdk.org/jdk/pull/14976#issuecomment-1648508192 From duke at openjdk.org Mon Jul 24 20:08:41 2023 From: duke at openjdk.org (Ashutosh Mehra) Date: Mon, 24 Jul 2023 20:08:41 GMT Subject: RFR: 8312596: Potential null pointer access in Compile::TracePhase::~TracePhase after JDK-8311976 In-Reply-To: References: Message-ID: On Mon, 24 Jul 2023 18:16:19 GMT, Ashutosh Mehra wrote: > Please review this PR to fix a potential null pointer access in using `_compile`. > Updated the code to unconditionally initialize `_compile` and added an assert (similar to C1's `PhaseTraceTime` constructor) for it to be non-null. windows-aarch64 builds failed with: d:\a\jdk\jdk\src\hotspot\cpu\aarch64\assembler_aarch64.hpp(657): error C2220: the following warning is treated as an error d:\a\jdk\jdk\src\hotspot\cpu\aarch64\assembler_aarch64.hpp(657): warning C4334: '<<': result of 32-bit shift implicitly converted to 64 bits (was 64-bit shift intended?) Seems unrelated to these changes. Probably caused by this [patch](https://github.com/openjdk/jdk/commit/7dd47998f00712515c25fb852b6c0cf958120508) @coleenp ? 
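For readers unfamiliar with the C4334 warning quoted above, here is a generic illustration of the pattern it flags. This is a sketch only: it does not reproduce line 657 of `assembler_aarch64.hpp`, and the helper name `bit64` is hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// The pattern MSVC's C4334 flags looks like:
//   uint64_t mask = 1 << n;
// The literal 1 is an int, so the shift is performed in 32 bits and only the
// result is widened to 64 bits afterwards.

// Widening the left operand first makes the shift itself 64 bits wide.
constexpr uint64_t bit64(unsigned n) {
  return uint64_t{1} << n;
}
```

Writing the operand as `uint64_t{1}` (or `1ULL`) states the intended width explicitly, which is what the warning text is asking about.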
------------- PR Comment: https://git.openjdk.org/jdk/pull/15002#issuecomment-1648531829 From rrich at openjdk.org Mon Jul 24 20:24:43 2023 From: rrich at openjdk.org (Richard Reingruber) Date: Mon, 24 Jul 2023 20:24:43 GMT Subject: RFR: 8312495: assert(0 <= i && i < _len) failed: illegal index after JDK-8287061 on big endian platforms [v2] In-Reply-To: References: <51ZON6Vdj1yrENrGYzmMEndXIRvJ1d7Nk72a4ikx3rw=.ee657823-f69f-42ce-886a-25a7f6d71840@github.com> <24Hws5aSMzWWUvGj_p6KqOgL_gFzo8JRPfHAq-YXkQg=.e58f202e-6b90-4956-a2a3-2473bff63b22@github.com> Message-ID: On Mon, 24 Jul 2023 16:49:11 GMT, Vladimir Kozlov wrote: > > > * differences in the handling of `Location::dbl` and `lng` in `reassign_fields_by_klass` and `reassign_type_array_elements` > > What difference you talking about? `reassign_fields_by_klass` falls through to the `T_LONG` and `T_DOUBLE` case https://github.com/openjdk/jdk/blob/8008e27c55030b397e2040bc3cf8408e47edf412/src/hotspot/share/runtime/deoptimization.cpp#L1518 where the intptr is fed as-is into a simple `long_field_put` https://github.com/openjdk/jdk/blob/2bdfa836adbeba3319bee4ee61017907d6d84d58/src/hotspot/share/runtime/deoptimization.cpp#L1528 `reassign_type_array_elements` extracts the halfs from the intptr value and stores them separately https://github.com/openjdk/jdk/blob/8008e27c55030b397e2040bc3cf8408e47edf412/src/hotspot/share/runtime/deoptimization.cpp#L1383-L1384 Instead of the 2 stores with the complex casts just one `long_at_put` could be done. Or is there a reason not to do that? > > * Should the logic of `StackValueCollection::float_at` and similar be moved to `StackValue`? > > I think this is the case for separate RFE. Ok. > > * what to do with `byte_array_put`? > > Explain please Well, it has very similar patterns like the locations I changed so far. I guess it is better to leave it alone for now. > > * Should `get_jint` be implemented as a simple (jint) cast of `_integer_value` on little endian platforms? > > No. 
The code you suggest works for all case - there is no need to complicate it. Sure. ------------- PR Comment: https://git.openjdk.org/jdk/pull/14976#issuecomment-1648550468 From kvn at openjdk.org Mon Jul 24 21:43:41 2023 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 24 Jul 2023 21:43:41 GMT Subject: RFR: 8312495: assert(0 <= i && i < _len) failed: illegal index after JDK-8287061 on big endian platforms [v2] In-Reply-To: References: <51ZON6Vdj1yrENrGYzmMEndXIRvJ1d7Nk72a4ikx3rw=.ee657823-f69f-42ce-886a-25a7f6d71840@github.com> <24Hws5aSMzWWUvGj_p6KqOgL_gFzo8JRPfHAq-YXkQg=.e58f202e-6b90-4956-a2a3-2473bff63b22@github.com> Message-ID: On Mon, 24 Jul 2023 19:45:36 GMT, Richard Reingruber wrote: > The langtools/tier1 failure on linux-x86 looks unrelated > > ``` > # Internal Error (g1ConcurrentMark.cpp:1671), pid=37061, tid=37071 > # fatal error: Overflow during reference processing, can not continue. Current mark stack depth: 65472, MarkStackSize: 65536, MarkStackSizeMax: 4194304. Please increase MarkStackSize and/or MarkStackSizeMax and restart. > ``` https://bugs.openjdk.org/browse/JDK-8312534 ------------- PR Comment: https://git.openjdk.org/jdk/pull/14976#issuecomment-1648666684 From kvn at openjdk.org Mon Jul 24 21:59:43 2023 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 24 Jul 2023 21:59:43 GMT Subject: RFR: 8312495: assert(0 <= i && i < _len) failed: illegal index after JDK-8287061 on big endian platforms [v2] In-Reply-To: References: <51ZON6Vdj1yrENrGYzmMEndXIRvJ1d7Nk72a4ikx3rw=.ee657823-f69f-42ce-886a-25a7f6d71840@github.com> <24Hws5aSMzWWUvGj_p6KqOgL_gFzo8JRPfHAq-YXkQg=.e58f202e-6b90-4956-a2a3-2473bff63b22@github.com> Message-ID: On Mon, 24 Jul 2023 20:22:10 GMT, Richard Reingruber wrote: > > > * differences in the handling of `Location::dbl` and `lng` in `reassign_fields_by_klass` and `reassign_type_array_elements` > > > > > > What difference you talking about? 
So you are asking about the difference for `case T_INT: case T_FLOAT:` and not `case T_LONG: case T_DOUBLE:`. Let's leave it as it is for now and file an RFE to investigate (to factor out and use the same code). ------------- PR Comment: https://git.openjdk.org/jdk/pull/14976#issuecomment-1648681997 From kvn at openjdk.org Mon Jul 24 22:10:41 2023 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 24 Jul 2023 22:10:41 GMT Subject: RFR: 8312495: assert(0 <= i && i < _len) failed: illegal index after JDK-8287061 on big endian platforms [v2] In-Reply-To: References: <51ZON6Vdj1yrENrGYzmMEndXIRvJ1d7Nk72a4ikx3rw=.ee657823-f69f-42ce-886a-25a7f6d71840@github.com> <24Hws5aSMzWWUvGj_p6KqOgL_gFzo8JRPfHAq-YXkQg=.e58f202e-6b90-4956-a2a3-2473bff63b22@github.com> Message-ID: On Mon, 24 Jul 2023 20:22:10 GMT, Richard Reingruber wrote: >>> what to do with byte_array_put? >> Explain please > Well, it has very similar patterns like the locations I changed so far. I guess it is better to leave it alone for now. It is not a big change - just pass a `StackValue*` argument. ------------- PR Comment: https://git.openjdk.org/jdk/pull/14976#issuecomment-1648692078 From jwaters at openjdk.org Tue Jul 25 01:45:50 2023 From: jwaters at openjdk.org (Julian Waters) Date: Tue, 25 Jul 2023 01:45:50 GMT Subject: RFR: 8312502: Mass migrate HotSpot attributes to the correct location [v3] In-Reply-To: References: Message-ID: On Mon, 24 Jul 2023 05:34:04 GMT, Julian Waters wrote: >> Someone had to do it, so I did. Moves attributes to the correct place as specified in the HotSpot Style Guide once and for all > > Julian Waters has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains 41 additional commits since the last revision: > > - Merge branch 'openjdk:master' into patch-6 > - Merge branch 'openjdk:master' into patch-6 > - arguments.hpp > - arguments.hpp > - globalDefinitions_gcc.hpp > - assembler_aarch64.hpp > - macroAssembler_aarch64.cpp > - vmError.cpp > - vmError.cpp > - macroAssembler_aarch64.cpp > - ... and 31 more: https://git.openjdk.org/jdk/compare/090c3cd9...d60d8923 Hmm... I was doing this so that what attribute specifiers applied to would be clearer since it's helpful to have the ability to make them apply to different aspects of a method as per C++11. Could I split this up and discard some of the areas in which the changes are less agreeable? (as advised by someone who reached out to me privately) ------------- PR Comment: https://git.openjdk.org/jdk/pull/14969#issuecomment-1648839267 From kbarrett at openjdk.org Tue Jul 25 02:30:40 2023 From: kbarrett at openjdk.org (Kim Barrett) Date: Tue, 25 Jul 2023 02:30:40 GMT Subject: RFR: 8312502: Mass migrate HotSpot attributes to the correct location [v3] In-Reply-To: References: Message-ID: On Tue, 25 Jul 2023 01:41:24 GMT, Julian Waters wrote: > Hmm... I was doing this so that what attribute specifiers applied to would be clearer since it's helpful to have the ability to make them apply to different aspects of a method as per C++11. Could I split this up and discard some of the areas in which the changes are less agreeable? (as advised by someone who reached out to me privately) That can be done when other attributes are being added to a declaration. For example, various error reporting functions had the `[[noreturn]]` attribute added by JDK-8303805. Many of them also had a trailing ATTRIBUTE_PRINTF, which was moved to the front at the same time. There were also one or two functions there whose trailing ATTRIBUTE_PRINTF was moved to the front even though not noreturn, for local consistency (e.g. `warning`). 
(So yes, sometimes it's okay to normalize some outliers. That doesn't seem like what's going on in this PR though.) There currently aren't other standard attributes to be dealt with. That may change as we adopt newer versions of the standard. But even so, the scope of the adoption changes is going to be limited. ------------- PR Comment: https://git.openjdk.org/jdk/pull/14969#issuecomment-1648898589 From dfenacci at openjdk.org Tue Jul 25 07:41:44 2023 From: dfenacci at openjdk.org (Damon Fenacci) Date: Tue, 25 Jul 2023 07:41:44 GMT Subject: RFR: JDK-8304954: SegmentedCodeCache fails when using large pages [v4] In-Reply-To: References: Message-ID: On Mon, 24 Jul 2023 13:13:03 GMT, Thomas Stuefe wrote: >> Damon Fenacci has updated the pull request incrementally with one additional commit since the last revision: >> >> JDK-8304954: merge ifs checking when to print warning > > src/hotspot/share/code/codeCache.cpp line 322: > >> 320: byte_size_in_exact_unit(LargePageSizeInBytes), exact_unit_for_byte_size(LargePageSizeInBytes), >> 321: byte_size_in_exact_unit(ps), exact_unit_for_byte_size(ps)); >> 322: log_warning(codecache)("%s", msg); > > Also, no need to print into a temp buffer. log_xxx() accepts var-args, so you can pass the format string + args directly. 
I've used the buffer since the message is used in 2 different places:
https://github.com/openjdk/jdk/blob/e1b09a9fe267d6aa48f4656652411000a4f4d2ee/src/hotspot/share/code/codeCache.cpp#L317-L323
(not 100% sure why, but I've noticed that it is how warnings are handled in other places in the code)

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/14903#discussion_r1273128448

From pli at openjdk.org Tue Jul 25 07:49:47 2023
From: pli at openjdk.org (Pengfei Li)
Date: Tue, 25 Jul 2023 07:49:47 GMT
Subject: RFR: 8312570: [TESTBUG] Jtreg compiler/loopopts/superword/TestDependencyOffsets.java fails on 512-bit SVE
Message-ID:

HotSpot jtreg `compiler/loopopts/superword/TestDependencyOffsets.java` fails on AArch64 CPUs with 512-bit SVE. The reason is that many test loops in the code cannot be vectorized due to data dependence, but the IR tests assume they can.

On AArch64, these IR tests only check the `asimd` CPU feature and incorrectly assume that AArch64 vectors are at most 256 bits. In fact, `asimd` on AArch64 only represents NEON vectors, which are at most 128 bits. AArch64 CPUs may have another feature, `sve`, which represents scalable vectors of at most 2048 bits. The vectorization won't succeed on 512-bit SVE CPUs if the memory offset between some read and write is less than 512 bits.

As this jtreg test is auto-generated by a Python script, we have updated the script and re-generated the test. The new version checks auto-vectorization on both NEON-only and NEON+SVE platforms. Below is the diff of the generator script. We have also attached the new script to the JBS page.

@@ -321,7 +321,8 @@ class Type:
             p.append(Platform("avx512", ["avx512", "true"], 64))
         else:
             assert False, "type not implemented" + self.name
-        p.append(Platform("asimd", ["asimd", "true"], 32))
+        p.append(Platform("asimd", ["asimd", "true", "sve", "false"], 16))
+        p.append(Platform("sve", ["sve", "true"], 256))
         return p

 class Test:
@@ -457,7 +458,7 @@ class Generator:
         lines.append(" * and various MaxVectorSize values, and +- AlignVector.")
         lines.append(" *")
         lines.append(" * Note: this test is auto-generated. Please modify / generate with script:")
-        lines.append(" * https://bugs.openjdk.org/browse/JDK-8308606")
+        lines.append(" * https://bugs.openjdk.org/browse/JDK-8312570")
         lines.append(" *")
         lines.append(" * Types: " + ", ".join([t.name for t in self.types]))
         lines.append(" * Offsets: " + ", ".join([str(o) for o in self.offsets]))
@@ -598,7 +599,8 @@ class Generator:
             # IR rules
             for p in test.t.platforms():
                 elements = p.vector_width // test.t.size
-                lines.append(f" // CPU: {p.name} -> vector_width: {p.vector_width} -> elements in vector: {elements}")
+                max_pre = "max " if p.name == "sve" else ""
+                lines.append(f" // CPU: {p.name} -> {max_pre}vector_width: {p.vector_width} -> {max_pre}elements in vector: {elements}")
                 ############### -AlignVector
                 rule = PlatformIRRule(p)
                 rule.add_pre_constraint("AlignVector", IRBool.makeFalse())
@@ -694,8 +696,8 @@ class Generator:
 def main():
     g = Generator()
     g.generate("TestDependencyOffsets",
-               "/home/emanuel/Documents/fork7-jdk/open/test/hotspot/jtreg/compiler/loopopts/superword",
-               "8298935 8308606", # Big ID
+               "test/hotspot/jtreg/compiler/loopopts/superword",
+               "8298935 8308606 8312570", # Bug ID
                "compiler.loopopts.superword", # package
               )

We tested this on a variety of AArch64 CPUs.
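
The platform entries in the diff appear to be vector widths in bytes: 16 for NEON (`asimd`) and 256 as the architectural SVE maximum. The following is a minimal Java sketch of that arithmetic, under the simplifying assumption that a loop which reads `a[i]` and writes `a[i + offset]` can use full vectors only if the distance covers a whole vector. The class and method names are hypothetical and are not part of the generator script or of HotSpot.

```java
// Hypothetical model of the width arithmetic; not code from the generator script.
final class VectorWidthModel {
    // Widths in bytes, matching the Platform(...) entries in the diff.
    static final int NEON_BYTES = 16;      // asimd: 128-bit vectors
    static final int SVE_MAX_BYTES = 256;  // sve: at most 2048-bit vectors

    // Mirrors `elements = p.vector_width // test.t.size` from the script.
    static int elementsPerVector(int vectorWidthBytes, int elementSizeBytes) {
        return vectorWidthBytes / elementSizeBytes;
    }

    // Simplifying assumption: the read/write distance must span at least
    // one full vector for full-width vectorization to be possible.
    static boolean fitsFullVector(int offsetElements, int elementSizeBytes, int vectorWidthBytes) {
        return offsetElements * elementSizeBytes >= vectorWidthBytes;
    }

    public static void main(String[] args) {
        int intSize = Integer.BYTES;
        // Number of ints in one NEON vector.
        System.out.println(elementsPerVector(NEON_BYTES, intSize));
        // A 256-bit distance checked against a 256-bit vector.
        System.out.println(fitsFullVector(8, intSize, 32));
        // The same distance checked against a 512-bit SVE vector.
        System.out.println(fitsFullVector(8, intSize, 64));
    }
}
```

Under this model, an offset that is large enough for a 256-bit vector is too small on a 512-bit SVE machine, which matches the failure described above.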
------------- Commit messages: - 8312570: [TESTBUG] Jtreg compiler/loopopts/superword/TestDependencyOffsets.java fails on 512-bit SVE Changes: https://git.openjdk.org/jdk/pull/15010/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=15010&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8312570 Stats: 2062 lines in 1 file changed: 1422 ins; 0 del; 640 mod Patch: https://git.openjdk.org/jdk/pull/15010.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15010/head:pull/15010 PR: https://git.openjdk.org/jdk/pull/15010 From pli at openjdk.org Tue Jul 25 07:53:42 2023 From: pli at openjdk.org (Pengfei Li) Date: Tue, 25 Jul 2023 07:53:42 GMT Subject: RFR: 8312570: [TESTBUG] Jtreg compiler/loopopts/superword/TestDependencyOffsets.java fails on 512-bit SVE In-Reply-To: References: Message-ID: On Tue, 25 Jul 2023 07:42:59 GMT, Pengfei Li wrote: > Hotspot jtreg `compiler/loopopts/superword/TestDependencyOffsets.java` fails on AArch64 CPUs with 512-bit SVE. The reason is that many test loops in the code cannot be vectorized due to data dependence but IR tests assume they can. > > On AArch64, these IR tests just check the CPU feature of `asimd` and incorrectly assumes AArch64 vectors are at most 256 bits. But actually, `asimd` on AArch64 only represents NEON vectors which are at most 128 bits. AArch64 CPUs may have another feature of `sve` which represents scalable vectors of at most 2048 bits. The vectorization won't succeed on 512-bit SVE CPUs if the memory offset between some read and write is less than 512 bits. > > As this jtreg is auto-generated by a python script, we have updated the script and re-generated this jtreg. In this new version, we checked the auto-vectorization on both NEON-only and NEON+SVE platforms. Below is the diff of the generator script. We have also attached the new script to the JBS page. 
> > > @@ -321,7 +321,8 @@ class Type: > p.append(Platform("avx512", ["avx512", "true"], 64)) > else: > assert False, "type not implemented" + self.name > - p.append(Platform("asimd", ["asimd", "true"], 32)) > + p.append(Platform("asimd", ["asimd", "true", "sve", "false"], 16)) > + p.append(Platform("sve", ["sve", "true"], 256)) > return p > > class Test: > @@ -457,7 +458,7 @@ class Generator: > lines.append(" * and various MaxVectorSize values, and +- AlignVector.") > lines.append(" *") > lines.append(" * Note: this test is auto-generated. Please modify / generate with script:") > - lines.append(" * https://bugs.openjdk.org/browse/JDK-8308606") > + lines.append(" * https://bugs.openjdk.org/browse/JDK-8312570") > lines.append(" *") > lines.append(" * Types: " + ", ".join([t.name for t in self.types])) > lines.append(" * Offsets: " + ", ".join([str(o) for o in self.offsets])) > @@ -598,7 +599,8 @@ class Generator: > # IR rules > for p in test.t.platforms(): > elements = p.vector_width // test.t.size > - lines.append(f" // CPU: {p.name} -> vector_width: {p.vector_width} -> elements in vector: {elements}") > + max_pre = "max " if p.name == "sve" else "" > + lines.append(f" // CPU: {p.name} -> {max_pre}vector_width: {p.vector_width} -> {max_pre}elements in vector: {elements}") > ############### -Align... @eme64 Please help look at this. And, how about adding the test generator script you wrote into the jdk repo? 
-------------

PR Comment: https://git.openjdk.org/jdk/pull/15010#issuecomment-1649310477

From stuefe at openjdk.org Tue Jul 25 07:57:42 2023
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Tue, 25 Jul 2023 07:57:42 GMT
Subject: RFR: JDK-8304954: SegmentedCodeCache fails when using large pages [v4]
In-Reply-To:
References:
Message-ID: <4GUGHqeA1ReJShydJkwCKdg4efE8b-ag-upP1UCK0EY=.46c95995-03ac-4993-a6fd-8c149d04dab9@github.com>

On Tue, 25 Jul 2023 07:39:02 GMT, Damon Fenacci wrote:

>> src/hotspot/share/code/codeCache.cpp line 322:
>>
>>> 320: byte_size_in_exact_unit(LargePageSizeInBytes), exact_unit_for_byte_size(LargePageSizeInBytes),
>>> 321: byte_size_in_exact_unit(ps), exact_unit_for_byte_size(ps));
>>> 322: log_warning(codecache)("%s", msg);
>>
>> Also, no need to print into a temp buffer. log_xxx() accepts var-args, so you can pass the format string + args directly.
>
> I've used the buffer since the message is used in 2 different places:
> https://github.com/openjdk/jdk/blob/e1b09a9fe267d6aa48f4656652411000a4f4d2ee/src/hotspot/share/code/codeCache.cpp#L317-L323
> (not 100% sure why, but I've noticed that it is how warnings are handled in other places in the code)

Ah, I missed the second warning. You don't need both. `log_warning` should be enough and mirrors what we usually do in other places. I assume "warning" is a relic from a time before UL.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/14903#discussion_r1273146300

From pli at openjdk.org Tue Jul 25 08:40:59 2023
From: pli at openjdk.org (Pengfei Li)
Date: Tue, 25 Jul 2023 08:40:59 GMT
Subject: RFR: 8309697: [TESTBUG] Remove "@requires vm.flagless" from jtreg vectorization tests
Message-ID:

This patch removes `@requires vm.flagless` annotations from HotSpot jtreg tests in `compiler/vectorization/runner`. All jtreg cases in this folder are invoked by the test driver `VectorizationTestRunner.java`, which checks both correctness and vectorizability (IR) for each test method. We added the flagless requirement earlier because extra flags may interfere with compiler control in the test driver's correctness check. But `flagless` has the side effect that tests run with extra flags are skipped. So we propose to get rid of it now.

To adapt to the removal of `@requires vm.flagless`, a few checks are added in the test driver to skip the correctness check if extra flags prevent the compiler control from working. This patch also moves the previously hard-coded flag `-XX:-OptimizeFill` from the test driver to conditions in IR checks.

Tested a variety of compiler-control-related VM flags on x86 and AArch64.

-------------

Commit messages:
 - 8309697: [TESTBUG] Remove "@requires vm.flagless" from jtreg vectorization tests

Changes: https://git.openjdk.org/jdk/pull/15011/files
Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=15011&range=00
Issue: https://bugs.openjdk.org/browse/JDK-8309697
Stats: 81 lines in 23 files changed: 44 ins; 5 del; 32 mod
Patch: https://git.openjdk.org/jdk/pull/15011.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/15011/head:pull/15011

PR: https://git.openjdk.org/jdk/pull/15011

From pli at openjdk.org Tue Jul 25 08:51:42 2023
From: pli at openjdk.org (Pengfei Li)
Date: Tue, 25 Jul 2023 08:51:42 GMT
Subject: RFR: 8309697: [TESTBUG] Remove "@requires vm.flagless" from jtreg vectorization tests
In-Reply-To:
References:
Message-ID:

On Tue, 25 Jul 2023 08:35:11 GMT, Pengfei Li wrote:

> This patch removes `@requires vm.flagless` annotations from HotSpot jtreg tests in `compiler/vectorization/runner`. All jtreg cases in this folder are invoked by test driver `VectorizationTestRunner.java` which checks both correctness and vectorizability (IR) for each test method. We added flagless requirement before because extra flags may mess with compiler control in the test driver for correctness check. But `flagless` has a side effect that it makes tests with extra flags skipped. So we propose to get rid of it now.
> > To adapt the removal of `@require vm.flagless`, a few checks are added in the test driver to skip the correctness check if extra flags make the compiler control not work. This patch also moves previously hard-coded flag `-XX:-OptimizeFill` in the test driver to conditions in IR checks. > > Tested various of compiler control related VM flags on x86 and AArch64. Hi @eme64 @vnkozlov, This removes flagless from jtreg vectorization tests as you previously suggested. Could you help take a look? ------------- PR Comment: https://git.openjdk.org/jdk/pull/15011#issuecomment-1649400351 From pli at openjdk.org Tue Jul 25 09:00:07 2023 From: pli at openjdk.org (Pengfei Li) Date: Tue, 25 Jul 2023 09:00:07 GMT Subject: RFR: 8312332: C2: Refactor SWPointer out from SuperWord Message-ID: As discussed in JDK-8308994, we should first do some refactoring work before proceeding with the new post loop vectorization. In this patch, we have done the following. 1) We have created new C2 source files `vectorization.[cpp|hpp]` for shared logics and utilities for C2's auto-vectorization. So far we have moved class `SWPointer` and `VectorElementSizeStats` here from `superword.[cpp|hpp]`. 2) We have decoupled `SWPointer` from class `SuperWord` and renamed it to `VPointer` as it will be used by vectorizers other than SuperWord. The original class `SWPointer` and its inner class `Tracer` both have a `_slp` field initialized in their constructors. In this patch, we have replaced them by other fields and re-written the constructors for the same functionality. Original `SWPointer::invariant()` calls function `SuperWord::find_pre_loop_end()` for loop invariant checks. To help decoupling, we moved function `find_pre_loop_end()` to class `CountedLoopNode`. As function `SWPointer::Tracer::invariant_1()` is tightly coupled with `SuperWord` but only prints some debug messages, we temporarily removed it in this patch. 
We will consider adding it back after later refactoring of `SuperWord` so we added a `TODO` at its call site in this patch. 3) We have a lot of memory phi node checks in loop optimizations. So we added a utility function `is_memory_phi()` in `node.hpp`. Tested tier1~3 on x86 and AArch64. Also manually verified that option `VectorizeDebug` in compiler directives still works well. ------------- Commit messages: - 8312332: C2: Refactor SWPointer out from SuperWord Changes: https://git.openjdk.org/jdk/pull/15013/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=15013&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8312332 Stats: 1923 lines in 7 files changed: 966 ins; 909 del; 48 mod Patch: https://git.openjdk.org/jdk/pull/15013.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15013/head:pull/15013 PR: https://git.openjdk.org/jdk/pull/15013 From dfenacci at openjdk.org Tue Jul 25 09:12:44 2023 From: dfenacci at openjdk.org (Damon Fenacci) Date: Tue, 25 Jul 2023 09:12:44 GMT Subject: RFR: JDK-8304954: SegmentedCodeCache fails when using large pages [v4] In-Reply-To: References: Message-ID: On Mon, 24 Jul 2023 12:50:12 GMT, Thomas Stuefe wrote: >> Damon Fenacci has updated the pull request incrementally with one additional commit since the last revision: >> >> JDK-8304954: merge ifs checking when to print warning > > test/hotspot/jtreg/compiler/codecache/CheckLargePages.java line 61: > >> 59: } else { >> 60: System.out.println("1GB large pages not supported: UseLargePages=" + largePages + >> 61: (largePages ? ", largePageSize=" + largePageSize : "") + ". Skipping"); > > It would be nice to have an actual test for the downgrade. Eg if system supports 1g and 2m and 4k, check that we downgrade to 2m. It would be nice indeed. I was looking at a way to get all page sizes available from inside the test: do you know if that is possible? 
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/14903#discussion_r1273238212

From stuefe at openjdk.org Tue Jul 25 09:27:43 2023
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Tue, 25 Jul 2023 09:27:43 GMT
Subject: RFR: JDK-8304954: SegmentedCodeCache fails when using large pages [v4]
In-Reply-To:
References:
Message-ID:

On Tue, 25 Jul 2023 09:09:56 GMT, Damon Fenacci wrote:

>> test/hotspot/jtreg/compiler/codecache/CheckLargePages.java line 61:
>>
>>> 59: } else {
>>> 60: System.out.println("1GB large pages not supported: UseLargePages=" + largePages +
>>> 61: (largePages ? ", largePageSize=" + largePageSize : "") + ". Skipping");
>>
>> It would be nice to have an actual test for the downgrade. Eg if system supports 1g and 2m and 4k, check that we downgrade to 2m.
>
> It would be nice indeed. I was looking at a way to get all page sizes available from inside the test: do you know if that is possible?

The best solution would be something like runtime/os/HugePageConfiguration, which has functions to read hugepage config directly from the OS. Unfortunately, that is package local. It may be valid to just move your test into that directory (arguably, it has to do with both code cache and huge pages, so it could be placed there too) and use it.

Another, simpler, solution would be to scan the huge page configuration information we print at startup. -Xlog:pagesize tells you the page sizes we scan for static huge pages, and then also the page sizes we decide on using (subtly different things). That output you could scan in this test.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/14903#discussion_r1273256215

From dnsimon at openjdk.org Tue Jul 25 09:33:41 2023
From: dnsimon at openjdk.org (Doug Simon)
Date: Tue, 25 Jul 2023 09:33:41 GMT
Subject: RFR: 8312579: [JVMCI] JVMCI support for virtual Vector API objects
In-Reply-To:
References:
Message-ID:

On Mon, 24 Jul 2023 18:58:02 GMT, Gergö Barany wrote:

> Optimized Vector API values are represented as raw values in SIMD registers. When deoptimizing with such a value in the state, a Java heap object must be recreated. HotSpot has a special "Location::vector" location type to mark Vector API values, and it knows how to materialize such values.
>
> Extend the JVMCI code installer to mark the appropriate values as vectors so that JVMCI-compiled code can also deoptimize with Vector API values in SIMD registers.

Marked as reviewed by dnsimon (Reviewer).

src/jdk.internal.vm.ci/share/classes/jdk/vm/ci/hotspot/HotSpotCompiledCodeStream.java line 1036:

> 1034: }
> 1035:
> 1036: private boolean isVector(Value vectorValue) {

`vectorValue` -> `value` (otherwise the name seems to presume a true return value)

-------------

PR Review: https://git.openjdk.org/jdk/pull/15003#pullrequestreview-1545034206
PR Review Comment: https://git.openjdk.org/jdk/pull/15003#discussion_r1273262605

From chagedorn at openjdk.org Tue Jul 25 09:34:43 2023
From: chagedorn at openjdk.org (Christian Hagedorn)
Date: Tue, 25 Jul 2023 09:34:43 GMT
Subject: RFR: 8312440: assert(cast != nullptr) failed: must have added a cast to pin the node
In-Reply-To:
References:
Message-ID:

On Mon, 24 Jul 2023 14:26:48 GMT, Roland Westrelin wrote:

> The test has a loop nest with 2 loops. A node is sunk out of the inner
> loop. A cast is created to pin the node out of loop. The logic added
> by 8308103 looks for an existing cast with the same inputs. It finds
> one and uses that one. That existing cast was created before the
> current round of loop opts. Even though it has the right out of inner
> loop (but in outer loop) control input, it was assigned a control out
> of both loops (the control is legal, it just happens that no use keeps
> the cast in the outer loop). Next, the same node is sunk out of the
> outer loop. The logic for sinking nodes looks for an input in the
> outer loop (otherwise why would the node be in the outer loop) and
> find none even though there should be one. The assert fires. The
> reason is that the cast input has control out of loop. The fix I
> propose is to not use an existing cast if it doesn't have the expected
> control.

Looks good to me.

-------------

Marked as reviewed by chagedorn (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/14999#pullrequestreview-1545036309

From dfenacci at openjdk.org Tue Jul 25 09:36:40 2023
From: dfenacci at openjdk.org (Damon Fenacci)
Date: Tue, 25 Jul 2023 09:36:40 GMT
Subject: RFR: JDK-8304954: SegmentedCodeCache fails when using large pages [v4]
In-Reply-To:
References:
Message-ID:

On Tue, 25 Jul 2023 09:24:53 GMT, Thomas Stuefe wrote:

>> It would be nice indeed. I was looking at a way to get all page sizes available from inside the test: do you know if that is possible?
>
> The best solution would be something like runtime/os/HugePageConfiguration, which has functions to read hugepage config directly from the OS. Unfortunately, that is package local. It may be valid to just move your test into that directory (arguably, it has to do with both code cache and huge pages, so it could be placed there too) and use it.
>
> Another, simpler, solution would be to scan the huge page configuration information we print at startup. -Xlog:pagesize tells you the page sizes we scan for static huge pages, and then also the page sizes we decide on using (subtly different things). That output you could scan in this test.

Cool, thanks! I think I'll go for the second option.
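
The second option (scanning the page-size information printed at startup) could be sketched roughly as below. This is only an illustration: the exact text that `-Xlog:pagesize` prints is not shown in this thread, so the sketch assumes that page sizes appear as tokens such as `4k`, `2M` or `1G`, and the sample line in `main` is made up. The class and method names are hypothetical.

```java
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; the log format it expects is an assumption.
final class PageSizeLogScanner {
    // A number directly followed by a K/M/G unit, e.g. "4k", "2M", "1G".
    private static final Pattern SIZE = Pattern.compile("\\b(\\d+)([kKmMgG])\\b");

    // Collects every size token found in the text, converted to bytes, in ascending order.
    static TreeSet<Long> pageSizes(String logOutput) {
        TreeSet<Long> sizes = new TreeSet<>();
        Matcher m = SIZE.matcher(logOutput);
        while (m.find()) {
            long value = Long.parseLong(m.group(1));
            switch (Character.toUpperCase(m.group(2).charAt(0))) {
                case 'K': sizes.add(value << 10); break;
                case 'M': sizes.add(value << 20); break;
                default:  sizes.add(value << 30); break; // 'G'
            }
        }
        return sizes;
    }

    public static void main(String[] args) {
        // Made-up sample line, not real JVM output.
        String sample = "[info][pagesize] hugepage size: 2M, hugepage size: 1G";
        System.out.println(pageSizes(sample));
    }
}
```

A test could feed the child JVM's captured output into such a helper and then decide which downgrade to expect, for example from 1G to 2M when both sizes are present.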
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/14903#discussion_r1273268405

From rrich at openjdk.org Tue Jul 25 10:05:40 2023
From: rrich at openjdk.org (Richard Reingruber)
Date: Tue, 25 Jul 2023 10:05:40 GMT
Subject: RFR: 8312495: assert(0 <= i && i < _len) failed: illegal index after JDK-8287061 on big endian platforms [v2]
In-Reply-To:
References: <51ZON6Vdj1yrENrGYzmMEndXIRvJ1d7Nk72a4ikx3rw=.ee657823-f69f-42ce-886a-25a7f6d71840@github.com> <24Hws5aSMzWWUvGj_p6KqOgL_gFzo8JRPfHAq-YXkQg=.e58f202e-6b90-4956-a2a3-2473bff63b22@github.com>
Message-ID:

On Mon, 24 Jul 2023 21:57:12 GMT, Vladimir Kozlov wrote:

> > > > * differences in the handling of `Location::dbl` and `lng` in `reassign_fields_by_klass` and `reassign_type_array_elements`
> > >
> > >
> > > What difference you talking about?
>
> So you are asking about difference for `case T_INT: case T_FLOAT:` and not `case T_LONG: case T_DOUBLE:`. Let leave it as it is for now and file RFE to investigate (to factor out and use the same code).

Ok. I've created https://bugs.openjdk.org/browse/JDK-8312753

-------------

PR Comment: https://git.openjdk.org/jdk/pull/14976#issuecomment-1649518321

From rrich at openjdk.org Tue Jul 25 10:15:59 2023
From: rrich at openjdk.org (Richard Reingruber)
Date: Tue, 25 Jul 2023 10:15:59 GMT
Subject: RFR: 8312495: assert(0 <= i && i < _len) failed: illegal index after JDK-8287061 on big endian platforms [v3]
In-Reply-To: <51ZON6Vdj1yrENrGYzmMEndXIRvJ1d7Nk72a4ikx3rw=.ee657823-f69f-42ce-886a-25a7f6d71840@github.com>
References: <51ZON6Vdj1yrENrGYzmMEndXIRvJ1d7Nk72a4ikx3rw=.ee657823-f69f-42ce-886a-25a7f6d71840@github.com>
Message-ID:

> On big endian platforms `jint` values are stored in the high part of `StackValue` values. Therefore the `StackValue` cannot be cast directly to `jint`. More details why this has to be like this are given in the JBS issue.
>
> This is a common pattern. See also
>
> https://github.com/openjdk/jdk/blob/8d29329138d44800ee4c0c02dacc01a06097de66/src/hotspot/share/runtime/deoptimization.cpp#L1386-L1387
> https://github.com/openjdk/jdk/blob/8d29329138d44800ee4c0c02dacc01a06097de66/src/hotspot/share/runtime/deoptimization.cpp#L1513-L1514
>
> ### Testing
> Many iterations of vmTestbase/vm/mlvm/meth/stress/compiler/sequences/Test.java.
>
> JTReg tests: tier1-4 of hotspot and jdk. All of Langtools and jaxp. Renaissance benchmarks as functional tests.
>
> All testing was done with fastdebug and release builds on the main platforms and also on Linux/PPC64le.

Richard Reingruber has updated the pull request incrementally with one additional commit since the last revision:

  Refactor byte_array_put

-------------

Changes:
 - all: https://git.openjdk.org/jdk/pull/14976/files
 - new: https://git.openjdk.org/jdk/pull/14976/files/9be56893..b055148a

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=14976&range=02
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14976&range=01-02

Stats: 8 lines in 1 file changed: 0 ins; 2 del; 6 mod
Patch: https://git.openjdk.org/jdk/pull/14976.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/14976/head:pull/14976

PR: https://git.openjdk.org/jdk/pull/14976

From gbarany at openjdk.org Tue Jul 25 10:20:55 2023
From: gbarany at openjdk.org (Gergö Barany)
Date: Tue, 25 Jul 2023 10:20:55 GMT
Subject: RFR: 8312579: [JVMCI] JVMCI support for virtual Vector API objects [v2]
In-Reply-To:
References:
Message-ID:

On Tue, 25 Jul 2023 09:29:57 GMT, Doug Simon wrote:

>> Gergö Barany has updated the pull request incrementally with two additional commits since the last revision:
>>
>> - Improve parameter naming
>> - Rewrite nested ternary expressions
>
> src/jdk.internal.vm.ci/share/classes/jdk/vm/ci/hotspot/HotSpotCompiledCodeStream.java line 1036:
>
>> 1034: }
>> 1035:
>> 1036: private boolean isVector(Value vectorValue) {
>
> `vectorValue` -> `value` (otherwise the name seems to presume a true return value)

done

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/15003#discussion_r1273317321

From gbarany at openjdk.org Tue Jul 25 10:20:54 2023
From: gbarany at openjdk.org (Gergö Barany)
Date: Tue, 25 Jul 2023 10:20:54 GMT
Subject: RFR: 8312579: [JVMCI] JVMCI support for virtual Vector API objects [v2]
In-Reply-To:
References:
Message-ID:

> Optimized Vector API values are represented as raw values in SIMD registers. When deoptimizing with such a value in the state, a Java heap object must be recreated. HotSpot has a special "Location::vector" location type to mark Vector API values, and it knows how to materialize such values.
>
> Extend the JVMCI code installer to mark the appropriate values as vectors so that JVMCI-compiled code can also deoptimize with Vector API values in SIMD registers.

Gergö Barany has updated the pull request incrementally with two additional commits since the last revision:

 - Improve parameter naming
 - Rewrite nested ternary expressions

-------------

Changes:
 - all: https://git.openjdk.org/jdk/pull/15003/files
 - new: https://git.openjdk.org/jdk/pull/15003/files/5f183c27..87f81107

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=15003&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15003&range=00-01

Stats: 22 lines in 1 file changed: 18 ins; 0 del; 4 mod
Patch: https://git.openjdk.org/jdk/pull/15003.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/15003/head:pull/15003

PR: https://git.openjdk.org/jdk/pull/15003

From chagedorn at openjdk.org Tue Jul 25 10:26:44 2023
From: chagedorn at openjdk.org (Christian Hagedorn)
Date: Tue, 25 Jul 2023 10:26:44 GMT
Subject: RFR: 8312596: Potential null pointer access in Compile::TracePhase::~TracePhase after JDK-8311976
In-Reply-To:
References:
Message-ID:

On Mon, 24 Jul 2023 18:16:19 GMT, Ashutosh Mehra wrote:

> Please review this PR to fix a potential null pointer access in using `_compile`.
> Updated the code to unconditionally initialize `_compile` and added an assert (similar to C1's `PhaseTraceTime` constructor) for it to be non-null.

I suggest removing the word "potential" from the bug title because we can trigger the null pointer access with

java -XX:+PrintIdealNodeCount --version

Maybe you also want to add a Hello World-like sanity test with `-XX:+PrintIdealNodeCount`. Otherwise, the fix looks good.

src/hotspot/share/opto/compile.cpp line 4340:

> 4338: _dolog(CITimeVerbose)
> 4339: {
> 4340: assert(_compile != nullptr, "sanity check");

I don't think this is ever null but I guess it does not hurt to keep it in.
------------- PR Review: https://git.openjdk.org/jdk/pull/15002#pullrequestreview-1545118482 PR Review Comment: https://git.openjdk.org/jdk/pull/15002#discussion_r1273319688 From aph at openjdk.org Tue Jul 25 11:20:45 2023 From: aph at openjdk.org (Andrew Haley) Date: Tue, 25 Jul 2023 11:20:45 GMT Subject: RFR: 8312502: Mass migrate HotSpot attributes to the correct location [v3] In-Reply-To: References: Message-ID: On Tue, 25 Jul 2023 02:28:02 GMT, Kim Barrett wrote: > Could I split this up and discard some of the areas in which the changes are less agreeable? (as advised by someone who reached out to me privately) I'm a little surprised that you're persisting with this. Kim said: > https://openjdk.org/guide/#things-to-consider-before-proposing-changes-to-openjdk-code > This change looks like a case of pure "Modernizing", which isn't looked on > particularly favorably for its own sake. There generally needs to be some > additional benefits. ------------- PR Comment: https://git.openjdk.org/jdk/pull/14969#issuecomment-1649638229 From dfenacci at openjdk.org Tue Jul 25 12:36:11 2023 From: dfenacci at openjdk.org (Damon Fenacci) Date: Tue, 25 Jul 2023 12:36:11 GMT Subject: RFR: JDK-8304954: SegmentedCodeCache fails when using large pages [v5] In-Reply-To: References: Message-ID: > # Issue > > When large pages are enabled and segmented code cache is used, the VM tries to use one page for each segment. If the amount of reserved code cache is limited, this can make the total size of the code cache bigger than the reserved size, which in turn makes the VM fail, claiming that there is not enough space (e.g. this happens when running `java -XX:+UseLargePages -XX:+SegmentedCodeCache -XX:InitialCodeCacheSize=2g -XX:ReservedCodeCacheSize=2g -XX:LargePageSizeInBytes=1g -Xlog:pagesize*=debug -version`). > This behaviour is not correct as the VM should fall back and try with a smaller page size (and print a warning). 
> > # Solution > > When reserving heap space for code cache we give a minimum of 8 pages. Since the page size is already calculated right before for segment sizes, it is saved and passed as an argument instead. > > https://github.com/openjdk/jdk/blob/67fbd87378a9b3861f1676977f9f2b36052add29/src/hotspot/share/code/codeCache.cpp#L315 > > Additionally a warning is printed if large pages are enabled and we end up using a smaller page sizes for code caching. > > # Test > > The regression test runs a new VM with `-XX:+UseLargePages -XX:LargePageSizeInBytes=1g`. The `main` method then checks if the two flags have been "taken". If so, another process is started that checks for a specific output, otherwise the test passes (i.e. the current system doesn't allow 1GB large pages) Damon Fenacci has updated the pull request incrementally with two additional commits since the last revision: - JDK-8304954: fix warning message - JDK-8304954: various fixes ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14903/files - new: https://git.openjdk.org/jdk/pull/14903/files/e1b09a9f..f0171bba Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14903&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14903&range=03-04 Stats: 13 lines in 2 files changed: 1 ins; 3 del; 9 mod Patch: https://git.openjdk.org/jdk/pull/14903.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14903/head:pull/14903 PR: https://git.openjdk.org/jdk/pull/14903 From dfenacci at openjdk.org Tue Jul 25 12:36:12 2023 From: dfenacci at openjdk.org (Damon Fenacci) Date: Tue, 25 Jul 2023 12:36:12 GMT Subject: RFR: JDK-8304954: SegmentedCodeCache fails when using large pages [v4] In-Reply-To: References: Message-ID: On Mon, 24 Jul 2023 12:51:35 GMT, Thomas Stuefe wrote: >> Damon Fenacci has updated the pull request incrementally with one additional commit since the last revision: >> >> JDK-8304954: merge ifs checking when to print warning > > src/hotspot/share/code/codeCache.cpp line 321: > 
>> 319: "Reverting to smaller page size (" SIZE_FORMAT "%s).", >> 320: byte_size_in_exact_unit(LargePageSizeInBytes), exact_unit_for_byte_size(LargePageSizeInBytes), >> 321: byte_size_in_exact_unit(ps), exact_unit_for_byte_size(ps)); > > You could use EXACTFMT and EXACTFMTARGS to shorten this, see globalDefinitions.hpp Changed to make use of PROPERFMT and PROPERFMTARGS. > test/hotspot/jtreg/compiler/codecache/CheckLargePages.java line 27: > >> 25: * @test >> 26: * @bug 8304954 >> 27: * @summary Test checks that if using large pages and code cache gets above the limit it tries to revert to smaller pages instead of failing > > Proposal: "Code cache reservation should gracefully downgrade to using smaller pages if the code cache size is too small to host the requested page size." Changed the summary. Thanks! > test/hotspot/jtreg/compiler/codecache/CheckLargePages.java line 54: > >> 52: "-XX:ReservedCodeCacheSize=2g", >> 53: "-XX:LargePageSizeInBytes=1g", >> 54: "-Xlog:pagesize*=debug", > > If all you scan for is the "Failed to reserve" (please change the text :-), then you should *not* specify Xlog, since its a warning, and we expect to see this warning unconditionally (UL: everything above "log_info" is printed unconditionally if no Log options are present). `-Xlog:pagesize*=debug` removed. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14903#discussion_r1273459776 PR Review Comment: https://git.openjdk.org/jdk/pull/14903#discussion_r1273459553 PR Review Comment: https://git.openjdk.org/jdk/pull/14903#discussion_r1273459132 From dfenacci at openjdk.org Tue Jul 25 12:36:12 2023 From: dfenacci at openjdk.org (Damon Fenacci) Date: Tue, 25 Jul 2023 12:36:12 GMT Subject: RFR: JDK-8304954: SegmentedCodeCache fails when using large pages [v4] In-Reply-To: <4GUGHqeA1ReJShydJkwCKdg4efE8b-ag-upP1UCK0EY=.46c95995-03ac-4993-a6fd-8c149d04dab9@github.com> References: <4GUGHqeA1ReJShydJkwCKdg4efE8b-ag-upP1UCK0EY=.46c95995-03ac-4993-a6fd-8c149d04dab9@github.com> Message-ID: <5HSHzPQjCTE0eOL7VWc065rWtN1f407oIimWhLBho-w=.71345a63-9bcc-4651-ae9b-e0e0547e0d46@github.com> On Tue, 25 Jul 2023 07:54:41 GMT, Thomas Stuefe wrote: >> I've used the buffer since the message is used in 2 different places: >> https://github.com/openjdk/jdk/blob/e1b09a9fe267d6aa48f4656652411000a4f4d2ee/src/hotspot/share/code/codeCache.cpp#L317-L323 >> (not 100% sure why, but I've noticed that it is how warnings are handled in other places in the code) > > Ah. missed the second warning. You don't need both. log_warning should be enough and mirrors what we usually do in other places. I assume "warning" is a relict from a time before UL. Ok. I've changed it to use only `log_warning` with varargs. Thank you! >> Right, thanks! >> >> There is just this point that it is not completely clear to me: >>> * code cache could, in theory, be satisfied with large pages, but its size is not aligned with large page size. E.g. 2MB pages and CodeCacheSize=101m, would result in code cache using 4KB pages. >> >> Do you mean that with such a configuration (2MB pages and CodeCacheSize=101m) would the code cache use 4KB pages anyway later on? Wouldn't `CodeCache::page_size` return 2MB pages (and possibly align the code cache to 102MB)? (I guess I'm missing something here) > > I may be wrong too. 
Looking closer, I think it aligns up. So, I think you can forget the first of the two points. Sorry for the confusion. I've changed "reserve" into the more generic "use" in the message. To check for the actual large page size I've used the same `page_size` method giving 1 as minimum number of pages (`default_large_page_size` seems to be specific for linux) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14903#discussion_r1273460205 PR Review Comment: https://git.openjdk.org/jdk/pull/14903#discussion_r1273458530 From dfenacci at openjdk.org Tue Jul 25 12:36:12 2023 From: dfenacci at openjdk.org (Damon Fenacci) Date: Tue, 25 Jul 2023 12:36:12 GMT Subject: RFR: JDK-8304954: SegmentedCodeCache fails when using large pages [v4] In-Reply-To: <8VdpC_umflzJpGbJjz5zr4Hy_ArYVMPoSG8osiKeM4E=.1fdc1258-8521-4beb-84ed-fd495eb9a2a9@github.com> References: <8VdpC_umflzJpGbJjz5zr4Hy_ArYVMPoSG8osiKeM4E=.1fdc1258-8521-4beb-84ed-fd495eb9a2a9@github.com> Message-ID: On Mon, 24 Jul 2023 14:20:52 GMT, Thomas Stuefe wrote: >>> Why not Z? >> >> It seems that ZGC requires large pages of specific given size (2MB. The test fails with *Incompatible -XX:LargePageSizeInBytes, only 2M large pages are supported by ZGC* otherwise). > > Oh right. But this issue is not strictly speaking limited to 1G. It could happen on 2M, too, if you were to allocate a code cache size < 8*2, e.g. 15M. If one gets the JVM to run with that little code cache. > > Up to you, though. I'm fine with leaving Zgc out too. I've limited the test to linux and for the moment I left ZGC out. 
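The page-size fallback discussed in this thread can be sketched roughly as follows. This is an illustration only and not HotSpot's C++ code: the class and method names (`PageSizeChooser`, `choosePageSize`, `alignUp`) are hypothetical, the set of available page sizes (4K, 2M, 1G) is assumed, and the minimum of 8 pages is inferred from the "8*2" remark above.

```java
// Hypothetical sketch; HotSpot's real logic is written in C++ and differs in detail.
final class PageSizeChooser {
    static final long K = 1024, M = K * K, G = K * M;

    // Returns the largest page size that still lets the region span at least
    // minPages pages. pageSizes must be sorted ascending; the smallest entry
    // is the fallback when no larger page size fits.
    static long choosePageSize(long regionSize, long minPages, long[] pageSizes) {
        for (int i = pageSizes.length - 1; i > 0; i--) {
            if (pageSizes[i] * minPages <= regionSize) {
                return pageSizes[i];
            }
        }
        return pageSizes[0];
    }

    // Rounds size up to the next multiple of alignment (alignment > 0).
    static long alignUp(long size, long alignment) {
        return ((size + alignment - 1) / alignment) * alignment;
    }

    public static void main(String[] args) {
        long[] sizes = {4 * K, 2 * M, G};
        // 2g code cache, 1g pages requested: 8 pages of 1g do not fit, so fall back to 2m.
        System.out.println(choosePageSize(2 * G, 8, sizes) / M + "m");
        // 15m code cache: 8 pages of 2m do not fit either, so fall back to 4k.
        System.out.println(choosePageSize(15 * M, 8, sizes) / K + "k");
        // 101m aligned up to a 2m boundary.
        System.out.println(alignUp(101 * M, 2 * M) / M + "m");
    }
}
```

Under these assumptions, a 2g code cache with 1g pages requested falls back to 2m pages, and a 15m code cache falls back to the smallest page size, which matches the two cases mentioned in the review.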
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14903#discussion_r1273458673 From thartmann at openjdk.org Tue Jul 25 12:57:12 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 25 Jul 2023 12:57:12 GMT Subject: RFR: 8312909: C1 should not inline through interface calls with non-subtype receiver Message-ID: <9srO-AzEErczEhwmXkbjL_21BwV8uF_xoEG1t67x0OI=.37f04313-b6cd-437f-9c17-d8e1a654b155@github.com> This is a problem with C1 compiling an interface call with an invalid receiver (see `TestInvokeinterfaceWithBadReceiverHelper`): ``` ldc String "42"; invokeinterface InterfaceMethod MyInterface.get:"()Ljava/lang/String;", 1; ``` `String` does not implement `MyInterface` but Class Hierarchy Analysis determined that there is only one implementor of `MyInterface`: class MyClass implements MyInterface { @Stable String field = "42"; public String get() { return field; } } C1 emits a receiver subtype check (that will obviously fail at runtime and trigger an `IncompatibleClassChangeError`) and proceeds with inlining the `MyClass::get` method on the `String` receiver. It then tries to fold the stable field load by loading its value at compile time, which asserts/fails because the `String` receiver does not have such a field. The fix is to bail out from inlining when we can statically determine that the receiver subtype check will always fail at runtime.
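The bail-out condition described above can be illustrated with a small Java sketch. It is not the C1 change itself (that is C++ inside the compiler): `InlineGuard` and `subtypeCheckAlwaysFails` are hypothetical names, and the sketch assumes the receiver's type is known exactly at compile time, as it is for a `String` constant.

```java
// Hypothetical model of the decision; the real check operates on C1's compiler
// interface types, not on java.lang.Class.
interface MyInterface {
    String get();
}

final class MyClass implements MyInterface {
    String field = "42";

    public String get() {
        return field;
    }
}

final class InlineGuard {
    // Assumes exactReceiverType is the precise runtime type of the receiver.
    // If it is not a subtype of the inlining target, the emitted subtype check
    // can never pass, so inlining should be skipped.
    static boolean subtypeCheckAlwaysFails(Class<?> exactReceiverType, Class<?> target) {
        return !target.isAssignableFrom(exactReceiverType);
    }

    public static void main(String[] args) {
        // String receiver, MyClass is the unique implementor: do not inline.
        System.out.println(subtypeCheckAlwaysFails(String.class, MyClass.class));
        // MyClass receiver: the check can pass, inlining remains legal.
        System.out.println(subtypeCheckAlwaysFails(MyClass.class, MyClass.class));
    }
}
```

The exactness assumption matters: for a non-final, inexact receiver type, a subclass might still implement the interface, so failure could not be proven statically.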
Thanks, Tobias ------------- Commit messages: - Removed trailing whitespace - Updated bug number - C1 should not inline through interface calls with non-subtype receiver Changes: https://git.openjdk.org/jdk/pull/15018/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=15018&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8312909 Stats: 109 lines in 3 files changed: 105 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/15018.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15018/head:pull/15018 PR: https://git.openjdk.org/jdk/pull/15018 From dnsimon at openjdk.org Tue Jul 25 13:15:51 2023 From: dnsimon at openjdk.org (Doug Simon) Date: Tue, 25 Jul 2023 13:15:51 GMT Subject: RFR: 8312524: [JVMCI] serviceability/dcmd/compiler/CompilerQueueTest.java fails In-Reply-To: <5FDrZ2dpXh882r6ye9CJkICOq1VMFwZ4Y9gK17swnYE=.302db85d-fc8a-4ddd-96fd-def99ba745a4@github.com> References: <5FDrZ2dpXh882r6ye9CJkICOq1VMFwZ4Y9gK17swnYE=.302db85d-fc8a-4ddd-96fd-def99ba745a4@github.com> Message-ID: On Fri, 21 Jul 2023 20:23:31 GMT, Doug Simon wrote: > This PR adds logic to the CompileBroker for implementing `WhiteBox.lockCompilation()` when `UseJVMCICompiler` is true. Thanks for the reviews. ------------- PR Comment: https://git.openjdk.org/jdk/pull/14979#issuecomment-1649820050 From dnsimon at openjdk.org Tue Jul 25 13:15:52 2023 From: dnsimon at openjdk.org (Doug Simon) Date: Tue, 25 Jul 2023 13:15:52 GMT Subject: Integrated: 8312524: [JVMCI] serviceability/dcmd/compiler/CompilerQueueTest.java fails In-Reply-To: <5FDrZ2dpXh882r6ye9CJkICOq1VMFwZ4Y9gK17swnYE=.302db85d-fc8a-4ddd-96fd-def99ba745a4@github.com> References: <5FDrZ2dpXh882r6ye9CJkICOq1VMFwZ4Y9gK17swnYE=.302db85d-fc8a-4ddd-96fd-def99ba745a4@github.com> Message-ID: On Fri, 21 Jul 2023 20:23:31 GMT, Doug Simon wrote: > This PR adds logic to the CompileBroker for implementing `WhiteBox.lockCompilation()` when `UseJVMCICompiler` is true. This pull request has now been integrated. 
Changeset: 9606cbcd Author: Doug Simon URL: https://git.openjdk.org/jdk/commit/9606cbcd2314506d0054ecba1804e5e0c2670cd6 Stats: 18 lines in 1 file changed: 14 ins; 3 del; 1 mod 8312524: [JVMCI] serviceability/dcmd/compiler/CompilerQueueTest.java fails Reviewed-by: never, thartmann ------------- PR: https://git.openjdk.org/jdk/pull/14979 From qamai at openjdk.org Tue Jul 25 15:09:58 2023 From: qamai at openjdk.org (Quan Anh Mai) Date: Tue, 25 Jul 2023 15:09:58 GMT Subject: RFR: 8312547: Max/Min nodes Value implementation could be improved Message-ID: Hi, This patch removes the early return in `AddNode::Value` in case one of the inputs is a bottom, which may affect the value calculation of nodes such as `Min/MaxNode`. Please kindly review, thanks very much. ------------- Commit messages: - AddNode::Value should not return early - AddNode::Value should not return early Changes: https://git.openjdk.org/jdk/pull/15021/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=15021&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8312547 Stats: 39 lines in 2 files changed: 22 ins; 5 del; 12 mod Patch: https://git.openjdk.org/jdk/pull/15021.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15021/head:pull/15021 PR: https://git.openjdk.org/jdk/pull/15021 From coleenp at openjdk.org Tue Jul 25 16:16:57 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Tue, 25 Jul 2023 16:16:57 GMT Subject: RFR: 8312979: Fix assembler_aarch64.hpp after JDK-8311847 Message-ID: This passes linux-aarch64-debug. Waiting for GHA to see if it passes windows-aarch64. ------------- Commit messages: - Maybe this is better. 
- 8312979: Fix assembler_aarch64.hpp after JDK-8311847 Changes: https://git.openjdk.org/jdk/pull/15023/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=15023&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8312979 Stats: 2 lines in 1 file changed: 1 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/15023.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15023/head:pull/15023 PR: https://git.openjdk.org/jdk/pull/15023 From duke at openjdk.org Tue Jul 25 17:03:07 2023 From: duke at openjdk.org (Joshua Cao) Date: Tue, 25 Jul 2023 17:03:07 GMT Subject: RFR: 8312420: Integrate Graal's blender micro benchmark [v2] In-Reply-To: <5sj7hpmmChUitKVYH-je8xq3AAA_GkjcFXJl6uGnGQc=.59e78f75-12a8-4b7b-8817-627f65149718@github.com> References: <5sj7hpmmChUitKVYH-je8xq3AAA_GkjcFXJl6uGnGQc=.59e78f75-12a8-4b7b-8817-627f65149718@github.com> Message-ID: <-1W6PRns_akk9mk3yUsfSNQJFSXfljIHYJbmjmAk9SE=.17a2f0e4-d6e3-4d54-9e27-a830b095de8c@github.com> > We would like to integrate Graal's blender micro benchmark from https://www.graalvm.org/22.1/examples/java-performance-examples/. We have been using this benchmark to test our partial escape analysis work (https://mail.openjdk.org/pipermail/hotspot-compiler-dev/2023-July/066670.html). This test can exist independently of the project. 
> > > example command to run test: > > > make run-test TEST=micro:org.openjdk.bench.vm.compiler.pea.Blender MICRO="FORK=1;OPTIONS=-prof gc -gc true" > > > example output (not complete): > > > Benchmark (iteration) Mode Cnt Score Error Units [29/1913] > Blender.initialize 1 avgt 227997775.000 ns/op > Blender.initialize:?gc.alloc.rate 1 avgt 167.192 MB/sec > Blender.initialize:?gc.alloc.rate.norm 1 avgt 40000081.600 B/op > Blender.initialize:?gc.count 1 avgt 4.000 counts > Blender.initialize:?gc.time 1 avgt 65.000 ms > Blender.initialize 2 avgt 226255767.800 ns/op > Blender.initialize:?gc.alloc.rate 2 avgt 168.466 MB/sec > Blender.initialize:?gc.alloc.rate.norm 2 avgt 40000081.600 B/op > Blender.initialize:?gc.count 2 avgt 4.000 counts > Blender.initialize:?gc.time 2 avgt 58.000 ms > Blender.initialize 3 avgt 225596324.600 ns/op > Blender.initialize:?gc.alloc.rate 3 avgt 168.960 MB/sec > Blender.initialize:?gc.alloc.rate.norm 3 avgt 40000081.600 B/op > Blender.initialize:?gc.count 3 avgt 4.000 counts > Blender.initialize:?gc.time 3 avgt 55.000 ms > Blender.initialize 4 avgt 224856811.000 ns/op > Blender.initialize:?gc.alloc.rate 4 avgt 169.520 MB/sec > Blender.initialize:?gc.alloc.rate.norm 4 avgt 40000081.600 B/op > Blender.initialize:?gc.count 4 avgt 4.000 counts > Blender.initialize:?gc.time ... 
Joshua Cao has updated the pull request incrementally with one additional commit since the last revision: change Amazon license to Oracle ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14941/files - new: https://git.openjdk.org/jdk/pull/14941/files/d00ebb34..c0bc6b90 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14941&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14941&range=00-01 Stats: 2 lines in 1 file changed: 1 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/14941.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14941/head:pull/14941 PR: https://git.openjdk.org/jdk/pull/14941 From jbhateja at openjdk.org Tue Jul 25 17:58:42 2023 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 25 Jul 2023 17:58:42 GMT Subject: RFR: 8312547: Max/Min nodes Value implementation could be improved In-Reply-To: References: Message-ID: On Tue, 25 Jul 2023 15:02:00 GMT, Quan Anh Mai wrote: > Hi, > > This patch removes the early return in `AddNode::Value` in case one of the inputs is a bottom, which may affect the value calculation of nodes such as `Min/MaxNode`. > > Please kindly review, thanks very much. src/hotspot/share/opto/addnode.cpp line 227: > 225: const Type *bot = bottom_type(); > 226: if( (t1 == bot) || (t2 == bot) || > 227: (t1 == Type::BOTTOM) || (t2 == Type::BOTTOM) ) We can keep (t1 == Type::BOTTOM || t2 == Type::BOTTOM) condition intact. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15021#discussion_r1273884477 From coleenp at openjdk.org Tue Jul 25 18:18:03 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Tue, 25 Jul 2023 18:18:03 GMT Subject: RFR: 8312979: Fix assembler_aarch64.hpp after JDK-8311847 [v2] In-Reply-To: References: Message-ID: > This passes linux-aarch64-debug. Waiting for GHA to see if it passes windows-aarch64. Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: Maybe this is better. 
------------- Changes: - all: https://git.openjdk.org/jdk/pull/15023/files - new: https://git.openjdk.org/jdk/pull/15023/files/0d2d8b90..b1be335a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15023&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15023&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/15023.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15023/head:pull/15023 PR: https://git.openjdk.org/jdk/pull/15023 From qamai at openjdk.org Tue Jul 25 18:38:06 2023 From: qamai at openjdk.org (Quan Anh Mai) Date: Tue, 25 Jul 2023 18:38:06 GMT Subject: RFR: 8312547: Max/Min nodes Value implementation could be improved [v2] In-Reply-To: References: Message-ID: <7i5B-9hTl8oTKGpdMEiCsKEWf8a0M1HHpOZUsLYXrPI=.29dc1ce8-9666-4aa7-b63b-36610026c53a@github.com> > Hi, > > This patch removes the early return in `AddNode::Value` in case one of the inputs is a bottom, which may affect the value calculation of nodes such as `Min/MaxNode`. > > Please kindly review, thanks very much. 
Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: fix min/maxfp nodes ------------- Changes: - all: https://git.openjdk.org/jdk/pull/15021/files - new: https://git.openjdk.org/jdk/pull/15021/files/312088f7..ef2d3dfb Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=15021&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=15021&range=00-01 Stats: 24 lines in 1 file changed: 12 ins; 0 del; 12 mod Patch: https://git.openjdk.org/jdk/pull/15021.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15021/head:pull/15021 PR: https://git.openjdk.org/jdk/pull/15021 From qamai at openjdk.org Tue Jul 25 18:43:51 2023 From: qamai at openjdk.org (Quan Anh Mai) Date: Tue, 25 Jul 2023 18:43:51 GMT Subject: RFR: 8312547: Max/Min nodes Value implementation could be improved [v2] In-Reply-To: References: Message-ID: On Tue, 25 Jul 2023 17:39:42 GMT, Jatin Bhateja wrote: >> Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: >> >> fix min/maxfp nodes > > src/hotspot/share/opto/addnode.cpp line 227: > >> 225: const Type *bot = bottom_type(); >> 226: if( (t1 == bot) || (t2 == bot) || >> 227: (t1 == Type::BOTTOM) || (t2 == Type::BOTTOM) ) > > We can keep (t1 == Type::BOTTOM || t2 == Type::BOTTOM) condition intact. I don't think `Type::BOTTOM` can appear here, there are other nodes (such as convert nodes) that do not check for `Type::BOTTOM`. Maybe someone could shed light on this, please? 
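To illustrate why an early return on a bottom input can lose information for `Min/MaxNode`, here is a simplified Java model of integer range types. It is only a sketch under stated assumptions: `IntRange` is a hypothetical stand-in for C2's integer type lattice (which also tracks widening and other details), and "bottom" is modelled as the full `int` range.

```java
// Hypothetical, simplified model of value ranges; not C2's TypeInt.
final class IntRange {
    static final IntRange BOTTOM = new IntRange(Integer.MIN_VALUE, Integer.MAX_VALUE);

    final int lo;
    final int hi;

    IntRange(int lo, int hi) {
        this.lo = lo;
        this.hi = hi;
    }

    boolean isBottom() {
        return lo == Integer.MIN_VALUE && hi == Integer.MAX_VALUE;
    }

    // Range of min(a, b) computed from the bounds of both inputs.
    static IntRange min(IntRange a, IntRange b) {
        return new IntRange(Math.min(a.lo, b.lo), Math.min(a.hi, b.hi));
    }

    // Same computation, but giving up as soon as either input is bottom.
    static IntRange minWithEarlyReturn(IntRange a, IntRange b) {
        if (a.isBottom() || b.isBottom()) {
            return BOTTOM;
        }
        return min(a, b);
    }

    public static void main(String[] args) {
        IntRange small = new IntRange(0, 10);
        IntRange precise = min(BOTTOM, small);
        IntRange coarse = minWithEarlyReturn(BOTTOM, small);
        System.out.println("[" + precise.lo + ", " + precise.hi + "]");
        System.out.println("[" + coarse.lo + ", " + coarse.hi + "]");
    }
}
```

In this model, the minimum of an unknown `int` and a value in `[0, 10]` is still bounded above by 10, while the early-return variant reports the full `int` range.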
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/15021#discussion_r1273943376 From dlong at openjdk.org Tue Jul 25 19:00:43 2023 From: dlong at openjdk.org (Dean Long) Date: Tue, 25 Jul 2023 19:00:43 GMT Subject: RFR: 8312596: Potential null pointer access in Compile::TracePhase::~TracePhase after JDK-8311976 In-Reply-To: References: Message-ID: On Mon, 24 Jul 2023 18:16:19 GMT, Ashutosh Mehra wrote: > Please review this PR to fix a potential null pointer access in using `_compile`. > Updated the code to unconditionally initialize `_compile` and added an assert (similar to C1's `PhaseTraceTime` constructor) for it to be non-null. Marked as reviewed by dlong (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/15002#pullrequestreview-1546174578 From vkempik at openjdk.org Tue Jul 25 19:07:52 2023 From: vkempik at openjdk.org (Vladimir Kempik) Date: Tue, 25 Jul 2023 19:07:52 GMT Subject: RFR: 8312547: Max/Min nodes Value implementation could be improved [v2] In-Reply-To: <7i5B-9hTl8oTKGpdMEiCsKEWf8a0M1HHpOZUsLYXrPI=.29dc1ce8-9666-4aa7-b63b-36610026c53a@github.com> References: <7i5B-9hTl8oTKGpdMEiCsKEWf8a0M1HHpOZUsLYXrPI=.29dc1ce8-9666-4aa7-b63b-36610026c53a@github.com> Message-ID: <8iTw-PUlWlSaCAUBWqCyEOpnD2aB8Y4B6LApLCdbnrs=.691f4f9f-2fc9-48da-aeb2-3659663f5436@github.com> On Tue, 25 Jul 2023 18:38:06 GMT, Quan Anh Mai wrote: >> Hi, >> >> This patch removes the early return in `AddNode::Value` in case one of the inputs is a bottom, which may affect the value calculation of nodes such as `Min/MaxNode`. >> >> Please kindly review, thanks very much. 
> > Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: > > fix min/maxfp nodes Could you please check your change with the JMH test (numbers before and after the patch): org/openjdk/bench/vm/compiler/MaxMinOptimizeTest.java? Just want to make sure it doesn't regress. ------------- PR Comment: https://git.openjdk.org/jdk/pull/15021#issuecomment-1650380796 From duke at openjdk.org Tue Jul 25 19:25:12 2023 From: duke at openjdk.org (Srinivas Vamsi Parasa) Date: Tue, 25 Jul 2023 19:25:12 GMT Subject: RFR: 8309130: x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays) [v11] In-Reply-To: References: Message-ID: > The goal is to develop faster sort routines for x86_64 CPUs by taking advantage of AVX512 instructions. This enhancement provides an order of magnitude speedup for Arrays.sort() using int, long, float and double arrays. > > This PR shows up to ~13x improvement for 32-bit datatypes (int, float) and up to 8x improvement for 64-bit datatypes (long, double) as shown in the performance data below.
> > **Arrays.sort performance data using JMH benchmarks** > > | Arrays.sort benchmark | Array Size | Baseline (us/op) | AVX512 Sort (us/op) | Speedup | > | --- | --- | --- | --- | --- | > | ArraysSort.doubleSort | 100 | 0.639 | 0.217 | 2.9x | > | ArraysSort.doubleSort | 1000 | 8.707 | 3.421 | 2.5x | > | ArraysSort.doubleSort | 10000 | 349.267 | 43.56 | **8.0x** | > | ArraysSort.doubleSort | 100000 | 4721.17 | 579.819 | **8.1x** | > | ArraysSort.floatSort | 100 | 0.722 | 0.129 | 5.6x | > | ArraysSort.floatSort | 1000 | 9.1 | 2.356 | 3.9x | > | ArraysSort.floatSort | 10000 | 336.472 | 26.706 | **12.6x** | > | ArraysSort.floatSort | 100000 | 4804.716 | 427.397 | **11.2x** | > | ArraysSort.intSort | 100 | 0.61 | 0.111 | 5.5x | > | ArraysSort.intSort | 1000 | 8.534 | 2.025 | 4.2x | > | ArraysSort.intSort | 10000 | 310.97 | 24.082 | **12.9x** | > | ArraysSort.intSort | 100000 | 4484.94 | 381.01 | **11.8x** | > | ArraysSort.longSort | 100 | 0.636 | 0.28 | 2.3x | > | ArraysSort.longSort | 1000 | 8.646 | 4.425 | 2.0x | > | ArraysSort.longSort | 10000 | 322.116 | 53.094 | **6.1x** | > | ArraysSort.longSort | 100000 | 4448.171 | 696.773 | **6.4x** | Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: change API to enable MemorySegment ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14227/files - new: https://git.openjdk.org/jdk/pull/14227/files/2bd04191..e09c0501 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14227&range=10 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14227&range=09-10 Stats: 117 lines in 6 files changed: 64 ins; 11 del; 42 mod Patch: https://git.openjdk.org/jdk/pull/14227.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14227/head:pull/14227 PR: https://git.openjdk.org/jdk/pull/14227 From duke at openjdk.org Tue Jul 25 19:50:23 2023 From: duke at openjdk.org (Srinivas Vamsi Parasa) Date: Tue, 25 Jul 2023 19:50:23 GMT Subject: RFR: 8309130: x86_64 AVX512 
intrinsics for Arrays.sort methods (int, long, float and double arrays) [v12] In-Reply-To: References: Message-ID: > The goal is to develop faster sort routines for x86_64 CPUs by taking advantage of AVX512 instructions. This enhancement provides an order of magnitude speedup for Arrays.sort() using int, long, float and double arrays. > > This PR shows up to ~13x improvement for 32-bit datatypes (int, float) and up to 8x improvement for 64-bit datatypes (long, double) as shown in the performance data below. > > **Arrays.sort performance data using JMH benchmarks** > > | Arrays.sort benchmark | Array Size | Baseline (us/op) | AVX512 Sort (us/op) | Speedup | > | --- | --- | --- | --- | --- | > | ArraysSort.doubleSort | 100 | 0.639 | 0.217 | 2.9x | > | ArraysSort.doubleSort | 1000 | 8.707 | 3.421 | 2.5x | > | ArraysSort.doubleSort | 10000 | 349.267 | 43.56 | **8.0x** | > | ArraysSort.doubleSort | 100000 | 4721.17 | 579.819 | **8.1x** | > | ArraysSort.floatSort | 100 | 0.722 | 0.129 | 5.6x | > | ArraysSort.floatSort | 1000 | 9.1 | 2.356 | 3.9x | > | ArraysSort.floatSort | 10000 | 336.472 | 26.706 | **12.6x** | > | ArraysSort.floatSort | 100000 | 4804.716 | 427.397 | **11.2x** | > | ArraysSort.intSort | 100 | 0.61 | 0.111 | 5.5x | > | ArraysSort.intSort | 1000 | 8.534 | 2.025 | 4.2x | > | ArraysSort.intSort | 10000 | 310.97 | 24.082 | **12.9x** | > | ArraysSort.intSort | 100000 | 4484.94 | 381.01 | **11.8x** | > | ArraysSort.longSort | 100 | 0.636 | 0.28 | 2.3x | > | ArraysSort.longSort | 1000 | 8.646 | 4.425 | 2.0x | > | ArraysSort.longSort | 10000 | 322.116 | 53.094 | **6.1x** | > | ArraysSort.longSort | 100000 | 4448.171 | 696.773 | **6.4x** | Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision: update arraySort docstring ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14227/files - new: https://git.openjdk.org/jdk/pull/14227/files/e09c0501..5eac7b32 Webrevs: - full: 
https://webrevs.openjdk.org/?repo=jdk&pr=14227&range=11 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14227&range=10-11 Stats: 5 lines in 1 file changed: 2 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/14227.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14227/head:pull/14227 PR: https://git.openjdk.org/jdk/pull/14227 From jvernee at openjdk.org Tue Jul 25 20:18:10 2023 From: jvernee at openjdk.org (Jorn Vernee) Date: Tue, 25 Jul 2023 20:18:10 GMT Subject: RFR: 8313023: Return value corrupted when using CCS + isTrivial (mainline) Message-ID: Port of: https://github.com/openjdk/panama-foreign/pull/848 from the panama-foreign repo. Copying the PR body here for convenience: Due to a bug in the downcall linker stub generation, we don't save the return value when capturing call state for trivial functions, and the return value gets corrupted. We try not to save the return register around calls on the return path of a downcall stub, if it is not needed. Currently we don't save the return register when we're using a return buffer, since we write the return value to the return buffer before the calls on the return path, which means it is safe for those calls to overwrite the return register. But, the current logic also says we don't need to save the return register if the function is trivial (_needs_transition == false). The logic behind this was initially that, since we don't have any calls on the return path, we don't need to save the return register. But, after adding support for capturing call state, we now also have a call on the return path for trivial functions that capture call state, and around that call, we might need to save the return register. The fix is to simply save the return register when capturing call state, regardless of whether the function is trivial or not. 
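The decision described above can be summarised as a small truth-table sketch. This is an assumption-laden illustration and not the stub generator's code: `ReturnRegisterPolicy` and its parameter names are hypothetical, and the conditions are reconstructed only from the prose of this message (return buffer in use, thread state transition needed, call state captured).

```java
// Hypothetical model of when the return register must be saved around calls
// on the return path of a downcall stub. Reconstructed from the description
// only; the actual HotSpot code may be structured differently.
final class ReturnRegisterPolicy {
    // Logic as described before the fix: trivial functions never save.
    static boolean saveBeforeFix(boolean usesReturnBuffer, boolean needsTransition) {
        return !usesReturnBuffer && needsTransition;
    }

    // Logic as described after the fix: capturing call state adds a call on
    // the return path, so the register must be saved for it as well.
    static boolean saveAfterFix(boolean usesReturnBuffer, boolean needsTransition,
                                boolean capturesCallState) {
        return !usesReturnBuffer && (needsTransition || capturesCallState);
    }

    public static void main(String[] args) {
        // Trivial function (no transition) that captures call state:
        System.out.println(saveBeforeFix(false, false));       // the buggy case
        System.out.println(saveAfterFix(false, false, true));  // now saved
    }
}
```

The case that changes is a trivial function that captures call state: the old condition skips the save, while the new one performs it. Functions using a return buffer are unaffected in this model, since the value has already been written to the buffer.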
In the case of a trivial function that doesn't capture call state, we still don't save the return register around the return path calls for the thread state transition (which is not needed), since we don't generate those thread state transitions in the first place. Testing: jdk-tier1, jdk-tier2, jdk-tier5. ------------- Commit messages: - 8312473: Return value corrupted when using CCS + isTrivial Changes: https://git.openjdk.org/jdk/pull/15025/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=15025&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8313023 Stats: 39 lines in 5 files changed: 12 ins; 0 del; 27 mod Patch: https://git.openjdk.org/jdk/pull/15025.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/15025/head:pull/15025 PR: https://git.openjdk.org/jdk/pull/15025 From duke at openjdk.org Tue Jul 25 20:30:31 2023 From: duke at openjdk.org (Srinivas Vamsi Parasa) Date: Tue, 25 Jul 2023 20:30:31 GMT Subject: RFR: 8309130: x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays) [v13] In-Reply-To: References: