From jbhateja at openjdk.org Sun Dec 1 08:51:31 2024
From: jbhateja at openjdk.org (Jatin Bhateja)
Date: Sun, 1 Dec 2024 08:51:31 GMT
Subject: RFR: 8342677: Add IR validation tests for newly added saturated vector add / sub operations [v2]
In-Reply-To:
References:
Message-ID:

> This is a follow up PR to https://github.com/openjdk/jdk/pull/20507
> It adds IR validation tests for newly added saturated vector add / sub operations.

Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:

  Review resolutions

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/21603/files
  - new: https://git.openjdk.org/jdk/pull/21603/files/400ffe45..61ce8866

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=21603&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21603&range=00-01

  Stats: 137 lines in 2 files changed: 84 ins; 11 del; 42 mod
  Patch: https://git.openjdk.org/jdk/pull/21603.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/21603/head:pull/21603

PR: https://git.openjdk.org/jdk/pull/21603

From jbhateja at openjdk.org Sun Dec 1 08:51:33 2024
From: jbhateja at openjdk.org (Jatin Bhateja)
Date: Sun, 1 Dec 2024 08:51:33 GMT
Subject: RFR: 8342677: Add IR validation tests for newly added saturated vector add / sub operations [v2]
In-Reply-To:
References:
Message-ID: <2GixHUcKrIVrm0P3mrtWGLUZbp1XY-9h-fBMLzrnl_w=.7b86b39c-cb31-498d-bc47-490fd47a1607@github.com>

On Fri, 29 Nov 2024 06:43:08 GMT, Emanuel Peter wrote:

>> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:
>>
>> Review resolutions
>
> test/hotspot/jtreg/compiler/vectorapi/VectorSaturatedOperationsTest.java line 97:
>
>> 95: short_in2[i] = (short)-i;
>> 96: byte_in1[i] = Byte.MIN_VALUE;
>> 97: byte_in2[i] = (byte)-i;
>
> Are these values sufficient?
>
> Example with `SADD_VL`:
> if the ith elements are `max_value` and `min_value` -> no overflow -> `-1`
> if the ith elements are `-i` and `i` -> no overflow -> `0`
>
> So it seems we are actually not testing the saturation here, am I correct?

I am now randomizing input, and explicitly setting delimiting test inputs.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/21603#discussion_r1864786273

From dnsimon at openjdk.org Mon Dec 2 07:14:15 2024
From: dnsimon at openjdk.org (Doug Simon)
Date: Mon, 2 Dec 2024 07:14:15 GMT
Subject: RFR: 8345267: Fix memory leak in JVMCIEnv dtor
Message-ID:

The `ALLOW_C_FUNCTION` macro takes the identifier for the relevant C function, followed by a statement containing the use as additional (variadic) macro args. This PR fixes a use of this macro where the leading identifier arg was being omitted.

-------------

Commit messages:
 - fix use of ALLOW_C_FUNCTION

Changes: https://git.openjdk.org/jdk/pull/22471/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=22471&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8345267
  Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/22471.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/22471/head:pull/22471

PR: https://git.openjdk.org/jdk/pull/22471

From simonis at openjdk.org Mon Dec 2 08:02:36 2024
From: simonis at openjdk.org (Volker Simonis)
Date: Mon, 2 Dec 2024 08:02:36 GMT
Subject: RFR: 8345267: Fix memory leak in JVMCIEnv dtor
In-Reply-To:
References:
Message-ID:

On Mon, 2 Dec 2024 07:08:22 GMT, Doug Simon wrote:

> The `ALLOW_C_FUNCTION` macro takes the identifier for the relevant C function, followed by a statement containing the use as additional (variadic) macro args. This PR fixes a use of this macro where the leading identifier arg was being omitted.

Looks good. Have you checked (maybe with a simple grep command) if we don't have other instances of this issue?

-------------

Marked as reviewed by simonis (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/22471#pullrequestreview-2471983004

From kbarrett at openjdk.org Mon Dec 2 08:11:37 2024
From: kbarrett at openjdk.org (Kim Barrett)
Date: Mon, 2 Dec 2024 08:11:37 GMT
Subject: RFR: 8345267: Fix memory leak in JVMCIEnv dtor
In-Reply-To:
References:
Message-ID:

On Mon, 2 Dec 2024 07:08:22 GMT, Doug Simon wrote:

> The `ALLOW_C_FUNCTION` macro takes the identifier for the relevant C function, followed by a statement containing the use as additional (variadic) macro args. This PR fixes a use of this macro where the leading identifier arg was being omitted.

Looks good.

-------------

Marked as reviewed by kbarrett (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/22471#pullrequestreview-2471996944

From kbarrett at openjdk.org Mon Dec 2 08:11:38 2024
From: kbarrett at openjdk.org (Kim Barrett)
Date: Mon, 2 Dec 2024 08:11:38 GMT
Subject: RFR: 8345267: Fix memory leak in JVMCIEnv dtor
In-Reply-To:
References:
Message-ID:

On Mon, 2 Dec 2024 07:59:53 GMT, Volker Simonis wrote:

> Looks good. Have you checked (maybe with a simple grep command) if we don't have other instances of this issue?

I've just recently looked at all of the uses of that macro for other reasons, and this was the only one I found like this.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/22471#issuecomment-2510838072

From dnsimon at openjdk.org Mon Dec 2 08:39:19 2024
From: dnsimon at openjdk.org (Doug Simon)
Date: Mon, 2 Dec 2024 08:39:19 GMT
Subject: RFR: 8345267: Fix memory leak in JVMCIEnv dtor [v2]
In-Reply-To:
References:
Message-ID:

> The `ALLOW_C_FUNCTION` macro takes the identifier for the relevant C function, followed by a statement containing the use as additional (variadic) macro args. This PR fixes a use of this macro where the leading identifier arg was being omitted.

Doug Simon has refreshed the contents of this pull request, and previous commits have been removed.
The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision:

  fix use of ALLOW_C_FUNCTION

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/22471/files
  - new: https://git.openjdk.org/jdk/pull/22471/files/c789f049..35c0c3de

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=22471&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22471&range=00-01

  Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/22471.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/22471/head:pull/22471

PR: https://git.openjdk.org/jdk/pull/22471

From simonis at openjdk.org Mon Dec 2 08:43:41 2024
From: simonis at openjdk.org (Volker Simonis)
Date: Mon, 2 Dec 2024 08:43:41 GMT
Subject: RFR: 8345267: Fix memory leak in JVMCIEnv dtor [v2]
In-Reply-To:
References:
Message-ID:

On Mon, 2 Dec 2024 08:39:19 GMT, Doug Simon wrote:

>> The `ALLOW_C_FUNCTION` macro takes the identifier for the relevant C function, followed by a statement containing the use as additional (variadic) macro args. This PR fixes a use of this macro where the leading identifier arg was being omitted.
>
> Doug Simon has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision:
>
> fix use of ALLOW_C_FUNCTION

Marked as reviewed by simonis (Reviewer).

-------------

PR Review: https://git.openjdk.org/jdk/pull/22471#pullrequestreview-2472062339

From duke at openjdk.org Mon Dec 2 08:46:45 2024
From: duke at openjdk.org (theoweidmannoracle)
Date: Mon, 2 Dec 2024 08:46:45 GMT
Subject: RFR: 8332268: C2: Add missing optimizations for UDivI/L and UModI/L and unify the shared logic with the signed nodes [v15]
In-Reply-To:
References:
Message-ID:

On Wed, 20 Nov 2024 11:35:59 GMT, Tobias Hartmann wrote:

>> theoweidmannoracle has updated the pull request incrementally with one additional commit since the last revision:
>>
>> Fix test
>
> Looks good to me otherwise. Nice tests!

@TobiHartmann @eme64 Would you like to take a look at the latest changes again? I think this PR is ready for integration.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/22061#issuecomment-2510903464

From roland at openjdk.org Mon Dec 2 09:08:55 2024
From: roland at openjdk.org (Roland Westrelin)
Date: Mon, 2 Dec 2024 09:08:55 GMT
Subject: RFR: 8345287: C2: live in computation is broken
Message-ID:

8234003 (Improve IndexSet iteration) broke live in computation:

    @@ -273,23 +276,25 @@ void PhaseLive::add_liveout( Block *p, IndexSet *lo, VectorSet &first_pass ) {
     // Add a vector of live-in values to a given blocks live-in set.
     void PhaseLive::add_livein(Block *p, IndexSet *lo) {
       IndexSet *livein = &_livein[p->_pre_order-1];
    -  IndexSetIterator elements(lo);
    -  uint r;
    -  while ((r = elements.next()) != 0) {
    -    livein->insert(r); // Then add to live-in set
    +  if (!livein->is_empty()) {
    +    IndexSetIterator elements(lo);
    +    uint r;
    +    while ((r = elements.next()) != 0) {
    +      livein->insert(r); // Then add to live-in set
    +    }
       }
     }

`livein` is initially empty and the patch above only adds elements to it if:

    if (!livein->is_empty()) {

which is never true.

This doesn't affect correctness as live in sets are only used to drive scheduling.
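
The guard problem described above can be reproduced outside HotSpot. The following is only a minimal Java sketch, not the C2 code: `LiveInGuardSketch` and its methods are hypothetical stand-ins, and plain `java.util` sets replace `IndexSet`. The first method tests the destination set, as in the quoted diff; the second tests the source set, which is an assumed correction based on the `if (!lo->is_empty())` check mentioned later in this thread.

```java
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical stand-in for the live-in update; this is not HotSpot code. */
public class LiveInGuardSketch {

    // Mirrors the guard from the quoted diff: it checks the destination set,
    // so an initially empty live-in set can never receive any element.
    public static void addLiveInGuardOnDestination(Set<Integer> liveIn, Set<Integer> lo) {
        if (!liveIn.isEmpty()) {
            liveIn.addAll(lo);
        }
    }

    // Assumed correction: check the source set, so only the copy of an
    // empty set is skipped.
    public static void addLiveInGuardOnSource(Set<Integer> liveIn, Set<Integer> lo) {
        if (!lo.isEmpty()) {
            liveIn.addAll(lo);
        }
    }

    public static void main(String[] args) {
        Set<Integer> lo = new TreeSet<>(Set.of(3, 7));
        Set<Integer> viaDestinationGuard = new TreeSet<>();
        Set<Integer> viaSourceGuard = new TreeSet<>();

        addLiveInGuardOnDestination(viaDestinationGuard, lo);
        addLiveInGuardOnSource(viaSourceGuard, lo);

        System.out.println("guard on destination: " + viaDestinationGuard); // prints []
        System.out.println("guard on source:      " + viaSourceGuard);      // prints [3, 7]
    }
}
```

With the destination guard the set stays empty no matter how often it is updated, which matches the observation that the live-in sets never get populated.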

-------------

Commit messages:
 - fix

Changes: https://git.openjdk.org/jdk/pull/22473/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=22473&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8345287
  Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/22473.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/22473/head:pull/22473

PR: https://git.openjdk.org/jdk/pull/22473

From rcastanedalo at openjdk.org Mon Dec 2 09:11:40 2024
From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano)
Date: Mon, 2 Dec 2024 09:11:40 GMT
Subject: RFR: 8345041: IGV: Free Placement Mode in IGV Layout [v8]
In-Reply-To: <1Y-P6ytzuAaNLtzMlmrh7_b4o-Dr6wqpN0mxDkBiFgs=.38c8cb92-1ac6-4db3-b0ba-34b5f8f70cca@github.com>
References: <1Y-P6ytzuAaNLtzMlmrh7_b4o-Dr6wqpN0mxDkBiFgs=.38c8cb92-1ac6-4db3-b0ba-34b5f8f70cca@github.com>
Message-ID: <_c3LQsUW3SW35mulC7rtDejXFGV5aXB67TpYKNuvXOs=.0dad577d-f7d2-41d2-864d-af5f22c860f1@github.com>

On Sat, 30 Nov 2024 00:15:18 GMT, Tobias Holenstein wrote:

>> This PR depends on https://github.com/openjdk/jdk/pull/22430. To check out this PR locally:
>>
>> git fetch https://git.openjdk.org/jdk.git pull/22438/head:pull/22438
>> git checkout pull/22438
>>
>>
>> Introduce a Free Placement Mode to IGV, allowing users to position nodes freely without being limited to the hierarchical layout constraints.
>>
>> In this mode, users can manually drag and place nodes anywhere within the space, giving them complete control over the visual arrangement of the graph. Connections between nodes will be rendered as straight (or S curved) lines, without recalculating or enforcing hierarchical constraints.
>>
>> This feature is ideal for users who need a flexible, non-restrictive way to organize and visualize complex graph structures in a customized and intuitive manner. The free placement of nodes will remain persistent until the layout is reset or another layout mode is selected.
>>
>> ![free](https://github.com/user-attachments/assets/c150334e-4d9f-4abf-97ea-3cb42bd1c602)
>
> Tobias Holenstein has updated the pull request incrementally with one additional commit since the last revision:
>
> make applyForceBasedAdjustment more numerical stable

I re-tested and re-ran layout drawing performance tests, all good. The feature would be even more useful, in my opinion, if the hand-made layouts would persist across graphs in a group and even across sessions (e.g. serializing the node coordinates as properties, similarly to [JDK-8345039](https://bugs.openjdk.org/browse/JDK-8345039)), but this could be done separately.

-------------

Marked as reviewed by rcastanedalo (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/22438#pullrequestreview-2472128391

From rcastanedalo at openjdk.org Mon Dec 2 09:29:37 2024
From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano)
Date: Mon, 2 Dec 2024 09:29:37 GMT
Subject: RFR: 8345287: C2: live in computation is broken
In-Reply-To:
References:
Message-ID:

On Mon, 2 Dec 2024 09:03:50 GMT, Roland Westrelin wrote:

> 8234003 (Improve IndexSet iteration) broke live in computation:
>
> @@ -273,23 +276,25 @@ void PhaseLive::add_liveout( Block *p, IndexSet *lo, VectorSet &first_pass ) {
>  // Add a vector of live-in values to a given blocks live-in set.
>  void PhaseLive::add_livein(Block *p, IndexSet *lo) {
>    IndexSet *livein = &_livein[p->_pre_order-1];
> -  IndexSetIterator elements(lo);
> -  uint r;
> -  while ((r = elements.next()) != 0) {
> -    livein->insert(r); // Then add to live-in set
> +  if (!livein->is_empty()) {
> +    IndexSetIterator elements(lo);
> +    uint r;
> +    while ((r = elements.next()) != 0) {
> +      livein->insert(r); // Then add to live-in set
> +    }
>    }
>  }
>
> `livein` is initially empty and the patch above only adds elements to it if:
>
> if (!livein->is_empty()) {
>
> which is never true.
>
> This doesn't affect correctness as live in sets are only used to drive
> scheduling.

Good catch! Do you have an example where the final schedule is affected by this issue?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/22473#issuecomment-2510998547

From epeter at openjdk.org Mon Dec 2 09:43:38 2024
From: epeter at openjdk.org (Emanuel Peter)
Date: Mon, 2 Dec 2024 09:43:38 GMT
Subject: RFR: 8342677: Add IR validation tests for newly added saturated vector add / sub operations [v2]
In-Reply-To:
References:
Message-ID:

On Sun, 1 Dec 2024 08:51:31 GMT, Jatin Bhateja wrote:

>> This is a follow up PR to https://github.com/openjdk/jdk/pull/20507
>> It adds IR validation tests for newly added saturated vector add / sub operations.
>
> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:
>
> Review resolutions

Thanks for the changes @jatin-bhateja ! Looks good to me :)

-------------

Marked as reviewed by epeter (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/21603#pullrequestreview-2472203690

From epeter at openjdk.org Mon Dec 2 10:10:45 2024
From: epeter at openjdk.org (Emanuel Peter)
Date: Mon, 2 Dec 2024 10:10:45 GMT
Subject: RFR: 8332268: C2: Add missing optimizations for UDivI/L and UModI/L and unify the shared logic with the signed nodes [v15]
In-Reply-To:
References:
Message-ID:

On Fri, 29 Nov 2024 11:53:59 GMT, theoweidmannoracle wrote:

>> This PR introduces
>> - several new optimizations to unsigned division and modulo
>>   - x % 1, x % x, x % 2^k
>>   - x / 1, x / x, x / 2^k
>>   - does not implement the Granlund and Montgomery algorithm, which has been implemented for signed modulo division in the past. It is unclear if a lot is to be gained by implementing this.
>> - tests to test existing optimizations for signed division and modulo
>>   - does not test the Granlund and Montgomery algorithm directly
>
> theoweidmannoracle has updated the pull request incrementally with one additional commit since the last revision:
>
> Fix test

Nice work! I have a few more comments.

src/hotspot/share/opto/divnode.cpp line 461:

> 459: Node* make_and(Node* a, Node* b) {
> 460:   return new AndINode(a, b);
> 461: }

Does this not belong in `addnode.hpp`? If I searched for something like this, I would look there. Is there no way you could use `AddNode::make(in1, in2, bt)`, where `bt` is either `T_INT` or `T_LONG`?

src/hotspot/share/opto/divnode.cpp line 474:

> 472: Node* make_urshift(Node* a, Node* b) {
> 473:   return new URShiftINode(a, b);
> 474: }

There is a similar `LShiftNode::make(Node* in1, Node* in2, BasicType bt)` method... it might be better to follow that code pattern.

src/hotspot/share/opto/divnode.cpp line 477:

> 475:
> 476: template
> 477: Node* unsigned_div_ideal(PhaseGVN* phase, bool can_reshape, Node* div) {

And instead of passing the `TypeClass` here, you could pass the `bt`.

src/hotspot/share/opto/divnode.cpp line 943:

> 941: //------------------------------Idealize---------------------------------------
> 942: Node *UDivINode::Ideal(PhaseGVN *phase, bool can_reshape) {
> 943:   return unsigned_div_ideal(phase, can_reshape, this);

And here you could pass `bt = T_INT`, instead of the template.

src/hotspot/share/opto/divnode.cpp line 1156:

> 1154: }
> 1155: // Don't bother trying to transform a dead node
> 1156: if (mod->in(0) && mod->in(0)->is_top()) {

Suggestion:

    if (mod->in(0) != nullptr && mod->in(0)->is_top()) {

https://github.com/openjdk/jdk/blob/master/doc/hotspot-style.md
`Do not use ints or pointers as (implicit) booleans with &&, ||, if, while. Instead, compare explicitly, i.e. if (x != 0) or if (ptr != nullptr), etc.`

src/hotspot/share/opto/type.hpp line 2180:

> 2178: inline const TypeLong* Type::cast() const {
> 2179:   return is_long();
> 2180: }

I think we usually handle this with `const TypeInteger* isa_integer(BasicType bt)`. You could also have a look at `jlong get_con_as_long(BasicType bt) const`, may be useful as well in this PR.
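
As background for the power-of-two cases listed in the quoted PR description (`x / 2^k`, `x % 2^k`): for unsigned operands these reduce to a logical right shift and a bit mask. The following is a minimal Java sketch that checks those identities against `Integer.divideUnsigned` and `Integer.remainderUnsigned`. It is not the C2 implementation; the class and method names are made up for illustration.

```java
/** Hypothetical check of the unsigned power-of-two identities; not C2 code. */
public class UnsignedPowerOfTwoSketch {

    // Unsigned x / 2^k expressed as a logical (zero-filling) right shift.
    public static int udivPow2(int x, int k) {
        return x >>> k;
    }

    // Unsigned x % 2^k expressed as a mask of the low k bits.
    public static int umodPow2(int x, int k) {
        return x & ((1 << k) - 1);
    }

    // Returns true if both identities hold for x and every k in [0, 31].
    public static boolean identitiesHold(int x) {
        for (int k = 0; k <= 31; k++) {
            int divisor = 1 << k; // for k == 31 this is 2^31 when read as unsigned
            if (Integer.divideUnsigned(x, divisor) != udivPow2(x, k)) {
                return false;
            }
            if (Integer.remainderUnsigned(x, divisor) != umodPow2(x, k)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        int[] samples = {0, 1, 7, 8, -1, Integer.MIN_VALUE, Integer.MAX_VALUE, 123456789};
        for (int x : samples) {
            if (!identitiesHold(x)) {
                throw new AssertionError("identity failed for x = " + x);
            }
            // The trivial cases from the same list: x / 1 == x and x % 1 == 0.
            if (Integer.divideUnsigned(x, 1) != x || Integer.remainderUnsigned(x, 1) != 0) {
                throw new AssertionError("x / 1 or x % 1 failed for x = " + x);
            }
        }
        System.out.println("all identities hold");
    }
}
```

Note that the signed operators `/` and `%` would not satisfy these identities for negative `x`, which is why the sketch compares against the unsigned library methods.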

test/hotspot/jtreg/compiler/c2/irTests/ModINodeIdealizationTests.java line 43:

> 41:
> 42: public static void main(String[] args) {
> 43:   TestFramework.runWithFlags("-XX:CompileCommand=inline,*Math::max");

Is this really necessary? Should this not happen automatically?

test/hotspot/jtreg/compiler/c2/irTests/ModLNodeIdealizationTests.java line 72:

> 70:
> 71: Asserts.assertEQ(a % 8589934592L, powerOf2(a));
> 72: Asserts.assertEQ(a % 8589934591L, powerOf2Minus1(a));

I think we generally prefer to use `1L << 33`, it is easier to read and know what the constant means.

test/hotspot/jtreg/compiler/c2/irTests/UDivLNodeIdealizationTests.java line 120:

> 118: // Checks x / (c / c) => x
> 119: public long identityAgainButBig(long x) {
> 120:   return Long.divideUnsigned(x, Long.divideUnsigned(-9214294468834361176L, -9214294468834361176L)); // Long.parseUnsignedLong("9232449604875190440") = -9214294468834361176L

What is the comment for here?

-------------

PR Review: https://git.openjdk.org/jdk/pull/22061#pullrequestreview-2472212748
PR Review Comment: https://git.openjdk.org/jdk/pull/22061#discussion_r1865536747
PR Review Comment: https://git.openjdk.org/jdk/pull/22061#discussion_r1865541848
PR Review Comment: https://git.openjdk.org/jdk/pull/22061#discussion_r1865542712
PR Review Comment: https://git.openjdk.org/jdk/pull/22061#discussion_r1865543843
PR Review Comment: https://git.openjdk.org/jdk/pull/22061#discussion_r1865568634
PR Review Comment: https://git.openjdk.org/jdk/pull/22061#discussion_r1865550446
PR Review Comment: https://git.openjdk.org/jdk/pull/22061#discussion_r1865552790
PR Review Comment: https://git.openjdk.org/jdk/pull/22061#discussion_r1865562324
PR Review Comment: https://git.openjdk.org/jdk/pull/22061#discussion_r1865563534

From epeter at openjdk.org Mon Dec 2 10:10:45 2024
From: epeter at openjdk.org (Emanuel Peter)
Date: Mon, 2 Dec 2024 10:10:45 GMT
Subject: RFR: 8332268: C2: Add missing optimizations for UDivI/L and UModI/L and unify the shared logic
 with the signed nodes [v15]
In-Reply-To:
References:
Message-ID:

On Mon, 2 Dec 2024 10:05:18 GMT, Emanuel Peter wrote:

>> theoweidmannoracle has updated the pull request incrementally with one additional commit since the last revision:
>>
>> Fix test
>
> src/hotspot/share/opto/divnode.cpp line 1156:
>
>> 1154: }
>> 1155: // Don't bother trying to transform a dead node
>> 1156: if (mod->in(0) && mod->in(0)->is_top()) {
>
> Suggestion:
>
> if (mod->in(0) != nullptr && mod->in(0)->is_top()) {
>
>
> https://github.com/openjdk/jdk/blob/master/doc/hotspot-style.md
> `Do not use ints or pointers as (implicit) booleans with &&, ||, if, while. Instead, compare explicitly, i.e. if (x != 0) or if (ptr != nullptr), etc.`

There are more instances in the code. I know you copied it, but when we touch code we make sure to adhere to the new rules ;)

> test/hotspot/jtreg/compiler/c2/irTests/ModINodeIdealizationTests.java line 43:
>
>> 41:
>> 42: public static void main(String[] args) {
>> 43:   TestFramework.runWithFlags("-XX:CompileCommand=inline,*Math::max");
>
> Is this really necessary? Should this not happen automatically?

If you do need it: can you add a comment why?

> test/hotspot/jtreg/compiler/c2/irTests/UDivLNodeIdealizationTests.java line 120:
>
>> 118: // Checks x / (c / c) => x
>> 119: public long identityAgainButBig(long x) {
>> 120:   return Long.divideUnsigned(x, Long.divideUnsigned(-9214294468834361176L, -9214294468834361176L)); // Long.parseUnsignedLong("9232449604875190440") = -9214294468834361176L
>
> What is the comment for here?

Is this `Long.MIN_VALUE`? Can you explain more in the comment, and move it on a separate line?
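
The question about the literal can be checked directly. The following is a small Java sketch (the class and method names are hypothetical, not part of the PR) showing that `-9214294468834361176L` is the signed view of the unsigned value `9232449604875190440` named in the trailing comment, that it is not `Long.MIN_VALUE`, and that the decimal constants quoted from `ModLNodeIdealizationTests.java` equal `1L << 33` and `(1L << 33) - 1`.

```java
/** Hypothetical helper for inspecting the constants quoted in the review. */
public class UnsignedConstantSketch {

    // The literal used in identityAgainButBig, as quoted in the review.
    public static final long BIG = -9214294468834361176L;

    // Decimal string of the same 64 bits when read as an unsigned value.
    public static String unsignedView(long value) {
        return Long.toUnsignedString(value);
    }

    // c / c under unsigned division, the inner expression of the test.
    public static long selfDivideUnsigned(long c) {
        return Long.divideUnsigned(c, c);
    }

    public static void main(String[] args) {
        System.out.println(unsignedView(BIG));                // prints 9232449604875190440
        System.out.println(BIG == Long.MIN_VALUE);            // prints false
        System.out.println(selfDivideUnsigned(BIG));          // prints 1
        System.out.println((1L << 33) == 8589934592L);        // prints true
        System.out.println(((1L << 33) - 1) == 8589934591L);  // prints true
    }
}
```

Since `c / c` folds to 1 for any nonzero `c`, the large literal presumably only serves to exercise a divisor with the top bit set; that reading is an assumption, not something the thread states.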

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/22061#discussion_r1865569534
PR Review Comment: https://git.openjdk.org/jdk/pull/22061#discussion_r1865553451
PR Review Comment: https://git.openjdk.org/jdk/pull/22061#discussion_r1865565224

From dfenacci at openjdk.org Mon Dec 2 10:11:38 2024
From: dfenacci at openjdk.org (Damon Fenacci)
Date: Mon, 2 Dec 2024 10:11:38 GMT
Subject: RFR: 8345287: C2: live in computation is broken
In-Reply-To:
References:
Message-ID:

On Mon, 2 Dec 2024 09:03:50 GMT, Roland Westrelin wrote:

> 8234003 (Improve IndexSet iteration) broke live in computation:
>
> @@ -273,23 +276,25 @@ void PhaseLive::add_liveout( Block *p, IndexSet *lo, VectorSet &first_pass ) {
>  // Add a vector of live-in values to a given blocks live-in set.
>  void PhaseLive::add_livein(Block *p, IndexSet *lo) {
>    IndexSet *livein = &_livein[p->_pre_order-1];
> -  IndexSetIterator elements(lo);
> -  uint r;
> -  while ((r = elements.next()) != 0) {
> -    livein->insert(r); // Then add to live-in set
> +  if (!livein->is_empty()) {
> +    IndexSetIterator elements(lo);
> +    uint r;
> +    while ((r = elements.next()) != 0) {
> +      livein->insert(r); // Then add to live-in set
> +    }
>    }
>  }
>
> `livein` is initially empty and the patch above only adds elements to it if:
>
> if (!livein->is_empty()) {
>
> which is never true.
>
> This doesn't affect correctness as live in sets are only used to drive
> scheduling.

Well spotted! I've just got one doubt: do we actually need that `if (!lo->is_empty())` at all?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/22473#issuecomment-2511098310

From epeter at openjdk.org Mon Dec 2 10:24:42 2024
From: epeter at openjdk.org (Emanuel Peter)
Date: Mon, 2 Dec 2024 10:24:42 GMT
Subject: RFR: 8332268: C2: Add missing optimizations for UDivI/L and UModI/L and unify the shared logic with the signed nodes [v15]
In-Reply-To:
References:
Message-ID:

On Wed, 20 Nov 2024 11:35:59 GMT, Tobias Hartmann wrote:

>> theoweidmannoracle has updated the pull request incrementally with one additional commit since the last revision:
>>
>> Fix test
>
> Looks good to me otherwise. Nice tests!

Ok. I'm not sure about the Template vs BasicType. Generally BasicType is what we have used so far, but I would say Templates are cleaner. Discussed it with @TobiHartmann , and he thinks Templates are fine too. But you need to move the `make_...` to the corresponding file, so others can find it.

And I think the constants in the tests should be given as `1L << 42`, and not some long unreadable chain of digits ;)

-------------

PR Comment: https://git.openjdk.org/jdk/pull/22061#issuecomment-2511129198

From roland at openjdk.org Mon Dec 2 12:17:37 2024
From: roland at openjdk.org (Roland Westrelin)
Date: Mon, 2 Dec 2024 12:17:37 GMT
Subject: RFR: 8345287: C2: live in computation is broken
In-Reply-To:
References:
Message-ID: <66KJ0kayCyRztG81OvFyladp1BbDq5nW810HKEZPmlk=.51449889-9e47-408e-9751-470e1f912113@github.com>

On Mon, 2 Dec 2024 09:26:53 GMT, Roberto Castañeda Lozano wrote:

> Good catch! Do you have an example where the final schedule is affected by this issue?

I don't. I noticed that live in sets are always empty while looking at memory usage of some `IndexSet`. It is puzzling that this didn't cause any performance regression. So it may be worth exploring why.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/22473#issuecomment-2511377837

From roland at openjdk.org Mon Dec 2 12:22:36 2024
From: roland at openjdk.org (Roland Westrelin)
Date: Mon, 2 Dec 2024 12:22:36 GMT
Subject: RFR: 8345287: C2: live in computation is broken
In-Reply-To:
References:
Message-ID:

On Mon, 2 Dec 2024 10:08:30 GMT, Damon Fenacci wrote:

> Well spotted! I've just got one doubt: do we actually need that `if (!lo->is_empty())` at all?

It's a performance optimization from https://bugs.openjdk.org/browse/JDK-8234003
Otherwise the iterator code will look for the next element in the set which is done by iterating over an array that can be large even when the set is empty. 8234003 introduced the bug here.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/22473#issuecomment-2511388535

From rcastanedalo at openjdk.org Mon Dec 2 12:43:37 2024
From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano)
Date: Mon, 2 Dec 2024 12:43:37 GMT
Subject: RFR: 8345287: C2: live in computation is broken
In-Reply-To: <66KJ0kayCyRztG81OvFyladp1BbDq5nW810HKEZPmlk=.51449889-9e47-408e-9751-470e1f912113@github.com>
References: <66KJ0kayCyRztG81OvFyladp1BbDq5nW810HKEZPmlk=.51449889-9e47-408e-9751-470e1f912113@github.com>
Message-ID:

On Mon, 2 Dec 2024 12:14:47 GMT, Roland Westrelin wrote:

> > Good catch! Do you have an example where the final schedule is affected by this issue?
>
> I don't. I noticed that live in sets are always empty while looking at memory usage of some `IndexSet`. It is puzzling that this didn't cause any performance regression. So it may be worth exploring why.

OK, thanks. I will start by running some benchmarks on x64 with and without this fix. Will report results in a couple of days.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/22473#issuecomment-2511437094

From duke at openjdk.org Mon Dec 2 13:03:47 2024
From: duke at openjdk.org (Daniel Skantz)
Date: Mon, 2 Dec 2024 13:03:47 GMT
Subject: RFR: 8345158: IGV: local scheduling should not place successors before predecessors
Message-ID: <0VUX5bX4oDw7aUoQ8KWtsG6PLMfM5bH3ge7ZYY8Gz1Y=.305aa13d-f638-4645-b956-a7209266358f@github.com>

A slight tweak to consider def-use in scheduleBlock.

Testing: opened a few graphs in control-flow graph mode without any assertion failure.

-------------

Commit messages:
 - Merge branch 'master' of github.com:openjdk/jdk into igv
 - def-use local schedule fix

Changes: https://git.openjdk.org/jdk/pull/22481/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=22481&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8345158
  Stats: 3 lines in 1 file changed: 0 ins; 0 del; 3 mod
  Patch: https://git.openjdk.org/jdk/pull/22481.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/22481/head:pull/22481

PR: https://git.openjdk.org/jdk/pull/22481

From duke at openjdk.org Mon Dec 2 13:13:12 2024
From: duke at openjdk.org (Daniel Skantz)
Date: Mon, 2 Dec 2024 13:13:12 GMT
Subject: RFR: 8345156: C2: Add bailouts next to a few asserts
Message-ID:

This patch associates product bailouts with a few existing debug asserts. The criteria are that there have been product bugs associated with failing these asserts in the past, and that there are not too many callers to the method where the compilation is now cancelled or any measurable impact on compilation time.

Testing: T1-T4 and extra testing. Tested compilation time with -XX:+CITime on performance benchmarks and the effect was not measurable.

-------------

Commit messages:
 - Merge branch 'master' of github.com:openjdk/jdk into cancel-more-compilations
 - add new bailouts

Changes: https://git.openjdk.org/jdk/pull/22482/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=22482&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8345156
  Stats: 68 lines in 8 files changed: 56 ins; 5 del; 7 mod
  Patch: https://git.openjdk.org/jdk/pull/22482.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/22482/head:pull/22482

PR: https://git.openjdk.org/jdk/pull/22482

From dfenacci at openjdk.org Mon Dec 2 13:22:38 2024
From: dfenacci at openjdk.org (Damon Fenacci)
Date: Mon, 2 Dec 2024 13:22:38 GMT
Subject: RFR: 8345287: C2: live in computation is broken
In-Reply-To:
References:
Message-ID:

On Mon, 2 Dec 2024 09:03:50 GMT, Roland Westrelin wrote:

> 8234003 (Improve IndexSet iteration) broke live in computation:
>
> @@ -273,23 +276,25 @@ void PhaseLive::add_liveout( Block *p, IndexSet *lo, VectorSet &first_pass ) {
>  // Add a vector of live-in values to a given blocks live-in set.
>  void PhaseLive::add_livein(Block *p, IndexSet *lo) {
>    IndexSet *livein = &_livein[p->_pre_order-1];
> -  IndexSetIterator elements(lo);
> -  uint r;
> -  while ((r = elements.next()) != 0) {
> -    livein->insert(r); // Then add to live-in set
> +  if (!livein->is_empty()) {
> +    IndexSetIterator elements(lo);
> +    uint r;
> +    while ((r = elements.next()) != 0) {
> +      livein->insert(r); // Then add to live-in set
> +    }
>    }
>  }
>
> `livein` is initially empty and the patch above only adds elements to it if:
>
> if (!livein->is_empty()) {
>
> which is never true.
>
> This doesn't affect correctness as live in sets are only used to drive
> scheduling.

> It's a performance optimization from https://bugs.openjdk.org/browse/JDK-8234003
> ... The biggest improvement comes from avoiding iterating over empty sets altogether.

Of course! Thanks for the explanation (sorry that I missed that)!

-------------

PR Comment: https://git.openjdk.org/jdk/pull/22473#issuecomment-2511526852

From duke at openjdk.org Mon Dec 2 13:38:42 2024
From: duke at openjdk.org (theoweidmannoracle)
Date: Mon, 2 Dec 2024 13:38:42 GMT
Subject: RFR: 8332268: C2: Add missing optimizations for UDivI/L and UModI/L and unify the shared logic with the signed nodes [v15]
In-Reply-To:
References:
Message-ID:

On Mon, 2 Dec 2024 10:05:58 GMT, Emanuel Peter wrote:

>> src/hotspot/share/opto/divnode.cpp line 1156:
>>
>>> 1154: }
>>> 1155: // Don't bother trying to transform a dead node
>>> 1156: if (mod->in(0) && mod->in(0)->is_top()) {
>>
>> Suggestion:
>>
>> if (mod->in(0) != nullptr && mod->in(0)->is_top()) {
>>
>>
>> https://github.com/openjdk/jdk/blob/master/doc/hotspot-style.md
>> `Do not use ints or pointers as (implicit) booleans with &&, ||, if, while. Instead, compare explicitly, i.e. if (x != 0) or if (ptr != nullptr), etc.`
>
> There are more instances in the code. I know you copied it, but when we touch code we make sure to adhere to the new rules ;)

It would be nice to have some automatic checks for this (like the check for extra whitespace). clang-tidy can perform this check automatically, for example.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/22061#discussion_r1865865012

From epeter at openjdk.org Mon Dec 2 13:42:02 2024
From: epeter at openjdk.org (Emanuel Peter)
Date: Mon, 2 Dec 2024 13:42:02 GMT
Subject: RFR: 8343685: C2 SuperWord: refactor VPointer with MemPointer
Message-ID:

I know this one is large, but it consists of a lot of renamings, and new tests. On the whole, the new `VPointer` code is less than the old!

**Goal**

Replace old `VPointer` with a new version that relies on `MemPointer` - which then is a shared utility for both `MergeStores` and `SuperWord / AutoVectorization`. `MemPointer` generally parses pointers, and `VPointer` specializes this facility for the use in loops (`VLoop`).

The old `VPointer` implementation with its recursive pattern matching was quite complicated and difficult to reason about for correctness. The approach in `MemPointer` is much simpler: iteratively decomposing sub-expressions. Further: the new implementation is more powerful at detecting equivalent invariants.

**Future**: with the `MemPointer` implementation of `VPointer`, it should be easier to implement speculative runtime-checks for Aliasing-Analysis [JDK-8324751](https://bugs.openjdk.org/browse/JDK-8324751). The pressing need for this has come from the FFM / MemorySegment folks, like @mcimadamore and @minborg .

**Details**

This looks like a rather big patch, so let me explain the parts.
- Refactor of `MemPointer` in `mempointer.hpp/cpp`:
  - Added concept of `Base` to `MemPointer`. This is required for the aliasing computation in `VPointer`.
  - `sub_expression_has_native_base_candidate`: add special case to parse through `CastX2P` if we find a native memory base `MemorySegment.address()`, i.e. `jdk.internal.foreign.NativeMemorySegmentImpl.min`. This helps some native memory segment cases to vectorize that did not before.
  - So far `MemPointer` could only answer adjacency queries. But VPointer also needs overlap queries, see the old `VPointer::not_equal` (i.e. can we prove that the two `VPointer` never overlap?). So I had to add a new case to aliasing computation: `NotOrAtDistance`. It is useful to answer the new and better named `MemPointer::never_overlaps_with`.
  - Collapsed together `MemPointerDecomposedForm` and `MemPointer`. It was an unnecessary and unhelpful split.
- Re-write of `VPointer` based on `MemPointer`:
  - Old pattern:
    - `VPointer[mem: 847 StoreI, base: 37, adr: 37, base[ 37] + offset( 16) + invar( 0) + scale( 4) * iv]`
    - `VPointer[mem: 3189 LoadB, base: 1, adr: 2273, base[ 1] + offset( 0) + invar( 0) + scale( 1) * iv]` -> `adr = CastX2P`, the addition of `11 Param`, `1819 Phi` and `629 LoadL`.
- New pattern: - `VPointer[size: 4, object, base(110 CastPP) + con( 16) + iv_scale( 4) * iv + invar(0)]` - `VPointer[size: 1, native, base(629 LoadL) + con( 0) + iv_scale( 1) * iv + invar(1 * [11 Parm] + 1 * [1819 Phi])]` -> parse through the `CastX2P`, and detect the more helpful base `629 LoadL` (`MemorySegment.address()`), take `11 Param` and `1819 Phi` as invariants. - Move `invariant` check to `VLoop::is_pre_loop_invariant`. - Re-work the aliasing computation, and give it better names: - Before, it was done via `VPointer::cmp`, which returned quite cryptic return codes. This allowed us to do adjacency and overlap queries. Now I have explicit queries `is_adjacent_to_and_before` and `never_overlaps_with`. - `VPointer::make_with_size` and `VPointer::make_with_iv_offset`: one can now create a copy of a `VPointer` with a changed size (e.g. when going from smaller scalar to larger vector), or with a shifted offset (when simulating a `VPointer` in the next iteration). Now we can do aliasing computations with such modified `VPointers`, which is quite powerful and I will probably use for other things in the future. - Refactor many uses of `VPointer`: - since we now do not have a single `invar`, but rather a set of `invar_summands`, I had to change some code in `AlignmentSolution / AlignmentSolver`, `VMemoryRegion`, `create_adjacent_memop_pairs` and `VTransform::adjust_pre_loop_limit_to_align_main_loop_vectors`. This unfortunately makes the change quite large, but a lot of it is just renaming of variables (e.g. `scale` -> `iv_scale` to avoid confusion with scales of other summands). - `VTransformMemVectorNode`: it now carries the `VPointer` corresponding to the vector load/store, adjusted for its size. We can now do aliasing checks with these nodes. This is how I refactored away `overlap_possible_with_any_in`, and replaced it with a single call to `never_overlaps_with`. 
- Fix existing tests: more cases now vectorize: - Non-power-of-2 stride: the issue used to be that e.g. `stride=3` would make IGVN split `3 * iv -> iv + iv << 1`. This was not properly parsed by the old `VPointer`, as it found the `iv` twice. But the `MemPointerParser` simply collects all usages and adds them up again, back to `3 * iv`. - Some cases in `TestMemorySegment.java` now vectorize, because `VPointer` is better at detecting the invariants and detecting that they are equivalent. - Add new tests: - A while ago, I had worked on: [JDK-8330274](https://bugs.openjdk.org/browse/JDK-8330274) / https://github.com/openjdk/jdk/pull/18795, but then gave it up, in favour of this refactoring here. I had previously noticed, that the old VPointer does not deal very well with multiple invariants, because if the order of addition is off, then a different `invar` node was generated, and then the aliasing computation gets confused, i.e. thinks the aliasing is unknown. I added the tests from there:`TestEquivalentInvariants.java`. **Future work:** [JDK-8330274](https://bugs.openjdk.org/browse/JDK-8330274): investigate why some cases with multiple invariants still do not vectorize, and what can be done. [JDK-8331576](https://bugs.openjdk.org/browse/JDK-8331576): Investigate case where native memory does not vectorize because of `CastX2P`. [JDK-8324751](https://bugs.openjdk.org/browse/JDK-8324751): Aliasing-Analysis with speculative runtime-checks ------------- Commit messages: - rename - fix up print - add TestEquivalentInvariants.java - improve documentation - hide parser via delegation - Merge branch 'master' into JDK-8343685-VPointer-MemPointer - make sort stable - some comment and naming improvements - check if field not found - update comments - ... 
and 102 more: https://git.openjdk.org/jdk/compare/28ae281b...4b3c7d29 Changes: https://git.openjdk.org/jdk/pull/21926/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21926&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8343685 Stats: 4032 lines in 18 files changed: 1849 ins; 1538 del; 645 mod Patch: https://git.openjdk.org/jdk/pull/21926.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21926/head:pull/21926 PR: https://git.openjdk.org/jdk/pull/21926 From epeter at openjdk.org Mon Dec 2 13:42:06 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 2 Dec 2024 13:42:06 GMT Subject: RFR: 8343685: C2 SuperWord: refactor VPointer with MemPointer In-Reply-To: References: Message-ID: <9ImmEt87cCVVFSbL8My0n5eA07h00WLiu7paMwNoxQ4=.f66eeb55-240f-4e44-829d-29cb64a9c398@github.com> On Wed, 6 Nov 2024 13:07:13 GMT, Emanuel Peter wrote: > I know this one is large, but it consists of a lot of renamings, and new tests. On the whole, the new `VPointer` code is less than the old! > > **Goal** > > Replace old `VPointer` with a new version that relies on `MemPointer` - which then is a shared utility for both `MergeStores` and `SuperWord / AutoVectorization`. `MemPointer` generally parses pointers, and `VPointer` specializes this facility for the use in loops (`VLoop`). > > The old `VPointer` implementation with its recursive pattern matching was quite complicated and difficult to reason about for correctness. The approach in `MemPointer` is much simpler: iteratively decomposing sub-expressions. Further: the new implementation is more powerful at detecting equivalent invariants. > > **Future**: with the `MemPointer` implementation of `VPointer`, it should be easier to implement speculative runtime-checks for Aliasing-Analysis [JDK-8324751](https://bugs.openjdk.org/browse/JDK-8324751). The pressing need for this has come from the FFM / MemorySegment folks, like @mcimadamore and @minborg . 
> > **Details** > > This looks like a rather big patch, so let me explain the parts. > - Refactor of `MemPointer` in `mepointer.hpp/cpp`: > - Added concept of `Base` to `MemPointer`. This is required for the aliasing computation in `VPointer`. > - `sub_expression_has_native_base_candidate`: add special case to parse through `CastX2P` if we find a native memory base `MemorySegment.address()`, i.e. `jdk.internal.foreign.NativeMemorySegmentImpl.min`. This helps some native memory segment cases to vectorize that did not before. > - So far `MemPointer` could only answer adjacency queries. But VPointer also needs overlap queries, see the old `VPointer::not_equal` (i.e. can we prove that the two `VPointer` never overlap?). So I had to add a new case to aliasing computation: `NotOrAtDistance`. It is useful to answer the new and better named `MemPointer::never_overlaps_with`. > - Collapsed together `MemPointerDecomposedForm` and `MemPointer`. It was an unnecessary and unhelpful split. > - Re-write of `VPointer` based on `MemPointer`: > - Old pattern: > - `VPointer[mem: 847 StoreI, base: 37, adr: 37, base[ 37] + offset( 16) + invar( 0) + scale( 4) * iv]` > - `VPointer[mem: 3189 LoadB, base: 1, adr: 2273, base[ 1] + offset( 0) + invar( 0) + scale( 1) * iv]` -> `adr = CastX2P`, the addition of `11 Param`, `1819 Phi` and `629 LoadL`. > - New pattern: > - `VPointer[size: 4, object, base(110 CastPP) + con( 16)... src/hotspot/share/opto/mempointer.cpp line 188: > 186: } > 187: // Fall-through: we can find a more precise native-memory "base". We further decompose > 188: // the CastX2P to find this "base" and any other offsets from it. Note: special handling for new `Base` detection, required for `VPointer`. src/hotspot/share/opto/mempointer.cpp line 206: > 204: callback.callback(n); > 205: return; > 206: } Note: indentation fix src/hotspot/share/opto/mempointer.cpp line 458: > 456: // (S2) holds. If all summands are the same, also the base must be the same. 
> 457: has_same_base = true; > 458: } else { Note: refactored into `has_same_summands_as` (-> `make_always_at_distance`), and added new case `has_different_object_base_but_otherwise_same_summands_as` (-> `make_not_or_at_distance`). This is necessary for the overlap query needed in `VPointer`. src/hotspot/share/opto/mempointer.cpp line 541: > 539: bool MemPointer::is_adjacent_to_and_before(const MemPointer& other) const { > 540: const MemPointerAliasing aliasing = get_aliasing_with(other NOT_PRODUCT( COMMA _trace )); > 541: const bool is_adjacent = aliasing.is_always_at_distance(_size); Note: collapsed `MemPointerDecomposedForm` and `MemPointer` into one. src/hotspot/share/opto/mempointer.cpp line 582: > 580: return is_never_overlap; > 581: } > 582: Note: replaces the old `VPointer::not_equal`, which was used for overlap queries. src/hotspot/share/opto/mempointer.hpp line 174: > 172: // and determine if they are adjacent. > 173: // > 174: // MemPointerDecomposedFormParser: Note: moved down src/hotspot/share/opto/mempointer.hpp line 209: > 207: // > 208: // then we can easily compute the distance between the pointers (distance = con2 - con1), > 209: // and determine if they are adjacent. Note: moved down from higher up. src/hotspot/share/opto/mempointer.hpp line 244: > 242: // (2) LoadL from field jdk.internal.foreign.NativeMemorySegmentImpl.min > 243: // This would be preferrable over CastX2P, because it holds the address() of a native > 244: // MemorySegment, i.e. we know it points to the beginning of that MemorySegment. Note: description about new `Base` detection needed in `VPointer`. src/hotspot/share/opto/mempointer.hpp line 430: > 428: // e.g. "array1[i] vs array2[i+4]": > 429: // if "array1 == array2": distance = 4. > 430: // if "array1 != array2": different memory objects. Note: added `NotOrAtDistance`, needed for overlap query. 
src/hotspot/share/opto/mempointer.hpp line 521: > 519: if (cmp != 0) { return cmp; } > 520: > 521: return NoOverflowInt::cmp(p1.scale(), p2.scale()); Note: `cmp` is used for sorting. src/hotspot/share/opto/mempointer.hpp line 559: > 557: // Singleton for default arguments. > 558: static MemPointerParserCallback& empty() { return _empty; } > 559: }; Note: used for `VPointer`. In `SuperWord::unrolling_analysis` to track all the "ignore" nodes, which are the internal nodes in the address expressions. src/hotspot/share/opto/mempointer.hpp line 694: > 692: _summands[i] = old.summands_at(i); > 693: } > 694: } Note: mutated copy constructor: used for `make_with_size` and `make_with_con`, i.e. modifying the size (scalar -> vector), and modifying the `con`, so that we can shift a `VPointer` and simulate the `VPointer` in a next iteration. src/hotspot/share/opto/mempointer.hpp line 734: > 732: bool has_same_summands_as(const MemPointer& other, uint start) const; > 733: bool has_same_summands_as(const MemPointer& other) const { return has_same_summands_as(other, 0); } > 734: bool has_different_object_base_but_otherwise_same_summands_as(const MemPointer& other) const; Note: refactored from `get_aliasing_with` and adding new case `has_different_object_base_but_otherwise_same_summands_as` for overlap query. src/hotspot/share/opto/mempointer.hpp line 776: > 774: > 775: bool is_adjacent_to_and_before(const MemPointer& other) const; > 776: bool never_overlaps_with(const MemPointer& other) const; Note: these are the two basic queries: `adjacent` and `overlap`. src/hotspot/share/opto/noOverflowInt.hpp line 113: > 111: if (a.value() > b.value()) { return 1; } > 112: return 0; > 113: } Note: needed to compare `MemPointerSummand` (when variable equal), used by `MemPointer` compare, used to sort `VPointer`s. 
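The comparison chain noted above (a three-way `cmp` on integers, reused to order summands, which in turn orders pointers) can be sketched roughly as follows. This is a simplified illustration, not the HotSpot code: `cmp_int`, `SummandSketch` and `cmp_summand` are hypothetical names, and the real `NoOverflowInt::cmp` works on `NoOverflowInt` values rather than plain integers.

```cpp
#include <cassert>
#include <cstdint>

// Three-way comparison with the same shape as the quoted cmp:
// negative if a < b, positive if a > b, zero if equal.
int cmp_int(int64_t a, int64_t b) {
  if (a < b) { return -1; }
  if (a > b) { return 1; }
  return 0;
}

// Hypothetical stand-in for a summand "scale * variable".
// variable_idx is an assumed way to identify the variable node.
struct SummandSketch {
  int     variable_idx;
  int64_t scale;
};

// Order by variable first; only when the variables are equal
// does the scale decide.
int cmp_summand(const SummandSketch& p1, const SummandSketch& p2) {
  int cmp = cmp_int(p1.variable_idx, p2.variable_idx);
  if (cmp != 0) { return cmp; }
  return cmp_int(p1.scale, p2.scale);
}
```

A comparator of this shape yields a total order, which is what a sort over pointers needs in order to group equal summands next to each other.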
src/hotspot/share/opto/superword.cpp line 97: > 95: } > 96: }; > 97: Note: refactoring: instead of passing in the old `ignored_loop_nodes` to the `VPointer` during parsing, we pass in a `callback`, and track the nodes this way. src/hotspot/share/opto/superword.cpp line 2880: > 2878: // The base is only aligned with ObjectAlignmentInBytes with arrays. > 2879: // When the base() is top, we have no alignment guarantee at all. > 2880: // Hence, we must now take the base into account for the calculation. Note: `align_to_ref_p.base()->is_top()` meant it was a native memory access, i.e. off-heap, and not related to any java object. src/hotspot/share/opto/superword.hpp line 587: > 585: static int cmp_by_group(MemOp* a, MemOp* b); > 586: static int cmp_by_group_and_con_and_original_index(MemOp* a, MemOp* b); > 587: }; Note: we used to put `VPointer` directly in a `GrowableArray` to sort them. Now that the `VPointer` does not have a reference to the `MemNode*`, we have to create a "tuple" of `(VPointer, mem)`. The `original_index` is needed to keep the sort stable in `qsort`. Before this was done with the `mem->_idx`, which is a bit less well behaved. src/hotspot/share/opto/superwordVTransformBuilder.cpp line 148: > 146: const VPointer& scalar_p = _vloop_analyzer.vpointers().vpointer(p0->as_Store()); > 147: const VPointer vector_p(scalar_p.make_with_size(scalar_p.size() * pack_size)); > 148: vtn = new (_vtransform.arena()) VTransformStoreVectorNode(_vtransform, pack_size, vector_p); Note: we now pass in the `VPointer` of the size of the vector directly. Before, we referenced the `VPointer`s of the packed nodes. But that reliance will be in the way in the future, and it is more clean to carry the modified `VPointer` like this. 
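The point about `original_index` in the `MemOp` note above can be illustrated with a small sketch: `qsort` gives no stability guarantee, so ties are broken by the position each element had before sorting. Everything here (`MemOpSketch`, its fields, `compare_mem_ops`, `sort_mem_ops`) is a hypothetical simplification and not the code in `superword.hpp`.

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical stand-in for a (VPointer, mem) tuple.
struct MemOpSketch {
  int group;           // assumed: identifies pointers that differ only in con
  int con;             // constant offset inside the group
  int node_id;         // payload, plays the role of the memory node
  int original_index;  // position before sorting, used only as tie-breaker
};

static int three_way(int a, int b) { return (a > b) - (a < b); }

// Compare by group, then con, then original position.
int compare_mem_ops(const void* pa, const void* pb) {
  const MemOpSketch* a = static_cast<const MemOpSketch*>(pa);
  const MemOpSketch* b = static_cast<const MemOpSketch*>(pb);
  int c = three_way(a->group, b->group);
  if (c != 0) { return c; }
  c = three_way(a->con, b->con);
  if (c != 0) { return c; }
  return three_way(a->original_index, b->original_index);
}

// Record the original positions, then sort. Because no two elements
// compare equal, the result does not depend on qsort being stable.
void sort_mem_ops(MemOpSketch* ops, int n) {
  assert(n >= 0);
  for (int i = 0; i < n; i++) { ops[i].original_index = i; }
  std::qsort(ops, static_cast<std::size_t>(n), sizeof(MemOpSketch), compare_mem_ops);
}
```

Using an explicit index rather than something like a node id keeps the order tied to the input sequence, which is presumably what "less well behaved" refers to for `mem->_idx`.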
src/hotspot/share/opto/traceAutoVectorizationTag.hpp line 35: > 33: flags(POINTER_ALIASING, "Trace VPointer/MemPointer aliasing") \ > 34: flags(POINTER_ADJACENCY, "Trace VPointer/MemPointer adjacency") \ > 35: flags(POINTER_OVERLAP, "Trace VPointer/MemPointer overlap") \ Note: naming is now more consistent with `TraceMergeStore`. src/hotspot/share/opto/traceMergeStoresTag.hpp line 36: > 34: flags(POINTER_PARSING, "Trace pointer IR") \ > 35: flags(POINTER_ALIASING, "Trace MemPointerSimpleForm::get_aliasing_with") \ > 36: flags(POINTER_ADJACENCY, "Trace adjacency") \ Note: naming is now more consistent with `TraceAutoVectorization`. src/hotspot/share/opto/vectorization.hpp line 207: > 205: // Is it before the pre-loop? > 206: return phase()->is_dominator(ctrl, pre_loop_head()); > 207: } Note: replaces the old `VPointer::invariant` src/hotspot/share/opto/vectorization.hpp line 737: > 735: Node_Stack* _nstack; // stack used to record a vpointer trace of variants > 736: bool _analyze_only; // Used in loop unrolling only for vpointer trace > 737: uint _stack_idx; // Used in loop unrolling only for vpointer trace Note: this is now replaced with the `MemPointerParserCallback`. src/hotspot/share/opto/vectorization.hpp line 836: > 834: jlong max_diff = (jlong)1 << 31; > 835: if (difference >= max_diff) { > 836: return NotComparable; Note: this used to be the basis of `adjacency` and `offset` queries. Now we do this in `MemPointer`, and with better naming ;) src/hotspot/share/opto/vectorization.hpp line 854: > 852: // the two memory regions. > 853: if (!not_equal(p_mem)) { > 854: return true; Note: refactored away. We used to pass in all nodes in a pack/vector, find all their `VPointer`. Now the vector has its own size-adjusted `VPointer`, and we can simply do a `never_overlaps_with` query directly on that. 
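To make the two queries (`is_adjacent_to_and_before`, `never_overlaps_with`) and the `NotOrAtDistance` case from the earlier notes more concrete, here is a minimal sketch under strong simplifying assumptions. A pointer is reduced to a base identifier, a single identifier standing in for the whole set of variable summands, a constant and a size. The type and function names below are hypothetical, and overflow handling is ignored entirely.

```cpp
#include <cassert>
#include <cstdint>

enum class AliasingKind { Unknown, AlwaysAtDistance, NotOrAtDistance };

struct AliasingSketch {
  AliasingKind kind;
  int64_t distance;  // other.con - this.con in bytes, unused for Unknown
};

// Hypothetical, heavily simplified pointer model.
struct PointerSketch {
  int     base_id;  // identifies the memory object, e.g. an array
  int     var_id;   // assumed: stands in for all variable summands
  int64_t con;      // constant byte offset
  int64_t size;     // access size in bytes
};

AliasingSketch get_aliasing(const PointerSketch& a, const PointerSketch& b) {
  if (a.var_id != b.var_id) { return {AliasingKind::Unknown, 0}; }
  const int64_t distance = b.con - a.con;
  if (a.base_id == b.base_id) {
    return {AliasingKind::AlwaysAtDistance, distance};
  }
  // Different bases, e.g. "array1[i] vs array2[i+4]": either the arrays are
  // different objects, or they are the same object and the distance holds.
  return {AliasingKind::NotOrAtDistance, distance};
}

// Adjacency needs a guaranteed distance equal to the size of the first access.
bool is_adjacent_to_and_before(const PointerSketch& a, const PointerSketch& b) {
  const AliasingSketch al = get_aliasing(a, b);
  return al.kind == AliasingKind::AlwaysAtDistance && al.distance == a.size;
}

// No overlap if, at the known distance, the byte ranges are disjoint.
// For NotOrAtDistance the alternative is "different objects", which
// cannot overlap either.
bool never_overlaps_with(const PointerSketch& a, const PointerSketch& b) {
  assert(a.size > 0 && b.size > 0);
  const AliasingSketch al = get_aliasing(a, b);
  if (al.kind == AliasingKind::Unknown) { return false; }
  return al.distance >= a.size || al.distance + b.size <= 0;
}
```

In this model the adjacency query needs the stronger `AlwaysAtDistance` answer, while the overlap query can also succeed with `NotOrAtDistance`, which matches the reason given above for adding the new case.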
src/hotspot/share/opto/vectorization.hpp line 940: > 938: tty->print_cr("VPointer::init_is_valid: scale or stride too large."); > 939: } > 940: #endif Note: this was already present in `VPointer::VPointer` before. src/hotspot/share/opto/vtransform.cpp line 352: > 350: // care too much about those edge cases. > 351: memory_regions.push(new VMemoryRegion(iv_offset_p, is_load, schedule_order++)); > 352: } Note: instead of doing the custom logic with `iv_offset` all in `VMemoryRegion`, we now create a modified copy of the `VPointer` that simulates the `VPointer` at `iv = iv + iv_offset`. src/hotspot/share/opto/vtransform.cpp line 579: > 577: mem = mem->in(MemNode::Memory); > 578: } else { > 579: break; Note: We now have access to the size-adjusted `VPointer` for the vector. src/hotspot/share/opto/vtransform.hpp line 488: > 486: class VTransformMemVectorNode : public VTransformVectorNode { > 487: private: > 488: const VPointer _vpointer; // with size of the vector Note: make sure we can store the size-adjusted `VPointer` for load/store vectors.
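The two "modified copy" operations mentioned in these notes (changing the size when going from scalar to vector, and shifting the pointer to simulate `iv = iv + iv_offset`) come down to simple arithmetic on the pointer form `con + iv_scale * iv`. The sketch below assumes a pointer that has only these three fields. `LoopPointerSketch` and its members are hypothetical, and the overflow checks that the real code would need are left out.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical loop pointer: address offset = con + iv_scale * iv.
struct LoopPointerSketch {
  int64_t con;       // constant byte offset
  int64_t iv_scale;  // bytes advanced per unit of iv
  int64_t size;      // access size in bytes

  // Same address, different access size (e.g. scalar -> vector).
  LoopPointerSketch make_with_size(int64_t new_size) const {
    assert(new_size > 0);
    return {con, iv_scale, new_size};
  }

  // Pointer as it would look at iv + iv_offset:
  // con + iv_scale * (iv + iv_offset) = (con + iv_scale * iv_offset) + iv_scale * iv
  LoopPointerSketch make_with_iv_offset(int64_t iv_offset) const {
    return {con + iv_scale * iv_offset, iv_scale, size};
  }

  int64_t address_offset(int64_t iv) const { return con + iv_scale * iv; }
};
```

For example, with `con = 16`, `iv_scale = 4` and `size = 4`, a pack of 8 elements gets a copy with `size = 32`, and shifting by `iv_offset = 2` folds 8 bytes into `con`, so the shifted copy evaluated at `iv` addresses the same bytes as the original at `iv + 2`.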
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21926#discussion_r1865401201 PR Review Comment: https://git.openjdk.org/jdk/pull/21926#discussion_r1865400704 PR Review Comment: https://git.openjdk.org/jdk/pull/21926#discussion_r1865403574 PR Review Comment: https://git.openjdk.org/jdk/pull/21926#discussion_r1865404708 PR Review Comment: https://git.openjdk.org/jdk/pull/21926#discussion_r1865405191 PR Review Comment: https://git.openjdk.org/jdk/pull/21926#discussion_r1865405473 PR Review Comment: https://git.openjdk.org/jdk/pull/21926#discussion_r1865447415 PR Review Comment: https://git.openjdk.org/jdk/pull/21926#discussion_r1865448032 PR Review Comment: https://git.openjdk.org/jdk/pull/21926#discussion_r1865448653 PR Review Comment: https://git.openjdk.org/jdk/pull/21926#discussion_r1865450303 PR Review Comment: https://git.openjdk.org/jdk/pull/21926#discussion_r1865453293 PR Review Comment: https://git.openjdk.org/jdk/pull/21926#discussion_r1865455958 PR Review Comment: https://git.openjdk.org/jdk/pull/21926#discussion_r1865457779 PR Review Comment: https://git.openjdk.org/jdk/pull/21926#discussion_r1865458817 PR Review Comment: https://git.openjdk.org/jdk/pull/21926#discussion_r1865464174 PR Review Comment: https://git.openjdk.org/jdk/pull/21926#discussion_r1865465505 PR Review Comment: https://git.openjdk.org/jdk/pull/21926#discussion_r1865469704 PR Review Comment: https://git.openjdk.org/jdk/pull/21926#discussion_r1865472623 PR Review Comment: https://git.openjdk.org/jdk/pull/21926#discussion_r1865475878 PR Review Comment: https://git.openjdk.org/jdk/pull/21926#discussion_r1865473660 PR Review Comment: https://git.openjdk.org/jdk/pull/21926#discussion_r1865474120 PR Review Comment: https://git.openjdk.org/jdk/pull/21926#discussion_r1865478374 PR Review Comment: https://git.openjdk.org/jdk/pull/21926#discussion_r1865479782 PR Review Comment: https://git.openjdk.org/jdk/pull/21926#discussion_r1865481593 PR Review Comment: 
https://git.openjdk.org/jdk/pull/21926#discussion_r1865483659 PR Review Comment: https://git.openjdk.org/jdk/pull/21926#discussion_r1865485213 PR Review Comment: https://git.openjdk.org/jdk/pull/21926#discussion_r1865488580 PR Review Comment: https://git.openjdk.org/jdk/pull/21926#discussion_r1865489622 PR Review Comment: https://git.openjdk.org/jdk/pull/21926#discussion_r1865490285 From chagedorn at openjdk.org Mon Dec 2 13:42:06 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 2 Dec 2024 13:42:06 GMT Subject: RFR: 8343685: C2 SuperWord: refactor VPointer with MemPointer In-Reply-To: References: Message-ID: On Wed, 6 Nov 2024 13:07:13 GMT, Emanuel Peter wrote: > I know this one is large, but it consists of a lot of renamings, and new tests. On the whole, the new `VPointer` code is less than the old! > > **Goal** > > Replace old `VPointer` with a new version that relies on `MemPointer` - which then is a shared utility for both `MergeStores` and `SuperWord / AutoVectorization`. `MemPointer` generally parses pointers, and `VPointer` specializes this facility for the use in loops (`VLoop`). > > The old `VPointer` implementation with its recursive pattern matching was quite complicated and difficult to reason about for correctness. The approach in `MemPointer` is much simpler: iteratively decomposing sub-expressions. Further: the new implementation is more powerful at detecting equivalent invariants. > > **Future**: with the `MemPointer` implementation of `VPointer`, it should be easier to implement speculative runtime-checks for Aliasing-Analysis [JDK-8324751](https://bugs.openjdk.org/browse/JDK-8324751). The pressing need for this has come from the FFM / MemorySegment folks, like @mcimadamore and @minborg . > > **Details** > > This looks like a rather big patch, so let me explain the parts. > - Refactor of `MemPointer` in `mepointer.hpp/cpp`: > - Added concept of `Base` to `MemPointer`. This is required for the aliasing computation in `VPointer`. 
> - `sub_expression_has_native_base_candidate`: add special case to parse through `CastX2P` if we find a native memory base `MemorySegment.address()`, i.e. `jdk.internal.foreign.NativeMemorySegmentImpl.min`. This helps some native memory segment cases to vectorize that did not before. > - So far `MemPointer` could only answer adjacency queries. But VPointer also needs overlap queries, see the old `VPointer::not_equal` (i.e. can we prove that the two `VPointer` never overlap?). So I had to add a new case to aliasing computation: `NotOrAtDistance`. It is useful to answer the new and better named `MemPointer::never_overlaps_with`. > - Collapsed together `MemPointerDecomposedForm` and `MemPointer`. It was an unnecessary and unhelpful split. > - Re-write of `VPointer` based on `MemPointer`: > - Old pattern: > - `VPointer[mem: 847 StoreI, base: 37, adr: 37, base[ 37] + offset( 16) + invar( 0) + scale( 4) * iv]` > - `VPointer[mem: 3189 LoadB, base: 1, adr: 2273, base[ 1] + offset( 0) + invar( 0) + scale( 1) * iv]` -> `adr = CastX2P`, the addition of `11 Param`, `1819 Phi` and `629 LoadL`. > - New pattern: > - `VPointer[size: 4, object, base(110 CastPP) + con( 16)... src/hotspot/share/opto/vectorization.cpp line 212: > 210: > 211: _body.for_each_mem([&] (const MemNode* mem, int bb_idx) { > 212: const VPointer& xp = vpointer(mem); Maybe also check other `xp` names. Suggestion: const VPointer& p = vpointer(mem); ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21926#discussion_r1865514348 From dnsimon at openjdk.org Mon Dec 2 13:59:48 2024 From: dnsimon at openjdk.org (Doug Simon) Date: Mon, 2 Dec 2024 13:59:48 GMT Subject: RFR: 8345267: Fix memory leak in JVMCIEnv dtor [v2] In-Reply-To: References: Message-ID: On Mon, 2 Dec 2024 08:39:19 GMT, Doug Simon wrote: >> The `ALLOW_C_FUNCTION` macro takes the identifier for the relevant C function, followed by a statement containing the use as additional (variadic) macro args. 
This PR fixes a use of this macro where the leading identifier arg was being omitted. > > Doug Simon has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision: > > fix use of ALLOW_C_FUNCTION Thanks for the reviews! ------------- PR Comment: https://git.openjdk.org/jdk/pull/22471#issuecomment-2511612001 From dnsimon at openjdk.org Mon Dec 2 13:59:49 2024 From: dnsimon at openjdk.org (Doug Simon) Date: Mon, 2 Dec 2024 13:59:49 GMT Subject: Integrated: 8345267: Fix memory leak in JVMCIEnv dtor In-Reply-To: References: Message-ID: On Mon, 2 Dec 2024 07:08:22 GMT, Doug Simon wrote: > The `ALLOW_C_FUNCTION` macro takes the identifier for the relevant C function, followed by a statement containing the use as additional (variadic) macro args. This PR fixes a use of this macro where the leading identifier arg was being omitted. This pull request has now been integrated. Changeset: b8233989 Author: Doug Simon URL: https://git.openjdk.org/jdk/commit/b8233989e7605268dda908e6b639ca373789792b Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod 8345267: Fix memory leak in JVMCIEnv dtor Reviewed-by: simonis, kbarrett ------------- PR: https://git.openjdk.org/jdk/pull/22471 From roland at openjdk.org Mon Dec 2 14:01:12 2024 From: roland at openjdk.org (Roland Westrelin) Date: Mon, 2 Dec 2024 14:01:12 GMT Subject: RFR: 8345299: C2: some nodes can still have incorrect control after do_range_check() Message-ID: 8339733 fixed controls for updated pre/main limits during do_range_check(). However, it missed one issue: Control for the new limits is computed in `new_limit_ctrl` for both pre and main loops. 
`new_limit_ctrl` is currently initialized from the pre limit control but it also needs to take the main loop limit control into account because sometimes the main loop limit control is below the pre limit and pre loop entry control. 8339733 also introduced a couple of bugs. Control of the `Bool`/`Cmp` nodes is updated for the pre and main loop to `new_limit_ctrl`. But that's incorrect because `new_limit_ctrl` may be above the pre loop while the `Bool`/`Cmp` for the pre loop are in the loop (because they depend on the loop iv) and for the main loop are after the pre loop (because they depend on the iv out of the pre loop). I fixed this for the pre loop by setting control for the `Bool`/`Cmp` to be as late as possible. For the main loop, no change appears to be required as control computed by c2 is already late enough. I've added an assert instead. ------------- Commit messages: - comment - fix Changes: https://git.openjdk.org/jdk/pull/22485/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=22485&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8345299 Stats: 11 lines in 2 files changed: 1 ins; 1 del; 9 mod Patch: https://git.openjdk.org/jdk/pull/22485.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22485/head:pull/22485 PR: https://git.openjdk.org/jdk/pull/22485 From rcastanedalo at openjdk.org Mon Dec 2 14:05:38 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Mon, 2 Dec 2024 14:05:38 GMT Subject: RFR: 8345158: IGV: local scheduling should not place successors before predecessors In-Reply-To: <0VUX5bX4oDw7aUoQ8KWtsG6PLMfM5bH3ge7ZYY8Gz1Y=.305aa13d-f638-4645-b956-a7209266358f@github.com> References: <0VUX5bX4oDw7aUoQ8KWtsG6PLMfM5bH3ge7ZYY8Gz1Y=.305aa13d-f638-4645-b956-a7209266358f@github.com> Message-ID: <-vxv4BuTLs6rUXAMf6vGGpQyYC1hHr5cmdSupAQ-KgE=.453b3722-18cd-448d-bd0e-09dd28596fa2@github.com> On Mon, 2 Dec 2024 12:58:57 GMT, Daniel Skantz wrote: > A slight tweak to consider def-use in scheduleBlock.
> > Testing: opened a few graphs in control-flow graph mode without any assertion failure. Looks good, thanks for fixing this Daniel! I tested this by examining a few schedules manually (comparing IGV-approximated schedules after matching vs. "real" C2-generated schedules after GCM) and also checked that the changes do not increase the overhead of approximating schedules in IGV (which is still negligible compared to graph layout time). ------------- Marked as reviewed by rcastanedalo (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/22481#pullrequestreview-2472789707 From roland at openjdk.org Mon Dec 2 14:06:41 2024 From: roland at openjdk.org (Roland Westrelin) Date: Mon, 2 Dec 2024 14:06:41 GMT Subject: RFR: 8345287: C2: live in computation is broken In-Reply-To: References: <66KJ0kayCyRztG81OvFyladp1BbDq5nW810HKEZPmlk=.51449889-9e47-408e-9751-470e1f912113@github.com> Message-ID: <-JM7PaO43IeAwzF30ordG0ERn-ViXxsVfQtE0M7RkgQ=.9fb476a0-b8a9-4618-8aaa-5ceebef181dd@github.com> On Mon, 2 Dec 2024 12:40:49 GMT, Roberto Castañeda Lozano wrote: > OK, thanks. I will start by running some benchmarks on x64 with and without this fix. Will report results in a couple of days. Sounds good. Thanks. If there happens to be a regression, I think it would make more sense to fix the code (with this patch) and disable `OptoRegScheduling` (until someone figures out what's going on) than keep code that doesn't make any sense. ------------- PR Comment: https://git.openjdk.org/jdk/pull/22473#issuecomment-2511633892 From duke at openjdk.org Mon Dec 2 15:59:43 2024 From: duke at openjdk.org (theoweidmannoracle) Date: Mon, 2 Dec 2024 15:59:43 GMT Subject: RFR: 8332268: C2: Add missing optimizations for UDivI/L and UModI/L and unify the shared logic with the signed nodes [v15] In-Reply-To: References: Message-ID: On Mon, 2 Dec 2024 10:21:38 GMT, Emanuel Peter wrote: >> Looks good to me otherwise. Nice tests! > > Ok. I'm not sure about the Template vs BasicType.
Generally BasicType is what we have used so far, but I would say Templates are cleaner. Discussed it with @TobiHartmann , and he thinks Templates are fine too. > > But you need to move the `make_...` to the corresponding file, so others can find it. > > And I think the constants in the tests should be given as `1L << 42`, and not some long unreadable chain of digits ;) @eme64 Without any template handling the signed/unsigned conversion is a bit tricky I think. get_con_as_long would, as far as I can tell, sign extend ints to longs, which changes the unsigned interpretation for values whose signed interpretation is negative. I think if a template is already used for this part, it makes sense to also use it for node creation. ------------- PR Comment: https://git.openjdk.org/jdk/pull/22061#issuecomment-2511919238 From duke at openjdk.org Mon Dec 2 16:11:42 2024 From: duke at openjdk.org (theoweidmannoracle) Date: Mon, 2 Dec 2024 16:11:42 GMT Subject: RFR: 8332268: C2: Add missing optimizations for UDivI/L and UModI/L and unify the shared logic with the signed nodes [v15] In-Reply-To: References: Message-ID: On Mon, 2 Dec 2024 09:55:26 GMT, Emanuel Peter wrote: >> test/hotspot/jtreg/compiler/c2/irTests/ModINodeIdealizationTests.java line 43: >> >>> 41: >>> 42: public static void main(String[] args) { >>> 43: TestFramework.runWithFlags("-XX:CompileCommand=inline,*Math::max"); >> >> Is this really necessary? Should this not happen automatically? > > If you do need it: can you add a comment why? Surprisingly, this does make a difference for long as the intrinsic is currently not used. But since this test is for int, it's not needed. 
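The sign-extension concern raised above (widening a 32-bit constant to 64 bits changes its unsigned interpretation when the signed value is negative) can be shown with plain integer conversions. This is a minimal sketch with hypothetical helper names; it says nothing about how `get_con_as_long` is actually implemented beyond what is stated in the message.

```cpp
#include <cassert>
#include <cstdint>

// Widen a 32-bit value the way a signed conversion does: the sign bit
// is copied into the upper 32 bits.
int64_t sign_extend(int32_t x) {
  return static_cast<int64_t>(x);
}

// Widen the same 32 bits as an unsigned quantity: the upper 32 bits
// are zero, so the unsigned value is preserved.
uint64_t zero_extend(int32_t x) {
  return static_cast<uint64_t>(static_cast<uint32_t>(x));
}
```

For the 32-bit pattern `0xFFFFFFFF`, the unsigned value is 4294967295, but after sign extension the 64-bit pattern reads as 18446744073709551615 when interpreted as unsigned. An unsigned divisor or dividend widened that way would no longer denote the same number, which is the reason given above for handling the unsigned case separately.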
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22061#discussion_r1866129943 From duke at openjdk.org Mon Dec 2 16:28:56 2024 From: duke at openjdk.org (theoweidmannoracle) Date: Mon, 2 Dec 2024 16:28:56 GMT Subject: RFR: 8332268: C2: Add missing optimizations for UDivI/L and UModI/L and unify the shared logic with the signed nodes [v16] In-Reply-To: References: Message-ID: > This PR introduces > - several new optimizations to unsigned division and modulo > - x % 1, x % x, x % 2^k > - x / 1, x / x, x / 2^k > - does not implement the Granlund and Montgomery algorithm, which has been implemented for signed modulo division in the past. It is unclear if a lot is to be gained by implementing this. > - tests to test existing optimizations for signed division and modulo > - does not test the Granlund and Montgomery algorithm directly theoweidmannoracle has updated the pull request incrementally with one additional commit since the last revision: Address comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22061/files - new: https://git.openjdk.org/jdk/pull/22061/files/ee7f3b35..8e709eb0 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22061&range=15 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22061&range=14-15 Stats: 67 lines in 5 files changed: 26 ins; 26 del; 15 mod Patch: https://git.openjdk.org/jdk/pull/22061.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22061/head:pull/22061 PR: https://git.openjdk.org/jdk/pull/22061 From kvn at openjdk.org Mon Dec 2 16:36:42 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 2 Dec 2024 16:36:42 GMT Subject: RFR: 8345172: x86: Some CPU feature asserts are declared as 32-bit only [v3] In-Reply-To: References: Message-ID: On Fri, 29 Nov 2024 15:17:00 GMT, Aleksey Shipilev wrote: >> Noticed this while cleaning up the 32-bit x86 code. We baseline our 64-bit x86 to be at least UseSSE=2. Therefore we still need to check for features UseSSE > 2. 
I have found a few places where we do NOT_LP64 for these checks. I checked other `VMVersion::supports_*()` uses, and I think these are the only two outliers. > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Wrap more SSE 1/2 asserts in NOT_LP64 Looks good. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/22432#pullrequestreview-2473225377 From duke at openjdk.org Mon Dec 2 16:38:03 2024 From: duke at openjdk.org (theoweidmannoracle) Date: Mon, 2 Dec 2024 16:38:03 GMT Subject: RFR: 8332268: C2: Add missing optimizations for UDivI/L and UModI/L and unify the shared logic with the signed nodes [v17] In-Reply-To: References: Message-ID: On Mon, 2 Dec 2024 10:21:38 GMT, Emanuel Peter wrote: >> Looks good to me otherwise. Nice tests! > > Ok. I'm not sure about the Template vs BasicType. Generally BasicType is what we have used so far, but I would say Templates are cleaner. Discussed it with @TobiHartmann , and he thinks Templates are fine too. > > But you need to move the `make_...` to the corresponding file, so others can find it. > > And I think the constants in the tests should be given as `1L << 42`, and not some long unreadable chain of digits ;) Thanks for all your comments, @eme64. I think I addressed them all. 
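For readers following the 8332268 thread, the power-of-two cases named in the PR description (`x / 2^k` and `x % 2^k` for unsigned operands, with `x / 1` and `x % 1` as the `k = 0` special case) rest on two well-known identities: unsigned division by `2^k` is a logical right shift by `k`, and unsigned remainder by `2^k` is a mask with `2^k - 1`. The sketch below only checks the arithmetic; the function names are hypothetical and the exact node shapes C2 produces are not shown in this thread.

```cpp
#include <cassert>
#include <cstdint>

// x / 2^k for unsigned 32-bit x: logical shift right by k.
uint32_t udiv_pow2(uint32_t x, unsigned k) {
  assert(k < 32);
  return x >> k;
}

// x % 2^k for unsigned 32-bit x: keep the low k bits.
uint32_t umod_pow2(uint32_t x, unsigned k) {
  assert(k < 32);
  return x & ((uint32_t{1} << k) - 1);
}
```

Unlike the signed case, no rounding correction is needed for values whose top bit is set, because unsigned division already rounds toward zero and the shift is logical rather than arithmetic.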
------------- PR Comment: https://git.openjdk.org/jdk/pull/22061#issuecomment-2512071345 From duke at openjdk.org Mon Dec 2 16:38:03 2024 From: duke at openjdk.org (theoweidmannoracle) Date: Mon, 2 Dec 2024 16:38:03 GMT Subject: RFR: 8332268: C2: Add missing optimizations for UDivI/L and UModI/L and unify the shared logic with the signed nodes [v17] In-Reply-To: References: Message-ID: <9AO7AsQv3puIVURjKB7wvQCWcMO2ZG4gpUJxpxTLghw=.28527eeb-1ec3-42cf-b714-ccd5f5541e71@github.com> > This PR introduces > - several new optimizations to unsigned division and modulo > - x % 1, x % x, x % 2^k > - x / 1, x / x, x / 2^k > - does not implement the Granlund and Montgomery algorithm, which has been implemented for signed modulo division in the past. It is unclear if a lot is to be gained by implementing this. > - tests to test existing optimizations for signed division and modulo > - does not test the Granlund and Montgomery algorithm directly theoweidmannoracle has updated the pull request incrementally with two additional commits since the last revision: - Update UDivINodeIdealizationTests.java - Remove more magic numbers ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22061/files - new: https://git.openjdk.org/jdk/pull/22061/files/8e709eb0..621cf4d1 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22061&range=16 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22061&range=15-16 Stats: 6 lines in 2 files changed: 2 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/22061.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22061/head:pull/22061 PR: https://git.openjdk.org/jdk/pull/22061 From jbhateja at openjdk.org Mon Dec 2 16:58:50 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Mon, 2 Dec 2024 16:58:50 GMT Subject: Integrated: 8342677: Add IR validation tests for newly added saturated vector add / sub operations In-Reply-To: References: Message-ID: On Mon, 21 Oct 2024 10:52:01 GMT, Jatin Bhateja wrote: > This is a follow up PR 
to https://github.com/openjdk/jdk/pull/20507 > It adds IR validation tests for newly added saturated vector add / sub operations. This pull request has now been integrated. Changeset: 29c57e8b Author: Jatin Bhateja URL: https://git.openjdk.org/jdk/commit/29c57e8b346531c8675ad853460207f67e00f946 Stats: 587 lines in 2 files changed: 587 ins; 0 del; 0 mod 8342677: Add IR validation tests for newly added saturated vector add / sub operations Reviewed-by: epeter ------------- PR: https://git.openjdk.org/jdk/pull/21603 From jbhateja at openjdk.org Mon Dec 2 16:58:49 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Mon, 2 Dec 2024 16:58:49 GMT Subject: RFR: 8342677: Add IR validation tests for newly added saturated vector add / sub operations [v2] In-Reply-To: References: Message-ID: On Mon, 2 Dec 2024 09:41:06 GMT, Emanuel Peter wrote: >> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: >> >> Review resolutions > > Thanks for the changes @jatin-bhateja ! Looks good to me :) Thanks @eme64 , going ahead with integration. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21603#issuecomment-2512150064 From shade at openjdk.org Mon Dec 2 17:37:43 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 2 Dec 2024 17:37:43 GMT Subject: RFR: 8345172: x86: Some CPU feature asserts are declared as 32-bit only [v3] In-Reply-To: References: Message-ID: On Fri, 29 Nov 2024 15:17:00 GMT, Aleksey Shipilev wrote: >> Noticed this while cleaning up the 32-bit x86 code. We baseline our 64-bit x86 to be at least UseSSE=2. Therefore we still need to check for features UseSSE > 2. I have found a few places where we do NOT_LP64 for these checks. I checked other `VMVersion::supports_*()` uses, and I think these are the only two outliers. > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Wrap more SSE 1/2 asserts in NOT_LP64 Thanks! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/22432#issuecomment-2512245124 From shade at openjdk.org Mon Dec 2 17:37:44 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 2 Dec 2024 17:37:44 GMT Subject: Integrated: 8345172: x86: Some CPU feature asserts are declared as 32-bit only In-Reply-To: References: Message-ID: <6Yu1x4b0WvI2s7AGV8eDUIh2wHxNDvKpO0ppRsePB8k=.3f74792b-8524-467d-bc67-a301943eff62@github.com> On Thu, 28 Nov 2024 10:52:48 GMT, Aleksey Shipilev wrote: > Noticed this while cleaning up the 32-bit x86 code. We baseline our 64-bit x86 to be at least UseSSE=2. Therefore we still need to check for features UseSSE > 2. I have found a few places where we do NOT_LP64 for these checks. I checked other `VMVersion::supports_*()` uses, and I think these are the only two outliers. This pull request has now been integrated. Changeset: 7c944ee6 Author: Aleksey Shipilev URL: https://git.openjdk.org/jdk/commit/7c944ee6f4dda4f1626721d63ac6bc6d1b40d33b Stats: 16 lines in 2 files changed: 0 ins; 1 del; 15 mod 8345172: x86: Some CPU feature asserts are declared as 32-bit only Reviewed-by: dfenacci, kvn ------------- PR: https://git.openjdk.org/jdk/pull/22432 From kvn at openjdk.org Mon Dec 2 17:41:40 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 2 Dec 2024 17:41:40 GMT Subject: RFR: 8344833: CTW: Make failing on zero classes optional [v3] In-Reply-To: References: Message-ID: On Wed, 27 Nov 2024 18:29:18 GMT, Evgeny Nikitin wrote: >> For CTW, zero classes in provided jar is now a failure. >> This creates noisy and blocking false positives in fuzzy/mass scale runs, where we use jar archives from random sources, unchecked or randomly generated, etc. >> >> This PR makes this behaviour controllable. Default reaction is a failure, like before. > > Evgeny Nikitin has updated the pull request incrementally with one additional commit since the last revision: > > Use totalClassCount instead of the classCount Good. 
------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/22320#pullrequestreview-2473441190 From kvn at openjdk.org Mon Dec 2 17:41:40 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 2 Dec 2024 17:41:40 GMT Subject: RFR: 8344833: CTW: Make failing on zero classes optional [v3] In-Reply-To: References: <0afLSZbCNxy6H8PbPzuIDzz4zqKh-xDt1YFGt17bejw=.4362d5a8-f8b9-44e2-a96d-e2421f316c63@github.com> Message-ID: On Wed, 27 Nov 2024 18:26:21 GMT, Evgeny Nikitin wrote: > > Why do you even need this to be controlled and not the default behavior? What is the benefit of having an error vs. a warning for an empty `jar`? > > For mass-running of CTW against unchecked/random jars from various jar repositories, like Maven Central. Another solution would be filtering out such jars in advance, but that's a more difficult (read the jar file, check the class count, etc.) solution. So for our normal testing it will still be an error? Okay. ------------- PR Comment: https://git.openjdk.org/jdk/pull/22320#issuecomment-2512253786 From kvn at openjdk.org Mon Dec 2 19:41:39 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 2 Dec 2024 19:41:39 GMT Subject: RFR: 8345156: C2: Add bailouts next to a few asserts In-Reply-To: References: Message-ID: On Mon, 2 Dec 2024 13:07:57 GMT, Daniel Skantz wrote: > This patch associates product bailouts with a few existing debug asserts. The criteria are that there have been product bugs associated with failing these asserts in the past, and that there are not too many callers to the method where the compilation is now cancelled or any measurable impact on compilation time. > > Testing: T1-T4 and extra testing. Tested compilation time with -XX:+CITime on performance benchmarks and the effect was not measurable. How did you find the places where to add the `failing()` check? To be useful it should be preceded by a failure recording.
src/hotspot/share/opto/block.cpp line 1359: > 1357: if (n->needs_anti_dependence_check()) { > 1358: verify_anti_dependences(block, n); > 1359: if (C->failing()) { I don't see where `record_failure()` is called from `verify_anti_dependences()`. src/hotspot/share/opto/gcm.cpp line 1535: > 1533: // defs in new LCA block. > 1534: LCA = insert_anti_dependences(LCA, self); > 1535: if (C->failing()) { Same here. I don't see where `record_failure()` is called from `insert_anti_dependences()`. And there are a lot of asserts there. ------------- PR Review: https://git.openjdk.org/jdk/pull/22482#pullrequestreview-2473737363 PR Review Comment: https://git.openjdk.org/jdk/pull/22482#discussion_r1866499303 PR Review Comment: https://git.openjdk.org/jdk/pull/22482#discussion_r1866500091 From dlong at openjdk.org Mon Dec 2 22:22:41 2024 From: dlong at openjdk.org (Dean Long) Date: Mon, 2 Dec 2024 22:22:41 GMT Subject: RFR: 8343789: Move mutable nmethod data out of CodeCache [v2] In-Reply-To: References: <9mDuowjpORyWudVnSB1FWCW_o1pBgMnAvJus6YGkXLs=.67ba4652-2470-448d-baa2-464e824b2fcb@github.com> Message-ID: On Thu, 28 Nov 2024 21:06:56 GMT, Boris Ulasevich wrote: >> This change relocates mutable data (such as relocations, oops, and metadata) from the nmethod. The change follows the recent PR #18984, which relocated immutable nmethod data from the CodeCache. >> >> The core idea remains the same: use the CodeCache for executable code while moving additional data to the C heap. The primary motivations are improving security and enhancing code density. >> >> Although performance is not the main focus, testing on AArch64 CPUs, where code density plays a significant role, has shown a 1-2% performance improvement in specific scenarios, such as the CodeCacheStress test and the Renaissance Dotty benchmark. >> >> The numbers. Immutable data constitutes **~30%** of the nmethod. Mutable data constitutes **~8%** of the nmethod.
Example (statistics collected on the CodeCacheStress benchmark): >> - nmethod_count:134000, total_compilation_time: 510460ms >> - total allocation time malloc_mutable/malloc_immutable/CodeCache_alloc: 62ms/114ms/6333ms, >> - total allocation size (mutable/immutable/nmethod): 64MB/192MB/488MB >> >> Functional testing: jtreg on arm/aarch/x86. >> Performance testing: renaissance/dacapo/SPECjvm2008 benchmarks. >> >> Alternative solution (see comments): In the future, relocations can be moved to _immutable_data. > > Boris Ulasevich has updated the pull request incrementally with one additional commit since the last revision: > > rework movoop for not_supports_instruction_patching case: correcting in ldr_constant and relocations fixup src/hotspot/cpu/aarch64/macroAssembler_aarch64.hpp line 1423: > 1421: } else { > 1422: uint64_t offset; > 1423: adrp(dest, const_addr, offset); I don't see how this ADRP path ever gets called now. The only caller is in MacroAssembler::movoop(), which uses a dummy Address in the CodeCache. I think we need to force near/far with an extra bool parameter. The way this function is currently used, a better name might be ldr_patchable(). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21276#discussion_r1866699921 From dlong at openjdk.org Tue Dec 3 00:00:40 2024 From: dlong at openjdk.org (Dean Long) Date: Tue, 3 Dec 2024 00:00:40 GMT Subject: RFR: 8344304: [s390x] ubsan: negation of -2147483648 cannot be represented in type 'int' [v2] In-Reply-To: References: Message-ID: On Sat, 30 Nov 2024 03:10:33 GMT, Amit Kumar wrote: >> fixes the issue reported by ubsan.
> > Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: > > cover lir_add as well src/hotspot/cpu/s390/c1_LIRAssembler_s390.cpp line 1540: > 1538: } else { > 1539: __ z_afi(lreg, c); > 1540: } It seems like it would be better to have code like this in a helper function, instead of making every call site repeat the pattern. Can you use add2reg() here? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22456#discussion_r1866779410 From dlong at openjdk.org Tue Dec 3 00:00:41 2024 From: dlong at openjdk.org (Dean Long) Date: Tue, 3 Dec 2024 00:00:41 GMT Subject: RFR: 8344304: [s390x] ubsan: negation of -2147483648 cannot be represented in type 'int' [v2] In-Reply-To: References: <1PSt0vETd6HA2UFCDJbHadPyiNmHeqJ-t-FuHriAU4k=.73971a5d-f7ac-4df3-9be6-8e7e3e419939@github.com> Message-ID: On Fri, 29 Nov 2024 13:58:08 GMT, Amit Kumar wrote: >> src/hotspot/cpu/s390/c1_LIRAssembler_s390.cpp line 1542: >> >>> 1540: __ z_slfi(lreg, c); >>> 1541: } >>> 1542: break; >> >> Would it be simpler to use `java_negate(c)` (from globalDefinitions.hpp)? > > Not sure of that actually. I didn't even know that such a helper method exists. Thanks for making me aware. > > I updated the current solution in accordance with the GCC compiler. Z doesn't have a `shi` instruction which can handle 16-bit numbers, so GCC negates the number and adds it with the `ahi` instruction. Then for numbers up to 32 bits, the `slfi` instruction is emitted for subtraction. > > @RealLucy do you have other thoughts on this? I agree, ` __ z_afi(lreg, java_negate(c));` reads better.
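As background for the ubsan report in this thread: negating -2147483648 overflows a 32-bit int, which is undefined behavior in C++ but wraps back to the same value in Java. The sketch below shows that wrapping behavior in plain Java. It assumes, based only on its name, that `java_negate` in globalDefinitions.hpp gives the same wrapping result; the class and method names here (`WrappingNegate`, `negateWrapping`, `negationOverflows`) are hypothetical and exist only for this illustration.

```java
// Illustrative sketch only: Java-level view of two's-complement negation.
final class WrappingNegate {
    // Two's-complement negation spelled out; for Java ints this equals unary minus.
    static int negateWrapping(int c) { return ~c + 1; }

    // The single int value whose negation is not representable as an int.
    static boolean negationOverflows(int c) { return c == Integer.MIN_VALUE; }

    public static void main(String[] args) {
        int[] samples = {0, 1, -1, 42, Integer.MAX_VALUE, Integer.MIN_VALUE};
        for (int c : samples) {
            if (negateWrapping(c) != -c) {
                throw new AssertionError("mismatch for c=" + c);
            }
        }
        // The overflowing case wraps back to Integer.MIN_VALUE.
        System.out.println(negateWrapping(Integer.MIN_VALUE) == Integer.MIN_VALUE);
        // Math.negateExact reports the same case as an error instead of wrapping.
        try {
            Math.negateExact(Integer.MIN_VALUE);
            throw new AssertionError("expected ArithmeticException");
        } catch (ArithmeticException expected) {
            System.out.println("negateExact rejects Integer.MIN_VALUE");
        }
    }
}
```

The point of the comparison is only that the constant being negated needs defined wrapping semantics before it reaches the add-immediate instruction; how the s390 assembler encodes the result is outside this sketch.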
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22456#discussion_r1866781325 From dlong at openjdk.org Tue Dec 3 00:02:41 2024 From: dlong at openjdk.org (Dean Long) Date: Tue, 3 Dec 2024 00:02:41 GMT Subject: RFR: 8340141: C1: rework ciMethod::equals following 8338471 In-Reply-To: References: Message-ID: On Tue, 1 Oct 2024 21:26:41 GMT, Vladimir Ivanov wrote: >> @dean-long, maybe I misunderstand your statement. Are you re-writing the fix or keeping the current one? > > I like @vnkozlov suggestion to null out `cha_monomorphic_target`. Moreover, the validation can be performed inside `ciMethod::find_monomorphic_target()` which is used to compute `cha_monomorphic_target`. Thanks @iwanowww. @vnkozlov , please take another look when you get a chance. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21148#issuecomment-2513225983 From dlong at openjdk.org Tue Dec 3 00:30:38 2024 From: dlong at openjdk.org (Dean Long) Date: Tue, 3 Dec 2024 00:30:38 GMT Subject: RFR: 8345159: RISCV: Fix -Wzero-as-null-pointer-constant warning in emit_static_call_stub In-Reply-To: References: Message-ID: <0k7ZOn2etTZAnrZZcGi1QMp6sZX36cGqPIlitEnQ4Fw=.693fc951-de7b-44a2-8f00-dd208515c260@github.com> On Thu, 28 Nov 2024 12:05:28 GMT, Kim Barrett wrote: > Please review this change to RISCV code to remove a > -Wzero-as-null-pointer-constant warning in MacroAssembler::emit_static_call_stub. > > It was calling MacroAssembler::movptr with the second (address) argument being > a literal 0. Rather than changing it to use nullptr for that argument, I've > instead changed it to call the movptr2 helper function, which takes the target > address as a uint64_t. This eliminates the conversion of 0 to a pointer and > then back to an integer 0. It seemed to me more natural to use that helper > directly, as it was presumed that was what ended up being called anyway. But > the riscv porters should weigh in on whether that's a good approach to dealing > with this case.
> > Testing: GHA sanity tests, which includes building for linux-riscv64. I don't > have the capability to run tests for this platform, so hoping someone from the > riscv porters can do more testing. src/hotspot/cpu/riscv/macroAssembler_riscv.cpp line 788: > 786: // Jump to the entry point of the c2i stub. > 787: int32_t offset = 0; > 788: movptr2(t1, 0, offset, t0); // lui + lui + slli + add How about something like this? Suggestion: address placeholder = pc(); // correct value will be patched in later movptr2(t1, placeholder, offset, t0); // lui + lui + slli + add ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22435#discussion_r1866802461 From kvn at openjdk.org Tue Dec 3 02:18:46 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 3 Dec 2024 02:18:46 GMT Subject: RFR: 8340141: C1: rework ciMethod::equals following 8338471 [v9] In-Reply-To: References: Message-ID: On Tue, 26 Nov 2024 17:21:15 GMT, Dean Long wrote: >> This PR changes ciMethod::equals() to a special-purpose debug helper method for the one place in C1 that uses it in an assert. The reason why making it general purpose is difficult is because JVMTI can add and delete methods. See the bug report and JDK-8338471 for more details. I'm open to suggestions for a better name than equals_ignore_version(). >> >> An alternative approach, which I think may actually be better, would be to check for old methods first, and bail out if we see any. Then we can change the assert back to how it was originally, using ==. > > Dean Long has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 12 commits: > > - redo without bailout > - Merge remote-tracking branch 'origin/master' into 8340141 > - add missing bailout checks > - C1 fix > - remove blank line > - Merge master > - bail out on old methods > - redo VM state > - fix errors > - make sure to be in VM state when checking is_old > - ... 
and 2 more: https://git.openjdk.org/jdk/compare/4d4cef80...7a7bdb86 This version looks good to me too. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21148#pullrequestreview-2474347310 From kxu at openjdk.org Tue Dec 3 02:37:50 2024 From: kxu at openjdk.org (Kangcheng Xu) Date: Tue, 3 Dec 2024 02:37:50 GMT Subject: RFR: 8336759: C2: int counted loop with long limit not recognized as counted loop Message-ID: <_d_CiLfCN9ahEmhp9fLcGqO-L8n7a0gW86R3lzLkX60=.b3bdc697-cb2d-477f-a525-0f16a3eee383@github.com> This patch implements [JDK-8336759](https://bugs.openjdk.org/browse/JDK-8336759) that recognizes int counted loops with long limits. Currently, patterns like `for ( int i =...; i < long_limit; ...)` where int `i` is implicitly promoted to long (i.e., `(long) i < long_limit`) is not recognized as (int) counted loop. This patch speculatively and optimistically converts long limits to ints and deoptimize if the limit is outside int range, allowing more optimization opportunities. In other words, it transforms for (int i = 0; (long) i < long_limit; i++) {...} to if (int_min <= long_limit && long_limit <= int_max ) { for (int i = 0; i < (int) long_limit; i++) {...} } else { trap: loop_limit_check } This could benefit calls to APIs like `long MemorySegment#byteSize()` when iterating over a long limit. ------------- Commit messages: - wrap a line at 80 chars - register new node with optimizer - update testLimitNotInvariant - fix typo in comments - update parse predicate check, over/underflow detection, tests - uncomment tests - update tests - update comments - add TestIntLoopLongLimit.java - Merge branch 'openjdk:master' into 8336759-int-loop-long-limit - ... 
and 9 more: https://git.openjdk.org/jdk/compare/df2d4c15...4a7d03fe Changes: https://git.openjdk.org/jdk/pull/22449/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=22449&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8336759 Stats: 293 lines in 4 files changed: 283 ins; 1 del; 9 mod Patch: https://git.openjdk.org/jdk/pull/22449.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22449/head:pull/22449 PR: https://git.openjdk.org/jdk/pull/22449 From kxu at openjdk.org Tue Dec 3 02:37:51 2024 From: kxu at openjdk.org (Kangcheng Xu) Date: Tue, 3 Dec 2024 02:37:51 GMT Subject: RFR: 8336759: C2: int counted loop with long limit not recognized as counted loop In-Reply-To: <_d_CiLfCN9ahEmhp9fLcGqO-L8n7a0gW86R3lzLkX60=.b3bdc697-cb2d-477f-a525-0f16a3eee383@github.com> References: <_d_CiLfCN9ahEmhp9fLcGqO-L8n7a0gW86R3lzLkX60=.b3bdc697-cb2d-477f-a525-0f16a3eee383@github.com> Message-ID: On Fri, 29 Nov 2024 01:08:04 GMT, Kangcheng Xu wrote: > This patch implements [JDK-8336759](https://bugs.openjdk.org/browse/JDK-8336759) that recognizes int counted loops with long limits. > > Currently, patterns like `for ( int i =...; i < long_limit; ...)` where int `i` is implicitly promoted to long (i.e., `(long) i < long_limit`) is not recognized as (int) counted loop. This patch speculatively and optimistically converts long limits to ints and deoptimize if the limit is outside int range, allowing more optimization opportunities. > > In other words, it transforms > > > for (int i = 0; (long) i < long_limit; i++) {...} > > > to > > > if (int_min <= long_limit && long_limit <= int_max ) { > for (int i = 0; i < (int) long_limit; i++) {...} > } else { > trap: loop_limit_check > } > > > This could benefit calls to APIs like `long MemorySegment#byteSize()` when iterating over a long limit. C2 changes Passes `tier1 hotspot_compiler tier1_compiler tier2_compiler tier3_compiler tier1_compiler_not_xcomp` on x86_64 Linux. 
Will open this PR for formal review once GHA jcheck confirms passing. Thank you very much! ------------- PR Comment: https://git.openjdk.org/jdk/pull/22449#issuecomment-2506910167 PR Comment: https://git.openjdk.org/jdk/pull/22449#issuecomment-2513040792 From roland at openjdk.org Tue Dec 3 02:37:52 2024 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 3 Dec 2024 02:37:52 GMT Subject: RFR: 8336759: C2: int counted loop with long limit not recognized as counted loop In-Reply-To: <_d_CiLfCN9ahEmhp9fLcGqO-L8n7a0gW86R3lzLkX60=.b3bdc697-cb2d-477f-a525-0f16a3eee383@github.com> References: <_d_CiLfCN9ahEmhp9fLcGqO-L8n7a0gW86R3lzLkX60=.b3bdc697-cb2d-477f-a525-0f16a3eee383@github.com> Message-ID: <8rkY_7RHRZjoTmy_O5Z7xunvC_gUSKR-xJpn1rwgJVo=.97ea144d-c93b-49fb-869a-a30f2a88c472@github.com> On Fri, 29 Nov 2024 01:08:04 GMT, Kangcheng Xu wrote: > This patch implements [JDK-8336759](https://bugs.openjdk.org/browse/JDK-8336759) that recognizes int counted loops with long limits. > > Currently, patterns like `for ( int i =...; i < long_limit; ...)` where int `i` is implicitly promoted to long (i.e., `(long) i < long_limit`) is not recognized as (int) counted loop. This patch speculatively and optimistically converts long limits to ints and deoptimize if the limit is outside int range, allowing more optimization opportunities. > > In other words, it transforms > > > for (int i = 0; (long) i < long_limit; i++) {...} > > > to > > > if (int_min <= long_limit && long_limit <= int_max ) { > for (int i = 0; i < (int) long_limit; i++) {...} > } else { > trap: loop_limit_check > } > > > This could benefit calls to APIs like `long MemorySegment#byteSize()` when iterating over a long limit. src/hotspot/share/opto/loopnode.cpp line 1660: > 1658: Node* new_cmp = _igvn.register_new_node_with_optimizer( > 1659: cmp->in(1) == incr ? new CmpINode(new_incr, new_limit) : new CmpINode(new_limit, new_incr), > 1660: // new CmpINode(new_incr, new_limit), That shouldn't be here. 
src/hotspot/share/opto/loopnode.cpp line 1668: > 1666: > 1667: _igvn.rehash_node_delayed(bol); > 1668: bol->replace_edge(cmp, new_cmp, &_igvn); You should use: `PhaseIdealLoop::insert_loop_limit_check_predicate()`. `do_is_counted_loop()` may need to add more loop limit checks and the way you've implemented it, I think you're replacing the place holder predicate that we use to create other ones. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22449#discussion_r1863786149 PR Review Comment: https://git.openjdk.org/jdk/pull/22449#discussion_r1863787694 From roland at openjdk.org Tue Dec 3 02:37:52 2024 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 3 Dec 2024 02:37:52 GMT Subject: RFR: 8336759: C2: int counted loop with long limit not recognized as counted loop In-Reply-To: <8rkY_7RHRZjoTmy_O5Z7xunvC_gUSKR-xJpn1rwgJVo=.97ea144d-c93b-49fb-869a-a30f2a88c472@github.com> References: <_d_CiLfCN9ahEmhp9fLcGqO-L8n7a0gW86R3lzLkX60=.b3bdc697-cb2d-477f-a525-0f16a3eee383@github.com> <8rkY_7RHRZjoTmy_O5Z7xunvC_gUSKR-xJpn1rwgJVo=.97ea144d-c93b-49fb-869a-a30f2a88c472@github.com> Message-ID: On Fri, 29 Nov 2024 17:05:53 GMT, Roland Westrelin wrote: >> This patch implements [JDK-8336759](https://bugs.openjdk.org/browse/JDK-8336759) that recognizes int counted loops with long limits. >> >> Currently, patterns like `for ( int i =...; i < long_limit; ...)` where int `i` is implicitly promoted to long (i.e., `(long) i < long_limit`) is not recognized as (int) counted loop. This patch speculatively and optimistically converts long limits to ints and deoptimize if the limit is outside int range, allowing more optimization opportunities. 
>> >> In other words, it transforms >> >> >> for (int i = 0; (long) i < long_limit; i++) {...} >> >> >> to >> >> >> if (int_min <= long_limit && long_limit <= int_max ) { >> for (int i = 0; i < (int) long_limit; i++) {...} >> } else { >> trap: loop_limit_check >> } >> >> >> This could benefit calls to APIs like `long MemorySegment#byteSize()` when iterating over a long limit. > > src/hotspot/share/opto/loopnode.cpp line 1668: > >> 1666: >> 1667: _igvn.rehash_node_delayed(bol); >> 1668: bol->replace_edge(cmp, new_cmp, &_igvn); > > You should use: `PhaseIdealLoop::insert_loop_limit_check_predicate()`. `do_is_counted_loop()` may need to add more loop limit checks and the way you've implemented it, I think you're replacing the place holder predicate that we use to create other ones. Actually you're not checking that it's a loop limit check so you could be replacing something else. You need to follow the pattern used elsewhere i.e.: https://github.com/openjdk/jdk/blob/2beb2b602bf20f1ec36e6244eca1a2eb50baccb4/src/hotspot/share/opto/loopnode.cpp#L2011 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22449#discussion_r1863789968 From roland at openjdk.org Tue Dec 3 02:37:53 2024 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 3 Dec 2024 02:37:53 GMT Subject: RFR: 8336759: C2: int counted loop with long limit not recognized as counted loop In-Reply-To: References: <_d_CiLfCN9ahEmhp9fLcGqO-L8n7a0gW86R3lzLkX60=.b3bdc697-cb2d-477f-a525-0f16a3eee383@github.com> <8rkY_7RHRZjoTmy_O5Z7xunvC_gUSKR-xJpn1rwgJVo=.97ea144d-c93b-49fb-869a-a30f2a88c472@github.com> Message-ID: <60ukqI842Q5FodQ-bTezFV42wEYrfMpvrdaKVnb2oIs=.ebebfdc2-82d8-4a59-aab3-d8b22063ee8b@github.com> On Fri, 29 Nov 2024 17:09:08 GMT, Roland Westrelin wrote: >> src/hotspot/share/opto/loopnode.cpp line 1668: >> >>> 1666: >>> 1667: _igvn.rehash_node_delayed(bol); >>> 1668: bol->replace_edge(cmp, new_cmp, &_igvn); >> >> You should use: `PhaseIdealLoop::insert_loop_limit_check_predicate()`. 
`do_is_counted_loop()` may need to add more loop limit checks and the way you've implemented it, I think you're replacing the place holder predicate that we use to create other ones. > > Actually you're not checking that it's a loop limit check so you could be replacing something else. You need to follow the pattern used elsewhere i.e.: https://github.com/openjdk/jdk/blob/2beb2b602bf20f1ec36e6244eca1a2eb50baccb4/src/hotspot/share/opto/loopnode.cpp#L2011 FTR, we discussed this offline and I was looking at the wrong location. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22449#discussion_r1866129649 From qamai at openjdk.org Tue Dec 3 02:37:53 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Tue, 3 Dec 2024 02:37:53 GMT Subject: RFR: 8336759: C2: int counted loop with long limit not recognized as counted loop In-Reply-To: <_d_CiLfCN9ahEmhp9fLcGqO-L8n7a0gW86R3lzLkX60=.b3bdc697-cb2d-477f-a525-0f16a3eee383@github.com> References: <_d_CiLfCN9ahEmhp9fLcGqO-L8n7a0gW86R3lzLkX60=.b3bdc697-cb2d-477f-a525-0f16a3eee383@github.com> Message-ID: On Fri, 29 Nov 2024 01:08:04 GMT, Kangcheng Xu wrote: > This patch implements [JDK-8336759](https://bugs.openjdk.org/browse/JDK-8336759) that recognizes int counted loops with long limits. > > Currently, patterns like `for ( int i =...; i < long_limit; ...)` where int `i` is implicitly promoted to long (i.e., `(long) i < long_limit`) is not recognized as (int) counted loop. This patch speculatively and optimistically converts long limits to ints and deoptimize if the limit is outside int range, allowing more optimization opportunities. > > In other words, it transforms > > > for (int i = 0; (long) i < long_limit; i++) {...} > > > to > > > if (int_min <= long_limit && long_limit <= int_max ) { > for (int i = 0; i < (int) long_limit; i++) {...} > } else { > trap: loop_limit_check > } > > > This could benefit calls to APIs like `long MemorySegment#byteSize()` when iterating over a long limit. 
src/hotspot/share/opto/loopnode.cpp line 1679: > 1677: } > 1678: > 1679: // Optimistically assume limit in within int range, but add guards and traps to loop_limit_check. You can merge these 2 checks into `CmpL(limit, ConvI2L(new_limit))`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22449#discussion_r1863429887 From kxu at openjdk.org Tue Dec 3 02:37:53 2024 From: kxu at openjdk.org (Kangcheng Xu) Date: Tue, 3 Dec 2024 02:37:53 GMT Subject: RFR: 8336759: C2: int counted loop with long limit not recognized as counted loop In-Reply-To: References: <_d_CiLfCN9ahEmhp9fLcGqO-L8n7a0gW86R3lzLkX60=.b3bdc697-cb2d-477f-a525-0f16a3eee383@github.com> Message-ID: On Fri, 29 Nov 2024 12:08:41 GMT, Quan Anh Mai wrote: >> This patch implements [JDK-8336759](https://bugs.openjdk.org/browse/JDK-8336759) that recognizes int counted loops with long limits. >> >> Currently, patterns like `for ( int i =...; i < long_limit; ...)` where int `i` is implicitly promoted to long (i.e., `(long) i < long_limit`) is not recognized as (int) counted loop. This patch speculatively and optimistically converts long limits to ints and deoptimize if the limit is outside int range, allowing more optimization opportunities. >> >> In other words, it transforms >> >> >> for (int i = 0; (long) i < long_limit; i++) {...} >> >> >> to >> >> >> if (int_min <= long_limit && long_limit <= int_max ) { >> for (int i = 0; i < (int) long_limit; i++) {...} >> } else { >> trap: loop_limit_check >> } >> >> >> This could benefit calls to APIs like `long MemorySegment#byteSize()` when iterating over a long limit. > > src/hotspot/share/opto/loopnode.cpp line 1679: > >> 1677: } >> 1678: >> 1679: // Optimistically assume limit in within int range, but add guards and traps to loop_limit_check. > > You can merge these 2 checks into `CmpL(limit, ConvI2L(new_limit))`. Thanks for reviewing! Good point. Code updated. 
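To illustrate the suggestion above in Java terms: the two-sided guard `int_min <= long_limit && long_limit <= int_max` from the PR description is equivalent to narrowing the limit to int and comparing the widened result with the original, which is how `CmpL(limit, ConvI2L(new_limit))` reads if `new_limit` is the limit converted to int (an assumption about the patch, since the thread does not show that line). The sketch below is not C2 code; `LongLimitGuard` and its methods are hypothetical names used only here.

```java
// Illustrative sketch only: Java-level equivalent of the int-range guard on a long limit.
final class LongLimitGuard {
    // The guard as written in the PR description: two comparisons.
    static boolean fitsInIntTwoChecks(long limit) {
        return Integer.MIN_VALUE <= limit && limit <= Integer.MAX_VALUE;
    }

    // The merged form: narrow to int, widen back, compare once.
    static boolean fitsInIntRoundTrip(long limit) {
        return limit == (long) (int) limit;
    }

    public static void main(String[] args) {
        long[] samples = {
            0L, 1L, -1L,
            Integer.MAX_VALUE, Integer.MAX_VALUE + 1L,
            Integer.MIN_VALUE, Integer.MIN_VALUE - 1L,
            1L << 32, (1L << 32) + 5, Long.MAX_VALUE, Long.MIN_VALUE
        };
        for (long limit : samples) {
            if (fitsInIntTwoChecks(limit) != fitsInIntRoundTrip(limit)) {
                throw new AssertionError("guards disagree for limit=" + limit);
            }
        }
        System.out.println("both guards agree on all samples");
    }
}
```

Values such as `(1L << 32) + 5` are included because their low 32 bits look like a small int, so a check that only inspected the narrowed value would accept them wrongly; comparing against the original long catches that.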
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22449#discussion_r1866288963 From kvn at openjdk.org Tue Dec 3 03:13:36 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 3 Dec 2024 03:13:36 GMT Subject: RFR: 8345219: C2: x86_64 should not go to interpreter stubs for NaNs handling In-Reply-To: References: Message-ID: On Thu, 28 Nov 2024 18:22:24 GMT, Aleksey Shipilev wrote: > Found this while cleaning up x86_32 code for removal. > > In our current code there is a block added by [JDK-8076373](https://bugs.openjdk.org/browse/JDK-8076373): > https://github.com/openjdk/jdk/blob/3b21a298c29d88720f6bfb2dc1f3305b6a3db307/src/hotspot/share/compiler/compileBroker.cpp#L1451-L1473 > > Ostensibly, that block is for x86_32 handling of signalling NaNs -- x87 FPU has a peculiarity with them. See other funky bugs we have seen with it: [JDK-8285985](https://bugs.openjdk.org/browse/JDK-8285985), [JDK-8293991](https://bugs.openjdk.org/browse/JDK-8293991). > > But the way the current block is coded, it is enabled for X86 wholesale, which also means x86_64! In fact, it is likely even worse on x86_64, because the related "fast" entries are generated only for x86_32: > https://github.com/openjdk/jdk/blob/3b21a298c29d88720f6bfb2dc1f3305b6a3db307/src/hotspot/share/interpreter/templateInterpreterGenerator.cpp#L493-L502 > > This can be solved by checking `IA32` instead of `X86`. This block would be gone completely once we remove the x86_32 port. Meanwhile, we can make it right for x86_64, and make the eventual x86_32 removal less confusing. This issue seems to only affect the compilation of native methods, while most of the hot code is riding on compiler intrinsics. I'll put performance data in comments. > > Additional testing: > - [x] Linux x86_64 server fastdebug, `all` This fix looks correct judging by the original changes. Yes, the special Interpreter code for these methods is only generated for x86_32 in `templateInterpreterGenerator_x86_32.cpp`.
------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/22446#pullrequestreview-2474463057 From vlivanov at openjdk.org Tue Dec 3 03:28:38 2024 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Tue, 3 Dec 2024 03:28:38 GMT Subject: RFR: 8330851: C2: More efficient TypeFunc creation [v5] In-Reply-To: References: Message-ID: On Fri, 29 Nov 2024 01:15:18 GMT, Amit Kumar wrote: >> Lazy computation of TypeFunc. >> >> Testing: Tier1 on Fastdebug & Release VMs (`s390x architecture`) > > Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: > > test fix Thanks, Amit. > Most of them are now moved into OptoRuntime::initialize_types() method. And their *_init method is called from there. The other one are you talking about the class specific ones ? Or something else I need to update in this PR ? I was specifically pointing at the distinction between `OptoRuntime` entries which are materialized using `C2_STUBS_DO` and those explicitly declared there. Take a look at `OptoRuntime::_new_instance_Java` (and related code). I believe TypeFunc caching should be handled by the same machinery for C2 runtime stubs. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21782#issuecomment-2513469286 From vlivanov at openjdk.org Tue Dec 3 03:34:41 2024 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Tue, 3 Dec 2024 03:34:41 GMT Subject: RFR: 8330851: C2: More efficient TypeFunc creation [v5] In-Reply-To: References: Message-ID: On Fri, 29 Nov 2024 01:15:18 GMT, Amit Kumar wrote: >> Lazy computation of TypeFunc. 
>> >> Testing: Tier1 on Fastdebug & Release VMs (`s390x architecture`) > > Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: > > test fix src/hotspot/share/gc/shenandoah/c2/shenandoahBarrierSetC2.cpp line 438: > 436: > 437: const TypeFunc* ShenandoahBarrierSetC2::write_ref_field_pre_Type() { > 438: return OptoRuntime::write_ref_field_pre_Type(); Please, keep them local to `ShenandoahBarrierSetC2`. Otherwise, you need to guard it with `INCLUDE_SHENANDOAHGC` there. src/hotspot/share/opto/runtime.cpp line 195: > 193: // #undef gen > 194: > 195: const TypeFunc* OptoRuntime::_new_instance_tf = nullptr; Please, use `*_Type` pattern to name `TypeFunc` caches. src/hotspot/share/opto/runtime.cpp line 2114: > 2112: } > 2113: > 2114: void OptoRuntime::initialize_types() { For non-C2 runtime stubs, please, turn factories into local static functions and directly initialize cache fields in `OptoRuntime::initialize_types()`. As an example: https://github.com/openjdk/jdk/commit/c343b56249f223a72cef4671cd5de15406387e3d ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21782#discussion_r1866978354 PR Review Comment: https://git.openjdk.org/jdk/pull/21782#discussion_r1866983988 PR Review Comment: https://git.openjdk.org/jdk/pull/21782#discussion_r1866986028 From vlivanov at openjdk.org Tue Dec 3 03:41:39 2024 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Tue, 3 Dec 2024 03:41:39 GMT Subject: RFR: 8330851: C2: More efficient TypeFunc creation [v2] In-Reply-To: <2jyj0PGAJAvyE3zmwQqaDdjaZsMtYkVjdI7bhzdz1FE=.ced468e6-5450-4c4f-8597-e882d7f65380@github.com> References: <2jyj0PGAJAvyE3zmwQqaDdjaZsMtYkVjdI7bhzdz1FE=.ced468e6-5450-4c4f-8597-e882d7f65380@github.com> Message-ID: On Thu, 28 Nov 2024 13:18:01 GMT, Amit Kumar wrote: > Okay I have increased the coverage. But still some of them I left alone as they are class-specific, they were accepting argument and using that in the Type creation. Indeed. 
It doesn't work for `OptoRuntime::Math_Vector_Vector_Type()`, but `alloc_type(const Type* t)` you mentioned can be specialized: it is only called as`AllocateNode::alloc_type(Type::TOP)` and `AllocateArrayNode::alloc_type(TypeInt::INT)`. FTR I'm perfectly fine with leaving it for a future cleanup. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21782#issuecomment-2513479968 From fyang at openjdk.org Tue Dec 3 03:43:37 2024 From: fyang at openjdk.org (Fei Yang) Date: Tue, 3 Dec 2024 03:43:37 GMT Subject: RFR: 8345159: RISCV: Fix -Wzero-as-null-pointer-constant warning in emit_static_call_stub In-Reply-To: <0k7ZOn2etTZAnrZZcGi1QMp6sZX36cGqPIlitEnQ4Fw=.693fc951-de7b-44a2-8f00-dd208515c260@github.com> References: <0k7ZOn2etTZAnrZZcGi1QMp6sZX36cGqPIlitEnQ4Fw=.693fc951-de7b-44a2-8f00-dd208515c260@github.com> Message-ID: On Tue, 3 Dec 2024 00:28:14 GMT, Dean Long wrote: >> Please review this change to RISCV code to remove a >> -Wzero-as-null-pointer-constant warning in MacroAssembler::emit_static_call_stub. >> >> It was calling MacroAssembler::movptr with the second (address) argument being >> a literal 0. Rather than changing it to use nullptr for that argument, I've >> instead changed it to call the movptr2 helper function, which takes the target >> address as a unint64_t. This eliminates the conversion of 0 to a pointer and >> then back to an integer 0. It seemed to me more natural to use that helper >> directly, as it was presumed that was what ended up being called anyway. But >> the riscv porters should weigh in on whether that's a good approach to dealing >> with this case. >> >> Testing: GHA sanity tests, which includes building for linux-riscv64. I don't >> have the capability to run tests for this platform, so hoping someone from the >> riscv porters can do more testing. > > src/hotspot/cpu/riscv/macroAssembler_riscv.cpp line 788: > >> 786: // Jump to the entry point of the c2i stub. 
>> 787: int32_t offset = 0; >> 788: movptr2(t1, 0, offset, t0); // lui + lui + slli + add > > How about something like this? > Suggestion: > > address placeholder = pc(); // correct value will be patched in later > movptr2(t1, placeholder, offset, t0); // lui + lui + slli + add That seems OK to me. Then we can still use `movptr` here. address placeholder = pc(); // correct value will be patched in later movptr(t1, placeholder, offset, t0); // lui + lui + slli + add ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22435#discussion_r1866991376 From epeter at openjdk.org Tue Dec 3 06:34:44 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 3 Dec 2024 06:34:44 GMT Subject: RFR: 8332268: C2: Add missing optimizations for UDivI/L and UModI/L and unify the shared logic with the signed nodes [v17] In-Reply-To: <9AO7AsQv3puIVURjKB7wvQCWcMO2ZG4gpUJxpxTLghw=.28527eeb-1ec3-42cf-b714-ccd5f5541e71@github.com> References: <9AO7AsQv3puIVURjKB7wvQCWcMO2ZG4gpUJxpxTLghw=.28527eeb-1ec3-42cf-b714-ccd5f5541e71@github.com> Message-ID: On Mon, 2 Dec 2024 16:38:03 GMT, theoweidmannoracle wrote: >> This PR introduces >> - several new optimizations to unsigned division and modulo >> - x % 1, x % x, x % 2^k >> - x / 1, x / x, x / 2^k >> - does not implement the Granlund and Montgomery algorithm, which has been implemented for signed modulo division in the past. It is unclear if a lot is to be gained by implementing this. >> - tests to test existing optimizations for signed division and modulo >> - does not test the Granlund and Montgomery algorithm directly > > theoweidmannoracle has updated the pull request incrementally with two additional commits since the last revision: > > - Update UDivINodeIdealizationTests.java > - Remove more magic numbers Thanks for the updates, great work! Yeah, the templates seem to be the better solution indeed :) Ship it ? 
(Though please hold off until RDP1 on Dec 5, so that it only goes into JDK25 and we have more time if things break) ------------- Marked as reviewed by epeter (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/22061#pullrequestreview-2474699695 From amitkumar at openjdk.org Tue Dec 3 06:42:55 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Tue, 3 Dec 2024 06:42:55 GMT Subject: RFR: 8344304: [s390x] ubsan: negation of -2147483648 cannot be represented in type 'int' [v3] In-Reply-To: References: Message-ID: > fixes the issue reported by ubsan. Amit Kumar has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: - Merge branch 'master' into ubsan_new - cover lir_add as well - updates instruction ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22456/files - new: https://git.openjdk.org/jdk/pull/22456/files/5bdf265a..bc70f092 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22456&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22456&range=01-02 Stats: 30071 lines in 308 files changed: 24954 ins; 3122 del; 1995 mod Patch: https://git.openjdk.org/jdk/pull/22456.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22456/head:pull/22456 PR: https://git.openjdk.org/jdk/pull/22456 From rehn at openjdk.org Tue Dec 3 06:44:38 2024 From: rehn at openjdk.org (Robbin Ehn) Date: Tue, 3 Dec 2024 06:44:38 GMT Subject: RFR: 8345159: RISCV: Fix -Wzero-as-null-pointer-constant warning in emit_static_call_stub In-Reply-To: References: <0k7ZOn2etTZAnrZZcGi1QMp6sZX36cGqPIlitEnQ4Fw=.693fc951-de7b-44a2-8f00-dd208515c260@github.com> Message-ID: On Tue, 3 Dec 2024 03:40:42 GMT, Fei Yang wrote: >> src/hotspot/cpu/riscv/macroAssembler_riscv.cpp line 788: >> >>> 786: // Jump to the entry point of the c2i stub. 
>>> 787: int32_t offset = 0; >>> 788: movptr2(t1, 0, offset, t0); // lui + lui + slli + add >> >> How about something like this? >> Suggestion: >> >> address placeholder = pc(); // correct value will be patched in later >> movptr2(t1, placeholder, offset, t0); // lui + lui + slli + add > > That seems OK to me. Then we can still use `movptr` here. > > address placeholder = pc(); // correct value will be patched in later > movptr(t1, placeholder, offset, t0); // lui + lui + slli + add When looking at the disassembly it much easier to find uninitialized call stubs when we use 0. Also if something would go wrong, it much nicer with a crash than a loop. So for debuggability I prefer address 0. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22435#discussion_r1867134766 From amitkumar at openjdk.org Tue Dec 3 06:59:40 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Tue, 3 Dec 2024 06:59:40 GMT Subject: RFR: 8344304: [s390x] ubsan: negation of -2147483648 cannot be represented in type 'int' [v2] In-Reply-To: References: Message-ID: On Mon, 2 Dec 2024 23:54:34 GMT, Dean Long wrote: >> Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: >> >> cover lir_add as well > > src/hotspot/cpu/s390/c1_LIRAssembler_s390.cpp line 1540: > >> 1538: } else { >> 1539: __ z_afi(lreg, c); >> 1540: } > > It seems like it would be better to have code like this in a helper function, instead of making every call site repeat the pattern. Can you use add2reg() here? add2reg will emit `z_aghi` instruction which is for 64 bit (register) <- 16 bit (immediate). `z_ahi` and `z_afi` are both 32 bit (register) < - 16 bit (immediate), 32 bit (register) <- 32 bit (immediate). So I guess these are better here. Though we can move the logic to new method add2reg_32() method which will do the check and emit correct instruction. 
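The immediate-width reasoning above can be sketched in Java. The helper names `fitsSimm16`, `pickAddImmediate` and `subViaAdd` are hypothetical; they only return mnemonic strings or plain integers and are not the HotSpot assembler API. The sketch assumes, as stated in the message, that `z_ahi` takes a 16-bit signed immediate and `z_afi` a 32-bit one.

```java
public class AddImmediateSketch {
    // True if c can be encoded as a 16-bit signed immediate.
    static boolean fitsSimm16(int c) {
        return c >= Short.MIN_VALUE && c <= Short.MAX_VALUE;
    }

    // Pick the shorter form when the immediate fits in 16 bits.
    static String pickAddImmediate(int c) {
        return fitsSimm16(c) ? "z_ahi" : "z_afi";
    }

    // Subtraction expressed as addition of the negated constant.
    // Java integer negation wraps, so -Integer.MIN_VALUE is Integer.MIN_VALUE
    // and the result still equals reg - c in 32-bit arithmetic.
    static int subViaAdd(int reg, int c) {
        return reg + (-c);
    }

    public static void main(String[] args) {
        System.out.println(pickAddImmediate(100));
        System.out.println(pickAddImmediate(100_000));
        System.out.println(subViaAdd(5, Integer.MIN_VALUE) == 5 - Integer.MIN_VALUE);
    }
}
```

In C++ the plain negation of the minimum int is undefined behaviour, which is what ubsan reports; a wrapping negation avoids that while producing the same bit pattern.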
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22456#discussion_r1867147830 From shade at openjdk.org Tue Dec 3 07:12:40 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 3 Dec 2024 07:12:40 GMT Subject: RFR: 8345219: C2: x86_64 should not go to interpreter stubs for NaNs handling In-Reply-To: References: Message-ID: On Tue, 3 Dec 2024 03:10:51 GMT, Vladimir Kozlov wrote: > This fix looks correct by looking on original changes. Thank you! I need one more (R)eviewer :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/22446#issuecomment-2513723257 From amitkumar at openjdk.org Tue Dec 3 09:13:15 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Tue, 3 Dec 2024 09:13:15 GMT Subject: RFR: 8344304: [s390x] ubsan: negation of -2147483648 cannot be represented in type 'int' [v4] In-Reply-To: References: Message-ID: <66c7eQ4Od0SqoYj7uIDis_Gjw1ZpU9NdKHek4yBiF5M=.f1c160d8-3595-416f-ac74-1125d381fb12@github.com> > fixes the issue reported by ubsan. Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: move logic to add2reg_32 & use java_negate() ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22456/files - new: https://git.openjdk.org/jdk/pull/22456/files/bc70f092..608cec70 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22456&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22456&range=02-03 Stats: 24 lines in 3 files changed: 13 ins; 8 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/22456.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22456/head:pull/22456 PR: https://git.openjdk.org/jdk/pull/22456 From chagedorn at openjdk.org Tue Dec 3 09:40:44 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 3 Dec 2024 09:40:44 GMT Subject: RFR: 8336759: C2: int counted loop with long limit not recognized as counted loop In-Reply-To: 
<_d_CiLfCN9ahEmhp9fLcGqO-L8n7a0gW86R3lzLkX60=.b3bdc697-cb2d-477f-a525-0f16a3eee383@github.com> References: <_d_CiLfCN9ahEmhp9fLcGqO-L8n7a0gW86R3lzLkX60=.b3bdc697-cb2d-477f-a525-0f16a3eee383@github.com> Message-ID: On Fri, 29 Nov 2024 01:08:04 GMT, Kangcheng Xu wrote: > This patch implements [JDK-8336759](https://bugs.openjdk.org/browse/JDK-8336759) that recognizes int counted loops with long limits. > > Currently, patterns like `for ( int i =...; i < long_limit; ...)` where int `i` is implicitly promoted to long (i.e., `(long) i < long_limit`) is not recognized as (int) counted loop. This patch speculatively and optimistically converts long limits to ints and deoptimize if the limit is outside int range, allowing more optimization opportunities. > > In other words, it transforms > > > for (int i = 0; (long) i < long_limit; i++) {...} > > > to > > > if (int_min <= long_limit && long_limit <= int_max ) { > for (int i = 0; i < (int) long_limit; i++) {...} > } else { > trap: loop_limit_check > } > > > This could benefit calls to APIs like `long MemorySegment#byteSize()` when iterating over a long limit. Nice enhancement! Couple of mostly minor comments. src/hotspot/share/opto/loopnode.cpp line 1625: > 1623: // counted loop but deoptimize if the limit is out of int range. > 1624: // This is common pattern with "for (int i =...; i < long_limit; ...)" where int "i" is implicitly promoted to long, > 1625: // i.e., "(long) i < long_limit", and therefore making it not an int counted loop without this transformation. 
Maybe add here the visualization from the PR which makes it easier to grasp, something like: In summary, we transform for (int i = 0; (long) i < long_limit; i++) {...} to if (int_min <= long_limit && long_limit <= int_max ) { for (int i = 0; i < (int) long_limit; i++) {...} } else { trap: loop_limit_check } src/hotspot/share/opto/loopnode.cpp line 1626: > 1624: // This is common pattern with "for (int i =...; i < long_limit; ...)" where int "i" is implicitly promoted to long, > 1625: // i.e., "(long) i < long_limit", and therefore making it not an int counted loop without this transformation. > 1626: bool PhaseIdealLoop::is_counted_loop_with_speculative_long_limit(Node* x, IdealLoopTree*&loop, BasicType iv_bt) { I suggest to rename `x` to `loop_head` (could also be done in `do_is_counted_loop()`): Suggestion: bool PhaseIdealLoop::is_counted_loop_with_speculative_long_limit(Node* loop_head, IdealLoopTree*&loop, BasicType iv_bt) { src/hotspot/share/opto/loopnode.cpp line 1638: > 1636: Node* init_control = x->in(LoopNode::EntryControl); > 1637: > 1638: // Make sure there is a parse predicate for us to insert the loop limit check. I usually write the predicate names capitalized to better highlight that these are well-defined names: Suggestion: // Make sure there is a Loop Limit Check Parse Predicate for us to insert the Loop Limit Check Predicate above it. src/hotspot/share/opto/loopnode.cpp line 1642: > 1640: const PredicateBlock* loop_limit_check_predicate_block = predicates.loop_limit_check_predicate_block(); > 1641: if (!loop_limit_check_predicate_block->has_parse_predicate()) { > 1642: return false; I suggest to also add here a trace message, similar to what we already do in `is_counted_loop()`: https://github.com/openjdk/jdk/blob/c330b90b9f43f80c322153585fa78704358f0224/src/hotspot/share/opto/loopnode.cpp#L2013-L2023 src/hotspot/share/opto/loopnode.cpp line 1671: > 1669: cmp->in(1) == incr > 1670: ? 
new CmpINode(new_incr, new_limit) : new CmpINode(new_limit, new_incr), > 1671: cmp); This is a little hard to read. How about: Suggestion: Node* new_cmp = cmp->in(1) == incr ? new CmpINode(new_incr, new_limit) : new CmpINode(new_limit, new_incr); _igvn.register_new_node_with_optimizer(new_cmp, cmp); src/hotspot/share/opto/loopnode.cpp line 1684: > 1682: _igvn.rehash_node_delayed(bol); > 1683: bol->replace_edge(new_cmp, cmp, &_igvn); > 1684: New line can be removed Suggestion: src/hotspot/share/opto/loopnode.cpp line 1696: > 1694: Node* bol_limit = new BoolNode(cmp_limit, BoolTest::eq); > 1695: insert_loop_limit_check_predicate(init_control->as_IfTrue(), cmp_limit, bol_limit); > 1696: New line can be removed Suggestion: src/hotspot/share/opto/loopnode.cpp line 1713: > 1711: Node* limit = nullptr; > 1712: Node* cmp = loop_exit_test(back_control, loop, incr, limit, bt, cl_prob); > 1713: Not sure if that new line was intended Suggestion: src/hotspot/share/opto/loopnode.hpp line 1229: > 1227: bool is_counted_loop_with_speculative_long_limit(Node* x, IdealLoopTree*& loop, BasicType iv_bt); > 1228: private: > 1229: bool do_is_counted_loop(Node* x, IdealLoopTree*& loop, BasicType iv_bt); Was like that before but I think `is_counted_loop()` is a bit misleading, suggesting it's a query but it's actually doing the conversion work. Since you now change these methods anyway, what do you think about the following naming suggestions? 
- public `try_convert_to_counted_loop()` - calls: private `convert_to_counted_loop()` - call: private `convert_to_counted_loop_with_speculative_long_limit()` test/hotspot/jtreg/compiler/loopopts/TestIntCountedLoopLongLimit.java line 41: > 39: */ > 40: public class TestIntCountedLoopLongLimit { > 41: private static final Random RNG = jdk.test.lib.Utils.getRandomInstance(); I suggest to rather use an `import` instead of a fully qualified name: Suggestion: import jdk.test.lib.Asserts; import jdk.test.lib.Utils; import java.util.Random; /** * @test * @bug 8336759 * @summary test long limits in int counted loops are speculatively converted to int for counted loop * optimizations * @library /test/lib / * @requires vm.compiler2.enabled * @run driver compiler.loopopts.TestIntCountedLoopLongLimit */ public class TestIntCountedLoopLongLimit { private static final Random RNG = Utils.getRandomInstance(); test/hotspot/jtreg/compiler/loopopts/TestIntCountedLoopLongLimit.java line 88: > 86: "testCountedLoopWithSwappedComparisonOperand" }) > 87: public static void runTestSimpleCountedLoops(RunInfo info) { > 88: long limit = RNG.nextLong(0, 1024 * 1024); // Choice a small number to avoid tests taking too long Suggestion: long limit = RNG.nextLong(0, 1024 * 1024); // Choose a small number to avoid tests taking too long test/hotspot/jtreg/compiler/loopopts/TestIntCountedLoopLongLimit.java line 127: > 125: > 126: // Use a larger stride to avoid tests taking too long > 127: private static final int LARGE_STRIDE = Integer.MAX_VALUE / 1024; Can you move this to the top of the class to the other static final field? test/hotspot/jtreg/compiler/loopopts/TestIntCountedLoopLongLimit.java line 179: > 177: } > 178: > 179: private static volatile long SOME_LONG = 42; Same here, can you move this to the top of the class? 
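For readers following the review, the loop shape under discussion can be sketched at the Java source level. This is only an illustration of the transformation from the PR description, with hypothetical method names; the exception in the fallback path is a stand-in for the uncommon trap, which in the real VM would deoptimize and continue in the interpreter.

```java
public class LongLimitLoopSketch {
    // Original shape: int induction variable compared against a long limit,
    // so `i` is implicitly widened to long in the loop exit test.
    static long sumOriginal(long longLimit) {
        long sum = 0;
        for (int i = 0; i < longLimit; i++) {
            sum += i;
        }
        return sum;
    }

    // Speculative shape: guard once that the limit fits in an int,
    // then run a plain int counted loop.
    static long sumSpeculative(long longLimit) {
        if (Integer.MIN_VALUE <= longLimit && longLimit <= Integer.MAX_VALUE) {
            int limit = (int) longLimit;
            long sum = 0;
            for (int i = 0; i < limit; i++) {
                sum += i;
            }
            return sum;
        }
        // Stand-in for "trap: loop_limit_check" (assumption for this sketch only).
        throw new UnsupportedOperationException("limit outside int range");
    }

    public static void main(String[] args) {
        System.out.println(sumOriginal(1000) == sumSpeculative(1000));
    }
}
```

For any limit inside the int range the two methods compute the same result; the guard only changes what happens when the speculation fails.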
------------- PR Review: https://git.openjdk.org/jdk/pull/22449#pullrequestreview-2474951014 PR Review Comment: https://git.openjdk.org/jdk/pull/22449#discussion_r1867278568 PR Review Comment: https://git.openjdk.org/jdk/pull/22449#discussion_r1867318157 PR Review Comment: https://git.openjdk.org/jdk/pull/22449#discussion_r1867324663 PR Review Comment: https://git.openjdk.org/jdk/pull/22449#discussion_r1867330528 PR Review Comment: https://git.openjdk.org/jdk/pull/22449#discussion_r1867342583 PR Review Comment: https://git.openjdk.org/jdk/pull/22449#discussion_r1867306191 PR Review Comment: https://git.openjdk.org/jdk/pull/22449#discussion_r1867305838 PR Review Comment: https://git.openjdk.org/jdk/pull/22449#discussion_r1867305458 PR Review Comment: https://git.openjdk.org/jdk/pull/22449#discussion_r1867304899 PR Review Comment: https://git.openjdk.org/jdk/pull/22449#discussion_r1867349691 PR Review Comment: https://git.openjdk.org/jdk/pull/22449#discussion_r1867351852 PR Review Comment: https://git.openjdk.org/jdk/pull/22449#discussion_r1867353100 PR Review Comment: https://git.openjdk.org/jdk/pull/22449#discussion_r1867353558 From lucy at openjdk.org Tue Dec 3 10:41:39 2024 From: lucy at openjdk.org (Lutz Schmidt) Date: Tue, 3 Dec 2024 10:41:39 GMT Subject: RFR: 8344304: [s390x] ubsan: negation of -2147483648 cannot be represented in type 'int' [v4] In-Reply-To: <66c7eQ4Od0SqoYj7uIDis_Gjw1ZpU9NdKHek4yBiF5M=.f1c160d8-3595-416f-ac74-1125d381fb12@github.com> References: <66c7eQ4Od0SqoYj7uIDis_Gjw1ZpU9NdKHek4yBiF5M=.f1c160d8-3595-416f-ac74-1125d381fb12@github.com> Message-ID: On Tue, 3 Dec 2024 09:13:15 GMT, Amit Kumar wrote: >> fixes the issue reported by ubsan. > > Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: > > move logic to add2reg_32 & use java_negate() Changes requested by lucy (Reviewer). 
src/hotspot/cpu/s390/macroAssembler_s390.cpp line 683: > 681: z_agfi(r1, imm); > 682: } > 683: I would prefer this method to have the same interface as add2reg(). Of course, that implies a more complex method body. Your choice. ------------- PR Review: https://git.openjdk.org/jdk/pull/22456#pullrequestreview-2475248982 PR Review Comment: https://git.openjdk.org/jdk/pull/22456#discussion_r1867472345 From lucy at openjdk.org Tue Dec 3 10:41:40 2024 From: lucy at openjdk.org (Lutz Schmidt) Date: Tue, 3 Dec 2024 10:41:40 GMT Subject: RFR: 8344304: [s390x] ubsan: negation of -2147483648 cannot be represented in type 'int' [v4] In-Reply-To: References: <1PSt0vETd6HA2UFCDJbHadPyiNmHeqJ-t-FuHriAU4k=.73971a5d-f7ac-4df3-9be6-8e7e3e419939@github.com> Message-ID: On Mon, 2 Dec 2024 23:57:32 GMT, Dean Long wrote: >> Not sure of that actually. I didn't even know that there exists such helper method. Thanks for making me aware. >> >> I updated current solution in accordance with GCC compiler. So Z don't have a `shi` instruction which can handle 16-bit numbers, so GCC negates the number and adds it with `ahi` instruction. Then for number upto 32bits, `slfi` instruction is emitted for subtraction. >> >> @RealLucy do you have other thoughts on this ? > > I agree, > ` __ z_afi(lreg, java_negate(c));` > reads better. Couldn't you use add2reg_32() for the subtraction as well? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22456#discussion_r1867465911 From amitkumar at openjdk.org Tue Dec 3 11:34:19 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Tue, 3 Dec 2024 11:34:19 GMT Subject: RFR: 8344304: [s390x] ubsan: negation of -2147483648 cannot be represented in type 'int' [v5] In-Reply-To: References: Message-ID: <24gVxH5gD-m8jA3zMiw_rUXs9HbcBYhyjfmJIEle9I4=.85f9f726-15ed-465c-84ed-1cf5893ce415@github.com> > fixes the issue reported by ubsan. 
Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: make similar to add2reg ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22456/files - new: https://git.openjdk.org/jdk/pull/22456/files/608cec70..56a01717 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22456&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22456&range=03-04 Stats: 43 lines in 3 files changed: 37 ins; 0 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/22456.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22456/head:pull/22456 PR: https://git.openjdk.org/jdk/pull/22456 From dlong at openjdk.org Tue Dec 3 11:58:39 2024 From: dlong at openjdk.org (Dean Long) Date: Tue, 3 Dec 2024 11:58:39 GMT Subject: RFR: 8344304: [s390x] ubsan: negation of -2147483648 cannot be represented in type 'int' [v5] In-Reply-To: <24gVxH5gD-m8jA3zMiw_rUXs9HbcBYhyjfmJIEle9I4=.85f9f726-15ed-465c-84ed-1cf5893ce415@github.com> References: <24gVxH5gD-m8jA3zMiw_rUXs9HbcBYhyjfmJIEle9I4=.85f9f726-15ed-465c-84ed-1cf5893ce415@github.com> Message-ID: <5JBLpzCNQ5ixdUULsUtbpo9WTJGjjb6m8Yk0p-86ZTk=.3f1bfc21-8ab5-4db1-b94b-00f8f7a5093e@github.com> On Tue, 3 Dec 2024 11:34:19 GMT, Amit Kumar wrote: >> fixes the issue reported by ubsan. > > Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: > > make similar to add2reg src/hotspot/cpu/s390/macroAssembler_s390.hpp line 160: > 158: // Generic operation r1 := r2 + imm. > 159: void add2reg (Register r1, int64_t imm, Register r2 = noreg); > 160: void add2reg_32(Register r1, int64_t imm, Register r2 = noreg); I don't understand the difference between these two. For both, imm must be a simm_32. I don't think we need add2reg_32. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22456#discussion_r1867585614 From amitkumar at openjdk.org Tue Dec 3 12:08:39 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Tue, 3 Dec 2024 12:08:39 GMT Subject: RFR: 8344304: [s390x] ubsan: negation of -2147483648 cannot be represented in type 'int' [v5] In-Reply-To: <5JBLpzCNQ5ixdUULsUtbpo9WTJGjjb6m8Yk0p-86ZTk=.3f1bfc21-8ab5-4db1-b94b-00f8f7a5093e@github.com> References: <24gVxH5gD-m8jA3zMiw_rUXs9HbcBYhyjfmJIEle9I4=.85f9f726-15ed-465c-84ed-1cf5893ce415@github.com> <5JBLpzCNQ5ixdUULsUtbpo9WTJGjjb6m8Yk0p-86ZTk=.3f1bfc21-8ab5-4db1-b94b-00f8f7a5093e@github.com> Message-ID: <8seTJ0PxCI8w6iyvRBYlug6gzOErmDRZSPA23n_N_O4=.3a542bd2-7ede-4071-bb30-ff0e290ba67d@github.com> On Tue, 3 Dec 2024 11:55:53 GMT, Dean Long wrote: >> Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: >> >> make similar to add2reg > > src/hotspot/cpu/s390/macroAssembler_s390.hpp line 160: > >> 158: // Generic operation r1 := r2 + imm. >> 159: void add2reg (Register r1, int64_t imm, Register r2 = noreg); >> 160: void add2reg_32(Register r1, int64_t imm, Register r2 = noreg); > > I don't understand the difference between these two. For both, imm must be a simm_32. I don't think we need add2reg_32. For `add2reg_32`, the sum and the first operand (a register in this case) are treated as 32-bit signed integers. For `add2reg`, the sum and the operands are treated as 64-bit signed integers. The immediate value is 32 bits in both cases.
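The width distinction described in this reply can be illustrated with ordinary Java arithmetic. This is a minimal sketch with hypothetical method names; it models only the operand widths (32-bit versus 64-bit sum with a 32-bit immediate) and says nothing about the actual s390 instruction selection.

```java
public class AddWidthSketch {
    // 32-bit flavour: register and sum are int, so the result wraps at 32 bits.
    static int add32(int reg, int imm) {
        return reg + imm;
    }

    // 64-bit flavour: register and sum are long; the 32-bit immediate is
    // sign-extended before the addition.
    static long add64(long reg, int imm) {
        return reg + imm;
    }

    public static void main(String[] args) {
        System.out.println(add32(Integer.MAX_VALUE, 1));
        System.out.println(add64(Integer.MAX_VALUE, 1));
    }
}
```

With the same inputs the 32-bit form wraps around to the minimum int, while the 64-bit form carries into the upper half of the register.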
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22456#discussion_r1867599419 From chagedorn at openjdk.org Tue Dec 3 12:50:39 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 3 Dec 2024 12:50:39 GMT Subject: RFR: 8345299: C2: some nodes can still have incorrect control after do_range_check() In-Reply-To: References: Message-ID: On Mon, 2 Dec 2024 13:53:00 GMT, Roland Westrelin wrote: > 8339733 fixed controls for updated pre/main limits during > do_range_check(). However, it missed one issue: > > Control for the new limits is computed in `new_limit_ctrl` for both > pre and main loops. `new_limit_ctrl` is currently initialized from the > pre limit control but it also needs to take the main loop limit > control into account as sometimes, the main loop limit control is > below the pre limit and pre loop entry control. > > 8339733 also introduced a couple bugs. Control of the `Bool`/`Cmp` > nodes are updated for the pre and main loop to `new_limit_ctrl`. But > that's incorrect because `new_limit_ctrl` may be above the pre loop > while the `Bool`/`Cmp` for the pre loop are in the loop (because they > depend on the loop iv) and for the main loop are after the pre loop > (because they depend on the iv out of the pre loop). I fixed this for > the pre loop by setting control for the `Bool`/`Cmp` to be as late as > possible. For the main loop, no change appears to be required as > control computed by c2 is already late enough. I've added an assert > instead. Looks good to me. Do you have a regression test where this leads to an actual failure? ------------- Marked as reviewed by chagedorn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/22485#pullrequestreview-2475587460 From roland at openjdk.org Tue Dec 3 12:53:40 2024 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 3 Dec 2024 12:53:40 GMT Subject: RFR: 8345299: C2: some nodes can still have incorrect control after do_range_check() In-Reply-To: References: Message-ID: On Tue, 3 Dec 2024 12:47:43 GMT, Christian Hagedorn wrote: > Looks good to me. Do you have a regression test where this leads to an actual failure? I don't. This seems unlikely to cause issues with the current code. I ran into this while working on JDK-8275202 where new code I've added is failing. ------------- PR Comment: https://git.openjdk.org/jdk/pull/22485#issuecomment-2514477633 From rcastanedalo at openjdk.org Tue Dec 3 12:54:40 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Tue, 3 Dec 2024 12:54:40 GMT Subject: RFR: 8345287: C2: live in computation is broken In-Reply-To: References: Message-ID: On Mon, 2 Dec 2024 09:03:50 GMT, Roland Westrelin wrote: > 8234003 (Improve IndexSet iteration) broke live in computation: > > > @@ -273,23 +276,25 @@ void PhaseLive::add_liveout( Block *p, IndexSet *lo, VectorSet &first_pass ) { > // Add a vector of live-in values to a given blocks live-in set. > void PhaseLive::add_livein(Block *p, IndexSet *lo) { > IndexSet *livein = &_livein[p->_pre_order-1]; > - IndexSetIterator elements(lo); > - uint r; > - while ((r = elements.next()) != 0) { > - livein->insert(r); // Then add to live-in set > + if (!livein->is_empty()) { > + IndexSetIterator elements(lo); > + uint r; > + while ((r = elements.next()) != 0) { > + livein->insert(r); // Then add to live-in set > + } > } > } > > > `livein` is initially empty and the patch above only adds elements to it if: > > > if (!livein->is_empty()) { > > > which is never true. > > This doesn't affect correctness as live in sets are only used to drive > scheduling.
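The effect of the guard quoted above can be reproduced with ordinary Java sets. This is a minimal sketch with hypothetical names, using `java.util.Set` in place of `IndexSet`; the unguarded variant mirrors the lines removed by the diff, and no claim is made here about the shape of the eventual fix.

```java
import java.util.Set;
import java.util.TreeSet;

public class LiveInGuardSketch {
    // Guarded on the destination set, as in the quoted patch: nothing is ever
    // added while `livein` is still empty, so it stays empty.
    static void addLiveInGuardedOnTarget(Set<Integer> livein, Set<Integer> lo) {
        if (!livein.isEmpty()) {
            livein.addAll(lo);
        }
    }

    // Unguarded union, matching the removed lines of the diff.
    static void addLiveInUnguarded(Set<Integer> livein, Set<Integer> lo) {
        livein.addAll(lo);
    }

    public static void main(String[] args) {
        Set<Integer> lo = new TreeSet<>(Set.of(1, 2, 3));

        Set<Integer> guarded = new TreeSet<>();
        addLiveInGuardedOnTarget(guarded, lo);

        Set<Integer> unguarded = new TreeSet<>();
        addLiveInUnguarded(unguarded, lo);

        System.out.println("guarded:   " + guarded);
        System.out.println("unguarded: " + unguarded);
    }
}
```

Because the destination starts out empty, a guard on the destination blocks the very first insertion and every later one.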
Test results on Oracle CI look good, benchmarks are still running but the results look rather neutral so far. ------------- PR Comment: https://git.openjdk.org/jdk/pull/22473#issuecomment-2514480249 From chagedorn at openjdk.org Tue Dec 3 12:55:41 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 3 Dec 2024 12:55:41 GMT Subject: RFR: 8345158: IGV: local scheduling should not place successors before predecessors In-Reply-To: <0VUX5bX4oDw7aUoQ8KWtsG6PLMfM5bH3ge7ZYY8Gz1Y=.305aa13d-f638-4645-b956-a7209266358f@github.com> References: <0VUX5bX4oDw7aUoQ8KWtsG6PLMfM5bH3ge7ZYY8Gz1Y=.305aa13d-f638-4645-b956-a7209266358f@github.com> Message-ID: On Mon, 2 Dec 2024 12:58:57 GMT, Daniel Skantz wrote: > A slight tweak to consider def-use in scheduleBlock. > > Testing: opened a few graphs in control-flow graph mode without any assertion failure. Looks good to me, too. ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/22481#pullrequestreview-2475600393 From duke at openjdk.org Tue Dec 3 13:13:38 2024 From: duke at openjdk.org (Daniel Skantz) Date: Tue, 3 Dec 2024 13:13:38 GMT Subject: RFR: 8345158: IGV: local scheduling should not place successors before predecessors In-Reply-To: <0VUX5bX4oDw7aUoQ8KWtsG6PLMfM5bH3ge7ZYY8Gz1Y=.305aa13d-f638-4645-b956-a7209266358f@github.com> References: <0VUX5bX4oDw7aUoQ8KWtsG6PLMfM5bH3ge7ZYY8Gz1Y=.305aa13d-f638-4645-b956-a7209266358f@github.com> Message-ID: On Mon, 2 Dec 2024 12:58:57 GMT, Daniel Skantz wrote: > A slight tweak to consider def-use in scheduleBlock. > > Testing: opened a few graphs in control-flow graph mode without any assertion failure. Thanks for the reviews! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/22481#issuecomment-2514519471 From duke at openjdk.org Tue Dec 3 13:13:39 2024 From: duke at openjdk.org (duke) Date: Tue, 3 Dec 2024 13:13:39 GMT Subject: RFR: 8345158: IGV: local scheduling should not place successors before predecessors In-Reply-To: <0VUX5bX4oDw7aUoQ8KWtsG6PLMfM5bH3ge7ZYY8Gz1Y=.305aa13d-f638-4645-b956-a7209266358f@github.com> References: <0VUX5bX4oDw7aUoQ8KWtsG6PLMfM5bH3ge7ZYY8Gz1Y=.305aa13d-f638-4645-b956-a7209266358f@github.com> Message-ID: On Mon, 2 Dec 2024 12:58:57 GMT, Daniel Skantz wrote: > A slight tweak to consider def-use in scheduleBlock. > > Testing: opened a few graphs in control-flow graph mode without any assertion failure. @danielogh Your change (at version abf359a4ddff3e02f8e8d5d8487dac32eaa1dc4e) is now ready to be sponsored by a Committer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/22481#issuecomment-2514520997 From duke at openjdk.org Tue Dec 3 13:35:45 2024 From: duke at openjdk.org (Daniel Skantz) Date: Tue, 3 Dec 2024 13:35:45 GMT Subject: Integrated: 8345158: IGV: local scheduling should not place successors before predecessors In-Reply-To: <0VUX5bX4oDw7aUoQ8KWtsG6PLMfM5bH3ge7ZYY8Gz1Y=.305aa13d-f638-4645-b956-a7209266358f@github.com> References: <0VUX5bX4oDw7aUoQ8KWtsG6PLMfM5bH3ge7ZYY8Gz1Y=.305aa13d-f638-4645-b956-a7209266358f@github.com> Message-ID: On Mon, 2 Dec 2024 12:58:57 GMT, Daniel Skantz wrote: > A slight tweak to consider def-use in scheduleBlock. > > Testing: opened a few graphs in control-flow graph mode without any assertion failure. This pull request has now been integrated. 
Changeset: 65b5a2e3 Author: Daniel Skantz Committer: Roberto Castañeda Lozano URL: https://git.openjdk.org/jdk/commit/65b5a2e3e4f9882adca587b9fed90223b93302a0 Stats: 3 lines in 1 file changed: 0 ins; 0 del; 3 mod 8345158: IGV: local scheduling should not place successors before predecessors Reviewed-by: rcastanedalo, chagedorn ------------- PR: https://git.openjdk.org/jdk/pull/22481 From amitkumar at openjdk.org Tue Dec 3 13:40:13 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Tue, 3 Dec 2024 13:40:13 GMT Subject: RFR: 8344304: [s390x] ubsan: negation of -2147483648 cannot be represented in type 'int' [v6] In-Reply-To: References: Message-ID: > fixes the issue reported by ubsan. Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: la has no benefit over add instructions ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22456/files - new: https://git.openjdk.org/jdk/pull/22456/files/56a01717..238cdcd4 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22456&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22456&range=04-05 Stats: 26 lines in 1 file changed: 2 ins; 19 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/22456.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22456/head:pull/22456 PR: https://git.openjdk.org/jdk/pull/22456 From chagedorn at openjdk.org Tue Dec 3 13:46:46 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 3 Dec 2024 13:46:46 GMT Subject: RFR: 8345299: C2: some nodes can still have incorrect control after do_range_check() In-Reply-To: References: Message-ID: On Mon, 2 Dec 2024 13:53:00 GMT, Roland Westrelin wrote: > 8339733 fixed controls for updated pre/main limits during > do_range_check(). However, it missed one issue: > > Control for the new limits is computed in `new_limit_ctrl` for both > pre and main loops.
`new_limit_ctrl` is currently initialized from the > pre limit control but it also needs to take the main loop limit > control into account as sometimes, the main loop limit control is > below the pre limit and pre loop entry control. > > 8339733 also introduced a couple bugs. Control of the `Bool`/`Cmp` > nodes are updated for the pre and main loop to `new_limit_ctrl`. But > that's incorrect because `new_limit_ctrl` may be above the pre loop > while the `Bool`/`Cmp` for the pre loop are in the loop (because they > depend on the loop iv) and for the main loop are after the pre loop > (because they depend on the iv out of the pre loop). I fixed this for > the pre loop by setting control for the `Bool`/`Cmp` to be as late as > possible. For the main loop, no change appears to be required as > control computed by c2 is already late enough. I've added an assert > instead. I see, thanks for background! ------------- PR Comment: https://git.openjdk.org/jdk/pull/22485#issuecomment-2514594751 From duke at openjdk.org Tue Dec 3 13:48:50 2024 From: duke at openjdk.org (Daniel Skantz) Date: Tue, 3 Dec 2024 13:48:50 GMT Subject: RFR: 8345156: C2: Add bailouts next to a few asserts In-Reply-To: References: Message-ID: On Mon, 2 Dec 2024 19:34:18 GMT, Vladimir Kozlov wrote: >> This patch associates product bailouts with a few existing debug asserts. The criteria are that there have been product bugs associated with failing these asserts in the past, and that there are not too many callers to the method where the compilation is now cancelled or any measurable impact on compilation time. >> >> Testing: T1-T4 and extra testing. Tested compilation time with -XX:+CITime on performance benchmarks and the effect was not measurable. > > src/hotspot/share/opto/gcm.cpp line 1535: > >> 1533: // defs in new LCA block. >> 1534: LCA = insert_anti_dependences(LCA, self); >> 1535: if (C->failing()) { > > Same here. I don't see where `record_failure()` is called from `insert_anti_dependences()`. 
> And there are a lot of asserts there. This is intended to cover `insert_anti_dependences` : `memory_early_block` : `assert_dom` : `record_failure("unschedulable graph");` ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22482#discussion_r1867749958 From lucy at openjdk.org Tue Dec 3 14:46:40 2024 From: lucy at openjdk.org (Lutz Schmidt) Date: Tue, 3 Dec 2024 14:46:40 GMT Subject: RFR: 8344304: [s390x] ubsan: negation of -2147483648 cannot be represented in type 'int' [v6] In-Reply-To: References: Message-ID: On Tue, 3 Dec 2024 13:40:13 GMT, Amit Kumar wrote: >> fixes the issue reported by ubsan. > > Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: > > la has no benefit over add instructions LGTM. ------------- Marked as reviewed by lucy (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/22456#pullrequestreview-2475901454 From aph at openjdk.org Tue Dec 3 14:48:11 2024 From: aph at openjdk.org (Andrew Haley) Date: Tue, 3 Dec 2024 14:48:11 GMT Subject: RFR: 8344068: Windows x86-64: Out of CodeBuffer space when generating final stubs Message-ID: I've had a look at the difference between an Intel AVX-512 machine (which does run out of memory) and an AMD machine, and it seems to be that the AVX-512 stubs required by Windows really do take up a lot of space. This should be sufficient.
------------- Commit messages: - 8344068: Windows x86-64: Out of CodeBuffer space when generating final stubs Changes: https://git.openjdk.org/jdk/pull/22516/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=22516&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8344068 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/22516.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22516/head:pull/22516 PR: https://git.openjdk.org/jdk/pull/22516 From epeter at openjdk.org Tue Dec 3 16:38:40 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 3 Dec 2024 16:38:40 GMT Subject: RFR: 8343747: C2: TestReplicateAtConv.java crashes with -XX:MaxVectorSize=8 In-Reply-To: References: Message-ID: On Thu, 28 Nov 2024 15:31:47 GMT, Roland Westrelin wrote: > Crash occurs when attempting to create a `Replicate` node that's input > to a `VectorCast` node (for a `ConvL2I`) that's not supported by the > platform (when run with `MaxVectorSize=8`). I think the pack for the > `VectorCast` should be filtered out earlier as not implemented and I > propose adding a test to `VectorCastNode::implemented()` for the type > of its input to handle that corner case. Looks reasonable, thanks for the fix! I linked the issue with [JDK-8341834](https://bugs.openjdk.org/browse/JDK-8341834) on JBS - after all that had the same issue of Replicate->Cast test/hotspot/jtreg/compiler/vectorization/TestReplicateAtConv.java line 29: > 27: * @summary C2 compilation fails with "bad AD file" due to Replicate > 28: * @run main/othervm -XX:CompileCommand=compileonly,TestReplicateAtConv::test -Xcomp TestReplicateAtConv > 29: * @run main/othervm -XX:CompileCommand=compileonly,TestReplicateAtConv::test -Xcomp -XX:MaxVectorSize=8 TestReplicateAtConv You should add the new bug id to the test. ------------- Changes requested by epeter (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/22442#pullrequestreview-2476199710 PR Review Comment: https://git.openjdk.org/jdk/pull/22442#discussion_r1868038365 From epeter at openjdk.org Tue Dec 3 16:38:41 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 3 Dec 2024 16:38:41 GMT Subject: RFR: 8343747: C2: TestReplicateAtConv.java crashes with -XX:MaxVectorSize=8 In-Reply-To: References: Message-ID: <_B_2xeH_MuZ_Pzk9K2Ap2klG1XahEnIFr_U3QTi5cnc=.b6e3a197-fa35-4d34-a11a-c120c607c8d3@github.com> On Tue, 3 Dec 2024 16:27:48 GMT, Emanuel Peter wrote: >> Crash occurs when attempting to create a `Replicate` node that's input >> to a `VectorCast` node (for a `ConvL2I`) that's not supported by the >> platform (when run with `MaxVectorSize=8`). I think the pack for the >> `VectorCast` should be filtered out earlier as not implemented and I >> propose adding a test to `VectorCastNode::implemented()` for the type >> of its input to handle that corner case. > > test/hotspot/jtreg/compiler/vectorization/TestReplicateAtConv.java line 29: > >> 27: * @summary C2 compilation fails with "bad AD file" due to Replicate >> 28: * @run main/othervm -XX:CompileCommand=compileonly,TestReplicateAtConv::test -Xcomp TestReplicateAtConv >> 29: * @run main/othervm -XX:CompileCommand=compileonly,TestReplicateAtConv::test -Xcomp -XX:MaxVectorSize=8 TestReplicateAtConv > > You should add the new bug id to the test. Maybe you should also update the summary, and say that the issue is about replicate and cast. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22442#discussion_r1868047222 From kvn at openjdk.org Tue Dec 3 17:05:41 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 3 Dec 2024 17:05:41 GMT Subject: RFR: 8345156: C2: Add bailouts next to a few asserts In-Reply-To: References: Message-ID: On Mon, 2 Dec 2024 13:07:57 GMT, Daniel Skantz wrote: > This patch associates product bailouts with a few existing debug asserts. 
The criteria are that there have been product bugs associated with failing these asserts in the past, and that there are not too many callers to the method where the compilation is now cancelled or any measurable impact on compilation time. > > Testing: T1-T4 and extra testing. Tested compilation time with -XX:+CITime on performance benchmarks and the effect was not measurable. Good. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/22482#pullrequestreview-2476298157 From dlong at openjdk.org Tue Dec 3 17:08:53 2024 From: dlong at openjdk.org (Dean Long) Date: Tue, 3 Dec 2024 17:08:53 GMT Subject: RFR: 8340141: C1: rework ciMethod::equals following 8338471 In-Reply-To: References: Message-ID: On Tue, 1 Oct 2024 16:33:45 GMT, Vladimir Kozlov wrote: >> @vnkozlov , I like the alternative approach better. I went with the current approach because I was thinking it would be simpler than the bailout, but I changed my mind after writing both out. >> >> Yes, we can change cha_monomorphic_target to nullptr instead of bailing out. But my understanding is any use of old/redefined methods will cause the compilation to be thrown out when we try to create the nmethod, so we are avoiding wasted work by bailing out early. > > @dean-long, may be I misunderstand your statement. Are you re-writing the fix or keep current? Thanks @vnkozlov. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21148#issuecomment-2515117055 From dlong at openjdk.org Tue Dec 3 17:08:54 2024 From: dlong at openjdk.org (Dean Long) Date: Tue, 3 Dec 2024 17:08:54 GMT Subject: Integrated: 8340141: C1: rework ciMethod::equals following 8338471 In-Reply-To: References: Message-ID: On Tue, 24 Sep 2024 00:35:47 GMT, Dean Long wrote: > This PR changes ciMethod::equals() to a special-purpose debug helper method for the one place in C1 that uses it in an assert. The reason why making it general purpose is difficult is because JVMTI can add and delete methods. 
See the bug report and JDK-8338471 for more details. I'm open to suggestions for a better name than equals_ignore_version(). > > An alternative approach, which I think may actually be better, would be to check for old methods first, and bail out if we see any. Then we can change the assert back to how it was originally, using ==. This pull request has now been integrated. Changeset: 293323c3 Author: Dean Long URL: https://git.openjdk.org/jdk/commit/293323c3e210bc2a3e45a0a9bc99b55378be91d2 Stats: 40 lines in 3 files changed: 21 ins; 18 del; 1 mod 8340141: C1: rework ciMethod::equals following 8338471 Reviewed-by: kvn, vlivanov ------------- PR: https://git.openjdk.org/jdk/pull/21148 From dlong at openjdk.org Tue Dec 3 17:11:48 2024 From: dlong at openjdk.org (Dean Long) Date: Tue, 3 Dec 2024 17:11:48 GMT Subject: RFR: 8344304: [s390x] ubsan: negation of -2147483648 cannot be represented in type 'int' [v6] In-Reply-To: References: Message-ID: On Tue, 3 Dec 2024 13:40:13 GMT, Amit Kumar wrote: >> fixes the issue reported by ubsan. > > Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: > > la has no benefit over add instructions Marked as reviewed by dlong (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/22456#pullrequestreview-2476311539 From kvn at openjdk.org Tue Dec 3 17:17:45 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 3 Dec 2024 17:17:45 GMT Subject: RFR: 8345299: C2: some nodes can still have incorrect control after do_range_check() In-Reply-To: References: Message-ID: On Mon, 2 Dec 2024 13:53:00 GMT, Roland Westrelin wrote: > 8339733 fixed controls for updated pre/main limits during > do_range_check(). However, it missed one issue: > > Control for the new limits is computed in `new_limit_ctrl` for both > pre and main loops. 
`new_limit_ctrl` is currently initialized from the > pre limit control but it also needs to take the main loop limit > control into account as sometimes, the main loop limit control is > below the pre limit and pre loop entry control. > > 8339733 also introduced a couple bugs. Control of the `Bool`/`Cmp` > nodes are updated for the pre and main loop to `new_limit_ctrl`. But > that's incorrect because `new_limit_ctrl` may be above the pre loop > while the `Bool`/`Cmp` for the pre loop are in the loop (because they > depend on the loop iv) and for the main loop are after the pre loop > (because they depend on the iv out of the pre loop). I fixed this for > the pre loop by setting control for the `Bool`/`Cmp` to be as late as > possible. For the main loop, no change appears to be required as > control computed by c2 is already late enough. I've added an assert > instead. Few comments. src/hotspot/share/opto/loopTransform.cpp line 2781: > 2779: // new pre_limit can push Bool/Cmp/Opaque nodes down (when one of the eliminated condition has parameters that are not > 2780: // loop invariant in the pre loop. > 2781: set_ctrl(pre_opaq, new_limit_ctrl); Can you update this comment to explain different control settings as you did in PR's description. src/hotspot/share/opto/loopTransform.cpp line 2825: > 2823: // new main_limit can push Bool/Cmp nodes down (when one of the eliminated condition has parameters that are not loop > 2824: // invariant in the pre loop). > 2825: set_ctrl(opqzm, new_limit_ctrl); Update this comment too. 
------------- PR Review: https://git.openjdk.org/jdk/pull/22485#pullrequestreview-2476310277 PR Review Comment: https://git.openjdk.org/jdk/pull/22485#discussion_r1868102387 PR Review Comment: https://git.openjdk.org/jdk/pull/22485#discussion_r1868111672 From kvn at openjdk.org Tue Dec 3 17:24:43 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 3 Dec 2024 17:24:43 GMT Subject: RFR: 8344068: Windows x86-64: Out of CodeBuffer space when generating final stubs In-Reply-To: References: Message-ID: On Tue, 3 Dec 2024 14:38:39 GMT, Andrew Haley wrote: > I've had a look at the difference between an Intel AVX-512 machine (which does run out of memory) and an AMD machine, and it seems to be that the AVX-512 stubs required by Windows really do take up a lot of space. This should be sufficient. src/hotspot/cpu/x86/stubRoutines_x86.hpp line 41: > 39: // Windows have more code to save/restore registers > 40: _compiler_stubs_code_size = 20000 LP64_ONLY(+47000) WINDOWS_ONLY(+2000), > 41: _final_stubs_code_size = 10000 LP64_ONLY(+20000) WINDOWS_ONLY(+22000) ZGC_ONLY(+20000) Do you know which particular stubs cause out of space? Most stubs are generated for `compiler` intrinsics. I would assume that they are causing the issue and not `final` 6 stubs. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22516#discussion_r1868122421 From kvn at openjdk.org Tue Dec 3 17:29:45 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 3 Dec 2024 17:29:45 GMT Subject: RFR: 8344068: Windows x86-64: Out of CodeBuffer space when generating final stubs In-Reply-To: References: Message-ID: On Tue, 3 Dec 2024 17:21:54 GMT, Vladimir Kozlov wrote: >> I've had a look at the difference between an Intel AVX-512 machine (which does run out of memory) and an AMD machine, and it seems to be that the AVX-512 stubs required by Windows really do take up a lot of space. This should be sufficient. 
> > src/hotspot/cpu/x86/stubRoutines_x86.hpp line 41: > >> 39: // Windows have more code to save/restore registers >> 40: _compiler_stubs_code_size = 20000 LP64_ONLY(+47000) WINDOWS_ONLY(+2000), >> 41: _final_stubs_code_size = 10000 LP64_ONLY(+20000) WINDOWS_ONLY(+22000) ZGC_ONLY(+20000) > > Do you know which particular stubs cause out of space? > Most stubs are generated for `compiler` intrinsics. I would assume that they are causing the issue and not `final` 6 stubs. Scratch that. I see call stack in the bug report. It is indeed final stubs for upcall stub when ZGC is enabled. Should we increase ZGC_ONLY(+20000) value instead? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22516#discussion_r1868130048 From kvn at openjdk.org Tue Dec 3 18:18:47 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 3 Dec 2024 18:18:47 GMT Subject: RFR: 8343148: C2: Refactor uses of "PhaseValue::*con*() + PhaseIdealLoop::set_ctrl()" into separate method [v12] In-Reply-To: References: Message-ID: On Mon, 18 Nov 2024 07:56:31 GMT, theoweidmannoracle wrote: >> This patch introduces the methods `PhaseIdealLoop::intcon` and `PhaseIdealLoop::longcon` which are wrappers for: >> >> >> ConINode* node = _igvn.intcon(i); >> set_ctrl(node, C->root()); >> >> >> and >> >> >> ConLNode* node = _igvn.longcon(i); >> set_ctrl(node, C->root()); >> >> >> Occurrences of this pattern in loopnode.cpp were replaced with the appropriate call to the new methods. > > theoweidmannoracle has updated the pull request incrementally with three additional commits since the last revision: > > - Update src/hotspot/share/opto/loopnode.cpp > > Co-authored-by: Christian Hagedorn > - Update src/hotspot/share/opto/loopnode.cpp > > Co-authored-by: Christian Hagedorn > - Update src/hotspot/share/opto/loopnode.cpp > > Co-authored-by: Christian Hagedorn This version looks good. ------------- Marked as reviewed by kvn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/21836#pullrequestreview-2476472097 From kvn at openjdk.org Tue Dec 3 18:18:50 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 3 Dec 2024 18:18:50 GMT Subject: RFR: 8343148: C2: Refactor uses of "PhaseValue::*con*() + PhaseIdealLoop::set_ctrl()" into separate method [v4] In-Reply-To: <-1uIsg-ge9MgmoMQFqE7ojuoKr16S4v545Vy71uCs18=.a37c4555-9f94-4aa8-ae59-037f33ff8f05@github.com> References: <-1uIsg-ge9MgmoMQFqE7ojuoKr16S4v545Vy71uCs18=.a37c4555-9f94-4aa8-ae59-037f33ff8f05@github.com> Message-ID: <1LfChQJb6_GXebmbrw5ZT8v5nXWxNzy5e_JpxgJS0eY=.645609ae-76b2-436f-b620-384e377e122d@github.com> On Mon, 11 Nov 2024 07:38:44 GMT, theoweidmannoracle wrote: >> src/hotspot/share/opto/loopnode.cpp line 3147: >> >>> 3145: ConINode* zero = igvn->intcon(0); >>> 3146: if (iloop != nullptr) { >>> 3147: iloop->set_root_as_ctrl(zero); >> >> Please look on history of this code. This is suspicious - constant nodes should be always attached to Root. > > @TobiHartmann Pointed out that this method is also called from code outside of loop opts, for example, `PhaseMacroExpand::expand_macro_nodes`. Since there's no PhaseIdealLoop in this case, nullptr is passed instead and we cannot set control as we are not inside a loop opt. > > Maybe @rwestrel can also take a look as he originally introduced this code in [this PR](https://github.com/openjdk/jdk/pull/7364/files#diff-d49652d43244d52415873c37bf6990269b0d6e2f2111f4f971660470b6bca738R2860). Got it. That is what `(iloop != nullptr)` check for. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21836#discussion_r1868196480 From kvn at openjdk.org Tue Dec 3 19:07:44 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 3 Dec 2024 19:07:44 GMT Subject: RFR: 8344068: Windows x86-64: Out of CodeBuffer space when generating final stubs In-Reply-To: References: Message-ID: On Tue, 3 Dec 2024 17:26:36 GMT, Vladimir Kozlov wrote: >> src/hotspot/cpu/x86/stubRoutines_x86.hpp line 41: >> >>> 39: // Windows have more code to save/restore registers >>> 40: _compiler_stubs_code_size = 20000 LP64_ONLY(+47000) WINDOWS_ONLY(+2000), >>> 41: _final_stubs_code_size = 10000 LP64_ONLY(+20000) WINDOWS_ONLY(+22000) ZGC_ONLY(+20000) >> >> Do you know which particular stubs cause out of space? >> Most stubs are generated for `compiler` intrinsics. I would assume that they are causing the issue and not `final` 6 stubs. > > Scratch that. I see call stack in the bug report. It is indeed final stubs for upcall stub when ZGC is enabled. > Should we increase ZGC_ONLY(+20000) value instead? ZGC save/restores XMM registers regardless OS: [x86/gc/z/zBarrierSetAssembler_x86.cpp#L70](https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/gc/z/zBarrierSetAssembler_x86.cpp#L70) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22516#discussion_r1868234509 From kvn at openjdk.org Tue Dec 3 20:17:42 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 3 Dec 2024 20:17:42 GMT Subject: RFR: 8343747: C2: TestReplicateAtConv.java crashes with -XX:MaxVectorSize=8 In-Reply-To: References: Message-ID: On Thu, 28 Nov 2024 15:31:47 GMT, Roland Westrelin wrote: > Crash occurs when attempting to create a `Replicate` node that's input > to a `VectorCast` node (for a `ConvL2I`) that's not supported by the > platform (when run with `MaxVectorSize=8`). 
I think the pack for the > `VectorCast` should be filtered out earlier as not implemented and I > propose adding a test to `VectorCastNode::implemented()` for the type > of its input to handle that corner case. Good. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/22442#pullrequestreview-2476660148 From kvn at openjdk.org Tue Dec 3 20:19:44 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 3 Dec 2024 20:19:44 GMT Subject: RFR: 8345287: C2: live in computation is broken In-Reply-To: References: Message-ID: <47t7YNsiQVBJGI5LKT3Uy1pESH33laJvBiMrVh-66yo=.8e1e34d3-ea96-4e26-bf6e-ca231859b9ca@github.com> On Mon, 2 Dec 2024 09:03:50 GMT, Roland Westrelin wrote: > 8234003 (Improve IndexSet iteration) broke live in computation: > > > @@ -273,23 +276,25 @@ void PhaseLive::add_liveout( Block *p, IndexSet *lo, VectorSet &first_pass ) { > // Add a vector of live-in values to a given blocks live-in set. > void PhaseLive::add_livein(Block *p, IndexSet *lo) { > IndexSet *livein = &_livein[p->_pre_order-1]; > - IndexSetIterator elements(lo); > - uint r; > - while ((r = elements.next()) != 0) { > - livein->insert(r); // Then add to live-in set > + if (!livein->is_empty()) { > + IndexSetIterator elements(lo); > + uint r; > + while ((r = elements.next()) != 0) { > + livein->insert(r); // Then add to live-in set > + } > } > } > > > `livein` is initially empty and the patch above only adds elements to it if: > > > if (!livein->is_empty()) { > > > which is never true. > > This doesn't affect correctness as live in sets are only used to drive > scheduling. Good. ------------- Marked as reviewed by kvn (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/22473#pullrequestreview-2476666102 From kxu at openjdk.org Tue Dec 3 21:40:41 2024 From: kxu at openjdk.org (Kangcheng Xu) Date: Tue, 3 Dec 2024 21:40:41 GMT Subject: RFR: 8336759: C2: int counted loop with long limit not recognized as counted loop In-Reply-To: References: <_d_CiLfCN9ahEmhp9fLcGqO-L8n7a0gW86R3lzLkX60=.b3bdc697-cb2d-477f-a525-0f16a3eee383@github.com> Message-ID: On Tue, 3 Dec 2024 09:04:08 GMT, Christian Hagedorn wrote: >> This patch implements [JDK-8336759](https://bugs.openjdk.org/browse/JDK-8336759) that recognizes int counted loops with long limits. >> >> Currently, patterns like `for ( int i =...; i < long_limit; ...)` where int `i` is implicitly promoted to long (i.e., `(long) i < long_limit`) is not recognized as (int) counted loop. This patch speculatively and optimistically converts long limits to ints and deoptimize if the limit is outside int range, allowing more optimization opportunities. >> >> In other words, it transforms >> >> >> for (int i = 0; (long) i < long_limit; i++) {...} >> >> >> to >> >> >> if (int_min <= long_limit && long_limit <= int_max ) { >> for (int i = 0; i < (int) long_limit; i++) {...} >> } else { >> trap: loop_limit_check >> } >> >> >> This could benefit calls to APIs like `long MemorySegment#byteSize()` when iterating over a long limit. > > src/hotspot/share/opto/loopnode.hpp line 1229: > >> 1227: bool is_counted_loop_with_speculative_long_limit(Node* x, IdealLoopTree*& loop, BasicType iv_bt); >> 1228: private: >> 1229: bool do_is_counted_loop(Node* x, IdealLoopTree*& loop, BasicType iv_bt); > > Was like that before but I think `is_counted_loop()` is a bit misleading, suggesting it's a query but it's actually doing the conversion work. Since you now change these methods anyway, what do you think about the following naming suggestions? 
> - public `try_convert_to_counted_loop()` > - calls: private `convert_to_counted_loop()` > - call: private `convert_to_counted_loop_with_speculative_long_limit()` Yes I found the naming counter-intuitive, too, upon first reading it. I like your suggestions. Thanks! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22449#discussion_r1868399930 From kxu at openjdk.org Tue Dec 3 21:48:59 2024 From: kxu at openjdk.org (Kangcheng Xu) Date: Tue, 3 Dec 2024 21:48:59 GMT Subject: RFR: 8336759: C2: int counted loop with long limit not recognized as counted loop [v2] In-Reply-To: <_d_CiLfCN9ahEmhp9fLcGqO-L8n7a0gW86R3lzLkX60=.b3bdc697-cb2d-477f-a525-0f16a3eee383@github.com> References: <_d_CiLfCN9ahEmhp9fLcGqO-L8n7a0gW86R3lzLkX60=.b3bdc697-cb2d-477f-a525-0f16a3eee383@github.com> Message-ID: > This patch implements [JDK-8336759](https://bugs.openjdk.org/browse/JDK-8336759) that recognizes int counted loops with long limits. > > Currently, patterns like `for ( int i =...; i < long_limit; ...)` where int `i` is implicitly promoted to long (i.e., `(long) i < long_limit`) is not recognized as (int) counted loop. This patch speculatively and optimistically converts long limits to ints and deoptimize if the limit is outside int range, allowing more optimization opportunities. > > In other words, it transforms > > > for (int i = 0; (long) i < long_limit; i++) {...} > > > to > > > if (int_min <= long_limit && long_limit <= int_max ) { > for (int i = 0; i < (int) long_limit; i++) {...} > } else { > trap: loop_limit_check > } > > > This could benefit calls to APIs like `long MemorySegment#byteSize()` when iterating over a long limit. 
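The transformation described in the quoted PR text happens inside C2 on its intermediate representation, but its effect can be approximated at the Java source level. The sketch below is only an illustration under that assumption: the class and method names (`LongLimitLoopSketch`, `sumOriginal`, `sumSpeculative`) are hypothetical, and the fallback call stands in for the `loop_limit_check` trap, after which the real VM would continue in the interpreter with the original loop.

```java
public class LongLimitLoopSketch {
    // Original shape: int induction variable, implicitly widened to long
    // for the comparison against a long limit.
    static long sumOriginal(long limit) {
        long sum = 0;
        for (int i = 0; (long) i < limit; i++) {
            sum += i;
        }
        return sum;
    }

    // Speculative shape: if the limit fits in an int, narrow it once and run
    // a plain int counted loop; otherwise fall back to the original loop.
    static long sumSpeculative(long limit) {
        if (Integer.MIN_VALUE <= limit && limit <= Integer.MAX_VALUE) {
            int intLimit = (int) limit;
            long sum = 0;
            for (int i = 0; i < intLimit; i++) {
                sum += i;
            }
            return sum;
        }
        // Stand-in for the trap branch: hand over to the unoptimized loop.
        return sumOriginal(limit);
    }

    public static void main(String[] args) {
        long[] limits = {0L, 10L, -5L, Long.MIN_VALUE};
        for (long limit : limits) {
            System.out.println(limit + ": original=" + sumOriginal(limit)
                    + " speculative=" + sumSpeculative(limit));
        }
    }
}
```

The example deliberately avoids limits above `Integer.MAX_VALUE`: with such a limit the original loop never terminates, because `i` wraps around before `(long) i` can reach the limit.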
Kangcheng Xu has updated the pull request incrementally with one additional commit since the last revision: implement suggested changes from @chhagedorn's review ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22449/files - new: https://git.openjdk.org/jdk/pull/22449/files/4a7d03fe..79d8c146 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22449&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22449&range=00-01 Stats: 76 lines in 3 files changed: 27 ins; 11 del; 38 mod Patch: https://git.openjdk.org/jdk/pull/22449.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22449/head:pull/22449 PR: https://git.openjdk.org/jdk/pull/22449 From kxu at openjdk.org Tue Dec 3 21:48:59 2024 From: kxu at openjdk.org (Kangcheng Xu) Date: Tue, 3 Dec 2024 21:48:59 GMT Subject: RFR: 8336759: C2: int counted loop with long limit not recognized as counted loop [v2] In-Reply-To: References: <_d_CiLfCN9ahEmhp9fLcGqO-L8n7a0gW86R3lzLkX60=.b3bdc697-cb2d-477f-a525-0f16a3eee383@github.com> Message-ID: On Tue, 3 Dec 2024 09:30:21 GMT, Christian Hagedorn wrote: >> Kangcheng Xu has updated the pull request incrementally with one additional commit since the last revision: >> >> implement suggested changes from @chhagedorn's review > > test/hotspot/jtreg/compiler/loopopts/TestIntCountedLoopLongLimit.java line 41: > >> 39: */ >> 40: public class TestIntCountedLoopLongLimit { >> 41: private static final Random RNG = jdk.test.lib.Utils.getRandomInstance(); > > I suggest to rather use an `import` instead of a fully qualified name: > Suggestion: > > import jdk.test.lib.Asserts; > import jdk.test.lib.Utils; > > import java.util.Random; > > /** > * @test > * @bug 8336759 > * @summary test long limits in int counted loops are speculatively converted to int for counted loop > * optimizations > * @library /test/lib / > * @requires vm.compiler2.enabled > * @run driver compiler.loopopts.TestIntCountedLoopLongLimit > */ > public class TestIntCountedLoopLongLimit { > 
private static final Random RNG = Utils.getRandomInstance(); Oops. I don't know how that happened. Nice catch! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22449#discussion_r1868407027 From kxu at openjdk.org Tue Dec 3 21:51:41 2024 From: kxu at openjdk.org (Kangcheng Xu) Date: Tue, 3 Dec 2024 21:51:41 GMT Subject: RFR: 8336759: C2: int counted loop with long limit not recognized as counted loop [v2] In-Reply-To: References: <_d_CiLfCN9ahEmhp9fLcGqO-L8n7a0gW86R3lzLkX60=.b3bdc697-cb2d-477f-a525-0f16a3eee383@github.com> Message-ID: On Tue, 3 Dec 2024 09:37:53 GMT, Christian Hagedorn wrote: >> Kangcheng Xu has updated the pull request incrementally with one additional commit since the last revision: >> >> implement suggested changes from @chhagedorn's review > > Nice enhancement! Couple of mostly minor comments. @chhagedorn Thank you for your timely review! Those are very valid comments. PR updated with suggested changes. ------------- PR Comment: https://git.openjdk.org/jdk/pull/22449#issuecomment-2515620229 From amitkumar at openjdk.org Wed Dec 4 03:49:41 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Wed, 4 Dec 2024 03:49:41 GMT Subject: RFR: 8344304: [s390x] ubsan: negation of -2147483648 cannot be represented in type 'int' [v6] In-Reply-To: References: Message-ID: On Tue, 3 Dec 2024 13:40:13 GMT, Amit Kumar wrote: >> fixes the issue reported by ubsan. > > Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: > > la has no benefit over add instructions Thanks for reviews & suggestions Lutz, Dean. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/22456#issuecomment-2516106802 From amitkumar at openjdk.org Wed Dec 4 03:49:42 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Wed, 4 Dec 2024 03:49:42 GMT Subject: Integrated: 8344304: [s390x] ubsan: negation of -2147483648 cannot be represented in type 'int' In-Reply-To: References: Message-ID: On Fri, 29 Nov 2024 11:08:45 GMT, Amit Kumar wrote: > fixes the issue reported by ubsan. This pull request has now been integrated. Changeset: 43b337eb Author: Amit Kumar URL: https://git.openjdk.org/jdk/commit/43b337eb438f230dbca903b56e0809fc36fcd71d Stats: 41 lines in 3 files changed: 37 ins; 0 del; 4 mod 8344304: [s390x] ubsan: negation of -2147483648 cannot be represented in type 'int' Reviewed-by: lucy, dlong ------------- PR: https://git.openjdk.org/jdk/pull/22456 From epeter at openjdk.org Wed Dec 4 06:46:38 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 4 Dec 2024 06:46:38 GMT Subject: RFR: 8343747: C2: TestReplicateAtConv.java crashes with -XX:MaxVectorSize=8 In-Reply-To: References: Message-ID: <1D5HQ9-AxBkTMeJ1y9nPMke4_7RjH4QJFV-ta9LF45Q=.b4b2b878-beca-4015-968d-8f6d6008613b@github.com> On Thu, 28 Nov 2024 15:31:47 GMT, Roland Westrelin wrote: > Crash occurs when attempting to create a `Replicate` node that's input > to a `VectorCast` node (for a `ConvL2I`) that's not supported by the > platform (when run with `MaxVectorSize=8`). I think the pack for the > `VectorCast` should be filtered out earlier as not implemented and I > propose adding a test to `VectorCastNode::implemented()` for the type > of its input to handle that corner case. Marked as reviewed by epeter (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/22442#pullrequestreview-2477422598 From chagedorn at openjdk.org Wed Dec 4 06:53:41 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 4 Dec 2024 06:53:41 GMT Subject: RFR: 8336759: C2: int counted loop with long limit not recognized as counted loop [v2] In-Reply-To: References: <_d_CiLfCN9ahEmhp9fLcGqO-L8n7a0gW86R3lzLkX60=.b3bdc697-cb2d-477f-a525-0f16a3eee383@github.com> Message-ID: On Tue, 3 Dec 2024 21:48:59 GMT, Kangcheng Xu wrote: >> This patch implements [JDK-8336759](https://bugs.openjdk.org/browse/JDK-8336759) that recognizes int counted loops with long limits. >> >> Currently, patterns like `for ( int i =...; i < long_limit; ...)` where int `i` is implicitly promoted to long (i.e., `(long) i < long_limit`) is not recognized as (int) counted loop. This patch speculatively and optimistically converts long limits to ints and deoptimize if the limit is outside int range, allowing more optimization opportunities. >> >> In other words, it transforms >> >> >> for (int i = 0; (long) i < long_limit; i++) {...} >> >> >> to >> >> >> if (int_min <= long_limit && long_limit <= int_max ) { >> for (int i = 0; i < (int) long_limit; i++) {...} >> } else { >> trap: loop_limit_check >> } >> >> >> This could benefit calls to APIs like `long MemorySegment#byteSize()` when iterating over a long limit. > > Kangcheng Xu has updated the pull request incrementally with one additional commit since the last revision: > > implement suggested changes from @chhagedorn's review Thanks for the update! Some minor last things, otherwise, it looks good. Let me give this a spin in our testing. Note that tomorrow is the fork. If this is not a critical performance enhancement, you might want to wait until after the fork, so this only goes into JDK 25. This patch will enable more optimization opportunities due to being able to convert more loops to counted ones. This could have some side effects. 
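The transformation quoted above can be sketched in plain Java. This is only an illustration of the shape of the guarded loop, not the C2 implementation: the class and method names are hypothetical, and the uncommon trap (`loop_limit_check`) is modelled by throwing an exception, whereas the real VM would deoptimize and continue in the interpreter.

```java
public class LongLimitLoopSketch {
    // Original shape: the int induction variable is promoted to long for the comparison.
    // (Only call this with limits that fit in an int; larger limits would never terminate.)
    static long sumOriginal(long longLimit) {
        long sum = 0;
        for (int i = 0; (long) i < longLimit; i++) {
            sum += i;
        }
        return sum;
    }

    // Guarded shape: check once that the limit fits in an int, then run an int-only loop.
    static long sumGuarded(long longLimit) {
        if (Integer.MIN_VALUE <= longLimit && longLimit <= Integer.MAX_VALUE) {
            int intLimit = (int) longLimit; // safe: the range was checked above
            long sum = 0;
            for (int i = 0; i < intLimit; i++) {
                sum += i;
            }
            return sum;
        }
        // Hypothetical stand-in for the trap taken when the limit is outside the int range.
        throw new IllegalStateException("loop_limit_check: limit outside int range");
    }

    public static void main(String[] args) {
        System.out.println(sumOriginal(10));
        System.out.println(sumGuarded(10));
    }
}
```

For limits inside the int range both methods compute the same sum; the guard only changes what happens for limits that an int loop could never reach.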
src/hotspot/share/opto/loopnode.cpp line 1618: > 1616: > 1617: //------------------------------is_counted_loop-------------------------------- > 1618: bool PhaseIdealLoop::try_convert_to_counted_loop(Node* loop_head, IdealLoopTree*&loop, BasicType iv_bt) { Suggestion: bool PhaseIdealLoop::try_convert_to_counted_loop(Node* loop_head, IdealLoopTree*& loop, BasicType iv_bt) { src/hotspot/share/opto/loopnode.cpp line 1639: > 1637: // trap: loop_limit_check > 1638: // } > 1639: bool PhaseIdealLoop::convert_to_counted_loop_with_speculative_long_limit(Node* loop_head, IdealLoopTree*&loop, Suggestion: bool PhaseIdealLoop::convert_to_counted_loop_with_speculative_long_limit(Node* loop_head, IdealLoopTree*& loop, src/hotspot/share/opto/loopnode.cpp line 1716: > 1714: } > 1715: > 1716: bool PhaseIdealLoop::convert_to_counted_loop(Node* loop_head, IdealLoopTree*&loop, BasicType iv_bt) { Suggestion: bool PhaseIdealLoop::convert_to_counted_loop(Node* loop_head, IdealLoopTree*& loop, BasicType iv_bt) { ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/22449#pullrequestreview-2477421719 PR Review Comment: https://git.openjdk.org/jdk/pull/22449#discussion_r1868800358 PR Review Comment: https://git.openjdk.org/jdk/pull/22449#discussion_r1868801323 PR Review Comment: https://git.openjdk.org/jdk/pull/22449#discussion_r1868802820 From epeter at openjdk.org Wed Dec 4 07:00:38 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 4 Dec 2024 07:00:38 GMT Subject: RFR: 8345156: C2: Add bailouts next to a few asserts In-Reply-To: References: Message-ID: On Mon, 2 Dec 2024 13:07:57 GMT, Daniel Skantz wrote: > This patch associates product bailouts with a few existing debug asserts. The criteria are that there have been product bugs associated with failing these asserts in the past, and that there are not too many callers to the method where the compilation is now cancelled or any measurable impact on compilation time. 
> > Testing: T1-T4 and extra testing. Tested compilation time with -XX:+CITime on performance benchmarks and the effect was not measurable. Nice work @danielogh ! Looks reasonable :) ------------- Marked as reviewed by epeter (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/22482#pullrequestreview-2477443543 From epeter at openjdk.org Wed Dec 4 07:05:40 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 4 Dec 2024 07:05:40 GMT Subject: RFR: 8345219: C2: x86_64 should not go to interpreter stubs for NaNs handling In-Reply-To: References: Message-ID: On Tue, 3 Dec 2024 07:10:16 GMT, Aleksey Shipilev wrote: >> This fix looks correct by looking on original changes. >> >> Yes, special Interpreter's code for these methods is only generated for x86_32 in `templateInterpreterGenerator_x86_32.cpp`. > >> This fix looks correct by looking on original changes. > > Thank you! I need one more (R)eviewer :) @shipilev Generally looks reasonable, though I'm not very familiar with this part of the code. One question: Could we add an IR test that would show that no interpreter stubs are emitted, but instead whatever it is now emitting? ------------- PR Comment: https://git.openjdk.org/jdk/pull/22446#issuecomment-2516362667 From shade at openjdk.org Wed Dec 4 07:20:42 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 4 Dec 2024 07:20:42 GMT Subject: RFR: 8345219: C2: x86_64 should not go to interpreter stubs for NaNs handling In-Reply-To: References: Message-ID: On Wed, 4 Dec 2024 07:03:22 GMT, Emanuel Peter wrote: > One question: Could we add an IR test that would show that no interpreter stubs are emitted, but instead whatever it is now emitting? We could, but let's not do this in this PR. 
I already jumped the gun here a little by adding a benchmark to show the effect we have :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/22446#issuecomment-2516387073 From dfenacci at openjdk.org Wed Dec 4 07:24:04 2024 From: dfenacci at openjdk.org (Damon Fenacci) Date: Wed, 4 Dec 2024 07:24:04 GMT Subject: RFR: 8302459: Missing late inline cleanup causes compiler/vectorapi/VectorLogicalOpIdentityTest.java IR failure [v2] In-Reply-To: References: Message-ID: > # Issue > > The `compiler/vectorapi/VectorLogicalOpIdentityTest.java` has been failing because C2 compiling the test `testAndMaskSameValue1` expects to have 1 `AndV` node but it has none. > > # Cause > > The issue has to do with the criteria that trigger a cleanup when performing late inlining. In the failing test, when the compiler tries to inline a `jdk.internal.vm.vector.VectorSupport::binaryOp` call, it fails because its argument is of the wrong type, mainly because some cast nodes "hide" the more "precise" type. > The graph that leads to the issue looks like this: > ![1BCE8148-1E44-4CA1-AF8F-EFC6210FA740](https://github.com/user-attachments/assets/62dd917f-2dac-42a9-90cf-73eedcd3cf8a) > The compiler tries to inline `jdk.internal.vm.vector.VectorSupport::load` and it succeeds: > ![752E81C9-A37D-4626-81A9-E4A839FADD3D](https://github.com/user-attachments/assets/e61057b2-3093-4992-ba5a-b80e4000c0ec) > The node `3027 VectorBox` has type `IntMaxVector`. `912 CastPP` and `934 CheckCastPP` have type `IntVector` instead. > The compiler then tries to inline one of the 2 `binaryOp` calls but it fails because it needs an argument of type `IntMaxVector` and the argument it is given, which is node `934 CheckCastPP`, has type `IntVector`. > > This would not happen if between the 2 inlining attempts a _cleanup_ was triggered. IGVN would run and the 2 nodes `912 CastPP` and `934 CheckCastPP` would be folded away. `binaryOp` could then be inlined since the types would match.
> > # Solution > > Instead of fixing this specific case we try a more generic approach: when late inlining we keep track of failed intrinsics and re-examine them during IGVN. If the `Ideal` method for their call node is called, we reschedule the intrinsic attempt for that call. > > Additional test runs with `-XX:-TieredCompilation` are added to `VectorLogicalOpIdentityTest.java` and `VectorGatherMaskFoldingTest.java` as regression tests and `-XX:+IncrementalInlineForceCleanup` is removed from `VectorGatherMaskFoldingTest.java` (previously added as workaround for this issue) Damon Fenacci has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 28 commits: - Merge branch 'master' into JDK-8302459-new - JDK-8302459: remove unneeded copyright year change - JDK-8302459: add failed late inlines handling to dynamic calls - JDK-8032459: fix indentation - JDK-8302459: copyright year - JDK-8302459: simplify late inline failure conditions - JDK-8302459: revert unneeded copyright update - JDK-8302459: remove unneeded spaces - JDK-8302459: increment MH late inline counter when reinserting them - Merge branch 'master' into JDK-8302459-new - ... 
and 18 more: https://git.openjdk.org/jdk/compare/4fbf2720...7a4ffc11 ------------- Changes: https://git.openjdk.org/jdk/pull/21682/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21682&range=01 Stats: 105 lines in 7 files changed: 48 ins; 8 del; 49 mod Patch: https://git.openjdk.org/jdk/pull/21682.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21682/head:pull/21682 PR: https://git.openjdk.org/jdk/pull/21682 From dfenacci at openjdk.org Wed Dec 4 07:24:04 2024 From: dfenacci at openjdk.org (Damon Fenacci) Date: Wed, 4 Dec 2024 07:24:04 GMT Subject: RFR: 8302459: Missing late inline cleanup causes compiler/vectorapi/VectorLogicalOpIdentityTest.java IR failure In-Reply-To: References: Message-ID: <-ig8iWJUpFAX5pjoWSZUbVmw7E9p5fjb9EcdBVB-Zpk=.5326c678-3362-4ef0-91d5-5d3b83f56dc6@github.com> On Thu, 7 Nov 2024 00:01:01 GMT, Vladimir Ivanov wrote: > A proper fix would be to re-examine failed intrinsics call site during IGVN and repeat intrinsifcation attempt when their inputs improve In the end I've reimplemented the fix with this approach: keeping track of failed late inlining and re-scheduling them in IGVN. One potential issue is that there might be useless redundant inlining attempts. This doesn't seem to be much of a problem though: late inlining attempts should be rather rare (only applied to a few invokedynamic, method handles, vector API). Benchmark measurements (DaCapo, Renaissance) for running time and compilation don't show any noticeable difference. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/21682#issuecomment-2516389661 From epeter at openjdk.org Wed Dec 4 07:36:42 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 4 Dec 2024 07:36:42 GMT Subject: RFR: 8342692: C2: MemorySegment API slow with short running loops [v5] In-Reply-To: <0QhHkO9hp_uxL9EC5AYIhm95Gw2DeXXQZgVG2L0NDCw=.6076af9b-5922-48f7-ae41-66edcbe943ca@github.com> References: <0QhHkO9hp_uxL9EC5AYIhm95Gw2DeXXQZgVG2L0NDCw=.6076af9b-5922-48f7-ae41-66edcbe943ca@github.com> Message-ID: On Thu, 28 Nov 2024 15:40:42 GMT, Roland Westrelin wrote: >> Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 21 commits: >> >> - Merge branch 'master' into JDK-8342692 >> - whitespaces >> - more >> - merge >> - more >> - one more test >> - Merge branch 'master' into JDK-8342692 >> - more >> - more >> - Merge branch 'master' into JDK-8342692 >> - ... and 11 more: https://git.openjdk.org/jdk/compare/3b21a298...74c38342 > > I pushed an update that should fix all test failures except the one in `compiler/escapeAnalysis/TestMissingAntiDependency.java` (covered by JDK-8341976). A lot of them were caused by the following part of the change: >> In the case of a long counted loop, the loop is transformed into a > regular loop with a new limit and transformed range checks that's > later turned into an int counted loop. The int counted loop doesn't > need loop limit checks because of the way it's constructed. There's > an assert that catches that we don't attempt to add one. I ran into > test failures where, by the time the int counted loop is created, > the fact that the number of iterations of the loop is small enough > to not need a loop limit check gets lost. I added a cast to make > sure the narrowed limit's type is not lost (I had to do something > similar for loop nests).
But then, I ran into the same issue again > because the cast was pushed through a sub or add and the narrowed > type was lost. I propose that pushing casts through sub/add be only > done after loop opts are over (same as what's done for range check > CastII). > > So I removed that part of the initial change and instead added some logic to pattern match the `CastLL` used by the loop nest for which the transformation of `(CastLL (AddL ...))` shouldn't be performed until the inner loop is turned into a counted loop. @rwestrel Would you mind changing the title to something more descriptive of your change? I'm thinking: "C2: Don't create loop-nest for short running loops". ------------- PR Comment: https://git.openjdk.org/jdk/pull/21630#issuecomment-2516412454 From epeter at openjdk.org Wed Dec 4 07:54:45 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 4 Dec 2024 07:54:45 GMT Subject: RFR: 8342692: C2: MemorySegment API slow with short running loops [v5] In-Reply-To: References: Message-ID: On Thu, 28 Nov 2024 14:42:23 GMT, Roland Westrelin wrote: >> To optimize a long counted loop and long range checks in a long or int >> counted loop, the loop is turned into a loop nest. When the loop has >> few iterations, the overhead of having an outer loop whose backedge is >> never taken, has a measurable cost. Furthermore, creating the loop >> nest usually causes one iteration of the loop to be peeled so >> predicates can be set up. If the loop is short running, then it's an >> extra iteration that's run with range checks (compared to an int >> counted loop with int range checks). >> >> This change doesn't create a loop nest when: >> >> 1- it can be determined statically at loop nest creation time that the >> loop runs for a short enough number of iterations >> >> 2- profiling reports that the loop runs for no more than ShortLoopIter >> iterations (1000 by default). >> >> For 2-, a guard is added which is implemented as yet another predicate. 
>> >> While this change is in principle simple, I ran into a few >> implementation issues: >> >> - while c2 has a way to compute the number of iterations of an int >> counted loop, it doesn't have that for long counted loop. The >> existing logic for int counted loops promotes values to long to >> avoid overflows. I reworked it so it now works for both long and int >> counted loops. >> >> - I added a new deoptimization reason (Reason_short_running_loop) for >> the new predicate. Given the number of iterations is narrowed down >> by the predicate, the limit of the loop after transformation is a >> cast node that's control dependent on the short running loop >> predicate. Because once the counted loop is transformed, it is >> likely that range check predicates will be inserted and they will >> depend on the limit, the short running loop predicate has to be the >> one that's further away from the loop entry. Now it is also possible >> that the limit before transformation depends on a predicate >> (TestShortRunningLongCountedLoopPredicatesClone is an example), we >> can have: new predicates inserted after the transformation that >> depend on the casted limit that itself depends on old predicates >> added before the transformation. To solve this circular dependency, >> parse and assert predicates are cloned between the old predicates >> and the loop head. The cloned short running loop parse predicate is >> the one that's used to insert the short running loop predicate. >> >> - In the case of a long counted loop, the loop is transformed into a >> regular loop with a ... > > Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 21 commits: > > - Merge branch 'master' into JDK-8342692 > - whitespaces > - more > - merge > - more > - one more test > - Merge branch 'master' into JDK-8342692 > - more > - more > - Merge branch 'master' into JDK-8342692 > - ...
and 11 more: https://git.openjdk.org/jdk/compare/3b21a298...74c38342 Hi @rwestrel this looks very interesting! Which benchmarks are you referring to? I just gave it a quick skim, will come back to this later again. src/hotspot/share/opto/c2_globals.hpp line 815: > 813: product(uintx, ShortLoopIter, 1000, \ > 814: "Number of iterations for a short running loop") \ > 815: range(0, max_juint) \ Can you add something about what effect it has? src/hotspot/share/opto/loopTransform.cpp line 140: > 138: udiff = uinit_con - ulimit_con; > 139: } > 140: julong utrip_count = udiff / ABS(stride_con); Could `stride_con` be `min_int`? src/hotspot/share/opto/loopTransform.cpp line 150: > 148: jlong limit_con = (stride_con > 0) ? limit_type->is_int()->_hi : limit_type->is_int()->_lo; > 149: int stride_m = stride_con - (stride_con > 0 ? 1 : -1); > 150: jlong trip_count = (limit_con - init_con + stride_m)/stride_con; Suggestion: jlong trip_count = (limit_con - init_con + stride_m) / stride_con; src/hotspot/share/opto/loopnode.cpp line 1190: > 1188: get_template_assertion_predicates(parse_predicate_proj, list); > 1189: clone_assertion_predicates(loop, list, ctrl->in(0)->as_ParsePredicate()); > 1190: } You may want to talk with @chhagedorn to see if this cannot be done with less code-duplication. Also: where are the `Unique_Node_List` allocated from / deallocated? src/hotspot/share/opto/loopnode.cpp line 1242: > 1240: if (bt == T_LONG) { > 1241: // const TypeLong* new_limit_t = new_limit->Value(&_igvn)->is_long(); > 1242: // new_limit = new ConvL2INode(new_limit, TypeInt::make(checked_cast(new_limit_t->_lo), checked_cast(new_limit_t->_hi), new_limit_t->_widen)); Commented code? test/hotspot/jtreg/compiler/longcountedloops/TestShortLoopLostLimit.java line 26: > 24: /** > 25: * @test > 26: * @bug 8342330 This is a different bug number. Is that intentional? ------------- Changes requested by epeter (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/21630#pullrequestreview-2477509103 PR Review Comment: https://git.openjdk.org/jdk/pull/21630#discussion_r1868858788 PR Review Comment: https://git.openjdk.org/jdk/pull/21630#discussion_r1868862123 PR Review Comment: https://git.openjdk.org/jdk/pull/21630#discussion_r1868862585 PR Review Comment: https://git.openjdk.org/jdk/pull/21630#discussion_r1868868161 PR Review Comment: https://git.openjdk.org/jdk/pull/21630#discussion_r1868869788 PR Review Comment: https://git.openjdk.org/jdk/pull/21630#discussion_r1868855392 From epeter at openjdk.org Wed Dec 4 07:54:45 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 4 Dec 2024 07:54:45 GMT Subject: RFR: 8342692: C2: MemorySegment API slow with short running loops [v5] In-Reply-To: References: Message-ID: On Wed, 4 Dec 2024 07:41:24 GMT, Emanuel Peter wrote: >> Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 21 commits: >> >> - Merge branch 'master' into JDK-8342692 >> - whitespaces >> - more >> - merge >> - more >> - one more test >> - Merge branch 'master' into JDK-8342692 >> - more >> - more >> - Merge branch 'master' into JDK-8342692 >> - ... and 11 more: https://git.openjdk.org/jdk/compare/3b21a298...74c38342 > > src/hotspot/share/opto/loopTransform.cpp line 140: > >> 138: udiff = uinit_con - ulimit_con; >> 139: } >> 140: julong utrip_count = udiff / ABS(stride_con); > > Could `stride_con` be `min_int`? I wonder if we should start converting all these computations to `noOverflowInt`, just to avoid possible overflows etc. 
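The hazard behind the `min_int` question can be shown with a small Java sketch. It only illustrates the general behaviour of a 32-bit absolute value at the minimum int; whether it applies to `stride_con` depends on that variable's type and range, which are not shown in the excerpt above. The class and method names are hypothetical.

```java
public class AbsMinValueSketch {
    // Plain 32-bit absolute value: for Integer.MIN_VALUE the result wraps and stays negative.
    static int absWrapping(int stride) {
        return Math.abs(stride);
    }

    // Widening to 64 bits first makes the magnitude representable.
    static long absWidened(int stride) {
        return Math.abs((long) stride);
    }

    public static void main(String[] args) {
        int stride = Integer.MIN_VALUE;
        System.out.println(absWrapping(stride));
        System.out.println(absWidened(stride));
        try {
            // The checked variant reports the overflow instead of wrapping.
            Math.negateExact(stride);
        } catch (ArithmeticException e) {
            System.out.println("overflow detected: " + e.getMessage());
        }
    }
}
```

In Java the wrap-around is well defined; in C++ the equivalent negation of the minimum signed int is a signed overflow, which is why checked or widened arithmetic is worth considering for such computations.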
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21630#discussion_r1868864818 From epeter at openjdk.org Wed Dec 4 07:59:43 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 4 Dec 2024 07:59:43 GMT Subject: RFR: 8341293: Split field loads through Nested Phis [v2] In-Reply-To: References: Message-ID: On Thu, 21 Nov 2024 21:24:37 GMT, Dhamoder Nalla wrote: >> As an extension of the work done as part of https://github.com/openjdk/jdk/pull/12897, split the field loads (AddP -> Load*) with nested phi parent nodes to enable more scalar replacements, thereby reducing memory allocation. >> >> >> Here is the sequence of Ideal graph transformations for Nested phi: >> >> >> ![image](https://github.com/user-attachments/assets/c18e5ca0-c554-475c-814a-7cb288d96569) >> >> ![image](https://github.com/user-attachments/assets/b279b5f2-9ec6-4d9b-a627-506451f1cf81) >> >> ![image](https://github.com/user-attachments/assets/f506b918-2dd0-4dbe-a440-ff253afa3961) >> >> JMH results: >> with disabled RAM >> >> Benchmark Mode Cnt Score Error Units >> NestedPhiAndRematerialize.NopRAM.testBailOut_runner avgt 15 13.969 ± 0.248 ms/op >> NestedPhiAndRematerialize.NopRAM.testFieldEscapeWithMerge_runner avgt 15 80.300 ± 4.306 ms/op >> NestedPhiAndRematerialize.NopRAM.testMerge_TryCatchFinally_runner avgt 15 72.182 ± 1.781 ms/op >> NestedPhiAndRematerialize.NopRAM.testMultiParentPhi_runner avgt 15 2.983 ± 0.001 ms/op >> NestedPhiAndRematerialize.NopRAM.testNestedPhiPolymorphic_runner avgt 15 18.342 ± 0.731 ms/op >> NestedPhiAndRematerialize.NopRAM.testNestedPhiProcessOrder_runner avgt 15 14.315 ± 0.443 ms/op >> NestedPhiAndRematerialize.NopRAM.testNestedPhiWithLambda_runner avgt 15 18.511 ± 1.212 ms/op >> NestedPhiAndRematerialize.NopRAM.testNestedPhiWithTrap_runner avgt 15 66.277 ± 1.478 ms/op >> NestedPhiAndRematerialize.NopRAM.testNestedPhi_FieldLoad_runner avgt 15 17.968 ± 0.306 ms/op >> NestedPhiAndRematerialize.NopRAM.testNestedPhi_TryCatch_runner avgt 15 14.186 ±
0.247 ms/op >> NestedPhiAndRematerialize.NopRAM.testRematerialize_MultiObj_runner avgt 15 88.435 ± 4.869 ms/op >> NestedPhiAndRematerialize.NopRAM.testRematerialize_SingleObj_runner avgt 15 29560.130 ± 48.797 ms/op >> NestedPhiAndRematerialize.NopRAM.testRematerialize_TryCatch_runner avgt 15 49.150 ± 2.307 ms/op >> NestedPhiAndRematerialize.NopRAM.testThreeLevelNestedPhi_runner avgt 15 18.236 ± 0.308 ms/op >> >> with enabled RAM >> Benchmark Mode Cnt Score Error Units >> NestedPhiAndRematerialize.YesRAM.testBailOut_runner avgt 15 3.257 ± 0.423 ms/op >> NestedPhiAndRematerialize.YesRAM.testFieldEscapeWithMerge_runner avgt 15 79.916 ± 3.477 ms/op >> NestedPhiAndRematerialize.YesRAM.testMerge_TryCatchFinally_runner avgt 15 72.053 ± 1.916 ms/op >> NestedPhiAndRematerialize.YesRAM.testMultiParentPhi_runner avgt 15 2.984 ± 0.001 ms/op >> NestedPhiAndRematerialize.YesRAM.testNestedPhiPolymorphic_runner avgt ... > > Dhamoder Nalla has updated the pull request incrementally with one additional commit since the last revision: > > CR feedback test/hotspot/jtreg/compiler/c2/irTests/scalarReplacement/AllocationMergesNestedPhiTests.java line 31: > 29: /* > 30: * @test > 31: * @bug 8281429 Is this bug id correct? test/hotspot/jtreg/compiler/c2/irTests/scalarReplacement/AllocationMergesNestedPhiTests.java line 34: > 32: * @summary Tests that C2 can correctly scalar replace some object allocation merges. > 33: * @library /test/lib / > 34: * @requires vm.debug == true & vm.flagless & vm.bits == 64 & vm.compiler2.enabled & vm.opt.final.EliminateAllocations Do you need all of these? Or is it just that IR rules are failing otherwise?
If it is just about the IR rules, you can restrict IR rules with `applyIf...` ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21270#discussion_r1868888712 PR Review Comment: https://git.openjdk.org/jdk/pull/21270#discussion_r1868892077 From epeter at openjdk.org Wed Dec 4 07:59:43 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 4 Dec 2024 07:59:43 GMT Subject: RFR: 8341293: Split field loads through Nested Phis [v2] In-Reply-To: References: Message-ID: On Wed, 4 Dec 2024 07:55:05 GMT, Emanuel Peter wrote: >> Dhamoder Nalla has updated the pull request incrementally with one additional commit since the last revision: >> >> CR feedback > > test/hotspot/jtreg/compiler/c2/irTests/scalarReplacement/AllocationMergesNestedPhiTests.java line 34: > >> 32: * @summary Tests that C2 can correctly scalar replace some object allocation merges. >> 33: * @library /test/lib / >> 34: * @requires vm.debug == true & vm.flagless & vm.bits == 64 & vm.compiler2.enabled & vm.opt.final.EliminateAllocations > > Do you need all of these? Or is it just that IR rules are failing otherwise? > If it is just about the IR rules, you can restrict IR rules with `applyIf...` Generally it is nice if tests can run on as many platforms, compilers and flags as possible. But of course IR rules can only apply under specific circumstances. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21270#discussion_r1868897981 From duke at openjdk.org Wed Dec 4 08:26:01 2024 From: duke at openjdk.org (theoweidmannoracle) Date: Wed, 4 Dec 2024 08:26:01 GMT Subject: RFR: 8341696: C2: Non-fluid StringBuilder pattern bails out in OptoStringConcat Message-ID: Extends stringopts to also recognize non-fluid uses of StringBuilder and optimize them the same way. 
For example, this basic case was not optimized before and is optimized with this PR: StringBuilder sb = new StringBuilder(); sb.append("a"); sb.append(a); return sb.toString(); ------------- Commit messages: - Fix copyright year - Correctly handle constructor - Move into method - Update copyright - Add non-fluid stringopt Changes: https://git.openjdk.org/jdk/pull/22537/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=22537&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8341696 Stats: 312 lines in 4 files changed: 249 ins; 43 del; 20 mod Patch: https://git.openjdk.org/jdk/pull/22537.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22537/head:pull/22537 PR: https://git.openjdk.org/jdk/pull/22537 From duke at openjdk.org Wed Dec 4 08:34:00 2024 From: duke at openjdk.org (theoweidmannoracle) Date: Wed, 4 Dec 2024 08:34:00 GMT Subject: RFR: 8341696: C2: Non-fluid StringBuilder pattern bails out in OptoStringConcat [v2] In-Reply-To: References: Message-ID: > Extends stringopts to also recognize non-fluid uses of StringBuilder and optimize them the same way. 
> > For example, this basic case was not optimized before and is optimized with this PR: > > > StringBuilder sb = new StringBuilder(); > sb.append("a"); > sb.append(a); > return sb.toString(); theoweidmannoracle has updated the pull request incrementally with three additional commits since the last revision: - Correct copyright - Ensure new line - Add copyright ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22537/files - new: https://git.openjdk.org/jdk/pull/22537/files/e55b554f..b8386fd2 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22537&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22537&range=00-01 Stats: 26 lines in 1 file changed: 25 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/22537.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22537/head:pull/22537 PR: https://git.openjdk.org/jdk/pull/22537 From epeter at openjdk.org Wed Dec 4 08:34:44 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 4 Dec 2024 08:34:44 GMT Subject: RFR: 8344026: Ubsan: prevent potential integer overflow in c1_LIRGenerator_.cpp file [v8] In-Reply-To: <2en5GIxXeljH8KabsgIjJ0-m2OYUCj_bXaFpOSfRZiM=.0fa0c343-a924-4051-8a67-58cf20733ff5@github.com> References: <2en5GIxXeljH8KabsgIjJ0-m2OYUCj_bXaFpOSfRZiM=.0fa0c343-a924-4051-8a67-58cf20733ff5@github.com> Message-ID: On Mon, 25 Nov 2024 13:44:03 GMT, Amit Kumar wrote: >> This PR converts datatype from `jint` to `juint` for constant `c` check in c1_LIRGenerator_.cpp. Please see JBS for more info. > > Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: > > unsigned int -> juint src/hotspot/cpu/aarch64/c1_LIRGenerator_aarch64.cpp line 281: > 279: bool LIRGenerator::strength_reduce_multiply(LIR_Opr left, jint c, LIR_Opr result, LIR_Opr tmp) { > 280: juint u_value = (juint)c; > 281: if (is_power_of_2(u_value - 1)) { What happens if this underflows? Is this not undefined behaviour? Could we use `java_add`?
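As background to the underflow question, here is a minimal Java sketch of how wrapping 32-bit arithmetic behaves at the edge cases. It is not the HotSpot code: `isPowerOfTwoUnsigned` is a hypothetical stand-in for a power-of-two test on the unsigned bit pattern, and Java's wrapping `int` subtraction is used to mirror arithmetic on an unsigned 32-bit value, which in C++ is defined to wrap modulo 2^32.

```java
public class UnsignedWrapSketch {
    // Hypothetical power-of-two test on the unsigned interpretation of the 32 bits:
    // exactly one bit set.
    static boolean isPowerOfTwoUnsigned(int bits) {
        return Integer.bitCount(bits) == 1;
    }

    // Mirrors the shape `is_power_of_2(u_value - 1)` using wrapping subtraction.
    static boolean minusOneIsPowerOfTwo(int c) {
        return isPowerOfTwoUnsigned(c - 1);
    }

    public static void main(String[] args) {
        int[] samples = {0, 1, 3, 9, Integer.MIN_VALUE};
        for (int c : samples) {
            // Print the wrapped difference as an unsigned number next to the test result.
            System.out.println(c + " -> " + Integer.toUnsignedString(c - 1)
                    + " powerOfTwo=" + minusOneIsPowerOfTwo(c));
        }
    }
}
```

For `c == 0` the difference wraps to the all-ones pattern, and for `c == Integer.MIN_VALUE` it wraps to `0x7FFFFFFF`; neither has a single bit set, so the sketch's test rejects both. A test over all `c` values, as asked in the follow-up comment, would still be the way to confirm the real code.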
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22144#discussion_r1868951272 From epeter at openjdk.org Wed Dec 4 08:34:45 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 4 Dec 2024 08:34:45 GMT Subject: RFR: 8344026: Ubsan: prevent potential integer overflow in c1_LIRGenerator_.cpp file [v8] In-Reply-To: References: <2en5GIxXeljH8KabsgIjJ0-m2OYUCj_bXaFpOSfRZiM=.0fa0c343-a924-4051-8a67-58cf20733ff5@github.com> Message-ID: <0Fc1FNC7m8UIskRUEZVyGDMkvPP-UAjxGLPw3JNNLBk=.1ad396f5-fed3-42bf-95a7-47f1846fa1f9@github.com> On Wed, 4 Dec 2024 08:30:19 GMT, Emanuel Peter wrote: >> Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: >> >> unsigned int -> juint > > src/hotspot/cpu/aarch64/c1_LIRGenerator_aarch64.cpp line 281: > >> 279: bool LIRGenerator::strength_reduce_multiply(LIR_Opr left, jint c, LIR_Opr result, LIR_Opr tmp) { >> 280: juint u_value = (juint)c; >> 281: if (is_power_of_2(u_value - 1)) { > > What happens if this underflows? Is this not undefined behaviour? Could we use `java_add`? And do you have some sort of tests for this, to make sure we check with all possible `c` values? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22144#discussion_r1868953085 From duke at openjdk.org Wed Dec 4 08:37:38 2024 From: duke at openjdk.org (theoweidmannoracle) Date: Wed, 4 Dec 2024 08:37:38 GMT Subject: RFR: 8341696: C2: Non-fluid StringBuilder pattern bails out in OptoStringConcat [v2] In-Reply-To: References: Message-ID: On Wed, 4 Dec 2024 08:34:00 GMT, theoweidmannoracle wrote: >> Extends stringopts to also recognize non-fluid uses of StringBuilder and optimize them the same way. 
>> >> For example, this basic case was not optimized before and is optimized with this PR: >> >> >> StringBuilder sb = new StringBuilder(); >> sb.append("a"); >> sb.append(a); >> return sb.toString(); > > theoweidmannoracle has updated the pull request incrementally with three additional commits since the last revision: > > - Correct copyright > - Ensure new line > - Add copyright test/micro/org/openjdk/bench/vm/compiler/FluidSBBench.java line 2: > 1: /* > 2: * Copyright Amazon.com Inc. or its affiliates. All Rights Reserved. @shipilev I copied this benchmark from your RFE. Is this copyright notice correct? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22537#discussion_r1868960616 From shade at openjdk.org Wed Dec 4 08:47:38 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 4 Dec 2024 08:47:38 GMT Subject: RFR: 8341696: C2: Non-fluid StringBuilder pattern bails out in OptoStringConcat [v2] In-Reply-To: References: Message-ID: <4X69_VN_-b786I5w2ceQWEdPktfmVzC5bArNhHj6gJQ=.075a060e-eaf4-49a9-b6b4-603526611ca0@github.com> On Wed, 4 Dec 2024 08:34:41 GMT, theoweidmannoracle wrote: >> theoweidmannoracle has updated the pull request incrementally with three additional commits since the last revision: >> >> - Correct copyright >> - Ensure new line >> - Add copyright > > test/micro/org/openjdk/bench/vm/compiler/FluidSBBench.java line 2: > >> 1: /* >> 2: * Copyright Amazon.com Inc. or its affiliates. All Rights Reserved. > > @shipilev I copied this benchmark from your RFE. Is this copyright notice correct? Yes. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22537#discussion_r1868978370 From kbarrett at openjdk.org Wed Dec 4 08:59:38 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Wed, 4 Dec 2024 08:59:38 GMT Subject: RFR: 8345159: RISCV: Fix -Wzero-as-null-pointer-constant warning in emit_static_call_stub In-Reply-To: References: <0k7ZOn2etTZAnrZZcGi1QMp6sZX36cGqPIlitEnQ4Fw=.693fc951-de7b-44a2-8f00-dd208515c260@github.com> Message-ID: On Tue, 3 Dec 2024 06:42:25 GMT, Robbin Ehn wrote: >> That seems OK to me. Then we can still use `movptr` here. >> >> address placeholder = pc(); // correct value will be patched in later >> movptr(t1, placeholder, offset, t0); // lui + lui + slli + add > > When looking at the disassembly it much easier to find uninitialized call stubs when we use 0. > Also if something would go wrong, it much nicer with a crash than a loop. > So for debuggability I prefer address 0. I'm inclined to leave this as is, because I agree with @robehn and because I think directly calling movptr2 when that's what one needs to have called (because one expected a particular sequence of instructions to be generated) is better than indirectly calling it via the more generic movptr. Anyone disagree? I'll wait another day for responses. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22435#discussion_r1868998058 From mli at openjdk.org Wed Dec 4 09:14:44 2024 From: mli at openjdk.org (Hamlin Li) Date: Wed, 4 Dec 2024 09:14:44 GMT Subject: RFR: 8345159: RISCV: Fix -Wzero-as-null-pointer-constant warning in emit_static_call_stub In-Reply-To: References: <0k7ZOn2etTZAnrZZcGi1QMp6sZX36cGqPIlitEnQ4Fw=.693fc951-de7b-44a2-8f00-dd208515c260@github.com> Message-ID: On Wed, 4 Dec 2024 08:57:00 GMT, Kim Barrett wrote: >> When looking at the disassembly it much easier to find uninitialized call stubs when we use 0. >> Also if something would go wrong, it much nicer with a crash than a loop. >> So for debuggability I prefer address 0. 
> > I'm inclined to leave this as is, because I agree with @robehn and because I think directly calling movptr2 > when that's what one needs to have called (because one expected a particular sequence of instructions > to be generated) is better than indirectly calling it via the more generic movptr. Anyone disagree? I'll wait > another day for responses. I agree with a straight `0`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22435#discussion_r1869025747 From duke at openjdk.org Wed Dec 4 09:15:37 2024 From: duke at openjdk.org (theoweidmannoracle) Date: Wed, 4 Dec 2024 09:15:37 GMT Subject: RFR: 8341696: C2: Non-fluid StringBuilder pattern bails out in OptoStringConcat [v3] In-Reply-To: References: Message-ID: > Extends stringopts to also recognize non-fluid uses of StringBuilder and optimize them the same way. > > For example, this basic case was not optimized before and is optimized with this PR: > > > StringBuilder sb = new StringBuilder(); > sb.append("a"); > sb.append(a); > return sb.toString(); theoweidmannoracle has updated the pull request incrementally with one additional commit since the last revision: Move test ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22537/files - new: https://git.openjdk.org/jdk/pull/22537/files/b8386fd2..69127e10 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22537&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22537&range=01-02 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/22537.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22537/head:pull/22537 PR: https://git.openjdk.org/jdk/pull/22537 From shade at openjdk.org Wed Dec 4 09:36:14 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 4 Dec 2024 09:36:14 GMT Subject: RFR: 8345219: C2: x86_64 should not go to interpreter stubs for NaNs handling [v2] In-Reply-To: References: Message-ID: 
<4qUPY8xPulC5QpUdwxT3yBeHrupdhlEBxmE8gqH9GVc=.b6e75198-d1be-48e6-996f-943dd1e42564@github.com> > Found this while cleaning up x86_32 code for removal. > > In our current code there is a block added by [JDK-8076373](https://bugs.openjdk.org/browse/JDK-8076373): > https://github.com/openjdk/jdk/blob/3b21a298c29d88720f6bfb2dc1f3305b6a3db307/src/hotspot/share/compiler/compileBroker.cpp#L1451-L1473 > > Ostensibly, that block is for x86_32 handling of signalling NaNs -- x87 FPU has a peculiarity with them. See other funky bugs we seen with it: [JDK-8285985](https://bugs.openjdk.org/browse/JDK-8285985), [JDK-8293991](https://bugs.openjdk.org/browse/JDK-8293991). > > But the way current block is coded, it is enabled for X86 wholesale, which also means x86_64! In fact, it is likely even worse on x86_64, because the related "fast" entries are generated only for x86_32: > https://github.com/openjdk/jdk/blob/3b21a298c29d88720f6bfb2dc1f3305b6a3db307/src/hotspot/share/interpreter/templateInterpreterGenerator.cpp#L493-L502 > > This can be solved by checking `IA32` instead of `X86`. This block would be gone completely once we remove x86_32 port. Meanwhile, we can make it right by x86_64, and make eventual x86_32 removal less confusing. This issue seems to only affect the compilation of native methods, while most of the hot code is riding on compiler intrinsics. I'll put performance data in comments. 
> > Additional testing: > - [x] Linux x86_64 server fastdebug, `all` Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: Add IR tests ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22446/files - new: https://git.openjdk.org/jdk/pull/22446/files/c5f89e88..e24b0060 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22446&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22446&range=00-01 Stats: 146 lines in 2 files changed: 146 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/22446.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22446/head:pull/22446 PR: https://git.openjdk.org/jdk/pull/22446 From shade at openjdk.org Wed Dec 4 09:36:14 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 4 Dec 2024 09:36:14 GMT Subject: RFR: 8345219: C2: x86_64 should not go to interpreter stubs for NaNs handling [v2] In-Reply-To: References: Message-ID: On Wed, 4 Dec 2024 07:17:55 GMT, Aleksey Shipilev wrote: > > One question: Could we add an IR test that would show that no interpreter stubs are emitted, but instead whatever it is now emitting? > > We could, but let's not do this in this PR. I already jumped the gun here a little with adding a benchmark to show the effect we have :) Actually, I thought it would be more hassle because the conversion might be arch-specific. But I see that C2 intrinsics just emit the relevant conversion nodes unconditionally, so the test can be simple. I added one in new commit. ------------- PR Comment: https://git.openjdk.org/jdk/pull/22446#issuecomment-2516675519 From epeter at openjdk.org Wed Dec 4 09:36:14 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 4 Dec 2024 09:36:14 GMT Subject: RFR: 8345219: C2: x86_64 should not go to interpreter stubs for NaNs handling In-Reply-To: References: Message-ID: On Thu, 28 Nov 2024 18:22:24 GMT, Aleksey Shipilev wrote: > Found this while cleaning up x86_32 code for removal. 
> > In our current code there is a block added by [JDK-8076373](https://bugs.openjdk.org/browse/JDK-8076373): > https://github.com/openjdk/jdk/blob/3b21a298c29d88720f6bfb2dc1f3305b6a3db307/src/hotspot/share/compiler/compileBroker.cpp#L1451-L1473 > > Ostensibly, that block is for x86_32 handling of signalling NaNs -- x87 FPU has a peculiarity with them. See other funky bugs we seen with it: [JDK-8285985](https://bugs.openjdk.org/browse/JDK-8285985), [JDK-8293991](https://bugs.openjdk.org/browse/JDK-8293991). > > But the way current block is coded, it is enabled for X86 wholesale, which also means x86_64! In fact, it is likely even worse on x86_64, because the related "fast" entries are generated only for x86_32: > https://github.com/openjdk/jdk/blob/3b21a298c29d88720f6bfb2dc1f3305b6a3db307/src/hotspot/share/interpreter/templateInterpreterGenerator.cpp#L493-L502 > > This can be solved by checking `IA32` instead of `X86`. This block would be gone completely once we remove x86_32 port. Meanwhile, we can make it right by x86_64, and make eventual x86_32 removal less confusing. This issue seems to only affect the compilation of native methods, while most of the hot code is riding on compiler intrinsics. I'll put performance data in comments. > > Additional testing: > - [x] Linux x86_64 server fastdebug, `all` Nice, that was quick! Do we not need to restrict the IR rules? Or will that not fail on `IA32`? ------------- PR Comment: https://git.openjdk.org/jdk/pull/22446#issuecomment-2516685436 From shade at openjdk.org Wed Dec 4 09:36:14 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 4 Dec 2024 09:36:14 GMT Subject: RFR: 8345219: C2: x86_64 should not go to interpreter stubs for NaNs handling In-Reply-To: References: Message-ID: On Wed, 4 Dec 2024 09:31:14 GMT, Emanuel Peter wrote: > Nice, that was quick! Do we not need to restrict the IR rules? Or will that not fail on `IA32`? 
`IA32` is on its way out, so I would be okay with the new test to pretend `IA32` does not exist :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/22446#issuecomment-2516689914 From mdoerr at openjdk.org Wed Dec 4 09:37:42 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Wed, 4 Dec 2024 09:37:42 GMT Subject: RFR: 8344026: Ubsan: prevent potential integer overflow in c1_LIRGenerator_.cpp file [v8] In-Reply-To: <0Fc1FNC7m8UIskRUEZVyGDMkvPP-UAjxGLPw3JNNLBk=.1ad396f5-fed3-42bf-95a7-47f1846fa1f9@github.com> References: <2en5GIxXeljH8KabsgIjJ0-m2OYUCj_bXaFpOSfRZiM=.0fa0c343-a924-4051-8a67-58cf20733ff5@github.com> <0Fc1FNC7m8UIskRUEZVyGDMkvPP-UAjxGLPw3JNNLBk=.1ad396f5-fed3-42bf-95a7-47f1846fa1f9@github.com> Message-ID: On Wed, 4 Dec 2024 08:31:50 GMT, Emanuel Peter wrote: >> src/hotspot/cpu/aarch64/c1_LIRGenerator_aarch64.cpp line 281: >> >>> 279: bool LIRGenerator::strength_reduce_multiply(LIR_Opr left, jint c, LIR_Opr result, LIR_Opr tmp) { >>> 280: juint u_value = (juint)c; >>> 281: if (is_power_of_2(u_value - 1)) { >> >> What happens if this underflows? Is this not undefined behaviour? Could we use `java_add`? > > And do you have some sort of tests for this, to make sure we check with all possible `c` values? > What happens if this underflows? Is this not undefined behaviour? Could we use `java_add`? Unsigned subtraction is never undefined. "wrap around" behavior is used. Using `java_add` / `java_subtract` sounds like a good idea. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22144#discussion_r1869066654 From shade at openjdk.org Wed Dec 4 09:42:59 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 4 Dec 2024 09:42:59 GMT Subject: RFR: 8345219: C2: x86_64 should not go to interpreter stubs for NaNs handling [v3] In-Reply-To: References: Message-ID: <-VRX_UtkVhwBeNJMvIs8RnYcPmXtAx6TWIpTP8Oj6iE=.62e16355-7536-45ee-9624-fd8a5c1acce0@github.com> > Found this while cleaning up x86_32 code for removal. 
> > In our current code there is a block added by [JDK-8076373](https://bugs.openjdk.org/browse/JDK-8076373): > https://github.com/openjdk/jdk/blob/3b21a298c29d88720f6bfb2dc1f3305b6a3db307/src/hotspot/share/compiler/compileBroker.cpp#L1451-L1473 > > Ostensibly, that block is for x86_32 handling of signalling NaNs -- x87 FPU has a peculiarity with them. See other funky bugs we seen with it: [JDK-8285985](https://bugs.openjdk.org/browse/JDK-8285985), [JDK-8293991](https://bugs.openjdk.org/browse/JDK-8293991). > > But the way current block is coded, it is enabled for X86 wholesale, which also means x86_64! In fact, it is likely even worse on x86_64, because the related "fast" entries are generated only for x86_32: > https://github.com/openjdk/jdk/blob/3b21a298c29d88720f6bfb2dc1f3305b6a3db307/src/hotspot/share/interpreter/templateInterpreterGenerator.cpp#L493-L502 > > This can be solved by checking `IA32` instead of `X86`. This block would be gone completely once we remove x86_32 port. Meanwhile, we can make it right by x86_64, and make eventual x86_32 removal less confusing. This issue seems to only affect the compilation of native methods, while most of the hot code is riding on compiler intrinsics. I'll put performance data in comments. 
> > Additional testing: > - [x] Linux x86_64 server fastdebug, `all` Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: Disable IR tests on IA32 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22446/files - new: https://git.openjdk.org/jdk/pull/22446/files/e24b0060..51cb1660 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22446&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22446&range=01-02 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/22446.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22446/head:pull/22446 PR: https://git.openjdk.org/jdk/pull/22446 From aph at openjdk.org Wed Dec 4 09:59:41 2024 From: aph at openjdk.org (Andrew Haley) Date: Wed, 4 Dec 2024 09:59:41 GMT Subject: RFR: 8344026: Ubsan: prevent potential integer overflow in c1_LIRGenerator_.cpp file [v8] In-Reply-To: <2en5GIxXeljH8KabsgIjJ0-m2OYUCj_bXaFpOSfRZiM=.0fa0c343-a924-4051-8a67-58cf20733ff5@github.com> References: <2en5GIxXeljH8KabsgIjJ0-m2OYUCj_bXaFpOSfRZiM=.0fa0c343-a924-4051-8a67-58cf20733ff5@github.com> Message-ID: On Mon, 25 Nov 2024 13:44:03 GMT, Amit Kumar wrote: >> This PR converts datatype from `jint` to `juint` for contstant `c` check in c1_LIRGenerator_.cpp. Please look JBS for more info. > > Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: > > unsigned int -> juint Marked as reviewed by aph (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/22144#pullrequestreview-2477912679 From amitkumar at openjdk.org Wed Dec 4 10:06:42 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Wed, 4 Dec 2024 10:06:42 GMT Subject: RFR: 8344026: Ubsan: prevent potential integer overflow in c1_LIRGenerator_.cpp file [v8] In-Reply-To: References: <2en5GIxXeljH8KabsgIjJ0-m2OYUCj_bXaFpOSfRZiM=.0fa0c343-a924-4051-8a67-58cf20733ff5@github.com> <0Fc1FNC7m8UIskRUEZVyGDMkvPP-UAjxGLPw3JNNLBk=.1ad396f5-fed3-42bf-95a7-47f1846fa1f9@github.com> Message-ID: <5e_MHZarll6kDLApsRIwapr5YSqW7VlRNiMC93VH0Uw=.3ff988f1-726f-4c91-ac6e-fcc5538bb974@github.com> On Wed, 4 Dec 2024 09:34:55 GMT, Martin Doerr wrote: >> And do you have some sort of tests for this, to make sure we check with all possible `c` values? > >> What happens if this underflows? Is this not undefined behaviour? Could we use `java_add`? > > Unsigned subtraction is never undefined. "wrap around" behavior is used. Using `java_add` / `java_subtract` sounds like a good idea. >And do you have some sort of tests for this, to make sure we check with all possible c values? No, as of now I only ran tier1 test cases with c1 compiler. Nothing else. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22144#discussion_r1869114193 From aph at openjdk.org Wed Dec 4 10:14:42 2024 From: aph at openjdk.org (Andrew Haley) Date: Wed, 4 Dec 2024 10:14:42 GMT Subject: RFR: 8344026: Ubsan: prevent potential integer overflow in c1_LIRGenerator_.cpp file [v8] In-Reply-To: <5e_MHZarll6kDLApsRIwapr5YSqW7VlRNiMC93VH0Uw=.3ff988f1-726f-4c91-ac6e-fcc5538bb974@github.com> References: <2en5GIxXeljH8KabsgIjJ0-m2OYUCj_bXaFpOSfRZiM=.0fa0c343-a924-4051-8a67-58cf20733ff5@github.com> <0Fc1FNC7m8UIskRUEZVyGDMkvPP-UAjxGLPw3JNNLBk=.1ad396f5-fed3-42bf-95a7-47f1846fa1f9@github.com> <5e_MHZarll6kDLApsRIwapr5YSqW7VlRNiMC93VH0Uw=.3ff988f1-726f-4c91-ac6e-fcc5538bb974@github.com> Message-ID: On Wed, 4 Dec 2024 10:03:38 GMT, Amit Kumar wrote: >>> What happens if this underflows? Is this not undefined behaviour? Could we use `java_add`? >> >> Unsigned subtraction is never undefined. "wrap around" behavior is used. Using `java_add` / `java_subtract` sounds like a good idea. > >>And do you have some sort of tests for this, to make sure we check with all possible c values? > > No, as of now I only ran tier1 test cases with c1 compiler. Nothing else. All that `java_add` does is cast to unsigned and then add. That's equivalent to what we're doing here, but explicit casts make the arithmetic clearer, IMO. 
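To make the wrap-around point concrete, here is a minimal sketch, not the HotSpot code itself. `is_pow2`, `is_pow2_plus_one`, `is_pow2_minus_one` and `wrapping_add` are hypothetical stand-ins for illustration; only the `u_value - 1` check is quoted in this thread, and the `+ 1` variant is an assumption about the companion check. All arithmetic is done on `uint32_t`, so neither `c == 0` nor `c == INT32_MAX` can trigger undefined behaviour.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for HotSpot's is_power_of_2: true only for exact powers of two.
constexpr bool is_pow2(uint32_t v) { return v != 0 && (v & (v - 1)) == 0; }

// True when c == 2^n + 1. For c == 0 the subtraction wraps to 0xFFFFFFFF,
// which is well defined for unsigned types and simply not a power of two.
constexpr bool is_pow2_plus_one(int32_t c) {
  uint32_t u = static_cast<uint32_t>(c);
  return is_pow2(u - 1);
}

// True when c == 2^n - 1 (assumed companion check). For c == INT32_MAX the
// unsigned sum is 0x80000000; the signed expression c + 1 would overflow instead.
constexpr bool is_pow2_minus_one(int32_t c) {
  uint32_t u = static_cast<uint32_t>(c);
  return is_pow2(u + 1);
}

// Hypothetical helper in the spirit of java_add: add in unsigned, convert back.
// The conversion back to int32_t is modular since C++20 and implementation-defined
// (two's complement in practice) before that; it is never undefined behaviour.
inline int32_t wrapping_add(int32_t a, int32_t b) {
  return static_cast<int32_t>(static_cast<uint32_t>(a) + static_cast<uint32_t>(b));
}

// Compile-time spot checks of the edge cases discussed above.
static_assert(is_pow2_plus_one(9), "9 == 2^3 + 1");
static_assert(!is_pow2_plus_one(0), "0 - 1 wraps and is rejected");
static_assert(is_pow2_minus_one(INT32_MAX), "INT32_MAX == 2^31 - 1");
```

With `c == 9` the multiply could then be emitted as `(left << 3) + left`; the sketch only covers the constant check, not the LIR emission.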
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22144#discussion_r1869127617 From epeter at openjdk.org Wed Dec 4 10:14:42 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 4 Dec 2024 10:14:42 GMT Subject: RFR: 8344026: Ubsan: prevent potential integer overflow in c1_LIRGenerator_.cpp file [v8] In-Reply-To: References: <2en5GIxXeljH8KabsgIjJ0-m2OYUCj_bXaFpOSfRZiM=.0fa0c343-a924-4051-8a67-58cf20733ff5@github.com> <0Fc1FNC7m8UIskRUEZVyGDMkvPP-UAjxGLPw3JNNLBk=.1ad396f5-fed3-42bf-95a7-47f1846fa1f9@github.com> <5e_MHZarll6kDLApsRIwapr5YSqW7VlRNiMC93VH0Uw=.3ff988f1-726f-4c91-ac6e-fcc5538bb974@github.com> Message-ID: <6RikOqqZivUpV30HKp4GhaZzzx7RgT2tde_8tLy1TCk=.d3457927-512f-4742-a8ad-3e116619c606@github.com> On Wed, 4 Dec 2024 10:10:06 GMT, Andrew Haley wrote: >>>And do you have some sort of tests for this, to make sure we check with all possible c values? >> >> No, as of now I only ran tier1 test cases with c1 compiler. Nothing else. > > All that `java_add` does is cast to unsigned and then add. That's equivalent to what we're doing here, but explicit casts make the arithmetic clearer, IMO. Ok, I forgot that unsigned has properly defined overflow semantics. > No, as of now I only ran tier1 test cases with c1 compiler. Nothing else. Tests would really be great. Is that possible? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22144#discussion_r1869132100 From djelinski at openjdk.org Wed Dec 4 10:16:14 2024 From: djelinski at openjdk.org (Daniel Jeliński) Date: Wed, 4 Dec 2024 10:16:14 GMT Subject: RFR: 8345471: Clean up compiler/intrinsics/sha/cli tests Message-ID: Merge all the GenericTestCaseForUnsupportedXXXCPU and GenericTestCaseForOtherCPU into GenericTestCaseForUnsupportedCPU.java. The CPU-specific files are almost identical; I chose to resolve the differences in favor of the AArch64 version. The OtherCPU version looks wrong, and it wasn't executed on any supported platform.
The tests continue to pass on linux-aarch64/x64, windows-x64 and mac-aarch64. I didn't test other platforms. After the change, the tests will start running on PPC and S390. They will also automatically run on any new architectures. For those interested in historical background, when the tests were introduced, there were only 2 supported CPU architectures. X86 did not support any of the intrinsics, and the X86 test case did not even call `getPredicateForOption`. The call to `getPredicateForOption` was added in f2e9b827d699115f8683e9def06c249e5476fd50, and since then all the cases are the same. ------------- Commit messages: - Unify GenericTestCaseForUnsupportedCPU Changes: https://git.openjdk.org/jdk/pull/22517/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=22517&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8345471 Stats: 628 lines in 11 files changed: 114 ins; 497 del; 17 mod Patch: https://git.openjdk.org/jdk/pull/22517.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22517/head:pull/22517 PR: https://git.openjdk.org/jdk/pull/22517 From amitkumar at openjdk.org Wed Dec 4 10:22:50 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Wed, 4 Dec 2024 10:22:50 GMT Subject: RFR: 8344026: Ubsan: prevent potential integer overflow in c1_LIRGenerator_.cpp file [v8] In-Reply-To: <6RikOqqZivUpV30HKp4GhaZzzx7RgT2tde_8tLy1TCk=.d3457927-512f-4742-a8ad-3e116619c606@github.com> References: <2en5GIxXeljH8KabsgIjJ0-m2OYUCj_bXaFpOSfRZiM=.0fa0c343-a924-4051-8a67-58cf20733ff5@github.com> <0Fc1FNC7m8UIskRUEZVyGDMkvPP-UAjxGLPw3JNNLBk=.1ad396f5-fed3-42bf-95a7-47f1846fa1f9@github.com> <5e_MHZarll6kDLApsRIwapr5YSqW7VlRNiMC93VH0Uw=.3ff988f1-726f-4c91-ac6e-fcc5538bb974@github.com> <6RikOqqZivUpV30HKp4GhaZzzx7RgT2tde_8tLy1TCk=.d3457927-512f-4742-a8ad-3e116619c606@github.com> Message-ID: <22-J3Eelze79XROngKRHXcnkMgzwjdqC3zd7bry48qo=.5c3fe69c-bbd6-4a7b-9429-1e5582324860@github.com> On Wed, 4 Dec 2024 10:12:17 GMT, Emanuel Peter wrote: > Tests would really be great. 
Is that possible? I could try. Wouldn't this already be tested by `test/hotspot/jtreg/compiler/c1/MultiplyByMaxInt.java` and `test/hotspot/jtreg/compiler/integerArithmetic/MultiplyByIntegerMinHang.java`? :) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22144#discussion_r1869144823 From jbhateja at openjdk.org Wed Dec 4 10:32:46 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 4 Dec 2024 10:32:46 GMT Subject: RFR: 8345472: Fix incorrect format instruction for floating point max/min patterns Message-ID: <17-MPy5iimqCflEA8LjXnW-TpzGQtUa74FScH5MJZwU=.b0b0e670-8d14-467a-83ef-7a7a39949296@github.com> This patch fixes incorrect operand references in the format instructions for the floating point max/min patterns. Best Regards, Jatin ------------- Commit messages: - 8345472: Fix incorrect format instructions for floating point max/min patterns Changes: https://git.openjdk.org/jdk/pull/22457/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=22457&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8345472 Stats: 20 lines in 1 file changed: 0 ins; 0 del; 20 mod Patch: https://git.openjdk.org/jdk/pull/22457.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22457/head:pull/22457 PR: https://git.openjdk.org/jdk/pull/22457 From shade at openjdk.org Wed Dec 4 12:59:47 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 4 Dec 2024 12:59:47 GMT Subject: RFR: 8345219: C2: x86_64 should not go to interpreter stubs for NaNs handling In-Reply-To: References: Message-ID: On Wed, 4 Dec 2024 09:31:14 GMT, Emanuel Peter wrote: >> Found this while cleaning up x86_32 code for removal.
>> >> In our current code there is a block added by [JDK-8076373](https://bugs.openjdk.org/browse/JDK-8076373): >> https://github.com/openjdk/jdk/blob/3b21a298c29d88720f6bfb2dc1f3305b6a3db307/src/hotspot/share/compiler/compileBroker.cpp#L1451-L1473 >> >> Ostensibly, that block is for x86_32 handling of signalling NaNs -- x87 FPU has a peculiarity with them. See other funky bugs we seen with it: [JDK-8285985](https://bugs.openjdk.org/browse/JDK-8285985), [JDK-8293991](https://bugs.openjdk.org/browse/JDK-8293991). >> >> But the way current block is coded, it is enabled for X86 wholesale, which also means x86_64! In fact, it is likely even worse on x86_64, because the related "fast" entries are generated only for x86_32: >> https://github.com/openjdk/jdk/blob/3b21a298c29d88720f6bfb2dc1f3305b6a3db307/src/hotspot/share/interpreter/templateInterpreterGenerator.cpp#L493-L502 >> >> This can be solved by checking `IA32` instead of `X86`. This block would be gone completely once we remove x86_32 port. Meanwhile, we can make it right by x86_64, and make eventual x86_32 removal less confusing. This issue seems to only affect the compilation of native methods, while most of the hot code is riding on compiler intrinsics. I'll put performance data in comments. >> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, `all` > > Nice, that was quick! > Do we not need to restrict the IR rules? Or will that not fail on `IA32`? Seems to work on x86_64, AArch64, as well as on GHA. @eme64, are you good with this then? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/22446#issuecomment-2517276608 From epeter at openjdk.org Wed Dec 4 14:09:41 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 4 Dec 2024 14:09:41 GMT Subject: RFR: 8345219: C2: x86_64 should not go to interpreter stubs for NaNs handling [v3] In-Reply-To: <-VRX_UtkVhwBeNJMvIs8RnYcPmXtAx6TWIpTP8Oj6iE=.62e16355-7536-45ee-9624-fd8a5c1acce0@github.com> References: <-VRX_UtkVhwBeNJMvIs8RnYcPmXtAx6TWIpTP8Oj6iE=.62e16355-7536-45ee-9624-fd8a5c1acce0@github.com> Message-ID: On Wed, 4 Dec 2024 09:42:59 GMT, Aleksey Shipilev wrote: >> Found this while cleaning up x86_32 code for removal. >> >> In our current code there is a block added by [JDK-8076373](https://bugs.openjdk.org/browse/JDK-8076373): >> https://github.com/openjdk/jdk/blob/3b21a298c29d88720f6bfb2dc1f3305b6a3db307/src/hotspot/share/compiler/compileBroker.cpp#L1451-L1473 >> >> Ostensibly, that block is for x86_32 handling of signalling NaNs -- x87 FPU has a peculiarity with them. See other funky bugs we seen with it: [JDK-8285985](https://bugs.openjdk.org/browse/JDK-8285985), [JDK-8293991](https://bugs.openjdk.org/browse/JDK-8293991). >> >> But the way current block is coded, it is enabled for X86 wholesale, which also means x86_64! In fact, it is likely even worse on x86_64, because the related "fast" entries are generated only for x86_32: >> https://github.com/openjdk/jdk/blob/3b21a298c29d88720f6bfb2dc1f3305b6a3db307/src/hotspot/share/interpreter/templateInterpreterGenerator.cpp#L493-L502 >> >> This can be solved by checking `IA32` instead of `X86`. This block would be gone completely once we remove x86_32 port. Meanwhile, we can make it right by x86_64, and make eventual x86_32 removal less confusing. This issue seems to only affect the compilation of native methods, while most of the hot code is riding on compiler intrinsics. I'll put performance data in comments. 
>> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, `all` > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Disable IR tests on IA32 I just launched testing, I don't think @vnkozlov did that already. Better to hold off until RDP1 / JDK25 fork anyway. test/hotspot/jtreg/compiler/c2/irTests/TestFPConversion.java line 33: > 31: * @summary Test that code generation for FP conversion works as intended > 32: * @library /test/lib / > 33: * @requires os.arch != "x86" & os.arch != "i386" It would have been preferrable to add this to the IR rule, so the test still runs elsewhere. What about `aarch64`? ------------- PR Review: https://git.openjdk.org/jdk/pull/22446#pullrequestreview-2478784069 PR Review Comment: https://git.openjdk.org/jdk/pull/22446#discussion_r1869595147 From shade at openjdk.org Wed Dec 4 14:52:38 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 4 Dec 2024 14:52:38 GMT Subject: RFR: 8345219: C2: x86_64 should not go to interpreter stubs for NaNs handling [v3] In-Reply-To: References: Message-ID: On Tue, 3 Dec 2024 03:10:51 GMT, Vladimir Kozlov wrote: >> Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: >> >> Disable IR tests on IA32 > > This fix looks correct by looking on original changes. > > Yes, special Interpreter's code for these methods is only generated for x86_32 in `templateInterpreterGenerator_x86_32.cpp`. > I just launched testing, I don't think @vnkozlov did that already. Better to hold off until RDP1 / JDK25 fork anyway. Thanks! I would like to have this fix in JDK 24, though: it is a (minor) performance optimization, and I would want to know if it breaks anything in ATR before we pull this block out completely when removing x86_32. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/22446#issuecomment-2517655873 From shade at openjdk.org Wed Dec 4 14:52:39 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 4 Dec 2024 14:52:39 GMT Subject: RFR: 8345219: C2: x86_64 should not go to interpreter stubs for NaNs handling [v3] In-Reply-To: References: <-VRX_UtkVhwBeNJMvIs8RnYcPmXtAx6TWIpTP8Oj6iE=.62e16355-7536-45ee-9624-fd8a5c1acce0@github.com> Message-ID: <3D3kyP1ezMwB8abCgYHOgJo_-OWEEqBPwx_6bF_eEmE=.fb0af155-e5f6-4325-ba65-2d4537ddb6ea@github.com> On Wed, 4 Dec 2024 14:06:38 GMT, Emanuel Peter wrote: >> Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: >> >> Disable IR tests on IA32 > > test/hotspot/jtreg/compiler/c2/irTests/TestFPConversion.java line 33: > >> 31: * @summary Test that code generation for FP conversion works as intended >> 32: * @library /test/lib / >> 33: * @requires os.arch != "x86" & os.arch != "i386" > > It would have been preferrable to add this to the IR rule, so the test still runs elsewhere. > What about `aarch64`? For all practical purposes, the test will run everywhere, really. It would be skipped only on `IA32`, which is going away. This `@requires` serves as the concession to `IA32` that is still in the tree. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22446#discussion_r1869679943 From epeter at openjdk.org Wed Dec 4 15:11:40 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 4 Dec 2024 15:11:40 GMT Subject: RFR: 8345219: C2: x86_64 should not go to interpreter stubs for NaNs handling [v3] In-Reply-To: References: Message-ID: <4pfFyKKJT2wiFKNXhhHndllHZ5-OEXRPpyBLbz4goXU=.8c53086b-f881-4d3d-9e0e-6d5d7ab026fd@github.com> On Wed, 4 Dec 2024 14:50:11 GMT, Aleksey Shipilev wrote: >> This fix looks correct by looking on original changes. >> >> Yes, special Interpreter's code for these methods is only generated for x86_32 in `templateInterpreterGenerator_x86_32.cpp`. 
> >> I just launched testing, I don't think @vnkozlov did that already. Better to hold off until RDP1 / JDK25 fork anyway. > > Thanks! I would like to have this fix in JDK 24, though: it is a (minor) performance optimization, and I would want to know if it breaks anything in ATR before we pull this block out completely when removing x86_32. @shipilev ok, I guess it is a small change and the backout would be simple. Half of the testing just passed, no failures so far. Give me 1-2h to see if anything comes up. ------------- PR Comment: https://git.openjdk.org/jdk/pull/22446#issuecomment-2517707266 From epeter at openjdk.org Wed Dec 4 15:11:42 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 4 Dec 2024 15:11:42 GMT Subject: RFR: 8345219: C2: x86_64 should not go to interpreter stubs for NaNs handling [v3] In-Reply-To: <3D3kyP1ezMwB8abCgYHOgJo_-OWEEqBPwx_6bF_eEmE=.fb0af155-e5f6-4325-ba65-2d4537ddb6ea@github.com> References: <-VRX_UtkVhwBeNJMvIs8RnYcPmXtAx6TWIpTP8Oj6iE=.62e16355-7536-45ee-9624-fd8a5c1acce0@github.com> <3D3kyP1ezMwB8abCgYHOgJo_-OWEEqBPwx_6bF_eEmE=.fb0af155-e5f6-4325-ba65-2d4537ddb6ea@github.com> Message-ID: On Wed, 4 Dec 2024 14:48:55 GMT, Aleksey Shipilev wrote: >> test/hotspot/jtreg/compiler/c2/irTests/TestFPConversion.java line 33: >> >>> 31: * @summary Test that code generation for FP conversion works as intended >>> 32: * @library /test/lib / >>> 33: * @requires os.arch != "x86" & os.arch != "i386" >> >> It would have been preferrable to add this to the IR rule, so the test still runs elsewhere. >> What about `aarch64`? > > For all practical purposes, the test will run everywhere, really. It would be skipped only on `IA32`, which is going away. This `@requires` serves as the concession to `IA32` that is still in the tree. Ah, I read the requires wrong. All good. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22446#discussion_r1869731836 From roland at openjdk.org Wed Dec 4 15:18:55 2024 From: roland at openjdk.org (Roland Westrelin) Date: Wed, 4 Dec 2024 15:18:55 GMT Subject: RFR: 8345299: C2: some nodes can still have incorrect control after do_range_check() [v2] In-Reply-To: References: Message-ID: On Tue, 3 Dec 2024 17:14:28 GMT, Vladimir Kozlov wrote: >> Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: >> >> review > > src/hotspot/share/opto/loopTransform.cpp line 2825: > >> 2823: // new main_limit can push Bool/Cmp nodes down (when one of the eliminated condition has parameters that are not loop >> 2824: // invariant in the pre loop). >> 2825: set_ctrl(opqzm, new_limit_ctrl); > > Update this comment too. Both done in new commit. Do the comments look ok to you? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22485#discussion_r1869756543 From roland at openjdk.org Wed Dec 4 15:18:54 2024 From: roland at openjdk.org (Roland Westrelin) Date: Wed, 4 Dec 2024 15:18:54 GMT Subject: RFR: 8345299: C2: some nodes can still have incorrect control after do_range_check() [v2] In-Reply-To: References: Message-ID: <0OJRLL7pqscoC_54OG2QjQj7zMwVLVUtsJYXM9wh6_c=.f9502ae7-ffba-42fd-ab92-9bfc242735a0@github.com> > 8339733 fixed controls for updated pre/main limits during > do_range_check(). However, it missed one issue: > > Control for the new limits is computed in `new_limit_ctrl` for both > pre and main loops. `new_limit_ctrl` is currently initialized from the > pre limit control but it also needs to take the main loop limit > control into account as sometimes, the main loop limit control is > below the pre limit and pre loop entry control. > > 8339733 also introduced a couple bugs. Control of the `Bool`/`Cmp` > nodes are updated for the pre and main loop to `new_limit_ctrl`. 
But > that's incorrect because `new_limit_ctrl` may be above the pre loop > while the `Bool`/`Cmp` for the pre loop are in the loop (because they > depend on the loop iv) and for the main loop are after the pre loop > (because they depend on the iv out of the pre loop). I fixed this for > the pre loop by setting control for the `Bool`/`Cmp` to be as late as > possible. For the main loop, no change appears to be required as > control computed by c2 is already late enough. I've added an assert > instead. Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: review ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22485/files - new: https://git.openjdk.org/jdk/pull/22485/files/e7904f6c..0b08402d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22485&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22485&range=00-01 Stats: 6 lines in 1 file changed: 4 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/22485.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22485/head:pull/22485 PR: https://git.openjdk.org/jdk/pull/22485 From roland at openjdk.org Wed Dec 4 15:35:44 2024 From: roland at openjdk.org (Roland Westrelin) Date: Wed, 4 Dec 2024 15:35:44 GMT Subject: RFR: 8342692: C2: MemorySegment API slow with short running loops [v6] In-Reply-To: References: Message-ID: > To optimize a long counted loop and long range checks in a long or int > counted loop, the loop is turned into a loop nest. When the loop has > few iterations, the overhead of having an outer loop whose backedge is > never taken, has a measurable cost. Furthermore, creating the loop > nest usually causes one iteration of the loop to be peeled so > predicates can be set up. If the loop is short running, then it's an > extra iteration that's run with range checks (compared to an int > counted loop with int range checks). 
> > This change doesn't create a loop nest when: > > 1- it can be determined statically at loop nest creation time that the > loop runs for a short enough number of iterations > > 2- profiling reports that the loop runs for no more than ShortLoopIter > iterations (1000 by default). > > For 2-, a guard is added which is implemented as yet another predicate. > > While this change is in principle simple, I ran into a few > implementation issues: > > - while c2 has a way to compute the number of iterations of an int > counted loop, it doesn't have that for long counted loops. The > existing logic for int counted loops promotes values to long to > avoid overflows. I reworked it so it now works for both long and int > counted loops. > > - I added a new deoptimization reason (Reason_short_running_loop) for > the new predicate. Given the number of iterations is narrowed down > by the predicate, the limit of the loop after transformation is a > cast node that's control dependent on the short running loop > predicate. Because once the counted loop is transformed, it is > likely that range check predicates will be inserted and they will > depend on the limit, the short running loop predicate has to be the > one that's further away from the loop entry. Now it is also possible > that the limit before transformation depends on a predicate > (TestShortRunningLongCountedLoopPredicatesClone is an example), we > can have: new predicates inserted after the transformation that > depend on the casted limit that itself depends on old predicates > added before the transformation. To solve this circular dependency, > parse and assert predicates are cloned between the old predicates > and the loop head. The cloned short running loop parse predicate is > the one that's used to insert the short running loop predicate.
> > - In the case of a long counted loop, the loop is transformed into a > regular loop with a new limit and transformed range checks that's > later turned into an int counted loop. The int ... Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: Update src/hotspot/share/opto/loopTransform.cpp Co-authored-by: Emanuel Peter ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21630/files - new: https://git.openjdk.org/jdk/pull/21630/files/74c38342..aa567717 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21630&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21630&range=04-05 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/21630.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21630/head:pull/21630 PR: https://git.openjdk.org/jdk/pull/21630 From roland at openjdk.org Wed Dec 4 15:38:57 2024 From: roland at openjdk.org (Roland Westrelin) Date: Wed, 4 Dec 2024 15:38:57 GMT Subject: RFR: 8343747: C2: TestReplicateAtConv.java crashes with -XX:MaxVectorSize=8 [v2] In-Reply-To: References: Message-ID: > Crash occurs when attempting to create a `Replicate` node that's input > to a `VectorCast` node (for a `ConvL2I`) that's not supported by the > platform (when run with `MaxVectorSize=8`). I think the pack for the > `VectorCast` should be filtered out earlier as not implemented and I > propose adding a test to `VectorCastNode::implemented()` for the type > of its input to handle that corner case.
Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: review ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22442/files - new: https://git.openjdk.org/jdk/pull/22442/files/ecc9a7b7..faacb5d7 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22442&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22442&range=00-01 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/22442.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22442/head:pull/22442 PR: https://git.openjdk.org/jdk/pull/22442 From epeter at openjdk.org Wed Dec 4 15:38:58 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 4 Dec 2024 15:38:58 GMT Subject: RFR: 8343747: C2: TestReplicateAtConv.java crashes with -XX:MaxVectorSize=8 [v2] In-Reply-To: References: Message-ID: On Wed, 4 Dec 2024 15:36:00 GMT, Roland Westrelin wrote: >> Crash occurs when attempting to create a `Replicate` node that's input >> to a `VectorCast` node (for a `ConvL2I`) that's not supported by the >> platform (when run with `MaxVectorSize=8`). I think the pack for the >> `VectorCast` should be filtered out earlier as not implemented and I >> propose adding a test to `VectorCastNode::implemented()` for the type >> of its input to handle that corner case. > > Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: > > review Thanks for the update - still good :) ------------- Marked as reviewed by epeter (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/22442#pullrequestreview-2479109766 From roland at openjdk.org Wed Dec 4 15:38:59 2024 From: roland at openjdk.org (Roland Westrelin) Date: Wed, 4 Dec 2024 15:38:59 GMT Subject: RFR: 8343747: C2: TestReplicateAtConv.java crashes with -XX:MaxVectorSize=8 [v2] In-Reply-To: <_B_2xeH_MuZ_Pzk9K2Ap2klG1XahEnIFr_U3QTi5cnc=.b6e3a197-fa35-4d34-a11a-c120c607c8d3@github.com> References: <_B_2xeH_MuZ_Pzk9K2Ap2klG1XahEnIFr_U3QTi5cnc=.b6e3a197-fa35-4d34-a11a-c120c607c8d3@github.com> Message-ID: On Tue, 3 Dec 2024 16:33:16 GMT, Emanuel Peter wrote: >> test/hotspot/jtreg/compiler/vectorization/TestReplicateAtConv.java line 29: >> >>> 27: * @summary C2 compilation fails with "bad AD file" due to Replicate >>> 28: * @run main/othervm -XX:CompileCommand=compileonly,TestReplicateAtConv::test -Xcomp TestReplicateAtConv >>> 29: * @run main/othervm -XX:CompileCommand=compileonly,TestReplicateAtConv::test -Xcomp -XX:MaxVectorSize=8 TestReplicateAtConv >> >> You should add the new bug id to the test. > > Maybe you should also update the summary, and say that the issue is about replicate and cast. Done in new commit. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22442#discussion_r1869788512 From roland at openjdk.org Wed Dec 4 15:48:46 2024 From: roland at openjdk.org (Roland Westrelin) Date: Wed, 4 Dec 2024 15:48:46 GMT Subject: RFR: 8342692: C2: long counted loop/long range checks: don't create loop-nest for short running loops [v5] In-Reply-To: <0QhHkO9hp_uxL9EC5AYIhm95Gw2DeXXQZgVG2L0NDCw=.6076af9b-5922-48f7-ae41-66edcbe943ca@github.com> References: <0QhHkO9hp_uxL9EC5AYIhm95Gw2DeXXQZgVG2L0NDCw=.6076af9b-5922-48f7-ae41-66edcbe943ca@github.com> Message-ID: On Thu, 28 Nov 2024 15:40:42 GMT, Roland Westrelin wrote: >> Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 21 commits: >> >> - Merge branch 'master' into JDK-8342692 >> - whitespaces >> - more >> - merge >> - more >> - one more test >> - Merge branch 'master' into JDK-8342692 >> - more >> - more >> - Merge branch 'master' into JDK-8342692 >> - ... and 11 more: https://git.openjdk.org/jdk/compare/3b21a298...74c38342 > > I pushed an update that should fix all test failures except the one in `compiler/escapeAnalysis/TestMissingAntiDependency.java` (covered by JDK-8341976). A lot of them were caused by the following part of the change: >> In the case of a long counted loop, the loop is transformed into a > regular loop with a new limit and transformed range checks that's > later turned into an int counted loop. The int counted loop doesn't > need loop limit checks because of the way it's constructed. There's > an assert that catches that we don't attempt to add one. I ran into > test failures where, by the time the int counted loop is created, > the fact that the number of iterations of the loop is small enough > to not need a loop limit check gets lost. I added a cast to make > sure the narrowed limit's type is not lost (I had to do something > similar for loop nests). But then, I ran into the same issue again > because the cast was pushed through a sub or add and the narrowed > type was lost. I propose that pushing casts through sub/add be only > done after loop opts are over (same as what's done for range check > CastII). > > So I removed that part of the initial change and instead added some logic to pattern match the `CastLL` used by the loop nest for which the transformation of `(CastLL (AddL ...))` shouldn't be performed until the inner loop is turned into a counted loop. > @rwestrel Would you mind changing the title to something more descriptive of your change? I'm thinking: "C2: Don't create loop-nest for short running loops". Done.
------------- PR Comment: https://git.openjdk.org/jdk/pull/21630#issuecomment-2517825359 From roland at openjdk.org Wed Dec 4 15:48:47 2024 From: roland at openjdk.org (Roland Westrelin) Date: Wed, 4 Dec 2024 15:48:47 GMT Subject: RFR: 8342692: C2: long counted loop/long range checks: don't create loop-nest for short running loops [v5] In-Reply-To: References: Message-ID: <_WYCBS2i_X0pCnBbn00lFPYA6s-dxBypkXFZrl67rG4=.0dc4cb01-08cd-4b89-b98c-104399521044@github.com> On Wed, 4 Dec 2024 07:52:15 GMT, Emanuel Peter wrote: > Which benchmarks are you referring to? The one mentioned in the bug: https://github.com/openjdk/jdk/compare/master...mcimadamore:jdk:manual_mismatch_bench?expand=1 > test/hotspot/jtreg/compiler/longcountedloops/TestShortLoopLostLimit.java line 26: > >> 24: /** >> 25: * @test >> 26: * @bug 8342330 > > This is a different bug number. Is that intentional? Nope. Not sure what happened. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21630#issuecomment-2517827361 PR Review Comment: https://git.openjdk.org/jdk/pull/21630#discussion_r1869824180 From duke at openjdk.org Wed Dec 4 15:52:42 2024 From: duke at openjdk.org (duke) Date: Wed, 4 Dec 2024 15:52:42 GMT Subject: RFR: 8344833: CTW: Make failing on zero classes optional [v3] In-Reply-To: References: Message-ID: On Wed, 27 Nov 2024 18:29:18 GMT, Evgeny Nikitin wrote: >> For CTW, zero classes in provided jar is now a failure. >> This creates noisy and blocking false positives in fuzzy/mass scale runs, where we use jar archives from random sources, unchecked or randomly generated, etc. >> >> This PR makes this behaviour controllable. Default reaction is a failure, like before. > > Evgeny Nikitin has updated the pull request incrementally with one additional commit since the last revision: > > Use totalClassCount instead of the classCount @lepestock Your change (at version d1e57aa60fe372a2a8d1fa52030d8f0f1390304b) is now ready to be sponsored by a Committer. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/22320#issuecomment-2517839963 From roland at openjdk.org Wed Dec 4 15:58:41 2024 From: roland at openjdk.org (Roland Westrelin) Date: Wed, 4 Dec 2024 15:58:41 GMT Subject: RFR: 8342692: C2: long counted loop/long range checks: don't create loop-nest for short running loops [v5] In-Reply-To: References: Message-ID: On Wed, 4 Dec 2024 07:43:35 GMT, Emanuel Peter wrote: >> src/hotspot/share/opto/loopTransform.cpp line 140: >> >>> 138: udiff = uinit_con - ulimit_con; >>> 139: } >>> 140: julong utrip_count = udiff / ABS(stride_con); >> >> Could `stride_con` be `min_int`? > > I wonder if we should start converting all these computations to `noOverflowInt`, just to avoid possible overflows etc. > Could `stride_con` be `min_int`? We don't create a counted loop if stride is min_int. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21630#discussion_r1869837776 From roland at openjdk.org Wed Dec 4 15:58:41 2024 From: roland at openjdk.org (Roland Westrelin) Date: Wed, 4 Dec 2024 15:58:41 GMT Subject: RFR: 8342692: C2: long counted loop/long range checks: don't create loop-nest for short running loops [v5] In-Reply-To: References: Message-ID: On Wed, 4 Dec 2024 15:53:48 GMT, Roland Westrelin wrote: >> I wonder if we should start converting all these computations to `noOverflowInt`, just to avoid possible overflows etc. > >> Could `stride_con` be `min_int`? > > We don't create a counted loop if stride is min_int. > I wonder if we should start converting all these computations to `noOverflowInt`, just to avoid possible overflows etc. In this case we would need `NoOverflowUnsignedLong` I think. 
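[Editor's sketch, not part of the thread] The overflow concern raised above can be illustrated in plain Java. This is only a rough analogue of an unsigned trip-count computation, not the HotSpot code under review: the class name, the method name, the assumed loop shape and the round-up step are all hypothetical. It shows why taking the absolute value of a `min_int` stride in 32-bit arithmetic is unsafe, and how widening first and dividing as unsigned avoids that.

```java
// Hypothetical illustration only; none of these names come from HotSpot.
final class TripCountSketch {
    /**
     * Trip count of `for (i = init; i < limit; i += stride)` when stride > 0,
     * or of `for (i = init; i > limit; i += stride)` when stride < 0.
     * The result is an unsigned 64-bit value carried in a long.
     */
    static long unsignedTripCount(long init, long limit, int stride) {
        if (stride == 0) {
            throw new IllegalArgumentException("stride must be non-zero");
        }
        // Handle empty loops first so the distance below is never negative.
        if (stride > 0 ? init >= limit : init <= limit) {
            return 0;
        }
        // The subtraction may wrap as a signed value, but the resulting bits
        // are the correct unsigned distance between init and limit.
        long udiff = stride > 0 ? limit - init : init - limit;
        // Widen before taking the absolute value: as an int,
        // Math.abs(Integer.MIN_VALUE) is still Integer.MIN_VALUE.
        long absStride = Math.abs((long) stride);
        long quotient = Long.divideUnsigned(udiff, absStride);
        long remainder = Long.remainderUnsigned(udiff, absStride);
        // Round up: a partial last step still runs one iteration.
        return remainder == 0 ? quotient : quotient + 1;
    }

    public static void main(String[] args) {
        System.out.println(unsignedTripCount(0, 10, 3));
        System.out.println(Long.toUnsignedString(
                unsignedTripCount(Long.MIN_VALUE, Long.MAX_VALUE, 1)));
        // The int absolute value of min_int overflows back to min_int.
        System.out.println(Math.abs(Integer.MIN_VALUE) == Integer.MIN_VALUE);
    }
}
```

The sketch accepts a `min_int` stride only to show that the widened arithmetic stays well defined; as noted in the reply above, C2 does not create a counted loop for such a stride in the first place.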
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21630#discussion_r1869842138 From epeter at openjdk.org Wed Dec 4 16:03:43 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 4 Dec 2024 16:03:43 GMT Subject: RFR: 8345219: C2: x86_64 should not go to interpreter stubs for NaNs handling [v3] In-Reply-To: <-VRX_UtkVhwBeNJMvIs8RnYcPmXtAx6TWIpTP8Oj6iE=.62e16355-7536-45ee-9624-fd8a5c1acce0@github.com> References: <-VRX_UtkVhwBeNJMvIs8RnYcPmXtAx6TWIpTP8Oj6iE=.62e16355-7536-45ee-9624-fd8a5c1acce0@github.com> Message-ID: On Wed, 4 Dec 2024 09:42:59 GMT, Aleksey Shipilev wrote: >> Found this while cleaning up x86_32 code for removal. >> >> In our current code there is a block added by [JDK-8076373](https://bugs.openjdk.org/browse/JDK-8076373): >> https://github.com/openjdk/jdk/blob/3b21a298c29d88720f6bfb2dc1f3305b6a3db307/src/hotspot/share/compiler/compileBroker.cpp#L1451-L1473 >> >> Ostensibly, that block is for x86_32 handling of signalling NaNs -- x87 FPU has a peculiarity with them. See other funky bugs we seen with it: [JDK-8285985](https://bugs.openjdk.org/browse/JDK-8285985), [JDK-8293991](https://bugs.openjdk.org/browse/JDK-8293991). >> >> But the way current block is coded, it is enabled for X86 wholesale, which also means x86_64! In fact, it is likely even worse on x86_64, because the related "fast" entries are generated only for x86_32: >> https://github.com/openjdk/jdk/blob/3b21a298c29d88720f6bfb2dc1f3305b6a3db307/src/hotspot/share/interpreter/templateInterpreterGenerator.cpp#L493-L502 >> >> This can be solved by checking `IA32` instead of `X86`. This block would be gone completely once we remove x86_32 port. Meanwhile, we can make it right by x86_64, and make eventual x86_32 removal less confusing. This issue seems to only affect the compilation of native methods, while most of the hot code is riding on compiler intrinsics. I'll put performance data in comments. 
>> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, `all` > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Disable IR tests on IA32 Almost all tests finished. All green. Approved. ------------- Marked as reviewed by epeter (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/22446#pullrequestreview-2479203897 From kxu at openjdk.org Wed Dec 4 16:11:58 2024 From: kxu at openjdk.org (Kangcheng Xu) Date: Wed, 4 Dec 2024 16:11:58 GMT Subject: RFR: 8336759: C2: int counted loop with long limit not recognized as counted loop [v3] In-Reply-To: <_d_CiLfCN9ahEmhp9fLcGqO-L8n7a0gW86R3lzLkX60=.b3bdc697-cb2d-477f-a525-0f16a3eee383@github.com> References: <_d_CiLfCN9ahEmhp9fLcGqO-L8n7a0gW86R3lzLkX60=.b3bdc697-cb2d-477f-a525-0f16a3eee383@github.com> Message-ID: > This patch implements [JDK-8336759](https://bugs.openjdk.org/browse/JDK-8336759), which recognizes int counted loops with long limits. > > Currently, patterns like `for ( int i =...; i < long_limit; ...)` where int `i` is implicitly promoted to long (i.e., `(long) i < long_limit`) are not recognized as (int) counted loops. This patch speculatively and optimistically converts long limits to ints and deoptimizes if the limit is outside int range, allowing more optimization opportunities. > > In other words, it transforms > > > for (int i = 0; (long) i < long_limit; i++) {...} > > > to > > > if (int_min <= long_limit && long_limit <= int_max ) { > for (int i = 0; i < (int) long_limit; i++) {...} > } else { > trap: loop_limit_check > } > > > This could benefit calls to APIs like `long MemorySegment#byteSize()` when iterating over a long limit.
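[Editor's sketch, not part of the thread] The transformation described above can be mimicked by hand at the Java source level. This is only an illustration under assumptions: C2 performs the rewrite on its IR and uses an uncommon trap rather than a method call, and the class and method names below are hypothetical.

```java
// Hypothetical illustration only; C2 does this on its IR, not on source code.
final class LongLimitLoopSketch {
    // Original shape: `i` is promoted to long for every comparison.
    // Caution: for limits above Integer.MAX_VALUE, `i` wraps around and
    // this loop never terminates, so only call it with smaller limits.
    static long sumOriginal(long longLimit) {
        long sum = 0;
        for (int i = 0; (long) i < longLimit; i++) {
            sum += i;
        }
        return sum;
    }

    // Guarded shape: check the range once, then loop with an int limit.
    static long sumGuarded(long longLimit) {
        if (Integer.MIN_VALUE <= longLimit && longLimit <= Integer.MAX_VALUE) {
            int intLimit = (int) longLimit; // lossless because of the guard
            long sum = 0;
            for (int i = 0; i < intLimit; i++) {
                sum += i;
            }
            return sum;
        }
        // Stand-in for the trap: compiled code would deoptimize here and
        // continue with the original loop.
        return sumOriginal(longLimit);
    }

    public static void main(String[] args) {
        System.out.println(sumGuarded(10)); // prints 45
    }
}
```

Both methods return the same result for every limit that fits in an int; the guarded form simply gives the inner loop the plain int-counted shape that the loop optimizations already recognize.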
Kangcheng Xu has updated the pull request incrementally with three additional commits since the last revision: - Update src/hotspot/share/opto/loopnode.cpp Co-authored-by: Christian Hagedorn - Update src/hotspot/share/opto/loopnode.cpp Co-authored-by: Christian Hagedorn - Update src/hotspot/share/opto/loopnode.cpp Co-authored-by: Christian Hagedorn ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22449/files - new: https://git.openjdk.org/jdk/pull/22449/files/79d8c146..de847318 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22449&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22449&range=01-02 Stats: 3 lines in 1 file changed: 0 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/22449.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22449/head:pull/22449 PR: https://git.openjdk.org/jdk/pull/22449 From kxu at openjdk.org Wed Dec 4 16:11:58 2024 From: kxu at openjdk.org (Kangcheng Xu) Date: Wed, 4 Dec 2024 16:11:58 GMT Subject: RFR: 8336759: C2: int counted loop with long limit not recognized as counted loop [v2] In-Reply-To: References: <_d_CiLfCN9ahEmhp9fLcGqO-L8n7a0gW86R3lzLkX60=.b3bdc697-cb2d-477f-a525-0f16a3eee383@github.com> Message-ID: On Wed, 4 Dec 2024 06:51:10 GMT, Christian Hagedorn wrote: > [...] you might want to wait until after the fork [...] Yes, I agree. I'll hold off on integrating this PR until after the fork. Thanks for the reminder (and the review)!
------------- PR Comment: https://git.openjdk.org/jdk/pull/22449#issuecomment-2517889839 From shade at openjdk.org Wed Dec 4 16:40:57 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 4 Dec 2024 16:40:57 GMT Subject: RFR: 8345219: C2: x86_64 should not go to interpreter stubs for NaNs handling [v3] In-Reply-To: <-VRX_UtkVhwBeNJMvIs8RnYcPmXtAx6TWIpTP8Oj6iE=.62e16355-7536-45ee-9624-fd8a5c1acce0@github.com> References: <-VRX_UtkVhwBeNJMvIs8RnYcPmXtAx6TWIpTP8Oj6iE=.62e16355-7536-45ee-9624-fd8a5c1acce0@github.com> Message-ID: On Wed, 4 Dec 2024 09:42:59 GMT, Aleksey Shipilev wrote: >> Found this while cleaning up x86_32 code for removal. >> >> In our current code there is a block added by [JDK-8076373](https://bugs.openjdk.org/browse/JDK-8076373): >> https://github.com/openjdk/jdk/blob/3b21a298c29d88720f6bfb2dc1f3305b6a3db307/src/hotspot/share/compiler/compileBroker.cpp#L1451-L1473 >> >> Ostensibly, that block is for x86_32 handling of signalling NaNs -- x87 FPU has a peculiarity with them. See other funky bugs we seen with it: [JDK-8285985](https://bugs.openjdk.org/browse/JDK-8285985), [JDK-8293991](https://bugs.openjdk.org/browse/JDK-8293991). >> >> But the way current block is coded, it is enabled for X86 wholesale, which also means x86_64! In fact, it is likely even worse on x86_64, because the related "fast" entries are generated only for x86_32: >> https://github.com/openjdk/jdk/blob/3b21a298c29d88720f6bfb2dc1f3305b6a3db307/src/hotspot/share/interpreter/templateInterpreterGenerator.cpp#L493-L502 >> >> This can be solved by checking `IA32` instead of `X86`. This block would be gone completely once we remove x86_32 port. Meanwhile, we can make it right by x86_64, and make eventual x86_32 removal less confusing. This issue seems to only affect the compilation of native methods, while most of the hot code is riding on compiler intrinsics. I'll put performance data in comments. 
>> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, `all` > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Disable IR tests on IA32 Thanks! Here goes then. ------------- PR Comment: https://git.openjdk.org/jdk/pull/22446#issuecomment-2517971997 From shade at openjdk.org Wed Dec 4 16:40:59 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 4 Dec 2024 16:40:59 GMT Subject: Integrated: 8345219: C2: x86_64 should not go to interpreter stubs for NaNs handling In-Reply-To: References: Message-ID: On Thu, 28 Nov 2024 18:22:24 GMT, Aleksey Shipilev wrote: > Found this while cleaning up x86_32 code for removal. > > In our current code there is a block added by [JDK-8076373](https://bugs.openjdk.org/browse/JDK-8076373): > https://github.com/openjdk/jdk/blob/3b21a298c29d88720f6bfb2dc1f3305b6a3db307/src/hotspot/share/compiler/compileBroker.cpp#L1451-L1473 > > Ostensibly, that block is for x86_32 handling of signalling NaNs -- x87 FPU has a peculiarity with them. See other funky bugs we seen with it: [JDK-8285985](https://bugs.openjdk.org/browse/JDK-8285985), [JDK-8293991](https://bugs.openjdk.org/browse/JDK-8293991). > > But the way current block is coded, it is enabled for X86 wholesale, which also means x86_64! In fact, it is likely even worse on x86_64, because the related "fast" entries are generated only for x86_32: > https://github.com/openjdk/jdk/blob/3b21a298c29d88720f6bfb2dc1f3305b6a3db307/src/hotspot/share/interpreter/templateInterpreterGenerator.cpp#L493-L502 > > This can be solved by checking `IA32` instead of `X86`. This block would be gone completely once we remove x86_32 port. Meanwhile, we can make it right by x86_64, and make eventual x86_32 removal less confusing. This issue seems to only affect the compilation of native methods, while most of the hot code is riding on compiler intrinsics. I'll put performance data in comments. 
> > Additional testing: > - [x] Linux x86_64 server fastdebug, `all` This pull request has now been integrated. Changeset: f3b4350e Author: Aleksey Shipilev URL: https://git.openjdk.org/jdk/commit/f3b4350e0f14d3b0c551e0d24563788f379111d6 Stats: 355 lines in 5 files changed: 353 ins; 0 del; 2 mod 8345219: C2: x86_64 should not go to interpreter stubs for NaNs handling Reviewed-by: epeter, kvn ------------- PR: https://git.openjdk.org/jdk/pull/22446 From jkarthikeyan at openjdk.org Wed Dec 4 16:41:46 2024 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Wed, 4 Dec 2024 16:41:46 GMT Subject: RFR: 8341781: Improve Min/Max node identities [v4] In-Reply-To: References: Message-ID: On Mon, 4 Nov 2024 04:25:07 GMT, Jasmine Karthikeyan wrote: >> Hi all, >> This patch implements some missing identities for Min/Max nodes. It adds static type-based operand choosing for MinI/MaxI, such as the ones that MinL/MaxL use. In addition, it adds simplification for patterns such as `Max(A, Max(A, B))` to `Max(A, B)` and `Max(A, Min(A, B))` to `A`. These simplifications stem from the [lattice identity rules](https://en.wikipedia.org/wiki/Lattice_(order)#As_algebraic_structure). The main place I've seen this pattern is with MinL/MaxL nodes created during loop optimizations. Some examples of where this occurs include BigInteger addition/subtraction, and regex code. I've run some of the existing benchmarks and found some nice improvements:
>>
>>                                                           Baseline                   Patch
>> Benchmark                                 Mode  Cnt     Score     Error  Units     Score    Error  Units  Improvement
>> BigIntegers.testAdd                       avgt   15    25.096 ±   3.936  ns/op    19.214 ±  0.521  ns/op  (+ 26.5%)
>> PatternBench.charPatternCompile           avgt    8   453.727 ± 117.265  ns/op   370.054 ± 26.106  ns/op  (+ 20.3%)
>> PatternBench.charPatternMatch             avgt    8   917.604 ± 121.766  ns/op   810.560 ± 38.437  ns/op  (+ 12.3%)
>> PatternBench.charPatternMatchWithCompile  avgt    8  1477.703 ± 255.783  ns/op  1224.460 ± 28.220  ns/op  (+ 18.7%)
>> PatternBench.longStringGraphemeMatches    avgt    8   860.909 ± 124.661  ns/op   743.729 ± 22.877  ns/op  (+ 14.6%)
>> PatternBench.splitFlags                   avgt    8   420.506 ±  76.252  ns/op   321.911 ± 11.661  ns/op  (+ 26.6%)
>>
>> I've added some IR tests, and tier 1 testing passes on my linux machine. Reviews would be appreciated! > > Jasmine Karthikeyan has updated the pull request incrementally with one additional commit since the last revision: > > Make long tests check IR Thanks for the re-review! I think that's a good idea, I'll integrate it after JDK 24 is forked. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21439#issuecomment-2517978724 From kvn at openjdk.org Wed Dec 4 17:54:43 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 4 Dec 2024 17:54:43 GMT Subject: RFR: 8341696: C2: Non-fluid StringBuilder pattern bails out in OptoStringConcat [v3] In-Reply-To: References: Message-ID: <5EtdWyfk1xl82qO7F2hpEYiQqjdBKVYFQf_zF2KKP3Y=.232c0d49-3d27-4908-8cee-d9c64609ed98@github.com> On Wed, 4 Dec 2024 09:15:37 GMT, theoweidmannoracle wrote: >> Extends stringopts to also recognize non-fluid uses of StringBuilder and optimize them the same way. >> >> For example, this basic case was not optimized before and is optimized with this PR: >> >> >> StringBuilder sb = new StringBuilder(); >> sb.append("a"); >> sb.append(a); >> return sb.toString(); > > theoweidmannoracle has updated the pull request incrementally with one additional commit since the last revision: > > Move test Looks good. ------------- Marked as reviewed by kvn (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/22537#pullrequestreview-2479485684 From kvn at openjdk.org Wed Dec 4 18:03:42 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 4 Dec 2024 18:03:42 GMT Subject: RFR: 8345299: C2: some nodes can still have incorrect control after do_range_check() [v2] In-Reply-To: <0OJRLL7pqscoC_54OG2QjQj7zMwVLVUtsJYXM9wh6_c=.f9502ae7-ffba-42fd-ab92-9bfc242735a0@github.com> References: <0OJRLL7pqscoC_54OG2QjQj7zMwVLVUtsJYXM9wh6_c=.f9502ae7-ffba-42fd-ab92-9bfc242735a0@github.com> Message-ID: On Wed, 4 Dec 2024 15:18:54 GMT, Roland Westrelin wrote: >> 8339733 fixed controls for updated pre/main limits during >> do_range_check(). However, it missed one issue: >> >> Control for the new limits is computed in `new_limit_ctrl` for both >> pre and main loops. `new_limit_ctrl` is currently initialized from the >> pre limit control but it also needs to take the main loop limit >> control into account as sometimes, the main loop limit control is >> below the pre limit and pre loop entry control. >> >> 8339733 also introduced a couple bugs. Control of the `Bool`/`Cmp` >> nodes are updated for the pre and main loop to `new_limit_ctrl`. But >> that's incorrect because `new_limit_ctrl` may be above the pre loop >> while the `Bool`/`Cmp` for the pre loop are in the loop (because they >> depend on the loop iv) and for the main loop are after the pre loop >> (because they depend on the iv out of the pre loop). I fixed this for >> the pre loop by setting control for the `Bool`/`Cmp` to be as late as >> possible. For the main loop, no change appears to be required as >> control computed by c2 is already late enough. I've added an assert >> instead. > > Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: > > review Good. ------------- Marked as reviewed by kvn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/22485#pullrequestreview-2479505358 From kvn at openjdk.org Wed Dec 4 18:03:44 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 4 Dec 2024 18:03:44 GMT Subject: RFR: 8345299: C2: some nodes can still have incorrect control after do_range_check() [v2] In-Reply-To: References: Message-ID: <4EAKbM9Z9VCKZhehu7gSH-bxoNT90oilCJFLyROT_rM=.9c67fda8-407a-47d3-b401-06e7e76df9fb@github.com> On Wed, 4 Dec 2024 15:15:39 GMT, Roland Westrelin wrote: >> src/hotspot/share/opto/loopTransform.cpp line 2825: >> >>> 2823: // new main_limit can push Bool/Cmp nodes down (when one of the eliminated condition has parameters that are not loop >>> 2824: // invariant in the pre loop). >>> 2825: set_ctrl(opqzm, new_limit_ctrl); >> >> Update this comment too. > > Both done in new commit. Do the comments look ok to you? yes ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22485#discussion_r1870036054 From kvn at openjdk.org Wed Dec 4 18:04:47 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 4 Dec 2024 18:04:47 GMT Subject: RFR: 8343747: C2: TestReplicateAtConv.java crashes with -XX:MaxVectorSize=8 [v2] In-Reply-To: References: Message-ID: On Wed, 4 Dec 2024 15:38:57 GMT, Roland Westrelin wrote: >> Crash occurs when attempting to create a `Replicate` node that's input >> to a `VectorCast` node (for a `ConvL2I`) that's not supported by the >> platform (when run with `MaxVectorSize=8`). I think the pack for the >> `VectorCast` should be filtered out earlier as not implemented and I >> propose adding a test to `VectorCastNode::implemented()` for the type >> of its input to handle that corner case. > > Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: > > review Update is good. ------------- Marked as reviewed by kvn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/22442#pullrequestreview-2479507300 From kvn at openjdk.org Wed Dec 4 18:07:45 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 4 Dec 2024 18:07:45 GMT Subject: RFR: 8344833: CTW: Make failing on zero classes optional [v3] In-Reply-To: References: Message-ID: On Mon, 25 Nov 2024 06:39:29 GMT, Tobias Hartmann wrote: >> Evgeny Nikitin has updated the pull request incrementally with one additional commit since the last revision: >> >> Use totalClassCount instead of the classCount > > Changes requested by thartmann (Reviewer). I leave to @TobiHartmann to finish his review and sponsor if approved. ------------- PR Comment: https://git.openjdk.org/jdk/pull/22320#issuecomment-2518170224 From kvn at openjdk.org Wed Dec 4 18:19:39 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 4 Dec 2024 18:19:39 GMT Subject: RFR: 8342651: Refactor array constant to use an array of jbyte [v3] In-Reply-To: References: <0gyWEIQ_ZHlIoR_7zdB6sxvApC-5hXkG3RnYQSqWp6w=.fad5dcb7-ffaf-4841-a55c-9afc3475a48d@github.com> <8Md6jSNs0prly4S1-5OoHCw8t68pk57lYJQ4YCH8ndI=.cc384063-f661-48c4-af29-4a788f1c24e6@github.com> Message-ID: On Tue, 26 Nov 2024 18:25:08 GMT, Quan Anh Mai wrote: >> src/hotspot/cpu/x86/x86.ad line 2771: >> >>> 2769: int offset = i * type2aelembytes(bt); >>> 2770: switch (bt) { >>> 2771: case T_BYTE: val->at(i) = con; break; >> >> I don't like that switch is executed for each copied element. What is typical `len` value? > > `len` is at most 16 and is typically 1 (you only emit 1 element and the broadcast instruction will fill the whole register). Also, this function is only invoked a couple of times for each compilation and I think the compiler can do unswitching, too. 
okay ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21596#discussion_r1870060518 From kvn at openjdk.org Wed Dec 4 18:23:42 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 4 Dec 2024 18:23:42 GMT Subject: RFR: 8342651: Refactor array constant to use an array of jbyte [v4] In-Reply-To: References: Message-ID: <_0jX1YedNq0T0kzc4T-UzKsvjvJrT2ZQdyd0m2uMl-Q=.f4818425-13ae-409a-a70a-b750d2b9dc89@github.com> On Tue, 26 Nov 2024 18:24:07 GMT, Quan Anh Mai wrote: >> Hi, >> >> This small patch refactors array constants in C2 to use an array of `jbyte`s instead of an array of `jvalue`. The former is much easier to work with and we can do `memcpy` with them trivially. >> >> Since code buffers support alignment of the constant section, I have also allowed constant tables to be aligned more than 8 bytes and used it for constant vectors on machines not supporting `SSE3`. I also fixed an issue with code buffer relocation where the temporary buffer is not correctly aligned. >> >> This patch is extracted from https://github.com/openjdk/jdk/pull/21229. Tests passed with `UseSSE=2` where 16-byte constants would be generated, as well as normal testing routines. >> >> Please take a look and leave your reviews, thanks a lot. > > Quan Anh Mai has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains six additional commits since the last revision: > > - add comment to ConstantTable::alignment > - Merge branch 'master' into constanttable > - indentation > - Merge branch 'master' into constanttable > - Merge branch 'master' into constanttable > - refactor array constant, fix codebuffer reallocation Good. Please push it into JDK 25 after 24 is forked. We would need to re-test it internally before integration. ------------- Marked as reviewed by kvn (Reviewer). 
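For readers following the 8342651 review above, the per-element `switch` concern and the "unswitching" reply can be illustrated with a small sketch. This is not the code in `x86.ad`: the class, the `ElemType` enum and both method names are hypothetical, and little-endian byte order is assumed because the discussion is about x86. Both methods fill a byte array with `len` copies of a constant; the second hoists the `switch` out of the loop by hand, which is the transformation a compiler's loop unswitching would perform.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical illustration only; not taken from HotSpot sources.
final class ReplicatedConstantSketch {
    // Stand-in for a basic type with a known element size in bytes.
    enum ElemType {
        BYTE(1), SHORT(2), INT(4), LONG(8);
        final int bytes;
        ElemType(int bytes) { this.bytes = bytes; }
    }

    // Shape questioned in the review: the switch runs once per element.
    static byte[] replicatePerElementSwitch(ElemType bt, long con, int len) {
        ByteBuffer buf = ByteBuffer.allocate(len * bt.bytes).order(ByteOrder.LITTLE_ENDIAN);
        for (int i = 0; i < len; i++) {
            int offset = i * bt.bytes;
            switch (bt) {
                case BYTE:  buf.put(offset, (byte) con); break;
                case SHORT: buf.putShort(offset, (short) con); break;
                case INT:   buf.putInt(offset, (int) con); break;
                case LONG:  buf.putLong(offset, con); break;
            }
        }
        return buf.array();
    }

    // Manually unswitched: the type is tested once, then a tight loop runs.
    static byte[] replicateUnswitched(ElemType bt, long con, int len) {
        ByteBuffer buf = ByteBuffer.allocate(len * bt.bytes).order(ByteOrder.LITTLE_ENDIAN);
        switch (bt) {
            case BYTE:
                for (int i = 0; i < len; i++) buf.put(i, (byte) con);
                break;
            case SHORT:
                for (int i = 0; i < len; i++) buf.putShort(i * 2, (short) con);
                break;
            case INT:
                for (int i = 0; i < len; i++) buf.putInt(i * 4, (int) con);
                break;
            case LONG:
                for (int i = 0; i < len; i++) buf.putLong(i * 8, con);
                break;
        }
        return buf.array();
    }

    public static void main(String[] args) {
        byte[] a = replicatePerElementSwitch(ElemType.INT, 0x01020304L, 2);
        byte[] b = replicateUnswitched(ElemType.INT, 0x01020304L, 2);
        System.out.println(java.util.Arrays.equals(a, b));
    }
}
```

With `len` typically 1 and at most 16, as stated in the reply, the difference between the two shapes is unlikely to matter in practice; the sketch only shows that they produce the same bytes.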
PR Review: https://git.openjdk.org/jdk/pull/21596#pullrequestreview-2479553695 From kvn at openjdk.org Wed Dec 4 18:26:39 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 4 Dec 2024 18:26:39 GMT Subject: RFR: 8345472: Fix incorrect format instruction for floating point max/min patterns In-Reply-To: <17-MPy5iimqCflEA8LjXnW-TpzGQtUa74FScH5MJZwU=.b0b0e670-8d14-467a-83ef-7a7a39949296@github.com> References: <17-MPy5iimqCflEA8LjXnW-TpzGQtUa74FScH5MJZwU=.b0b0e670-8d14-467a-83ef-7a7a39949296@github.com> Message-ID: On Fri, 29 Nov 2024 11:48:48 GMT, Jatin Bhateja wrote: > The bug fix patch fixes incorrect operand references in the format instruction for floating point max/min patterns. > > Best Regards, > Jatin Good. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/22457#pullrequestreview-2479559099 From jbhateja at openjdk.org Wed Dec 4 18:30:43 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 4 Dec 2024 18:30:43 GMT Subject: Integrated: 8345472: Fix incorrect format instruction for floating point max/min patterns In-Reply-To: <17-MPy5iimqCflEA8LjXnW-TpzGQtUa74FScH5MJZwU=.b0b0e670-8d14-467a-83ef-7a7a39949296@github.com> References: <17-MPy5iimqCflEA8LjXnW-TpzGQtUa74FScH5MJZwU=.b0b0e670-8d14-467a-83ef-7a7a39949296@github.com> Message-ID: On Fri, 29 Nov 2024 11:48:48 GMT, Jatin Bhateja wrote: > The bug fix patch fixes incorrect operand references in the format instruction for floating point max/min patterns. > > Best Regards, > Jatin This pull request has now been integrated. 
Changeset: e1695f6c Author: Jatin Bhateja URL: https://git.openjdk.org/jdk/commit/e1695f6c40dbf27538c6c450eb1cf64a05e0ee9a Stats: 20 lines in 1 file changed: 0 ins; 0 del; 20 mod 8345472: Fix incorrect format instruction for floating point max/min patterns Reviewed-by: kvn ------------- PR: https://git.openjdk.org/jdk/pull/22457 From kvn at openjdk.org Wed Dec 4 20:08:41 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 4 Dec 2024 20:08:41 GMT Subject: RFR: 8345471: Clean up compiler/intrinsics/sha/cli tests In-Reply-To: References: Message-ID: On Tue, 3 Dec 2024 15:48:52 GMT, Daniel Jeliński wrote: > Merge all the GenericTestCaseForUnsupportedXXXCPU and GenericTestCaseForOtherCPU into GenericTestCaseForUnsupportedCPU.java. > > The CPU-specific files are almost identical; I chose to resolve the differences in favor of the AArch64 version. The OtherCPU version looks wrong, and it wasn't executed on any supported platform. > > The tests continue to pass on linux-aarch64/x64, windows-x64 and mac-aarch64. I didn't test other platforms. > > After the change, the tests will start running on PPC and S390. They will also automatically run on any new architectures. > > For those interested in historical background, when the tests were introduced, there were only 2 supported CPU architectures. X86 did not support any of the intrinsics, and the X86 test case did not even call `getPredicateForOption`. The call to `getPredicateForOption` was added in f2e9b827d699115f8683e9def06c249e5476fd50, and since then all the cases are the same. Good. ------------- Marked as reviewed by kvn (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/22517#pullrequestreview-2479793601 From psandoz at openjdk.org Thu Dec 5 00:49:11 2024 From: psandoz at openjdk.org (Paul Sandoz) Date: Thu, 5 Dec 2024 00:49:11 GMT Subject: RFR: 8309130: x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays) [v42] In-Reply-To: References: Message-ID: On Sat, 23 Nov 2024 13:41:24 GMT, Piotr Tarsa wrote: > Could you add some code that disables the AVX512 version on Zen4, but keeps it enabled on Zen5 and future Zen architectures? Or as you suggest [here](https://github.com/intel/x86-simd-sort/issues/6#issuecomment-2506476505) revert to AVX2. I updated [JDK-8317976](JDK-8317976) with that suggestion, which is simpler to maintain. The HotSpot C++ class `VM_Version` might need updating to return the Zen version number. ------------- PR Comment: https://git.openjdk.org/jdk/pull/14227#issuecomment-2518850112 From dlong at openjdk.org Thu Dec 5 01:32:37 2024 From: dlong at openjdk.org (Dean Long) Date: Thu, 5 Dec 2024 01:32:37 GMT Subject: RFR: 8345287: C2: live in computation is broken In-Reply-To: References: Message-ID: On Mon, 2 Dec 2024 09:03:50 GMT, Roland Westrelin wrote: > 8234003 (Improve IndexSet iteration) broke live in computation: > > > @@ -273,23 +276,25 @@ void PhaseLive::add_liveout( Block *p, IndexSet *lo, VectorSet &first_pass ) { > // Add a vector of live-in values to a given blocks live-in set. 
> void PhaseLive::add_livein(Block *p, IndexSet *lo) { > IndexSet *livein = &_livein[p->_pre_order-1]; > - IndexSetIterator elements(lo); > - uint r; > - while ((r = elements.next()) != 0) { > - livein->insert(r); // Then add to live-in set > + if (!livein->is_empty()) { > + IndexSetIterator elements(lo); > + uint r; > + while ((r = elements.next()) != 0) { > + livein->insert(r); // Then add to live-in set > + } > } > } > > > `livein` is initially empty and the patch above only adds elements to it if: > > > if (!livein->is_empty()) { > > > which is never true. > > This doesn't affect correctness as live in sets are only used to drive > scheduling. Marked as reviewed by dlong (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/22473#pullrequestreview-2480237284 From chagedorn at openjdk.org Thu Dec 5 06:57:42 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 5 Dec 2024 06:57:42 GMT Subject: RFR: 8345299: C2: some nodes can still have incorrect control after do_range_check() [v2] In-Reply-To: <0OJRLL7pqscoC_54OG2QjQj7zMwVLVUtsJYXM9wh6_c=.f9502ae7-ffba-42fd-ab92-9bfc242735a0@github.com> References: <0OJRLL7pqscoC_54OG2QjQj7zMwVLVUtsJYXM9wh6_c=.f9502ae7-ffba-42fd-ab92-9bfc242735a0@github.com> Message-ID: <6DoZh4BvSeikv90RcSjTqpm6p6TB_0Wc-9S07nfvjRQ=.67a6336f-6585-4527-bb39-da894f8be610@github.com> On Wed, 4 Dec 2024 15:18:54 GMT, Roland Westrelin wrote: >> 8339733 fixed controls for updated pre/main limits during >> do_range_check(). However, it missed one issue: >> >> Control for the new limits is computed in `new_limit_ctrl` for both >> pre and main loops. `new_limit_ctrl` is currently initialized from the >> pre limit control but it also needs to take the main loop limit >> control into account as sometimes, the main loop limit control is >> below the pre limit and pre loop entry control. >> >> 8339733 also introduced a couple bugs. Control of the `Bool`/`Cmp` >> nodes are updated for the pre and main loop to `new_limit_ctrl`.
But >> that's incorrect because `new_limit_ctrl` may be above the pre loop >> while the `Bool`/`Cmp` for the pre loop are in the loop (because they >> depend on the loop iv) and for the main loop are after the pre loop >> (because they depend on the iv out of the pre loop). I fixed this for >> the pre loop by setting control for the `Bool`/`Cmp` to be as late as >> possible. For the main loop, no change appears to be required as >> control computed by c2 is already late enough. I've added an assert >> instead. > > Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: > > review Marked as reviewed by chagedorn (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/22485#pullrequestreview-2480646039 From epeter at openjdk.org Thu Dec 5 07:07:41 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 5 Dec 2024 07:07:41 GMT Subject: RFR: 8343629: More MergeStore benchmark [v5] In-Reply-To: References: Message-ID: On Fri, 15 Nov 2024 04:15:36 GMT, Shaojin Wen wrote: >> 1. Added the putBytes4 benchmark, which corresponds to StringBuilder appendNull >> 2. Optimized the putChars4/setInt/setLong series of benchmarks to reduce extra overhead and more accurately reflect performance differences. > > Shaojin Wen has updated the pull request incrementally with one additional commit since the last revision: > > seperate MergeStoreBench and MergeLoadBench Hi @wenshao ! I'm sorry I lost track of this one. It looks quite reasonable, though I was wondering why in one benchmark you increment `i` and use `i * 4` as offset, and in the other you decrement `i` and use `off += 4` as offset. test/micro/org/openjdk/bench/vm/compiler/MergeStoreBench.java line 103: > 101: public void setIntBU(Blackhole BH) { > 102: int off = 0; > 103: for (int i = ints.length - 1; i >= 0; i--) { Why are you going reverse here, and also other places? Does that affect the performance at all? 
test/micro/org/openjdk/bench/vm/compiler/MergeStoreBench.java line 135: > 133: for (int i = ints.length - 1; i >= 0; i--) { > 134: setIntLU(bytes4, off, ints[i]); > 135: off += 4; I'm also wondering why you changed it from multiplication of `i * 4` to `offset += 4`. Did that have an impact? ------------- PR Review: https://git.openjdk.org/jdk/pull/21659#pullrequestreview-2480654227 PR Review Comment: https://git.openjdk.org/jdk/pull/21659#discussion_r1870751224 PR Review Comment: https://git.openjdk.org/jdk/pull/21659#discussion_r1870753681 From rcastanedalo at openjdk.org Thu Dec 5 07:41:37 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Thu, 5 Dec 2024 07:41:37 GMT Subject: RFR: 8345287: C2: live in computation is broken In-Reply-To: References: Message-ID: On Tue, 3 Dec 2024 12:51:57 GMT, Roberto Castañeda Lozano wrote: > Test results on Oracle CI look good, benchmarks are still running but the results look rather neutral so far. Benchmarking is done now, the results of applying this changeset are as follows: - neutral for DaCapo (bach and chopin) - slight overall regression for Renaissance - one slight improvement (crypto.signverify, around +1%) and one noticeable regression (scimark.monte_carlo, around -5%) for SPECjvm2008 on a Coffee Lake-B processor (3.0 GHz Intel Core i5-8500B). Given these results, I think it might be good to investigate what is the overall effect of `OptoRegScheduling` and whether it makes sense to keep it enabled for x64 before integrating this fix. The opposite order (integrating and then investigating how to deal with the regressions) seems like a higher-risk approach. @rwestrel @vnkozlov @dean-long what do you think?
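For readers following the 8345287 thread, the defect described in the PR summary (the guard tests the destination set, which starts out empty, so nothing is ever inserted) can be modelled in a few lines. This is a sketch only: `java.util.BitSet` stands in for HotSpot's `IndexSet`, and the class and method names are hypothetical. The second method, which guards on the source set instead, is an assumption about what a repaired guard could look like; the actual fix is not quoted in this thread.

```java
import java.util.BitSet;

// Hypothetical model of the guard discussed in 8345287; not HotSpot code.
final class LiveInGuardSketch {
    // Same shape as the quoted patch: the guard tests the destination set.
    // Because the destination starts out empty, the union is never performed.
    static void addLiveInGuardOnDestination(BitSet livein, BitSet lo) {
        if (!livein.isEmpty()) {
            livein.or(lo);
        }
    }

    // Assumed alternative: skip the work only when there is nothing to add.
    static void addLiveInGuardOnSource(BitSet livein, BitSet lo) {
        if (!lo.isEmpty()) {
            livein.or(lo);
        }
    }

    public static void main(String[] args) {
        BitSet lo = new BitSet();
        lo.set(3);
        lo.set(7);

        BitSet broken = new BitSet();
        addLiveInGuardOnDestination(broken, lo);
        System.out.println("guard on destination: " + broken);

        BitSet repaired = new BitSet();
        addLiveInGuardOnSource(repaired, lo);
        System.out.println("guard on source: " + repaired);
    }
}
```

The model also matches the observation elsewhere in the thread that live-in sets were always empty.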
------------- PR Comment: https://git.openjdk.org/jdk/pull/22473#issuecomment-2519472283 From roland at openjdk.org Thu Dec 5 08:16:37 2024 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 5 Dec 2024 08:16:37 GMT Subject: RFR: 8345287: C2: live in computation is broken In-Reply-To: <-JM7PaO43IeAwzF30ordG0ERn-ViXxsVfQtE0M7RkgQ=.9fb476a0-b8a9-4618-8aaa-5ceebef181dd@github.com> References: <66KJ0kayCyRztG81OvFyladp1BbDq5nW810HKEZPmlk=.51449889-9e47-408e-9751-470e1f912113@github.com> <-JM7PaO43IeAwzF30ordG0ERn-ViXxsVfQtE0M7RkgQ=.9fb476a0-b8a9-4618-8aaa-5ceebef181dd@github.com> Message-ID: On Mon, 2 Dec 2024 14:04:30 GMT, Roland Westrelin wrote: >>> > Good catch! Do you have an example where the final schedule is affected by this issue? >>> >>> I don't. I noticed that live in sets are always empty while looking at memory usage of some `IndexSet`. It is puzzling that this didn't cause any performance regression. So it may be worth exploring why. >> >> OK, thanks. I will start by running some benchmarks on x64 with and without this fix. Will report results in a couple of days. > >> OK, thanks. I will start by running some benchmarks on x64 with and without this fix. Will report results in a couple of days. > > Sounds good. Thanks. If there happens to be a regression, I think it would make more sense to fix the code (with this patch) and disable `OptoRegScheduling` (until someone figures out what's going on) than keep code that doesn't make any sense. > Given these results, I think it might be good to investigate what is the overall effect of `OptoRegScheduling` and whether it makes sense to keep it enabled for x64 before integrating this fix. The opposite order (integrating and then investigating how to deal with the regressions) seems like a higher-risk approach. @rwestrel @vnkozlov @dean-long what do you think? What happens to the regressions with `OptoRegScheduling` turned off? It may take a while to investigate the regressions.
I wouldn't delay this fix to obviously very broken code until then if possible. So maybe wait for the fork, push this fix and then create a PR to disable `OptoRegScheduling`? ------------- PR Comment: https://git.openjdk.org/jdk/pull/22473#issuecomment-2519535158 From roland at openjdk.org Thu Dec 5 08:16:38 2024 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 5 Dec 2024 08:16:38 GMT Subject: RFR: 8345287: C2: live in computation is broken In-Reply-To: References: Message-ID: On Thu, 5 Dec 2024 07:39:26 GMT, Roberto Castañeda Lozano wrote: > Benchmarking is done now, the results of applying this changeset are as follows: > > * neutral for DaCapo (bach and chopin) > > * slight overall regression for Renaissance > > * one slight improvement (crypto.signverify, around +1%) and one noticeable regression (scimark.monte_carlo, around -5%) for SPECjvm2008 on a Coffee Lake-B processor (3.0 GHz Intel Core i5-8500B). Thanks for running benchmarks! ------------- PR Comment: https://git.openjdk.org/jdk/pull/22473#issuecomment-2519536370 From rcastanedalo at openjdk.org Thu Dec 5 08:50:37 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Thu, 5 Dec 2024 08:50:37 GMT Subject: RFR: 8345287: C2: live in computation is broken In-Reply-To: References: <66KJ0kayCyRztG81OvFyladp1BbDq5nW810HKEZPmlk=.51449889-9e47-408e-9751-470e1f912113@github.com> <-JM7PaO43IeAwzF30ordG0ERn-ViXxsVfQtE0M7RkgQ=.9fb476a0-b8a9-4618-8aaa-5ceebef181dd@github.com> Message-ID: On Thu, 5 Dec 2024 08:13:08 GMT, Roland Westrelin wrote: > What happens to the regressions with OptoRegScheduling turned off? I will do a new benchmark run of jdk-24+26 with default configuration vs. jdk-24+26 with `-XX:-OptoRegScheduling` on x64 and report the results in a few days (likely early next week).
------------- PR Comment: https://git.openjdk.org/jdk/pull/22473#issuecomment-2519644802 From chagedorn at openjdk.org Thu Dec 5 09:05:42 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 5 Dec 2024 09:05:42 GMT Subject: RFR: 8336759: C2: int counted loop with long limit not recognized as counted loop [v3] In-Reply-To: References: <_d_CiLfCN9ahEmhp9fLcGqO-L8n7a0gW86R3lzLkX60=.b3bdc697-cb2d-477f-a525-0f16a3eee383@github.com> Message-ID: <8Ihb7LfbaWl9ZfJW6SZfzqkGL1GLiMqB1ObShYqBfEU=.8fcd829a-afaa-40b1-a06e-46df139013e7@github.com> On Wed, 4 Dec 2024 16:11:58 GMT, Kangcheng Xu wrote: >> This patch implements [JDK-8336759](https://bugs.openjdk.org/browse/JDK-8336759) that recognizes int counted loops with long limits. >> >> Currently, patterns like `for ( int i =...; i < long_limit; ...)` where int `i` is implicitly promoted to long (i.e., `(long) i < long_limit`) is not recognized as (int) counted loop. This patch speculatively and optimistically converts long limits to ints and deoptimize if the limit is outside int range, allowing more optimization opportunities. >> >> In other words, it transforms >> >> >> for (int i = 0; (long) i < long_limit; i++) {...} >> >> >> to >> >> >> if (int_min <= long_limit && long_limit <= int_max ) { >> for (int i = 0; i < (int) long_limit; i++) {...} >> } else { >> trap: loop_limit_check >> } >> >> >> This could benefit calls to APIs like `long MemorySegment#byteSize()` when iterating over a long limit. > > Kangcheng Xu has updated the pull request incrementally with three additional commits since the last revision: > > - Update src/hotspot/share/opto/loopnode.cpp > > Co-authored-by: Christian Hagedorn > - Update src/hotspot/share/opto/loopnode.cpp > > Co-authored-by: Christian Hagedorn > - Update src/hotspot/share/opto/loopnode.cpp > > Co-authored-by: Christian Hagedorn Marked as reviewed by chagedorn (Reviewer). Testing looked good! 
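The transformation quoted in the 8336759 description above can be written out at the Java source level as a rough sketch. The class and method names are hypothetical, and the `else` branch throws an exception only as a stand-in for the uncommon trap (deoptimization) that C2 would emit; the real change operates on C2's IR, not on source code.

```java
// Hypothetical source-level model of the speculation described in 8336759.
final class LongLimitLoopSketch {
    // Original shape: the int induction variable is promoted to long for the
    // comparison, which is the pattern not recognized as an int counted loop.
    static long sumOriginal(long longLimit) {
        long sum = 0;
        for (int i = 0; (long) i < longLimit; i++) {
            sum += i;
        }
        return sum;
    }

    // Speculative shape: if the limit fits in an int, compare against an int
    // limit so the loop has the usual int counted-loop form.
    static long sumSpeculative(long longLimit) {
        if (Integer.MIN_VALUE <= longLimit && longLimit <= Integer.MAX_VALUE) {
            int limit = (int) longLimit;
            long sum = 0;
            for (int i = 0; i < limit; i++) {
                sum += i;
            }
            return sum;
        } else {
            // Stand-in for "trap: loop_limit_check"; compiled code would
            // deoptimize here rather than throw.
            throw new IllegalStateException("limit outside int range");
        }
    }

    public static void main(String[] args) {
        System.out.println(sumOriginal(10L) == sumSpeculative(10L));
    }
}
```

For limits inside the int range both methods run the same iterations, which is what makes the narrowing of the comparison safe under the range guard.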
------------- PR Review: https://git.openjdk.org/jdk/pull/22449#pullrequestreview-2480983916 PR Comment: https://git.openjdk.org/jdk/pull/22449#issuecomment-2519678072 From epeter at openjdk.org Thu Dec 5 09:22:41 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 5 Dec 2024 09:22:41 GMT Subject: RFR: 8342692: C2: long counted loop/long range checks: don't create loop-nest for short running loops [v5] In-Reply-To: <_WYCBS2i_X0pCnBbn00lFPYA6s-dxBypkXFZrl67rG4=.0dc4cb01-08cd-4b89-b98c-104399521044@github.com> References: <_WYCBS2i_X0pCnBbn00lFPYA6s-dxBypkXFZrl67rG4=.0dc4cb01-08cd-4b89-b98c-104399521044@github.com> Message-ID: On Wed, 4 Dec 2024 15:45:54 GMT, Roland Westrelin wrote: >> Hi @rwestrel this looks very interesting! >> >> Which benchmarks are you referring to? >> >> I just gave it a quick skim, will come back to this later again. > >> Which benchmarks are you referring to? > > The one mentioned in the bug: https://github.com/openjdk/jdk/compare/master...mcimadamore:jdk:manual_mismatch_bench?expand=1 @rwestrel it would be nice to see a plot like this, with the benchmark results: X-axis: increasing loop iterations Y-axis: time Similar to what I did here: https://github.com/openjdk/jdk/pull/22070 ![image](https://github.com/user-attachments/assets/f62c5800-874c-4d29-9fc7-b46f077a1034) You could go over loop sizes 500-2000 in steps of 100, just to get a rough sense if your constant threshold of `1000` is roughly right. Maybe you can even extend the benchmark I wrote there, with MemorySegment cases. That would be useful also for the other efforts where we are working on short running loops: [JDK-8307084](https://bugs.openjdk.org/browse/JDK-8307084): C2: Vector atomic post loop is not executed for some small trip counts [JDK-8344085](https://bugs.openjdk.org/browse/JDK-8344085): C2 SuperWord: improve vectorization for small loop iteration count I just linked these two issues with this RFE on JBS. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/21630#issuecomment-2519722350 From roland at openjdk.org Thu Dec 5 10:03:27 2024 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 5 Dec 2024 10:03:27 GMT Subject: RFR: 8332827: [REDO] C2: crash in compiled code because of dependency on removed range check CastIIs Message-ID: The failures that caused the backout were due to a bug in: `find_or_make_integer_cast()` which caused the `_range_check_dependency` field's value of the existing cast node to not be set in the new cast node. I re-ran some testing with this fixed and current jdk repo and found that a few vectorization tests fail now because the patch pushes range check `CastII` nodes through `AddI`/`SubI`. To fix this, I delayed that transformation to after loop opts. ------------- Commit messages: - white spaces - test & fix - Revert "8332829: [BACKOUT] C2: crash in compiled code because of dependency on removed range check CastIIs" Changes: https://git.openjdk.org/jdk/pull/22568/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=22568&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8332827 Stats: 734 lines in 9 files changed: 704 ins; 23 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/22568.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22568/head:pull/22568 PR: https://git.openjdk.org/jdk/pull/22568 From roland at openjdk.org Thu Dec 5 12:44:40 2024 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 5 Dec 2024 12:44:40 GMT Subject: RFR: 8336759: C2: int counted loop with long limit not recognized as counted loop [v3] In-Reply-To: References: <_d_CiLfCN9ahEmhp9fLcGqO-L8n7a0gW86R3lzLkX60=.b3bdc697-cb2d-477f-a525-0f16a3eee383@github.com> Message-ID: On Wed, 4 Dec 2024 16:11:58 GMT, Kangcheng Xu wrote: >> This patch implements [JDK-8336759](https://bugs.openjdk.org/browse/JDK-8336759) that recognizes int counted loops with long limits. 
>> >> Currently, patterns like `for ( int i =...; i < long_limit; ...)` where int `i` is implicitly promoted to long (i.e., `(long) i < long_limit`) is not recognized as (int) counted loop. This patch speculatively and optimistically converts long limits to ints and deoptimize if the limit is outside int range, allowing more optimization opportunities. >> >> In other words, it transforms >> >> >> for (int i = 0; (long) i < long_limit; i++) {...} >> >> >> to >> >> >> if (int_min <= long_limit && long_limit <= int_max ) { >> for (int i = 0; i < (int) long_limit; i++) {...} >> } else { >> trap: loop_limit_check >> } >> >> >> This could benefit calls to APIs like `long MemorySegment#byteSize()` when iterating over a long limit. > > Kangcheng Xu has updated the pull request incrementally with three additional commits since the last revision: > > - Update src/hotspot/share/opto/loopnode.cpp > > Co-authored-by: Christian Hagedorn > - Update src/hotspot/share/opto/loopnode.cpp > > Co-authored-by: Christian Hagedorn > - Update src/hotspot/share/opto/loopnode.cpp > > Co-authored-by: Christian Hagedorn It would be nice to refactor the code so the part where the exit condition is modified, transformation into a counted loop is attempted and if it fails the exit condition is restored back to what it was can be avoided. 
`is_counted_loop()` roughly is: if (loop_pattern_matches_counted_loop()) { transform_into_counted_loop(); } Maybe the code can be refactored that way so that then `convert_to_counted_loop_with_speculative_long_limit` can be something like: if (loop_pattern_matches_counted_loop()) { // convert CmpL to CmpI transform_into_counted_loop(); } You would likely need to extract the check for: Node* cmp = loop_exit_test(back_control, loop, incr, limit, bt, cl_prob); if (cmp == nullptr || cmp->Opcode() != Op_Cmp(iv_bt)) { out of `loop_pattern_matches_counted_loop()` so `convert_to_counted_loop_with_speculative_long_limit` can pattern match the exit condition it expects ------------- PR Comment: https://git.openjdk.org/jdk/pull/22449#issuecomment-2520215899 From aph at openjdk.org Thu Dec 5 14:10:40 2024 From: aph at openjdk.org (Andrew Haley) Date: Thu, 5 Dec 2024 14:10:40 GMT Subject: RFR: 8344068: Windows x86-64: Out of CodeBuffer space when generating final stubs In-Reply-To: References: Message-ID: On Tue, 3 Dec 2024 18:48:38 GMT, Vladimir Kozlov wrote: >> Scratch that. I see call stack in the bug report. It is indeed final stubs for upcall stub when ZGC is enabled. >> Should we increase ZGC_ONLY(+20000) value instead? > > ZGC save/restores XMM registers regardless OS: [x86/gc/z/zBarrierSetAssembler_x86.cpp#L70](https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/gc/z/zBarrierSetAssembler_x86.cpp#L70) I spent a while reading the dumps of final stubs. I can account for 10k additional for XMM stubs in final on Windows. This takes it to the edge. It may well be that this is really a ZGC thing, but with ZGC on Linux we get nowhere near the limit. I think there is about 20k difference between Linux and Windows with ZGC enabled. I can't account for all of it, though. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22516#discussion_r1871450927 From qamai at openjdk.org Thu Dec 5 14:36:47 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Thu, 5 Dec 2024 14:36:47 GMT Subject: RFR: 8302459: Missing late inline cleanup causes compiler/vectorapi/VectorLogicalOpIdentityTest.java IR failure [v2] In-Reply-To: References: Message-ID: On Wed, 4 Dec 2024 07:24:04 GMT, Damon Fenacci wrote: >> # Issue >> >> The `compiler/vectorapi/VectorLogicalOpIdentityTest.java` has been failing because C2 compiling the test `testAndMaskSameValue1` expects to have 1 `AndV` node but it has none. >> >> # Cause >> >> The issue has to do with the criteria that trigger a cleanup when performing late inlining. In the failing test, when the compiler tries to inline a `jdk.internal.vm.vector.VectorSupport::binaryOp` call, it fails because its argument is of the wrong type, mainly because some cast nodes "hide" the more "precise" type. >> The graph that leads to the issue looks like this: >> ![1BCE8148-1E44-4CA1-AF8F-EFC6210FA740](https://github.com/user-attachments/assets/62dd917f-2dac-42a9-90cf-73eedcd3cf8a) >> The compiler tries to inline `jdk.internal.vm.vector.VectorSupport::load` and it succeeds: >> ![752E81C9-A37D-4626-81A9-E4A839FADD3D](https://github.com/user-attachments/assets/e61057b2-3093-4992-ba5a-b80e4000c0ec) >> The node `3027 VectorBox` has type `IntMaxVector`. `912 CastPP` and `934 CheckCastPP` have type `IntVector` instead. >> The compiler then tries to inline one of the 2 `binaryOp` calls but it fails because it needs an argument of type `IntMaxVector` and the argument it is given, which is node `934 CheckCastPP`, has type `IntVector`. >> >> This would not happen if between the 2 inlining attempts a _cleanup_ was triggered. IGVN would run and the 2 nodes `912 CastPP` and `934 CheckCastPP` would be folded away. `binaryOp` could then be inlined since the types would match.
>> >> # Solution >> >> Instead of fixing this specific case we try a more generic approach: when late inlining we keep track of failed intrinsics and re-examine them during IGVN. If the `Ideal` method for their call node is called, we reschedule the intrinsic attempt for that call. >> >> Additional test runs with `-XX:-TieredCompilation` are added to `VectorLogicalOpIdentityTest.java` and `VectorGatherMaskFoldingTest.java` as regression tests and `-XX:+IncrementalInlineForceCleanup` is removed from `VectorGatherMaskFoldingTest.java` (previously added as workaround for this issue) > > Damon Fenacci has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 28 commits: > > - Merge branch 'master' into JDK-8302459-new > - JDK-8302459: remove unneeded copyright year change > - JDK-8302459: add failed late inlines handling to dynamic calls > - JDK-8032459: fix indentation > - JDK-8302459: copyright year > - JDK-8302459: simplify late inline failure conditions > - JDK-8302459: revert unneeded copyright update > - JDK-8302459: remove unneeded spaces > - JDK-8302459: increment MH late inline counter when reinserting them > - Merge branch 'master' into JDK-8302459-new > - ... and 18 more: https://git.openjdk.org/jdk/compare/4fbf2720...7a4ffc11 src/hotspot/share/opto/compile.cpp line 2088: > 2086: igvn.reset_from_gvn(initial_gvn()); > 2087: igvn.optimize(); > 2088: // Reset failed generator in call node This is actually run in the incremental inline loop. Typically, we inline calls until we have too many nodes, at which time we do the cleanup with IGVN and IdealLoop and try again with the remaining calls in the queue. I think your approach is similar to we just retrying inlining a second time for all inlining failures. 
As a result, I suggest moving this to the main loop of `Compile::inline_incrementally`: on each iteration, we start by inlining all calls in the `failed_once` queue, then continue with the main queue as normal. For `Compile::inline_incrementally_one`, each node that fails inlining is pushed into the `failed_once` queue (note that a call that fails to inline after `LiveNodeCountInliningCutoff` has been exceeded should not be pushed into the `failed_once` queue but left in the original queue). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21682#discussion_r1871494758 From kvn at openjdk.org Thu Dec 5 16:32:40 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 5 Dec 2024 16:32:40 GMT Subject: RFR: 8344068: Windows x86-64: Out of CodeBuffer space when generating final stubs In-Reply-To: References: Message-ID: On Tue, 3 Dec 2024 14:38:39 GMT, Andrew Haley wrote: > I've had a look at the difference between an Intel AVX-512 machine (which does run out of memory) and an AMD machine, and it seems to be that the AVX-512 stubs required by Windows really do take up a lot of space. This should be sufficient. Marked as reviewed by kvn (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/22516#pullrequestreview-2482247062 From kvn at openjdk.org Thu Dec 5 16:32:41 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 5 Dec 2024 16:32:41 GMT Subject: RFR: 8344068: Windows x86-64: Out of CodeBuffer space when generating final stubs In-Reply-To: References: Message-ID: <6NGReIL8Iw5X-1ZyUTGpA-SL7SysYVWNIu4XLyzb3ag=.dd8cefad-d1f4-4f54-90dc-f7427623bab0@github.com> On Thu, 5 Dec 2024 14:08:30 GMT, Andrew Haley wrote: >> ZGC save/restores XMM registers regardless OS: [x86/gc/z/zBarrierSetAssembler_x86.cpp#L70](https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/gc/z/zBarrierSetAssembler_x86.cpp#L70) > > I spent a while reading the dumps of final stubs. I can account for 10k additional for XMM stubs in final on Windows.
> This takes it to the edge.
>
> It may well be that this is really a ZGC thing, but with ZGC on Linux we get nowhere near the limit. I think there is about 20k difference between Linux and Windows with ZGC enabled. I can't account for all of it, though.

Okay, thank you for the details.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/22516#discussion_r1871702910

From mdoerr at openjdk.org Thu Dec 5 17:01:47 2024
From: mdoerr at openjdk.org (Martin Doerr)
Date: Thu, 5 Dec 2024 17:01:47 GMT
Subject: RFR: 8345609: [C1] LIR Operations with one input should be implemented as LIR_Op1
Message-ID:

Change `lir_sqrt`, `lir_abs`, `lir_neg`, `lir_f2hf`, `lir_hf2f` to use `LIR_Op1`. Extend `LIR_Op1` to support one temp register operand.
Also see JBS issue.

Remove `lir_tan` and `lir_log10` which are unused (dead and incomplete code). They should get implemented in a better way if needed in the future, but I guess they are not important to have in C1.

-------------

Commit messages:
 - 8345609: [C1] LIR Operations with one input should be implemented as LIR_Op1

Changes: https://git.openjdk.org/jdk/pull/22582/files
Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=22582&range=00
Issue: https://bugs.openjdk.org/browse/JDK-8345609
Stats: 155 lines in 10 files changed: 59 ins; 89 del; 7 mod
Patch: https://git.openjdk.org/jdk/pull/22582.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/22582/head:pull/22582

PR: https://git.openjdk.org/jdk/pull/22582

From kvn at openjdk.org Thu Dec 5 17:08:54 2024
From: kvn at openjdk.org (Vladimir Kozlov)
Date: Thu, 5 Dec 2024 17:08:54 GMT
Subject: RFR: 8345287: C2: live in computation is broken
In-Reply-To:
References:
Message-ID:

On Mon, 2 Dec 2024 09:03:50 GMT, Roland Westrelin wrote:

> 8234003 (Improve IndexSet iteration) broke live in computation:
>
>
> @@ -273,23 +276,25 @@ void PhaseLive::add_liveout( Block *p, IndexSet *lo, VectorSet &first_pass ) {
>  // Add a vector of live-in values to a given blocks live-in set.
>  void PhaseLive::add_livein(Block *p, IndexSet *lo) {
>    IndexSet *livein = &_livein[p->_pre_order-1];
> -  IndexSetIterator elements(lo);
> -  uint r;
> -  while ((r = elements.next()) != 0) {
> -    livein->insert(r); // Then add to live-in set
> +  if (!livein->is_empty()) {
> +    IndexSetIterator elements(lo);
> +    uint r;
> +    while ((r = elements.next()) != 0) {
> +      livein->insert(r); // Then add to live-in set
> +    }
>    }
>  }
>
>
> `livein` is initially empty and the patch above only adds elements to it if:
>
>
>     if (!livein->is_empty()) {
>
>
> which is never true.
>
> This doesn't affect correctness as live in sets are only used to drive
> scheduling.

All benefits described in [JDK-8234003](https://bugs.openjdk.org/browse/JDK-8234003) could be due to not iterating at all in `add_liveout()`. Fixing it will place us at least at the same state as before those changes.

A -5% regression in monte-carlo is seen only on an old macos-x64 machine which does not have AVX512 (only AVX2). It would be interesting to investigate but I think it is fine if we do that after push. So I agree to integrate this fix into JDK 24. Fixing the regression could be done in a 24u update release.

`OptoRegScheduling` should be investigated separately (in JDK 25) regardless of this fix.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/22473#issuecomment-2520937901

From mdoerr at openjdk.org Thu Dec 5 17:19:43 2024
From: mdoerr at openjdk.org (Martin Doerr)
Date: Thu, 5 Dec 2024 17:19:43 GMT
Subject: RFR: 8345609: [C1] LIR Operations with one input should be implemented as LIR_Op1
In-Reply-To:
References:
Message-ID:

On Thu, 5 Dec 2024 16:57:24 GMT, Martin Doerr wrote:

> Change `lir_sqrt`, `lir_abs`, `lir_neg`, `lir_f2hf`, `lir_hf2f` to use `LIR_Op1`. Extend `LIR_Op1` to support one temp register operand.
> Also see JBS issue.
>
> Remove `lir_tan` and `lir_log10` which are unused (dead and incomplete code).
They should get implemented in a better way if needed in the future, but I guess they are not important to have in C1. src/hotspot/share/c1/c1_LIR.cpp line 492: > 490: assert(op1->_info != nullptr, ""); do_info(op1->_info); > 491: if (op1->_opr->is_valid()) do_temp(op1->_opr); // safepoints on SPARC need temporary register > 492: assert(op1->_tmp->is_illegal(), "not used"); I think safepoints should ideally be implemented as LIR_Op0. SPARC support is removed. We could file a new RFE for that. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22582#discussion_r1871787519 From psandoz at openjdk.org Thu Dec 5 20:40:41 2024 From: psandoz at openjdk.org (Paul Sandoz) Date: Thu, 5 Dec 2024 20:40:41 GMT Subject: RFR: 8310691: [REDO] [vectorapi] Refactor VectorShuffle implementation [v2] In-Reply-To: References: Message-ID: <5NjY5B87PIM5jIaPEQrlsz8tNPoKU14iNsqyTQwmxA8=.ea336100-b4d0-4d7d-8721-8e9e4efb64b8@github.com> On Tue, 26 Nov 2024 18:11:59 GMT, Quan Anh Mai wrote: >> src/jdk.incubator.vector/share/classes/jdk/incubator/vector/Int256Vector.java line 870: >> >>> 868: @Override >>> 869: public final Int256Shuffle rearrange(VectorShuffle shuffle) { >>> 870: return (Int256Shuffle) toBitsVector().rearrange(((Int256Shuffle) shuffle) >> >> I think the cast is redundant for all vector kinds. Similarly the explicit cast is redundant for the integral vectors, perhaps in the template separate out the expressions to avoid it where not needed? >> >> We could also refer to `VSPECIES` directly rather than calling `vspecies()`, same applies in other methods in the concrete vector classes. > > The cast is added so that we have the concrete type of the shuffle, the result of `toShuffle` is only `VectorShuffle` Ah i see now, you want to ensure an invocation to the final/concrete method. 
(The IDE's highlighting of the redundant cast is misleading) >> src/jdk.incubator.vector/share/classes/jdk/incubator/vector/Int256Vector.java line 908: >> >>> 906: } >>> 907: >>> 908: private static boolean indicesInRange(int[] indices) { >> >> Since this method is only called from an assert statement in the constructor we could avoid the clever checking that assertions are enabled and the explicit throwing on an AssertionError by using a second expression that produces an error message when the assertion fails : e.g., >> >> assert indicesInRange(indices) : outOfBoundsAssertMessage(indices); > > Yes you are right, since this method is only called in `assert` I think we can just remove the `assert` trick inside instead. That's fine too. >> src/jdk.incubator.vector/share/classes/jdk/incubator/vector/IntVector.java line 2473: >> >>> 2471: final >>> 2472: VectorShuffle toShuffle(AbstractSpecies dsp, boolean wrap) { >>> 2473: assert(dsp.elementSize() == vspecies().elementSize()); >> >> Even though we force inline I cannot quite decide if it is better to forego the assert since it unduly increases method size. Regardless it may be useful to place the partial wrapping logic in a separate method, given it is less likely to be used. > > You don't have to worry too much about inlining of Vector API methods since it is done during late inlining and we have a pretty huge budget there. Ok, my comment was motivated by some feedback on the FFM API where IIRC forced inline limits were reached. 
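For readers following the `indicesInRange` thread, here is a minimal sketch of the "assert with a second expression" form suggested above. It is not the code in the PR: the class name `ShuffleSketch`, the explicit `length` parameter and the accepted range `[-length, length)` are assumptions made for illustration; only the method names `indicesInRange` and `outOfBoundsAssertMessage` come from the discussion.

```java
import java.util.Arrays;

// Hypothetical stand-in for a concrete shuffle class; not the JDK implementation.
final class ShuffleSketch {
    private final int[] indices;

    ShuffleSketch(int[] indices, int length) {
        // The expression after ':' is evaluated only when the assertion fails,
        // so no "are assertions enabled" trick is needed inside the check.
        assert indicesInRange(indices, length) : outOfBoundsAssertMessage(indices, length);
        this.indices = indices.clone();
    }

    int length() {
        return indices.length;
    }

    // Assumed valid range: [-length, length). Plain boolean result, no throwing.
    static boolean indicesInRange(int[] indices, int length) {
        for (int i : indices) {
            if (i < -length || i >= length) {
                return false;
            }
        }
        return true;
    }

    // Builds the failure text; only called on the failing path of the assert.
    static String outOfBoundsAssertMessage(int[] indices, int length) {
        return "some index not in [-" + length + ", " + length + "): " + Arrays.toString(indices);
    }
}
```

With assertions enabled (`java -ea`), `new ShuffleSketch(new int[] {0, 4}, 4)` would fail in the constructor with the message built above; with assertions disabled neither method is called.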
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21042#discussion_r1872036603 PR Review Comment: https://git.openjdk.org/jdk/pull/21042#discussion_r1872037914 PR Review Comment: https://git.openjdk.org/jdk/pull/21042#discussion_r1872037711 From psandoz at openjdk.org Thu Dec 5 20:50:41 2024 From: psandoz at openjdk.org (Paul Sandoz) Date: Thu, 5 Dec 2024 20:50:41 GMT Subject: RFR: 8310691: [REDO] [vectorapi] Refactor VectorShuffle implementation [v2] In-Reply-To: References: Message-ID: <07FzsRmVOfdZW51Pif18Dsb0gvWjVIfUswT2Eljm4IY=.852007f6-e3bf-492d-b7ad-9330656edd6a@github.com> On Tue, 26 Nov 2024 18:15:47 GMT, Quan Anh Mai wrote: >> src/jdk.incubator.vector/share/classes/jdk/incubator/vector/AbstractVector.java line 228: >> >>> 226: } >>> 227: >>> 228: AbstractVector iota = vspecies().asIntegral().iota(); >> >> I suspect the non-power of two code is more efficient. (Even better if the MUL could be transformed to a shift for power of two values.) >> >> Separately, it makes me wonder if we should revisit the shuffle factories if it is now much more efficient to construct a shuffle from a vector. > > `shuffleFromOp` is a slow path op so I don't think it is. Additionally, our vector multiplication is against a scalar, too. So we can optimize it if `step` is a constant. I incorrectly read `!=` as `==` :-) as that is the more common pattern used in the code base, so i was thinking the power of two code path was using `shuffleFromOp`. Could you invert the check to be more consistent? 
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/21042#discussion_r1872048156

From rrich at openjdk.org Thu Dec 5 21:20:37 2024
From: rrich at openjdk.org (Richard Reingruber)
Date: Thu, 5 Dec 2024 21:20:37 GMT
Subject: RFR: 8345609: [C1] LIR Operations with one input should be implemented as LIR_Op1
In-Reply-To:
References:
Message-ID:

On Thu, 5 Dec 2024 16:57:24 GMT, Martin Doerr wrote:

> Change `lir_sqrt`, `lir_abs`, `lir_neg`, `lir_f2hf`, `lir_hf2f` to use `LIR_Op1`. Extend `LIR_Op1` to support one temp register operand.
> Also see JBS issue.
>
> Remove `lir_tan` and `lir_log10` which are unused (dead and incomplete code). They should have been removed with or after https://github.com/openjdk/jdk/commit/ad79a5ae65d24861ead3ae96cf148c8bc0f02736 and the corresponding changes on other platforms.

Nice cleanup.
A few files need copyright update // especially c1_LinearScan_x86.hpp ;)

Cheers, Richard.

-------------

Marked as reviewed by rrich (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/22582#pullrequestreview-2482983552

From mdoerr at openjdk.org Thu Dec 5 21:29:51 2024
From: mdoerr at openjdk.org (Martin Doerr)
Date: Thu, 5 Dec 2024 21:29:51 GMT
Subject: RFR: 8345609: [C1] LIR Operations with one input should be implemented as LIR_Op1 [v2]
In-Reply-To:
References:
Message-ID:

> Change `lir_sqrt`, `lir_abs`, `lir_neg`, `lir_f2hf`, `lir_hf2f` to use `LIR_Op1`. Extend `LIR_Op1` to support one temp register operand.
> Also see JBS issue.
>
> Remove `lir_tan` and `lir_log10` which are unused (dead and incomplete code). They should have been removed with or after https://github.com/openjdk/jdk/commit/ad79a5ae65d24861ead3ae96cf148c8bc0f02736 and the corresponding changes on other platforms.

Martin Doerr has updated the pull request incrementally with one additional commit since the last revision:

  Update Copyright headers.
------------- Changes: - all: https://git.openjdk.org/jdk/pull/22582/files - new: https://git.openjdk.org/jdk/pull/22582/files/1213839c..5ff93516 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22582&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22582&range=00-01 Stats: 4 lines in 4 files changed: 0 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/22582.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22582/head:pull/22582 PR: https://git.openjdk.org/jdk/pull/22582 From rrich at openjdk.org Thu Dec 5 21:29:51 2024 From: rrich at openjdk.org (Richard Reingruber) Date: Thu, 5 Dec 2024 21:29:51 GMT Subject: RFR: 8345609: [C1] LIR Operations with one input should be implemented as LIR_Op1 [v2] In-Reply-To: References: Message-ID: On Thu, 5 Dec 2024 21:26:57 GMT, Martin Doerr wrote: >> Change `lir_sqrt`, `lir_abs`, `lir_neg`, `lir_f2hf`, `lir_hf2f` to use `LIR_Op1`. Extend `LIR_Op1` to support one temp register operand. >> Also see JBS issue. >> >> Remove `lir_tan` and `lir_log10` which are unused (dead and incomplete code). They should have been removed with or after https://github.com/openjdk/jdk/commit/ad79a5ae65d24861ead3ae96cf148c8bc0f02736 and the corresponding changes on other platforms. > > Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: > > Update Copyright headers. Marked as reviewed by rrich (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/22582#pullrequestreview-2483007979 From mdoerr at openjdk.org Thu Dec 5 21:29:52 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Thu, 5 Dec 2024 21:29:52 GMT Subject: RFR: 8345609: [C1] LIR Operations with one input should be implemented as LIR_Op1 In-Reply-To: References: Message-ID: On Thu, 5 Dec 2024 16:57:24 GMT, Martin Doerr wrote: > Change `lir_sqrt`, `lir_abs`, `lir_neg`, `lir_f2hf`, `lir_hf2f` to use `LIR_Op1`. Extend `LIR_Op1` to support one temp register operand. > Also see JBS issue. 
>
> Remove `lir_tan` and `lir_log10` which are unused (dead and incomplete code). They should have been removed with or after https://github.com/openjdk/jdk/commit/ad79a5ae65d24861ead3ae96cf148c8bc0f02736 and the corresponding changes on other platforms.

Thanks for the review! Copyright headers are updated.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/22582#issuecomment-2521486043

From kbarrett at openjdk.org Fri Dec 6 06:34:44 2024
From: kbarrett at openjdk.org (Kim Barrett)
Date: Fri, 6 Dec 2024 06:34:44 GMT
Subject: RFR: 8345159: RISCV: Fix -Wzero-as-null-pointer-constant warning in emit_static_call_stub
In-Reply-To:
References:
Message-ID: <4j9-1gbCrvrski7u6-H4pDg_GnA0w6WHVsBN2pCALFU=.78439ad5-ccea-4e7b-a93d-cbbbc0682a1a@github.com>

On Thu, 28 Nov 2024 12:05:28 GMT, Kim Barrett wrote:

> Please review this change to RISCV code to remove a
> -Wzero-as-null-pointer-constant warning in MacroAssembler::emit_static_call_stub.
>
> It was calling MacroAssembler::movptr with the second (address) argument being
> a literal 0. Rather than changing it to use nullptr for that argument, I've
> instead changed it to call the movptr2 helper function, which takes the target
> address as a uint64_t. This eliminates the conversion of 0 to a pointer and
> then back to an integer 0. It seemed to me more natural to use that helper
> directly, as it was presumed that was what ended up being called anyway. But
> the riscv porters should weigh in on whether that's a good approach to dealing
> with this case.
>
> Testing: GHA sanity tests, which includes building for linux-riscv64. I don't
> have the capability to run tests for this platform, so hoping someone from the
> riscv porters can do more testing.

Thanks for reviews and discussion.
-------------

PR Comment: https://git.openjdk.org/jdk/pull/22435#issuecomment-2522245304

From kbarrett at openjdk.org Fri Dec 6 06:34:45 2024
From: kbarrett at openjdk.org (Kim Barrett)
Date: Fri, 6 Dec 2024 06:34:45 GMT
Subject: Integrated: 8345159: RISCV: Fix -Wzero-as-null-pointer-constant warning in emit_static_call_stub
In-Reply-To:
References:
Message-ID:

On Thu, 28 Nov 2024 12:05:28 GMT, Kim Barrett wrote:

> Please review this change to RISCV code to remove a
> -Wzero-as-null-pointer-constant warning in MacroAssembler::emit_static_call_stub.
>
> It was calling MacroAssembler::movptr with the second (address) argument being
> a literal 0. Rather than changing it to use nullptr for that argument, I've
> instead changed it to call the movptr2 helper function, which takes the target
> address as a uint64_t. This eliminates the conversion of 0 to a pointer and
> then back to an integer 0. It seemed to me more natural to use that helper
> directly, as it was presumed that was what ended up being called anyway. But
> the riscv porters should weigh in on whether that's a good approach to dealing
> with this case.
>
> Testing: GHA sanity tests, which includes building for linux-riscv64. I don't
> have the capability to run tests for this platform, so hoping someone from the
> riscv porters can do more testing.

This pull request has now been integrated.

Changeset: 2286fae3
Author: Kim Barrett
URL: https://git.openjdk.org/jdk/commit/2286fae300b37f4b69ed817d3edea6fe7fa2f52d
Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod

8345159: RISCV: Fix -Wzero-as-null-pointer-constant warning in emit_static_call_stub

Reviewed-by: mli, rehn

-------------

PR: https://git.openjdk.org/jdk/pull/22435

From iklam at openjdk.org Fri Dec 6 07:52:49 2024
From: iklam at openjdk.org (Ioi Lam)
Date: Fri, 6 Dec 2024 07:52:49 GMT
Subject: RFR: 8344556: [Graal] compiler/intrinsics/bmi/* fail when AOTCache cannot be loaded
Message-ID:

See JBS issue for detailed evaluation.

In summary -- when `-XX:+UseJVMCICompiler` is used, one of `-int` or `-Xcomp` will print out the `[cds][error]` message. The workaround is to disable the CDS log for both cases.

-------------

Commit messages:
 - 8344556: [Graal] compiler/intrinsics/bmi/* fail when AOTCache cannot be loaded

Changes: https://git.openjdk.org/jdk/pull/22596/files
Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=22596&range=00
Issue: https://bugs.openjdk.org/browse/JDK-8344556
Stats: 5 lines in 1 file changed: 4 ins; 0 del; 1 mod
Patch: https://git.openjdk.org/jdk/pull/22596.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/22596/head:pull/22596

PR: https://git.openjdk.org/jdk/pull/22596

From rcastanedalo at openjdk.org Fri Dec 6 08:01:40 2024
From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano)
Date: Fri, 6 Dec 2024 08:01:40 GMT
Subject: RFR: 8345287: C2: live in computation is broken
In-Reply-To:
References:
Message-ID:

On Mon, 2 Dec 2024 09:03:50 GMT, Roland Westrelin wrote:

> 8234003 (Improve IndexSet iteration) broke live in computation:
>
>
> @@ -273,23 +276,25 @@ void PhaseLive::add_liveout( Block *p, IndexSet *lo, VectorSet &first_pass ) {
>  // Add a vector of live-in values to a given blocks live-in set.
>  void PhaseLive::add_livein(Block *p, IndexSet *lo) {
>    IndexSet *livein = &_livein[p->_pre_order-1];
> -  IndexSetIterator elements(lo);
> -  uint r;
> -  while ((r = elements.next()) != 0) {
> -    livein->insert(r); // Then add to live-in set
> +  if (!livein->is_empty()) {
> +    IndexSetIterator elements(lo);
> +    uint r;
> +    while ((r = elements.next()) != 0) {
> +      livein->insert(r); // Then add to live-in set
> +    }
>    }
>  }
>
>
> `livein` is initially empty and the patch above only adds elements to it if:
>
>
>     if (!livein->is_empty()) {
>
>
> which is never true.
>
> This doesn't affect correctness as live in sets are only used to drive
> scheduling.

Marked as reviewed by rcastanedalo (Reviewer).
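
As an illustration of the guard problem quoted above, the following toy model may help. It is only a sketch: `java.util.BitSet` stands in for C2's `IndexSet`, the class and method names are hypothetical, and the second method shows one plausible repair (guarding on the source set), which is an assumption and not necessarily the change made in the PR.

```java
import java.util.BitSet;

// Toy model of the add_livein guard; BitSet stands in for C2's IndexSet.
final class LiveInSketch {
    // Mirrors the quoted patch: the guard tests the destination set, which
    // starts out empty, so the copy never happens.
    static void addLiveInGuardOnDestination(BitSet liveIn, BitSet lo) {
        if (!liveIn.isEmpty()) {
            liveIn.or(lo);
        }
    }

    // One plausible repair (assumption): skip the work only when the
    // source set has nothing to contribute.
    static void addLiveInGuardOnSource(BitSet liveIn, BitSet lo) {
        if (!lo.isEmpty()) {
            liveIn.or(lo);
        }
    }
}
```

Starting from an empty `liveIn`, the first variant leaves it empty whatever `lo` contains, which matches the "never true" observation; the second variant copies every element of `lo`.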

-------------

PR Review: https://git.openjdk.org/jdk/pull/22473#pullrequestreview-2483917762

From rcastanedalo at openjdk.org Fri Dec 6 08:01:40 2024
From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano)
Date: Fri, 6 Dec 2024 08:01:40 GMT
Subject: RFR: 8345287: C2: live in computation is broken
In-Reply-To:
References:
Message-ID:

On Thu, 5 Dec 2024 17:05:36 GMT, Vladimir Kozlov wrote:

> All benefits described in [JDK-8234003](https://bugs.openjdk.org/browse/JDK-8234003) could be due to not iterating at all in `add_liveout()`. Fixing it will place us at least at the same state as before those changes.
>
> -5% regression in monte-carlo is seen only on old macos-x64 machine which does not have AVX512 (only AVX2). It would be interesting to investigate but I think it is fine if we do that after push. So I agree to integrate this fix into JDK 24. Fixing regression could be done in 24u update release.
>
> `OptoRegScheduling` should be investigated separately (in JDK 25) regardless of this fix.

OK.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/22473#issuecomment-2522428369

From rrich at openjdk.org Fri Dec 6 08:36:39 2024
From: rrich at openjdk.org (Richard Reingruber)
Date: Fri, 6 Dec 2024 08:36:39 GMT
Subject: RFR: 8345609: [C1] LIR Operations with one input should be implemented as LIR_Op1 [v2]
In-Reply-To:
References:
Message-ID:

On Thu, 5 Dec 2024 21:29:51 GMT, Martin Doerr wrote:

>> Change `lir_sqrt`, `lir_abs`, `lir_neg`, `lir_f2hf`, `lir_hf2f` to use `LIR_Op1`. Extend `LIR_Op1` to support one temp register operand.
>> Also see JBS issue.
>>
>> Remove `lir_tan` and `lir_log10` which are unused (dead and incomplete code). They should have been removed with or after https://github.com/openjdk/jdk/commit/ad79a5ae65d24861ead3ae96cf148c8bc0f02736 and the corresponding changes on other platforms.
> > Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: > > Update Copyright headers. You might even see an effect in the `Fp16ConversionBenchmark` if you stop compilation at tier 1. ------------- PR Comment: https://git.openjdk.org/jdk/pull/22582#issuecomment-2522508261 From qamai at openjdk.org Fri Dec 6 09:16:41 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Fri, 6 Dec 2024 09:16:41 GMT Subject: RFR: 8310691: [REDO] [vectorapi] Refactor VectorShuffle implementation [v2] In-Reply-To: <07FzsRmVOfdZW51Pif18Dsb0gvWjVIfUswT2Eljm4IY=.852007f6-e3bf-492d-b7ad-9330656edd6a@github.com> References: <07FzsRmVOfdZW51Pif18Dsb0gvWjVIfUswT2Eljm4IY=.852007f6-e3bf-492d-b7ad-9330656edd6a@github.com> Message-ID: <2z1gcZ6s7rIbNJvgJkShmTZRx6Qsbjr5iXphL3I7vgw=.a873603a-ab40-40df-b862-c88a44767454@github.com> On Thu, 5 Dec 2024 20:48:07 GMT, Paul Sandoz wrote: >> `shuffleFromOp` is a slow path op so I don't think it is. Additionally, our vector multiplication is against a scalar, too. So we can optimize it if `step` is a constant. > > I incorrectly read `!=` as `==` :-) as that is the more common pattern used in the code base, so i was thinking the power of two code path was using `shuffleFromOp`. Could you invert the check to be more consistent? This piece of code follows the pattern: if (uncommonCondition) { return uncommonPath(); } // Continue the common path So I think it is better to keep it as it is. 
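
The early-return shape described above can be sketched as follows. This is a hypothetical helper written for illustration, not the `AbstractVector` code under review: `GuardClauseSketch` and `wrapIndex` are made-up names, and `length` is assumed to be positive. The non-power-of-two case is treated as the uncommon one, tested with `!=`, and returned early, so the common power-of-two path continues unindented.

```java
// Hypothetical example of the "uncommon condition first" pattern.
final class GuardClauseSketch {
    // Wraps i into [0, length); length is assumed to be positive.
    static int wrapIndex(int i, int length) {
        if ((length & (length - 1)) != 0) {
            // Uncommon path: length is not a power of two, use a general modulo.
            return Math.floorMod(i, length);
        }
        // Common path: for a power-of-two length a mask gives the same result.
        return i & (length - 1);
    }
}
```

Inverting the test to `== 0` would put the common path inside the `if` block and the slow path after it, which is the trade-off being discussed.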
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21042#discussion_r1872901230 From roland at openjdk.org Fri Dec 6 09:21:50 2024 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 6 Dec 2024 09:21:50 GMT Subject: RFR: 8345299: C2: some nodes can still have incorrect control after do_range_check() [v2] In-Reply-To: References: <0OJRLL7pqscoC_54OG2QjQj7zMwVLVUtsJYXM9wh6_c=.f9502ae7-ffba-42fd-ab92-9bfc242735a0@github.com> Message-ID: <0LyODDVYQaVgi2jNAmrU8swij--TFbsZXbcNp19i9bU=.be9217db-17ef-4e41-9f92-85a1b608e2a2@github.com> On Wed, 4 Dec 2024 18:01:29 GMT, Vladimir Kozlov wrote: >> Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: >> >> review > > Good. @vnkozlov @chhagedorn thanks for the reviews ------------- PR Comment: https://git.openjdk.org/jdk/pull/22485#issuecomment-2522615405 From roland at openjdk.org Fri Dec 6 09:21:51 2024 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 6 Dec 2024 09:21:51 GMT Subject: Integrated: 8345299: C2: some nodes can still have incorrect control after do_range_check() In-Reply-To: References: Message-ID: <_hCuJdutxO4dh3lcPg8gKZ3GmjR_K9UJFGRODIkuikA=.a362f5aa-8d6e-432c-8571-20c665e30f79@github.com> On Mon, 2 Dec 2024 13:53:00 GMT, Roland Westrelin wrote: > 8339733 fixed controls for updated pre/main limits during > do_range_check(). However, it missed one issue: > > Control for the new limits is computed in `new_limit_ctrl` for both > pre and main loops. `new_limit_ctrl` is currently initialized from the > pre limit control but it also needs to take the main loop limit > control into account as sometimes, the main loop limit control is > below the pre limit and pre loop entry control. > > 8339733 also introduced a couple bugs. Control of the `Bool`/`Cmp` > nodes are updated for the pre and main loop to `new_limit_ctrl`. 
But > that's incorrect because `new_limit_ctrl` may be above the pre loop > while the `Bool`/`Cmp` for the pre loop are in the loop (because they > depend on the loop iv) and for the main loop are after the pre loop > (because they depend on the iv out of the pre loop). I fixed this for > the pre loop by setting control for the `Bool`/`Cmp` to be as late as > possible. For the main loop, no change appears to be required as > control computed by c2 is already late enough. I've added an assert > instead. This pull request has now been integrated. Changeset: d9a22139 Author: Roland Westrelin URL: https://git.openjdk.org/jdk/commit/d9a22139fb14c67e2b1dac2c93c1e46bc3b14c72 Stats: 15 lines in 2 files changed: 4 ins; 0 del; 11 mod 8345299: C2: some nodes can still have incorrect control after do_range_check() Reviewed-by: chagedorn, kvn ------------- PR: https://git.openjdk.org/jdk/pull/22485 From roland at openjdk.org Fri Dec 6 09:22:42 2024 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 6 Dec 2024 09:22:42 GMT Subject: RFR: 8343747: C2: TestReplicateAtConv.java crashes with -XX:MaxVectorSize=8 [v2] In-Reply-To: References: Message-ID: On Wed, 4 Dec 2024 15:31:19 GMT, Emanuel Peter wrote: >> Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: >> >> review > > Thanks for the update - still good :) @eme64 @vnkozlov thanks for the reviews ------------- PR Comment: https://git.openjdk.org/jdk/pull/22442#issuecomment-2522616843 From roland at openjdk.org Fri Dec 6 09:22:43 2024 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 6 Dec 2024 09:22:43 GMT Subject: Integrated: 8343747: C2: TestReplicateAtConv.java crashes with -XX:MaxVectorSize=8 In-Reply-To: References: Message-ID: On Thu, 28 Nov 2024 15:31:47 GMT, Roland Westrelin wrote: > Crash occurs when attempting to create a `Replicate` node that's input > to a `VectorCast` node (for a `ConvL2I`) that's not supported by the > platform (when run 
with `MaxVectorSize=8`). I think the pack for the > `VectorCast` should be filtered out earlier as not implemented and I > propose adding a test to `VectorCastNode::implemented()` for the type > of its input to handle that corner case. This pull request has now been integrated. Changeset: 874d68a9 Author: Roland Westrelin URL: https://git.openjdk.org/jdk/commit/874d68a96ce67caaf944dd25fbfb44eab965dfd3 Stats: 6 lines in 2 files changed: 4 ins; 0 del; 2 mod 8343747: C2: TestReplicateAtConv.java crashes with -XX:MaxVectorSize=8 Reviewed-by: epeter, kvn ------------- PR: https://git.openjdk.org/jdk/pull/22442 From mdoerr at openjdk.org Fri Dec 6 11:49:37 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Fri, 6 Dec 2024 11:49:37 GMT Subject: RFR: 8345609: [C1] LIR Operations with one input should be implemented as LIR_Op1 [v2] In-Reply-To: References: Message-ID: On Fri, 6 Dec 2024 08:33:38 GMT, Richard Reingruber wrote: > You might even see an effect in the `Fp16ConversionBenchmark` if you stop compilation at tier 1. Yes. Added benchmark results to the description. ------------- PR Comment: https://git.openjdk.org/jdk/pull/22582#issuecomment-2523015974 From qamai at openjdk.org Fri Dec 6 11:54:25 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Fri, 6 Dec 2024 11:54:25 GMT Subject: RFR: 8342651: Refactor array constant to use an array of jbyte [v5] In-Reply-To: References: Message-ID: > Hi, > > This small patch refactors array constants in C2 to use an array of `jbyte`s instead of an array of `jvalue`. The former is much easier to work with and we can do `memcpy` with them trivially. > > Since code buffers support alignment of the constant section, I have also allowed constant tables to be aligned more than 8 bytes and used it for constant vectors on machines not supporting `SSE3`. I also fixed an issue with code buffer relocation where the temporary buffer is not correctly aligned. > > This patch is extracted from https://github.com/openjdk/jdk/pull/21229. 
Tests passed with `UseSSE=2` where 16-byte constants would be generated, as well as normal testing routines. > > Please take a look and leave your reviews, thanks a lot. Quan Anh Mai has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains seven additional commits since the last revision: - Merge branch 'master' into constanttable - add comment to ConstantTable::alignment - Merge branch 'master' into constanttable - indentation - Merge branch 'master' into constanttable - Merge branch 'master' into constanttable - refactor array constant, fix codebuffer reallocation ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21596/files - new: https://git.openjdk.org/jdk/pull/21596/files/b8a8d9a3..ed66a106 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21596&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21596&range=03-04 Stats: 109219 lines in 1736 files changed: 81360 ins; 19597 del; 8262 mod Patch: https://git.openjdk.org/jdk/pull/21596.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21596/head:pull/21596 PR: https://git.openjdk.org/jdk/pull/21596 From qamai at openjdk.org Fri Dec 6 12:04:41 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Fri, 6 Dec 2024 12:04:41 GMT Subject: RFR: 8342651: Refactor array constant to use an array of jbyte [v5] In-Reply-To: <_0jX1YedNq0T0kzc4T-UzKsvjvJrT2ZQdyd0m2uMl-Q=.f4818425-13ae-409a-a70a-b750d2b9dc89@github.com> References: <_0jX1YedNq0T0kzc4T-UzKsvjvJrT2ZQdyd0m2uMl-Q=.f4818425-13ae-409a-a70a-b750d2b9dc89@github.com> Message-ID: On Wed, 4 Dec 2024 18:20:54 GMT, Vladimir Kozlov wrote: >> Quan Anh Mai has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains seven additional commits since the last revision: >> >> - Merge branch 'master' into constanttable >> - add comment to ConstantTable::alignment >> - Merge branch 'master' into constanttable >> - indentation >> - Merge branch 'master' into constanttable >> - Merge branch 'master' into constanttable >> - refactor array constant, fix codebuffer reallocation > > Good. Please push it into JDK 25 after 24 is forked. > We would need to re-test it internally before integration. @vnkozlov Thanks, I have merged the branch with the latest master ------------- PR Comment: https://git.openjdk.org/jdk/pull/21596#issuecomment-2523068339 From qamai at openjdk.org Fri Dec 6 12:07:52 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Fri, 6 Dec 2024 12:07:52 GMT Subject: RFR: 8310691: [REDO] [vectorapi] Refactor VectorShuffle implementation [v5] In-Reply-To: References: Message-ID: <40Uaer7ZKwGk5kOiEnHSYlyJy74QRzciHspy5CKfM54=.8c37f555-ac4a-4811-bdff-4e4d459f984a@github.com> > Hi, > > This is just a redo of https://github.com/openjdk/jdk/pull/13093. mostly just the revert of the backout. > > Regarding the related issues: > > - [JDK-8306008](https://bugs.openjdk.org/browse/JDK-8306008) and [JDK-8309531](https://bugs.openjdk.org/browse/JDK-8309531) have been fixed before the backout. > - [JDK-8309373](https://bugs.openjdk.org/browse/JDK-8309373) was due to missing `ForceInline` on `AbstractVector::toBitsVectorTemplate` > - [JDK-8306592](https://bugs.openjdk.org/browse/JDK-8306592), I have not been able to find the root causes. I'm not sure if this is a blocker, now I cannot even build x86-32 tests. > > Finally, I moved some implementation of public methods and methods that call into intrinsics to the concrete class as that may help the compiler know the correct types of the variables. > > Please take a look and leave reviews. Thanks a lot. 
> > The description of the original PR: > > This patch reimplements `VectorShuffle` implementations to be a vector of the bit type. Currently, `VectorShuffle` is stored as a byte array, and would be expanded upon usage. This poses several drawbacks: > > Inefficient conversions between a shuffle and its corresponding vector. This hinders the performance when the shuffle indices are not constant and are loaded or computed dynamically. > Redundant expansions in `rearrange` operations. On all platforms, it seems that a shuffle index vector is always expanded to the correct type before executing the `rearrange` operations. > Some redundant intrinsics are needed to support this handling as well as special considerations in the C2 compiler. > Range checks are performed using `VectorShuffle::toVector`, which is inefficient for FP types since both FP conversions and FP comparisons are more expensive than the integral ones. > Upon these changes, a `rearrange` can emit more efficient code: > > var species = IntVector.SPECIES_128; > var v1 = IntVector.fromArray(species, SRC1, 0); > var v2 = IntVector.fromArray(species, SRC2, 0); > v1.rearrange(v2.toShuffle()).intoArray(DST, 0); > > Before: > movabs $0x751589fa8,%r10 ; {oop([I{0x0000000751589fa8})} > vmovdqu 0x10(%r10),%xmm2 > movabs $0x7515a0d08,%r10 ; {oop([I{0x00000007515a0d08})} > vmovdqu 0x10(%r10),%xmm1 > movabs $0x75158afb8,%r10 ; {oop([I{0x000000075158afb8})} > vmovdqu 0x10(%r10),%xmm0 > vpand -0xddc12(%rip),%xmm0,%xmm0 # Stub::vector_int_to_byte_mask > ; {ex... Quan Anh Mai has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains five commits: - Merge branch 'master' into shufflerefactor - fix asserts - whitespace - Merge branch 'master' into shufflerefactor - [vectorapi] Refactor VectorShuffle implementation ------------- Changes: https://git.openjdk.org/jdk/pull/21042/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21042&range=04 Stats: 4919 lines in 64 files changed: 2609 ins; 1066 del; 1244 mod Patch: https://git.openjdk.org/jdk/pull/21042.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21042/head:pull/21042 PR: https://git.openjdk.org/jdk/pull/21042 From qamai at openjdk.org Fri Dec 6 12:22:36 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Fri, 6 Dec 2024 12:22:36 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v29] In-Reply-To: References: Message-ID: > Hi, > > This patch adds unsigned bounds and known bits constraints to TypeInt and TypeLong. This opens more transformation opportunities in an elegant manner as well as helps avoid some ad-hoc rules in Hotspot. > > In general, a `TypeInt/Long` represents a set of values `x` that satisfies: `x s>= lo && x s<= hi && x u>= ulo && x u<= uhi && (x & zeros) == 0 && (x & ones) == ones`. These constraints are not independent, e.g. an int that lies in [0, 3] in signed domain must also lie in [0, 3] in unsigned domain and have all bits but the last 2 being unset. As a result, we must canonicalize the constraints (tighten the constraints so that they are optimal) before constructing a `TypeInt/Long` instance. > > This is extracted from #15440 , node value transformations are left for later PRs. I have also added unit tests to verify the soundness of constraint normalization. > > Please kindly review, thanks a lot. > > Testing > > - [x] GHA > - [x] Linux x64, tier 1-4 Quan Anh Mai has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 39 commits: - move try_cast to Type - Merge branch 'master' into unsignedbounds - build failure - build failures - whitespace - further reviews - Merge branch 'master' into unsignedbounds - Merge branch 'master' into unsignedbounds - address reviews - comment adjust_lo empty case - ... and 29 more: https://git.openjdk.org/jdk/compare/874d68a9...cf1de627 ------------- Changes: https://git.openjdk.org/jdk/pull/17508/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=17508&range=28 Stats: 2004 lines in 10 files changed: 1446 ins; 325 del; 233 mod Patch: https://git.openjdk.org/jdk/pull/17508.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17508/head:pull/17508 PR: https://git.openjdk.org/jdk/pull/17508 From thartmann at openjdk.org Fri Dec 6 14:21:44 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Fri, 6 Dec 2024 14:21:44 GMT Subject: RFR: 8344833: CTW: Make failing on zero classes optional [v3] In-Reply-To: References: Message-ID: On Wed, 27 Nov 2024 18:29:18 GMT, Evgeny Nikitin wrote: >> For CTW, zero classes in provided jar is now a failure. >> This creates noisy and blocking false positives in fuzzy/mass scale runs, where we use jar archives from random sources, unchecked or randomly generated, etc. >> >> This PR makes this behaviour controllable. Default reaction is a failure, like before. > > Evgeny Nikitin has updated the pull request incrementally with one additional commit since the last revision: > > Use totalClassCount instead of the classCount Looks good. ------------- Marked as reviewed by thartmann (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/22320#pullrequestreview-2484961448 From enikitin at openjdk.org Fri Dec 6 14:21:44 2024 From: enikitin at openjdk.org (Evgeny Nikitin) Date: Fri, 6 Dec 2024 14:21:44 GMT Subject: Integrated: 8344833: CTW: Make failing on zero classes optional In-Reply-To: References: Message-ID: On Fri, 22 Nov 2024 11:11:55 GMT, Evgeny Nikitin wrote: > For CTW, zero classes in provided jar is now a failure. > This creates noisy and blocking false positives in fuzzy/mass scale runs, where we use jar archives from random sources, unchecked or randomly generated, etc. > > This PR makes this behaviour controllable. Default reaction is a failure, like before. This pull request has now been integrated. Changeset: 0e2a2852 Author: Evgeny Nikitin Committer: Tobias Hartmann URL: https://git.openjdk.org/jdk/commit/0e2a28527738d227a66ea44b9a5c037c72039044 Stats: 8 lines in 1 file changed: 7 ins; 1 del; 0 mod 8344833: CTW: Make failing on zero classes optional Reviewed-by: thartmann, kvn ------------- PR: https://git.openjdk.org/jdk/pull/22320 From kvn at openjdk.org Fri Dec 6 16:16:40 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 6 Dec 2024 16:16:40 GMT Subject: RFR: 8344556: [Graal] compiler/intrinsics/bmi/* fail when AOTCache cannot be loaded In-Reply-To: References: Message-ID: On Fri, 6 Dec 2024 05:52:49 GMT, Ioi Lam wrote: > See JBS issue for detailed evaluation. > > In summary -- when `-XX:+UseJVMCICompiler` is used, one of `-int` or `-Xcomp` will print out the `[cds][error]` message. The work around is to disable the CDS log for both cases. I agree with @dougxc suggestion in bug's comment: "these tests should not be run with -XX:+AOTClassLinking when JVMCI is enabled" Is it possible to add such condition to tests @require ? 
------------- PR Review: https://git.openjdk.org/jdk/pull/22596#pullrequestreview-2485293156 From psandoz at openjdk.org Fri Dec 6 16:18:47 2024 From: psandoz at openjdk.org (Paul Sandoz) Date: Fri, 6 Dec 2024 16:18:47 GMT Subject: RFR: 8310691: [REDO] [vectorapi] Refactor VectorShuffle implementation [v2] In-Reply-To: <2z1gcZ6s7rIbNJvgJkShmTZRx6Qsbjr5iXphL3I7vgw=.a873603a-ab40-40df-b862-c88a44767454@github.com> References: <07FzsRmVOfdZW51Pif18Dsb0gvWjVIfUswT2Eljm4IY=.852007f6-e3bf-492d-b7ad-9330656edd6a@github.com> <2z1gcZ6s7rIbNJvgJkShmTZRx6Qsbjr5iXphL3I7vgw=.a873603a-ab40-40df-b862-c88a44767454@github.com> Message-ID: On Fri, 6 Dec 2024 09:14:07 GMT, Quan Anh Mai wrote: >> I incorrectly read `!=` as `==` :-) as that is the more common pattern used in the code base, so i was thinking the power of two code path was using `shuffleFromOp`. Could you invert the check to be more consistent? > > This piece of code follows the pattern: > > if (uncommonCondition) { > return uncommonPath(); > } > // Continue the common path > > So I think it is better to keep it as it is. Ok, but since it deviates from the pattern applied in other areas for POT checks and caught me out would you mind adding a comment? 
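
The "uncommon condition first" shape discussed in this exchange, combined with a power-of-two check written with `!=`, might look roughly like the sketch below. This is not the code under review: `GuardPatternSketch` and `wrapIndex` are hypothetical names, and the index-wrapping logic is only an assumed stand-in for the real shuffle code.

```java
public final class GuardPatternSketch {
    private GuardPatternSketch() {}

    /**
     * Wraps index into [0, length). The length is assumed to be positive.
     * Hypothetical example, not taken from the Vector API sources.
     */
    public static int wrapIndex(int index, int length) {
        // Note the '!=': this guards the UNCOMMON case (length is not a
        // power of two) and returns early, unlike the more usual
        // '== 0' check that guards the power-of-two fast path.
        if ((length & (length - 1)) != 0) {
            return Math.floorMod(index, length);
        }
        // Common path: length is a power of two, so masking is enough.
        return index & (length - 1);
    }
}
```

A short comment at the inverted check, as requested above, is what keeps a reader from assuming the more common `== 0` form.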
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21042#discussion_r1873634643 From psandoz at openjdk.org Fri Dec 6 16:23:40 2024 From: psandoz at openjdk.org (Paul Sandoz) Date: Fri, 6 Dec 2024 16:23:40 GMT Subject: RFR: 8310691: [REDO] [vectorapi] Refactor VectorShuffle implementation [v2] In-Reply-To: <5NjY5B87PIM5jIaPEQrlsz8tNPoKU14iNsqyTQwmxA8=.ea336100-b4d0-4d7d-8721-8e9e4efb64b8@github.com> References: <5NjY5B87PIM5jIaPEQrlsz8tNPoKU14iNsqyTQwmxA8=.ea336100-b4d0-4d7d-8721-8e9e4efb64b8@github.com> Message-ID: <75oGPIIsnaNawGwF2IPmHoXf6YsP_E1QbhDwOkDjplU=.bd3b14a2-72fe-45c4-b932-3d1c6972f432@github.com> On Thu, 5 Dec 2024 20:36:48 GMT, Paul Sandoz wrote: >> The cast is added so that we have the concrete type of the shuffle, the result of `toShuffle` is only `VectorShuffle` > > Ah i see now, you want to ensure an invocation to the final/concrete method. (The IDE's highlighting of the redundant cast is misleading) The common way we tend to do this in other areas is assign to a local variable with the sharper type. This makes it a littler clearer on the intent. Could you do that? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21042#discussion_r1873652324 From qamai at openjdk.org Fri Dec 6 17:00:15 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Fri, 6 Dec 2024 17:00:15 GMT Subject: RFR: 8310691: [REDO] [vectorapi] Refactor VectorShuffle implementation [v6] In-Reply-To: References: Message-ID: <6YeHbHlrPoV9obcfC7g1PtFdMKfanhnsKxqwTZovts4=.f5743961-07e2-4325-b8f5-270edd342e11@github.com> > Hi, > > This is just a redo of https://github.com/openjdk/jdk/pull/13093. mostly just the revert of the backout. > > Regarding the related issues: > > - [JDK-8306008](https://bugs.openjdk.org/browse/JDK-8306008) and [JDK-8309531](https://bugs.openjdk.org/browse/JDK-8309531) have been fixed before the backout. 
> - [JDK-8309373](https://bugs.openjdk.org/browse/JDK-8309373) was due to missing `ForceInline` on `AbstractVector::toBitsVectorTemplate` > - [JDK-8306592](https://bugs.openjdk.org/browse/JDK-8306592), I have not been able to find the root causes. I'm not sure if this is a blocker, now I cannot even build x86-32 tests. > > Finally, I moved some implementation of public methods and methods that call into intrinsics to the concrete class as that may help the compiler know the correct types of the variables. > > Please take a look and leave reviews. Thanks a lot. > > The description of the original PR: > > This patch reimplements `VectorShuffle` implementations to be a vector of the bit type. Currently, `VectorShuffle` is stored as a byte array, and would be expanded upon usage. This poses several drawbacks: > > Inefficient conversions between a shuffle and its corresponding vector. This hinders the performance when the shuffle indices are not constant and are loaded or computed dynamically. > Redundant expansions in `rearrange` operations. On all platforms, it seems that a shuffle index vector is always expanded to the correct type before executing the `rearrange` operations. > Some redundant intrinsics are needed to support this handling as well as special considerations in the C2 compiler. > Range checks are performed using `VectorShuffle::toVector`, which is inefficient for FP types since both FP conversions and FP comparisons are more expensive than the integral ones. 
> Upon these changes, a `rearrange` can emit more efficient code: > > var species = IntVector.SPECIES_128; > var v1 = IntVector.fromArray(species, SRC1, 0); > var v2 = IntVector.fromArray(species, SRC2, 0); > v1.rearrange(v2.toShuffle()).intoArray(DST, 0); > > Before: > movabs $0x751589fa8,%r10 ; {oop([I{0x0000000751589fa8})} > vmovdqu 0x10(%r10),%xmm2 > movabs $0x7515a0d08,%r10 ; {oop([I{0x00000007515a0d08})} > vmovdqu 0x10(%r10),%xmm1 > movabs $0x75158afb8,%r10 ; {oop([I{0x000000075158afb8})} > vmovdqu 0x10(%r10),%xmm0 > vpand -0xddc12(%rip),%xmm0,%xmm0 # Stub::vector_int_to_byte_mask > ; {ex... Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: add comment, extract cast into local variable ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21042/files - new: https://git.openjdk.org/jdk/pull/21042/files/c851d133..a2f59007 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21042&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21042&range=04-05 Stats: 68 lines in 32 files changed: 6 ins; 0 del; 62 mod Patch: https://git.openjdk.org/jdk/pull/21042.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21042/head:pull/21042 PR: https://git.openjdk.org/jdk/pull/21042 From qamai at openjdk.org Fri Dec 6 17:00:15 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Fri, 6 Dec 2024 17:00:15 GMT Subject: RFR: 8310691: [REDO] [vectorapi] Refactor VectorShuffle implementation [v2] In-Reply-To: References: <07FzsRmVOfdZW51Pif18Dsb0gvWjVIfUswT2Eljm4IY=.852007f6-e3bf-492d-b7ad-9330656edd6a@github.com> <2z1gcZ6s7rIbNJvgJkShmTZRx6Qsbjr5iXphL3I7vgw=.a873603a-ab40-40df-b862-c88a44767454@github.com> Message-ID: On Fri, 6 Dec 2024 16:16:04 GMT, Paul Sandoz wrote: >> This piece of code follows the pattern: >> >> if (uncommonCondition) { >> return uncommonPath(); >> } >> // Continue the common path >> >> So I think it is better to keep it as it is. 
> > Ok, but since it deviates from the pattern applied in other areas for POT checks and caught me out would you mind adding a comment? Done ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21042#discussion_r1873698256 From qamai at openjdk.org Fri Dec 6 17:00:15 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Fri, 6 Dec 2024 17:00:15 GMT Subject: RFR: 8310691: [REDO] [vectorapi] Refactor VectorShuffle implementation [v2] In-Reply-To: <75oGPIIsnaNawGwF2IPmHoXf6YsP_E1QbhDwOkDjplU=.bd3b14a2-72fe-45c4-b932-3d1c6972f432@github.com> References: <5NjY5B87PIM5jIaPEQrlsz8tNPoKU14iNsqyTQwmxA8=.ea336100-b4d0-4d7d-8721-8e9e4efb64b8@github.com> <75oGPIIsnaNawGwF2IPmHoXf6YsP_E1QbhDwOkDjplU=.bd3b14a2-72fe-45c4-b932-3d1c6972f432@github.com> Message-ID: On Fri, 6 Dec 2024 16:20:55 GMT, Paul Sandoz wrote: >> Ah i see now, you want to ensure an invocation to the final/concrete method. (The IDE's highlighting of the redundant cast is misleading) > > The common way we tend to do this in other areas is assign to a local variable with the sharper type. This makes it a littler clearer on the intent. Could you do that? Done ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21042#discussion_r1873698414 From psandoz at openjdk.org Fri Dec 6 17:20:40 2024 From: psandoz at openjdk.org (Paul Sandoz) Date: Fri, 6 Dec 2024 17:20:40 GMT Subject: RFR: 8310691: [REDO] [vectorapi] Refactor VectorShuffle implementation [v6] In-Reply-To: <6YeHbHlrPoV9obcfC7g1PtFdMKfanhnsKxqwTZovts4=.f5743961-07e2-4325-b8f5-270edd342e11@github.com> References: <6YeHbHlrPoV9obcfC7g1PtFdMKfanhnsKxqwTZovts4=.f5743961-07e2-4325-b8f5-270edd342e11@github.com> Message-ID: On Fri, 6 Dec 2024 17:00:15 GMT, Quan Anh Mai wrote: >> Hi, >> >> This is just a redo of https://github.com/openjdk/jdk/pull/13093. mostly just the revert of the backout. 
>> >> Regarding the related issues: >> >> - [JDK-8306008](https://bugs.openjdk.org/browse/JDK-8306008) and [JDK-8309531](https://bugs.openjdk.org/browse/JDK-8309531) have been fixed before the backout. >> - [JDK-8309373](https://bugs.openjdk.org/browse/JDK-8309373) was due to missing `ForceInline` on `AbstractVector::toBitsVectorTemplate` >> - [JDK-8306592](https://bugs.openjdk.org/browse/JDK-8306592), I have not been able to find the root causes. I'm not sure if this is a blocker, now I cannot even build x86-32 tests. >> >> Finally, I moved some implementation of public methods and methods that call into intrinsics to the concrete class as that may help the compiler know the correct types of the variables. >> >> Please take a look and leave reviews. Thanks a lot. >> >> The description of the original PR: >> >> This patch reimplements `VectorShuffle` implementations to be a vector of the bit type. Currently, `VectorShuffle` is stored as a byte array, and would be expanded upon usage. This poses several drawbacks: >> >> Inefficient conversions between a shuffle and its corresponding vector. This hinders the performance when the shuffle indices are not constant and are loaded or computed dynamically. >> Redundant expansions in `rearrange` operations. On all platforms, it seems that a shuffle index vector is always expanded to the correct type before executing the `rearrange` operations. >> Some redundant intrinsics are needed to support this handling as well as special considerations in the C2 compiler. >> Range checks are performed using `VectorShuffle::toVector`, which is inefficient for FP types since both FP conversions and FP comparisons are more expensive than the integral ones. 
>> Upon these changes, a `rearrange` can emit more efficient code: >> >> var species = IntVector.SPECIES_128; >> var v1 = IntVector.fromArray(species, SRC1, 0); >> var v2 = IntVector.fromArray(species, SRC2, 0); >> v1.rearrange(v2.toShuffle()).intoArray(DST, 0); >> >> Before: >> movabs $0x751589fa8,%r10 ; {oop([I{0x0000000751589fa8})} >> vmovdqu 0x10(%r10),%xmm2 >> movabs $0x7515a0d08,%r10 ; {oop([I{0x00000007515a0d08})} >> vmovdqu 0x10(%r10),%xmm1 >> movabs $0x75158afb8,%r10 ; {oop([I{0x000000075158afb8})} >> vmovdqu 0x10(%r10),%xmm0 >> vpand -0xddc12(%rip),%xmm0,%xmm0 # Stub::vector_int_to_byt... > > Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: > > add comment, extract cast into local variable Java changes look good. Needs a HotSpot review. ------------- Marked as reviewed by psandoz (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21042#pullrequestreview-2485451835 From psandoz at openjdk.org Fri Dec 6 17:25:39 2024 From: psandoz at openjdk.org (Paul Sandoz) Date: Fri, 6 Dec 2024 17:25:39 GMT Subject: RFR: 8310691: [REDO] [vectorapi] Refactor VectorShuffle implementation [v2] In-Reply-To: References: Message-ID: On Tue, 26 Nov 2024 18:17:05 GMT, Quan Anh Mai wrote: >> @merykitty Could you please merge with the latest and resolve conflicts? > > @sviswa7 @PaulSandoz @eme64 @jatin-bhateja Thanks for taking a look, I have merged the PR with a more recent master and resolved the sematic difference with newly added intrinsics, too. @merykitty do you want me to initiate tier 1 to 3 tests? 
-------------

PR Comment: https://git.openjdk.org/jdk/pull/21042#issuecomment-2523785970

From qamai at openjdk.org Fri Dec 6 17:37:52 2024
From: qamai at openjdk.org (Quan Anh Mai)
Date: Fri, 6 Dec 2024 17:37:52 GMT
Subject: RFR: 8310691: [REDO] [vectorapi] Refactor VectorShuffle implementation [v2]
In-Reply-To:
References:
Message-ID: <7todhQ5vEpyMrDa9XkNhJRpZmny4NfgqNgPgY_INZZE=.2e1875cb-9b69-46d2-8790-171caa75b795@github.com>

On Fri, 6 Dec 2024 17:22:33 GMT, Paul Sandoz wrote:

>> @sviswa7 @PaulSandoz @eme64 @jatin-bhateja Thanks for taking a look, I have merged the PR with a more recent master and resolved the semantic difference with newly added intrinsics, too.
>
> @merykitty do you want me to initiate tier 1 to 3 tests?

@PaulSandoz Yes, please. Thanks a lot for your help.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/21042#issuecomment-2523819523

From iklam at openjdk.org Fri Dec 6 18:00:37 2024
From: iklam at openjdk.org (Ioi Lam)
Date: Fri, 6 Dec 2024 18:00:37 GMT
Subject: RFR: 8344556: [Graal] compiler/intrinsics/bmi/* fail when AOTCache cannot be loaded
In-Reply-To:
References:
Message-ID:

On Fri, 6 Dec 2024 16:13:32 GMT, Vladimir Kozlov wrote:

> I agree with @dougxc suggestion in bug's comment: "these tests should not be run with -XX:+AOTClassLinking when JVMCI is enabled"
>
> Is it possible to add such condition to tests @require ?

For test coverage, I think we should test it for this flag combination as well. We just need to fix the test so that it doesn't care about unrelated error messages.
------------- PR Comment: https://git.openjdk.org/jdk/pull/22596#issuecomment-2523864882 From kvn at openjdk.org Fri Dec 6 18:33:47 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 6 Dec 2024 18:33:47 GMT Subject: RFR: 8342651: Refactor array constant to use an array of jbyte [v5] In-Reply-To: References: Message-ID: On Fri, 6 Dec 2024 11:54:25 GMT, Quan Anh Mai wrote: >> Hi, >> >> This small patch refactors array constants in C2 to use an array of `jbyte`s instead of an array of `jvalue`. The former is much easier to work with and we can do `memcpy` with them trivially. >> >> Since code buffers support alignment of the constant section, I have also allowed constant tables to be aligned more than 8 bytes and used it for constant vectors on machines not supporting `SSE3`. I also fixed an issue with code buffer relocation where the temporary buffer is not correctly aligned. >> >> This patch is extracted from https://github.com/openjdk/jdk/pull/21229. Tests passed with `UseSSE=2` where 16-byte constants would be generated, as well as normal testing routines. >> >> Please take a look and leave your reviews, thanks a lot. > > Quan Anh Mai has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains seven additional commits since the last revision: > > - Merge branch 'master' into constanttable > - add comment to ConstantTable::alignment > - Merge branch 'master' into constanttable > - indentation > - Merge branch 'master' into constanttable > - Merge branch 'master' into constanttable > - refactor array constant, fix codebuffer reallocation I started testing. Leonid is fixing issue with GHA testing. 
-------------

PR Comment: https://git.openjdk.org/jdk/pull/21596#issuecomment-2523917332
PR Comment: https://git.openjdk.org/jdk/pull/21596#issuecomment-2523917959

From iklam at openjdk.org Fri Dec 6 19:03:41 2024
From: iklam at openjdk.org (Ioi Lam)
Date: Fri, 6 Dec 2024 19:03:41 GMT
Subject: RFR: 8344556: [Graal] compiler/intrinsics/bmi/* fail when AOTCache cannot be loaded
In-Reply-To:
References:
Message-ID: <8ZEThx_E-cF_zRUvBlF-asHXC3RzWnIWQTJsPXqpM78=.e2dbc7d3-3e0f-4242-8a5e-9d7f02e14daf@github.com>

On Fri, 6 Dec 2024 05:52:49 GMT, Ioi Lam wrote:

> See JBS issue for detailed evaluation.
>
> In summary -- when `-XX:+UseJVMCICompiler` is used, one of `-int` or `-Xcomp` will print out the `[cds][error]` message. The work around is to disable the CDS log for both cases.

There's a better fix by adding `--add-modules=jdk.internal.vm.ci` when running the tests under this configuration. Please see more info in [JDK-8344556](https://bugs.openjdk.org/browse/JDK-8344556).

    $ env JVMCI_VERSION_CHECK=ignore make test \
        JTREG_AOT_JDK=true \
        TEST_OPTS_VM_OPTIONS="--add-modules=jdk.internal.vm.ci -XX:+AOTClassLinking -XX:+UnlockExperimentalVMOptions -XX:+UseJVMCICompiler" \
        JTREG_OPTIONS="-e:JVMCI_VERSION_CHECK=ignore" \
        TEST=open/test/hotspot/jtreg/compiler/intrinsics/bmi/TestLzcntL.java

    ==============================
    Test summary
    ==============================
       TEST                                                                   TOTAL  PASS  FAIL ERROR
       jtreg:open/test/hotspot/jtreg/compiler/intrinsics/bmi/TestLzcntL.java      1     1     0     0
    ==============================
    TEST SUCCESS

Closing this PR.
------------- PR Comment: https://git.openjdk.org/jdk/pull/22596#issuecomment-2523966011 From iklam at openjdk.org Fri Dec 6 19:03:42 2024 From: iklam at openjdk.org (Ioi Lam) Date: Fri, 6 Dec 2024 19:03:42 GMT Subject: Withdrawn: 8344556: [Graal] compiler/intrinsics/bmi/* fail when AOTCache cannot be loaded In-Reply-To: References: Message-ID: <4RdRyCY5P_kHu3w_FrymJxhtuGHcDGS-V4po1PWZ4Tw=.7d2dfd1c-4afd-48fd-93f6-a51ed6ab3d8d@github.com> On Fri, 6 Dec 2024 05:52:49 GMT, Ioi Lam wrote: > See JBS issue for detailed evaluation. > > In summary -- when `-XX:+UseJVMCICompiler` is used, one of `-int` or `-Xcomp` will print out the `[cds][error]` message. The work around is to disable the CDS log for both cases. This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.org/jdk/pull/22596 From vladimir.x.ivanov at oracle.com Fri Dec 6 23:18:17 2024 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Fri, 6 Dec 2024 15:18:17 -0800 Subject: RFC: Untangle native libraries and the JVM: SVML, SLEEF, and libsimdsort Message-ID: <1bf041a1-1002-4d18-85b3-15f437fa534e@oracle.com> Recently, a trend emerged to use native libraries to back intrinsics in HotSpot JVM. SVML stubs for Vector API paved the road and it was soon followed by SLEEF and simdsort libraries. After examining their support, I must confess that it doesn't look pretty. It introduces significant accidental complexity on JVM side. HotSpot has to be taught about every entry point in each library in an ad-hoc manner. It's inherently unsafe, error-prone to implement and hard to maintain: JVM makes a lot of assumptions about an entry point based solely on its symbolic name and each library has its own naming conventions. Overall, current approach doesn't scale well. Fortunately, new FFI API (java.lang.foreign) was finalized in 22. It provides enough functionality to interact with native libraries from Java in performant manner. 
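
For readers who have not used `java.lang.foreign` yet, a minimal downcall might look like the sketch below. It is not code from this thread or from the libraries under discussion: it calls the C library's `strlen`, the class name `FfiStrlenSketch` is hypothetical, and it assumes JDK 22 or later on a 64-bit platform where `size_t` maps to `JAVA_LONG`.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.FunctionDescriptor;
import java.lang.foreign.Linker;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;
import java.lang.invoke.MethodHandle;

public final class FfiStrlenSketch {
    // Method handle bound once to the native strlen entry point.
    private static final MethodHandle STRLEN;

    static {
        Linker linker = Linker.nativeLinker();
        // Look up the symbol in the platform's default (C library) lookup.
        MemorySegment address = linker.defaultLookup().find("strlen").orElseThrow();
        // size_t strlen(const char*): assumed to be long(ADDRESS) here.
        STRLEN = linker.downcallHandle(
                address,
                FunctionDescriptor.of(ValueLayout.JAVA_LONG, ValueLayout.ADDRESS));
    }

    private FfiStrlenSketch() {}

    public static long nativeStrlen(String s) {
        // The confined arena frees the native copy of the string on close.
        try (Arena arena = Arena.ofConfined()) {
            MemorySegment cString = arena.allocateFrom(s); // NUL-terminated UTF-8
            return (long) STRLEN.invokeExact(cString);
        } catch (Throwable t) {
            throw new IllegalStateException("strlen downcall failed", t);
        }
    }
}
```

The symbol lookup and the function descriptor are all that Java needs to know about the entry point, in contrast to the per-entry-point knowledge the JVM needs today.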
I did an exercise to migrate all 3 libraries away from intrinsics and the results look promising: simdsort: https://github.com/openjdk/jdk/pull/22621 SVML/SLEEF: https://github.com/openjdk/jdk/pull/22619 As of now, java.lang.foreign lacks vector calling convention support, so the actual calls into SVML/SLEEF are still backed by intrinsics. But it still enables a major cleanup on JVM side. Also, I coded library headers and used jextract to produce initial library API sketch in Java and it worked really well. Eventually, it can be incorporated into JDK build process to ensure the consistency between native and Java parts of library API. Performance wise, it is on par with current (intrinsic-based) implementation. One open question relates to CPU dispatching. Each library exposes multiple functions with different requirements about CPU ISA extension support (e.g., no AVX vs AVX2 vs AVX512, NEON vs SVE). Right now, it's JVM responsibility, but once it gets out of the loop, the library itself should make the decision. I experimented with 2 approaches: (1) perform CPU dispatching with linking library from Java code (as illustrated in aforementioned PRs); or (2) call into native library to query it about the right entry point [1] [2] [3]. In both cases, it depends on additional API to sense the JVM/hardware capabilities (exposed on jdk.internal.misc.VM for now). Let me know if you have any questions/suggestions/concerns. Thanks! I plan to eventually start publishing PRs to upstream this work. 
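
To make approach (1) a bit more concrete, CPU dispatching on the Java side can be thought of as picking the first library variant whose ISA requirements are met. The sketch below only illustrates that idea and is not the code from the PRs: `CpuDispatchSketch`, `Variant`, `pickSymbol`, the feature strings and the symbol names are all hypothetical, and the set of CPU features is passed in rather than queried from `jdk.internal.misc.VM`.

```java
import java.util.List;
import java.util.Optional;
import java.util.Set;

public final class CpuDispatchSketch {
    private CpuDispatchSketch() {}

    /** One native entry point plus the CPU features it requires (hypothetical). */
    public record Variant(String symbol, Set<String> requiredFeatures) {}

    /**
     * Returns the symbol of the first variant whose requirements are all
     * present in cpuFeatures. Variants are expected in best-first order.
     */
    public static Optional<String> pickSymbol(List<Variant> variantsBestFirst,
                                              Set<String> cpuFeatures) {
        for (Variant v : variantsBestFirst) {
            if (cpuFeatures.containsAll(v.requiredFeatures())) {
                return Optional.of(v.symbol());
            }
        }
        return Optional.empty(); // no usable variant: caller falls back to Java
    }
}
```

The chosen symbol name would then be handed to the library lookup. With approach (2), the same decision would instead live inside the native library.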
Best regards,
Vladimir Ivanov

[1] https://github.com/openjdk/jdk/commit/b6e6f2e20772e86fbf9088bcef01391461c17f11

[2] https://github.com/iwanowww/jdk/blob/09234832b6419e54c4fc182e77f6214b36afa4c5/src/java.base/share/classes/java/util/SIMDSortLibrary.java

[3] https://github.com/iwanowww/jdk/blob/09234832b6419e54c4fc182e77f6214b36afa4c5/src/java.base/linux/native/libsimdsort/simdsort.c

From paul.sandoz at oracle.com Fri Dec 6 23:59:23 2024
From: paul.sandoz at oracle.com (Paul Sandoz)
Date: Fri, 6 Dec 2024 23:59:23 +0000
Subject: RFC: Untangle native libraries and the JVM: SVML, SLEEF, and libsimdsort
In-Reply-To: <1bf041a1-1002-4d18-85b3-15f437fa534e@oracle.com>
References: <1bf041a1-1002-4d18-85b3-15f437fa534e@oracle.com>
Message-ID:

Hi Vladimir,

Excellent work, very happy to see more of this moved to Java leveraging Panama features. The Java code looks very organized.

I am wondering if this technique can be applied to stubs dynamically generated by HotSpot via some sort of special library lookup e.g., for crypto.

Do you have a sense of the differences in static memory footprint and startup cost? Things I imagine Leyden could help with.

Regarding CPU dispatching, my preference would be to do it in Java. Less native logic. This may also be useful to help determine whether we can/should expose capabilities in the Vector API regarding what is optimally supported or not. I presume it also does not preclude some sort of jlink plugin that strips unused methods from the native libraries, something which may be trickier if done in the native library itself?

Paul.

> On Dec 6, 2024, at 3:18 PM, Vladimir Ivanov wrote:
>
> Recently, a trend emerged to use native libraries to back intrinsics in HotSpot JVM. SVML stubs for Vector API paved the road and it was soon followed by SLEEF and simdsort libraries.
>
> After examining their support, I must confess that it doesn't look pretty. It introduces significant accidental complexity on JVM side.
HotSpot has to be taught about every entry point in each library in an ad-hoc manner. It's inherently unsafe, error-prone to implement and hard to maintain: JVM makes a lot of assumptions about an entry point based solely on its symbolic name and each library has its own naming conventions. Overall, current approach doesn't scale well. > > Fortunately, new FFI API (java.lang.foreign) was finalized in 22. It provides enough functionality to interact with native libraries from Java in performant manner. > > I did an exercise to migrate all 3 libraries away from intrinsics and the results look promising: > > simdsort: https://github.com/openjdk/jdk/pull/22621 > > SVML/SLEEF: https://github.com/openjdk/jdk/pull/22619 > > As of now, java.lang.foreign lacks vector calling convention support, so the actual calls into SVML/SLEEF are still backed by intrinsics. But it still enables a major cleanup on JVM side. > > Also, I coded library headers and used jextract to produce initial library API sketch in Java and it worked really well. Eventually, it can be incorporated into JDK build process to ensure the consistency between native and Java parts of library API. > > Performance wise, it is on par with current (intrinsic-based) implementation. > > One open question relates to CPU dispatching. > > Each library exposes multiple functions with different requirements about CPU ISA extension support (e.g., no AVX vs AVX2 vs AVX512, NEON vs SVE). Right now, it's JVM responsibility, but once it gets out of the loop, the library itself should make the decision. I experimented with 2 approaches: (1) perform CPU dispatching with linking library from Java code (as illustrated in aforementioned PRs); or (2) call into native library to query it about the right entry point [1] [2] [3]. In both cases, it depends on additional API to sense the JVM/hardware capabilities (exposed on jdk.internal.misc.VM for now). > > Let me know if you have any questions/suggestions/concerns. Thanks! 
>
> I plan to eventually start publishing PRs to upstream this work.
>
> Best regards,
> Vladimir Ivanov
>
> [1] https://github.com/openjdk/jdk/commit/b6e6f2e20772e86fbf9088bcef01391461c17f11
>
> [2] https://github.com/iwanowww/jdk/blob/09234832b6419e54c4fc182e77f6214b36afa4c5/src/java.base/share/classes/java/util/SIMDSortLibrary.java
>
> [3] https://github.com/iwanowww/jdk/blob/09234832b6419e54c4fc182e77f6214b36afa4c5/src/java.base/linux/native/libsimdsort/simdsort.c
>

From vladimir.x.ivanov at oracle.com Sat Dec 7 00:48:52 2024
From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov)
Date: Fri, 6 Dec 2024 16:48:52 -0800
Subject: RFC: Untangle native libraries and the JVM: SVML, SLEEF, and libsimdsort
In-Reply-To:
References: <1bf041a1-1002-4d18-85b3-15f437fa534e@oracle.com>
Message-ID:

Thanks, Paul.

> Excellent work, very happy to see more of this moved to Java leveraging Panama features. The Java code looks very organized.
>
> I am wondering if this technique can be applied to stubs dynamically generated by HotSpot via some sort of special library lookup e.g., for crypto.

It's an interesting idea. A JVM could expose individual symbols, so they can be looked up, but a more promising approach is to just expose a table of generated stubs through a native call into JVM (similar to simdsort_link [1]).

The problematic part is that stubs don't have to obey the platform ABI. Some of them deliberately rely on very restrictive calling conventions (e.g., no caller-saved registers), so calling them from generated code is much simpler and cheaper.

In the longer term, custom calling conventions for each entry point can be coded if there's enough java.lang.foreign support present. (So, an entry point returned by the JVM consists of an entry address accompanied by an appropriate invoker.)

> Do you have a sense of the differences in static memory footprint and startup cost? Things I imagine Leyden could help with.

Are you asking about simdsort/SVML/SLEEF case here?
I didn't measure, but initialization costs will definitely be higher (compared to JVM-only solution). In absolute numbers it should be negligible though (the libraries expose a small number of entry points).

> Regarding CPU dispatching, my preference would be to do it in Java. Less native logic.

Fair enough. The nice thing about doing CPU dispatching on native library side is that all those cryptic naming conventions don't show up on Java side [2], but IMO it requires too much ceremony, so I kept it on Java side for now.

> This may also be useful to help determine whether we can/should expose capabilities in the Vector API regarding what is optimally supported or not.

IMO Vector API (as it is implemented now) would benefit from a higher-level C2-specific API.

> I presume it also does not preclude some sort of jlink plugin that strips unused methods from the native libraries, something which may be trickier if done in the native library itself?

Good point. It may be the case, but I don't have enough experience with native library stripping to comment on it.

Best regards,
Vladimir Ivanov

[1] https://github.com/openjdk/jdk/commit/b6e6f2e20772e86fbf9088bcef01391461c17f11

[2] https://github.com/iwanowww/jdk/blob/09234832b6419e54c4fc182e77f6214b36afa4c5/src/java.base/linux/native/libsimdsort/simdsort.c

>
> Paul.
>
>
>> On Dec 6, 2024, at 3:18 PM, Vladimir Ivanov wrote:
>>
>> Recently, a trend emerged to use native libraries to back intrinsics in HotSpot JVM. SVML stubs for Vector API paved the road and it was soon followed by SLEEF and simdsort libraries.
>>
>> After examining their support, I must confess that it doesn't look pretty. It introduces significant accidental complexity on JVM side. HotSpot has to be taught about every entry point in each library in an ad-hoc manner.
It's inherently unsafe, error-prone to implement and hard to maintain: JVM makes a lot of assumptions about an entry point based solely on its symbolic name and each library has its own naming conventions. Overall, current approach doesn't scale well. >> >> Fortunately, new FFI API (java.lang.foreign) was finalized in 22. It provides enough functionality to interact with native libraries from Java in performant manner. >> >> I did an exercise to migrate all 3 libraries away from intrinsics and the results look promising: >> >> simdsort: https://github.com/openjdk/jdk/pull/22621 >> >> SVML/SLEEF: https://github.com/openjdk/jdk/pull/22619 >> >> As of now, java.lang.foreign lacks vector calling convention support, so the actual calls into SVML/SLEEF are still backed by intrinsics. But it still enables a major cleanup on JVM side. >> >> Also, I coded library headers and used jextract to produce initial library API sketch in Java and it worked really well. Eventually, it can be incorporated into JDK build process to ensure the consistency between native and Java parts of library API. >> >> Performance wise, it is on par with current (intrinsic-based) implementation. >> >> One open question relates to CPU dispatching. >> >> Each library exposes multiple functions with different requirements about CPU ISA extension support (e.g., no AVX vs AVX2 vs AVX512, NEON vs SVE). Right now, it's JVM responsibility, but once it gets out of the loop, the library itself should make the decision. I experimented with 2 approaches: (1) perform CPU dispatching with linking library from Java code (as illustrated in aforementioned PRs); or (2) call into native library to query it about the right entry point [1] [2] [3]. In both cases, it depends on additional API to sense the JVM/hardware capabilities (exposed on jdk.internal.misc.VM for now). >> >> Let me know if you have any questions/suggestions/concerns. Thanks! >> >> I plan to eventually start publishing PRs to upstream this work. 
>> >> Best regards, >> Vladimir Ivanov >> >> [1] https://github.com/openjdk/jdk/commit/b6e6f2e20772e86fbf9088bcef01391461c17f11 >> >> [2] https://github.com/iwanowww/jdk/blob/09234832b6419e54c4fc182e77f6214b36afa4c5/src/java.base/share/classes/java/util/SIMDSortLibrary.java >> >> [3] https://github.com/iwanowww/jdk/blob/09234832b6419e54c4fc182e77f6214b36afa4c5/src/java.base/linux/native/libsimdsort/simdsort.c >> > From vlivanov at openjdk.org Sat Dec 7 01:54:52 2024 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Sat, 7 Dec 2024 01:54:52 GMT Subject: RFR: 8344068: Windows x86-64: Out of CodeBuffer space when generating final stubs In-Reply-To: References: Message-ID: <8ZDezt567MduRR3Cta4vaKihwt9aiz-JIPHe-MPZ8ak=.764954e8-a8e2-4ea1-abba-fb99c5bb1cca@github.com> On Tue, 3 Dec 2024 14:38:39 GMT, Andrew Haley wrote: > I've had a look at the difference between an Intel AVX-512 machine (which does run out of memory) and an AMD machine, and it seems to be that the AVX-512 stubs required by Windows really do take up a lot of space. This should be sufficient. Looks good. ------------- Marked as reviewed by vlivanov (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/22516#pullrequestreview-2486251790 From lmesnik at openjdk.org Sat Dec 7 03:40:11 2024 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Sat, 7 Dec 2024 03:40:11 GMT Subject: RFR: 8345746: Remove :resourcehogs/compiler from :hotspot_slow_compiler Message-ID: The test group :resourcehogs/compiler contains tests that should not be executed concurrently with *any* tests. They might use a lot of resources and cause unexplained sporadic failures of other tests. So it should be removed from :hotspot_slow_compiler. 
------------- Commit messages: - 8345746: Remove :resourcehogs/compiler from :hotspot_slow_compiler Changes: https://git.openjdk.org/jdk/pull/22626/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=22626&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8345746 Stats: 1 line in 1 file changed: 0 ins; 1 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/22626.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22626/head:pull/22626 PR: https://git.openjdk.org/jdk/pull/22626 From lmesnik at openjdk.org Sat Dec 7 04:19:18 2024 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Sat, 7 Dec 2024 04:19:18 GMT Subject: RFR: 8345746: Remove :resourcehogs/compiler from :hotspot_slow_compiler [v2] In-Reply-To: References: Message-ID: > The test group > :resourcehogs/compiler > contains tests that should not be executed concurrently with *any* tests. > > They might use a lot of resources and cause unexplained sporadic failures of other tests. > > So it should be removed from :hotspot_slow_compiler. Leonid Mesnik has updated the pull request incrementally with one additional commit since the last revision: hogs group added. 
------------- Changes: - all: https://git.openjdk.org/jdk/pull/22626/files - new: https://git.openjdk.org/jdk/pull/22626/files/9c9e3b83..ca3150d1 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22626&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22626&range=00-01 Stats: 3 lines in 1 file changed: 3 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/22626.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22626/head:pull/22626 PR: https://git.openjdk.org/jdk/pull/22626 From syan at openjdk.org Sat Dec 7 13:34:41 2024 From: syan at openjdk.org (SendaoYan) Date: Sat, 7 Dec 2024 13:34:41 GMT Subject: RFR: 8343763: Aarch64: Gtest codestrings.validate_vm intermittent fails extra addr In-Reply-To: References: Message-ID: On Thu, 7 Nov 2024 13:43:10 GMT, SendaoYan wrote: > Hi all, > The `Gtest codestrings.validate_vm` intermittently fails because the disassembly contains different symbol names, for example for the `adrp`/`b` instructions. I think the difference in symbol names is acceptable, so this PR removes the related symbol names to make the fragile comparison of disassembly output more robust. > The change has been verified locally: the gtest passed in all 20k runs, except that the subtest `ThreadsListHandle::sanity_vm` intermittently fails, which is already recorded as [JDK-8315141](https://bugs.openjdk.org/browse/JDK-8315141). Test-fix only, no risk. Hi, can anyone take a look at this PR, which removes the related symbol names to make the fragile comparison of disassembly output more robust?
------------- PR Comment: https://git.openjdk.org/jdk/pull/21955#issuecomment-2525115459 From kvn at openjdk.org Sun Dec 8 19:09:50 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Sun, 8 Dec 2024 19:09:50 GMT Subject: RFR: 8342651: Refactor array constant to use an array of jbyte [v5] In-Reply-To: References: Message-ID: On Fri, 6 Dec 2024 11:54:25 GMT, Quan Anh Mai wrote: >> Hi, >> >> This small patch refactors array constants in C2 to use an array of `jbyte`s instead of an array of `jvalue`. The former is much easier to work with and we can do `memcpy` with them trivially. >> >> Since code buffers support alignment of the constant section, I have also allowed constant tables to be aligned more than 8 bytes and used it for constant vectors on machines not supporting `SSE3`. I also fixed an issue with code buffer relocation where the temporary buffer is not correctly aligned. >> >> This patch is extracted from https://github.com/openjdk/jdk/pull/21229. Tests passed with `UseSSE=2` where 16-byte constants would be generated, as well as normal testing routines. >> >> Please take a look and leave your reviews, thanks a lot. > > Quan Anh Mai has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains seven additional commits since the last revision: > > - Merge branch 'master' into constanttable > - add comment to ConstantTable::alignment > - Merge branch 'master' into constanttable > - indentation > - Merge branch 'master' into constanttable > - Merge branch 'master' into constanttable > - refactor array constant, fix codebuffer reallocation My tier1-4 testing passed. ------------- Marked as reviewed by kvn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/21596#pullrequestreview-2487245490 From kvn at openjdk.org Sun Dec 8 19:11:41 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Sun, 8 Dec 2024 19:11:41 GMT Subject: RFR: 8345746: Remove :resourcehogs/compiler from :hotspot_slow_compiler [v2] In-Reply-To: References: Message-ID: <-SbhlJxWpaW1GOF4i-hSe7k7nUQ_KCRiqIFW5WnyPhs=.c1670b00-8d28-4810-a6ef-e42cc2b5136e@github.com> On Sat, 7 Dec 2024 04:19:18 GMT, Leonid Mesnik wrote: >> The test group >> :resourcehogs/compiler >> contains tests that should not be executed concurrently with *any* tests. >> >> They might use a lot of resources and cause unexplained sporadic failures of other tests. >> >> So it should be removed from :hotspot_slow_compiler. > > Leonid Mesnik has updated the pull request incrementally with one additional commit since the last revision: > > hogs group added. Marked as reviewed by kvn (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/22626#pullrequestreview-2487246566 From dholmes at openjdk.org Mon Dec 9 01:03:51 2024 From: dholmes at openjdk.org (David Holmes) Date: Mon, 9 Dec 2024 01:03:51 GMT Subject: RFR: 8345700: tier{1, 2, 3}_compiler don't cover all compiler tests In-Reply-To: References: Message-ID: On Fri, 6 Dec 2024 20:10:08 GMT, Leonid Mesnik wrote: > Hi, > could you please review following fix that update > tier3_compiler test group > so :hotspot_compiler is always a sum of > tier1_compiler, tier2_compiler, tier3_compiler > This is natural splitting of tests into 3 layers. > The fix is done to execution :hotspot_compiler into 3 tiers with corresponding group names. > > New tests in tier3: > 9 tests in compiler/ccp/ > 22 test in ompiler/predicates/ > 8 tests in compiler/splitif/ > So the number is not increased significantly. > > The new group > ` tier2_ctw` > was introduced for ctw testing. > It is different from tests in hotspot_compiler, and usually executed separately, So I added it as separate sub-test of tier2. 
> > I moved id from tier3 to tier2 to correspond current tiers of Hotspot CI in our system. > If anyone thinks it should be part of tier3 - we can discuss in PR comments how do deal with it. > > @shipilev, could you please take a look and check if these changes are ok to you since you the author of tier2/tier3. @lmesnik so this is now going to run all the slow tests in tier3? ------------- PR Comment: https://git.openjdk.org/jdk/pull/22614#issuecomment-2526569595 From syan at openjdk.org Mon Dec 9 01:43:49 2024 From: syan at openjdk.org (SendaoYan) Date: Mon, 9 Dec 2024 01:43:49 GMT Subject: RFR: 8345698: Remove tier1_compiler_not_xcomp from github actions In-Reply-To: References: Message-ID: On Fri, 6 Dec 2024 17:32:09 GMT, Leonid Mesnik wrote: > The fix for > https://bugs.openjdk.org/browse/JDK-8345435 > delete tier1_compiler_not_xcomp group > but don't remove corresponding testing from github actions. Marked as reviewed by syan (Committer). ------------- PR Review: https://git.openjdk.org/jdk/pull/22612#pullrequestreview-2487439416 From liach at openjdk.org Mon Dec 9 01:49:52 2024 From: liach at openjdk.org (Chen Liang) Date: Mon, 9 Dec 2024 01:49:52 GMT Subject: RFR: 8345698: Remove tier1_compiler_not_xcomp from github actions In-Reply-To: References: Message-ID: <5KSBwycg5B1BhZ3ep1JEdnixOqbgCROmqPKG2YjKByc=.0e26a213-6117-4928-a487-1592cd06ba59@github.com> On Fri, 6 Dec 2024 17:32:09 GMT, Leonid Mesnik wrote: > The fix for > https://bugs.openjdk.org/browse/JDK-8345435 > delete tier1_compiler_not_xcomp group > but don't remove corresponding testing from github actions. Marked as reviewed by liach (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/22612#pullrequestreview-2487448526 From lmesnik at openjdk.org Mon Dec 9 02:22:40 2024 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Mon, 9 Dec 2024 02:22:40 GMT Subject: RFR: 8345700: tier{1, 2, 3}_compiler don't cover all compiler tests In-Reply-To: References: Message-ID: On Fri, 6 Dec 2024 20:10:08 GMT, Leonid Mesnik wrote: > Hi, > could you please review following fix that update > tier3_compiler test group > so :hotspot_compiler is always a sum of > tier1_compiler, tier2_compiler, tier3_compiler > This is natural splitting of tests into 3 layers. > The fix is done to execution :hotspot_compiler into 3 tiers with corresponding group names. > > New tests in tier3: > 9 tests in compiler/ccp/ > 22 test in ompiler/predicates/ > 8 tests in compiler/splitif/ > So the number is not increased significantly. > > The new group > ` tier2_ctw` > was introduced for ctw testing. > It is different from tests in hotspot_compiler, and usually executed separately, So I added it as separate sub-test of tier2. > > I moved id from tier3 to tier2 to correspond current tiers of Hotspot CI in our system. > If anyone thinks it should be part of tier3 - we can discuss in PR comments how do deal with it. > > @shipilev, could you please take a look and check if these changes are ok to you since you the author of tier2/tier3. The :hotspot_slow_compiler is already in tier3. Only a few groups have been added, as I wrote in the description. They are not slow. The problem is that currently, for the hotspot compiler, tier1 + tier2 + tier3 do not cover all tests and there is no tier4_compiler.
------------- PR Comment: https://git.openjdk.org/jdk/pull/22614#issuecomment-2526662694 From lmesnik at openjdk.org Mon Dec 9 02:46:48 2024 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Mon, 9 Dec 2024 02:46:48 GMT Subject: Integrated: 8345698: Remove tier1_compiler_not_xcomp from github actions In-Reply-To: References: Message-ID: On Fri, 6 Dec 2024 17:32:09 GMT, Leonid Mesnik wrote: > The fix for > https://bugs.openjdk.org/browse/JDK-8345435 > delete tier1_compiler_not_xcomp group > but don't remove corresponding testing from github actions. This pull request has now been integrated. Changeset: 842b3638 Author: Leonid Mesnik URL: https://git.openjdk.org/jdk/commit/842b3638794973a3eae920eb898782b280e99589 Stats: 6 lines in 1 file changed: 0 ins; 5 del; 1 mod 8345698: Remove tier1_compiler_not_xcomp from github actions Reviewed-by: syan, liach ------------- PR: https://git.openjdk.org/jdk/pull/22612 From dholmes at openjdk.org Mon Dec 9 04:53:37 2024 From: dholmes at openjdk.org (David Holmes) Date: Mon, 9 Dec 2024 04:53:37 GMT Subject: RFR: 8345700: tier{1, 2, 3}_compiler don't cover all compiler tests In-Reply-To: References: Message-ID: On Mon, 9 Dec 2024 02:19:38 GMT, Leonid Mesnik wrote: > The :hotspot_slow_compiler is already in tier3. Sorry I misread things. ------------- PR Comment: https://git.openjdk.org/jdk/pull/22614#issuecomment-2526886559 From dholmes at openjdk.org Mon Dec 9 05:02:37 2024 From: dholmes at openjdk.org (David Holmes) Date: Mon, 9 Dec 2024 05:02:37 GMT Subject: RFR: 8345700: tier{1, 2, 3}_compiler don't cover all compiler tests In-Reply-To: References: Message-ID: On Fri, 6 Dec 2024 20:10:08 GMT, Leonid Mesnik wrote: > Hi, > could you please review following fix that update > tier3_compiler test group > so :hotspot_compiler is always a sum of > tier1_compiler, tier2_compiler, tier3_compiler > This is natural splitting of tests into 3 layers. 
> The fix is done to execution :hotspot_compiler into 3 tiers with corresponding group names. > > New tests in tier3: > 9 tests in compiler/ccp/ > 22 test in ompiler/predicates/ > 8 tests in compiler/splitif/ > So the number is not increased significantly. > > The new group > ` tier2_ctw` > was introduced for ctw testing. > It is different from tests in hotspot_compiler, and usually executed separately, So I added it as separate sub-test of tier2. > > I moved id from tier3 to tier2 to correspond current tiers of Hotspot CI in our system. > If anyone thinks it should be part of tier3 - we can discuss in PR comments how do deal with it. > > @shipilev, could you please take a look and check if these changes are ok to you since you the author of tier2/tier3. I don't think the ctw tests should be moved up to tier2. IIUC these are quite heavyweight tests and they will run too often in tier2 compared to tier3. ------------- Changes requested by dholmes (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/22614#pullrequestreview-2487731782 From lmesnik at openjdk.org Mon Dec 9 05:20:36 2024 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Mon, 9 Dec 2024 05:20:36 GMT Subject: RFR: 8345700: tier{1, 2, 3}_compiler don't cover all compiler tests In-Reply-To: References: Message-ID: On Mon, 9 Dec 2024 04:59:53 GMT, David Holmes wrote: > I don't think the ctw tests should be moved up to tier2. IIUC these are quite heavyweight tests and they will run too often in tier2 compared to tier3. Sorry, it was a typo in description: `I moved id from tier3 to tier2 to correspond current tiers of Hotspot CI in our system. ` updated to `I moved ctw from tier3 to tier2 to correspond current tiers of Hotspot CI in our system. ` So fix just align ctw tier to current state. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/22614#issuecomment-2526933897 From dholmes at openjdk.org Mon Dec 9 05:30:37 2024 From: dholmes at openjdk.org (David Holmes) Date: Mon, 9 Dec 2024 05:30:37 GMT Subject: RFR: 8345700: tier{1, 2, 3}_compiler don't cover all compiler tests In-Reply-To: References: Message-ID: On Fri, 6 Dec 2024 20:10:08 GMT, Leonid Mesnik wrote: > Hi, > could you please review following fix that update > tier3_compiler test group > so :hotspot_compiler is always a sum of > tier1_compiler, tier2_compiler, tier3_compiler > This is natural splitting of tests into 3 layers. > The fix is done to execution :hotspot_compiler into 3 tiers with corresponding group names. > > New tests in tier3: > 9 tests in compiler/ccp/ > 22 test in ompiler/predicates/ > 8 tests in compiler/splitif/ > So the number is not increased significantly. > > The new group > ` tier2_ctw` > was introduced for ctw testing. > It is different from tests in hotspot_compiler, and usually executed separately, So I added it as separate sub-test of tier2. > > I moved ctw from tier3 to tier2 to correspond current tiers of Hotspot CI in our system. > If anyone thinks it should be part of tier3 - we can discuss in PR comments how do deal with it. > > @shipilev, could you please take a look and check if these changes are ok to you since you the author of tier2/tier3. Okay - from Oracle perspective changes seem fine as they align with our existing testing. Need to see how this impacts others. ------------- Marked as reviewed by dholmes (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/22614#pullrequestreview-2487768942 From duke at openjdk.org Mon Dec 9 07:12:40 2024 From: duke at openjdk.org (Daniel Skantz) Date: Mon, 9 Dec 2024 07:12:40 GMT Subject: RFR: 8345156: C2: Add bailouts next to a few asserts In-Reply-To: References: Message-ID: On Mon, 2 Dec 2024 13:07:57 GMT, Daniel Skantz wrote: > This patch associates product bailouts with a few existing debug asserts. 
The criteria are that there have been product bugs associated with failing these asserts in the past, and that there are not too many callers to the method where the compilation is now cancelled or any measurable impact on compilation time. > > Testing: T1-T4 and extra testing. Tested compilation time with -XX:+CITime on performance benchmarks and the effect was not measurable. Thanks for the reviews! ------------- PR Comment: https://git.openjdk.org/jdk/pull/22482#issuecomment-2527112543 From duke at openjdk.org Mon Dec 9 07:12:41 2024 From: duke at openjdk.org (duke) Date: Mon, 9 Dec 2024 07:12:41 GMT Subject: RFR: 8345156: C2: Add bailouts next to a few asserts In-Reply-To: References: Message-ID: <8Hd0XNuIlCVZJwqfdV94WnHGvXAgCF2_8kgKyTQWCCM=.ada7ba33-3c58-4bd3-b8a4-331529289a3e@github.com> On Mon, 2 Dec 2024 13:07:57 GMT, Daniel Skantz wrote: > This patch associates product bailouts with a few existing debug asserts. The criteria are that there have been product bugs associated with failing these asserts in the past, and that there are not too many callers to the method where the compilation is now cancelled or any measurable impact on compilation time. > > Testing: T1-T4 and extra testing. Tested compilation time with -XX:+CITime on performance benchmarks and the effect was not measurable. @danielogh Your change (at version 2ab24738f2d9ced8143b9a5711dc3707a11f6bd1) is now ready to be sponsored by a Committer. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/22482#issuecomment-2527113206 From duke at openjdk.org Mon Dec 9 07:55:41 2024 From: duke at openjdk.org (theoweidmannoracle) Date: Mon, 9 Dec 2024 07:55:41 GMT Subject: RFR: 8332268: C2: Add missing optimizations for UDivI/L and UModI/L and unify the shared logic with the signed nodes [v17] In-Reply-To: References: Message-ID: On Wed, 20 Nov 2024 11:35:59 GMT, Tobias Hartmann wrote: >> theoweidmannoracle has updated the pull request incrementally with two additional commits since the last revision: >> >> - Update UDivINodeIdealizationTests.java >> - Remove more magic numbers > > Looks good to me otherwise. Nice tests! @TobiHartmann Would you also like to take a final look before integrating this? ------------- PR Comment: https://git.openjdk.org/jdk/pull/22061#issuecomment-2527182826 From chagedorn at openjdk.org Mon Dec 9 08:06:41 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 9 Dec 2024 08:06:41 GMT Subject: RFR: 8332268: C2: Add missing optimizations for UDivI/L and UModI/L and unify the shared logic with the signed nodes [v17] In-Reply-To: <9AO7AsQv3puIVURjKB7wvQCWcMO2ZG4gpUJxpxTLghw=.28527eeb-1ec3-42cf-b714-ccd5f5541e71@github.com> References: <9AO7AsQv3puIVURjKB7wvQCWcMO2ZG4gpUJxpxTLghw=.28527eeb-1ec3-42cf-b714-ccd5f5541e71@github.com> Message-ID: On Mon, 2 Dec 2024 16:38:03 GMT, theoweidmannoracle wrote: >> This PR introduces >> - several new optimizations to unsigned division and modulo >> - x % 1, x % x, x % 2^k >> - x / 1, x / x, x / 2^k >> - does not implement the Granlund and Montgomery algorithm, which has been implemented for signed modulo division in the past. It is unclear if a lot is to be gained by implementing this. 
>> - tests to test existing optimizations for signed division and modulo >> - does not test the Granlund and Montgomery algorithm directly > > theoweidmannoracle has updated the pull request incrementally with two additional commits since the last revision: > > - Update UDivINodeIdealizationTests.java > - Remove more magic numbers Hi Theo, I've just realized that this PR now misses to block rewiring `UDiv/ModI/L` when removing dominating checks. This should be added here: https://github.com/openjdk/jdk/blob/69e664de14d1f9d66447937d494da8bf971ac5fe/src/hotspot/share/opto/phaseX.cpp#L1731-L1749 The original report assumed that this was already possible with mainline. But only with adding the missing optimizations for the unsigned versions with this PR, we should now be able to trigger the same SIGFPE failures as tested here for the signed divisions/modulos: test/hotspot/jtreg/compiler/splitif/TestSplitDivisionThroughPhi.java You should add unsigned variants for the same tests to verify the fix. Thanks, Christian ------------- Changes requested by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/22061#pullrequestreview-2488034728 From duke at openjdk.org Mon Dec 9 08:17:28 2024 From: duke at openjdk.org (theoweidmannoracle) Date: Mon, 9 Dec 2024 08:17:28 GMT Subject: RFR: 8319850: PrintInlining should print which methods are late inlines [v23] In-Reply-To: References: Message-ID: > In https://github.com/openjdk/jdk/pull/16595 @caojoshua previously suggested changes to indicate which calls were inlined late, when printing inlines. This PR re-introduces the changes from the previously closed PR and fixes a minor issue where asserts were triggered. > > Concerns were raised by @rwestrel in the previous PR: > >> When InlineTree::ok_to_inline() is called, some diagnostic message is recorded for the call site. Do I understand right that with this patch, if the call is inlined late, then that message is dropped and replaced by a new "late inline.." message? 
If that's the case, isn't it the case that sometimes the InlineTree::ok_to_inline() has some useful information that's lost when late inlining happens? > > As already pointed out in the PR by @caojoshua, this does not matter for string/methodhandle/vector/boxing late inlines, as they are [only performed if ok_to_inline() returns true](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/doCall.cpp#L189). This is also the only call to ok_to_inline(). > > The only other location, where late inline call generators are created, are calls to CallGenerator::for_late_inline_virtual(), which creates a LateInlineVirtualCallGenerator. LateInlineVirtualCallGenerator (introduced in https://github.com/openjdk/jdk/pull/1550) does not really perform inlining but rather performs strength reduction from virtual to static calls. As can be verified by running the according test `test/hotspot/jtreg/compiler/c2/irTests/TestPostParseCallDevirtualization.java`, this does not affect the printing for inlining: > > > 5022 1026 3 compiler.c2.irTests.TestPostParseCallDevirtualization::callHelper (7 bytes) made not entrant > @ 1 compiler.c2.irTests.TestPostParseCallDevirtualization$I::method (0 bytes) failed to inline: virtual call > > > Thus, as far as I can tell, the proposed changes by @caojoshua do not lose any useful information about why no inlining prior to the late inlining occurred. theoweidmannoracle has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 29 commits: - Merge branch 'master' into 8319850 - Fix tests - Merge branch 'master' into 8319850 - Add missing header - Update memory management and use treap - Fix style - Derecursify locate - Merge branch 'master' into 8319850 - Change comment style - Change is_enabled to old pattern - ... 
and 19 more: https://git.openjdk.org/jdk/compare/69e664de...9fc53d31 ------------- Changes: https://git.openjdk.org/jdk/pull/21899/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21899&range=22 Stats: 948 lines in 17 files changed: 452 ins; 287 del; 209 mod Patch: https://git.openjdk.org/jdk/pull/21899.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21899/head:pull/21899 PR: https://git.openjdk.org/jdk/pull/21899 From duke at openjdk.org Mon Dec 9 08:17:28 2024 From: duke at openjdk.org (theoweidmannoracle) Date: Mon, 9 Dec 2024 08:17:28 GMT Subject: RFR: 8319850: PrintInlining should print which methods are late inlines [v14] In-Reply-To: References: <2lTjFEZlOOSdOTreg-wci6MSFvUE5N_mie2uf2OYuT4=.cc792d85-81c7-432f-b0bb-6465844785d8@github.com> Message-ID: On Mon, 25 Nov 2024 21:07:40 GMT, Johan Sjölen wrote: >>> Do you think I should introduce an explicit synchronization mechanism to ensure the formatting is still correct with multiple compile threads? >> >> Yes, we could try grabbing the tty lock in dump(), but in the past I think there were sometimes problems with that approach, which is why there were places where we print everything to a stringStream first. > >> > Do you think I should introduce an explicit synchronization mechanism to ensure the formatting is still correct with multiple compile threads? >> >> Yes, we could try grabbing the tty lock in dump(), but in the past I think there were sometimes problems with that approach, which is why there were places where we print everything to a stringStream first. > > Don't use `ttyLock`, we really want to get rid of that mechanism. The best would be to port the output to UL, but if that's not possible use a `stringStream` as Dean said. @jdksjolen Could you take a look at the new memory management?
------------- PR Comment: https://git.openjdk.org/jdk/pull/21899#issuecomment-2527221145 From shade at openjdk.org Mon Dec 9 08:20:37 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 9 Dec 2024 08:20:37 GMT Subject: RFR: 8345700: tier{1, 2, 3}_compiler don't cover all compiler tests In-Reply-To: References: Message-ID: On Fri, 6 Dec 2024 20:10:08 GMT, Leonid Mesnik wrote: > Hi, > could you please review following fix that update > tier3_compiler test group > so :hotspot_compiler is always a sum of > tier1_compiler, tier2_compiler, tier3_compiler > This is natural splitting of tests into 3 layers. > The fix is done to execution :hotspot_compiler into 3 tiers with corresponding group names. > > New tests in tier3: > 9 tests in compiler/ccp/ > 22 test in ompiler/predicates/ > 8 tests in compiler/splitif/ > So the number is not increased significantly. > > The new group > ` tier2_ctw` > was introduced for ctw testing. > It is different from tests in hotspot_compiler, and usually executed separately, So I added it as separate sub-test of tier2. > > I moved ctw from tier3 to tier2 to correspond current tiers of Hotspot CI in our system. > If anyone thinks it should be part of tier3 - we can discuss in PR comments how do deal with it. > > @shipilev, could you please take a look and check if these changes are ok to you since you the author of tier2/tier3. A good test for these changes is looking at GHA usages and see how well the tests are balanced between the groups. See for example here: https://github.com/lmesnik/jdk/actions/runs/12205347403/usage. I am not sure if this is pre-existing or not, but it looks like `hs/tier1 compiler part 3` is twice as long as `part 1` or `part 2`. We would ideally like all parts to take roughly the same time. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/22614#issuecomment-2527231121 From rcastanedalo at openjdk.org Mon Dec 9 08:21:43 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Mon, 9 Dec 2024 08:21:43 GMT Subject: RFR: 8345287: C2: live in computation is broken In-Reply-To: References: <66KJ0kayCyRztG81OvFyladp1BbDq5nW810HKEZPmlk=.51449889-9e47-408e-9751-470e1f912113@github.com> <-JM7PaO43IeAwzF30ordG0ERn-ViXxsVfQtE0M7RkgQ=.9fb476a0-b8a9-4618-8aaa-5ceebef181dd@github.com> Message-ID: On Thu, 5 Dec 2024 08:47:31 GMT, Roberto Castañeda Lozano wrote: > I will do a new benchmark run of jdk-24+26 with default configuration vs. jdk-24+26 with `-XX:-OptoRegScheduling` on x64 and report the results in a few days (likely early next week). Turning off `OptoRegScheduling` (with `-XX:-OptoRegScheduling`) seems to have a slight overall negative effect on Renaissance (would require a more thorough analysis to confirm) and no measurable effect on DaCapo and SPECjvm2008 except for a slight improvement (around 1%) in crypto.signverify for older Coffee Lake-B machines. I agree with Vladimir that it would be worth further investigating (separately) whether to disable (or improve) `OptoRegScheduling`. ------------- PR Comment: https://git.openjdk.org/jdk/pull/22473#issuecomment-2527233803 From epeter at openjdk.org Mon Dec 9 08:21:45 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 9 Dec 2024 08:21:45 GMT Subject: RFR: 8310691: [REDO] [vectorapi] Refactor VectorShuffle implementation [v6] In-Reply-To: <6YeHbHlrPoV9obcfC7g1PtFdMKfanhnsKxqwTZovts4=.f5743961-07e2-4325-b8f5-270edd342e11@github.com> References: <6YeHbHlrPoV9obcfC7g1PtFdMKfanhnsKxqwTZovts4=.f5743961-07e2-4325-b8f5-270edd342e11@github.com> Message-ID: On Fri, 6 Dec 2024 17:00:15 GMT, Quan Anh Mai wrote: >> Hi, >> >> This is just a redo of https://github.com/openjdk/jdk/pull/13093, mostly just the revert of the backout.
>> >> Regarding the related issues: >> >> - [JDK-8306008](https://bugs.openjdk.org/browse/JDK-8306008) and [JDK-8309531](https://bugs.openjdk.org/browse/JDK-8309531) have been fixed before the backout. >> - [JDK-8309373](https://bugs.openjdk.org/browse/JDK-8309373) was due to missing `ForceInline` on `AbstractVector::toBitsVectorTemplate` >> - [JDK-8306592](https://bugs.openjdk.org/browse/JDK-8306592), I have not been able to find the root causes. I'm not sure if this is a blocker, now I cannot even build x86-32 tests. >> >> Finally, I moved some implementation of public methods and methods that call into intrinsics to the concrete class as that may help the compiler know the correct types of the variables. >> >> Please take a look and leave reviews. Thanks a lot. >> >> The description of the original PR: >> >> This patch reimplements `VectorShuffle` implementations to be a vector of the bit type. Currently, `VectorShuffle` is stored as a byte array, and would be expanded upon usage. This poses several drawbacks: >> >> Inefficient conversions between a shuffle and its corresponding vector. This hinders the performance when the shuffle indices are not constant and are loaded or computed dynamically. >> Redundant expansions in `rearrange` operations. On all platforms, it seems that a shuffle index vector is always expanded to the correct type before executing the `rearrange` operations. >> Some redundant intrinsics are needed to support this handling as well as special considerations in the C2 compiler. >> Range checks are performed using `VectorShuffle::toVector`, which is inefficient for FP types since both FP conversions and FP comparisons are more expensive than the integral ones. 
>> Upon these changes, a `rearrange` can emit more efficient code: >> >> var species = IntVector.SPECIES_128; >> var v1 = IntVector.fromArray(species, SRC1, 0); >> var v2 = IntVector.fromArray(species, SRC2, 0); >> v1.rearrange(v2.toShuffle()).intoArray(DST, 0); >> >> Before: >> movabs $0x751589fa8,%r10 ; {oop([I{0x0000000751589fa8})} >> vmovdqu 0x10(%r10),%xmm2 >> movabs $0x7515a0d08,%r10 ; {oop([I{0x00000007515a0d08})} >> vmovdqu 0x10(%r10),%xmm1 >> movabs $0x75158afb8,%r10 ; {oop([I{0x000000075158afb8})} >> vmovdqu 0x10(%r10),%xmm0 >> vpand -0xddc12(%rip),%xmm0,%xmm0 # Stub::vector_int_to_byt... > > Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: > > add comment, extract cast into local variable @merykitty looks quite reasonable, though I only looked at the VM changes, only scanned the Java library code. src/hotspot/cpu/x86/x86.ad line 2215: > 2213: > 2214: // Return true if Vector::rearrange needs preparation of the shuffle argument > 2215: bool Matcher::vector_needs_load_shuffle(BasicType elem_bt, int vlen) { I commented on this before. This needs to have a more expressive name. Is it just about rearrange? Because now it sounds like maybe all vectors may need a shuffle. Or just all loads? Confusing. src/hotspot/share/opto/vectorIntrinsics.cpp line 1757: > 1755: > 1756: int num_elem = vlen->get_con(); > 1757: bool need_load_shuffle = Matcher::vector_needs_load_shuffle(shuffle_bt, num_elem); Maybe it could be renamed to `vector_rearrange_requires_load_shuffle`? src/hotspot/share/opto/vectorIntrinsics.cpp line 1800: > 1798: Node* v1 = unbox_vector(argument(5), vbox_type, elem_bt, num_elem); > 1799: Node* shuffle = unbox_vector(argument(6), shbox_type, shuffle_bt, num_elem); > 1800: const TypeVect* vt = TypeVect::make(elem_bt, num_elem); Is this one used? src/hotspot/share/opto/vectornode.hpp line 1694: > 1692: // we can transform the rearrange into a different element type. 
For example, on x86 before AVX512, > 1693: // there is no rearrange instruction for short elements, what we will then do is to transform the > 1694: // shuffle vector into 1 that we can do byte rearrange such that it would provide the same result. Suggestion: // shuffle vector into one that we can do byte rearrange such that it would provide the same result. src/hotspot/share/opto/vectornode.hpp line 1696: > 1694: // shuffle vector into 1 that we can do byte rearrange such that it would provide the same result. > 1695: // This can be done in VectorRearrangeNode during code emission but we eagerly expand out this > 1696: // because it is often the case that an index vector is reused in many rearrange operations. Thanks for this explanation! `This can be done in VectorRearrangeNode` -> **could have** be done, because we now don't do it, right? What do you mean by `expand out this`? Do you mean we have a separate dedicated node, so that it could possibly fold away with other nodes, such as index vector? ------------- Changes requested by epeter (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/21042#pullrequestreview-2487987908 PR Review Comment: https://git.openjdk.org/jdk/pull/21042#discussion_r1875503753 PR Review Comment: https://git.openjdk.org/jdk/pull/21042#discussion_r1875517727 PR Review Comment: https://git.openjdk.org/jdk/pull/21042#discussion_r1875519134 PR Review Comment: https://git.openjdk.org/jdk/pull/21042#discussion_r1875537761 PR Review Comment: https://git.openjdk.org/jdk/pull/21042#discussion_r1875540023 From epeter at openjdk.org Mon Dec 9 08:21:45 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 9 Dec 2024 08:21:45 GMT Subject: RFR: 8310691: [REDO] [vectorapi] Refactor VectorShuffle implementation [v6] In-Reply-To: References: <6YeHbHlrPoV9obcfC7g1PtFdMKfanhnsKxqwTZovts4=.f5743961-07e2-4325-b8f5-270edd342e11@github.com> Message-ID: On Mon, 9 Dec 2024 07:37:55 GMT, Emanuel Peter wrote: >> Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: >> >> add comment, extract cast into local variable > > src/hotspot/cpu/x86/x86.ad line 2215: > >> 2213: >> 2214: // Return true if Vector::rearrange needs preparation of the shuffle argument >> 2215: bool Matcher::vector_needs_load_shuffle(BasicType elem_bt, int vlen) { > > I commented on this before. This needs to have a more expressive name. Is it just about rearrange? Because now it sounds like maybe all vectors may need a shuffle. Or just all loads? Confusing. 
Maybe it could be named `vector_rearrange_requires_load_shuffle` ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21042#discussion_r1875542286 From jbhateja at openjdk.org Mon Dec 9 09:10:38 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Mon, 9 Dec 2024 09:10:38 GMT Subject: RFR: 8344068: Windows x86-64: Out of CodeBuffer space when generating final stubs In-Reply-To: <6NGReIL8Iw5X-1ZyUTGpA-SL7SysYVWNIu4XLyzb3ag=.dd8cefad-d1f4-4f54-90dc-f7427623bab0@github.com> References: <6NGReIL8Iw5X-1ZyUTGpA-SL7SysYVWNIu4XLyzb3ag=.dd8cefad-d1f4-4f54-90dc-f7427623bab0@github.com> Message-ID: On Thu, 5 Dec 2024 16:29:57 GMT, Vladimir Kozlov wrote: >> I spent a while reading the dumps of final stubs. I can account for 10k additional for XMM stubs in final on Windows. This takes it to the edge. >> >> It may well be that this is really a ZGC thing, but with ZGC on Linux we get nowhere near the limit. I think there is about 20k difference between Linux and Windows with ZGC enabled. I can't account for all of it, though. > > Okay, thank you for details. FTR, Following are the final stub section sizes with and without -XX:+UseZGC on linux and windows Linux - With ZGC [0.928s][info][stubs] StubRoutines (final stubs) [0x00007fb1afc320c0, 0x00007fb1afc3e550] used: 41659, free: 8661 - Without ZGC - [0.019s][info][stubs] StubRoutines (final stubs) [0x00007f5e6fc2cc40, 0x00007f5e6fc390d0] used: 27419, free: 22901 Windows: - With ZGC - [0.043s][info][stubs] StubRoutines (final stubs) [0x00000291ea5684c0, 0x00000291ea575120] used: 45870, free: 6450 - Without ZGC - [0.039s][info][stubs] StubRoutines (final stubs) [0x000002003a1e31c0, 0x000002003a1efe20] used: 30030, free: 22290 Hi @theRealAph , it will really help if you could kindly share the details to justify this increase. 
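As a rough cross-check of the log lines above, the blob size implied by each line is `used + free`, and the extra space taken with ZGC enabled is the difference between the two `used` figures on the same platform. The following is only an illustrative sketch: the class and method names are made up for this note and are not part of HotSpot, and the numbers are copied from the log output above.

```java
public class FinalStubSizes {
    // Total size in bytes of the final-stubs blob implied by one log line.
    static int capacity(int used, int free) {
        return used + free;
    }

    // Additional bytes used on one platform when -XX:+UseZGC is enabled.
    static int zgcOverhead(int usedWithZgc, int usedWithoutZgc) {
        return usedWithZgc - usedWithoutZgc;
    }

    public static void main(String[] args) {
        // Figures taken verbatim from the [info][stubs] lines quoted above.
        System.out.println("Linux   capacity (ZGC):    " + capacity(41659, 8661));
        System.out.println("Linux   capacity (no ZGC): " + capacity(27419, 22901));
        System.out.println("Windows capacity (ZGC):    " + capacity(45870, 6450));
        System.out.println("Windows capacity (no ZGC): " + capacity(30030, 22290));
        System.out.println("Linux   ZGC overhead:      " + zgcOverhead(41659, 27419));
        System.out.println("Windows ZGC overhead:      " + zgcOverhead(45870, 30030));
    }
}
```

With these inputs, both Linux lines add up to 50320 bytes and both Windows lines to 52320 bytes, which also matches the address ranges printed in the log. The ZGC-enabled runs use 14240 more bytes on Linux and 15840 more on Windows.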
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22516#discussion_r1875613334 From aph at openjdk.org Mon Dec 9 11:08:44 2024 From: aph at openjdk.org (Andrew Haley) Date: Mon, 9 Dec 2024 11:08:44 GMT Subject: Integrated: 8344068: Windows x86-64: Out of CodeBuffer space when generating final stubs In-Reply-To: References: Message-ID: On Tue, 3 Dec 2024 14:38:39 GMT, Andrew Haley wrote: > I've had a look at the difference between an Intel AVX-512 machine (which does run out of memory) and an AMD machine, and it seems to be that the AVX-512 stubs required by Windows really do take up a lot of space. This should be sufficient. This pull request has now been integrated. Changeset: 830173fc Author: Andrew Haley URL: https://git.openjdk.org/jdk/commit/830173fcb08b004ea3932d47cb522c589feec0b5 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod 8344068: Windows x86-64: Out of CodeBuffer space when generating final stubs Reviewed-by: kvn, vlivanov ------------- PR: https://git.openjdk.org/jdk/pull/22516 From aph at openjdk.org Mon Dec 9 11:08:43 2024 From: aph at openjdk.org (Andrew Haley) Date: Mon, 9 Dec 2024 11:08:43 GMT Subject: RFR: 8344068: Windows x86-64: Out of CodeBuffer space when generating final stubs In-Reply-To: References: <6NGReIL8Iw5X-1ZyUTGpA-SL7SysYVWNIu4XLyzb3ag=.dd8cefad-d1f4-4f54-90dc-f7427623bab0@github.com> Message-ID: On Mon, 9 Dec 2024 09:05:44 GMT, Jatin Bhateja wrote: >> Okay, thank you for details. 
> > FTR, > Following are the final stub section sizes with and without -XX:+UseZGC on linux and windows > > > Linux > - With ZGC > [0.928s][info][stubs] StubRoutines (final stubs) [0x00007fb1afc320c0, 0x00007fb1afc3e550] used: 41659, free: 8661 > - Without ZGC > - [0.019s][info][stubs] StubRoutines (final stubs) [0x00007f5e6fc2cc40, 0x00007f5e6fc390d0] used: 27419, free: 22901 > > > > Windows: > - With ZGC > - [0.043s][info][stubs] StubRoutines (final stubs) [0x00000291ea5684c0, 0x00000291ea575120] used: 45870, free: 6450 > - Without ZGC > - [0.039s][info][stubs] StubRoutines (final stubs) [0x000002003a1e31c0, 0x000002003a1efe20] used: 30030, free: 22290 > > > Hi @theRealAph , it will really help if you could kindly share the details to justify this increase. We're seeing only 500 bytes remaining on AVX-512 machines, and I have not checked it all (there's a *lot* of code) but the samples I looked at were all much bigger because of AVX-512 stubs. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22516#discussion_r1875787230 From maurizio.cimadamore at oracle.com Mon Dec 9 11:42:20 2024 From: maurizio.cimadamore at oracle.com (Maurizio Cimadamore) Date: Mon, 9 Dec 2024 11:42:20 +0000 Subject: RFC: Untangle native libraries and the JVM: SVML, SLEEF, and libsimdsort In-Reply-To: <1bf041a1-1002-4d18-85b3-15f437fa534e@oracle.com> References: <1bf041a1-1002-4d18-85b3-15f437fa534e@oracle.com> Message-ID: <6af9e206-4c97-4d54-9bc4-aac67308e0c1@oracle.com> Great work Vlad! The simdsort part seems a more "classic" FFM binding - where you have a method handle per entry point. That seems to fit the design of FFM rather well. In the second case (SVML/SLEEF) usage of FFM is limited to build a "table of entry points" (e.g. we're just using SymbolLookup + MemorySegment here -- the invocation part is intrinsified as part of the new VectorSupport methods). If it helps, it might be possible to define a custom (JDK internal) family of value layouts for vector types. 
Then we could enhance the Linker classification to support such layouts. This means you could call into native functions with vector parameters and return types using the Linker API more directly. Not sure if it will give you the same performance, but it's also an approach worth exploring. Re. support for custom calling conventions to call into hotspot stubs from Java, this might be possible - our story for supporting calling conventions other than the system calling convention is that there should be a dedicated linker instance per calling convention. So, if the JVM defines its own calling convention for its stubs there should probably be a custom Linker implementation that is used to call into such stubs - which uses the machinery in the Linker implementation (e.g. Bindings) to classify the incoming function descriptors and determine the shuffle sequence for a given particular call. This should all be doable (at least inside the JDK) - it's just a matter of "writing more code". I agree with Paul that, as we move more stuff to use Panama, we will need to look more at the avenues available to us to claim back some of the additional warm up cost introduced by the use of var/method handles. This is probably part of a bigger exploration on warmup and FFM. Cheers Maurizio On 06/12/2024 23:18, Vladimir Ivanov wrote: > Recently, a trend emerged to use native libraries to back intrinsics > in HotSpot JVM. SVML stubs for Vector API paved the road and it was > soon followed by SLEEF and simdsort libraries. > > After examining their support, I must confess that it doesn't look > pretty. It introduces significant accidental complexity on JVM side. > HotSpot has to be taught about every entry point in each library in an > ad-hoc manner. It's inherently unsafe, error-prone to implement and > hard to maintain: JVM makes a lot of assumptions about an entry point > based solely on its symbolic name and each library has its own naming > conventions. 
Overall, current approach doesn't scale well. > > Fortunately, new FFI API (java.lang.foreign) was finalized in 22. It > provides enough functionality to interact with native libraries from > Java in performant manner. > > I did an exercise to migrate all 3 libraries away from intrinsics and > the results look promising: > > - simdsort: https://github.com/openjdk/jdk/pull/22621 > > - SVML/SLEEF: https://github.com/openjdk/jdk/pull/22619 > > As of now, java.lang.foreign lacks vector calling convention support, > so the actual calls into SVML/SLEEF are still backed by intrinsics. > But it still enables a major cleanup on JVM side. > > Also, I coded library headers and used jextract to produce initial > library API sketch in Java and it worked really well. Eventually, it > can be incorporated into JDK build process to ensure the consistency > between native and Java parts of library API. > > Performance wise, it is on par with current (intrinsic-based) > implementation. > > One open question relates to CPU dispatching. > > Each library exposes multiple functions with different requirements > about CPU ISA extension support (e.g., no AVX vs AVX2 vs AVX512, NEON > vs SVE). Right now, it's JVM responsibility, but once it gets out of > the loop, the library itself should make the decision. I experimented > with 2 approaches: (1) perform CPU dispatching with linking library > from Java code (as illustrated in aforementioned PRs); or (2) call > into native library to query it about the right entry point [1] [2] > [3]. In both cases, it depends on additional API to sense the > JVM/hardware capabilities (exposed on jdk.internal.misc.VM for now). > > Let me know if you have any questions/suggestions/concerns. Thanks! > > I plan to eventually start publishing PRs to upstream this work. 
> > Best regards, > Vladimir Ivanov > > [1] > https://github.com/openjdk/jdk/commit/b6e6f2e20772e86fbf9088bcef01391461c17f11 > > [2] > https://github.com/iwanowww/jdk/blob/09234832b6419e54c4fc182e77f6214b36afa4c5/src/java.base/share/classes/java/util/SIMDSortLibrary.java > > [3] > https://github.com/iwanowww/jdk/blob/09234832b6419e54c4fc182e77f6214b36afa4c5/src/java.base/linux/native/libsimdsort/simdsort.c > From ihse at openjdk.org Mon Dec 9 12:12:13 2024 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Mon, 9 Dec 2024 12:12:13 GMT Subject: RFR: 8345793: Update copyright year to 2024 for the build system in files where it was missed Message-ID: Some files have been modified in 2024, but the copyright year has not been properly updated. This should be fixed. I have located these modified files using: git log --since="Jan 1" --name-only --pretty=format: | sort -u > file.list and then run a script to update the copyright year to 2024 on these files. I have made a manual sampling of files in the list to verify that they have indeed been modified in 2024. ------------- Commit messages: - 8345793: Update copyright year to 2024 for the build system in files where it was missed Changes: https://git.openjdk.org/jdk/pull/22636/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=22636&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8345793 Stats: 71 lines in 71 files changed: 0 ins; 0 del; 71 mod Patch: https://git.openjdk.org/jdk/pull/22636.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22636/head:pull/22636 PR: https://git.openjdk.org/jdk/pull/22636 From duke at openjdk.org Mon Dec 9 13:28:11 2024 From: duke at openjdk.org (theoweidmannoracle) Date: Mon, 9 Dec 2024 13:28:11 GMT Subject: RFR: 8319850: PrintInlining should print which methods are late inlines [v24] In-Reply-To: References: Message-ID: > In https://github.com/openjdk/jdk/pull/16595 @caojoshua previously suggested changes to indicate which calls were inlined late, when printing inlines. 
This PR re-introduces the changes from the previously closed PR and fixes a minor issue where asserts were triggered. > > Concerns were raised by @rwestrel in the previous PR: > >> When InlineTree::ok_to_inline() is called, some diagnostic message is recorded for the call site. Do I understand right that with this patch, if the call is inlined late, then that message is dropped and replaced by a new "late inline.." message? If that's the case, isn't it the case that sometimes the InlineTree::ok_to_inline() has some useful information that's lost when late inlining happens? > > As already pointed out in the PR by @caojoshua, this does not matter for string/methodhandle/vector/boxing late inlines, as they are [only performed if ok_to_inline() returns true](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/doCall.cpp#L189). This is also the only call to ok_to_inline(). > > The only other location, where late inline call generators are created, are calls to CallGenerator::for_late_inline_virtual(), which creates a LateInlineVirtualCallGenerator. LateInlineVirtualCallGenerator (introduced in https://github.com/openjdk/jdk/pull/1550) does not really perform inlining but rather performs strength reduction from virtual to static calls. As can be verified by running the according test `test/hotspot/jtreg/compiler/c2/irTests/TestPostParseCallDevirtualization.java`, this does not affect the printing for inlining: > > > 5022 1026 3 compiler.c2.irTests.TestPostParseCallDevirtualization::callHelper (7 bytes) made not entrant > @ 1 compiler.c2.irTests.TestPostParseCallDevirtualization$I::method (0 bytes) failed to inline: virtual call > > > Thus, as far as I can tell, the proposed changes by @caojoshua do not lose any useful information about why no inlining prior to the late inlining occurred. 
theoweidmannoracle has updated the pull request incrementally with two additional commits since the last revision: - Update TestSplitDivisionThroughPhi.java - Add UDivI/L and UModI/L to no_dependent_zero_check ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21899/files - new: https://git.openjdk.org/jdk/pull/21899/files/9fc53d31..7526baff Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21899&range=23 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21899&range=22-23 Stats: 128 lines in 2 files changed: 126 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/21899.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21899/head:pull/21899 PR: https://git.openjdk.org/jdk/pull/21899 From qamai at openjdk.org Mon Dec 9 13:33:11 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Mon, 9 Dec 2024 13:33:11 GMT Subject: RFR: 8310691: [REDO] [vectorapi] Refactor VectorShuffle implementation [v7] In-Reply-To: References: Message-ID: > Hi, > > This is just a redo of https://github.com/openjdk/jdk/pull/13093. mostly just the revert of the backout. > > Regarding the related issues: > > - [JDK-8306008](https://bugs.openjdk.org/browse/JDK-8306008) and [JDK-8309531](https://bugs.openjdk.org/browse/JDK-8309531) have been fixed before the backout. > - [JDK-8309373](https://bugs.openjdk.org/browse/JDK-8309373) was due to missing `ForceInline` on `AbstractVector::toBitsVectorTemplate` > - [JDK-8306592](https://bugs.openjdk.org/browse/JDK-8306592), I have not been able to find the root causes. I'm not sure if this is a blocker, now I cannot even build x86-32 tests. > > Finally, I moved some implementation of public methods and methods that call into intrinsics to the concrete class as that may help the compiler know the correct types of the variables. > > Please take a look and leave reviews. Thanks a lot. > > The description of the original PR: > > This patch reimplements `VectorShuffle` implementations to be a vector of the bit type. 
Currently, `VectorShuffle` is stored as a byte array, and would be expanded upon usage. This poses several drawbacks: > > Inefficient conversions between a shuffle and its corresponding vector. This hinders the performance when the shuffle indices are not constant and are loaded or computed dynamically. > Redundant expansions in `rearrange` operations. On all platforms, it seems that a shuffle index vector is always expanded to the correct type before executing the `rearrange` operations. > Some redundant intrinsics are needed to support this handling as well as special considerations in the C2 compiler. > Range checks are performed using `VectorShuffle::toVector`, which is inefficient for FP types since both FP conversions and FP comparisons are more expensive than the integral ones. > Upon these changes, a `rearrange` can emit more efficient code: > > var species = IntVector.SPECIES_128; > var v1 = IntVector.fromArray(species, SRC1, 0); > var v2 = IntVector.fromArray(species, SRC2, 0); > v1.rearrange(v2.toShuffle()).intoArray(DST, 0); > > Before: > movabs $0x751589fa8,%r10 ; {oop([I{0x0000000751589fa8})} > vmovdqu 0x10(%r10),%xmm2 > movabs $0x7515a0d08,%r10 ; {oop([I{0x00000007515a0d08})} > vmovdqu 0x10(%r10),%xmm1 > movabs $0x75158afb8,%r10 ; {oop([I{0x000000075158afb8})} > vmovdqu 0x10(%r10),%xmm0 > vpand -0xddc12(%rip),%xmm0,%xmm0 # Stub::vector_int_to_byte_mask > ; {ex... 
Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: address reviews ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21042/files - new: https://git.openjdk.org/jdk/pull/21042/files/a2f59007..85208df1 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21042&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21042&range=05-06 Stats: 16 lines in 10 files changed: 1 ins; 1 del; 14 mod Patch: https://git.openjdk.org/jdk/pull/21042.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21042/head:pull/21042 PR: https://git.openjdk.org/jdk/pull/21042 From qamai at openjdk.org Mon Dec 9 13:33:12 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Mon, 9 Dec 2024 13:33:12 GMT Subject: RFR: 8310691: [REDO] [vectorapi] Refactor VectorShuffle implementation [v6] In-Reply-To: References: <6YeHbHlrPoV9obcfC7g1PtFdMKfanhnsKxqwTZovts4=.f5743961-07e2-4325-b8f5-270edd342e11@github.com> Message-ID: On Mon, 9 Dec 2024 08:10:20 GMT, Emanuel Peter wrote: >> Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: >> >> add comment, extract cast into local variable > > src/hotspot/share/opto/vectornode.hpp line 1696: > >> 1694: // shuffle vector into 1 that we can do byte rearrange such that it would provide the same result. >> 1695: // This can be done in VectorRearrangeNode during code emission but we eagerly expand out this >> 1696: // because it is often the case that an index vector is reused in many rearrange operations. > > Thanks for this explanation! > > `This can be done in VectorRearrangeNode` -> **could have** be done, because we now don't do it, right? > What do you mean by `expand out this`? Do you mean we have a separate dedicated node, so that it could possibly fold away with other nodes, such as index vector? Thanks for the reviews, I have added another comment sentence. 
Basically, by separating into a dedicated node, the preparation of the index vector can be GVN-ed across multiple rearrange operations as well as hoisted out of loops if the index vector is a loop invariant. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21042#discussion_r1875985223 From jbhateja at openjdk.org Mon Dec 9 13:42:40 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Mon, 9 Dec 2024 13:42:40 GMT Subject: RFR: 8310691: [REDO] [vectorapi] Refactor VectorShuffle implementation [v2] In-Reply-To: References: Message-ID: On Fri, 6 Dec 2024 17:22:33 GMT, Paul Sandoz wrote: >> @sviswa7 @PaulSandoz @eme64 @jatin-bhateja Thanks for taking a look, I have merged the PR with a more recent master and resolved the semantic difference with newly added intrinsics, too. > > @merykitty do you want me to initiate tier 1 to 3 tests? > @PaulSandoz @sviswa7 Thanks for your advice, I have made the PR ready for review > > @iwanowww Could you take another look at this, please? > > @jatin-bhateja Could you verify that [JDK-8309373](https://bugs.openjdk.org/browse/JDK-8309373) does not occur? Hi @merykitty, I am in the process of reviewing. It's a big change, so please allow me a couple of days. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21042#issuecomment-2527985893 From epeter at openjdk.org Mon Dec 9 13:49:43 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 9 Dec 2024 13:49:43 GMT Subject: RFR: 8310691: [REDO] [vectorapi] Refactor VectorShuffle implementation [v7] In-Reply-To: References: Message-ID: On Mon, 9 Dec 2024 13:33:11 GMT, Quan Anh Mai wrote: >> Hi, >> >> This is just a redo of https://github.com/openjdk/jdk/pull/13093. mostly just the revert of the backout. >> >> Regarding the related issues: >> >> - [JDK-8306008](https://bugs.openjdk.org/browse/JDK-8306008) and [JDK-8309531](https://bugs.openjdk.org/browse/JDK-8309531) have been fixed before the backout. 
>> - [JDK-8309373](https://bugs.openjdk.org/browse/JDK-8309373) was due to missing `ForceInline` on `AbstractVector::toBitsVectorTemplate` >> - [JDK-8306592](https://bugs.openjdk.org/browse/JDK-8306592), I have not been able to find the root causes. I'm not sure if this is a blocker, now I cannot even build x86-32 tests. >> >> Finally, I moved some implementation of public methods and methods that call into intrinsics to the concrete class as that may help the compiler know the correct types of the variables. >> >> Please take a look and leave reviews. Thanks a lot. >> >> The description of the original PR: >> >> This patch reimplements `VectorShuffle` implementations to be a vector of the bit type. Currently, `VectorShuffle` is stored as a byte array, and would be expanded upon usage. This poses several drawbacks: >> >> Inefficient conversions between a shuffle and its corresponding vector. This hinders the performance when the shuffle indices are not constant and are loaded or computed dynamically. >> Redundant expansions in `rearrange` operations. On all platforms, it seems that a shuffle index vector is always expanded to the correct type before executing the `rearrange` operations. >> Some redundant intrinsics are needed to support this handling as well as special considerations in the C2 compiler. >> Range checks are performed using `VectorShuffle::toVector`, which is inefficient for FP types since both FP conversions and FP comparisons are more expensive than the integral ones. 
>> Upon these changes, a `rearrange` can emit more efficient code: >> >> var species = IntVector.SPECIES_128; >> var v1 = IntVector.fromArray(species, SRC1, 0); >> var v2 = IntVector.fromArray(species, SRC2, 0); >> v1.rearrange(v2.toShuffle()).intoArray(DST, 0); >> >> Before: >> movabs $0x751589fa8,%r10 ; {oop([I{0x0000000751589fa8})} >> vmovdqu 0x10(%r10),%xmm2 >> movabs $0x7515a0d08,%r10 ; {oop([I{0x00000007515a0d08})} >> vmovdqu 0x10(%r10),%xmm1 >> movabs $0x75158afb8,%r10 ; {oop([I{0x000000075158afb8})} >> vmovdqu 0x10(%r10),%xmm0 >> vpand -0xddc12(%rip),%xmm0,%xmm0 # Stub::vector_int_to_byt... > > Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: > > address reviews Ok, looks better. I've thought about this a little. And I am wondering if we cannot make the use of Rearrange generally easier. What if we want to use the `VectorRearrangeNode` elsewhere? One would assume that one could just check `arch_supports_vector(Op_VectorRearrange, ...elem_bt)` And then one could emit a `VectorRearrangeNode` for the given `elem_bt`. But that is not the case, the user would have to also check for `Matcher::vector_rearrange_requires_load_shuffle(shuffle_bt, num_elem)`, and possibly add a `VectorLoadShuffleNode`. I don't like this - it makes the use quite cumbersome. I have an idea for an alternative: You add a `VectorRearrangeNode::Ideal`, which then transforms itself if we have `Matcher::vector_rearrange_requires_load_shuffle`. I have not thought this through to the end, but it just seems it would be easier to use Rearrange in the future this way. But maybe we know that we will never use Rearrange in any other way, then I'm fine with this implementation. What do you think @PaulSandoz ? src/hotspot/share/opto/vectornode.hpp line 1696: > 1694: // shuffle vector into one that we can do byte rearrange such that it would provide the same > 1695: // result. 
This could have been done in VectorRearrangeNode during code emission but we eagerly > 1696: // expand out this because it is often the case that an index vector is reused in many rearrange Suggestion: // expand this out because it is often the case that an index vector is reused in many rearrange ------------- Changes requested by epeter (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21042#pullrequestreview-2488817071 PR Review Comment: https://git.openjdk.org/jdk/pull/21042#discussion_r1875989393 From qamai at openjdk.org Mon Dec 9 13:58:41 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Mon, 9 Dec 2024 13:58:41 GMT Subject: RFR: 8310691: [REDO] [vectorapi] Refactor VectorShuffle implementation [v7] In-Reply-To: References: Message-ID: <-V1kImS-457WoqS_2Qc7jqY4_iAUo-3-PXheVLc_T84=.64706c04-289b-4cf2-9cfc-b95fbd2656bb@github.com> On Mon, 9 Dec 2024 13:47:10 GMT, Emanuel Peter wrote: >> Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: >> >> address reviews > > Ok, looks better. > > I've thought about this a little. And I am wondering if we cannot make the use of Rearrange generally easier. > > What if we want to use the `VectorRearrangeNode` elsewhere? > One would assume that one could just check > `arch_supports_vector(Op_VectorRearrange, ...elem_bt)` > And then one could emit a `VectorRearrangeNode` for the given `elem_bt`. But that is not the case, the user would have to also check for `Matcher::vector_rearrange_requires_load_shuffle(shuffle_bt, num_elem)`, and possibly add a `VectorLoadShuffleNode`. > I don't like this - it makes the use quite cumbersome. > > I have an idea for an alternative: > You add a `VectorRearrangeNode::Ideal`, which then transforms itself if we have `Matcher::vector_rearrange_requires_load_shuffle`. > > I have not thought this through to the end, but it just seems it would be easier to use Rearrange in the future this way. 
But maybe we know that we will never use Rearrange in any other way, then I'm fine with this implementation. What do you think @PaulSandoz ? @eme64 Yes I have thought about that. My idea is that once phase lowering is ready we will move the expansion there (#21599) . This removes the need to have a standalone method that checks if `LoadShuffleNode` is needed. The current situation is that `VectorRearrangeNode` is expected to come with `VectorLoadShuffleNode` so you cannot easily work with `VectorRearrangeNode`, either. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21042#issuecomment-2528023981 From epeter at openjdk.org Mon Dec 9 14:01:43 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 9 Dec 2024 14:01:43 GMT Subject: RFR: 8310691: [REDO] [vectorapi] Refactor VectorShuffle implementation [v7] In-Reply-To: <-V1kImS-457WoqS_2Qc7jqY4_iAUo-3-PXheVLc_T84=.64706c04-289b-4cf2-9cfc-b95fbd2656bb@github.com> References: <-V1kImS-457WoqS_2Qc7jqY4_iAUo-3-PXheVLc_T84=.64706c04-289b-4cf2-9cfc-b95fbd2656bb@github.com> Message-ID: On Mon, 9 Dec 2024 13:55:46 GMT, Quan Anh Mai wrote: >> Ok, looks better. >> >> I've thought about this a little. And I am wondering if we cannot make the use of Rearrange generally easier. >> >> What if we want to use the `VectorRearrangeNode` elsewhere? >> One would assume that one could just check >> `arch_supports_vector(Op_VectorRearrange, ...elem_bt)` >> And then one could emit a `VectorRearrangeNode` for the given `elem_bt`. But that is not the case, the user would have to also check for `Matcher::vector_rearrange_requires_load_shuffle(shuffle_bt, num_elem)`, and possibly add a `VectorLoadShuffleNode`. >> I don't like this - it makes the use quite cumbersome. >> >> I have an idea for an alternative: >> You add a `VectorRearrangeNode::Ideal`, which then transforms itself if we have `Matcher::vector_rearrange_requires_load_shuffle`. 
>> >> I have not thought this through to the end, but it just seems it would be easier to use Rearrange in the future this way. But maybe we know that we will never use Rearrange in any other way, then I'm fine with this implementation. What do you think @PaulSandoz ? > > @eme64 Yes I have thought about that. My idea is that once phase lowering is ready we will move the expansion there (#21599) . This removes the need to have a standalone method that checks if `LoadShuffleNode` is needed. The current situation is that `VectorRearrangeNode` is expected to come with `VectorLoadShuffleNode` so you cannot easily work with `VectorRearrangeNode`, either. @merykitty Ok. Is there a chance we could wait for that additional phase to arrive then, and only do this refactor here afterward? I'd also be ok with a follow up RFE - it would just have to be filed and be clear who will take care of it ;) ------------- PR Comment: https://git.openjdk.org/jdk/pull/21042#issuecomment-2528038627 From qamai at openjdk.org Mon Dec 9 14:12:19 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Mon, 9 Dec 2024 14:12:19 GMT Subject: RFR: 8310691: [REDO] [vectorapi] Refactor VectorShuffle implementation [v8] In-Reply-To: References: Message-ID: > Hi, > > This is just a redo of https://github.com/openjdk/jdk/pull/13093. mostly just the revert of the backout. > > Regarding the related issues: > > - [JDK-8306008](https://bugs.openjdk.org/browse/JDK-8306008) and [JDK-8309531](https://bugs.openjdk.org/browse/JDK-8309531) have been fixed before the backout. > - [JDK-8309373](https://bugs.openjdk.org/browse/JDK-8309373) was due to missing `ForceInline` on `AbstractVector::toBitsVectorTemplate` > - [JDK-8306592](https://bugs.openjdk.org/browse/JDK-8306592), I have not been able to find the root causes. I'm not sure if this is a blocker, now I cannot even build x86-32 tests. 
> > Finally, I moved some implementation of public methods and methods that call into intrinsics to the concrete class as that may help the compiler know the correct types of the variables. > > Please take a look and leave reviews. Thanks a lot. > > The description of the original PR: > > This patch reimplements `VectorShuffle` implementations to be a vector of the bit type. Currently, `VectorShuffle` is stored as a byte array, and would be expanded upon usage. This poses several drawbacks: > > Inefficient conversions between a shuffle and its corresponding vector. This hinders the performance when the shuffle indices are not constant and are loaded or computed dynamically. > Redundant expansions in `rearrange` operations. On all platforms, it seems that a shuffle index vector is always expanded to the correct type before executing the `rearrange` operations. > Some redundant intrinsics are needed to support this handling as well as special considerations in the C2 compiler. > Range checks are performed using `VectorShuffle::toVector`, which is inefficient for FP types since both FP conversions and FP comparisons are more expensive than the integral ones. > Upon these changes, a `rearrange` can emit more efficient code: > > var species = IntVector.SPECIES_128; > var v1 = IntVector.fromArray(species, SRC1, 0); > var v2 = IntVector.fromArray(species, SRC2, 0); > v1.rearrange(v2.toShuffle()).intoArray(DST, 0); > > Before: > movabs $0x751589fa8,%r10 ; {oop([I{0x0000000751589fa8})} > vmovdqu 0x10(%r10),%xmm2 > movabs $0x7515a0d08,%r10 ; {oop([I{0x00000007515a0d08})} > vmovdqu 0x10(%r10),%xmm1 > movabs $0x75158afb8,%r10 ; {oop([I{0x000000075158afb8})} > vmovdqu 0x10(%r10),%xmm0 > vpand -0xddc12(%rip),%xmm0,%xmm0 # Stub::vector_int_to_byte_mask > ; {ex... 
Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: adverb order ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21042/files - new: https://git.openjdk.org/jdk/pull/21042/files/85208df1..5cfac5ad Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21042&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21042&range=06-07 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/21042.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21042/head:pull/21042 PR: https://git.openjdk.org/jdk/pull/21042 From qamai at openjdk.org Mon Dec 9 14:17:41 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Mon, 9 Dec 2024 14:17:41 GMT Subject: RFR: 8310691: [REDO] [vectorapi] Refactor VectorShuffle implementation [v7] In-Reply-To: References: <-V1kImS-457WoqS_2Qc7jqY4_iAUo-3-PXheVLc_T84=.64706c04-289b-4cf2-9cfc-b95fbd2656bb@github.com> Message-ID: On Mon, 9 Dec 2024 13:59:29 GMT, Emanuel Peter wrote: >> @eme64 Yes I have thought about that. My idea is that once phase lowering is ready we will move the expansion there (#21599) . This removes the need to have a standalone method that checks if `LoadShuffleNode` is needed. The current situation is that `VectorRearrangeNode` is expected to come with `VectorLoadShuffleNode` so you cannot easily work with `VectorRearrangeNode`, either. > > @merykitty Ok. Is there a chance we could wait for that additional phase to arrive then, and only do this refactor here afterward? I'd also be ok with a follow up RFE - it would just have to be filed and be clear who will take care of it ;) @eme64 I think this PR is orthogonal to the concern you are having. Either we need to refactor the expansion to lowering, then modify this PR to match the semantics, or we integrate this PR first, then do the lowering refactor on the resulting code. 
I would prefer the latter then I can take the task to move `VectorLoadShuffle` creation to lowering afterwards. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21042#issuecomment-2528080973 From qamai at openjdk.org Mon Dec 9 14:24:49 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Mon, 9 Dec 2024 14:24:49 GMT Subject: RFR: 8342662: C2: Add new phase for backend-specific lowering [v3] In-Reply-To: References: Message-ID: On Wed, 30 Oct 2024 04:48:00 GMT, Jasmine Karthikeyan wrote: >> Jasmine Karthikeyan has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: >> >> - Merge branch 'master' into phase-lowering >> - Remove platform-dependent node definitions, rework PhaseLowering implementation >> - Address some changes from code review >> - Implement PhaseLowering > > Thanks everyone for the discussion. I've pushed a commit that restructures the pass, removing the backend-specific node definition and making the pass extend `PhaseIterGVN` so that nodes can do further idealizations during lowering without complicating the main lowering switch. I also added a shared component to lowering, to facilitate moving transforms that impact multiple backends like `DivMod` to it. Lowering is also now the final phase before final graph reshaping now, since late inlines could also use IGVN. Some more comments: > >> It looks attractive at first, but the downside is subsequent passes may start to require platform-specific code as well (e.g., think of final graph reshaping which operates on Node opcodes). > > This makes sense to me. I agree that the extra complexity required to deal with this change in other parts of the code isn't worth it. The new commit removes this part of the changeset. > >> BTW it's not clear to me now what particular benefits IGVN brings. 
`DivMod` transformation doesn't use IGVN and after examining `MacroLogicV` code it can be rewritten to avoid IGVN as well. > > The main benefits are being able to reuse node hashing to de-duplicate redundant nodes and being able to use the existing IGVN types that were calculated (which #21244 uses). Some examples where GVN could be useful in final graph reshaping is when reshaping shift nodes and `Op_CmpUL`, where new nodes are created to approximate existing nodes on platforms without support. While I think it is unlikely that any of the created nodes would common with existing nodes except the `ConNode`s, I think it would be nice to reduce the possibility of redundant nodes in the graph before matching. This would include `DivMod` in the cases where the backend doesn't support the `DivMod` node, as multiplication and subtraction is emitted instead. I'm working on refactoring these cases in my example patch. I think it would be nice to make lowering where these platform specific optimizations occur while final graph reshaping focuses on preparing the graph for matching. > >> I'd say that if we want the lowering pass being discussed to be truly scalable, it's better to follow the same pattern. I have some doubts that platform-specific ad-hoc IR tweaks scale will scale well. > > My main concern with the macro-expansion style is that with the proposed transforms unconditional expansion/lowering of nodes isn't always possible. For example, In final graph reshaping for `DivMod` it can be the case ... @jaskarth FYI we have another use case for `PhaseLowering`. Currently, `VectorRearrangeNode` is always created with a corresponding `VectorLoadShuffle`. You can find out more details in my added comments on `VectorLoadShuffleNode` in #21042. The idea is that we can move that expansion to lowering instead, allowing the `VectorLoadShuffleNode` to be GVN-ed and scheduled independently while also helping earlier phases to have easier time working with `VectorRearrangeNode`. 
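To make the motivation above a little more concrete, here is a minimal scalar sketch in plain Java. It is not how C2 represents these nodes. `loadShuffle` and `rearrange` are hypothetical stand-ins for `VectorLoadShuffleNode` and `VectorRearrangeNode`, and the sketch assumes the load-shuffle step amounts to converting the stored indices into the form the rearrange consumes. The point is only that once the conversion is a separate step, it can be computed once and shared by several rearranges that use the same shuffle.

```java
import java.util.Arrays;

public class RearrangeSketch {
    // Hypothetical stand-in for the "load shuffle" step:
    // widen the stored byte indices to the lane width used by the rearrange.
    public static int[] loadShuffle(byte[] indices) {
        int[] wide = new int[indices.length];
        for (int i = 0; i < indices.length; i++) {
            wide[i] = indices[i];
        }
        return wide;
    }

    // Hypothetical stand-in for the rearrange itself:
    // lane i of the result takes lane shuffle[i] of the source.
    public static int[] rearrange(int[] src, int[] shuffle) {
        int[] dst = new int[src.length];
        for (int i = 0; i < src.length; i++) {
            dst[i] = src[shuffle[i]];
        }
        return dst;
    }

    public static void main(String[] args) {
        byte[] stored = {3, 2, 1, 0};
        // The expansion is done once ...
        int[] wide = loadShuffle(stored);
        // ... and reused by two independent rearranges.
        System.out.println(Arrays.toString(rearrange(new int[] {10, 20, 30, 40}, wide)));
        System.out.println(Arrays.toString(rearrange(new int[] {1, 2, 3, 4}, wide)));
    }
}
```

If the expansion were fused into each rearrange instead, both calls in `main` would repeat the widening loop, which is the kind of redundancy that a separately visible node would let GVN remove.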
------------- PR Comment: https://git.openjdk.org/jdk/pull/21599#issuecomment-2528097803 From epeter at openjdk.org Mon Dec 9 14:41:41 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 9 Dec 2024 14:41:41 GMT Subject: RFR: 8310691: [REDO] [vectorapi] Refactor VectorShuffle implementation [v7] In-Reply-To: References: <-V1kImS-457WoqS_2Qc7jqY4_iAUo-3-PXheVLc_T84=.64706c04-289b-4cf2-9cfc-b95fbd2656bb@github.com> Message-ID: On Mon, 9 Dec 2024 14:15:16 GMT, Quan Anh Mai wrote: >> @merykitty Ok. Is there a chance we could wait for that additional phase to arrive then, and only do this refactor here afterward? I'd also be ok with a follow up RFE - it would just have to be filed and be clear who will take care of it ;) > > @eme64 I think this PR is orthogonal to the concern you are having. Either we need to refactor the expansion to lowering, then modify this PR to match the semantics, or we integrate this PR first, then do the lowering refactor on the resulting code. I would prefer the latter then I can take the task to move `VectorLoadShuffle` creation to lowering afterwards. @merykitty > we integrate this PR first, then do the lowering refactor on the resulting code That is fine with me - I just need to see follow-up RFE filed, and know that it will be taken care of by someone ;) ------------- PR Comment: https://git.openjdk.org/jdk/pull/21042#issuecomment-2528141393 From epeter at openjdk.org Mon Dec 9 14:46:46 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 9 Dec 2024 14:46:46 GMT Subject: RFR: 8342662: C2: Add new phase for backend-specific lowering [v3] In-Reply-To: References: Message-ID: On Wed, 30 Oct 2024 04:48:00 GMT, Jasmine Karthikeyan wrote: >> Jasmine Karthikeyan has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains four additional commits since the last revision: >> >> - Merge branch 'master' into phase-lowering >> - Remove platform-dependent node definitions, rework PhaseLowering implementation >> - Address some changes from code review >> - Implement PhaseLowering > > Thanks everyone for the discussion. I've pushed a commit that restructures the pass, removing the backend-specific node definition and making the pass extend `PhaseIterGVN` so that nodes can do further idealizations during lowering without complicating the main lowering switch. I also added a shared component to lowering, to facilitate moving transforms that impact multiple backends like `DivMod` to it. Lowering is also now the final phase before final graph reshaping now, since late inlines could also use IGVN. Some more comments: > >> It looks attractive at first, but the downside is subsequent passes may start to require platform-specific code as well (e.g., think of final graph reshaping which operates on Node opcodes). > > This makes sense to me. I agree that the extra complexity required to deal with this change in other parts of the code isn't worth it. The new commit removes this part of the changeset. > >> BTW it's not clear to me now what particular benefits IGVN brings. `DivMod` transformation doesn't use IGVN and after examining `MacroLogicV` code it can be rewritten to avoid IGVN as well. > > The main benefits are being able to reuse node hashing to de-duplicate redundant nodes and being able to use the existing IGVN types that were calculated (which #21244 uses). Some examples where GVN could be useful in final graph reshaping is when reshaping shift nodes and `Op_CmpUL`, where new nodes are created to approximate existing nodes on platforms without support. While I think it is unlikely that any of the created nodes would common with existing nodes except the `ConNode`s, I think it would be nice to reduce the possibility of redundant nodes in the graph before matching. 
This would include `DivMod` in the cases where the backend doesn't support the `DivMod` node, as multiplication and subtraction is emitted instead. I'm working on refactoring these cases in my example patch. I think it would be nice to make lowering where these platform specific optimizations occur while final graph reshaping focuses on preparing the graph for matching. > >> I'd say that if we want the lowering pass being discussed to be truly scalable, it's better to follow the same pattern. I have some doubts that platform-specific ad-hoc IR tweaks scale will scale well. > > My main concern with the macro-expansion style is that with the proposed transforms unconditional expansion/lowering of nodes isn't always possible. For example, In final graph reshaping for `DivMod` it can be the case ... @jaskarth could this also take care of vector reductions eventually? Not sure if it fully applies the same everywhere, but it seems we are redoing all sorts of recursive folding in the backend. That is especially annoying for adding new primitives, such as prefix-sum or other scans - everything would have to be implemented in the backend for all platforms. @jatin-bhateja what do you think? ------------- PR Comment: https://git.openjdk.org/jdk/pull/21599#issuecomment-2528153097 From qamai at openjdk.org Mon Dec 9 14:49:40 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Mon, 9 Dec 2024 14:49:40 GMT Subject: RFR: 8310691: [REDO] [vectorapi] Refactor VectorShuffle implementation [v7] In-Reply-To: References: <-V1kImS-457WoqS_2Qc7jqY4_iAUo-3-PXheVLc_T84=.64706c04-289b-4cf2-9cfc-b95fbd2656bb@github.com> Message-ID: On Mon, 9 Dec 2024 14:39:25 GMT, Emanuel Peter wrote: >> @eme64 I think this PR is orthogonal to the concern you are having. Either we need to refactor the expansion to lowering, then modify this PR to match the semantics, or we integrate this PR first, then do the lowering refactor on the resulting code. 
I would prefer the latter then I can take the task to move `VectorLoadShuffle` creation to lowering afterwards. > > @merykitty >> we integrate this PR first, then do the lowering refactor on the resulting code > > That is fine with me - I just need to see follow-up RFE filed, and know that it will be taken care of by someone ;) @eme64 Created https://bugs.openjdk.org/browse/JDK-8345812 ------------- PR Comment: https://git.openjdk.org/jdk/pull/21042#issuecomment-2528176181 From paul.sandoz at oracle.com Mon Dec 9 15:55:47 2024 From: paul.sandoz at oracle.com (Paul Sandoz) Date: Mon, 9 Dec 2024 15:55:47 +0000 Subject: RFC: Untangle native libraries and the JVM: SVML, SLEEF, and libsimdsort In-Reply-To: References: <1bf041a1-1002-4d18-85b3-15f437fa534e@oracle.com> Message-ID: <5EA2B22F-0BFD-49E6-A20B-0F91ED94FF8C@oracle.com> Some further observations. - This arguably makes it harder for the auto-vectorizer to access the SVML/SLEEF functionality. However, in some cases, we cannot provide the same guarantees (IIRC mainly around monotonicity) as the scalar operations in Math. - There is an open bug to adjust the simd sort behavior on AMD zen 4 cores due to poor performance of an AVX 512 instruction. The simplest solution is to fall back to AVX2. That may be simpler to manage in Java? (I was looking at the HotSpot code). > On Dec 6, 2024, at 4:48 PM, Vladimir Ivanov wrote: > > Thanks, Paul. > >> Excellent work, very happy to see more of this moved to Java leveraging Panama features. The Java code looks very organized. >> I am wondering if this technique can be applied to stubs dynamically generated by HotSpot via some sort of special library lookup e.g., for crypto. > > It's an interesting idea. A JVM could expose individual symbols, so they can be looked up, but a more promising approach is to just expose a table of generated stubs through a native call into JVM (similar to simdsort_link [1]). > > The problematic part is that stubs don't have to obey the platform ABI.
Some of them deliberately rely on very restrictive calling conventions (e.g., no caller-saved registers), so calling them from generated code is much simpler and cheaper. > > In a longer term, custom calling conventions for each entry point can be coded if there's enough java.lang.foreign support present. (So, an entry point returned by the JVM comprises of an entry address accompanied by an appropriate invoker.) > > >> Do you have a sense of the differences in static memory footprint and startup cost? Things I imagine Leyden could help with. > > Are you asking about simdsort/SVML/SLEEF case here? Yes. > I didn't measure, but initialization costs will definitely be higher (compared to JVM-only solution). In absolute numbers it should be negligible though (the libraries expose small number of entry points). > > >> Regarding CPU dispatching, my preference would be to do it in Java. Less native logic. > > Fair enough. The nice thing about doing CPU dispatching on native library side is that all those cryptic naming conventions don't show up on Java side [2], but IMO it requires too much ceremony, so I kept it on Java side for now. > > >> This may also be useful to help determine whether we can/should expose capabilities in the Vector API regarding what is optimally supported or not. > > IMO Vector API (as it is implemented now) would benefit from a higher-level C2-specific API. > Ok. Paul. > >> I presume it also does not preclude some sort of jlink plugin that strips unused methods from the native libraries, something which may be tricker if done in the native library itself? > > Good point. It may be the case, but I don't have enough experience with native library stripping to comment on it. > > Best regards, > Vladimir Ivanov > > [1] https://github.com/openjdk/jdk/commit/b6e6f2e20772e86fbf9088bcef01391461c17f11 > > > [2] https://github.com/iwanowww/jdk/blob/09234832b6419e54c4fc182e77f6214b36afa4c5/src/java.base/linux/native/libsimdsort/simdsort.c > >> Paul. 
>>> On Dec 6, 2024, at 3:18?PM, Vladimir Ivanov wrote: >>> >>> Recently, a trend emerged to use native libraries to back intrinsics in HotSpot JVM. SVML stubs for Vector API paved the road and it was soon followed by SLEEF and simdsort libraries. >>> >>> After examining their support, I must confess that it doesn't look pretty. It introduces significant accidental complexity on JVM side. HotSpot has to be taught about every entry point in each library in an ad-hoc manner. It's inherently unsafe, error-prone to implement and hard to maintain: JVM makes a lot of assumptions about an entry point based solely on its symbolic name and each library has its own naming conventions. Overall, current approach doesn't scale well. >>> >>> Fortunately, new FFI API (java.lang.foreign) was finalized in 22. It provides enough functionality to interact with native libraries from Java in performant manner. >>> >>> I did an exercise to migrate all 3 libraries away from intrinsics and the results look promising: >>> >>> simdsort: https://github.com/openjdk/jdk/pull/22621 >>> >>> SVML/SLEEF: https://github.com/openjdk/jdk/pull/22619 >>> >>> As of now, java.lang.foreign lacks vector calling convention support, so the actual calls into SVML/SLEEF are still backed by intrinsics. But it still enables a major cleanup on JVM side. >>> >>> Also, I coded library headers and used jextract to produce initial library API sketch in Java and it worked really well. Eventually, it can be incorporated into JDK build process to ensure the consistency between native and Java parts of library API. >>> >>> Performance wise, it is on par with current (intrinsic-based) implementation. >>> >>> One open question relates to CPU dispatching. >>> >>> Each library exposes multiple functions with different requirements about CPU ISA extension support (e.g., no AVX vs AVX2 vs AVX512, NEON vs SVE). Right now, it's JVM responsibility, but once it gets out of the loop, the library itself should make the decision. 
I experimented with 2 approaches: (1) perform CPU dispatching with linking library from Java code (as illustrated in aforementioned PRs); or (2) call into native library to query it about the right entry point [1] [2] [3]. In both cases, it depends on additional API to sense the JVM/hardware capabilities (exposed on jdk.internal.misc.VM for now). >>> >>> Let me know if you have any questions/suggestions/concerns. Thanks! >>> >>> I plan to eventually start publishing PRs to upstream this work. >>> >>> Best regards, >>> Vladimir Ivanov >>> >>> [1] https://github.com/openjdk/jdk/commit/b6e6f2e20772e86fbf9088bcef01391461c17f11 >>> >>> [2] https://github.com/iwanowww/jdk/blob/09234832b6419e54c4fc182e77f6214b36afa4c5/src/java.base/share/classes/java/util/SIMDSortLibrary.java >>> >>> [3] https://github.com/iwanowww/jdk/blob/09234832b6419e54c4fc182e77f6214b36afa4c5/src/java.base/linux/native/libsimdsort/simdsort.c >>> > From swen at openjdk.org Mon Dec 9 16:29:44 2024 From: swen at openjdk.org (Shaojin Wen) Date: Mon, 9 Dec 2024 16:29:44 GMT Subject: RFR: 8343629: More MergeStore benchmark [v5] In-Reply-To: References: Message-ID: On Thu, 5 Dec 2024 06:59:07 GMT, Emanuel Peter wrote: >> Shaojin Wen has updated the pull request incrementally with one additional commit since the last revision: >> >> seperate MergeStoreBench and MergeLoadBench > > test/micro/org/openjdk/bench/vm/compiler/MergeStoreBench.java line 103: > >> 101: public void setIntBU(Blackhole BH) { >> 102: int off = 0; >> 103: for (int i = ints.length - 1; i >= 0; i--) { > > Why are you going reverse here, and also other places? Does that affect the performance at all? My original intention was to have reads and writes in different orders to avoid being optimized into a batch copy. 
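The access pattern described in this reply can be sketched as follows. This is a standalone illustration rather than the code from `MergeStoreBench`. The class and method names are made up for the example, and little-endian byte order is an assumption. The source array is read from the last element to the first while the destination offset advances forward, so the loop is not a straight element-by-element copy.

```java
import java.util.Arrays;

public class MergeStoreSketch {
    // Store one int into four consecutive bytes, least significant byte first.
    // Four adjacent byte stores like these are the pattern a merge-store
    // optimization may combine into a single wider store.
    public static void setIntLE(byte[] dst, int off, int value) {
        dst[off]     = (byte) value;
        dst[off + 1] = (byte) (value >> 8);
        dst[off + 2] = (byte) (value >> 16);
        dst[off + 3] = (byte) (value >> 24);
    }

    // Read the source backwards while the destination offset moves forwards.
    public static byte[] storeReversed(int[] ints) {
        byte[] bytes = new byte[ints.length * 4];
        int off = 0;
        for (int i = ints.length - 1; i >= 0; i--) {
            setIntLE(bytes, off, ints[i]);
            off += 4;
        }
        return bytes;
    }

    public static void main(String[] args) {
        byte[] out = storeReversed(new int[] {0x04030201, 0x08070605});
        System.out.println(Arrays.toString(out)); // prints [5, 6, 7, 8, 1, 2, 3, 4]
    }
}
```

Whether the reversed read order actually prevents the JIT from turning the loop into a bulk copy is the author's stated intent, not something this sketch demonstrates.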
> test/micro/org/openjdk/bench/vm/compiler/MergeStoreBench.java line 135: > >> 133: for (int i = ints.length - 1; i >= 0; i--) { >> 134: setIntLU(bytes4, off, ints[i]); >> 135: off += 4; > > I'm also wondering why you changed it from multiplication of `i * 4` to `offset += 4`. Did that have an impact? I guess +=4 has less overhead, so the performance test is more accurate. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21659#discussion_r1876287984 PR Review Comment: https://git.openjdk.org/jdk/pull/21659#discussion_r1876290958 From duke at openjdk.org Mon Dec 9 16:32:43 2024 From: duke at openjdk.org (Daniel Skantz) Date: Mon, 9 Dec 2024 16:32:43 GMT Subject: Integrated: 8345156: C2: Add bailouts next to a few asserts In-Reply-To: References: Message-ID: On Mon, 2 Dec 2024 13:07:57 GMT, Daniel Skantz wrote: > This patch associates product bailouts with a few existing debug asserts. The criteria are that there have been product bugs associated with failing these asserts in the past, and that there are not too many callers to the method where the compilation is now cancelled or any measurable impact on compilation time. > > Testing: T1-T4 and extra testing. Tested compilation time with -XX:+CITime on performance benchmarks and the effect was not measurable. This pull request has now been integrated. 
Changeset: 480b508c Author: Daniel Skantz Committer: Vladimir Kozlov URL: https://git.openjdk.org/jdk/commit/480b508cf2f6972691eea35f133cc8fb939ac30f Stats: 68 lines in 8 files changed: 56 ins; 5 del; 7 mod 8345156: C2: Add bailouts next to a few asserts Reviewed-by: kvn, epeter ------------- PR: https://git.openjdk.org/jdk/pull/22482 From roland at openjdk.org Mon Dec 9 16:45:45 2024 From: roland at openjdk.org (Roland Westrelin) Date: Mon, 9 Dec 2024 16:45:45 GMT Subject: RFR: 8345287: C2: live in computation is broken In-Reply-To: References: <66KJ0kayCyRztG81OvFyladp1BbDq5nW810HKEZPmlk=.51449889-9e47-408e-9751-470e1f912113@github.com> <-JM7PaO43IeAwzF30ordG0ERn-ViXxsVfQtE0M7RkgQ=.9fb476a0-b8a9-4618-8aaa-5ceebef181dd@github.com> Message-ID: On Mon, 9 Dec 2024 08:19:04 GMT, Roberto Castañeda Lozano wrote: > Turning off `-XX:-OptoRegScheduling` seems to have a slight overall negative effect on Renaissance (would require a more thorough analysis to confirm) and no measurable effect on DaCapo and SPECjvm2008 except for a slight improvement (around 1%) in crypto.signverify for older Coffee Lake-B machines. Thanks @robcasloz for the test results. I filed https://bugs.openjdk.org/browse/JDK-8345820 @robcasloz @vnkozlov @dean-long thanks for the reviews ------------- PR Comment: https://git.openjdk.org/jdk/pull/22473#issuecomment-2528649551 PR Comment: https://git.openjdk.org/jdk/pull/22473#issuecomment-2528654538 From roland at openjdk.org Mon Dec 9 16:45:46 2024 From: roland at openjdk.org (Roland Westrelin) Date: Mon, 9 Dec 2024 16:45:46 GMT Subject: Integrated: 8345287: C2: live in computation is broken In-Reply-To: References: Message-ID: On Mon, 2 Dec 2024 09:03:50 GMT, Roland Westrelin wrote: > 8234003 (Improve IndexSet iteration) broke live in computation: > > > @@ -273,23 +276,25 @@ void PhaseLive::add_liveout( Block *p, IndexSet *lo, VectorSet &first_pass ) { > // Add a vector of live-in values to a given blocks live-in set.
> void PhaseLive::add_livein(Block *p, IndexSet *lo) { > IndexSet *livein = &_livein[p->_pre_order-1]; > - IndexSetIterator elements(lo); > - uint r; > - while ((r = elements.next()) != 0) { > - livein->insert(r); // Then add to live-in set > + if (!livein->is_empty()) { > + IndexSetIterator elements(lo); > + uint r; > + while ((r = elements.next()) != 0) { > + livein->insert(r); // Then add to live-in set > + } > } > } > > > `livein` is initially empty and the patch above only adds elements to it if: > > > if (!livein->is_empty()) { > > > which is never true. > > This doesn't affect correctness as live in sets are only used to drive > scheduling. This pull request has now been integrated. Changeset: cc628a13 Author: Roland Westrelin URL: https://git.openjdk.org/jdk/commit/cc628a133e471e7edf07831ff386f0eaf57e9bff Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod 8345287: C2: live in computation is broken Reviewed-by: kvn, dlong, rcastanedalo ------------- PR: https://git.openjdk.org/jdk/pull/22473 From bulasevich at openjdk.org Mon Dec 9 18:18:56 2024 From: bulasevich at openjdk.org (Boris Ulasevich) Date: Mon, 9 Dec 2024 18:18:56 GMT Subject: RFR: 8343789: Move mutable nmethod data out of CodeCache [v3] In-Reply-To: <9mDuowjpORyWudVnSB1FWCW_o1pBgMnAvJus6YGkXLs=.67ba4652-2470-448d-baa2-464e824b2fcb@github.com> References: <9mDuowjpORyWudVnSB1FWCW_o1pBgMnAvJus6YGkXLs=.67ba4652-2470-448d-baa2-464e824b2fcb@github.com> Message-ID: <8312jgwPU7U_JVE8QtFR8wHlsArpfs9W07O4Ux-C_PI=.e0c2a9ca-1da2-4776-94ea-240580cabde8@github.com> > This change relocates mutable data (such as relocations, oops, and metadata) from the nmethod. The change follows the recent PR #18984, which relocated immutable nmethod data from the CodeCache. > > The core idea remains the same: use the CodeCache for executable code while moving additional data to the C heap. The primary motivations are improving security and enhancing code density.
> > Although performance is not the main focus, testing on AArch64 CPUs, where code density plays a significant role, has shown a 1–2% performance improvement in specific scenarios, such as the CodeCacheStress test and the Renaissance Dotty benchmark. > > The numbers. Immutable data constitutes **~30%** of the nmethod. Mutable data constitutes **~8%** of nmethod. Example (statistics collected on the CodeCacheStress benchmark): > - nmethod_count:134000, total_compilation_time: 510460ms > - total allocation time malloc_mutable/malloc_immutable/CodeCache_alloc: 62ms/114ms/6333ms, > - total allocation size (mutable/immutable/nmethod): 64MB/192MB/488MB > > Functional testing: jtreg on arm/aarch/x86. > Performance testing: renaissance/dacapo/SPECjvm2008 benchmarks. > > Alternative solution (see comments): In the future, relocations can be moved to _immutable_data. Boris Ulasevich has updated the pull request incrementally with one additional commit since the last revision: a bit of cleanup and satisfying review suggestions ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21276/files - new: https://git.openjdk.org/jdk/pull/21276/files/f1a9d9a0..27d27aa3 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21276&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21276&range=01-02 Stats: 20 lines in 5 files changed: 1 ins; 7 del; 12 mod Patch: https://git.openjdk.org/jdk/pull/21276.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21276/head:pull/21276 PR: https://git.openjdk.org/jdk/pull/21276 From psandoz at openjdk.org Mon Dec 9 18:23:41 2024 From: psandoz at openjdk.org (Paul Sandoz) Date: Mon, 9 Dec 2024 18:23:41 GMT Subject: RFR: 8310691: [REDO] [vectorapi] Refactor VectorShuffle implementation [v8] In-Reply-To: References: Message-ID: On Mon, 9 Dec 2024 14:12:19 GMT, Quan Anh Mai wrote: >> Hi, >> >> This is just a redo of https://github.com/openjdk/jdk/pull/13093. mostly just the revert of the backout.
>> >> Regarding the related issues: >> >> - [JDK-8306008](https://bugs.openjdk.org/browse/JDK-8306008) and [JDK-8309531](https://bugs.openjdk.org/browse/JDK-8309531) have been fixed before the backout. >> - [JDK-8309373](https://bugs.openjdk.org/browse/JDK-8309373) was due to missing `ForceInline` on `AbstractVector::toBitsVectorTemplate` >> - [JDK-8306592](https://bugs.openjdk.org/browse/JDK-8306592), I have not been able to find the root causes. I'm not sure if this is a blocker, now I cannot even build x86-32 tests. >> >> Finally, I moved some implementation of public methods and methods that call into intrinsics to the concrete class as that may help the compiler know the correct types of the variables. >> >> Please take a look and leave reviews. Thanks a lot. >> >> The description of the original PR: >> >> This patch reimplements `VectorShuffle` implementations to be a vector of the bit type. Currently, `VectorShuffle` is stored as a byte array, and would be expanded upon usage. This poses several drawbacks: >> >> Inefficient conversions between a shuffle and its corresponding vector. This hinders the performance when the shuffle indices are not constant and are loaded or computed dynamically. >> Redundant expansions in `rearrange` operations. On all platforms, it seems that a shuffle index vector is always expanded to the correct type before executing the `rearrange` operations. >> Some redundant intrinsics are needed to support this handling as well as special considerations in the C2 compiler. >> Range checks are performed using `VectorShuffle::toVector`, which is inefficient for FP types since both FP conversions and FP comparisons are more expensive than the integral ones. 
>> Upon these changes, a `rearrange` can emit more efficient code: >> >> var species = IntVector.SPECIES_128; >> var v1 = IntVector.fromArray(species, SRC1, 0); >> var v2 = IntVector.fromArray(species, SRC2, 0); >> v1.rearrange(v2.toShuffle()).intoArray(DST, 0); >> >> Before: >> movabs $0x751589fa8,%r10 ; {oop([I{0x0000000751589fa8})} >> vmovdqu 0x10(%r10),%xmm2 >> movabs $0x7515a0d08,%r10 ; {oop([I{0x00000007515a0d08})} >> vmovdqu 0x10(%r10),%xmm1 >> movabs $0x75158afb8,%r10 ; {oop([I{0x000000075158afb8})} >> vmovdqu 0x10(%r10),%xmm0 >> vpand -0xddc12(%rip),%xmm0,%xmm0 # Stub::vector_int_to_byt... > > Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: > > adverb order Marked as reviewed by psandoz (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/21042#pullrequestreview-2489632745 From lmesnik at openjdk.org Mon Dec 9 19:11:41 2024 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Mon, 9 Dec 2024 19:11:41 GMT Subject: RFR: 8345700: tier{1, 2, 3}_compiler don't cover all compiler tests In-Reply-To: References: Message-ID: On Fri, 6 Dec 2024 20:10:08 GMT, Leonid Mesnik wrote: > Hi, > could you please review following fix that update > tier3_compiler test group > so :hotspot_compiler is always a sum of > tier1_compiler, tier2_compiler, tier3_compiler > This is natural splitting of tests into 3 layers. > The fix is done to execution :hotspot_compiler into 3 tiers with corresponding group names. > > New tests in tier3: > 9 tests in compiler/ccp/ > 22 test in ompiler/predicates/ > 8 tests in compiler/splitif/ > So the number is not increased significantly. > > The new group > ` tier2_ctw` > was introduced for ctw testing. > It is different from tests in hotspot_compiler, and usually executed separately, So I added it as separate sub-test of tier2. > > I moved ctw from tier3 to tier2 to correspond current tiers of Hotspot CI in our system. 
> If anyone thinks it should be part of tier3 - we can discuss in PR comments how do deal with it. > > @shipilev, could you please take a look and check if these changes are ok to you since you the author of tier2/tier3. Thanks, Alexey. I filed https://bugs.openjdk.org/browse/JDK-8345824 to re-balance tier1 parts. However, it is not related to current issue that fix tier2/3 testing only. ------------- PR Comment: https://git.openjdk.org/jdk/pull/22614#issuecomment-2529135568 From dlong at openjdk.org Mon Dec 9 23:17:43 2024 From: dlong at openjdk.org (Dean Long) Date: Mon, 9 Dec 2024 23:17:43 GMT Subject: RFR: 8343789: Move mutable nmethod data out of CodeCache [v3] In-Reply-To: <8312jgwPU7U_JVE8QtFR8wHlsArpfs9W07O4Ux-C_PI=.e0c2a9ca-1da2-4776-94ea-240580cabde8@github.com> References: <9mDuowjpORyWudVnSB1FWCW_o1pBgMnAvJus6YGkXLs=.67ba4652-2470-448d-baa2-464e824b2fcb@github.com> <8312jgwPU7U_JVE8QtFR8wHlsArpfs9W07O4Ux-C_PI=.e0c2a9ca-1da2-4776-94ea-240580cabde8@github.com> Message-ID: On Mon, 9 Dec 2024 18:18:56 GMT, Boris Ulasevich wrote: >> This change relocates mutable data (such as relocations, oops, and metadata) from the nmethod. The change follows the recent PR #18984, which relocated immutable nmethod data from the CodeCache. >> >> The core idea remains the same: use the CodeCache for executable code while moving additional data to the C heap. The primary motivations are improving security and enhancing code density. >> >> Although performance is not the main focus, testing on AArch64 CPUs, where code density plays a significant role, has shown a 1?2% performance improvement in specific scenarios, such as the CodeCacheStress test and the Renaissance Dotty benchmark. >> >> The numbers. Immutable data constitutes **~30%** on the nmehtod. Mutable data constitutes **~8%** of nmethod. 
Example (statistics collected on the CodeCacheStress benchmark): >> - nmethod_count:134000, total_compilation_time: 510460ms >> - total allocation time malloc_mutable/malloc_immutable/CodeCache_alloc: 62ms/114ms/6333ms, >> - total allocation size (mutable/immutable/nmethod): 64MB/192MB/488MB >> >> Functional testing: jtreg on arm/aarch/x86. >> Performance testing: renaissance/dacapo/SPECjvm2008 benchmarks. >> >> Alternative solution (see comments): In the future, relocations can be moved to _immutable_data. > > Boris Ulasevich has updated the pull request incrementally with one additional commit since the last revision: > > a bit of cleanup and satisfying review suggestions src/hotspot/cpu/aarch64/macroAssembler_aarch64.hpp line 1419: > 1417: > 1418: void ldr_patchable(Register dest, const Address &const_addr) { > 1419: if (CodeCache::contains(const_addr.target())) { Doesn't this cause us to generate LDR for oops outside the code cache, when what we need is the ADRP below? The caller is still using a PC-relative dummy address. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21276#discussion_r1876893689 From qxing at openjdk.org Tue Dec 10 01:21:40 2024 From: qxing at openjdk.org (Qizheng Xing) Date: Tue, 10 Dec 2024 01:21:40 GMT Subject: RFR: 8345040: Clean up unused variables and code in `generate_native_wrapper` In-Reply-To: References: Message-ID: On Tue, 26 Nov 2024 08:16:56 GMT, Qizheng Xing wrote: > Some of the variables and code are related to the critical JNI natives feature, which was removed in JDK 18. This patch cleans them up. It seems that all cross builds have passed, including x86_32, ppc and s390, and this PR contains only cleanup and no other changes, so I think it's ok to integrate.
------------- PR Comment: https://git.openjdk.org/jdk/pull/22384#issuecomment-2529964978 From duke at openjdk.org Tue Dec 10 01:21:41 2024 From: duke at openjdk.org (duke) Date: Tue, 10 Dec 2024 01:21:41 GMT Subject: RFR: 8345040: Clean up unused variables and code in `generate_native_wrapper` In-Reply-To: References: Message-ID: On Tue, 26 Nov 2024 08:16:56 GMT, Qizheng Xing wrote: > Some of variables and code are related to critical JNI natives feature, which was removed in JDK 18. This patch cleans them up. @MaxXSoft Your change (at version 9a516b1b665dd21d93b0abd45ac30fa8d1d0e11e) is now ready to be sponsored by a Committer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/22384#issuecomment-2529966980 From epeter at openjdk.org Tue Dec 10 07:41:48 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 10 Dec 2024 07:41:48 GMT Subject: RFR: 8310691: [REDO] [vectorapi] Refactor VectorShuffle implementation [v8] In-Reply-To: References: Message-ID: <1o7SVdc2O38FfuDSOK7eyuQaAtdQIwwa3K5LLWVQI6I=.ee0eb00c-e6f0-4b78-bedf-eafb52269968@github.com> On Mon, 9 Dec 2024 14:12:19 GMT, Quan Anh Mai wrote: >> Hi, >> >> This is just a redo of https://github.com/openjdk/jdk/pull/13093. mostly just the revert of the backout. >> >> Regarding the related issues: >> >> - [JDK-8306008](https://bugs.openjdk.org/browse/JDK-8306008) and [JDK-8309531](https://bugs.openjdk.org/browse/JDK-8309531) have been fixed before the backout. >> - [JDK-8309373](https://bugs.openjdk.org/browse/JDK-8309373) was due to missing `ForceInline` on `AbstractVector::toBitsVectorTemplate` >> - [JDK-8306592](https://bugs.openjdk.org/browse/JDK-8306592), I have not been able to find the root causes. I'm not sure if this is a blocker, now I cannot even build x86-32 tests. >> >> Finally, I moved some implementation of public methods and methods that call into intrinsics to the concrete class as that may help the compiler know the correct types of the variables. 
>> >> Please take a look and leave reviews. Thanks a lot. >> >> The description of the original PR: >> >> This patch reimplements `VectorShuffle` implementations to be a vector of the bit type. Currently, `VectorShuffle` is stored as a byte array, and would be expanded upon usage. This poses several drawbacks: >> >> Inefficient conversions between a shuffle and its corresponding vector. This hinders the performance when the shuffle indices are not constant and are loaded or computed dynamically. >> Redundant expansions in `rearrange` operations. On all platforms, it seems that a shuffle index vector is always expanded to the correct type before executing the `rearrange` operations. >> Some redundant intrinsics are needed to support this handling as well as special considerations in the C2 compiler. >> Range checks are performed using `VectorShuffle::toVector`, which is inefficient for FP types since both FP conversions and FP comparisons are more expensive than the integral ones. >> Upon these changes, a `rearrange` can emit more efficient code: >> >> var species = IntVector.SPECIES_128; >> var v1 = IntVector.fromArray(species, SRC1, 0); >> var v2 = IntVector.fromArray(species, SRC2, 0); >> v1.rearrange(v2.toShuffle()).intoArray(DST, 0); >> >> Before: >> movabs $0x751589fa8,%r10 ; {oop([I{0x0000000751589fa8})} >> vmovdqu 0x10(%r10),%xmm2 >> movabs $0x7515a0d08,%r10 ; {oop([I{0x00000007515a0d08})} >> vmovdqu 0x10(%r10),%xmm1 >> movabs $0x75158afb8,%r10 ; {oop([I{0x000000075158afb8})} >> vmovdqu 0x10(%r10),%xmm0 >> vpand -0xddc12(%rip),%xmm0,%xmm0 # Stub::vector_int_to_byt... > > Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: > > adverb order Nice Work @merykitty ! The hpp/cpp VM changes look good to me. I guess you now only have 1 review for Java changes, and 1 review for C++ changes, not sure if that means you need someone more to review, maybe just a sanity check. 
I linked the other two issues on JBS for tracking purposes. I have heard that you might want to backport this to JDK24. For that it would be very important that this patch has gone through thorough testing, including our internal stress-testing. ------------- Marked as reviewed by epeter (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21042#pullrequestreview-2491298320 From epeter at openjdk.org Tue Dec 10 07:47:40 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 10 Dec 2024 07:47:40 GMT Subject: RFR: 8310691: [REDO] [vectorapi] Refactor VectorShuffle implementation [v8] In-Reply-To: References: Message-ID: On Mon, 9 Dec 2024 14:12:19 GMT, Quan Anh Mai wrote: >> Hi, >> >> This is just a redo of https://github.com/openjdk/jdk/pull/13093. mostly just the revert of the backout. >> >> Regarding the related issues: >> >> - [JDK-8306008](https://bugs.openjdk.org/browse/JDK-8306008) and [JDK-8309531](https://bugs.openjdk.org/browse/JDK-8309531) have been fixed before the backout. >> - [JDK-8309373](https://bugs.openjdk.org/browse/JDK-8309373) was due to missing `ForceInline` on `AbstractVector::toBitsVectorTemplate` >> - [JDK-8306592](https://bugs.openjdk.org/browse/JDK-8306592), I have not been able to find the root causes. I'm not sure if this is a blocker, now I cannot even build x86-32 tests. >> >> Finally, I moved some implementation of public methods and methods that call into intrinsics to the concrete class as that may help the compiler know the correct types of the variables. >> >> Please take a look and leave reviews. Thanks a lot. >> >> The description of the original PR: >> >> This patch reimplements `VectorShuffle` implementations to be a vector of the bit type. Currently, `VectorShuffle` is stored as a byte array, and would be expanded upon usage. This poses several drawbacks: >> >> Inefficient conversions between a shuffle and its corresponding vector. 
This hinders the performance when the shuffle indices are not constant and are loaded or computed dynamically. >> Redundant expansions in `rearrange` operations. On all platforms, it seems that a shuffle index vector is always expanded to the correct type before executing the `rearrange` operations. >> Some redundant intrinsics are needed to support this handling as well as special considerations in the C2 compiler. >> Range checks are performed using `VectorShuffle::toVector`, which is inefficient for FP types since both FP conversions and FP comparisons are more expensive than the integral ones. >> Upon these changes, a `rearrange` can emit more efficient code: >> >> var species = IntVector.SPECIES_128; >> var v1 = IntVector.fromArray(species, SRC1, 0); >> var v2 = IntVector.fromArray(species, SRC2, 0); >> v1.rearrange(v2.toShuffle()).intoArray(DST, 0); >> >> Before: >> movabs $0x751589fa8,%r10 ; {oop([I{0x0000000751589fa8})} >> vmovdqu 0x10(%r10),%xmm2 >> movabs $0x7515a0d08,%r10 ; {oop([I{0x00000007515a0d08})} >> vmovdqu 0x10(%r10),%xmm1 >> movabs $0x75158afb8,%r10 ; {oop([I{0x000000075158afb8})} >> vmovdqu 0x10(%r10),%xmm0 >> vpand -0xddc12(%rip),%xmm0,%xmm0 # Stub::vector_int_to_byt... > > Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: > > adverb order FYI: I launched tier1-4 + stress-testing. Please check that it completes and passes. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21042#issuecomment-2530695581 From jbhateja at openjdk.org Tue Dec 10 07:50:46 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 10 Dec 2024 07:50:46 GMT Subject: RFR: 8310691: [REDO] [vectorapi] Refactor VectorShuffle implementation [v8] In-Reply-To: References: Message-ID: On Mon, 9 Dec 2024 14:12:19 GMT, Quan Anh Mai wrote: >> Hi, >> >> This is just a redo of https://github.com/openjdk/jdk/pull/13093. mostly just the revert of the backout. 
>> >> Regarding the related issues: >> >> - [JDK-8306008](https://bugs.openjdk.org/browse/JDK-8306008) and [JDK-8309531](https://bugs.openjdk.org/browse/JDK-8309531) have been fixed before the backout. >> - [JDK-8309373](https://bugs.openjdk.org/browse/JDK-8309373) was due to missing `ForceInline` on `AbstractVector::toBitsVectorTemplate` >> - [JDK-8306592](https://bugs.openjdk.org/browse/JDK-8306592), I have not been able to find the root causes. I'm not sure if this is a blocker, now I cannot even build x86-32 tests. >> >> Finally, I moved some implementation of public methods and methods that call into intrinsics to the concrete class as that may help the compiler know the correct types of the variables. >> >> Please take a look and leave reviews. Thanks a lot. >> >> The description of the original PR: >> >> This patch reimplements `VectorShuffle` implementations to be a vector of the bit type. Currently, `VectorShuffle` is stored as a byte array, and would be expanded upon usage. This poses several drawbacks: >> >> Inefficient conversions between a shuffle and its corresponding vector. This hinders the performance when the shuffle indices are not constant and are loaded or computed dynamically. >> Redundant expansions in `rearrange` operations. On all platforms, it seems that a shuffle index vector is always expanded to the correct type before executing the `rearrange` operations. >> Some redundant intrinsics are needed to support this handling as well as special considerations in the C2 compiler. >> Range checks are performed using `VectorShuffle::toVector`, which is inefficient for FP types since both FP conversions and FP comparisons are more expensive than the integral ones. 
>> Upon these changes, a `rearrange` can emit more efficient code: >> >> var species = IntVector.SPECIES_128; >> var v1 = IntVector.fromArray(species, SRC1, 0); >> var v2 = IntVector.fromArray(species, SRC2, 0); >> v1.rearrange(v2.toShuffle()).intoArray(DST, 0); >> >> Before: >> movabs $0x751589fa8,%r10 ; {oop([I{0x0000000751589fa8})} >> vmovdqu 0x10(%r10),%xmm2 >> movabs $0x7515a0d08,%r10 ; {oop([I{0x00000007515a0d08})} >> vmovdqu 0x10(%r10),%xmm1 >> movabs $0x75158afb8,%r10 ; {oop([I{0x000000075158afb8})} >> vmovdqu 0x10(%r10),%xmm0 >> vpand -0xddc12(%rip),%xmm0,%xmm0 # Stub::vector_int_to_byt... > > Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: > > adverb order Hi @merykitty , Nice work! Overall it looks good to me, with some minor comments. Kindly address. src/hotspot/share/opto/library_call.hpp line 358: > 356: bool inline_vector_shuffle_to_vector(); > 357: bool inline_vector_wrap_shuffle_indexes(); > 358: bool inline_vector_shuffle_iota(); FTR, x86 ISA does not support a direct byte multiplier instruction, so we first unpack to a short vector, multiply at a short granularity, and then pack it back to a byte vector. This was somewhat costly. Since the shuffle backing storage now matches the lane size of the corresponding vector, computation with a non-unit scalar should improve. src/hotspot/share/opto/vectornode.hpp line 1691: > 1689: }; > 1690: > 1691: // The machine may not directly support the rearrange operation of an element type. In those cases, Suggestion: // The target may not directly support the rearrange operation for an element type. In those cases, src/jdk.incubator.vector/share/classes/jdk/incubator/vector/Byte128Vector.java line 822: > 820: static final Class ETYPE = byte.class; // used by the JVM > 821: > 822: Byte128Shuffle(byte[] indices) { We still cannot accommodate all the indexes for the 2048 bit scalable vector for ARM SVE.
The maximum index that can be accommodated is 127, since byte is a signed type with value range [-128, 127]. src/jdk.incubator.vector/share/classes/jdk/incubator/vector/Byte512Vector.java line 1046: > 1044: String msg = ("index "+si+"out of range ["+length+"] in "+ > 1045: java.util.Arrays.toString(indices)); > 1046: throw new AssertionError(msg); Why not directly throw IndexOutOfBoundsException here? src/jdk.incubator.vector/share/classes/jdk/incubator/vector/Double128Vector.java line 859: > 857: .reinterpretAsInts() > 858: .intoArray(a, offset); > 859: default -> { These cases for length() = 4, 8, and 16 look redundant for a 128-bit DoubleVector. ------------- PR Review: https://git.openjdk.org/jdk/pull/21042#pullrequestreview-2491206953 PR Review Comment: https://git.openjdk.org/jdk/pull/21042#discussion_r1877513236 PR Review Comment: https://git.openjdk.org/jdk/pull/21042#discussion_r1877491672 PR Review Comment: https://git.openjdk.org/jdk/pull/21042#discussion_r1877459622 PR Review Comment: https://git.openjdk.org/jdk/pull/21042#discussion_r1877452008 PR Review Comment: https://git.openjdk.org/jdk/pull/21042#discussion_r1877466991 From jbhateja at openjdk.org Tue Dec 10 07:57:43 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 10 Dec 2024 07:57:43 GMT Subject: RFR: 8310691: [REDO] [vectorapi] Refactor VectorShuffle implementation [v8] In-Reply-To: References: Message-ID: On Tue, 10 Dec 2024 07:09:33 GMT, Jatin Bhateja wrote: >> Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: >> >> adverb order > > src/jdk.incubator.vector/share/classes/jdk/incubator/vector/Byte128Vector.java line 822: > >> 820: static final Class ETYPE = byte.class; // used by the JVM >> 821: >> 822: Byte128Shuffle(byte[] indices) { > > We still cannot accommodate all the indexes for the 2048 bit scalable vector for ARM SVE. The maximum index that can be accommodated is 127, since byte is a signed type with value range [-128, 127].
This is a limitation and not a blocker for this re-factor. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21042#discussion_r1877524772 From qamai at openjdk.org Tue Dec 10 08:16:19 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Tue, 10 Dec 2024 08:16:19 GMT Subject: RFR: 8310691: [REDO] [vectorapi] Refactor VectorShuffle implementation [v9] In-Reply-To: References: Message-ID: > Hi, > > This is just a redo of https://github.com/openjdk/jdk/pull/13093. mostly just the revert of the backout. > > Regarding the related issues: > > - [JDK-8306008](https://bugs.openjdk.org/browse/JDK-8306008) and [JDK-8309531](https://bugs.openjdk.org/browse/JDK-8309531) have been fixed before the backout. > - [JDK-8309373](https://bugs.openjdk.org/browse/JDK-8309373) was due to missing `ForceInline` on `AbstractVector::toBitsVectorTemplate` > - [JDK-8306592](https://bugs.openjdk.org/browse/JDK-8306592), I have not been able to find the root causes. I'm not sure if this is a blocker, now I cannot even build x86-32 tests. > > Finally, I moved some implementation of public methods and methods that call into intrinsics to the concrete class as that may help the compiler know the correct types of the variables. > > Please take a look and leave reviews. Thanks a lot. > > The description of the original PR: > > This patch reimplements `VectorShuffle` implementations to be a vector of the bit type. Currently, `VectorShuffle` is stored as a byte array, and would be expanded upon usage. This poses several drawbacks: > > Inefficient conversions between a shuffle and its corresponding vector. This hinders the performance when the shuffle indices are not constant and are loaded or computed dynamically. > Redundant expansions in `rearrange` operations. On all platforms, it seems that a shuffle index vector is always expanded to the correct type before executing the `rearrange` operations. 
> Some redundant intrinsics are needed to support this handling as well as special considerations in the C2 compiler. > Range checks are performed using `VectorShuffle::toVector`, which is inefficient for FP types since both FP conversions and FP comparisons are more expensive than the integral ones. > Upon these changes, a `rearrange` can emit more efficient code: > > var species = IntVector.SPECIES_128; > var v1 = IntVector.fromArray(species, SRC1, 0); > var v2 = IntVector.fromArray(species, SRC2, 0); > v1.rearrange(v2.toShuffle()).intoArray(DST, 0); > > Before: > movabs $0x751589fa8,%r10 ; {oop([I{0x0000000751589fa8})} > vmovdqu 0x10(%r10),%xmm2 > movabs $0x7515a0d08,%r10 ; {oop([I{0x00000007515a0d08})} > vmovdqu 0x10(%r10),%xmm1 > movabs $0x75158afb8,%r10 ; {oop([I{0x000000075158afb8})} > vmovdqu 0x10(%r10),%xmm0 > vpand -0xddc12(%rip),%xmm0,%xmm0 # Stub::vector_int_to_byte_mask > ; {ex... Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: Change wording on VectorLoadShuffleNode Co-authored-by: Jatin Bhateja ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21042/files - new: https://git.openjdk.org/jdk/pull/21042/files/5cfac5ad..146c8cea Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21042&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21042&range=07-08 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/21042.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21042/head:pull/21042 PR: https://git.openjdk.org/jdk/pull/21042 From qamai at openjdk.org Tue Dec 10 08:16:20 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Tue, 10 Dec 2024 08:16:20 GMT Subject: RFR: 8310691: [REDO] [vectorapi] Refactor VectorShuffle implementation [v8] In-Reply-To: References: Message-ID: On Tue, 10 Dec 2024 07:54:55 GMT, Jatin Bhateja wrote: >> src/jdk.incubator.vector/share/classes/jdk/incubator/vector/Byte128Vector.java line 822: >> >>> 
820: static final Class ETYPE = byte.class; // used by the JVM >>> 821: >>> 822: Byte128Shuffle(byte[] indices) { >> >> We still cannot accommodate all the indexes for the 2048 bit scalable vector for ARM SVE. The maximum index that can be accommodated is 127, since byte is a signed type with value range [-128, 127]. > > This is a limitation and not a blocker for this re-factor. A byte is just a bunch of bits; the signedness of the value depends on how it is used. As a result, I believe there is nothing preventing us from treating this index as unsigned for 2048-bit SVE (with some modifications such as the Java implementation being `int ei = Integer.remainderUnsigned(Byte.toUnsignedInt(s_.laneSource(i)), v1.length())`). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21042#discussion_r1877545203 From qamai at openjdk.org Tue Dec 10 08:26:50 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Tue, 10 Dec 2024 08:26:50 GMT Subject: RFR: 8310691: [REDO] [vectorapi] Refactor VectorShuffle implementation [v2] In-Reply-To: References: Message-ID: <6whsrtGFBfphNcz96T31a4qxDtGCUjLPV_eQDu1rPY8=.8b25ab50-580e-4420-826f-a8c4dbd84fb6@github.com> On Mon, 9 Dec 2024 13:40:16 GMT, Jatin Bhateja wrote: >> @merykitty do you want me to initiate tier 1 to 3 tests? > >> @PaulSandoz @sviswa7 Thanks for your advice, I have made the PR ready for review >> >> @iwanowww Could you take another look at this, please? >> >> @jatin-bhateja Could you verify that [JDK-8309373](https://bugs.openjdk.org/browse/JDK-8309373) does not occur? > > Hi @merykitty , I am in the process of reviewing; it's a big change, so please allow me a couple of days. @jatin-bhateja Thanks a lot for your review and suggestions, I hope I have addressed all of them.
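Editorial sketch of the unsigned-index idea raised in the review comment above (discussion_r1877545203): a byte holds 8 bits, so reading it with `Byte.toUnsignedInt` yields values 0..255, which is enough to address the 256 byte lanes of a 2048-bit vector. The class and method names below are hypothetical and are not part of the Vector API; only `Byte.toUnsignedInt` and `Integer.remainderUnsigned` are real JDK methods.

```java
public class UnsignedByteIndex {
    // Hypothetical helper (not JDK code): interpret a shuffle lane stored in a
    // byte as an unsigned index and wrap it into [0, vectorLength).
    static int wrapIndex(byte laneSource, int vectorLength) {
        int unsigned = Byte.toUnsignedInt(laneSource); // 0..255 instead of -128..127
        return Integer.remainderUnsigned(unsigned, vectorLength);
    }

    public static void main(String[] args) {
        // (byte) 200 is stored as -56, but is read back as 200 when treated as unsigned.
        System.out.println(wrapIndex((byte) 200, 256));
        // With a 16-lane vector the same stored value wraps to 200 % 16.
        System.out.println(wrapIndex((byte) 200, 16));
    }
}
```

This only illustrates the arithmetic; whether the intrinsics and the C2 backend could adopt the same interpretation is not addressed by the thread.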
------------- PR Comment: https://git.openjdk.org/jdk/pull/21042#issuecomment-2530777876 From qamai at openjdk.org Tue Dec 10 08:26:51 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Tue, 10 Dec 2024 08:26:51 GMT Subject: RFR: 8310691: [REDO] [vectorapi] Refactor VectorShuffle implementation [v9] In-Reply-To: References: Message-ID: On Tue, 10 Dec 2024 07:44:57 GMT, Jatin Bhateja wrote: >> Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: >> >> Change wording on VectorLoadShuffleNode >> >> Co-authored-by: Jatin Bhateja > > src/hotspot/share/opto/library_call.hpp line 358: > >> 356: bool inline_vector_shuffle_to_vector(); >> 357: bool inline_vector_wrap_shuffle_indexes(); >> 358: bool inline_vector_shuffle_iota(); > > FTR, x86 ISA does not support a direct byte multiplier instruction, so we first unpack to a short vector, multiply at a short granularity, and then pack it back to a byte vector. This was somewhat costly. Since the shuffle backing storage now matches the lane size of the corresponding vector, the performance of iota computation with a non-unit scalar should improve. I believe with the type information of vector elements this optimization should be trivial.
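Editorial sketch of the unpack / multiply / pack sequence described in the quoted comment above, written as scalar Java over arrays rather than as vector instructions. The names are hypothetical, and the truncating narrow in the last step is an assumption made for illustration; the actual x86 sequence emitted by C2 may mask or pack differently.

```java
public class WidenedByteMultiply {
    // Hypothetical scalar model (not HotSpot code) of multiplying byte lanes
    // on a target that lacks a direct byte multiply.
    static byte[] multiplyLanes(byte[] lanes, byte scale) {
        // Step 1: unpack each byte lane into a short lane.
        short[] widened = new short[lanes.length];
        for (int i = 0; i < lanes.length; i++) {
            widened[i] = lanes[i];
        }
        // Step 2: multiply at short granularity.
        for (int i = 0; i < widened.length; i++) {
            widened[i] = (short) (widened[i] * scale);
        }
        // Step 3: pack back to bytes (assumed here to keep the low 8 bits).
        byte[] packed = new byte[lanes.length];
        for (int i = 0; i < widened.length; i++) {
            packed[i] = (byte) widened[i];
        }
        return packed;
    }

    public static void main(String[] args) {
        byte[] iota = {0, 1, 2, 3};
        System.out.println(java.util.Arrays.toString(multiplyLanes(iota, (byte) 3)));
    }
}
```

The two extra passes (widen and narrow) are the cost the comment refers to; they disappear when the shuffle is stored at the lane width of the vector it indexes.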
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21042#discussion_r1877566967 From qamai at openjdk.org Tue Dec 10 08:26:53 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Tue, 10 Dec 2024 08:26:53 GMT Subject: RFR: 8310691: [REDO] [vectorapi] Refactor VectorShuffle implementation [v8] In-Reply-To: References: Message-ID: <6yhvabADaFjfg6DLQXTXSyOapN45WsW5vbSgdu7ZpXM=.e0ef06b5-5ba3-4f7e-b364-fb95909a9acd@github.com> On Tue, 10 Dec 2024 07:01:27 GMT, Jatin Bhateja wrote: >> Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: >> >> adverb order > > src/jdk.incubator.vector/share/classes/jdk/incubator/vector/Byte512Vector.java line 1046: > >> 1044: String msg = ("index "+si+"out of range ["+length+"] in "+ >> 1045: java.util.Arrays.toString(indices)); >> 1046: throw new AssertionError(msg); > > Why not directly throw IndexOutOfBoundsException here? This is called in an `assert`, so I think throwing `AssertionError` is more reasonable; the original implementation also throws `AssertionError`. > src/jdk.incubator.vector/share/classes/jdk/incubator/vector/Double128Vector.java line 859: > >> 857: .reinterpretAsInts() >> 858: .intoArray(a, offset); >> 859: default -> { > > These cases for length() = 4, 8, and 16 look redundant for a 128-bit DoubleVector. You are right, but this switch is needed for `DoubleMaxVector`, and using it for the other species simplifies the template. The compiler will eliminate all wrong cases so there should be no runtime concern.
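Editorial sketch of the pattern discussed above: a boolean-returning check that is only invoked from an `assert` statement and throws `AssertionError` itself so that the failure carries a detailed message. The class name, method names, and the accepted range `[-length, length)` are assumptions for illustration and are not copied from the JDK sources.

```java
public class ShuffleIndexCheck {
    // Hypothetical check routine (not JDK code). It returns true when every
    // index is in range and throws AssertionError with a message otherwise.
    static boolean indicesInRange(byte[] indices) {
        int length = indices.length;
        for (byte si : indices) {
            // Assumed rule: valid indices lie in [-length, length).
            if (si >= length || si < -length) {
                String msg = "index " + si + " out of range [" + length + "] in "
                        + java.util.Arrays.toString(indices);
                throw new AssertionError(msg);
            }
        }
        return true;
    }

    // Hypothetical caller: the check runs only when assertions are enabled (-ea),
    // so it adds no cost in production runs.
    static byte[] makeShuffle(byte[] indices) {
        assert indicesInRange(indices);
        return indices.clone();
    }

    public static void main(String[] args) {
        System.out.println(indicesInRange(new byte[] {0, 1, 2, 3}));
    }
}
```

The alternative raised later in the thread, returning `false` and letting the enclosing `assert` fail, keeps the routine free of throws but loses the custom message unless the caller supplies one via `assert cond : message`.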
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21042#discussion_r1877563187 PR Review Comment: https://git.openjdk.org/jdk/pull/21042#discussion_r1877559527 From duke at openjdk.org Tue Dec 10 08:29:04 2024 From: duke at openjdk.org (theoweidmannoracle) Date: Tue, 10 Dec 2024 08:29:04 GMT Subject: RFR: 8319850: PrintInlining should print which methods are late inlines [v25] In-Reply-To: References: Message-ID: > In https://github.com/openjdk/jdk/pull/16595 @caojoshua previously suggested changes to indicate which calls were inlined late, when printing inlines. This PR re-introduces the changes from the previously closed PR and fixes a minor issue where asserts were triggered. > > Concerns were raised by @rwestrel in the previous PR: > >> When InlineTree::ok_to_inline() is called, some diagnostic message is recorded for the call site. Do I understand right that with this patch, if the call is inlined late, then that message is dropped and replaced by a new "late inline.." message? If that's the case, isn't it the case that sometimes the InlineTree::ok_to_inline() has some useful information that's lost when late inlining happens? > > As already pointed out in the PR by @caojoshua, this does not matter for string/methodhandle/vector/boxing late inlines, as they are [only performed if ok_to_inline() returns true](https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/doCall.cpp#L189). This is also the only call to ok_to_inline(). > > The only other location, where late inline call generators are created, are calls to CallGenerator::for_late_inline_virtual(), which creates a LateInlineVirtualCallGenerator. LateInlineVirtualCallGenerator (introduced in https://github.com/openjdk/jdk/pull/1550) does not really perform inlining but rather performs strength reduction from virtual to static calls. 
As can be verified by running the according test `test/hotspot/jtreg/compiler/c2/irTests/TestPostParseCallDevirtualization.java`, this does not affect the printing for inlining: > > > 5022 1026 3 compiler.c2.irTests.TestPostParseCallDevirtualization::callHelper (7 bytes) made not entrant > @ 1 compiler.c2.irTests.TestPostParseCallDevirtualization$I::method (0 bytes) failed to inline: virtual call > > > Thus, as far as I can tell, the proposed changes by @caojoshua do not lose any useful information about why no inlining prior to the late inlining occurred. theoweidmannoracle has updated the pull request incrementally with two additional commits since the last revision: - Revert "Add UDivI/L and UModI/L to no_dependent_zero_check" This reverts commit b72ff10ba1581372b72edeadd3cf01a97ccf1c73. - Revert "Update TestSplitDivisionThroughPhi.java" This reverts commit 7526bafff4ea26cb45894477b33f3dd24215e667. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21899/files - new: https://git.openjdk.org/jdk/pull/21899/files/7526baff..f37ae7b6 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21899&range=24 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21899&range=23-24 Stats: 128 lines in 2 files changed: 0 ins; 126 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/21899.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21899/head:pull/21899 PR: https://git.openjdk.org/jdk/pull/21899 From duke at openjdk.org Tue Dec 10 08:33:31 2024 From: duke at openjdk.org (theoweidmannoracle) Date: Tue, 10 Dec 2024 08:33:31 GMT Subject: RFR: 8332268: C2: Add missing optimizations for UDivI/L and UModI/L and unify the shared logic with the signed nodes [v18] In-Reply-To: References: Message-ID: <2DLLOfta_26RO1q74UcqiD0DIsR-COc55l1i3LyLWng=.d99293db-6076-4f23-8bd4-d081b12ca63a@github.com> > This PR introduces > - several new optimizations to unsigned division and modulo > - x % 1, x % x, x % 2^k > - x / 1, x / x, x / 2^k > - does not implement the Granlund 
and Montgomery algorithm, which has been implemented for signed modulo division in the past. It is unclear if a lot is to be gained by implementing this. > - tests to test existing optimizations for signed division and modulo > - does not test the Granlund and Montgomery algorithm directly theoweidmannoracle has updated the pull request incrementally with one additional commit since the last revision: Update TestSplitDivisionThroughPhi.java ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22061/files - new: https://git.openjdk.org/jdk/pull/22061/files/621cf4d1..ce5a3521 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22061&range=17 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22061&range=16-17 Stats: 122 lines in 1 file changed: 122 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/22061.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22061/head:pull/22061 PR: https://git.openjdk.org/jdk/pull/22061 From jbhateja at openjdk.org Tue Dec 10 08:38:44 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 10 Dec 2024 08:38:44 GMT Subject: RFR: 8310691: [REDO] [vectorapi] Refactor VectorShuffle implementation [v9] In-Reply-To: References: Message-ID: On Tue, 10 Dec 2024 08:16:19 GMT, Quan Anh Mai wrote: >> Hi, >> >> This is just a redo of https://github.com/openjdk/jdk/pull/13093. mostly just the revert of the backout. >> >> Regarding the related issues: >> >> - [JDK-8306008](https://bugs.openjdk.org/browse/JDK-8306008) and [JDK-8309531](https://bugs.openjdk.org/browse/JDK-8309531) have been fixed before the backout. >> - [JDK-8309373](https://bugs.openjdk.org/browse/JDK-8309373) was due to missing `ForceInline` on `AbstractVector::toBitsVectorTemplate` >> - [JDK-8306592](https://bugs.openjdk.org/browse/JDK-8306592), I have not been able to find the root causes. I'm not sure if this is a blocker, now I cannot even build x86-32 tests. 
>> >> Finally, I moved some implementation of public methods and methods that call into intrinsics to the concrete class as that may help the compiler know the correct types of the variables. >> >> Please take a look and leave reviews. Thanks a lot. >> >> The description of the original PR: >> >> This patch reimplements `VectorShuffle` implementations to be a vector of the bit type. Currently, `VectorShuffle` is stored as a byte array, and would be expanded upon usage. This poses several drawbacks: >> >> Inefficient conversions between a shuffle and its corresponding vector. This hinders the performance when the shuffle indices are not constant and are loaded or computed dynamically. >> Redundant expansions in `rearrange` operations. On all platforms, it seems that a shuffle index vector is always expanded to the correct type before executing the `rearrange` operations. >> Some redundant intrinsics are needed to support this handling as well as special considerations in the C2 compiler. >> Range checks are performed using `VectorShuffle::toVector`, which is inefficient for FP types since both FP conversions and FP comparisons are more expensive than the integral ones. >> Upon these changes, a `rearrange` can emit more efficient code: >> >> var species = IntVector.SPECIES_128; >> var v1 = IntVector.fromArray(species, SRC1, 0); >> var v2 = IntVector.fromArray(species, SRC2, 0); >> v1.rearrange(v2.toShuffle()).intoArray(DST, 0); >> >> Before: >> movabs $0x751589fa8,%r10 ; {oop([I{0x0000000751589fa8})} >> vmovdqu 0x10(%r10),%xmm2 >> movabs $0x7515a0d08,%r10 ; {oop([I{0x00000007515a0d08})} >> vmovdqu 0x10(%r10),%xmm1 >> movabs $0x75158afb8,%r10 ; {oop([I{0x000000075158afb8})} >> vmovdqu 0x10(%r10),%xmm0 >> vpand -0xddc12(%rip),%xmm0,%xmm0 # Stub::vector_int_to_byt... 
> > Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: > > Change wording on VectorLoadShuffleNode > > Co-authored-by: Jatin Bhateja I am observing some performance drops in slice / unslice benchmarks, I have just completed this run and not got a chance to root cause. ![image](https://github.com/user-attachments/assets/ecb686b6-c3b0-47e9-9325-40341c19ff9d) Can you kindly verify once at your end. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21042#issuecomment-2530801116 From jbhateja at openjdk.org Tue Dec 10 08:44:42 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 10 Dec 2024 08:44:42 GMT Subject: RFR: 8310691: [REDO] [vectorapi] Refactor VectorShuffle implementation [v8] In-Reply-To: <6yhvabADaFjfg6DLQXTXSyOapN45WsW5vbSgdu7ZpXM=.e0ef06b5-5ba3-4f7e-b364-fb95909a9acd@github.com> References: <6yhvabADaFjfg6DLQXTXSyOapN45WsW5vbSgdu7ZpXM=.e0ef06b5-5ba3-4f7e-b364-fb95909a9acd@github.com> Message-ID: On Tue, 10 Dec 2024 08:17:22 GMT, Quan Anh Mai wrote: >> src/jdk.incubator.vector/share/classes/jdk/incubator/vector/Double128Vector.java line 859: >> >>> 857: .reinterpretAsInts() >>> 858: .intoArray(a, offset); >>> 859: default -> { >> >> These cases for length() = 4, 8, and 16 looks redundant for 128-bit DeoubleVector. > > You are right, but this switch is needed for `DoubleMaxVector`, and using it for the other species simplifies the template. The compiler will eliminate all wrong cases so there should be no runtime concern. I see, so you want to avoid special handling in template files at the expense of redundant Java code. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21042#discussion_r1877599127 From duke at openjdk.org Tue Dec 10 08:54:16 2024 From: duke at openjdk.org (theoweidmannoracle) Date: Tue, 10 Dec 2024 08:54:16 GMT Subject: RFR: 8332268: C2: Add missing optimizations for UDivI/L and UModI/L and unify the shared logic with the signed nodes [v19] In-Reply-To: References: Message-ID: > This PR introduces > - several new optimizations to unsigned division and modulo > - x % 1, x % x, x % 2^k > - x / 1, x / x, x / 2^k > - does not implement the Granlund and Montgomery algorithm, which has been implemented for signed modulo division in the past. It is unclear if a lot is to be gained by implementing this. > - tests to test existing optimizations for signed division and modulo > - does not test the Granlund and Montgomery algorithm directly theoweidmannoracle has updated the pull request incrementally with two additional commits since the last revision: - Fix space - Add missing UDiv/UMod cases to no_dependent_zero_check and cannot_split_division ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22061/files - new: https://git.openjdk.org/jdk/pull/22061/files/ce5a3521..94425ea6 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22061&range=18 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22061&range=17-18 Stats: 10 lines in 2 files changed: 8 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/22061.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22061/head:pull/22061 PR: https://git.openjdk.org/jdk/pull/22061 From jbhateja at openjdk.org Tue Dec 10 09:07:46 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 10 Dec 2024 09:07:46 GMT Subject: RFR: 8310691: [REDO] [vectorapi] Refactor VectorShuffle implementation [v8] In-Reply-To: <6yhvabADaFjfg6DLQXTXSyOapN45WsW5vbSgdu7ZpXM=.e0ef06b5-5ba3-4f7e-b364-fb95909a9acd@github.com> References: 
<6yhvabADaFjfg6DLQXTXSyOapN45WsW5vbSgdu7ZpXM=.e0ef06b5-5ba3-4f7e-b364-fb95909a9acd@github.com> Message-ID: On Tue, 10 Dec 2024 08:18:36 GMT, Quan Anh Mai wrote: >> src/jdk.incubator.vector/share/classes/jdk/incubator/vector/Byte512Vector.java line 1046: >> >>> 1044: String msg = ("index "+si+"out of range ["+length+"] in "+ >>> 1045: java.util.Arrays.toString(indices)); >>> 1046: throw new AssertionError(msg); >> >> Why not directly throw IndexOutOfBoundsException here? > > This is called in an `assert` so I think throwing `AssertionError` seems more reasonable, the original implementation also throws `AssertionError` I see, you want to throw an error here and not just use an assert statement that reports an assertion failure with the -ea flag. This is a newly added routine, why don't you simply return false and let the assertion invoking this routine do the rest? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21042#discussion_r1877642371 From shade at openjdk.org Tue Dec 10 09:16:40 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 10 Dec 2024 09:16:40 GMT Subject: RFR: 8345700: tier{1, 2, 3}_compiler don't cover all compiler tests In-Reply-To: References: Message-ID: On Fri, 6 Dec 2024 20:10:08 GMT, Leonid Mesnik wrote: > Hi, > could you please review the following fix that updates the > tier3_compiler test group > so :hotspot_compiler is always a sum of > tier1_compiler, tier2_compiler, tier3_compiler > This is a natural splitting of tests into 3 layers. > The fix is done to split execution of :hotspot_compiler into 3 tiers with corresponding group names. > > New tests in tier3: > 9 tests in compiler/ccp/ > 22 tests in compiler/predicates/ > 8 tests in compiler/splitif/ > So the number is not increased significantly. > > The new group > ` tier2_ctw` > was introduced for ctw testing. > It is different from tests in hotspot_compiler, and usually executed separately, so I added it as a separate sub-test of tier2.
> > I moved ctw from tier3 to tier2 to correspond to the current tiers of Hotspot CI in our system. > If anyone thinks it should be part of tier3 - we can discuss in PR comments how to deal with it. > > @shipilev, could you please take a look and check if these changes are OK with you since you are the author of tier2/tier3. OK, thanks. I'll approve once I see a clean GHA run. I think you have already fixed the Xcomp failure, so just pull from master and see what happens? ------------- PR Comment: https://git.openjdk.org/jdk/pull/22614#issuecomment-2530919318 From epeter at openjdk.org Tue Dec 10 09:18:48 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 10 Dec 2024 09:18:48 GMT Subject: RFR: 8343685: C2 SuperWord: refactor VPointer with MemPointer In-Reply-To: References: Message-ID: On Wed, 6 Nov 2024 13:07:13 GMT, Emanuel Peter wrote: > **This is a required step towards adding runtime-checks for Aliasing Analysis, especially important for FFM / MemorySegments.** > > I know this one is large, but it consists of a lot of renamings, and new tests. On the whole, the new `VPointer` code is less than the old! > > **Goal** > > Replace old `VPointer` with a new version that relies on `MemPointer` - which then is a shared utility for both `MergeStores` and `SuperWord / AutoVectorization`. `MemPointer` generally parses pointers, and `VPointer` specializes this facility for the use in loops (`VLoop`). > > The old `VPointer` implementation with its recursive pattern matching was quite complicated and difficult to reason about for correctness. The approach in `MemPointer` is much simpler: iteratively decomposing sub-expressions. Further: the new implementation is more powerful at detecting equivalent invariants. > > **Future**: with the `MemPointer` implementation of `VPointer`, it should be easier to implement speculative runtime-checks for Aliasing-Analysis [JDK-8324751](https://bugs.openjdk.org/browse/JDK-8324751).
The pressing need for this has come from the FFM / MemorySegment folks, like @mcimadamore and @minborg. > > **Details** > > This looks like a rather big patch, so let me explain the parts. > - Refactor of `MemPointer` in `mempointer.hpp/cpp`: > - Added concept of `Base` to `MemPointer`. This is required for the aliasing computation in `VPointer`. > - `sub_expression_has_native_base_candidate`: add special case to parse through `CastX2P` if we find a native memory base `MemorySegment.address()`, i.e. `jdk.internal.foreign.NativeMemorySegmentImpl.min`. This helps some native memory segment cases to vectorize that did not before. > - So far `MemPointer` could only answer adjacency queries. But VPointer also needs overlap queries, see the old `VPointer::not_equal` (i.e. can we prove that the two `VPointer` never overlap?). So I had to add a new case to aliasing computation: `NotOrAtDistance`. It is useful to answer the new and better named `MemPointer::never_overlaps_with`. > - Collapsed together `MemPointerDecomposedForm` and `MemPointer`. It was an unnecessary and unhelpful split. > - Re-write of `VPointer` based on `MemPointer`: > - Old pattern: > - `VPointer[mem: 847 StoreI, base: 37, adr: 37, base[ 37] + offset( 16) + invar( 0) + scale( 4) * iv]` > - `VPointer[mem: 3189 LoadB, base: 1, adr: 2273, base[ 1] + offset( 0) + invar( 0) + scale( 1) * iv]` -> `adr = CastX2P`, the a... @chhagedorn @vnkozlov @rwestrel Would any of you be willing to review this? It is required for me to move on to Aliasing Analysis, cost-model, etc.
------------- PR Comment: https://git.openjdk.org/jdk/pull/21926#issuecomment-2530928210 From chagedorn at openjdk.org Tue Dec 10 09:23:45 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 10 Dec 2024 09:23:45 GMT Subject: RFR: 8332268: C2: Add missing optimizations for UDivI/L and UModI/L and unify the shared logic with the signed nodes [v19] In-Reply-To: References: Message-ID: <60ahCXvKc_mW_xhFtHQgCW_FYBn6ZxMC0j6eE5t9ooU=.7f6f4cfb-d187-496a-8f51-ea4e717b25a7@github.com> On Tue, 10 Dec 2024 08:54:16 GMT, theoweidmannoracle wrote: >> This PR introduces >> - several new optimizations to unsigned division and modulo >> - x % 1, x % x, x % 2^k >> - x / 1, x / x, x / 2^k >> - does not implement the Granlund and Montgomery algorithm, which has been implemented for signed modulo division in the past. It is unclear if a lot is to be gained by implementing this. >> - tests to test existing optimizations for signed division and modulo >> - does not test the Granlund and Montgomery algorithm directly > > theoweidmannoracle has updated the pull request incrementally with two additional commits since the last revision: > > - Fix space > - Add missing UDiv/UMod cases to no_dependent_zero_check and cannot_split_division Thanks for fixing the cases! Some minor variable renaming suggestions for the existing code that you moved. Otherwise, looks good to me, too. src/hotspot/share/opto/divnode.cpp line 465: > 463: return nullptr; > 464: } > 465: const TypeClass* tl = t->cast(); I guess `tl` was for type long before which is no longer true. How about renaming this to `type_divisor` to make it more explicit? Then you can also rename `l` -> `divisor` further down. src/hotspot/share/opto/divnode.cpp line 1139: > 1137: return nullptr; > 1138: } > 1139: const TypeClass* ti = t->cast(); Same here, you could also name this `type_divisor` and `con` -> `divisor`. 
src/hotspot/share/opto/divnode.cpp line 1196: > 1194: } > 1195: > 1196: const TypeClass* i1 = t1->cast(); Could be applied here as well: `i1` -> `type_dividend` `i2` -> `type_divisor` test/hotspot/jtreg/compiler/splitif/TestSplitDivisionThroughPhi.java line 78: > 76: public static void main(String[] strArr) { > 77: Integer.divideUnsigned(2, 3); > 78: Long.divideUnsigned(2, 3); Maybe add a comment here: Suggestion: // Make sure classes are loaded when compiling with -Xcomp Integer.divideUnsigned(2, 3); Long.divideUnsigned(2, 3); ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/22061#pullrequestreview-2491523629 PR Review Comment: https://git.openjdk.org/jdk/pull/22061#discussion_r1877646189 PR Review Comment: https://git.openjdk.org/jdk/pull/22061#discussion_r1877658415 PR Review Comment: https://git.openjdk.org/jdk/pull/22061#discussion_r1877674222 PR Review Comment: https://git.openjdk.org/jdk/pull/22061#discussion_r1877667349 From duke at openjdk.org Tue Dec 10 13:06:02 2024 From: duke at openjdk.org (theoweidmannoracle) Date: Tue, 10 Dec 2024 13:06:02 GMT Subject: RFR: 8332268: C2: Add missing optimizations for UDivI/L and UModI/L and unify the shared logic with the signed nodes [v20] In-Reply-To: References: Message-ID: > This PR introduces > - several new optimizations to unsigned division and modulo > - x % 1, x % x, x % 2^k > - x / 1, x / x, x / 2^k > - does not implement the Granlund and Montgomery algorithm, which has been implemented for signed modulo division in the past. It is unclear if a lot is to be gained by implementing this. 
> - tests to test existing optimizations for signed division and modulo > - does not test the Granlund and Montgomery algorithm directly theoweidmannoracle has updated the pull request incrementally with one additional commit since the last revision: Update test/hotspot/jtreg/compiler/splitif/TestSplitDivisionThroughPhi.java Co-authored-by: Christian Hagedorn ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22061/files - new: https://git.openjdk.org/jdk/pull/22061/files/94425ea6..75659ae3 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22061&range=19 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22061&range=18-19 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/22061.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22061/head:pull/22061 PR: https://git.openjdk.org/jdk/pull/22061 From roland at openjdk.org Tue Dec 10 14:45:48 2024 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 10 Dec 2024 14:45:48 GMT Subject: RFR: 8343607: C2: Shenandoah crashes during barrier expansion in Continuation::enter Message-ID: If a load barrier is used both in the fallthrough and exception handling paths out of a call, it needs to be cloned so each path has its copy of the barrier. In the case of the crash, cloning the barrier is attempted for a runtime call that doesn't have an exception handling path. Fix simply detects that corner case. 
------------- Commit messages: - fix Changes: https://git.openjdk.org/jdk/pull/22663/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=22663&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8343607 Stats: 10 lines in 1 file changed: 9 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/22663.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22663/head:pull/22663 PR: https://git.openjdk.org/jdk/pull/22663 From shade at openjdk.org Tue Dec 10 14:49:42 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 10 Dec 2024 14:49:42 GMT Subject: RFR: 8343607: C2: Shenandoah crashes during barrier expansion in Continuation::enter In-Reply-To: References: Message-ID: On Tue, 10 Dec 2024 14:40:44 GMT, Roland Westrelin wrote: > If a load barrier is used both in the fallthrough and exception > handling paths out of a call, it needs to be cloned so each path has > its copy of the barrier. In the case of the crash, cloning the barrier > is attempted for a runtime call that doesn't have an exception > handling path. Fix simply detects that corner case. This looks reasonable to me. @JohnTortugo should take a look too. ------------- Marked as reviewed by shade (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/22663#pullrequestreview-2492569866 From duke at openjdk.org Tue Dec 10 15:00:21 2024 From: duke at openjdk.org (theoweidmannoracle) Date: Tue, 10 Dec 2024 15:00:21 GMT Subject: RFR: 8332268: C2: Add missing optimizations for UDivI/L and UModI/L and unify the shared logic with the signed nodes [v21] In-Reply-To: References: Message-ID: > This PR introduces > - several new optimizations to unsigned division and modulo > - x % 1, x % x, x % 2^k > - x / 1, x / x, x / 2^k > - does not implement the Granlund and Montgomery algorithm, which has been implemented for signed modulo division in the past. It is unclear if a lot is to be gained by implementing this. 
> - tests to test existing optimizations for signed division and modulo > - does not test the Granlund and Montgomery algorithm directly theoweidmannoracle has updated the pull request incrementally with two additional commits since the last revision: - Merge branch 'unsigned-div-opts' of https://github.com/theoweidmannoracle/jdk into unsigned-div-opts - Rename variables ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22061/files - new: https://git.openjdk.org/jdk/pull/22061/files/75659ae3..fe2232ee Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22061&range=20 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22061&range=19-20 Stats: 20 lines in 1 file changed: 0 ins; 0 del; 20 mod Patch: https://git.openjdk.org/jdk/pull/22061.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22061/head:pull/22061 PR: https://git.openjdk.org/jdk/pull/22061 From duke at openjdk.org Tue Dec 10 15:00:22 2024 From: duke at openjdk.org (theoweidmannoracle) Date: Tue, 10 Dec 2024 15:00:22 GMT Subject: RFR: 8332268: C2: Add missing optimizations for UDivI/L and UModI/L and unify the shared logic with the signed nodes [v19] In-Reply-To: <60ahCXvKc_mW_xhFtHQgCW_FYBn6ZxMC0j6eE5t9ooU=.7f6f4cfb-d187-496a-8f51-ea4e717b25a7@github.com> References: <60ahCXvKc_mW_xhFtHQgCW_FYBn6ZxMC0j6eE5t9ooU=.7f6f4cfb-d187-496a-8f51-ea4e717b25a7@github.com> Message-ID: On Tue, 10 Dec 2024 09:20:03 GMT, Christian Hagedorn wrote: >> theoweidmannoracle has updated the pull request incrementally with two additional commits since the last revision: >> >> - Fix space >> - Add missing UDiv/UMod cases to no_dependent_zero_check and cannot_split_division > > Thanks for fixing the cases! Some minor variable renaming suggestions for the existing code that you moved. Otherwise, looks good to me, too. @chhagedorn Thanks for the suggestions. I renamed the variables. 
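As a side note on the PR summary quoted in this thread: the power-of-two cases (`x / 2^k`, `x % 2^k`) and the trivial cases (`x / 1`, `x % 1`, `x / x`, `x % x`) rest on simple unsigned identities. What follows is only a Java-level sketch of those identities, not code from the PR. The class and method names (`UnsignedPow2Identities`, `divByPow2`, `modByPow2`, `holdsFor`) are made up for illustration, and the compiler presumably applies the equivalent rewrites to its own IR nodes rather than to Java source.

```java
// Illustrative sketch only; names are hypothetical and not taken from the PR.
final class UnsignedPow2Identities {

    // For a divisor of 2^k (0 <= k <= 31), unsigned division is a logical right shift.
    static int divByPow2(int x, int k) {
        return x >>> k;
    }

    // For a divisor of 2^k (0 <= k <= 31), the unsigned remainder is the low k bits.
    static int modByPow2(int x, int k) {
        return x & ((1 << k) - 1);
    }

    // Compares the shortcuts against the library's unsigned operations.
    static boolean holdsFor(int x) {
        for (int k = 0; k <= 31; k++) {
            int divisor = 1 << k; // k == 31 yields 2^31 when read as unsigned
            if (Integer.divideUnsigned(x, divisor) != divByPow2(x, k)) return false;
            if (Integer.remainderUnsigned(x, divisor) != modByPow2(x, k)) return false;
        }
        // x / x == 1 and x % x == 0 only make sense for a non-zero x.
        if (x != 0) {
            if (Integer.divideUnsigned(x, x) != 1) return false;
            if (Integer.remainderUnsigned(x, x) != 0) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        int[] samples = {0, 1, 7, 8, 12345, -1, Integer.MIN_VALUE, Integer.MAX_VALUE};
        for (int x : samples) {
            if (!holdsFor(x)) {
                throw new AssertionError("identity failed for x = " + x);
            }
        }
        System.out.println("all identities hold");
    }
}
```

The `k == 0` iteration covers `x / 1` and `x % 1`, since the shift is by zero and the mask is empty.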
------------- PR Comment: https://git.openjdk.org/jdk/pull/22061#issuecomment-2531873406 From lmesnik at openjdk.org Tue Dec 10 15:43:01 2024 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Tue, 10 Dec 2024 15:43:01 GMT Subject: RFR: 8345700: tier{1, 2, 3}_compiler don't cover all compiler tests [v2] In-Reply-To: References: Message-ID: > Hi, > could you please review the following fix that updates the > tier3_compiler test group > so :hotspot_compiler is always a sum of > tier1_compiler, tier2_compiler, tier3_compiler > This is a natural splitting of tests into 3 layers. > The fix is done to split execution of :hotspot_compiler into 3 tiers with corresponding group names. > > New tests in tier3: > 9 tests in compiler/ccp/ > 22 tests in compiler/predicates/ > 8 tests in compiler/splitif/ > So the number is not increased significantly. > > The new group > ` tier2_ctw` > was introduced for ctw testing. > It is different from tests in hotspot_compiler, and usually executed separately, so I added it as a separate sub-test of tier2. > > I moved ctw from tier3 to tier2 to correspond to the current tiers of Hotspot CI in our system. > If anyone thinks it should be part of tier3 - we can discuss in PR comments how to deal with it. > > @shipilev, could you please take a look and check if these changes are OK with you since you are the author of tier2/tier3. Leonid Mesnik has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains two additional commits since the last revision: - Merge branch 'master' of https://github.com/openjdk/jdk into 8345700 - 8345700: tier{1,2,3}_compiler doesn't cover all compiler tests ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22614/files - new: https://git.openjdk.org/jdk/pull/22614/files/3a6cf638..27d324cd Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22614&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22614&range=00-01 Stats: 2447 lines in 1094 files changed: 940 ins; 330 del; 1177 mod Patch: https://git.openjdk.org/jdk/pull/22614.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22614/head:pull/22614 PR: https://git.openjdk.org/jdk/pull/22614 From lmesnik at openjdk.org Tue Dec 10 15:56:41 2024 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Tue, 10 Dec 2024 15:56:41 GMT Subject: Integrated: 8345746: Remove :resourcehogs/compiler from :hotspot_slow_compiler In-Reply-To: References: Message-ID: <1R6YdXZ5N5nqEDrnQEqQaqm6mXWgFoR3VK3nUfs9muo=.8ea2af68-b6ce-4cb1-92b2-d23bec012901@github.com> On Sat, 7 Dec 2024 03:33:34 GMT, Leonid Mesnik wrote: > The test group > :resourcehogs/compiler > contains tests that should not be executed concurrently with *any* tests. > > They might use a lot of resources and cause unexplained sporadic failures of other tests. > > So it should be removed from :hotspot_slow_compiler. This pull request has now been integrated. 
Changeset: d6b5264c Author: Leonid Mesnik URL: https://git.openjdk.org/jdk/commit/d6b5264c3f7d0c4157ebd73b2f1a98dd15273c61 Stats: 4 lines in 1 file changed: 3 ins; 1 del; 0 mod 8345746: Remove :resourcehogs/compiler from :hotspot_slow_compiler Reviewed-by: kvn ------------- PR: https://git.openjdk.org/jdk/pull/22626 From qamai at openjdk.org Tue Dec 10 15:57:46 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Tue, 10 Dec 2024 15:57:46 GMT Subject: RFR: 8342651: Refactor array constant to use an array of jbyte [v3] In-Reply-To: References: <0gyWEIQ_ZHlIoR_7zdB6sxvApC-5hXkG3RnYQSqWp6w=.fad5dcb7-ffaf-4841-a55c-9afc3475a48d@github.com> Message-ID: <7pWyOA5tFu9q_zbgRIsyf1jRBfnXGxRU-JMopB7Ybkw=.6e91ef88-8863-4225-b529-b9e096b976b8@github.com> On Thu, 14 Nov 2024 06:44:37 GMT, Tobias Hartmann wrote: >> Quan Anh Mai has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: >> >> - indentation >> - Merge branch 'master' into constanttable >> - Merge branch 'master' into constanttable >> - refactor array constant, fix codebuffer reallocation > > Looks good to me. @TobiHartmann Could you please reapprove this PR, thanks very much. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21596#issuecomment-2532133371 From qamai at openjdk.org Tue Dec 10 16:10:09 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Tue, 10 Dec 2024 16:10:09 GMT Subject: RFR: 8310691: [REDO] [vectorapi] Refactor VectorShuffle implementation [v10] In-Reply-To: References: Message-ID: > Hi, > > This is just a redo of https://github.com/openjdk/jdk/pull/13093. mostly just the revert of the backout. > > Regarding the related issues: > > - [JDK-8306008](https://bugs.openjdk.org/browse/JDK-8306008) and [JDK-8309531](https://bugs.openjdk.org/browse/JDK-8309531) have been fixed before the backout. 
> - [JDK-8309373](https://bugs.openjdk.org/browse/JDK-8309373) was due to missing `ForceInline` on `AbstractVector::toBitsVectorTemplate` > - [JDK-8306592](https://bugs.openjdk.org/browse/JDK-8306592), I have not been able to find the root causes. I'm not sure if this is a blocker, now I cannot even build x86-32 tests. > > Finally, I moved some implementation of public methods and methods that call into intrinsics to the concrete class as that may help the compiler know the correct types of the variables. > > Please take a look and leave reviews. Thanks a lot. > > The description of the original PR: > > This patch reimplements `VectorShuffle` implementations to be a vector of the bit type. Currently, `VectorShuffle` is stored as a byte array, and would be expanded upon usage. This poses several drawbacks: > > Inefficient conversions between a shuffle and its corresponding vector. This hinders the performance when the shuffle indices are not constant and are loaded or computed dynamically. > Redundant expansions in `rearrange` operations. On all platforms, it seems that a shuffle index vector is always expanded to the correct type before executing the `rearrange` operations. > Some redundant intrinsics are needed to support this handling as well as special considerations in the C2 compiler. > Range checks are performed using `VectorShuffle::toVector`, which is inefficient for FP types since both FP conversions and FP comparisons are more expensive than the integral ones. 
> Upon these changes, a `rearrange` can emit more efficient code: > > var species = IntVector.SPECIES_128; > var v1 = IntVector.fromArray(species, SRC1, 0); > var v2 = IntVector.fromArray(species, SRC2, 0); > v1.rearrange(v2.toShuffle()).intoArray(DST, 0); > > Before: > movabs $0x751589fa8,%r10 ; {oop([I{0x0000000751589fa8})} > vmovdqu 0x10(%r10),%xmm2 > movabs $0x7515a0d08,%r10 ; {oop([I{0x00000007515a0d08})} > vmovdqu 0x10(%r10),%xmm1 > movabs $0x75158afb8,%r10 ; {oop([I{0x000000075158afb8})} > vmovdqu 0x10(%r10),%xmm0 > vpand -0xddc12(%rip),%xmm0,%xmm0 # Stub::vector_int_to_byte_mask > ; {ex... Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: optimize slice/unslice ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21042/files - new: https://git.openjdk.org/jdk/pull/21042/files/146c8cea..a933d466 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21042&range=09 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21042&range=08-09 Stats: 134 lines in 8 files changed: 34 ins; 0 del; 100 mod Patch: https://git.openjdk.org/jdk/pull/21042.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21042/head:pull/21042 PR: https://git.openjdk.org/jdk/pull/21042 From qamai at openjdk.org Tue Dec 10 16:10:10 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Tue, 10 Dec 2024 16:10:10 GMT Subject: RFR: 8310691: [REDO] [vectorapi] Refactor VectorShuffle implementation [v9] In-Reply-To: References: Message-ID: On Tue, 10 Dec 2024 08:34:30 GMT, Jatin Bhateja wrote: >> Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: >> >> Change wording on VectorLoadShuffleNode >> >> Co-authored-by: Jatin Bhateja > > I am observing some performance drops in slice / unslice benchmarks, I have just completed this run and not got a chance to root cause. 
> ![image](https://github.com/user-attachments/assets/ecb686b6-c3b0-47e9-9325-40341c19ff9d) > > Can you kindly verify once at your end. @jatin-bhateja Thanks for the performance notice, it was due to the failure to inline `VectorSupport::convert` which I believe is similar to [JDK-8302459](https://bugs.openjdk.org/browse/JDK-8302459) because the `convert` sequence is pretty complex. For now, I refactor the `slice/unslice` routines to take advantage of `toBitsVector`, could you verify if the change fixes the regression on your side? ------------- PR Comment: https://git.openjdk.org/jdk/pull/21042#issuecomment-2532158271 From qamai at openjdk.org Tue Dec 10 16:10:10 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Tue, 10 Dec 2024 16:10:10 GMT Subject: RFR: 8310691: [REDO] [vectorapi] Refactor VectorShuffle implementation [v8] In-Reply-To: References: <6yhvabADaFjfg6DLQXTXSyOapN45WsW5vbSgdu7ZpXM=.e0ef06b5-5ba3-4f7e-b364-fb95909a9acd@github.com> Message-ID: On Tue, 10 Dec 2024 09:02:40 GMT, Jatin Bhateja wrote: >> This is called in an `assert` so I think throwing `AssertionError` seems more reasonable, the original implementation also throws `AssertionError` > > I see, you want to throw an error here and not just use an assert statement that reports an assertion failure with -ea flag. This is a newly added routine, why don't you simply return a false and let the assertion invoking this routine do the rest? This routine is moved from `AbstractShuffle` where it was implemented when all shuffle types share the same constructor there. Throwing here allows easier construction of the error message. 
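The two positions in this exchange can be put side by side. The sketch below is a minimal illustration, not the code under review: it assumes plain `int` indices and a valid range of `[-length, length)`, and the names `ShuffleIndexCheck`, `indicesInRange` and `indicesInRangeQuiet` are hypothetical. The first variant throws from inside the routine so the message can name the offending index; the second only returns `false` and leaves the failure to the calling `assert`.

```java
import java.util.Arrays;

// Illustrative sketch only; class and method names are hypothetical.
final class ShuffleIndexCheck {

    // Variant argued for by the PR author: the routine itself throws, so the
    // message can name the offending index. Assumed valid range: [-length, length).
    static boolean indicesInRange(int[] indices) {
        int length = indices.length;
        for (int si : indices) {
            if (si >= length || si < -length) {
                String msg = "index " + si + " out of range [" + length + "] in "
                        + Arrays.toString(indices);
                throw new AssertionError(msg);
            }
        }
        return true;
    }

    // Variant suggested in review: only report the result and let the calling
    // `assert` raise a (less detailed) AssertionError.
    static boolean indicesInRangeQuiet(int[] indices) {
        int length = indices.length;
        for (int si : indices) {
            if (si >= length || si < -length) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        int[] good = {0, 1, -2, 3};
        int[] bad = {0, 4, 1, 2};
        // Both asserted calls are skipped entirely unless the JVM runs with -ea.
        assert indicesInRange(good);
        assert indicesInRangeQuiet(good);
        try {
            indicesInRange(bad);
        } catch (AssertionError e) {
            System.out.println(e.getMessage());
        }
        System.out.println(indicesInRangeQuiet(bad));
    }
}
```

Either way the check costs nothing in production runs, since the expression of an `assert` statement is not evaluated when assertions are disabled.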
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21042#discussion_r1878383065 From shade at openjdk.org Tue Dec 10 16:23:43 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 10 Dec 2024 16:23:43 GMT Subject: RFR: 8345700: tier{1, 2, 3}_compiler don't cover all compiler tests [v2] In-Reply-To: References: Message-ID: On Tue, 10 Dec 2024 15:43:01 GMT, Leonid Mesnik wrote: >> Hi, >> could you please review the following fix that updates the >> tier3_compiler test group >> so :hotspot_compiler is always a sum of >> tier1_compiler, tier2_compiler, tier3_compiler >> This is a natural splitting of tests into 3 layers. >> The fix is done to split execution of :hotspot_compiler into 3 tiers with corresponding group names. >> >> New tests in tier3: >> 9 tests in compiler/ccp/ >> 22 tests in compiler/predicates/ >> 8 tests in compiler/splitif/ >> So the number is not increased significantly. >> >> The new group >> ` tier2_ctw` >> was introduced for ctw testing. >> It is different from tests in hotspot_compiler, and usually executed separately, so I added it as a separate sub-test of tier2. >> >> I moved ctw from tier3 to tier2 to correspond to the current tiers of Hotspot CI in our system. >> If anyone thinks it should be part of tier3 - we can discuss in PR comments how to deal with it. >> >> @shipilev, could you please take a look and check if these changes are OK with you since you are the author of tier2/tier3. > > Leonid Mesnik has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains two additional commits since the last revision: > > - Merge branch 'master' of https://github.com/openjdk/jdk into 8345700 > - 8345700: tier{1,2,3}_compiler doesn't cover all compiler tests Looks fine, assuming GHA turns out green. ------------- Marked as reviewed by shade (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/22614#pullrequestreview-2492859848 From jbhateja at openjdk.org Tue Dec 10 16:55:44 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 10 Dec 2024 16:55:44 GMT Subject: RFR: 8310691: [REDO] [vectorapi] Refactor VectorShuffle implementation [v10] In-Reply-To: References: Message-ID: On Tue, 10 Dec 2024 16:10:09 GMT, Quan Anh Mai wrote: >> Hi, >> >> This is just a redo of https://github.com/openjdk/jdk/pull/13093. mostly just the revert of the backout. >> >> Regarding the related issues: >> >> - [JDK-8306008](https://bugs.openjdk.org/browse/JDK-8306008) and [JDK-8309531](https://bugs.openjdk.org/browse/JDK-8309531) have been fixed before the backout. >> - [JDK-8309373](https://bugs.openjdk.org/browse/JDK-8309373) was due to missing `ForceInline` on `AbstractVector::toBitsVectorTemplate` >> - [JDK-8306592](https://bugs.openjdk.org/browse/JDK-8306592), I have not been able to find the root causes. I'm not sure if this is a blocker, now I cannot even build x86-32 tests. >> >> Finally, I moved some implementation of public methods and methods that call into intrinsics to the concrete class as that may help the compiler know the correct types of the variables. >> >> Please take a look and leave reviews. Thanks a lot. >> >> The description of the original PR: >> >> This patch reimplements `VectorShuffle` implementations to be a vector of the bit type. Currently, `VectorShuffle` is stored as a byte array, and would be expanded upon usage. This poses several drawbacks: >> >> Inefficient conversions between a shuffle and its corresponding vector. This hinders the performance when the shuffle indices are not constant and are loaded or computed dynamically. >> Redundant expansions in `rearrange` operations. On all platforms, it seems that a shuffle index vector is always expanded to the correct type before executing the `rearrange` operations. 
>> Some redundant intrinsics are needed to support this handling as well as special considerations in the C2 compiler. >> Range checks are performed using `VectorShuffle::toVector`, which is inefficient for FP types since both FP conversions and FP comparisons are more expensive than the integral ones. >> Upon these changes, a `rearrange` can emit more efficient code: >> >> var species = IntVector.SPECIES_128; >> var v1 = IntVector.fromArray(species, SRC1, 0); >> var v2 = IntVector.fromArray(species, SRC2, 0); >> v1.rearrange(v2.toShuffle()).intoArray(DST, 0); >> >> Before: >> movabs $0x751589fa8,%r10 ; {oop([I{0x0000000751589fa8})} >> vmovdqu 0x10(%r10),%xmm2 >> movabs $0x7515a0d08,%r10 ; {oop([I{0x00000007515a0d08})} >> vmovdqu 0x10(%r10),%xmm1 >> movabs $0x75158afb8,%r10 ; {oop([I{0x000000075158afb8})} >> vmovdqu 0x10(%r10),%xmm0 >> vpand -0xddc12(%rip),%xmm0,%xmm0 # Stub::vector_int_to_byt... > > Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: > > optimize slice/unslice Marked as reviewed by jbhateja (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/21042#pullrequestreview-2492970106 From jbhateja at openjdk.org Tue Dec 10 16:55:46 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 10 Dec 2024 16:55:46 GMT Subject: RFR: 8310691: [REDO] [vectorapi] Refactor VectorShuffle implementation [v9] In-Reply-To: References: Message-ID: On Tue, 10 Dec 2024 08:34:30 GMT, Jatin Bhateja wrote: >> Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: >> >> Change wording on VectorLoadShuffleNode >> >> Co-authored-by: Jatin Bhateja > > I am observing some performance drops in slice / unslice benchmarks, I have just completed this run and not got a chance to root cause. > ![image](https://github.com/user-attachments/assets/ecb686b6-c3b0-47e9-9325-40341c19ff9d) > > Can you kindly verify once at your end. 
> @jatin-bhateja Thanks for the performance notice, it was due to the failure to inline `VectorSupport::convert` which I believe is similar to [JDK-8302459](https://bugs.openjdk.org/browse/JDK-8302459) because the `convert` sequence is pretty complex. For now, I refactor the `slice/unslice` routines to take advantage of `toBitsVector`, could you verify if the change fixes the regression on your side? ![image](https://github.com/user-attachments/assets/3a3b309c-1361-4c7d-8435-b837cfaa3333) Thanks @merykitty, your latest commit fixed the performance drop. LGTM. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21042#issuecomment-2532276493 From psandoz at openjdk.org Tue Dec 10 17:27:43 2024 From: psandoz at openjdk.org (Paul Sandoz) Date: Tue, 10 Dec 2024 17:27:43 GMT Subject: RFR: 8310691: [REDO] [vectorapi] Refactor VectorShuffle implementation [v10] In-Reply-To: References: Message-ID: On Tue, 10 Dec 2024 16:10:09 GMT, Quan Anh Mai wrote: >> Hi, >> >> This is just a redo of https://github.com/openjdk/jdk/pull/13093. mostly just the revert of the backout. >> >> Regarding the related issues: >> >> - [JDK-8306008](https://bugs.openjdk.org/browse/JDK-8306008) and [JDK-8309531](https://bugs.openjdk.org/browse/JDK-8309531) have been fixed before the backout. >> - [JDK-8309373](https://bugs.openjdk.org/browse/JDK-8309373) was due to missing `ForceInline` on `AbstractVector::toBitsVectorTemplate` >> - [JDK-8306592](https://bugs.openjdk.org/browse/JDK-8306592), I have not been able to find the root causes. I'm not sure if this is a blocker, now I cannot even build x86-32 tests. >> >> Finally, I moved some implementation of public methods and methods that call into intrinsics to the concrete class as that may help the compiler know the correct types of the variables. >> >> Please take a look and leave reviews. Thanks a lot. >> >> The description of the original PR: >> >> This patch reimplements `VectorShuffle` implementations to be a vector of the bit type.
Currently, `VectorShuffle` is stored as a byte array, and would be expanded upon usage. This poses several drawbacks: >> >> Inefficient conversions between a shuffle and its corresponding vector. This hinders the performance when the shuffle indices are not constant and are loaded or computed dynamically. >> Redundant expansions in `rearrange` operations. On all platforms, it seems that a shuffle index vector is always expanded to the correct type before executing the `rearrange` operations. >> Some redundant intrinsics are needed to support this handling as well as special considerations in the C2 compiler. >> Range checks are performed using `VectorShuffle::toVector`, which is inefficient for FP types since both FP conversions and FP comparisons are more expensive than the integral ones. >> Upon these changes, a `rearrange` can emit more efficient code: >> >> var species = IntVector.SPECIES_128; >> var v1 = IntVector.fromArray(species, SRC1, 0); >> var v2 = IntVector.fromArray(species, SRC2, 0); >> v1.rearrange(v2.toShuffle()).intoArray(DST, 0); >> >> Before: >> movabs $0x751589fa8,%r10 ; {oop([I{0x0000000751589fa8})} >> vmovdqu 0x10(%r10),%xmm2 >> movabs $0x7515a0d08,%r10 ; {oop([I{0x00000007515a0d08})} >> vmovdqu 0x10(%r10),%xmm1 >> movabs $0x75158afb8,%r10 ; {oop([I{0x000000075158afb8})} >> vmovdqu 0x10(%r10),%xmm0 >> vpand -0xddc12(%rip),%xmm0,%xmm0 # Stub::vector_int_to_byt... > > Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: > > optimize slice/unslice Marked as reviewed by psandoz (Reviewer). 
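For readers less familiar with `rearrange`, the lane-wise effect of the `v1.rearrange(v2.toShuffle())` snippet quoted above can be modelled in plain scalar Java. This is only a sketch: the class and method names are hypothetical, it assumes every index is already in the range `[0, VLENGTH)`, and it says nothing about how out-of-range indices are handled or how C2 compiles the real Vector API call.

```java
import java.util.Arrays;

// Hypothetical scalar model of a lane rearrangement; not part of the Vector API.
final class RearrangeModel {
    // Lane i of the result takes the source lane selected by indices[i].
    // Assumption: every index is within [0, src.length).
    static int[] rearrange(int[] src, int[] indices) {
        int[] dst = new int[src.length];
        for (int i = 0; i < dst.length; i++) {
            dst[i] = src[indices[i]];
        }
        return dst;
    }

    public static void main(String[] args) {
        int[] src = {10, 20, 30, 40};   // plays the role of v1
        int[] idx = {3, 2, 1, 0};       // plays the role of v2.toShuffle()
        System.out.println(Arrays.toString(rearrange(src, idx)));
    }
}
```

With a reversing index array as in `main`, the model simply reverses the four lanes.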
------------- PR Review: https://git.openjdk.org/jdk/pull/21042#pullrequestreview-2493047392 From lmesnik at openjdk.org Tue Dec 10 17:42:44 2024 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Tue, 10 Dec 2024 17:42:44 GMT Subject: Integrated: 8345700: tier{1,2,3}_compiler don't cover all compiler tests In-Reply-To: References: Message-ID: On Fri, 6 Dec 2024 20:10:08 GMT, Leonid Mesnik wrote: > Hi, > could you please review following fix that update > tier3_compiler test group > so :hotspot_compiler is always a sum of > tier1_compiler, tier2_compiler, tier3_compiler > This is natural splitting of tests into 3 layers. > The fix is done to execution :hotspot_compiler into 3 tiers with corresponding group names. > > New tests in tier3: > 9 tests in compiler/ccp/ > 22 test in ompiler/predicates/ > 8 tests in compiler/splitif/ > So the number is not increased significantly. > > The new group > ` tier2_ctw` > was introduced for ctw testing. > It is different from tests in hotspot_compiler, and usually executed separately, So I added it as separate sub-test of tier2. > > I moved ctw from tier3 to tier2 to correspond current tiers of Hotspot CI in our system. > If anyone thinks it should be part of tier3 - we can discuss in PR comments how do deal with it. > > @shipilev, could you please take a look and check if these changes are ok to you since you the author of tier2/tier3. This pull request has now been integrated. 
Changeset: 1def2d82
Author: Leonid Mesnik
URL: https://git.openjdk.org/jdk/commit/1def2d82ac003a974759048c6cc0a173b1fc692f
Stats: 16 lines in 1 file changed: 4 ins; 11 del; 1 mod

8345700: tier{1,2,3}_compiler don't cover all compiler tests

Reviewed-by: dholmes, shade

-------------

PR: https://git.openjdk.org/jdk/pull/22614

From bulasevich at openjdk.org Tue Dec 10 22:23:17 2024
From: bulasevich at openjdk.org (Boris Ulasevich)
Date: Tue, 10 Dec 2024 22:23:17 GMT
Subject: RFR: 8343789: Move mutable nmethod data out of CodeCache [v4]
In-Reply-To: <9mDuowjpORyWudVnSB1FWCW_o1pBgMnAvJus6YGkXLs=.67ba4652-2470-448d-baa2-464e824b2fcb@github.com>
References: <9mDuowjpORyWudVnSB1FWCW_o1pBgMnAvJus6YGkXLs=.67ba4652-2470-448d-baa2-464e824b2fcb@github.com>
Message-ID: <_-pjMgFvCdXc8Qcg4DVCDUCYQXVi7ZIyQGqSIx7M8g4=.771d6db8-4ec8-4d34-91a8-dfc95a0c3a58@github.com>

> This change relocates mutable data (such as relocations, oops, and metadata) from the nmethod. The change follows the recent PR #18984, which relocated immutable nmethod data from the CodeCache.
>
> The core idea remains the same: use the CodeCache for executable code while moving additional data to the C heap. The primary motivations are improving security and enhancing code density.
>
> Although performance is not the main focus, testing on AArch64 CPUs, where code density plays a significant role, has shown a 1–2% performance improvement in specific scenarios, such as the CodeCacheStress test and the Renaissance Dotty benchmark.
>
> The numbers. Immutable data constitutes **~30%** of the nmethod. Mutable data constitutes **~8%** of the nmethod. Example (statistics collected on the CodeCacheStress benchmark):
> - nmethod_count:134000, total_compilation_time: 510460ms
> - total allocation time malloc_mutable/malloc_immutable/CodeCache_alloc: 62ms/114ms/6333ms,
> - total allocation size (mutable/immutable/nmethod): 64MB/192MB/488MB
>
> Functional testing: jtreg on arm/aarch/x86.
> Performance testing: renaissance/dacapo/SPECjvm2008 benchmarks. > > Alternative solution (see comments): In the future, relocations can be moved to _immutable_data. Boris Ulasevich has updated the pull request incrementally with one additional commit since the last revision: extra bool parameter for ldr_patchable ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21276/files - new: https://git.openjdk.org/jdk/pull/21276/files/27d27aa3..ee697996 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21276&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21276&range=02-03 Stats: 4 lines in 1 file changed: 2 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/21276.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21276/head:pull/21276 PR: https://git.openjdk.org/jdk/pull/21276 From bulasevich at openjdk.org Tue Dec 10 22:34:45 2024 From: bulasevich at openjdk.org (Boris Ulasevich) Date: Tue, 10 Dec 2024 22:34:45 GMT Subject: RFR: 8343789: Move mutable nmethod data out of CodeCache [v2] In-Reply-To: References: <9mDuowjpORyWudVnSB1FWCW_o1pBgMnAvJus6YGkXLs=.67ba4652-2470-448d-baa2-464e824b2fcb@github.com> Message-ID: <0-RuI0yrrHrzDhR9Od-KDkfhDBQERUlR8mtaQrzWFD0=.d84d10a0-c5b8-4138-a164-935947aa080d@github.com> On Mon, 2 Dec 2024 22:20:04 GMT, Dean Long wrote: >> Boris Ulasevich has updated the pull request incrementally with one additional commit since the last revision: >> >> rework movoop for not_supports_instruction_patching case: correcting in ldr_constant and relocations fixup > > src/hotspot/cpu/aarch64/macroAssembler_aarch64.hpp line 1423: > >> 1421: } else { >> 1422: uint64_t offset; >> 1423: adrp(dest, const_addr, offset); > > I don't see how this ADRP path ever gets called now. The only caller is in MacroAssembler::movoop(), which uses a dummy Address in the CodeCache. I think we need to force near/far with an extra bool parameter. The way this function is currently used, a better name might be ldr_patchable(). 
Thanks for pointing that out. I reworked the method a little. ldr_patchable() is only called by movoop(). If oops are moved to a separate location, the distance is certainly > 1MB and adrp+ldr is the only way to access oops. The single LDR path is not used now, but I leave it for future use.

    void ldr_patchable(Register dest, const Address &const_addr, bool fits_in_ldr_range = false) {
      if (fits_in_ldr_range) {
        // Near case: a single pc-relative ldr reaches the constant.
        intptr_t offset = pc() - const_addr.target();
        assert(offset >= -1024 * 1024 && offset < 1024 * 1024, "pointer does not fit into pc-relative ldr range");
        ldr(dest, const_addr);
      } else {
        // Far case: adrp materializes the page address, ldr adds the in-page offset.
        uint64_t offset;
        adrp(dest, const_addr, offset);
        ldr(dest, Address(dest, offset));
      }
    }

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/21276#discussion_r1879005045

From bulasevich at openjdk.org Tue Dec 10 22:34:48 2024
From: bulasevich at openjdk.org (Boris Ulasevich)
Date: Tue, 10 Dec 2024 22:34:48 GMT
Subject: RFR: 8343789: Move mutable nmethod data out of CodeCache [v4]
In-Reply-To: References: <9mDuowjpORyWudVnSB1FWCW_o1pBgMnAvJus6YGkXLs=.67ba4652-2470-448d-baa2-464e824b2fcb@github.com> Message-ID:

On Fri, 22 Nov 2024 02:54:35 GMT, Dean Long wrote:

>> Boris Ulasevich has updated the pull request incrementally with one additional commit since the last revision:
>>
>> extra bool parameter for ldr_patchable
>
> src/hotspot/share/code/nmethod.cpp line 2152:
>
>> 2150: delete[] _compiled_ic_data;
>> 2151:
>> 2152: if (_immutable_data != blob_end()) {
>
> Is this just a name change, or a semantic change?

In several places _immutable_data is set to data_end(), a "valid not null address" which was actually the end of the code blob. With this change I remove the _data part of the code blob as well as the data_begin() and data_end() functions. I think blob_end() is a good replacement for data_end() for empty _immutable_data cases.
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/21276#discussion_r1879006168

From jkarthikeyan at openjdk.org Tue Dec 10 22:35:47 2024
From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan)
Date: Tue, 10 Dec 2024 22:35:47 GMT
Subject: RFR: 8341781: Improve Min/Max node identities [v4]
In-Reply-To: References: Message-ID: <9YWYWsNnHlmLryoziOjfF4OXj-N_OR5DQeXe8-CEg5o=.587645cb-8561-4ffc-a051-88d634ec4021@github.com>

On Mon, 4 Nov 2024 04:25:07 GMT, Jasmine Karthikeyan wrote:

>> Hi all,
>> This patch implements some missing identities for Min/Max nodes. It adds static type-based operand choosing for MinI/MaxI, such as the ones that MinL/MaxL use. In addition, it adds simplification for patterns such as `Max(A, Max(A, B))` to `Max(A, B)` and `Max(A, Min(A, B))` to `A`. These simplifications stem from the [lattice identity rules](https://en.wikipedia.org/wiki/Lattice_(order)#As_algebraic_structure). The main place I've seen this pattern is with MinL/MaxL nodes created during loop optimizations. Some examples of where this occurs include BigInteger addition/subtraction, and regex code. I've run some of the existing benchmarks and found some nice improvements:
>>
>>                                                       Baseline                     Patch
>> Benchmark                                Mode  Cnt     Score     Error  Units     Score    Error  Units  Improvement
>> BigIntegers.testAdd                      avgt   15    25.096 ±   3.936  ns/op    19.214 ±  0.521  ns/op  (+ 26.5%)
>> PatternBench.charPatternCompile          avgt    8   453.727 ± 117.265  ns/op   370.054 ± 26.106  ns/op  (+ 20.3%)
>> PatternBench.charPatternMatch            avgt    8   917.604 ± 121.766  ns/op   810.560 ± 38.437  ns/op  (+ 12.3%)
>> PatternBench.charPatternMatchWithCompile avgt    8  1477.703 ± 255.783  ns/op  1224.460 ± 28.220  ns/op  (+ 18.7%)
>> PatternBench.longStringGraphemeMatches   avgt    8   860.909 ± 124.661  ns/op   743.729 ± 22.877  ns/op  (+ 14.6%)
>> PatternBench.splitFlags                  avgt    8   420.506 ±  76.252  ns/op   321.911 ± 11.661  ns/op  (+ 26.6%)
>>
>> I've added some IR tests, and tier 1 testing passes on my linux machine. Reviews would be appreciated!
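The two rewrites named in the description, `Max(A, Max(A, B))` to `Max(A, B)` and `Max(A, Min(A, B))` to `A`, can be sanity-checked on ordinary `long` values. The sketch below is a hypothetical illustration, not the C2 code from the patch; the class name is made up, and the dual `Min` forms are included only because they follow from the same lattice laws, not because the source states the patch handles them.

```java
// Hypothetical check of the lattice identities on scalar longs; not C2 code.
final class MinMaxIdentities {
    static boolean identitiesHold(long a, long b) {
        // Nested max/min with a repeated operand collapses to a single operation.
        boolean nestedMax = Math.max(a, Math.max(a, b)) == Math.max(a, b);
        boolean nestedMin = Math.min(a, Math.min(a, b)) == Math.min(a, b);
        // Absorption: max(a, min(a, b)) == a and min(a, max(a, b)) == a.
        boolean absorbMax = Math.max(a, Math.min(a, b)) == a;
        boolean absorbMin = Math.min(a, Math.max(a, b)) == a;
        return nestedMax && nestedMin && absorbMax && absorbMin;
    }

    // Tries every pair drawn from a small set that includes the extreme values.
    static boolean checkSamples() {
        long[] samples = {Long.MIN_VALUE, -7L, -1L, 0L, 1L, 42L, Long.MAX_VALUE};
        for (long a : samples) {
            for (long b : samples) {
                if (!identitiesHold(a, b)) {
                    return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println("identities hold on samples: " + checkSamples());
    }
}
```

Because these identities hold for any totally ordered values, the compiler can apply them without knowing anything about the ranges of `A` and `B`.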
>
> Jasmine Karthikeyan has updated the pull request incrementally with one additional commit since the last revision:
>
> Make long tests check IR

Since it's after the fork, I'll integrate it now. Thanks again for the reviews everyone!

-------------

PR Comment: https://git.openjdk.org/jdk/pull/21439#issuecomment-2533081301

From jkarthikeyan at openjdk.org Tue Dec 10 22:35:48 2024
From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan)
Date: Tue, 10 Dec 2024 22:35:48 GMT
Subject: Integrated: 8341781: Improve Min/Max node identities
In-Reply-To: References: Message-ID:

On Thu, 10 Oct 2024 02:59:14 GMT, Jasmine Karthikeyan wrote:

> Hi all,
> This patch implements some missing identities for Min/Max nodes. It adds static type-based operand choosing for MinI/MaxI, such as the ones that MinL/MaxL use. In addition, it adds simplification for patterns such as `Max(A, Max(A, B))` to `Max(A, B)` and `Max(A, Min(A, B))` to `A`. These simplifications stem from the [lattice identity rules](https://en.wikipedia.org/wiki/Lattice_(order)#As_algebraic_structure). The main place I've seen this pattern is with MinL/MaxL nodes created during loop optimizations. Some examples of where this occurs include BigInteger addition/subtraction, and regex code. I've run some of the existing benchmarks and found some nice improvements:
>
>                                                       Baseline                     Patch
> Benchmark                                Mode  Cnt     Score     Error  Units     Score    Error  Units  Improvement
> BigIntegers.testAdd                      avgt   15    25.096 ±   3.936  ns/op    19.214 ±  0.521  ns/op  (+ 26.5%)
> PatternBench.charPatternCompile          avgt    8   453.727 ± 117.265  ns/op   370.054 ± 26.106  ns/op  (+ 20.3%)
> PatternBench.charPatternMatch            avgt    8   917.604 ± 121.766  ns/op   810.560 ± 38.437  ns/op  (+ 12.3%)
> PatternBench.charPatternMatchWithCompile avgt    8  1477.703 ± 255.783  ns/op  1224.460 ± 28.220  ns/op  (+ 18.7%)
> PatternBench.longStringGraphemeMatches   avgt    8   860.909 ± 124.661  ns/op   743.729 ± 22.877  ns/op  (+ 14.6%)
> PatternBench.splitFlags                  avgt    8   420.506 ±  76.252  ns/op   321.911 ± 11.661  ns/op  (+ 26.6%)
>
> I've added some IR tests, and tier 1 testing passes on my linux machine. Reviews would be appreciated!

This pull request has now been integrated.

Changeset: 29d648c6
Author: Jasmine Karthikeyan
URL: https://git.openjdk.org/jdk/commit/29d648c642a68699340a9ab43252f832efdb5cbf
Stats: 297 lines in 5 files changed: 289 ins; 2 del; 6 mod

8341781: Improve Min/Max node identities

Reviewed-by: chagedorn

-------------

PR: https://git.openjdk.org/jdk/pull/21439

From bulasevich at openjdk.org Tue Dec 10 22:42:42 2024
From: bulasevich at openjdk.org (Boris Ulasevich)
Date: Tue, 10 Dec 2024 22:42:42 GMT
Subject: RFR: 8343789: Move mutable nmethod data out of CodeCache [v4]
In-Reply-To: References: <9mDuowjpORyWudVnSB1FWCW_o1pBgMnAvJus6YGkXLs=.67ba4652-2470-448d-baa2-464e824b2fcb@github.com> <5qBcX1j2O16hvCKyLjknxQqH50qdfwhlQf2P1FEUqEU=.451f7d8b-2bf2-4f28-8c8c-79e7a7b8d613@github.com>
Message-ID: <_uqMIRcyf1lIFj-VgedDGLObKaRXfSENlKzYGmmrhBI=.2b8455f9-126f-48d9-be4d-10d0d28786f1@github.com>

On Fri, 22 Nov 2024 02:51:03 GMT, Dean Long wrote:

>> src/hotspot/share/code/codeBlob.cpp line 103:
>>
>>> 101: // The mutable_data_size is either calculated by the nmethod constructor to account
>>> 102: // for reloc_info and additional data, or it is set here to accommodate only the relocation data.
>>> 103: _mutable_data_size = (mutable_data_size == 0) ? cb->total_relocation_size() : mutable_data_size;
>>
>> This seems strange to treat relocations as special. Wouldn't it be better to have the caller always pass in the correct value?
>
> Or compute using something like required_mutable_data_space()?

Alright. Thank you. I moved mutable_data_size calculation out of CodeBlob.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21276#discussion_r1879013357 From bulasevich at openjdk.org Tue Dec 10 22:42:44 2024 From: bulasevich at openjdk.org (Boris Ulasevich) Date: Tue, 10 Dec 2024 22:42:44 GMT Subject: RFR: 8343789: Move mutable nmethod data out of CodeCache [v4] In-Reply-To: <5qBcX1j2O16hvCKyLjknxQqH50qdfwhlQf2P1FEUqEU=.451f7d8b-2bf2-4f28-8c8c-79e7a7b8d613@github.com> References: <9mDuowjpORyWudVnSB1FWCW_o1pBgMnAvJus6YGkXLs=.67ba4652-2470-448d-baa2-464e824b2fcb@github.com> <5qBcX1j2O16hvCKyLjknxQqH50qdfwhlQf2P1FEUqEU=.451f7d8b-2bf2-4f28-8c8c-79e7a7b8d613@github.com> Message-ID: On Fri, 22 Nov 2024 02:43:25 GMT, Dean Long wrote: >> Boris Ulasevich has updated the pull request incrementally with one additional commit since the last revision: >> >> extra bool parameter for ldr_patchable > > src/hotspot/share/code/codeBlob.hpp line 108: > >> 106: >> 107: int _size; // total size of CodeBlob in bytes >> 108: int _relocation_size; // size of relocation (could be bigger than 64Kb) > > For offsets into the external mutable/immutable data, we could reduce codecache footprint further by moving these into a a header section of the external data block. That also allows those blocks to be self-describing, which could help with error reporting or debugging. Sounds reasonable. But the downside is that in this case the oops iterator needs an additional load (nmethod->mutable_data->oop_size) to check if the oops list is empty. 
> src/hotspot/share/code/codeBlob.hpp line 135: > >> 133: CodeBlob(const char* name, CodeBlobKind kind, CodeBuffer* cb, int size, uint16_t header_size, >> 134: int16_t frame_complete_offset, int frame_size, OopMapSet* oop_maps, bool caller_must_gc_arguments, >> 135: int mutable_data_size = 0); > > If we want to allow the default for mutable data size to be the relocations size, then instead of using = 0 here, you could do this instead: > > CodeBlob(const char* name, CodeBlobKind kind, CodeBuffer* cb, int size, uint16_t header_size, > int16_t frame_complete_offset, int frame_size, OopMapSet* oop_maps, bool caller_must_gc_arguments, > int mutable_data_size); > > CodeBlob(const char* name, CodeBlobKind kind, CodeBuffer* cb, int size, uint16_t header_size, > int16_t frame_complete_offset, int frame_size, OopMapSet* oop_maps, bool caller_must_gc_arguments) : > CodeBlob(name, kind, cb, size, header_size, > frame_complete_offset, frame_size, oop_maps, caller_must_gc_arguments, > cb->total_relocation_size) > { > } > > but I would prefer not to treat relocations as special, and have the caller always pass the correct value. Agree. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21276#discussion_r1879014631 PR Review Comment: https://git.openjdk.org/jdk/pull/21276#discussion_r1879015667 From bulasevich at openjdk.org Tue Dec 10 22:51:39 2024 From: bulasevich at openjdk.org (Boris Ulasevich) Date: Tue, 10 Dec 2024 22:51:39 GMT Subject: RFR: 8343789: Move mutable nmethod data out of CodeCache [v3] In-Reply-To: References: <9mDuowjpORyWudVnSB1FWCW_o1pBgMnAvJus6YGkXLs=.67ba4652-2470-448d-baa2-464e824b2fcb@github.com> <8312jgwPU7U_JVE8QtFR8wHlsArpfs9W07O4Ux-C_PI=.e0c2a9ca-1da2-4776-94ea-240580cabde8@github.com> Message-ID: <3AkWtKBScPcjThhZD7NtlvxHDliqsLLPLg-AC3bfKug=.9062d6e1-9e1f-4a51-918b-896ca3a7dc4d@github.com> On Mon, 9 Dec 2024 23:14:31 GMT, Dean Long wrote: >> Boris Ulasevich has updated the pull request incrementally with one additional commit since the last revision: >> >> a bit of cleanup and satisfying review suggestions > > src/hotspot/cpu/aarch64/macroAssembler_aarch64.hpp line 1419: > >> 1417: >> 1418: void ldr_patchable(Register dest, const Address &const_addr) { >> 1419: if (CodeCache::contains(const_addr.target())) { > > Doesn't this cause us to generate LDR for oops outside the code cache, when what we need is the ADRP below? The caller is still using a PC-relative dummy address. (written above) in ldr_patchable adrp+ldr path is used. Code generator encodes ldr to dummy address (real address is stored in oop_Relocation). During relocation, adrp+ldr are patched to a real address. This code only works for UseShenandoahGC and ldr_patchable only uses the adrp+ldr path. 
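The near/far distinction in this thread comes down to instruction reach. As a rough illustration (written in Java rather than HotSpot's C++, with hypothetical names throughout), the sketch below checks whether a target is reachable by a single PC-relative `ldr` (literal), which on AArch64 covers roughly ±1 MiB, or whether it needs the `adrp`+`ldr` pair, whose `adrp` part covers roughly ±4 GiB at 4 KiB page granularity. It ignores alignment requirements and is not the code from the PR.

```java
// Hypothetical reach checks for AArch64 addressing; not HotSpot code.
final class Aarch64Reach {
    // ldr (literal) encodes a signed 19-bit word offset, i.e. [-1 MiB, +1 MiB) in bytes.
    static final long LDR_LITERAL_RANGE = 1L << 20;
    // adrp encodes a signed 21-bit count of 4 KiB pages, i.e. [-2^20, 2^20) pages.
    static final long ADRP_PAGE_RANGE = 1L << 20;

    static boolean fitsInLdrRange(long pc, long target) {
        long offset = target - pc;
        return offset >= -LDR_LITERAL_RANGE && offset < LDR_LITERAL_RANGE;
    }

    static boolean fitsInAdrpRange(long pc, long target) {
        // adrp works on page numbers, so compare 4 KiB page indices.
        long pageDelta = (target >> 12) - (pc >> 12);
        return pageDelta >= -ADRP_PAGE_RANGE && pageDelta < ADRP_PAGE_RANGE;
    }

    public static void main(String[] args) {
        long pc = 0x1000_0000L;
        long farTarget = pc + (2L << 20); // 2 MiB away: too far for ldr, fine for adrp+ldr
        System.out.println("ldr reaches:  " + fitsInLdrRange(pc, farTarget));
        System.out.println("adrp reaches: " + fitsInAdrpRange(pc, farTarget));
    }
}
```

This matches the reasoning given earlier in the thread: once oops live outside the CodeCache, the distance exceeds 1 MB and only the `adrp`+`ldr` form can reach them.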
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21276#discussion_r1879024114 From cslucas at openjdk.org Wed Dec 11 00:35:41 2024 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Wed, 11 Dec 2024 00:35:41 GMT Subject: RFR: 8343607: C2: Shenandoah crashes during barrier expansion in Continuation::enter In-Reply-To: References: Message-ID: On Tue, 10 Dec 2024 14:40:44 GMT, Roland Westrelin wrote: > If a load barrier is used both in the fallthrough and exception > handling paths out of a call, it needs to be cloned so each path has > its copy of the barrier. In the case of the crash, cloning the barrier > is attempted for a runtime call that doesn't have an exception > handling path. Fix simply detects that corner case. LGTM, thanks for fixing. ------------- PR Comment: https://git.openjdk.org/jdk/pull/22663#issuecomment-2533349105 From vladimir.x.ivanov at oracle.com Wed Dec 11 00:54:47 2024 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Tue, 10 Dec 2024 16:54:47 -0800 Subject: RFC: Untangle native libraries and the JVM: SVML, SLEEF, and libsimdsort In-Reply-To: <5EA2B22F-0BFD-49E6-A20B-0F91ED94FF8C@oracle.com> References: <1bf041a1-1002-4d18-85b3-15f437fa534e@oracle.com> <5EA2B22F-0BFD-49E6-A20B-0F91ED94FF8C@oracle.com> Message-ID: <24117ca6-abf9-43a0-933c-00c3333097e4@oracle.com> On 12/9/24 07:55, Paul Sandoz wrote: > Some further observations. > > - This arguably makes it harder for the auto-vectorize to access the SVML/SLEEF functionality. However, in comes cases, we cannot guarantee the same guarantees (IIRC mainly around monotonicity) as the scalar operations in Math. I'm not too optimistic about auto-vectorization unless the very same stubs are shared between scalar and vectorized code. Our previous experience with FP operations strongly indicates that users expect FP operations to give reproducible results (bitwise equivalent) across the same run. 
Moreover, migration to FFI enables usage of SVML/SLEEF across all execution modes which should make it easier to reason about Vector API usages. > - There is an open bug to adjust the simd sort behavior on AMD zen 4 cores due to poor performance of an AVX 512 instruction. The simplest solution is to fall back to AVX2. That may be simpler to manage in Java? (I was looking at the HotSpot code). For now, the patch guards AVX512 entries with VM.isIntelCPU() check. In order to distinguish between AMD Zen 4 and 5, either a new platform-sensing check is needed or reimplementation of x86-specific platform sensing in Java on top of CPUID info. Best regards, Vladimir Ivanov >> On Dec 6, 2024, at 4:48?PM, Vladimir Ivanov wrote: >> >> Thanks, Paul. >> >>> Excellent work, very happy to see more of this moved to Java leveraging Panama features. The Java code looks very organized. >>> I am wondering if this technique can be applied to stubs dynamically generated by HotSpot via some sort of special library lookup e.g., for crypto. >> >> It's an interesting idea. A JVM could expose individual symbols, so they can be looked up, but a more promising approach is to just expose a table of generated stubs through a native call into JVM (similar to simdsort_link [1]). >> >> The problematic part is that stubs don't have to obey to platform ABI. Some of them deliberately rely on very restrictive calling conventions (e.g., no caller-saved registers), so calling them from generated code is much simpler and cheaper. >> >> In a longer term, custom calling conventions for each entry point can be coded if there's enough java.lang.foreign support present. (So, an entry point returned by the JVM comprises of an entry address accompanied by an appropriate invoker.) >> >> >>> Do you have a sense of the differences in static memory footprint and startup cost? Things I imagine Leyden could help with. >> >> Are you asking about simdsort/SVML/SLEEF case here? > > Yes. 
> > >> I didn't measure, but initialization costs will definitely be higher (compared to JVM-only solution). In absolute numbers it should be negligible though (the libraries expose small number of entry points). >> >> >>> Regarding CPU dispatching, my preference would be to do it in Java. Less native logic. >> >> Fair enough. The nice thing about doing CPU dispatching on native library side is that all those cryptic naming conventions don't show up on Java side [2], but IMO it requires too much ceremony, so I kept it on Java side for now. >> >> >>> This may also be useful to help determine whether we can/should expose capabilities in the Vector API regarding what is optimally supported or not. >> >> IMO Vector API (as it is implemented now) would benefit from a higher-level C2-specific API. >> > > Ok. > > Paul. > >> >>> I presume it also does not preclude some sort of jlink plugin that strips unused methods from the native libraries, something which may be tricker if done in the native library itself? >> >> Good point. It may be the case, but I don't have enough experience with native library stripping to comment on it. >> >> Best regards, >> Vladimir Ivanov >> >> [1] https://github.com/openjdk/jdk/commit/b6e6f2e20772e86fbf9088bcef01391461c17f11 >> >> >> [2] https://github.com/iwanowww/jdk/blob/09234832b6419e54c4fc182e77f6214b36afa4c5/src/java.base/linux/native/libsimdsort/simdsort.c >> >>> Paul. >>>> On Dec 6, 2024, at 3:18?PM, Vladimir Ivanov wrote: >>>> >>>> Recently, a trend emerged to use native libraries to back intrinsics in HotSpot JVM. SVML stubs for Vector API paved the road and it was soon followed by SLEEF and simdsort libraries. >>>> >>>> After examining their support, I must confess that it doesn't look pretty. It introduces significant accidental complexity on JVM side. HotSpot has to be taught about every entry point in each library in an ad-hoc manner. 
It's inherently unsafe, error-prone to implement and hard to maintain: JVM makes a lot of assumptions about an entry point based solely on its symbolic name and each library has its own naming conventions. Overall, current approach doesn't scale well. >>>> >>>> Fortunately, new FFI API (java.lang.foreign) was finalized in 22. It provides enough functionality to interact with native libraries from Java in performant manner. >>>> >>>> I did an exercise to migrate all 3 libraries away from intrinsics and the results look promising: >>>> >>>> simdsort: https://github.com/openjdk/jdk/pull/22621 >>>> >>>> SVML/SLEEF: https://github.com/openjdk/jdk/pull/22619 >>>> >>>> As of now, java.lang.foreign lacks vector calling convention support, so the actual calls into SVML/SLEEF are still backed by intrinsics. But it still enables a major cleanup on JVM side. >>>> >>>> Also, I coded library headers and used jextract to produce initial library API sketch in Java and it worked really well. Eventually, it can be incorporated into JDK build process to ensure the consistency between native and Java parts of library API. >>>> >>>> Performance wise, it is on par with current (intrinsic-based) implementation. >>>> >>>> One open question relates to CPU dispatching. >>>> >>>> Each library exposes multiple functions with different requirements about CPU ISA extension support (e.g., no AVX vs AVX2 vs AVX512, NEON vs SVE). Right now, it's JVM responsibility, but once it gets out of the loop, the library itself should make the decision. I experimented with 2 approaches: (1) perform CPU dispatching with linking library from Java code (as illustrated in aforementioned PRs); or (2) call into native library to query it about the right entry point [1] [2] [3]. In both cases, it depends on additional API to sense the JVM/hardware capabilities (exposed on jdk.internal.misc.VM for now). >>>> >>>> Let me know if you have any questions/suggestions/concerns. Thanks! 
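For readers who have not used `java.lang.foreign`, the following minimal sketch shows the basic downcall mechanism the RFC relies on: look up a native symbol, describe its signature, and invoke it through a method handle. It is not taken from the linked PRs. It assumes JDK 22 or later, a 64-bit platform, and that the C library's `strlen` is reachable through the linker's default lookup; the class name `StrlenDowncall` is hypothetical.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.FunctionDescriptor;
import java.lang.foreign.Linker;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;
import java.lang.invoke.MethodHandle;

// Hypothetical example class; assumes JDK 22+ and a C library exposing strlen.
final class StrlenDowncall {
    static long nativeStrlen(String s) {
        Linker linker = Linker.nativeLinker();
        // Resolve the entry point by name in the default (C library) lookup.
        MemorySegment symbol = linker.defaultLookup().find("strlen").orElseThrow();
        // size_t strlen(const char*), modelled as (ADDRESS) -> JAVA_LONG on 64-bit platforms.
        MethodHandle strlen = linker.downcallHandle(
                symbol, FunctionDescriptor.of(ValueLayout.JAVA_LONG, ValueLayout.ADDRESS));
        try (Arena arena = Arena.ofConfined()) {
            // Copies the string into native memory as a NUL-terminated UTF-8 C string.
            MemorySegment cString = arena.allocateFrom(s);
            return (long) strlen.invokeExact(cString);
        } catch (Throwable t) {
            throw new IllegalStateException("downcall failed", t);
        }
    }

    public static void main(String[] args) {
        System.out.println(nativeStrlen("Hello"));
    }
}
```

A real binding would normally create the method handle once and cache it in a static final field. Recent JDKs may also print a warning about restricted native access unless it is enabled for the calling module.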
>>>> >>>> I plan to eventually start publishing PRs to upstream this work. >>>> >>>> Best regards, >>>> Vladimir Ivanov >>>> >>>> [1] https://github.com/openjdk/jdk/commit/b6e6f2e20772e86fbf9088bcef01391461c17f11 >>>> >>>> [2] https://github.com/iwanowww/jdk/blob/09234832b6419e54c4fc182e77f6214b36afa4c5/src/java.base/share/classes/java/util/SIMDSortLibrary.java >>>> >>>> [3] https://github.com/iwanowww/jdk/blob/09234832b6419e54c4fc182e77f6214b36afa4c5/src/java.base/linux/native/libsimdsort/simdsort.c >>>> >> > From vladimir.x.ivanov at oracle.com Wed Dec 11 01:04:36 2024 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Tue, 10 Dec 2024 17:04:36 -0800 Subject: RFC: Untangle native libraries and the JVM: SVML, SLEEF, and libsimdsort In-Reply-To: <6af9e206-4c97-4d54-9bc4-aac67308e0c1@oracle.com> References: <1bf041a1-1002-4d18-85b3-15f437fa534e@oracle.com> <6af9e206-4c97-4d54-9bc4-aac67308e0c1@oracle.com> Message-ID: <8556f741-421b-477d-b0e3-959637a12b63@oracle.com> Thanks, Maurizio. On 12/9/24 03:42, Maurizio Cimadamore wrote: > Great work Vlad! > > The simdsort part seems a more "classic" FFM binding - where you have a > method handle per entry point. That seems to fit the design of FFM > rather well. In the second case (SVML/SLEEF) usage of FFM is limited to > build a "table of entry points" (e.g. we're just using SymbolLookup + > MemorySegment here -- the invocation part is intrinsified as part of the > new VectorSupport methods). I'd say that both simdsort and SVML/SLEEF cases are slightly off from the sweet spot FFM API is designed for since all 3 libraries heavily rely on CPU dispatching. > If it helps, it might be possible to define a custom (JDK internal) > family of value layouts for vector types. Then we could enhance the > Linker classification to support such layouts. This means you could call > into native functions with vector parameters and return types using the > Linker API more directly. 
Not sure if it will give you the same > performance, but it's also an approach worth exploring. FTR I experimented a bit with vector calling conventions support, but as Vector API is implemented now, it introduced significant amount of complexity on both sides, so I decided to keep vector intrinsics for now. It already enables significant simplifications in Vector API. Still, it would be convenient to eventually get vector support in FFM. > Re. support for custom calling conventions to call into hotspot stubs > from Java, this might be possible - our story for supporting calling > conventions other than the system calling convention is that there > should be a dedicated linker instance per calling convention. So, if the > JVM defines its own calling convention for its stubs there should > probably be a custom Linker implementation that is used to call into > such stubs - which uses the machinery in the Linker implementation (e.g. > Bindings) to classify the incoming function descriptors and determine > the shuffle sequence for a given particular call. This should all be > doable (at least inside the JDK) - it's just matter of "writing more code". Interesting. Thanks for the details. > I agree with Paul that, as we move more stuff to use Panama, we will > need to look more at the avenues available to us to claim back some of > the additional warm up cost introduced by the use of var/method handles. > This is probably part of a bigger exploration on warmup and FFM. In case of C2 intrinsics it may be less of an issue. Additional startup costs may be quickly recuperated during warmup because optimized implementation is available earlier. Best regards, Vladimir Ivanov > On 06/12/2024 23:18, Vladimir Ivanov wrote: >> Recently, a trend emerged to use native libraries to back intrinsics >> in HotSpot JVM. SVML stubs for Vector API paved the road and it was >> soon followed by SLEEF and simdsort libraries. 
>>
>> After examining their support, I must confess that it doesn't look
>> pretty. It introduces significant accidental complexity on JVM side.
>> HotSpot has to be taught about every entry point in each library in an
>> ad-hoc manner. It's inherently unsafe, error-prone to implement and
>> hard to maintain: JVM makes a lot of assumptions about an entry point
>> based solely on its symbolic name and each library has its own naming
>> conventions. Overall, current approach doesn't scale well.
>>
>> Fortunately, new FFI API (java.lang.foreign) was finalized in 22. It
>> provides enough functionality to interact with native libraries from
>> Java in performant manner.
>>
>> I did an exercise to migrate all 3 libraries away from intrinsics and
>> the results look promising:
>>
>>   simdsort: https://github.com/openjdk/jdk/pull/22621
>>
>>   SVML/SLEEF: https://github.com/openjdk/jdk/pull/22619
>>
>> As of now, java.lang.foreign lacks vector calling convention support,
>> so the actual calls into SVML/SLEEF are still backed by intrinsics.
>> But it still enables a major cleanup on JVM side.
>>
>> Also, I coded library headers and used jextract to produce initial
>> library API sketch in Java and it worked really well. Eventually, it
>> can be incorporated into JDK build process to ensure the consistency
>> between native and Java parts of library API.
>>
>> Performance wise, it is on par with current (intrinsic-based)
>> implementation.
>>
>> One open question relates to CPU dispatching.
>>
>> Each library exposes multiple functions with different requirements
>> about CPU ISA extension support (e.g., no AVX vs AVX2 vs AVX512, NEON
>> vs SVE). Right now, it's JVM responsibility, but once it gets out of
>> the loop, the library itself should make the decision.
I experimented >> with 2 approaches: (1) perform CPU dispatching with linking library >> from Java code (as illustrated in aforementioned PRs); or (2) call >> into native library to query it about the right entry point [1] [2] >> [3]. In both cases, it depends on additional API to sense the JVM/ >> hardware capabilities (exposed on jdk.internal.misc.VM for now). >> >> Let me know if you have any questions/suggestions/concerns. Thanks! >> >> I plan to eventually start publishing PRs to upstream this work. >> >> Best regards, >> Vladimir Ivanov >> >> [1] https://github.com/openjdk/jdk/commit/ >> b6e6f2e20772e86fbf9088bcef01391461c17f11 >> >> [2] https://github.com/iwanowww/jdk/ >> blob/09234832b6419e54c4fc182e77f6214b36afa4c5/src/java.base/share/ >> classes/java/util/SIMDSortLibrary.java >> >> [3] https://github.com/iwanowww/jdk/ >> blob/09234832b6419e54c4fc182e77f6214b36afa4c5/src/java.base/linux/ >> native/libsimdsort/simdsort.c >> From jkarthikeyan at openjdk.org Wed Dec 11 04:16:36 2024 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Wed, 11 Dec 2024 04:16:36 GMT Subject: RFR: 8342662: C2: Add new phase for backend-specific lowering [v5] In-Reply-To: References: Message-ID: <7hsST_7-e0j6kT4WQUsTP0kIP7d7i98XBluSd8tG5VY=.6a0bbad0-46f6-423d-a8f1-7ad0b5c10ff6@github.com> > Hi all, > This patch adds a new pass to consolidate lowering of complex backend-specific code patterns, such as `MacroLogicV` and the optimization proposed by #21244. Moving these optimizations to backend code can simplify shared code, while also making it easier to develop more in-depth optimizations. The linked bug has an example of a new optimization this could enable. The new phase does GVN to de-duplicate nodes and calls nodes' `Value()` method, but it does not call `Identity()` or `Ideal()` to avoid undoing any changes done during lowering. It also reuses the IGVN worklist to avoid needing to re-create the notification mechanism. 
> > In this PR only the skeleton code for the pass is added, moving `MacroLogicV` to this system will be done separately in a future patch. Tier 1 tests pass on my linux x64 machine. Feedback on this patch would be greatly appreciated! Jasmine Karthikeyan has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains six commits: - Fix merge conflict - Re-use optimize() and add backend-specific should_lower() - Merge branch 'master' into phase-lowering - Remove platform-dependent node definitions, rework PhaseLowering implementation - Address some changes from code review - Implement PhaseLowering ------------- Changes: https://git.openjdk.org/jdk/pull/21599/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21599&range=04 Stats: 298 lines in 14 files changed: 297 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/21599.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21599/head:pull/21599 PR: https://git.openjdk.org/jdk/pull/21599 From amitkumar at openjdk.org Wed Dec 11 04:46:45 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Wed, 11 Dec 2024 04:46:45 GMT Subject: RFR: 8336356: [s390x] preserve Vector Register before using for string compress / expand In-Reply-To: References: Message-ID: On Mon, 25 Nov 2024 04:58:11 GMT, Amit Kumar wrote: > This PR adds `TEMP` effect for the vector register, allotted by register allocator, used in the string compress/expand intrinsic. Also it enabled the Vector computation part of those intrinsics which was disabled by https://github.com/openjdk/jdk/pull/18162 > > tier1 test, which also includes string intrinsic tests `compiler/intrinsics/string/` are clean. No regression seen. @RealLucy ? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/22354#issuecomment-2533619701 From jkarthikeyan at openjdk.org Wed Dec 11 04:54:39 2024 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Wed, 11 Dec 2024 04:54:39 GMT Subject: RFR: 8342662: C2: Add new phase for backend-specific lowering [v3] In-Reply-To: References: Message-ID: <3z-tAI6f1fUGXLgIwNylfaUB1DdNaQ6ZM-n2WZ_V9Ak=.61a402ae-7234-4a0d-ab0e-e29866522988@github.com> On Mon, 11 Nov 2024 08:28:42 GMT, Jatin Bhateja wrote: >> Thanks everyone for the discussion. I've pushed a commit that restructures the pass, removing the backend-specific node definition and making the pass extend `PhaseIterGVN` so that nodes can do further idealizations during lowering without complicating the main lowering switch. I also added a shared component to lowering, to facilitate moving transforms that impact multiple backends like `DivMod` to it. Lowering is also now the final phase before final graph reshaping now, since late inlines could also use IGVN. Some more comments: >> >>> It looks attractive at first, but the downside is subsequent passes may start to require platform-specific code as well (e.g., think of final graph reshaping which operates on Node opcodes). >> >> This makes sense to me. I agree that the extra complexity required to deal with this change in other parts of the code isn't worth it. The new commit removes this part of the changeset. >> >>> BTW it's not clear to me now what particular benefits IGVN brings. `DivMod` transformation doesn't use IGVN and after examining `MacroLogicV` code it can be rewritten to avoid IGVN as well. >> >> The main benefits are being able to reuse node hashing to de-duplicate redundant nodes and being able to use the existing IGVN types that were calculated (which #21244 uses). 
Some examples where GVN could be useful in final graph reshaping is when reshaping shift nodes and `Op_CmpUL`, where new nodes are created to approximate existing nodes on platforms without support. While I think it is unlikely that any of the created nodes would common with existing nodes except the `ConNode`s, I think it would be nice to reduce the possibility of redundant nodes in the graph before matching. This would include `DivMod` in the cases where the backend doesn't support the `DivMod` node, as multiplication and subtraction is emitted instead. I'm working on refactoring these cases in my example patch. I think it would be nice to make lowering where these platform specific optimizations occur while final graph reshaping focuses on preparing the graph for matching. >> >>> I'd say that if we want the lowering pass being discussed to be truly scalable, it's better to follow the same pattern. I have some doubts that platform-specific ad-hoc IR tweaks scale will scale well. >> >> My main concern with the macro-expansion style is that with the proposed transforms unconditional expansion/lowering of nodes isn't always possible. For example, In final graph reshaping for `DivM... > > Hi @jaskarth , > > I was trying to lower LShiftVB/URShiftVB IR to LShiftVS/URShiftVS for the x86 backend. I intend to factor out through GVN upfront byte vector to short vector conversion for both input and shift vectors if these are shared across two operations since the x86 ISA does not support direct byte vector shifts. > > To begin with, I made the following diff expecting status quo, but getting the following Fatal error at build time, can you kindly check? 
> > > diff --git a/src/hotspot/cpu/x86/c2_lowering_x86.cpp b/src/hotspot/cpu/x86/c2_lowering_x86.cpp > index cf4c014ffda..bc8df186396 100644 > --- a/src/hotspot/cpu/x86/c2_lowering_x86.cpp > +++ b/src/hotspot/cpu/x86/c2_lowering_x86.cpp > @@ -32,6 +32,6 @@ Node* PhaseLowering::lower_node_platform(Node* n) { > } > > bool PhaseLowering::should_lower() { > - return false; > + return true; > } > #endif // COMPILER2 > ``` > > > ERROR: Build failed for target 'images' in configuration 'linux-x86_64-server-fastdebug' (exit code 2) > > === Output from failing command(s) repeated here === > * For target support_interim-image-jlink__jlink_interim_image_exec: > # > # A fatal error has been detected by the Java Runtime Environment: > # > # Internal Error (/home/jatinbha/sandboxes/jdk-trunk/jdk/src/hotspot/share/opto/node.hpp:960), pid=1961256, tid=1961293 > # assert(is_MachReturn()) failed: invalid node class: Con > # > # JRE version: OpenJDK Runtime Environment (24.0) (fastdebug build 24-internal-adhoc.root.jdk) > # Java VM: OpenJDK 64-Bit Server VM (fastdebug 24-internal-adhoc.root.jdk, mixed mode, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64) > # Problematic frame: > # V [libjvm.so+0x140f939] Matcher::Fixup_Save_On_Entry()+0x279 > # > # Core dump will be written. Default location: Core dumps may be processed with "/usr/share/apport/apport %p %s %c %d %P %E" (or dumping to /home/jatinbha/sandboxes/jdk-trunk/jdk/make/core.1961256) > # > # An error report file with more information is saved as: > # /home/jatinbha/sandboxes/jdk-trunk/jdk/make/hs_err_pid1961256.log > ... (rest of output omitted) > > * All command lines available in /home/jatinbha/sandboxes/jdk-trunk/jdk/build/linux-x86_64-server-fastdebug/make-support/failure-logs. @jatin-bhateja Thanks for the bug report! I was able to reproduce it on my system as well. I apologize for the late response, I had been busy with my studies and only had a chance to look recently. 
It looks like this bug is caused because we run `Identity()` on all of the nodes, which finds new optimizations by replacing `LoadNode`s with a non-load identity. This causes an uncommon trap branch to be optimized out and the `HaltNode` replaced with a TOP `ConNode`. Usually these are removed by `RootNode::Ideal`, but since we don't do regular `Ideal` in lowering it ends up running into an assert later. This is interesting because it suggests that there are potentially calls to `Identity()` that aren't run in earlier IGVN rounds. I didn't take a very in-depth look but it might be good to investigate separately. Because of that I'm not sure if we can rely on `Identity()` here, especially if there are identity transforms that rely on other `Ideal` calls. @merykitty do you have any thoughts on the best way to fix this? @eme64 Ah, do you mean the recursive reduction op generation based on vector length? I think that could be handled here as well by splitting the reduction operation into multiple backend-specific nodes. It would still need to be implemented per-backend, but there could be benefits with optimizing more complex reductions, as you mention. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21599#issuecomment-2533628139 From thartmann at openjdk.org Wed Dec 11 06:32:40 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 11 Dec 2024 06:32:40 GMT Subject: RFR: 8342651: Refactor array constant to use an array of jbyte [v5] In-Reply-To: References: Message-ID: On Fri, 6 Dec 2024 11:54:25 GMT, Quan Anh Mai wrote: >> Hi, >> >> This small patch refactors array constants in C2 to use an array of `jbyte`s instead of an array of `jvalue`. The former is much easier to work with and we can do `memcpy` with them trivially. >> >> Since code buffers support alignment of the constant section, I have also allowed constant tables to be aligned more than 8 bytes and used it for constant vectors on machines not supporting `SSE3`. 
I also fixed an issue with code buffer relocation where the temporary buffer is not correctly aligned. >> >> This patch is extracted from https://github.com/openjdk/jdk/pull/21229. Tests passed with `UseSSE=2` where 16-byte constants would be generated, as well as normal testing routines. >> >> Please take a look and leave your reviews, thanks a lot. > > Quan Anh Mai has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains seven additional commits since the last revision: > > - Merge branch 'master' into constanttable > - add comment to ConstantTable::alignment > - Merge branch 'master' into constanttable > - indentation > - Merge branch 'master' into constanttable > - Merge branch 'master' into constanttable > - refactor array constant, fix codebuffer reallocation Still good! ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21596#pullrequestreview-2494431815 From thartmann at openjdk.org Wed Dec 11 06:47:40 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 11 Dec 2024 06:47:40 GMT Subject: RFR: 8344026: Ubsan: prevent potential integer overflow in c1_LIRGenerator_.cpp file [v8] In-Reply-To: <22-J3Eelze79XROngKRHXcnkMgzwjdqC3zd7bry48qo=.5c3fe69c-bbd6-4a7b-9429-1e5582324860@github.com> References: <2en5GIxXeljH8KabsgIjJ0-m2OYUCj_bXaFpOSfRZiM=.0fa0c343-a924-4051-8a67-58cf20733ff5@github.com> <0Fc1FNC7m8UIskRUEZVyGDMkvPP-UAjxGLPw3JNNLBk=.1ad396f5-fed3-42bf-95a7-47f1846fa1f9@github.com> <5e_MHZarll6kDLApsRIwapr5YSqW7VlRNiMC93VH0Uw=.3ff988f1-726f-4c91-ac6e-fcc5538bb974@github.com> <6RikOqqZivUpV30HKp4GhaZzzx7RgT2tde_8tLy1TCk=.d3457927-512f-4742-a8ad-3e116619c606@github.com> <22-J3Eelze79XROngKRHXcnkMgzwjdqC3zd7bry48qo=.5c3fe69c-bbd6-4a7b-9429-1e5582324860@github.com> Message-ID: On Wed, 4 Dec 2024 10:19:56 GMT, Amit Kumar wrote: >> Ok, I forgot that unsigned has properly 
defined overflow semantics. >> >>> No, as of now I only ran tier1 test cases with c1 compiler. Nothing else. >> >> Tests would really be great. Is that possible? > >> Tests would really be great. Is that possible? > > I could try, Wouldn't this be already tested by `test/hotspot/jtreg/compiler/c1/MultiplyByMaxInt.java` and `test/hotspot/jtreg/compiler/integerArithmetic/MultiplyByIntegerMinHang.java` tests :) Shouldn't we get an incorrect result with these tests then? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22144#discussion_r1879434649 From thartmann at openjdk.org Wed Dec 11 06:55:36 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 11 Dec 2024 06:55:36 GMT Subject: RFR: 8341696: C2: Non-fluid StringBuilder pattern bails out in OptoStringConcat [v3] In-Reply-To: References: Message-ID: On Wed, 4 Dec 2024 09:15:37 GMT, theoweidmannoracle wrote: >> Extends stringopts to also recognize non-fluid uses of StringBuilder and optimize them the same way. >> >> For example, this basic case was not optimized before and is optimized with this PR: >> >> >> StringBuilder sb = new StringBuilder(); >> sb.append("a"); >> sb.append(a); >> return sb.toString(); > > theoweidmannoracle has updated the pull request incrementally with one additional commit since the last revision: > > Move test Please make sure to run testing with javac flag `-XDstringConcat` to have it use StringBuffer instead of invokedynamic based string concat (see [JEP 280](https://openjdk.org/jeps/280)). 
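For readers following along, here is a minimal sketch of the two `StringBuilder` shapes this PR distinguishes, plus the plain concatenation that the javac flag affects. The class and method names are hypothetical and are not taken from the PR or its tests; the note on `-XDstringConcat=inline` reflects JEP 280 as I understand it.

```java
// Hypothetical illustration only; names are not from the PR or its tests.
public class StringBuilderShapes {
    // Fluid: each append() is chained on the value returned by the previous call.
    static String fluid(int a) {
        return new StringBuilder().append("a").append(a).toString();
    }

    // Non-fluid: the builder lives in a local and each call is its own statement.
    static String nonFluid(int a) {
        StringBuilder sb = new StringBuilder();
        sb.append("a");
        sb.append(a);
        return sb.toString();
    }

    // Plain concatenation: javac emits invokedynamic by default (JEP 280).
    // Compiling with -XDstringConcat=inline should make javac emit an
    // explicit StringBuilder chain here instead.
    static String concat(int a) {
        return "a" + a;
    }

    public static void main(String[] args) {
        System.out.println(fluid(42));
        System.out.println(nonFluid(42));
        System.out.println(concat(42));
    }
}
```

All three methods produce the same string; the difference is only in the bytecode shape that C2's string concatenation optimization gets to see.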
------------- PR Comment: https://git.openjdk.org/jdk/pull/22537#issuecomment-2533778142 From amitkumar at openjdk.org Wed Dec 11 07:11:40 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Wed, 11 Dec 2024 07:11:40 GMT Subject: RFR: 8344026: Ubsan: prevent potential integer overflow in c1_LIRGenerator_.cpp file [v8] In-Reply-To: References: <2en5GIxXeljH8KabsgIjJ0-m2OYUCj_bXaFpOSfRZiM=.0fa0c343-a924-4051-8a67-58cf20733ff5@github.com> <0Fc1FNC7m8UIskRUEZVyGDMkvPP-UAjxGLPw3JNNLBk=.1ad396f5-fed3-42bf-95a7-47f1846fa1f9@github.com> <5e_MHZarll6kDLApsRIwapr5YSqW7VlRNiMC93VH0Uw=.3ff988f1-726f-4c91-ac6e-fcc5538bb974@github.com> <6RikOqqZivUpV30HKp4GhaZzzx7RgT2tde_8tLy1TCk=.d3457927-512f-4742-a8ad-3e116619c606@github.com> <22-J3Eelze79XROngKRHXcnkMgzwjdqC3zd7bry48qo=.5c3fe69c-bbd6-4a7b-9429-1e5582324860@github.com> Message-ID: On Wed, 11 Dec 2024 06:44:56 GMT, Tobias Hartmann wrote: > Shouldn't we get an incorrect result with these tests then? I don't see any failures, again why would there be an incorrect result ? I didn't get that. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22144#discussion_r1879460890 From epeter at openjdk.org Wed Dec 11 07:11:40 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 11 Dec 2024 07:11:40 GMT Subject: RFR: 8344026: Ubsan: prevent potential integer overflow in c1_LIRGenerator_.cpp file [v8] In-Reply-To: References: <2en5GIxXeljH8KabsgIjJ0-m2OYUCj_bXaFpOSfRZiM=.0fa0c343-a924-4051-8a67-58cf20733ff5@github.com> <0Fc1FNC7m8UIskRUEZVyGDMkvPP-UAjxGLPw3JNNLBk=.1ad396f5-fed3-42bf-95a7-47f1846fa1f9@github.com> <5e_MHZarll6kDLApsRIwapr5YSqW7VlRNiMC93VH0Uw=.3ff988f1-726f-4c91-ac6e-fcc5538bb974@github.com> <6RikOqqZivUpV30HKp4GhaZzzx7RgT2tde_8tLy1TCk=.d3457927-512f-4742-a8ad-3e116619c606@github.com> <22-J3Eelze79XROngKRHXcnkMgzwjdqC3zd7bry48qo=.5c3fe69c-bbd6-4a7b-9429-1e5582324860@github.com> Message-ID: On Wed, 11 Dec 2024 07:07:13 GMT, Amit Kumar wrote: >> Shouldn't we get an incorrect result with these tests then? 
> >> Shouldn't we get an incorrect result with these tests then? > > I don't see any failures, again why would there be an incorrect result ? I didn't get that. Is this just undefined behaviour, but no compiler so far actually does something unexpected? If so, it will be impossible to have a failing regression test before the patch. But if there is actually an overflow bug, then there could be a regression test that would be failing before the patch, and we should try to find it. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22144#discussion_r1879462868 From epeter at openjdk.org Wed Dec 11 07:22:39 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 11 Dec 2024 07:22:39 GMT Subject: RFR: 8341696: C2: Non-fluid StringBuilder pattern bails out in OptoStringConcat [v3] In-Reply-To: References: Message-ID: On Wed, 4 Dec 2024 09:15:37 GMT, theoweidmannoracle wrote: >> Extends stringopts to also recognize non-fluid uses of StringBuilder and optimize them the same way. >> >> For example, this basic case was not optimized before and is optimized with this PR: >> >> >> StringBuilder sb = new StringBuilder(); >> sb.append("a"); >> sb.append(a); >> return sb.toString(); > > theoweidmannoracle has updated the pull request incrementally with one additional commit since the last revision: > > Move test test/hotspot/jtreg/compiler/stringopts/TestFluidAndNonFluid.java line 68: > 66: public static String fluidNoParam() { > 67: return new StringBuilder("0").append("a").append("c").toString(); > 68: } Drive by comment: you seem to only have negative `failOn` tests here. You could consider adding a test where you have a positive rule, just to make sure you are matching the correct IR nodes. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22537#discussion_r1879474618 From duke at openjdk.org Wed Dec 11 07:50:41 2024 From: duke at openjdk.org (duke) Date: Wed, 11 Dec 2024 07:50:41 GMT Subject: RFR: 8343148: C2: Refactor uses of "PhaseValue::*con*() + PhaseIdealLoop::set_ctrl()" into separate method [v12] In-Reply-To: References: Message-ID: On Mon, 18 Nov 2024 07:56:31 GMT, theoweidmannoracle wrote: >> This patch introduces the methods `PhaseIdealLoop::intcon` and `PhaseIdealLoop::longcon` which are wrappers for: >> >> >> ConINode* node = _igvn.intcon(i); >> set_ctrl(node, C->root()); >> >> >> and >> >> >> ConLNode* node = _igvn.longcon(i); >> set_ctrl(node, C->root()); >> >> >> Occurrences of this pattern in loopnode.cpp were replaced with the appropriate call to the new methods. > > theoweidmannoracle has updated the pull request incrementally with three additional commits since the last revision: > > - Update src/hotspot/share/opto/loopnode.cpp > > Co-authored-by: Christian Hagedorn > - Update src/hotspot/share/opto/loopnode.cpp > > Co-authored-by: Christian Hagedorn > - Update src/hotspot/share/opto/loopnode.cpp > > Co-authored-by: Christian Hagedorn @theoweidmannoracle Your change (at version 4ed14b2f918aa5a7b32d143fc373361134aeb030) is now ready to be sponsored by a Committer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21836#issuecomment-2534310868 From thartmann at openjdk.org Wed Dec 11 07:52:47 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 11 Dec 2024 07:52:47 GMT Subject: RFR: 8342692: C2: long counted loop/long range checks: don't create loop-nest for short running loops [v6] In-Reply-To: References: Message-ID: On Wed, 4 Dec 2024 15:35:44 GMT, Roland Westrelin wrote: >> To optimize a long counted loop and long range checks in a long or int >> counted loop, the loop is turned into a loop nest. 
When the loop has >> few iterations, the overhead of having an outer loop whose backedge is >> never taken, has a measurable cost. Furthermore, creating the loop >> nest usually causes one iteration of the loop to be peeled so >> predicates can be set up. If the loop is short running, then it's an >> extra iteration that's run with range checks (compared to an int >> counted loop with int range checks). >> >> This change doesn't create a loop nest when: >> >> 1- it can be determined statically at loop nest creation time that the >> loop runs for a short enough number of iterations >> >> 2- profiling reports that the loop runs for no more than ShortLoopIter >> iterations (1000 by default). >> >> For 2-, a guard is added which is implemented as yet another predicate. >> >> While this change is in principle simple, I ran into a few >> implementation issues: >> >> - while c2 has a way to compute the number of iterations of an int >> counted loop, it doesn't have that for long counted loop. The >> existing logic for int counted loops promotes values to long to >> avoid overflows. I reworked it so it now works for both long and int >> counted loops. >> >> - I added a new deoptimization reason (Reason_short_running_loop) for >> the new predicate. Given the number of iterations is narrowed down >> by the predicate, the limit of the loop after transformation is a >> cast node that's control dependent on the short running loop >> predicate. Because once the counted loop is transformed, it is >> likely that range check predicates will be inserted and they will >> depend on the limit, the short running loop predicate has to be the >> one that's further away from the loop entry. 
Now it is also possible >> that the limit before transformation depends on a predicate >> (TestShortRunningLongCountedLoopPredicatesClone is an example), we >> can have: new predicates inserted after the transformation that >> depend on the casted limit that itself depends on old predicates >> added before the transformation. To solve this circular dependency, >> parse and assert predicates are cloned between the old predicates >> and the loop head. The cloned short running loop parse predicate is >> the one that's used to insert the short running loop predicate. >> >> - In the case of a long counted loop, the loop is transformed into a >> regular loop with a ... > > Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: > > Update src/hotspot/share/opto/loopTransform.cpp > > Co-authored-by: Emanuel Peter FTR, testing is all clean now. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21630#issuecomment-2534335125 From epeter at openjdk.org Wed Dec 11 07:54:44 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 11 Dec 2024 07:54:44 GMT Subject: RFR: 8332827: [REDO] C2: crash in compiled code because of dependency on removed range check CastIIs In-Reply-To: References: Message-ID: On Thu, 5 Dec 2024 09:33:43 GMT, Roland Westrelin wrote: > The failures that caused the backout were due to a bug in: > > `find_or_make_integer_cast()` > > which caused the `_range_check_dependency` field's value of the > existing cast node to not be set in the new cast node. I re-ran some > testing with this fixed and current jdk repo and found that a few > vectorization tests fail now because the patch pushes range check > `CastII` nodes through `AddI`/`SubI`. To fix this, I delayed that > transformation to after loop opts. Hi Roland, thanks for taking this on! I have a few first comments / questions.
src/hotspot/share/opto/castnode.cpp line 259: > 257: if (_range_check_dependency) { > 258: if (phase->C->post_loop_opts_phase()) { > 259: return this->in(1); Does the removal of the CastII happen anywhere at all any more now? src/hotspot/share/opto/compile.cpp line 3147: > 3145: DivModNode* divmod = DivModNode::make(n, bt, is_unsigned); > 3146: divmod->add_prec_from(n); > 3147: divmod->add_prec_from(d); Can you explain why you added this? src/hotspot/share/opto/compile.cpp line 3801: > 3799: } > 3800: > 3801: void Compile::remove_range_check_cast(CastIINode* cast) { Why not make this a member function of `CastIINode`? test/hotspot/jtreg/compiler/rangechecks/TestArrayAccessAboveRCAfterRCCastIIEliminated.java line 26: > 24: /** > 25: * @test > 26: * @bug 8324517 This bug number does not match the issue. test/hotspot/jtreg/compiler/rangechecks/TestRangeCheckCastIISplitThruPhi.java line 31: > 29: * @run main/othervm -XX:-TieredCompilation -XX:-UseOnStackReplacement -XX:-BackgroundCompilation TestRangeCheckCastIISplitThruPhi > 30: * > 31: * Formatting is a little off. And would it make sense to add a run without flags? test/hotspot/jtreg/compiler/vectorization/TestVectorizationNegativeScale.java line 27: > 25: * @test > 26: * @bug 8332827 > 27: * @summary [REDO] C2: crash in compiled code because of dependency on removed range check CastIIs Can you say what happened with this test before your fix? Did it not vectorize? Or crash? test/hotspot/jtreg/compiler/vectorization/TestVectorizationNegativeScale.java line 30: > 28: * > 29: * @library /test/lib / > 30: * @requires vm.compiler2.enabled Is this required? The IR rules are only executed with C2 anyway. 
------------- PR Review: https://git.openjdk.org/jdk/pull/22568#pullrequestreview-2494539813 PR Review Comment: https://git.openjdk.org/jdk/pull/22568#discussion_r1879508946 PR Review Comment: https://git.openjdk.org/jdk/pull/22568#discussion_r1879511897 PR Review Comment: https://git.openjdk.org/jdk/pull/22568#discussion_r1879523078 PR Review Comment: https://git.openjdk.org/jdk/pull/22568#discussion_r1879485274 PR Review Comment: https://git.openjdk.org/jdk/pull/22568#discussion_r1879488024 PR Review Comment: https://git.openjdk.org/jdk/pull/22568#discussion_r1879490574 PR Review Comment: https://git.openjdk.org/jdk/pull/22568#discussion_r1879488584 From duke at openjdk.org Wed Dec 11 07:54:45 2024 From: duke at openjdk.org (theoweidmannoracle) Date: Wed, 11 Dec 2024 07:54:45 GMT Subject: Integrated: 8343148: C2: Refactor uses of "PhaseValue::*con*() + PhaseIdealLoop::set_ctrl()" into separate method In-Reply-To: References: Message-ID: On Fri, 1 Nov 2024 16:04:38 GMT, theoweidmannoracle wrote: > This patch introduces the methods `PhaseIdealLoop::intcon` and `PhaseIdealLoop::longcon` which are wrappers for: > > > ConINode* node = _igvn.intcon(i); > set_ctrl(node, C->root()); > > > and > > > ConLNode* node = _igvn.longcon(i); > set_ctrl(node, C->root()); > > > Occurrences of this pattern in loopnode.cpp were replaced with the appropriate call to the new methods. This pull request has now been integrated. 
Changeset: e88e793c Author: theoweidmannoracle Committer: Christian Hagedorn URL: https://git.openjdk.org/jdk/commit/e88e793cfd9a5db8745aa187c2726ad029b60ab7 Stats: 134 lines in 8 files changed: 44 ins; 44 del; 46 mod 8343148: C2: Refactor uses of "PhaseValue::*con*() + PhaseIdealLoop::set_ctrl()" into separate method Reviewed-by: kvn, chagedorn, thartmann ------------- PR: https://git.openjdk.org/jdk/pull/21836 From epeter at openjdk.org Wed Dec 11 09:06:39 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 11 Dec 2024 09:06:39 GMT Subject: RFR: 8341696: C2: Non-fluid StringBuilder pattern bails out in OptoStringConcat [v3] In-Reply-To: References: Message-ID: <_bu0MzNZOzcSEB4RF4pIz62lDYEchWiTHzUfkuo0ySI=.425b8866-5256-4f8c-90bf-67e84633f498@github.com> On Wed, 11 Dec 2024 07:20:23 GMT, Emanuel Peter wrote: >> theoweidmannoracle has updated the pull request incrementally with one additional commit since the last revision: >> >> Move test > > test/hotspot/jtreg/compiler/stringopts/TestFluidAndNonFluid.java line 68: > >> 66: public static String fluidNoParam() { >> 67: return new StringBuilder("0").append("a").append("c").toString(); >> 68: } > > Drive by comment: you seem to only have negative `failOn` tests here. You could consider adding a test where you have a positive rule, just to make sure you are matching the correct IR nodes. Never mind, I did not see the positive cases below ? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22537#discussion_r1879654644 From duke at openjdk.org Wed Dec 11 09:18:09 2024 From: duke at openjdk.org (Yagmur Eren) Date: Wed, 11 Dec 2024 09:18:09 GMT Subject: RFR: 8345580: Remove const from Node::_idx which is modified Message-ID: `Node::_idx` is declared as `const`, however, it is modified by `Node::set_idx`. 
Please see: https://github.com/openjdk/jdk/blob/166c12771d9d8c466e73a9490c4eb1fc9a5f6c24/src/hotspot/share/opto/node.hpp#L588 As already stated in [JDK-8345580](https://bugs.openjdk.org/browse/JDK-8345580) issue, this behavior is counterintuitive because a const variable is expected to remain unmodified throughout its lifetime. Additionally, C++ International Standard states that "_...any attempt to modify a `const` object during its lifetime results in undefined behavior._ ". To address this, `const` should be removed from the declaration of `Node::_idx` to align with its intended use and avoid violating the C++ standard. Tested with tier1,2,3,4,5. ------------- Commit messages: - 8345580: Remove const from Node::_idx which is modified Changes: https://git.openjdk.org/jdk/pull/22646/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=22646&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8345580 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/22646.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22646/head:pull/22646 PR: https://git.openjdk.org/jdk/pull/22646 From mdoerr at openjdk.org Wed Dec 11 09:37:39 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Wed, 11 Dec 2024 09:37:39 GMT Subject: RFR: 8344026: Ubsan: prevent potential integer overflow in c1_LIRGenerator_.cpp file [v8] In-Reply-To: References: <2en5GIxXeljH8KabsgIjJ0-m2OYUCj_bXaFpOSfRZiM=.0fa0c343-a924-4051-8a67-58cf20733ff5@github.com> <0Fc1FNC7m8UIskRUEZVyGDMkvPP-UAjxGLPw3JNNLBk=.1ad396f5-fed3-42bf-95a7-47f1846fa1f9@github.com> <5e_MHZarll6kDLApsRIwapr5YSqW7VlRNiMC93VH0Uw=.3ff988f1-726f-4c91-ac6e-fcc5538bb974@github.com> <6RikOqqZivUpV30HKp4GhaZzzx7RgT2tde_8tLy1TCk=.d3457927-512f-4742-a8ad-3e116619c606@github.com> <22-J3Eelze79XROngKRHXcnkMgzwjdqC3zd7bry48qo=.5c3fe69c-bbd6-4a7b-9429-1e5582324860@github.com> Message-ID: On Wed, 11 Dec 2024 07:09:28 GMT, Emanuel Peter wrote: > Is this just undefined behaviour, but no compiler so far 
actually does something unexpected? Exactly. The compilers already seem to generate code which matches the unsigned behavior. Only UBSan checks found the issue. They run into the issue with existing tests. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22144#discussion_r1879708667 From amitkumar at openjdk.org Wed Dec 11 11:24:13 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Wed, 11 Dec 2024 11:24:13 GMT Subject: RFR: 8344026: Ubsan: prevent potential integer overflow in c1_LIRGenerator_.cpp file [v9] In-Reply-To: References: Message-ID: > This PR converts datatype from `jint` to `juint` for constant `c` check in c1_LIRGenerator_.cpp. Please see JBS for more info. Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: adds testcase ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22144/files - new: https://git.openjdk.org/jdk/pull/22144/files/ba9e3867..673281e5 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22144&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22144&range=07-08 Stats: 70 lines in 1 file changed: 70 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/22144.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22144/head:pull/22144 PR: https://git.openjdk.org/jdk/pull/22144 From qamai at openjdk.org Wed Dec 11 11:47:45 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Wed, 11 Dec 2024 11:47:45 GMT Subject: RFR: 8310691: [REDO] [vectorapi] Refactor VectorShuffle implementation [v10] In-Reply-To: References: Message-ID: On Tue, 10 Dec 2024 16:10:09 GMT, Quan Anh Mai wrote: >> Hi, >> >> This is just a redo of https://github.com/openjdk/jdk/pull/13093. mostly just the revert of the backout. >> >> Regarding the related issues: >> >> - [JDK-8306008](https://bugs.openjdk.org/browse/JDK-8306008) and [JDK-8309531](https://bugs.openjdk.org/browse/JDK-8309531) have been fixed before the backout.
>> - [JDK-8309373](https://bugs.openjdk.org/browse/JDK-8309373) was due to missing `ForceInline` on `AbstractVector::toBitsVectorTemplate` >> - [JDK-8306592](https://bugs.openjdk.org/browse/JDK-8306592), I have not been able to find the root causes. I'm not sure if this is a blocker, now I cannot even build x86-32 tests. >> >> Finally, I moved some implementation of public methods and methods that call into intrinsics to the concrete class as that may help the compiler know the correct types of the variables. >> >> Please take a look and leave reviews. Thanks a lot. >> >> The description of the original PR: >> >> This patch reimplements `VectorShuffle` implementations to be a vector of the bit type. Currently, `VectorShuffle` is stored as a byte array, and would be expanded upon usage. This poses several drawbacks: >> >> Inefficient conversions between a shuffle and its corresponding vector. This hinders the performance when the shuffle indices are not constant and are loaded or computed dynamically. >> Redundant expansions in `rearrange` operations. On all platforms, it seems that a shuffle index vector is always expanded to the correct type before executing the `rearrange` operations. >> Some redundant intrinsics are needed to support this handling as well as special considerations in the C2 compiler. >> Range checks are performed using `VectorShuffle::toVector`, which is inefficient for FP types since both FP conversions and FP comparisons are more expensive than the integral ones. 
>> With these changes, a `rearrange` can emit more efficient code: >> >> var species = IntVector.SPECIES_128; >> var v1 = IntVector.fromArray(species, SRC1, 0); >> var v2 = IntVector.fromArray(species, SRC2, 0); >> v1.rearrange(v2.toShuffle()).intoArray(DST, 0); >> >> Before: >> movabs $0x751589fa8,%r10 ; {oop([I{0x0000000751589fa8})} >> vmovdqu 0x10(%r10),%xmm2 >> movabs $0x7515a0d08,%r10 ; {oop([I{0x00000007515a0d08})} >> vmovdqu 0x10(%r10),%xmm1 >> movabs $0x75158afb8,%r10 ; {oop([I{0x000000075158afb8})} >> vmovdqu 0x10(%r10),%xmm0 >> vpand -0xddc12(%rip),%xmm0,%xmm0 # Stub::vector_int_to_byt... > > Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: > > optimize slice/unslice Thanks a lot for your reviews, I really appreciate it. I will integrate this PR. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21042#issuecomment-2535708384 From aph at openjdk.org Wed Dec 11 11:59:41 2024 From: aph at openjdk.org (Andrew Haley) Date: Wed, 11 Dec 2024 11:59:41 GMT Subject: RFR: 8344026: Ubsan: prevent potential integer overflow in c1_LIRGenerator_.cpp file [v9] In-Reply-To: References: Message-ID: On Wed, 11 Dec 2024 11:24:13 GMT, Amit Kumar wrote: >> This PR converts the datatype from `jint` to `juint` for the constant `c` check in c1_LIRGenerator_.cpp. Please see JBS for more info. > > Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: > > adds testcase Marked as reviewed by aph (Reviewer).
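As a rough illustration of why moving a computation from `jint` to `juint` can silence a UBSan report (a sketch only, not the code from the PR): in C++, overflow on a signed integer is undefined behaviour, whereas unsigned arithmetic is defined to wrap modulo 2^32. The aliases `jint_t` / `juint_t` and both function names below are hypothetical stand-ins, and the power-of-two test is just an assumed example of a check on a constant `c`.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical stand-ins for HotSpot's jint/juint (assumed to be 32-bit).
using jint_t  = std::int32_t;
using juint_t = std::uint32_t;

// Computes c + 1 in unsigned arithmetic, which wraps modulo 2^32.
// Writing `c + 1` directly on a signed value equal to INT32_MAX would be
// signed overflow, i.e. undefined behaviour that UBSan flags.
inline juint_t add_one_wrapping(jint_t c) {
  return static_cast<juint_t>(c) + 1u;
}

// Returns true if c + 1 is a power of two, without any signed overflow.
inline bool plus_one_is_power_of_two(jint_t c) {
  juint_t u = add_one_wrapping(c);
  return u != 0 && (u & (u - 1)) == 0;
}
```

With these definitions, `add_one_wrapping(INT32_MAX)` yields `0x80000000` rather than invoking undefined behaviour, which matches the wrap-around semantics of Java `int` arithmetic mentioned elsewhere in this thread.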
------------- PR Review: https://git.openjdk.org/jdk/pull/22144#pullrequestreview-2495452990 From duke at openjdk.org Wed Dec 11 12:06:01 2024 From: duke at openjdk.org (theoweidmannoracle) Date: Wed, 11 Dec 2024 12:06:01 GMT Subject: RFR: 8332268: C2: Add missing optimizations for UDivI/L and UModI/L and unify the shared logic with the signed nodes [v22] In-Reply-To: References: Message-ID: <0t94pqtH3N0lElf_A967QusqLWbW-VBDMd5vrN0aT-8=.f1439c86-c6d2-41f7-96d8-116c80bf9a0b@github.com> > This PR introduces > - several new optimizations to unsigned division and modulo > - x % 1, x % x, x % 2^k > - x / 1, x / x, x / 2^k > - does not implement the Granlund and Montgomery algorithm, which has been implemented for signed modulo division in the past. It is unclear if a lot is to be gained by implementing this. > - tests to test existing optimizations for signed division and modulo > - does not test the Granlund and Montgomery algorithm directly theoweidmannoracle has updated the pull request incrementally with one additional commit since the last revision: Update src/hotspot/share/opto/divnode.cpp Co-authored-by: Christian Hagedorn ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22061/files - new: https://git.openjdk.org/jdk/pull/22061/files/fe2232ee..4f33cd6c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22061&range=21 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22061&range=20-21 Stats: 3 lines in 1 file changed: 0 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/22061.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22061/head:pull/22061 PR: https://git.openjdk.org/jdk/pull/22061 From chagedorn at openjdk.org Wed Dec 11 12:06:02 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 11 Dec 2024 12:06:02 GMT Subject: RFR: 8332268: C2: Add missing optimizations for UDivI/L and UModI/L and unify the shared logic with the signed nodes [v21] In-Reply-To: References: Message-ID: On Tue, 10 Dec 2024 15:00:21 GMT, 
theoweidmannoracle wrote: >> This PR introduces >> - several new optimizations to unsigned division and modulo >> - x % 1, x % x, x % 2^k >> - x / 1, x / x, x / 2^k >> - does not implement the Granlund and Montgomery algorithm, which has been implemented for signed modulo division in the past. It is unclear if a lot is to be gained by implementing this. >> - tests to test existing optimizations for signed division and modulo >> - does not test the Granlund and Montgomery algorithm directly > > theoweidmannoracle has updated the pull request incrementally with two additional commits since the last revision: > > - Merge branch 'unsigned-div-opts' of https://github.com/theoweidmannoracle/jdk into unsigned-div-opts > - Rename variables Marked as reviewed by chagedorn (Reviewer). src/hotspot/share/opto/divnode.cpp line 1200: > 1198: Unsigned au = static_cast(type_dividend->get_con()); > 1199: Unsigned bu = static_cast(type_divisor->get_con()); > 1200: return TypeClass::make(static_cast(au % bu)); Thanks for the updates, one last nit: Suggestion: Unsigned dividend = static_cast(type_dividend->get_con()); Unsigned divisor = static_cast(type_divisor->get_con()); return TypeClass::make(static_cast(dividend % divisor)); ------------- PR Review: https://git.openjdk.org/jdk/pull/22061#pullrequestreview-2492679451 PR Review Comment: https://git.openjdk.org/jdk/pull/22061#discussion_r1878299973 From thartmann at openjdk.org Wed Dec 11 12:07:41 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 11 Dec 2024 12:07:41 GMT Subject: RFR: 8344026: Ubsan: prevent potential integer overflow in c1_LIRGenerator_.cpp file [v8] In-Reply-To: References: <2en5GIxXeljH8KabsgIjJ0-m2OYUCj_bXaFpOSfRZiM=.0fa0c343-a924-4051-8a67-58cf20733ff5@github.com> <0Fc1FNC7m8UIskRUEZVyGDMkvPP-UAjxGLPw3JNNLBk=.1ad396f5-fed3-42bf-95a7-47f1846fa1f9@github.com> <5e_MHZarll6kDLApsRIwapr5YSqW7VlRNiMC93VH0Uw=.3ff988f1-726f-4c91-ac6e-fcc5538bb974@github.com> 
<6RikOqqZivUpV30HKp4GhaZzzx7RgT2tde_8tLy1TCk=.d3457927-512f-4742-a8ad-3e116619c606@github.com> <22-J3Eelze79XROngKRHXcnkMgzwjdqC3zd7bry48qo=.5c3fe69c-bbd6-4a7b-9429-1e5582324860@github.com> Message-ID: On Wed, 11 Dec 2024 09:34:53 GMT, Martin Doerr wrote: >> Is this just undefined behaviour, but no compiler so far actually does something unexpected? If so, it will be impossible to have a failing regression test before the patch. But if there is actually an overflow bug, then there could be a regression test that would be failing before the patch, and we should try to find it. > >> Is this just undefined behaviour, but no compiler so far actually does something unexpected? > > Exactly. The compilers already seem to generate code which matches the unsigned behavior. Only UBSan checks found the issue. They run into the issue with existing tests which are passing when UBSan is disabled. Overflow doesn't cause errors when using "wrap around" behavior which matches the Java integer arithmetic semantics. Okay, thanks for the clarification! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22144#discussion_r1880079685 From chagedorn at openjdk.org Wed Dec 11 12:31:43 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 11 Dec 2024 12:31:43 GMT Subject: RFR: 8332268: C2: Add missing optimizations for UDivI/L and UModI/L and unify the shared logic with the signed nodes [v22] In-Reply-To: <0t94pqtH3N0lElf_A967QusqLWbW-VBDMd5vrN0aT-8=.f1439c86-c6d2-41f7-96d8-116c80bf9a0b@github.com> References: <0t94pqtH3N0lElf_A967QusqLWbW-VBDMd5vrN0aT-8=.f1439c86-c6d2-41f7-96d8-116c80bf9a0b@github.com> Message-ID: On Wed, 11 Dec 2024 12:06:01 GMT, theoweidmannoracle wrote: >> This PR introduces >> - several new optimizations to unsigned division and modulo >> - x % 1, x % x, x % 2^k >> - x / 1, x / x, x / 2^k >> - does not implement the Granlund and Montgomery algorithm, which has been implemented for signed modulo division in the past. 
It is unclear if a lot is to be gained by implementing this. >> - tests to test existing optimizations for signed division and modulo >> - does not test the Granlund and Montgomery algorithm directly > > theoweidmannoracle has updated the pull request incrementally with one additional commit since the last revision: > > Update src/hotspot/share/opto/divnode.cpp > > Co-authored-by: Christian Hagedorn Marked as reviewed by chagedorn (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/22061#pullrequestreview-2495530505 From goetz at openjdk.org Wed Dec 11 12:38:38 2024 From: goetz at openjdk.org (Goetz Lindenmaier) Date: Wed, 11 Dec 2024 12:38:38 GMT Subject: RFR: 8345609: [C1] LIR Operations with one input should be implemented as LIR_Op1 [v2] In-Reply-To: References: Message-ID: On Thu, 5 Dec 2024 21:29:51 GMT, Martin Doerr wrote: >> Change `lir_sqrt`, `lir_abs`, `lir_neg`, `lir_f2hf`, `lir_hf2f` to use `LIR_Op1`. Extend `LIR_Op1` to support one temp register operand. Also see JBS issue. >> >> Remove `lir_tan` and `lir_log10` which are unused (dead and incomplete code). They should have been removed with or after https://github.com/openjdk/jdk/commit/ad79a5ae65d24861ead3ae96cf148c8bc0f02736 and the corresponding changes on other platforms. >> >> The removal of the unnecessary float constant loads improves performance: >> make run-test TEST="micro:Fp16ConversionBenchmark" MICRO="VM_OPTIONS=-XX:TieredStopAtLevel=1" >> >> Power 10 without patch: >> >> Benchmark (size) Mode Cnt Score Error Units >> Fp16ConversionBenchmark.floatToFloat16 2048 thrpt 15 247.064 ± 0.189 ops/ms >> >> >> Power 10 with patch: >> >> Benchmark (size) Mode Cnt Score Error Units >> Fp16ConversionBenchmark.floatToFloat16 2048 thrpt 15 308.372 ± 0.432 ops/ms >> >> >> x64 machine without patch: >> >> Benchmark (size) Mode Cnt Score Error Units >> Fp16ConversionBenchmark.floatToFloat16 2048 thrpt 15 384.565 ±
3.758 ops/ms >> >> >> x64 machine with patch: >> >> Benchmark (size) Mode Cnt Score Error Units >> Fp16ConversionBenchmark.floatToFloat16 2048 thrpt 15 406.121 ± 3.228 ops/ms >> >> >> Testing: tier1-4 on x64 (Windows, linux, MacOS), aarch64 (linux, MacOS), ppc64 (linux, AIX) > > Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: > > Update Copyright headers. LGTM ------------- Marked as reviewed by goetz (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/22582#pullrequestreview-2495544250 From goetz at openjdk.org Wed Dec 11 12:38:40 2024 From: goetz at openjdk.org (Goetz Lindenmaier) Date: Wed, 11 Dec 2024 12:38:40 GMT Subject: RFR: 8345609: [C1] LIR Operations with one input should be implemented as LIR_Op1 [v2] In-Reply-To: References: Message-ID: On Thu, 5 Dec 2024 17:17:18 GMT, Martin Doerr wrote: >> Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: >> >> Update Copyright headers. > > src/hotspot/share/c1/c1_LIR.cpp line 492: > >> 490: assert(op1->_info != nullptr, ""); do_info(op1->_info); >> 491: if (op1->_opr->is_valid()) do_temp(op1->_opr); // safepoints on SPARC need temporary register >> 492: assert(op1->_tmp->is_illegal(), "not used"); > > I think safepoints should ideally be implemented as LIR_Op0. SPARC support is removed. We could file a new RFE for that. Well, a grep showed there are more leftovers of SPARC and Solaris in hotspot, but that is not a matter for this change.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22582#discussion_r1880119999 From mdoerr at openjdk.org Wed Dec 11 12:52:43 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Wed, 11 Dec 2024 12:52:43 GMT Subject: RFR: 8345609: [C1] LIR Operations with one input should be implemented as LIR_Op1 [v2] In-Reply-To: References: Message-ID: <_QWtDL8PLo73dJARnDgsEm81NqS_rG718fHl14xSMuI=.9e1e508c-6b52-4f97-8e5a-ace45996f466@github.com> On Thu, 5 Dec 2024 21:29:51 GMT, Martin Doerr wrote: >> Change `lir_sqrt`, `lir_abs`, `lir_neg`, `lir_f2hf`, `lir_hf2f` to use `LIR_Op1`. Extend `LIR_Op1` to support one temp register operand. Also see JBS issue. >> >> Remove `lir_tan` and `lir_log10` which are unused (dead and incomplete code). They should have been removed with or after https://github.com/openjdk/jdk/commit/ad79a5ae65d24861ead3ae96cf148c8bc0f02736 and the corresponding changes on other platforms. >> >> The removal of the unnecessary float constant loads improves performance: >> make run-test TEST="micro:Fp16ConversionBenchmark" MICRO="VM_OPTIONS=-XX:TieredStopAtLevel=1" >> >> Power 10 without patch: >> >> Benchmark (size) Mode Cnt Score Error Units >> Fp16ConversionBenchmark.floatToFloat16 2048 thrpt 15 247.064 ± 0.189 ops/ms >> >> >> Power 10 with patch: >> >> Benchmark (size) Mode Cnt Score Error Units >> Fp16ConversionBenchmark.floatToFloat16 2048 thrpt 15 308.372 ± 0.432 ops/ms >> >> >> x64 machine without patch: >> >> Benchmark (size) Mode Cnt Score Error Units >> Fp16ConversionBenchmark.floatToFloat16 2048 thrpt 15 384.565 ± 3.758 ops/ms >> >> >> x64 machine with patch: >> >> Benchmark (size) Mode Cnt Score Error Units >> Fp16ConversionBenchmark.floatToFloat16 2048 thrpt 15 406.121 ± 3.228 ops/ms >> >> >> Testing: tier1-4 on x64 (Windows, linux, MacOS), aarch64 (linux, MacOS), ppc64 (linux, AIX) > > Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: > > Update Copyright headers.
Thanks for the review! ------------- PR Comment: https://git.openjdk.org/jdk/pull/22582#issuecomment-2535903697 From mdoerr at openjdk.org Wed Dec 11 12:52:44 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Wed, 11 Dec 2024 12:52:44 GMT Subject: Integrated: 8345609: [C1] LIR Operations with one input should be implemented as LIR_Op1 In-Reply-To: References: Message-ID: On Thu, 5 Dec 2024 16:57:24 GMT, Martin Doerr wrote: > Change `lir_sqrt`, `lir_abs`, `lir_neg`, `lir_f2hf`, `lir_hf2f` to use `LIR_Op1`. Extend `LIR_Op1` to support one temp register operand. Also see JBS issue. > > Remove `lir_tan` and `lir_log10` which are unused (dead and incomplete code). They should have been removed with or after https://github.com/openjdk/jdk/commit/ad79a5ae65d24861ead3ae96cf148c8bc0f02736 and the corresponding changes on other platforms. > > The removal of the unnecessary float constant loads improves performance: > make run-test TEST="micro:Fp16ConversionBenchmark" MICRO="VM_OPTIONS=-XX:TieredStopAtLevel=1" > > Power 10 without patch: > > Benchmark (size) Mode Cnt Score Error Units > Fp16ConversionBenchmark.floatToFloat16 2048 thrpt 15 247.064 ± 0.189 ops/ms > > > Power 10 with patch: > > Benchmark (size) Mode Cnt Score Error Units > Fp16ConversionBenchmark.floatToFloat16 2048 thrpt 15 308.372 ± 0.432 ops/ms > > > x64 machine without patch: > > Benchmark (size) Mode Cnt Score Error Units > Fp16ConversionBenchmark.floatToFloat16 2048 thrpt 15 384.565 ± 3.758 ops/ms > > > x64 machine with patch: > > Benchmark (size) Mode Cnt Score Error Units > Fp16ConversionBenchmark.floatToFloat16 2048 thrpt 15 406.121 ± 3.228 ops/ms > > > Testing: tier1-4 on x64 (Windows, linux, MacOS), aarch64 (linux, MacOS), ppc64 (linux, AIX) This pull request has now been integrated.
Changeset: a21d21f4 Author: Martin Doerr URL: https://git.openjdk.org/jdk/commit/a21d21f4d7b74e21f68b6bf9c5dc9ba7d3f9963c Stats: 158 lines in 10 files changed: 59 ins; 89 del; 10 mod 8345609: [C1] LIR Operations with one input should be implemented as LIR_Op1 Reviewed-by: rrich, goetz ------------- PR: https://git.openjdk.org/jdk/pull/22582 From chagedorn at openjdk.org Wed Dec 11 12:57:45 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 11 Dec 2024 12:57:45 GMT Subject: RFR: 8343685: C2 SuperWord: refactor VPointer with MemPointer In-Reply-To: References: Message-ID: <7vTNT5cH-WGyOwgPRMglRr_aGdzyoCvmeYUPniFf0XE=.66b777e6-7e48-4d47-ac61-d7fdb0a23824@github.com> On Wed, 6 Nov 2024 13:07:13 GMT, Emanuel Peter wrote: > **This is a required step towards adding runtime-checks for Aliasing Analysis, especially Important for FFM / MemorySegments.** > > I know this one is large, but it consists of a lot of renamings, and new tests. On the whole, the new `VPointer` code is less than the old! > > **Goal** > > Replace old `VPointer` with a new version that relies on `MemPointer` - which then is a shared utility for both `MergeStores` and `SuperWord / AutoVectorization`. `MemPointer` generally parses pointers, and `VPointer` specializes this facility for the use in loops (`VLoop`). > > The old `VPointer` implementation with its recursive pattern matching was quite complicated and difficult to reason about for correctness. The approach in `MemPointer` is much simpler: iteratively decomposing sub-expressions. Further: the new implementation is more powerful at detecting equivalent invariants. > > **Future**: with the `MemPointer` implementation of `VPointer`, it should be easier to implement speculative runtime-checks for Aliasing-Analysis [JDK-8324751](https://bugs.openjdk.org/browse/JDK-8324751). The pressing need for this has come from the FFM / MemorySegment folks, like @mcimadamore and @minborg . 
> > **Details** > > This looks like a rather big patch, so let me explain the parts. > - Refactor of `MemPointer` in `mempointer.hpp/cpp`: > - Added concept of `Base` to `MemPointer`. This is required for the aliasing computation in `VPointer`. > - `sub_expression_has_native_base_candidate`: add special case to parse through `CastX2P` if we find a native memory base `MemorySegment.address()`, i.e. `jdk.internal.foreign.NativeMemorySegmentImpl.min`. This helps some native memory segment cases to vectorize that did not before. > - So far `MemPointer` could only answer adjacency queries. But VPointer also needs overlap queries, see the old `VPointer::not_equal` (i.e. can we prove that the two `VPointer` never overlap?). So I had to add a new case to aliasing computation: `NotOrAtDistance`. It is useful to answer the new and better named `MemPointer::never_overlaps_with`. > - Collapsed together `MemPointerDecomposedForm` and `MemPointer`. It was an unnecessary and unhelpful split. > - Re-write of `VPointer` based on `MemPointer`: > - Old pattern: > - `VPointer[mem: 847 StoreI, base: 37, adr: 37, base[ 37] + offset( 16) + invar( 0) + scale( 4) * iv]` > - `VPointer[mem: 3189 LoadB, base: 1, adr: 2273, base[ 1] + offset( 0) + invar( 0) + scale( 1) * iv]` -> `adr = CastX2P`, the a... Impressive work! I'm out for the rest of the week but can have a look at it next week. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21926#issuecomment-2535915257 From jwaters at openjdk.org Wed Dec 11 13:13:37 2024 From: jwaters at openjdk.org (Julian Waters) Date: Wed, 11 Dec 2024 13:13:37 GMT Subject: RFR: 8345580: Remove const from Node::_idx which is modified In-Reply-To: References: Message-ID: <2WjtB8nALV-EvIvnhvzOcgiC5roBR2pZjAq5TX8y7JM=.274b8b7a-76bb-41fe-be41-a94653a39df1@github.com> On Mon, 9 Dec 2024 15:16:32 GMT, Yagmur Eren wrote: > `Node::_idx` is declared as `const`, however, it is modified by `Node::set_idx`.
Please see: https://github.com/openjdk/jdk/blob/166c12771d9d8c466e73a9490c4eb1fc9a5f6c24/src/hotspot/share/opto/node.hpp#L588 > > As already stated in [JDK-8345580](https://bugs.openjdk.org/browse/JDK-8345580) issue, this behavior is counterintuitive because a const variable is expected to remain unmodified throughout its lifetime. Additionally, C++ International Standard states that "_...any attempt to modify a `const` object during its lifetime results in undefined behavior._ ". > To address this, `const` should be removed from the declaration of `Node::_idx` to align with its intended use and avoid violating the C++ standard. Tested with tier1,2,3,4,5. I do wonder why this compiles fine, since compilers should reject code that modifies a const. ------------- PR Comment: https://git.openjdk.org/jdk/pull/22646#issuecomment-2535951302 From bulasevich at openjdk.org Wed Dec 11 13:19:25 2024 From: bulasevich at openjdk.org (Boris Ulasevich) Date: Wed, 11 Dec 2024 13:19:25 GMT Subject: RFR: 8343789: Move mutable nmethod data out of CodeCache [v5] In-Reply-To: <9mDuowjpORyWudVnSB1FWCW_o1pBgMnAvJus6YGkXLs=.67ba4652-2470-448d-baa2-464e824b2fcb@github.com> References: <9mDuowjpORyWudVnSB1FWCW_o1pBgMnAvJus6YGkXLs=.67ba4652-2470-448d-baa2-464e824b2fcb@github.com> Message-ID: > This change relocates mutable data (such as relocations, oops, and metadata) from the nmethod. The change follows the recent PR #18984, which relocated immutable nmethod data from the CodeCache. > > The core idea remains the same: use the CodeCache for executable code while moving additional data to the C heap. The primary motivations are improving security and enhancing code density. > > Although performance is not the main focus, testing on AArch64 CPUs, where code density plays a significant role, has shown a 1-2% performance improvement in specific scenarios, such as the CodeCacheStress test and the Renaissance Dotty benchmark. > > The numbers. Immutable data constitutes **~30%** of the nmethod.
Mutable data constitutes **~8%** of the nmethod. Example (statistics collected on the CodeCacheStress benchmark): > - nmethod_count:134000, total_compilation_time: 510460ms > - total allocation time malloc_mutable/malloc_immutable/CodeCache_alloc: 62ms/114ms/6333ms, > - total allocation size (mutable/immutable/nmethod): 64MB/192MB/488MB > > Functional testing: jtreg on arm/aarch/x86. > Performance testing: renaissance/dacapo/SPECjvm2008 benchmarks. > > Alternative solution (see comments): In the future, relocations can be moved to _immutable_data. Boris Ulasevich has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision: a bit of cleanup and addressing review suggestions ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21276/files - new: https://git.openjdk.org/jdk/pull/21276/files/ee697996..b4c7c24b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21276&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21276&range=03-04 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/21276.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21276/head:pull/21276 PR: https://git.openjdk.org/jdk/pull/21276 From duke at openjdk.org Wed Dec 11 13:43:46 2024 From: duke at openjdk.org (duke) Date: Wed, 11 Dec 2024 13:43:46 GMT Subject: RFR: 8332268: C2: Add missing optimizations for UDivI/L and UModI/L and unify the shared logic with the signed nodes [v22] In-Reply-To: <0t94pqtH3N0lElf_A967QusqLWbW-VBDMd5vrN0aT-8=.f1439c86-c6d2-41f7-96d8-116c80bf9a0b@github.com> References: <0t94pqtH3N0lElf_A967QusqLWbW-VBDMd5vrN0aT-8=.f1439c86-c6d2-41f7-96d8-116c80bf9a0b@github.com> Message-ID: On Wed, 11 Dec 2024 12:06:01 GMT, theoweidmannoracle wrote: >> This PR introduces >> - several new optimizations to unsigned division and modulo >> - x
% 1, x % x, x % 2^k >> - x / 1, x / x, x / 2^k >> - does not implement the Granlund and Montgomery algorithm, which has been implemented for signed modulo division in the past. It is unclear if a lot is to be gained by implementing this. >> - tests to test existing optimizations for signed division and modulo >> - does not test the Granlund and Montgomery algorithm directly > > theoweidmannoracle has updated the pull request incrementally with one additional commit since the last revision: > > Update src/hotspot/share/opto/divnode.cpp > > Co-authored-by: Christian Hagedorn @theoweidmannoracle Your change (at version 4f33cd6c6759b2db1a6884a2479c4f75a4171e88) is now ready to be sponsored by a Committer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/22061#issuecomment-2536022612 From roland at openjdk.org Wed Dec 11 13:50:41 2024 From: roland at openjdk.org (Roland Westrelin) Date: Wed, 11 Dec 2024 13:50:41 GMT Subject: RFR: 8343607: C2: Shenandoah crashes during barrier expansion in Continuation::enter In-Reply-To: References: Message-ID: On Tue, 10 Dec 2024 14:47:23 GMT, Aleksey Shipilev wrote: >> If a load barrier is used both in the fallthrough and exception >> handling paths out of a call, it needs to be cloned so each path has >> its copy of the barrier. In the case of the crash, cloning the barrier >> is attempted for a runtime call that doesn't have an exception >> handling path. Fix simply detects that corner case. > > This looks reasonable to me. @JohnTortugo should take a look too. 
@shipilev @JohnTortugo thanks for the reviews ------------- PR Comment: https://git.openjdk.org/jdk/pull/22663#issuecomment-2536037629 From roland at openjdk.org Wed Dec 11 13:50:42 2024 From: roland at openjdk.org (Roland Westrelin) Date: Wed, 11 Dec 2024 13:50:42 GMT Subject: Integrated: 8343607: C2: Shenandoah crashes during barrier expansion in Continuation::enter In-Reply-To: References: Message-ID: On Tue, 10 Dec 2024 14:40:44 GMT, Roland Westrelin wrote: > If a load barrier is used both in the fallthrough and exception > handling paths out of a call, it needs to be cloned so each path has > its copy of the barrier. In the case of the crash, cloning the barrier > is attempted for a runtime call that doesn't have an exception > handling path. Fix simply detects that corner case. This pull request has now been integrated. Changeset: 45c914c3 Author: Roland Westrelin URL: https://git.openjdk.org/jdk/commit/45c914c3ad8fbc406af9ba9dec97f11c28c91299 Stats: 10 lines in 1 file changed: 9 ins; 0 del; 1 mod 8343607: C2: Shenandoah crashes during barrier expansion in Continuation::enter Reviewed-by: shade ------------- PR: https://git.openjdk.org/jdk/pull/22663 From duke at openjdk.org Wed Dec 11 14:19:40 2024 From: duke at openjdk.org (theoweidmannoracle) Date: Wed, 11 Dec 2024 14:19:40 GMT Subject: RFR: 8331717: C2: Crash with SIGFPE Message-ID: Fixes a bug in loop predication where not strictly invariant tests involving divisions or modulo are pulled out of the loop. The bug can be seen in this code: public class Reduced { static int iArr[] = new int[100]; public static void main(String[] strArr) { for (int i = 0; i < 10000; i++) { test(); } } static void test() { int i1 = 0; for (int i4 : iArr) { i4 = i1; try { iArr[0] = 1 / i4; i4 = iArr[2 / i4]; // Source of the crash } catch (ArithmeticException a_e) { } } } } The crucial element is the division `2 / i4`. Since it is used to access an array, it is the input to a range check. 
See node 230 [screenshot omitted]. Loop predication will try to pull this range check together with its input, the division, before the `for` loop. Due to a bug in `Invariance::compute_invariance`, loop predication is allowed to do so, which results in the division being pulled out without its non-zero check. 322 is a clone of 230 placed before the loop head without any zero check for the divisor [screenshot omitted]. To fix this, `Invariance::compute_invariance` must check not only that the node `depends_only_on_test()` but also that it has `no_dependent_zero_check(n)`. Similar past bug, which introduced `no_dependent_zero_check`: https://github.com/openjdk/jdk16/pull/9 ------------- Commit messages: - Fix typo - Update TestLoopPredicationDivZeroCheck2.java - Fix copyright - Describe tests - Add tests - try to fix Changes: https://git.openjdk.org/jdk/pull/22666/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=22666&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8331717 Stats: 124 lines in 3 files changed: 123 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/22666.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22666/head:pull/22666 PR: https://git.openjdk.org/jdk/pull/22666 From duke at openjdk.org Wed Dec 11 14:33:52 2024 From: duke at openjdk.org (theoweidmannoracle) Date: Wed, 11 Dec 2024 14:33:52 GMT Subject: Integrated: 8332268: C2: Add missing optimizations for UDivI/L and UModI/L and unify the shared logic with the signed nodes In-Reply-To: References: Message-ID: <7d6v3b71WnUKLFH5IcrN45b31UJdKrKqyPOCt0oQOow=.032624d2-bc02-4f6e-9be7-4a458f78c0a2@github.com> On Wed, 13 Nov 2024 09:45:37 GMT, theoweidmannoracle wrote: > This PR introduces > - several new optimizations to unsigned division and modulo > - x % 1, x % x, x % 2^k > - x / 1, x / x, x / 2^k > - does not implement the Granlund and Montgomery algorithm, which has been implemented for signed modulo division in the past.
It is unclear if a lot is to be gained by implementing this. > - tests to test existing optimizations for signed division and modulo > - does not test the Granlund and Montgomery algorithm directly This pull request has now been integrated. Changeset: d381d581 Author: theoweidmannoracle URL: https://git.openjdk.org/jdk/commit/d381d581bfc5bbe1db966088ed4cad01b65c5123 Stats: 1137 lines in 17 files changed: 1119 ins; 11 del; 7 mod 8332268: C2: Add missing optimizations for UDivI/L and UModI/L and unify the shared logic with the signed nodes Reviewed-by: chagedorn, thartmann, epeter, qamai ------------- PR: https://git.openjdk.org/jdk/pull/22061 From roland at openjdk.org Wed Dec 11 15:00:34 2024 From: roland at openjdk.org (Roland Westrelin) Date: Wed, 11 Dec 2024 15:00:34 GMT Subject: RFR: 8332827: [REDO] C2: crash in compiled code because of dependency on removed range check CastIIs In-Reply-To: References: Message-ID: On Wed, 11 Dec 2024 07:39:40 GMT, Emanuel Peter wrote: >> The failures that caused the backout were due to a bug in: >> >> `find_or_make_integer_cast()` >> >> which caused the `_range_check_dependency` field's value of the >> existing cast node to not be set in the new cast node. I re-ran some >> testing with this fixed and current jdk repo and found that a few >> vectorization tests fail now because the patch pushes range check >> `CastII` nodes through `AddI`/`SubI`. To fix this, I delayed that >> transformation to after loop opts. > > src/hotspot/share/opto/castnode.cpp line 259: > >> 257: if (_range_check_dependency) { >> 258: if (phase->C->post_loop_opts_phase()) { >> 259: return this->in(1); > > Does the removal of the CastII happen anywhere at all any more now? Yes, during final graph reshape.
> test/hotspot/jtreg/compiler/rangechecks/TestRangeCheckCastIISplitThruPhi.java line 31: > >> 29: * @run main/othervm -XX:-TieredCompilation -XX:-UseOnStackReplacement -XX:-BackgroundCompilation TestRangeCheckCastIISplitThruPhi >> 30: * >> 31: * > > Formatting is a little off. And would it make sense to add a run without flags? I don't see anything wrong with the formatting. What am I missing? > test/hotspot/jtreg/compiler/vectorization/TestVectorizationNegativeScale.java line 27: > >> 25: * @test >> 26: * @bug 8332827 >> 27: * @summary [REDO] C2: crash in compiled code because of dependency on removed range check CastIIs > > Can you say what happened with this test before your fix? Did it not vectorize? Or crash? It's a test case for a failure I ran into while running tests. The root of the failure is some other bug that got fixed in the meantime. But the failure with this particular test case only happens with the change for this PR (and without the bug fix for the other bug). So while the test is not about this PR per se, I thought it was interesting to keep the test anyway. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22568#discussion_r1880352614 PR Review Comment: https://git.openjdk.org/jdk/pull/22568#discussion_r1880347494 PR Review Comment: https://git.openjdk.org/jdk/pull/22568#discussion_r1880351082 From roland at openjdk.org Wed Dec 11 15:06:48 2024 From: roland at openjdk.org (Roland Westrelin) Date: Wed, 11 Dec 2024 15:06:48 GMT Subject: RFR: 8332827: [REDO] C2: crash in compiled code because of dependency on removed range check CastIIs In-Reply-To: References: Message-ID: On Wed, 11 Dec 2024 07:41:33 GMT, Emanuel Peter wrote: >> The failures that caused the backout were due to a bug in: >> >> `find_or_make_integer_cast()` >> >> which caused the `_range_check_dependency` field's value of the >> existing cast node to not be set in the new cast node.
I re-ran some >> testing with this fixed and current jdk repo and found that a few >> vectorization tests fail now because the patch pushes range check >> `CastII` nodes through `AddI`/`SubI`. To fix this, I delayed that >> transformation to after loop opts. > > src/hotspot/share/opto/compile.cpp line 3147: > >> 3145: DivModNode* divmod = DivModNode::make(n, bt, is_unsigned); >> 3146: divmod->add_prec_from(n); >> 3147: divmod->add_prec_from(d); > > Can you explain why you added this? If the divisor input for a `Div` (or `Mod` etc.) is not null, then the control input of the `Div` is set to null. It could be that the divisor input is found not null because the subgraph for that input contains a `CastII`. If that happens, removing the `CastII` during final graph reshape could cause the `Div` to float above the `CastII` and above the condition that allowed the type of the `CastII` to be narrowed. This could cause a crash. So when the `CastII` is removed, precedence edges are added to the `Div` node. If the `Div` is then replaced by a `DivMod`, we need to transfer the precedence edges to the `DivMod` node. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22568#discussion_r1880370799 From mdoerr at openjdk.org Wed Dec 11 15:11:16 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Wed, 11 Dec 2024 15:11:16 GMT Subject: RFR: 8345040: Clean up unused variables and code in `generate_native_wrapper` In-Reply-To: References: Message-ID: <566tfKp7RsFhUd6N6ebd4ZPPzXSCKFEv0H7fZYaX5mM=.501e9c54-73e3-4b10-b404-30bcf6b8eb59@github.com> On Tue, 26 Nov 2024 08:16:56 GMT, Qizheng Xing wrote: > Some of variables and code are related to critical JNI natives feature, which was removed in JDK 18. This patch cleans them up. Thanks for cleaning this up!
------------- PR Comment: https://git.openjdk.org/jdk/pull/22384#issuecomment-2536264781 From qxing at openjdk.org Wed Dec 11 15:15:47 2024 From: qxing at openjdk.org (Qizheng Xing) Date: Wed, 11 Dec 2024 15:15:47 GMT Subject: Integrated: 8345040: Clean up unused variables and code in `generate_native_wrapper` In-Reply-To: References: Message-ID: On Tue, 26 Nov 2024 08:16:56 GMT, Qizheng Xing wrote: > Some of variables and code are related to critical JNI natives feature, which was removed in JDK 18. This patch cleans them up. This pull request has now been integrated. Changeset: cc479184 Author: Qizheng Xing Committer: Martin Doerr URL: https://git.openjdk.org/jdk/commit/cc47918445b3b49fc188d4655996e43e7a3c75c3 Stats: 29 lines in 6 files changed: 0 ins; 29 del; 0 mod 8345040: Clean up unused variables and code in `generate_native_wrapper` Reviewed-by: mli, dfenacci, kvn ------------- PR: https://git.openjdk.org/jdk/pull/22384 From duke at openjdk.org Wed Dec 11 15:28:18 2024 From: duke at openjdk.org (theoweidmannoracle) Date: Wed, 11 Dec 2024 15:28:18 GMT Subject: Integrated: 8346007: Incorrect copyright header in UModLNodeIdealizationTests.java Message-ID: Fixes a missing comma in the copyright notice. 
------------- Commit messages: - Add missing comma Changes: https://git.openjdk.org/jdk/pull/22683/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=22683&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8346007 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/22683.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22683/head:pull/22683 PR: https://git.openjdk.org/jdk/pull/22683 From thartmann at openjdk.org Wed Dec 11 15:28:18 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 11 Dec 2024 15:28:18 GMT Subject: Integrated: 8346007: Incorrect copyright header in UModLNodeIdealizationTests.java In-Reply-To: References: Message-ID: On Wed, 11 Dec 2024 15:02:40 GMT, theoweidmannoracle wrote: > Fixes a missing comma in the copyright notice. Looks good and trivial. Ship it! ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/22683#pullrequestreview-2495954319 From duke at openjdk.org Wed Dec 11 15:28:18 2024 From: duke at openjdk.org (theoweidmannoracle) Date: Wed, 11 Dec 2024 15:28:18 GMT Subject: Integrated: 8346007: Incorrect copyright header in UModLNodeIdealizationTests.java In-Reply-To: References: Message-ID: On Wed, 11 Dec 2024 15:02:40 GMT, theoweidmannoracle wrote: > Fixes a missing comma in the copyright notice. This pull request has now been integrated. 
Changeset: 72c6daf1 Author: theoweidmannoracle Committer: Tobias Hartmann URL: https://git.openjdk.org/jdk/commit/72c6daf1b1073bc1eb9d1b07794c0e8ba5b9b437 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod 8346007: Incorrect copyright header in UModLNodeIdealizationTests.java Reviewed-by: thartmann ------------- PR: https://git.openjdk.org/jdk/pull/22683 From roland at openjdk.org Wed Dec 11 15:36:51 2024 From: roland at openjdk.org (Roland Westrelin) Date: Wed, 11 Dec 2024 15:36:51 GMT Subject: RFR: 8332827: [REDO] C2: crash in compiled code because of dependency on removed range check CastIIs [v2] In-Reply-To: References: Message-ID: > The failures that caused the backout were due to a bug in: > > `find_or_make_integer_cast()` > > which caused the `_range_check_dependency` field's value of the > existing cast node to not be set in the new cast node. I re-ran some > testing with this fixed and current jdk repo and found that a few > vectorization tests fail now because the patch pushes range check > `CastII` nodes through `AddI`/`SubI`. To fix this, I delayed that > transformation to after loop opts. Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: - Merge branch 'master' into JDK-8332827 - review - white spaces - test & fix - Revert "8332829: [BACKOUT] C2: crash in compiled code because of dependency on removed range check CastIIs" This reverts commit c9a7b9772d96d9a4825d9da2aacc277534282860. 
------------- Changes: - all: https://git.openjdk.org/jdk/pull/22568/files - new: https://git.openjdk.org/jdk/pull/22568/files/ea75615a..85a9155a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22568&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22568&range=00-01 Stats: 85871 lines in 1694 files changed: 69383 ins; 11236 del; 5252 mod Patch: https://git.openjdk.org/jdk/pull/22568.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22568/head:pull/22568 PR: https://git.openjdk.org/jdk/pull/22568 From roland at openjdk.org Wed Dec 11 15:36:51 2024 From: roland at openjdk.org (Roland Westrelin) Date: Wed, 11 Dec 2024 15:36:51 GMT Subject: RFR: 8332827: [REDO] C2: crash in compiled code because of dependency on removed range check CastIIs [v2] In-Reply-To: References: Message-ID: On Wed, 11 Dec 2024 07:46:53 GMT, Emanuel Peter wrote: >> Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: >> >> - Merge branch 'master' into JDK-8332827 >> - review >> - white spaces >> - test & fix >> - Revert "8332829: [BACKOUT] C2: crash in compiled code because of dependency on removed range check CastIIs" >> >> This reverts commit c9a7b9772d96d9a4825d9da2aacc277534282860. > > src/hotspot/share/opto/compile.cpp line 3801: > >> 3799: } >> 3800: >> 3801: void Compile::remove_range_check_cast(CastIINode* cast) { > > Why not make this a member function of `CastIINode`? Done in new commit. > test/hotspot/jtreg/compiler/rangechecks/TestArrayAccessAboveRCAfterRCCastIIEliminated.java line 26: > >> 24: /** >> 25: * @test >> 26: * @bug 8324517 > > This bug number does not match the issue. Fixed in new commit. 
> test/hotspot/jtreg/compiler/vectorization/TestVectorizationNegativeScale.java line 30: > >> 28: * >> 29: * @library /test/lib / >> 30: * @requires vm.compiler2.enabled > > Is this required? The IR rules are only executed with C2 anyway. Removed in new commit. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22568#discussion_r1880419021 PR Review Comment: https://git.openjdk.org/jdk/pull/22568#discussion_r1880420660 PR Review Comment: https://git.openjdk.org/jdk/pull/22568#discussion_r1880419642 From roland at openjdk.org Wed Dec 11 15:36:51 2024 From: roland at openjdk.org (Roland Westrelin) Date: Wed, 11 Dec 2024 15:36:51 GMT Subject: RFR: 8332827: [REDO] C2: crash in compiled code because of dependency on removed range check CastIIs [v2] In-Reply-To: References: Message-ID: On Wed, 11 Dec 2024 14:50:50 GMT, Roland Westrelin wrote: >> test/hotspot/jtreg/compiler/rangechecks/TestRangeCheckCastIISplitThruPhi.java line 31: >> >>> 29: * @run main/othervm -XX:-TieredCompilation -XX:-UseOnStackReplacement -XX:-BackgroundCompilation TestRangeCheckCastIISplitThruPhi >>> 30: * >>> 31: * >> >> Formatting is a little off. And would it make sense to add a run without flags? > > I don't see anything wrong with the formatting. What am I missing? Added run without flags in new commit. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22568#discussion_r1880420108 From qamai at openjdk.org Wed Dec 11 16:02:19 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Wed, 11 Dec 2024 16:02:19 GMT Subject: RFR: 8342651: Refactor array constant to use an array of jbyte [v5] In-Reply-To: References: Message-ID: On Fri, 6 Dec 2024 11:54:25 GMT, Quan Anh Mai wrote: >> Hi, >> >> This small patch refactors array constants in C2 to use an array of `jbyte`s instead of an array of `jvalue`. The former is much easier to work with and we can do `memcpy` with them trivially. 
>> >> Since code buffers support alignment of the constant section, I have also allowed constant tables to be aligned more than 8 bytes and used it for constant vectors on machines not supporting `SSE3`. I also fixed an issue with code buffer relocation where the temporary buffer is not correctly aligned. >> >> This patch is extracted from https://github.com/openjdk/jdk/pull/21229. Tests passed with `UseSSE=2` where 16-byte constants would be generated, as well as normal testing routines. >> >> Please take a look and leave your reviews, thanks a lot. > > Quan Anh Mai has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains seven additional commits since the last revision: > > - Merge branch 'master' into constanttable > - add comment to ConstantTable::alignment > - Merge branch 'master' into constanttable > - indentation > - Merge branch 'master' into constanttable > - Merge branch 'master' into constanttable > - refactor array constant, fix codebuffer reallocation Thanks very much for your reviews! ------------- PR Comment: https://git.openjdk.org/jdk/pull/21596#issuecomment-2536397198 From qamai at openjdk.org Wed Dec 11 16:02:20 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Wed, 11 Dec 2024 16:02:20 GMT Subject: Integrated: 8342651: Refactor array constant to use an array of jbyte In-Reply-To: References: Message-ID: On Sun, 20 Oct 2024 11:39:24 GMT, Quan Anh Mai wrote: > Hi, > > This small patch refactors array constants in C2 to use an array of `jbyte`s instead of an array of `jvalue`. The former is much easier to work with and we can do `memcpy` with them trivially. > > Since code buffers support alignment of the constant section, I have also allowed constant tables to be aligned more than 8 bytes and used it for constant vectors on machines not supporting `SSE3`. 
I also fixed an issue with code buffer relocation where the temporary buffer is not correctly aligned. > > This patch is extracted from https://github.com/openjdk/jdk/pull/21229. Tests passed with `UseSSE=2` where 16-byte constants would be generated, as well as normal testing routines. > > Please take a look and leave your reviews, thanks a lot. This pull request has now been integrated. Changeset: 2c4567a6 Author: Quan Anh Mai URL: https://git.openjdk.org/jdk/commit/2c4567a689091721476b6ef0ef4ad042fd63c3fd Stats: 176 lines in 8 files changed: 77 ins; 44 del; 55 mod 8342651: Refactor array constant to use an array of jbyte Reviewed-by: thartmann, kvn ------------- PR: https://git.openjdk.org/jdk/pull/21596 From qamai at openjdk.org Wed Dec 11 16:10:17 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Wed, 11 Dec 2024 16:10:17 GMT Subject: RFR: 8342662: C2: Add new phase for backend-specific lowering [v3] In-Reply-To: <3z-tAI6f1fUGXLgIwNylfaUB1DdNaQ6ZM-n2WZ_V9Ak=.61a402ae-7234-4a0d-ab0e-e29866522988@github.com> References: <3z-tAI6f1fUGXLgIwNylfaUB1DdNaQ6ZM-n2WZ_V9Ak=.61a402ae-7234-4a0d-ab0e-e29866522988@github.com> Message-ID: On Wed, 11 Dec 2024 04:51:48 GMT, Jasmine Karthikeyan wrote: >> Hi @jaskarth , >> >> I was trying to lower LShiftVB/URShiftVB IR to LShiftVS/URShiftVS for the x86 backend. I intend to factor out through GVN upfront byte vector to short vector conversion for both input and shift vectors if these are shared across two operations since the x86 ISA does not support direct byte vector shifts. >> >> To begin with, I made the following diff expecting status quo, but getting the following Fatal error at build time, can you kindly check? 
>> >> >> diff --git a/src/hotspot/cpu/x86/c2_lowering_x86.cpp b/src/hotspot/cpu/x86/c2_lowering_x86.cpp >> index cf4c014ffda..bc8df186396 100644 >> --- a/src/hotspot/cpu/x86/c2_lowering_x86.cpp >> +++ b/src/hotspot/cpu/x86/c2_lowering_x86.cpp >> @@ -32,6 +32,6 @@ Node* PhaseLowering::lower_node_platform(Node* n) { >> } >> >> bool PhaseLowering::should_lower() { >> - return false; >> + return true; >> } >> #endif // COMPILER2 >> ``` >> >> >> ERROR: Build failed for target 'images' in configuration 'linux-x86_64-server-fastdebug' (exit code 2) >> >> === Output from failing command(s) repeated here === >> * For target support_interim-image-jlink__jlink_interim_image_exec: >> # >> # A fatal error has been detected by the Java Runtime Environment: >> # >> # Internal Error (/home/jatinbha/sandboxes/jdk-trunk/jdk/src/hotspot/share/opto/node.hpp:960), pid=1961256, tid=1961293 >> # assert(is_MachReturn()) failed: invalid node class: Con >> # >> # JRE version: OpenJDK Runtime Environment (24.0) (fastdebug build 24-internal-adhoc.root.jdk) >> # Java VM: OpenJDK 64-Bit Server VM (fastdebug 24-internal-adhoc.root.jdk, mixed mode, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64) >> # Problematic frame: >> # V [libjvm.so+0x140f939] Matcher::Fixup_Save_On_Entry()+0x279 >> # >> # Core dump will be written. Default location: Core dumps may be processed with "/usr/share/apport/apport %p %s %c %d %P %E" (or dumping to /home/jatinbha/sandboxes/jdk-trunk/jdk/make/core.1961256) >> # >> # An error report file with more information is saved as: >> # /home/jatinbha/sandboxes/jdk-trunk/jdk/make/hs_err_pid1961256.log >> ... (rest of output omitted) >> >> * All command lines available in /home/jatinbha/sandboxes/jdk-trunk/jdk/build/linux-x86_64-server-fastdebug/make-support/failure-logs. > > @jatin-bhateja Thanks for the bug report! I was able to reproduce it on my system as well. 
I apologize for the late response, I had been busy with my studies and only had a chance to look recently. It looks like this bug is caused because we run `Identity()` on all of the nodes, which finds new optimizations by replacing `LoadNode`s with a non-load identity. This causes an uncommon trap branch to be optimized out and the `HaltNode` replaced with a TOP `ConNode`. Usually these are removed by `RootNode::Ideal`, but since we don't do regular `Ideal` in lowering it ends up running into an assert later. > > This is interesting because it suggests that there are potentially calls to `Identity()` that aren't run in earlier IGVN rounds. I didn't take a very in-depth look but it might be good to investigate separately. Because of that I'm not sure if we can rely on `Identity()` here, especially if there are identity transforms that rely on other `Ideal` calls. @merykitty do you have any thoughts on the best way to fix this? > > @eme64 Ah, do you mean the recursive reduction op generation based on vector length? I think that could be handled here as well by splitting the reduction operation into multiple backend-specific nodes. It would still need to be implemented per-backend, but there could be benefits with optimizing more complex reductions, as you mention. @jaskarth That's really interesting, it indeed seems like a missed transformation to me. I think adding an IGVN verification where we verify that no other transformation can take place after an IGVN transformation would catch this, what do you think @eme64 ? For this PR, my optimalist side tells me that you can try repeatedly applying IGVN until there is no further progress. It may be cheap since there should be few to none transformations left. If that sounds inelegant a possible approach is to refactor the `Identity` call into an `apply_identity` similar to `apply_ideal` where you can override it with a method that always returns the current node until the missed transformation is fixed. 
IMO in principle, there should not be cases where we legitimately depend on `Ideal` to be invoked after `Identity` like this case. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21599#issuecomment-2536418534 From paul.sandoz at oracle.com Wed Dec 11 18:29:30 2024 From: paul.sandoz at oracle.com (Paul Sandoz) Date: Wed, 11 Dec 2024 18:29:30 +0000 Subject: RFC: Untangle native libraries and the JVM: SVML, SLEEF, and libsimdsort In-Reply-To: <24117ca6-abf9-43a0-933c-00c3333097e4@oracle.com> References: <1bf041a1-1002-4d18-85b3-15f437fa534e@oracle.com> <5EA2B22F-0BFD-49E6-A20B-0F91ED94FF8C@oracle.com> <24117ca6-abf9-43a0-933c-00c3333097e4@oracle.com> Message-ID: <96DE2A6E-1CA2-41BD-B346-C4D7D1328031@oracle.com> > On Dec 10, 2024, at 4:54 PM, Vladimir Ivanov wrote: > > > > On 12/9/24 07:55, Paul Sandoz wrote: >> Some further observations. >> - This arguably makes it harder for the auto-vectorizer to access the SVML/SLEEF functionality. However, in some cases, we cannot guarantee the same guarantees (IIRC mainly around monotonicity) as the scalar operations in Math. > > I'm not too optimistic about auto-vectorization unless the very same stubs are shared between scalar and vectorized code. Our previous experience with FP operations strongly indicates that users expect FP operations to give reproducible results (bitwise equivalent) across the same run. > > Moreover, migration to FFI enables usage of SVML/SLEEF across all execution modes which should make it easier to reason about Vector API usages. > Agreed. >> - There is an open bug to adjust the simd sort behavior on AMD zen 4 cores due to poor performance of an AVX 512 instruction. The simplest solution is to fall back to AVX2. That may be simpler to manage in Java? (I was looking at the HotSpot code). > > For now, the patch guards AVX512 entries with VM.isIntelCPU() check.
In order to distinguish between AMD Zen 4 and 5, either a new platform-sensing check is needed or reimplementation of x86-specific platform sensing in Java on top of CPUID info. > Probably best just to update as required. (Also I don't see anything in the HotSpot code to determine the AMD zen core version.) Any general CPU vendor/model solution seems a little more challenging than that of surfacing up the CPU feature set as a string. Note that the System.getProperties() surfaces up "os.arch". A more general solution could add further properties for the CPU? Paul. From qamai at openjdk.org Wed Dec 11 18:35:18 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Wed, 11 Dec 2024 18:35:18 GMT Subject: RFR: 8342692: C2: long counted loop/long range checks: don't create loop-nest for short running loops [v6] In-Reply-To: References: Message-ID: On Wed, 4 Dec 2024 15:35:44 GMT, Roland Westrelin wrote: >> To optimize a long counted loop and long range checks in a long or int >> counted loop, the loop is turned into a loop nest. When the loop has >> few iterations, the overhead of having an outer loop whose backedge is >> never taken, has a measurable cost. Furthermore, creating the loop >> nest usually causes one iteration of the loop to be peeled so >> predicates can be set up. If the loop is short running, then it's an >> extra iteration that's run with range checks (compared to an int >> counted loop with int range checks). >> >> This change doesn't create a loop nest when: >> >> 1- it can be determined statically at loop nest creation time that the >> loop runs for a short enough number of iterations >> >> 2- profiling reports that the loop runs for no more than ShortLoopIter >> iterations (1000 by default). >> >> For 2-, a guard is added which is implemented as yet another predicate.
>> >> While this change is in principle simple, I ran into a few >> implementation issues: >> >> - while c2 has a way to compute the number of iterations of an int >> counted loop, it doesn't have that for long counted loops. The >> existing logic for int counted loops promotes values to long to >> avoid overflows. I reworked it so it now works for both long and int >> counted loops. >> >> - I added a new deoptimization reason (Reason_short_running_loop) for >> the new predicate. Given the number of iterations is narrowed down >> by the predicate, the limit of the loop after transformation is a >> cast node that's control dependent on the short running loop >> predicate. Because once the counted loop is transformed, it is >> likely that range check predicates will be inserted and they will >> depend on the limit, the short running loop predicate has to be the >> one that's further away from the loop entry. Now it is also possible >> that the limit before transformation depends on a predicate >> (TestShortRunningLongCountedLoopPredicatesClone is an example), we >> can have: new predicates inserted after the transformation that >> depend on the casted limit that itself depends on old predicates >> added before the transformation. To solve this circular dependency, >> parse and assert predicates are cloned between the old predicates >> and the loop head. The cloned short running loop parse predicate is >> the one that's used to insert the short running loop predicate. >> >> - In the case of a long counted loop, the loop is transformed into a >> regular loop with a ... > > Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: > > Update src/hotspot/share/opto/loopTransform.cpp > > Co-authored-by: Emanuel Peter I'm worried this solution will be susceptible to profile pollution, i.e.
if a loop is called from 2 places, one with a large trip count and one with a small trip count, then the caller with a small trip count will suffer. As a result, in addition to this, making loop nests with 1 iteration of the outer loop cheaper will also be necessary. Profile pollution seems to be an issue that leads to [JDK-8307084](https://bugs.openjdk.org/browse/JDK-8307084), too. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21630#issuecomment-2536822711 From kvn at openjdk.org Wed Dec 11 19:27:12 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 11 Dec 2024 19:27:12 GMT Subject: RFR: 8331717: C2: Crash with SIGFPE Because Loop Predication Wrongly Hoists Division Requiring Zero Check In-Reply-To: References: Message-ID: On Tue, 10 Dec 2024 16:09:26 GMT, theoweidmannoracle wrote: > Fixes a bug in loop predication where not strictly invariant tests involving divisions or modulo are pulled out of the loop. > > The bug can be seen in this code: > > > public class Reduced { > static int iArr[] = new int[100]; > > public static void main(String[] strArr) { > for (int i = 0; i < 10000; i++) { > test(); > } > } > > static void test() { > int i1 = 0; > > for (int i4 : iArr) { > i4 = i1; > try { > iArr[0] = 1 / i4; > i4 = iArr[2 / i4]; // Source of the crash > } catch (ArithmeticException a_e) { > } > } > } > } > > > The crucial element is the division `2 / i4`. Since it is used to access an array, it is the input to a range check. See node 230: > Screenshot 2024-12-11 at 15 14 47 > > Loop predication will try to pull this range check together with its input, the division, before the `for` loop. Due to a bug in Invariance::compute_invariance loop predication is allowed to do so, which results in the division being pulled out without its non-zero check. 
322 is a clone of 230 placed before the loop head without any zero check for the divisor: > > Screenshot 2024-12-11 at 15 11 48 > > To fix this, Invariance::compute_invariance must check that the node not only `depends_only_on_test()` but also that it has `no_dependent_zero_check(n)`. > > Similar past bug, which introduced `no_dependent_zero_check`: https://github.com/openjdk/jdk16/pull/9 Looks good for this issue. Would be interesting if we can collapse such graph by propagating `I1 == 0` through `i4` into zero check. As separate RFE. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/22666#pullrequestreview-2496601515 PR Comment: https://git.openjdk.org/jdk/pull/22666#issuecomment-2536907055 From dholmes at openjdk.org Wed Dec 11 20:30:42 2024 From: dholmes at openjdk.org (David Holmes) Date: Wed, 11 Dec 2024 20:30:42 GMT Subject: RFR: 8345609: [C1] LIR Operations with one input should be implemented as LIR_Op1 [v2] In-Reply-To: References: Message-ID: On Thu, 5 Dec 2024 21:29:51 GMT, Martin Doerr wrote: >> Change `lir_sqrt`, `lir_abs`, `lir_neg`, `lir_f2hf`, `lir_hf2f` to use `LIR_Op1`. Extend `LIR_Op1` to support one temp register operand. Also see JBS issue. >> >> Remove `lir_tan` and `lir_log10` which are unused (dead and incomplete code). They should have been removed with or after https://github.com/openjdk/jdk/commit/ad79a5ae65d24861ead3ae96cf148c8bc0f02736 and the corresponding changes on other platforms. >> >> The removal of the unnecessary float constant loads improves performance: >> make run-test TEST="micro:Fp16ConversionBenchmark" MICRO="VM_OPTIONS=-XX:TieredStopAtLevel=1" >> >> Power 10 without patch: >> >> Benchmark (size) Mode Cnt Score Error Units >> Fp16ConversionBenchmark.floatToFloat16 2048 thrpt 15 247.064 ± 0.189 ops/ms >> >> >> Power 10 with patch: >> >> Benchmark (size) Mode Cnt Score Error Units >> Fp16ConversionBenchmark.floatToFloat16 2048 thrpt 15 308.372 ±
0.432 ops/ms >> >> >> x64 machine without patch: >> >> Benchmark (size) Mode Cnt Score Error Units >> Fp16ConversionBenchmark.floatToFloat16 2048 thrpt 15 384.565 ± 3.758 ops/ms >> >> >> x64 machine with patch: >> >> Benchmark (size) Mode Cnt Score Error Units >> Fp16ConversionBenchmark.floatToFloat16 2048 thrpt 15 406.121 ± 3.228 ops/ms >> >> >> Testing: tier1-4 on x64 (Windows, linux, MacOS), aarch64 (linux, MacOS), ppc64 (linux, AIX) > > Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: > > Update Copyright headers. We have a large number of failures in our tier 2 (so far) CI after this was integrated, all with issues in the JIT. There is a crash in LIR_Assembler::negate `# assert(regs[i] != regs[j]) failed: regs[0] and regs[1] are both: xmm2` There are FP failures of different kinds `assertEquals expected: 1.401298464324817E-45 but was: 1.727233711018889E-77` `RuntimeException: pow(+Infinity, 0.5), expected: Infinity, actual: Infinity` Sorry but this change will be backed out. ------------- PR Comment: https://git.openjdk.org/jdk/pull/22582#issuecomment-2537056214 From dholmes at openjdk.org Wed Dec 11 20:48:22 2024 From: dholmes at openjdk.org (David Holmes) Date: Wed, 11 Dec 2024 20:48:22 GMT Subject: RFR: 8346039: [BACKOUT] - [C1] LIR Operations with one input should be implemented as LIR_Op1 Message-ID: Revert "8345609: [C1] LIR Operations with one input should be implemented as LIR_Op1" This reverts commit a21d21f4d7b74e21f68b6bf9c5dc9ba7d3f9963c.
Testing tiers 1-3 in progress Thanks ------------- Commit messages: - Revert "8345609: [C1] LIR Operations with one input should be implemented as LIR_Op1" Changes: https://git.openjdk.org/jdk/pull/22690/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=22690&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8346039 Stats: 158 lines in 10 files changed: 89 ins; 59 del; 10 mod Patch: https://git.openjdk.org/jdk/pull/22690.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22690/head:pull/22690 PR: https://git.openjdk.org/jdk/pull/22690 From rcastanedalo at openjdk.org Wed Dec 11 20:48:35 2024 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Wed, 11 Dec 2024 20:48:35 GMT Subject: RFR: 8345067: C2: enable implicit null checks for ZGC reads Message-ID: Currently, C2 cannot exploit late-expanded GC memory accesses as implicit null checks because of their use of temporary operands, which prevents `PhaseCFG::implicit_null_check` from [hoisting the memory accesses to the test basic block](https://github.com/openjdk/jdk/blob/f88c1c6ff86b8f29a71647e46136b6432bb67619/src/hotspot/share/opto/lcm.cpp#L319-L335). This changeset extends the scope of the implicit null check optimization so that it can exploit ZGC object loads. 
It introduces a platform-dependent predicate (`MachNode::has_initial_implicit_null_check_candidate`) to mark late-expanded instructions that emit a suitable memory access as a first instruction as candidates, and extends the optimization to recognize and hoist candidate memory accesses that use temporary operands: ![example](https://github.com/user-attachments/assets/b5f9bbc8-d75d-4cf3-841e-73db3dbae753) Exploiting ZGC loads increases the effectiveness of the implicit null check optimization (measured in percent of explicit null checks turned into implicit ones at compile time) by around 10% in the DaCapo chopin benchmarks: ![C2-inc-hit-rate-jdk-25+1-vs-jdk-25+1-with-8345067](https://github.com/user-attachments/assets/8d114058-c6b2-4254-a374-0d0b220af718) The larger number of implicit null checks results in slight performance improvements (in the 1-2% range) in a few DaCapo and SPECjvm2008 benchmarks and an overall slight improvement across Renaissance benchmarks. A further extension of the optimization to arbitrary memory access instructions (including e.g. G1 object stores, which emit multiple memory accesses at arbitrary address offsets) will be investigated separately as part of [JDK-8344627](https://bugs.openjdk.org/browse/JDK-8344627). #### Testing - tier1-5, compiler stress test (linux-x64, macosx-x64, windows-x64, linux-aarch64, macosx-aarch64; release and debug mode). ------------- Commit messages: - Revert unnecessary changes - Move check to original location - Enable zLoadP as implicit null check candidates on riscv and ppc - Refactor assertion - Simplify test - Mark zLoadP in x64 as exploitable by implicit null check optimization - Fix comment - Do not mark g1LoadP/g1LoadN as initial_implicit_null_check_candidate, they cannot be exploited anyway due to indirect memory operand - Exploit zLoadP only if the memory operand is indOffL8 (indirect does not work anyway due to limitations in C2's analysis) - Complete test with stores and atomics - ... 
and 10 more: https://git.openjdk.org/jdk/compare/bedb68ab...01dd8618 Changes: https://git.openjdk.org/jdk/pull/22678/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=22678&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8345067 Stats: 381 lines in 15 files changed: 336 ins; 37 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/22678.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22678/head:pull/22678 PR: https://git.openjdk.org/jdk/pull/22678 From kvn at openjdk.org Wed Dec 11 20:53:36 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 11 Dec 2024 20:53:36 GMT Subject: RFR: 8346039: [BACKOUT] - [C1] LIR Operations with one input should be implemented as LIR_Op1 In-Reply-To: References: Message-ID: On Wed, 11 Dec 2024 20:41:29 GMT, David Holmes wrote: > Revert "8345609: [C1] LIR Operations with one input should be implemented as LIR_Op1" > > This reverts commit a21d21f4d7b74e21f68b6bf9c5dc9ba7d3f9963c. > > Testing tiers 1-3 in progress > > Thanks Good. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/22690#pullrequestreview-2496925051 From dholmes at openjdk.org Wed Dec 11 20:53:37 2024 From: dholmes at openjdk.org (David Holmes) Date: Wed, 11 Dec 2024 20:53:37 GMT Subject: RFR: 8346039: [BACKOUT] - [C1] LIR Operations with one input should be implemented as LIR_Op1 In-Reply-To: References: Message-ID: <3g4fs31KD_2UgsUsbX8JLYjQtTjUxH9JrGRYP1s1m28=.a2333125-2479-487d-a25c-3c4fe69693d6@github.com> On Wed, 11 Dec 2024 20:49:50 GMT, Vladimir Kozlov wrote: >> Revert "8345609: [C1] LIR Operations with one input should be implemented as LIR_Op1" >> >> This reverts commit a21d21f4d7b74e21f68b6bf9c5dc9ba7d3f9963c. >> >> Testing tiers 1-3 in progress >> >> Thanks > > Good. Thanks for the review @vnkozlov ! Just awaiting testing to complete. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/22690#issuecomment-2537134638 From mdoerr at openjdk.org Wed Dec 11 21:25:36 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Wed, 11 Dec 2024 21:25:36 GMT Subject: RFR: 8346039: [BACKOUT] - [C1] LIR Operations with one input should be implemented as LIR_Op1 In-Reply-To: References: Message-ID: On Wed, 11 Dec 2024 20:41:29 GMT, David Holmes wrote: > Revert "8345609: [C1] LIR Operations with one input should be implemented as LIR_Op1" > > This reverts commit a21d21f4d7b74e21f68b6bf9c5dc9ba7d3f9963c. > > Testing tiers 1-3 in progress > > Thanks Backout looks correct. I wonder on what kind of machine the issue showed up. Was it 32 bit? Tier 2 has passed on our machines. ------------- Marked as reviewed by mdoerr (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/22690#pullrequestreview-2497038112 From dholmes at openjdk.org Wed Dec 11 21:35:37 2024 From: dholmes at openjdk.org (David Holmes) Date: Wed, 11 Dec 2024 21:35:37 GMT Subject: RFR: 8346039: [BACKOUT] - [C1] LIR Operations with one input should be implemented as LIR_Op1 In-Reply-To: References: Message-ID: On Wed, 11 Dec 2024 21:23:04 GMT, Martin Doerr wrote: >> Revert "8345609: [C1] LIR Operations with one input should be implemented as LIR_Op1" >> >> This reverts commit a21d21f4d7b74e21f68b6bf9c5dc9ba7d3f9963c. >> >> Testing tiers 1-3 in progress >> >> Thanks > > Backout looks correct. I wonder on what kind of machine the issue showed up. Was it 32 bit? Tier 2 has passed on our machines. @TheRealMDoerr Windows -x64 and Linux-x64. It is intermittent (I guess random register content might cause that). Thanks for the review. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/22690#issuecomment-2537238648 From dholmes at openjdk.org Wed Dec 11 21:41:37 2024 From: dholmes at openjdk.org (David Holmes) Date: Wed, 11 Dec 2024 21:41:37 GMT Subject: RFR: 8346039: [BACKOUT] - [C1] LIR Operations with one input should be implemented as LIR_Op1 In-Reply-To: References: Message-ID: On Wed, 11 Dec 2024 21:23:04 GMT, Martin Doerr wrote: >> Revert "8345609: [C1] LIR Operations with one input should be implemented as LIR_Op1" >> >> This reverts commit a21d21f4d7b74e21f68b6bf9c5dc9ba7d3f9963c. >> >> Testing tiers 1-3 in progress >> >> Thanks > > Backout looks correct. I wonder on what kind of machine the issue showed up. Was it 32 bit? Tier 2 has passed on our machines. @TheRealMDoerr I've added some failing test info to [JDK-8346038](https://bugs.openjdk.org/browse/JDK-8346038) ------------- PR Comment: https://git.openjdk.org/jdk/pull/22690#issuecomment-2537244540 From mdoerr at openjdk.org Wed Dec 11 21:41:38 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Wed, 11 Dec 2024 21:41:38 GMT Subject: RFR: 8346039: [BACKOUT] - [C1] LIR Operations with one input should be implemented as LIR_Op1 In-Reply-To: References: Message-ID: On Wed, 11 Dec 2024 20:41:29 GMT, David Holmes wrote: > Revert "8345609: [C1] LIR Operations with one input should be implemented as LIR_Op1" > > This reverts commit a21d21f4d7b74e21f68b6bf9c5dc9ba7d3f9963c. > > Testing tiers 1-3 in progress > > Thanks It must be machine dependent. I guess `UseAVX > 2 && !VM_Version::supports_avx512vl()` is the case which is not hit by our machines. Could you confirm that? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/22690#issuecomment-2537247138 From dholmes at openjdk.org Wed Dec 11 22:02:44 2024 From: dholmes at openjdk.org (David Holmes) Date: Wed, 11 Dec 2024 22:02:44 GMT Subject: RFR: 8346039: [BACKOUT] - [C1] LIR Operations with one input should be implemented as LIR_Op1 In-Reply-To: References: Message-ID: On Wed, 11 Dec 2024 20:41:29 GMT, David Holmes wrote: > Revert "8345609: [C1] LIR Operations with one input should be implemented as LIR_Op1" > > This reverts commit a21d21f4d7b74e21f68b6bf9c5dc9ba7d3f9963c. > > Testing tiers 1-3 in progress > > Thanks Yes the failing runs have `-XX:UseAVX=3 -XX:+UnlockDiagnosticVMOptions -XX:+UseKNLSetting` ------------- PR Comment: https://git.openjdk.org/jdk/pull/22690#issuecomment-2537283528 From mdoerr at openjdk.org Wed Dec 11 22:30:11 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Wed, 11 Dec 2024 22:30:11 GMT Subject: RFR: 8346039: [BACKOUT] - [C1] LIR Operations with one input should be implemented as LIR_Op1 In-Reply-To: References: Message-ID: On Wed, 11 Dec 2024 20:41:29 GMT, David Holmes wrote: > Revert "8345609: [C1] LIR Operations with one input should be implemented as LIR_Op1" > > This reverts commit a21d21f4d7b74e21f68b6bf9c5dc9ba7d3f9963c. > > Testing tiers 1-3 in progress > > Thanks Thanks for confirming! @vnkozlov: Regarding the redo: Do you think we can remove the separate case https://github.com/openjdk/jdk/blob/64fad1c7d374bbc635bad3b1fa7941379f39565f/src/hotspot/cpu/x86/c1_LIRAssembler_x86.cpp#L3802 and fall back to `xorps` in C1? That would simplify the code a lot and avoid an extra temp register and the preloading with `LIR_OprFact::floatConst(-0.0)`? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/22690#issuecomment-2537305177 From vladimir.x.ivanov at oracle.com Wed Dec 11 22:39:08 2024 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Wed, 11 Dec 2024 14:39:08 -0800 Subject: RFC: Untangle native libraries and the JVM: SVML, SLEEF, and libsimdsort In-Reply-To: <96DE2A6E-1CA2-41BD-B346-C4D7D1328031@oracle.com> References: <1bf041a1-1002-4d18-85b3-15f437fa534e@oracle.com> <5EA2B22F-0BFD-49E6-A20B-0F91ED94FF8C@oracle.com> <24117ca6-abf9-43a0-933c-00c3333097e4@oracle.com> <96DE2A6E-1CA2-41BD-B346-C4D7D1328031@oracle.com> Message-ID: <266bb453-f50b-4f0f-8199-ae749f1906a1@oracle.com> >>> - There is an open bug to adjust the simd sort behavior on AMD zen 4 cores due to poor performance of an AVX 512 instruction. The simplest solution is to fall back to AVX2. That may be simpler to manage in Java? (I was looking at the HotSpot code). >> >> For now, the patch guards AVX512 entries with VM.isIntelCPU() check. In order to distinguish between AMD Zen 4 and 5, either a new platform-sensing check is needed or reimplementation of x86-specific platform sensing in Java on top of CPUID info. >> > > Probably best just to update as required. (Also I don't see anything in the HotSpot code to determine the AMD zen core version.) Yes, it's not implemented yet. But there are enough examples for Intel chips to see what it will look like (e.g., is_intel_skylake() or is_knights_family()). > Any general CPU vendor/model solution seems a little more challenging than that of surfacing up the CPU feature set as a string. Note that the System.getProperties() surfaces up "os.arch". A more general solution could add further properties for the CPU? Personally, I'm not a fan of "stringy" APIs. It may look convenient at first, but it suffers from deficiencies of both approaches. I would prefer to use a strongly typed platform-specific API instead (an equivalent of VM_Version class in hotspot).
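To make the "stringy" versus strongly typed contrast concrete, here is a minimal Java sketch. Everything in it is hypothetical and only for illustration: `CpuFeatureSketch`, `X86Feature`, `X86Cpu`, and `pickSortKernel` are not JDK or HotSpot APIs, and the kernel names are made up. The only facts assumed are the CPUID vendor strings `GenuineIntel` and `AuthenticAMD`, and the guard described above (AVX-512 entries only on Intel CPUs, AVX2 as the fallback).

```java
import java.util.EnumSet;
import java.util.Set;

public class CpuFeatureSketch {
    // Hypothetical feature enum; the constants are illustrative, not a JDK API.
    public enum X86Feature { AVX2, AVX512F, AVX512VL }

    // "Stringy" style: every caller parses a comma-separated feature string.
    // A misspelled feature name is not rejected, it just yields false.
    public static boolean hasFeatureStringy(String features, String name) {
        for (String f : features.split(",")) {
            if (f.trim().equalsIgnoreCase(name)) {
                return true;
            }
        }
        return false;
    }

    // Typed style: vendor and features are modeled explicitly, so a
    // misspelled feature is a compile-time error instead of a silent false.
    public record X86Cpu(String vendor, Set<X86Feature> features) {
        public boolean isIntel() { return "GenuineIntel".equals(vendor); }
        public boolean supports(X86Feature f) { return features.contains(f); }
    }

    // Mirrors the guard discussed in the thread: use the AVX-512 entry only
    // on Intel CPUs, otherwise fall back to AVX2, otherwise to scalar code.
    public static String pickSortKernel(X86Cpu cpu) {
        if (cpu.isIntel() && cpu.supports(X86Feature.AVX512F)) {
            return "avx512";
        }
        if (cpu.supports(X86Feature.AVX2)) {
            return "avx2";
        }
        return "scalar";
    }

    public static void main(String[] args) {
        X86Cpu intel = new X86Cpu("GenuineIntel",
                EnumSet.of(X86Feature.AVX2, X86Feature.AVX512F));
        X86Cpu amd = new X86Cpu("AuthenticAMD",
                EnumSet.of(X86Feature.AVX2, X86Feature.AVX512F));
        System.out.println(pickSortKernel(intel));
        System.out.println(pickSortKernel(amd));
        // The typo "avx512" (instead of "avx512f") goes unnoticed here.
        System.out.println(hasFeatureStringy("avx2,avx512f", "avx512"));
    }
}
```

Distinguishing individual core generations (for example AMD Zen 4 from Zen 5) would need extra model information on the typed record; that part is deliberately left out because the thread notes it is not implemented yet.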
Best regards, Vladimir Ivanov From kvn at openjdk.org Wed Dec 11 23:13:34 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 11 Dec 2024 23:13:34 GMT Subject: RFR: 8346039: [BACKOUT] - [C1] LIR Operations with one input should be implemented as LIR_Op1 In-Reply-To: References: Message-ID: On Wed, 11 Dec 2024 22:11:52 GMT, Martin Doerr wrote: >> Revert "8345609: [C1] LIR Operations with one input should be implemented as LIR_Op1" >> >> This reverts commit a21d21f4d7b74e21f68b6bf9c5dc9ba7d3f9963c. >> >> Testing tiers 1-3 in progress >> >> Thanks > > Thanks for confirming! > @vnkozlov: Regarding the redo: Do you think we can remove the separate case https://github.com/openjdk/jdk/blob/64fad1c7d374bbc635bad3b1fa7941379f39565f/src/hotspot/cpu/x86/c1_LIRAssembler_x86.cpp#L3802 and fall back to `xorps` in C1? That would simplify the code a lot and avoid an extra temp register and the preloading with `LIR_OprFact::floatConst(-0.0)`? @TheRealMDoerr I don't have a simple answer to your question. These checks were added after failures during testing of the changes for [JDK-8210764](https://bugs.openjdk.org/browse/JDK-8210764) [diffs](https://github.com/openjdk/jdk/commit/092fe55fb10a979534cfd69eaa35546f3af11677#diff-144639248eb0a12e15e1e8d36dae1fdd5d6ad6948610f283c7c26a807b431a6eL3768) Here is the review of these changes: https://mail.openjdk.org/pipermail/hotspot-compiler-dev/2018-September/030629.html The issue could be that we should avoid mixing encoding modes for AVX instructions (when C1 and C2 compiled methods call each other). Maybe we should wait until @sviswa7 is back from vacation and get her answer to your question. If it is not about mixing modes, we can accept small performance issues, since C1-generated code will be replaced with C2-generated code.
------------- PR Comment: https://git.openjdk.org/jdk/pull/22690#issuecomment-2537389097 From dholmes at openjdk.org Thu Dec 12 00:06:40 2024 From: dholmes at openjdk.org (David Holmes) Date: Thu, 12 Dec 2024 00:06:40 GMT Subject: Integrated: 8346039: [BACKOUT] - [C1] LIR Operations with one input should be implemented as LIR_Op1 In-Reply-To: References: Message-ID: On Wed, 11 Dec 2024 20:41:29 GMT, David Holmes wrote: > Revert "8345609: [C1] LIR Operations with one input should be implemented as LIR_Op1" > > This reverts commit a21d21f4d7b74e21f68b6bf9c5dc9ba7d3f9963c. > > Testing tiers 1-3 in progress > > Thanks This pull request has now been integrated. Changeset: ec219ae5 Author: David Holmes URL: https://git.openjdk.org/jdk/commit/ec219ae56f7b3037375bae221861007ccbf2ce0d Stats: 158 lines in 10 files changed: 89 ins; 59 del; 10 mod 8346039: [BACKOUT] - [C1] LIR Operations with one input should be implemented as LIR_Op1 Reviewed-by: kvn, mdoerr ------------- PR: https://git.openjdk.org/jdk/pull/22690 From qamai at openjdk.org Thu Dec 12 02:55:44 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Thu, 12 Dec 2024 02:55:44 GMT Subject: RFR: 8346039: [BACKOUT] - [C1] LIR Operations with one input should be implemented as LIR_Op1 In-Reply-To: References: Message-ID: On Wed, 11 Dec 2024 22:11:52 GMT, Martin Doerr wrote: >> Revert "8345609: [C1] LIR Operations with one input should be implemented as LIR_Op1" >> >> This reverts commit a21d21f4d7b74e21f68b6bf9c5dc9ba7d3f9963c. >> >> Testing tiers 1-3 in progress >> >> Thanks > > Thanks for confirming! > @vnkozlov: Regarding the redo: Do you think we can remove the separate case https://github.com/openjdk/jdk/blob/64fad1c7d374bbc635bad3b1fa7941379f39565f/src/hotspot/cpu/x86/c1_LIRAssembler_x86.cpp#L3802 and fall back to `xorps` in C1? That would simplify the code a lot and avoid an extra temp register and the preloading with `LIR_OprFact::floatConst(-0.0)`? 
@TheRealMDoerr I have not looked too closely at your change, but from the error messages I guess the failure may be due to the fact that input operands are not required to live during the node, while temp operands live only during the node. As a result, they may be assigned the same register by the allocator, which is not correct for some of the nodes here. ------------- PR Comment: https://git.openjdk.org/jdk/pull/22690#issuecomment-2537679958 From qamai at openjdk.org Thu Dec 12 03:11:47 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Thu, 12 Dec 2024 03:11:47 GMT Subject: Integrated: 8310691: [REDO] [vectorapi] Refactor VectorShuffle implementation In-Reply-To: References: Message-ID: On Tue, 17 Sep 2024 16:13:55 GMT, Quan Anh Mai wrote: > Hi, > > This is just a redo of https://github.com/openjdk/jdk/pull/13093. mostly just the revert of the backout. > > Regarding the related issues: > > - [JDK-8306008](https://bugs.openjdk.org/browse/JDK-8306008) and [JDK-8309531](https://bugs.openjdk.org/browse/JDK-8309531) have been fixed before the backout. > - [JDK-8309373](https://bugs.openjdk.org/browse/JDK-8309373) was due to missing `ForceInline` on `AbstractVector::toBitsVectorTemplate` > - [JDK-8306592](https://bugs.openjdk.org/browse/JDK-8306592), I have not been able to find the root causes. I'm not sure if this is a blocker, now I cannot even build x86-32 tests. > > Finally, I moved some implementation of public methods and methods that call into intrinsics to the concrete class as that may help the compiler know the correct types of the variables. > > Please take a look and leave reviews. Thanks a lot. > > The description of the original PR: > > This patch reimplements `VectorShuffle` implementations to be a vector of the bit type. Currently, `VectorShuffle` is stored as a byte array, and would be expanded upon usage. This poses several drawbacks: > > Inefficient conversions between a shuffle and its corresponding vector. 
This hinders the performance when the shuffle indices are not constant and are loaded or computed dynamically. > Redundant expansions in `rearrange` operations. On all platforms, it seems that a shuffle index vector is always expanded to the correct type before executing the `rearrange` operations. > Some redundant intrinsics are needed to support this handling as well as special considerations in the C2 compiler. > Range checks are performed using `VectorShuffle::toVector`, which is inefficient for FP types since both FP conversions and FP comparisons are more expensive than the integral ones. > Upon these changes, a `rearrange` can emit more efficient code: > > var species = IntVector.SPECIES_128; > var v1 = IntVector.fromArray(species, SRC1, 0); > var v2 = IntVector.fromArray(species, SRC2, 0); > v1.rearrange(v2.toShuffle()).intoArray(DST, 0); > > Before: > movabs $0x751589fa8,%r10 ; {oop([I{0x0000000751589fa8})} > vmovdqu 0x10(%r10),%xmm2 > movabs $0x7515a0d08,%r10 ; {oop([I{0x00000007515a0d08})} > vmovdqu 0x10(%r10),%xmm1 > movabs $0x75158afb8,%r10 ; {oop([I{0x000000075158afb8})} > vmovdqu 0x10(%r10),%xmm0 > vpand -0xddc12(%rip),%xmm0,%xmm0 # Stub::vector_int_to_byte_mask > ; {ex... This pull request has now been integrated. 
Changeset: 75cfb640 Author: Quan Anh Mai URL: https://git.openjdk.org/jdk/commit/75cfb640a6bbdb714321107bceedb39913ee6e1f Stats: 5051 lines in 64 files changed: 2649 ins; 1066 del; 1336 mod 8310691: [REDO] [vectorapi] Refactor VectorShuffle implementation Reviewed-by: psandoz, jbhateja, epeter ------------- PR: https://git.openjdk.org/jdk/pull/21042 From stuefe at openjdk.org Thu Dec 12 04:29:44 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 12 Dec 2024 04:29:44 GMT Subject: RFR: 8344013: "bad tag in log" assert with +LogCompilation +CITimeVerbose [v3] In-Reply-To: <4v95roGlt4jgnrQQ4kDK1AP0uCyV7j6lWEydZHvyKlo=.0fafd6ea-8071-49f7-8aad-3d6cbefe9ab3@github.com> References: <4v95roGlt4jgnrQQ4kDK1AP0uCyV7j6lWEydZHvyKlo=.0fafd6ea-8071-49f7-8aad-3d6cbefe9ab3@github.com> Message-ID: <3H0Bc-pBDWS0aMs6iABlRGR72SFOd-lMxS33F1f9_zk=.61a0a281-e8df-4a9e-ac3c-2b9477419b51@github.com> On Mon, 25 Nov 2024 20:43:36 GMT, Sonia Zaldana Calles wrote: >> Hi folks, >> >> This PR addresses [8344013](https://bugs.openjdk.org/browse/JDK-8344013). >> >> Sometimes the writing to xmlStream is mixed from several threads, and therefore the xmlStream tag stack can end up in a bad state. When this occurs, the VM crashes in `xmlStream::pop_tag` with `assert(false) failed: bad tag in log`. >> >> In this case, running `java -XX:+LogCompilation -XX:CompileCommand="log,*.*" -XX:+CITimeVerbose -Xcomp -Xbatch -version` , `xmlStream::pop_tag` is expecting to pop the tag `task` but finds `phase` instead. >> >> I found the issue stems from [8330157](https://bugs.openjdk.org/browse/JDK-8330157). The problematic code is in the destructor for [TracePhase](https://github.com/openjdk/jdk/blame/master/src/hotspot/share/opto/compile.cpp#L4337). >> >> Note how the constructor adds the [phase tag](https://github.com/openjdk/jdk/blame/master/src/hotspot/share/opto/compile.cpp#L4327). 
>> >> However, in the destructor, if we return early, we don't pop that tag, which leaves the xmlStream tag stack in a bad state. With this patch, I made sure we pop the tag even if we return early. >> >> Cheers, >> Sonia > > Sonia Zaldana Calles has updated the pull request incrementally with one additional commit since the last revision: > > Changes based on feedback Very good. Thank you for fixing. ------------- PR Comment: https://git.openjdk.org/jdk/pull/22331#issuecomment-2537787458 From jatin.bhateja at intel.com Thu Dec 12 06:04:52 2024 From: jatin.bhateja at intel.com (Bhateja, Jatin) Date: Thu, 12 Dec 2024 06:04:52 +0000 Subject: RFC: Untangle native libraries and the JVM: SVML, SLEEF, and libsimdsort In-Reply-To: <266bb453-f50b-4f0f-8199-ae749f1906a1@oracle.com> References: <1bf041a1-1002-4d18-85b3-15f437fa534e@oracle.com> <5EA2B22F-0BFD-49E6-A20B-0F91ED94FF8C@oracle.com> <24117ca6-abf9-43a0-933c-00c3333097e4@oracle.com> <96DE2A6E-1CA2-41BD-B346-C4D7D1328031@oracle.com> <266bb453-f50b-4f0f-8199-ae749f1906a1@oracle.com> Message-ID: Hi Vladimir, This looks good. I recall we discussed a similar topic on the panama-dev mailing list [1] to give Java developers a handle to write custom target-specific logic. The approach is identical to .NET hardware intrinsics [2], though in this case we deal in higher-level target-specific intrinsics rather than operating at the instruction level.
Best Regards, Jatin [1] https://mail.openjdk.org/pipermail/panama-dev/2024-June/020493.html [2] https://devblogs.microsoft.com/dotnet/dotnet-8-hardware-intrinsics/ > -----Original Message----- > From: panama-dev On Behalf Of Vladimir > Ivanov > Sent: Thursday, December 12, 2024 4:09 AM > To: Paul Sandoz > Cc: hotspot-compiler-dev at openjdk.org; panama-dev at openjdk.org; core- > libs-dev > Subject: Re: RFC: Untangle native libraries and the JVM: SVML, SLEEF, and > libsimdsort > > > > >>> - There is an open bug to adjust the simd sort behavior on AMD zen 4 > cores due to poor performance of an AVX 512 instruction. The simplest > solution is to fall back to AVX2. That may be simpler to manage in Java? (I > was looking at the HotSpot code). > >> > >> For now, the patch guards AVX512 entries with VM.isIntelCPU() check. In > order to distinguish between AMD Zen 4 and 5, either a new platform- > sensing check is needed or reimplementation of x86-specific platform > sensing in Java on top of CPUID info. > >> > > > > Probably best just to update as required. (Also I don't see anything > > in the HotSpot code to determine the AMD zen core version.) > > Yes, it's not implemented yet. But there are enough examples for Intel chips > to see what it will look like (e.g., is_intel_skylake() or is_knights_family()). > > > Any general CPU vendor/model solution seems a little more challenging > than that of surfacing up the CPU feature set as a string. Note that the > System.getProperties() surfaces up "os.arch". A more general solution could > add further properties for the CPU? > > Personally, I'm not a fan of "stringy" APIs. It may look convenient at first, > but it suffers from deficiencies of both approaches. I would prefer to use a > strongly typed platform-specific API instead (an equivalent of VM_Version > class in hotspot).
> > Best regards, > Vladimir Ivanov From djelinski at openjdk.org Thu Dec 12 06:59:35 2024 From: djelinski at openjdk.org (Daniel Jeliński) Date: Thu, 12 Dec 2024 06:59:35 GMT Subject: RFR: 8345471: Clean up compiler/intrinsics/sha/cli tests In-Reply-To: References: Message-ID: On Wed, 4 Dec 2024 20:06:19 GMT, Vladimir Kozlov wrote: >> Merge all the GenericTestCaseForUnsupportedXXXCPU and GenericTestCaseForOtherCPU into GenericTestCaseForUnsupportedCPU.java. >> >> The CPU-specific files are almost identical; I chose to resolve the differences in favor of the AArch64 version. The OtherCPU version looks wrong, and it wasn't executed on any supported platform. >> >> The tests continue to pass on linux-aarch64/x64, windows-x64 and mac-aarch64. I didn't test other platforms. >> >> After the change, the tests will start running on PPC and S390. They will also automatically run on any new architectures. >> >> For those interested in historical background, when the tests were introduced, there were only 2 supported CPU architectures. X86 did not support any of the intrinsics, and the X86 test case did not even call `getPredicateForOption`. The call to `getPredicateForOption` was added in f2e9b827d699115f8683e9def06c249e5476fd50, and since then all the cases are the same. > > Good. Thanks for the review @vnkozlov. Still waiting for a second reviewer... ------------- PR Comment: https://git.openjdk.org/jdk/pull/22517#issuecomment-2537965552 From epeter at openjdk.org Thu Dec 12 07:06:38 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 12 Dec 2024 07:06:38 GMT Subject: RFR: 8332827: [REDO] C2: crash in compiled code because of dependency on removed range check CastIIs [v2] In-Reply-To: References: Message-ID: On Wed, 11 Dec 2024 15:31:45 GMT, Roland Westrelin wrote: >> I don't see anything wrong with the formatting. What am I missing? > > Added run without flags in new commit. >I don't see anything wrong with the formatting. What am I missing?
Just the extra empty comment lines. No big deal. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22568#discussion_r1881491183 From epeter at openjdk.org Thu Dec 12 07:19:37 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 12 Dec 2024 07:19:37 GMT Subject: RFR: 8332827: [REDO] C2: crash in compiled code because of dependency on removed range check CastIIs [v2] In-Reply-To: References: Message-ID: <0Cgmu8NKsfGbHYwNo7In32WmPwbOEk4h1SlpIVB9-a0=.8d376993-dfc4-40d9-b9cd-67d75d3eabd1@github.com> On Wed, 11 Dec 2024 15:04:31 GMT, Roland Westrelin wrote: >> src/hotspot/share/opto/compile.cpp line 3147: >> >>> 3145: DivModNode* divmod = DivModNode::make(n, bt, is_unsigned); >>> 3146: divmod->add_prec_from(n); >>> 3147: divmod->add_prec_from(d); >> >> Can you explain why you added this? > > If the divisor input for a `Div` (or `Mod` etc.) is found to be not null, then the control input of the `Div` is set to null. It could be that the divisor input is found not null because the subgraph for that input contains a `CastII`. If that happens, removing the `CastII` during final graph reshape could cause the `Div` to float above the `CastII` and above the condition that allowed the type of the `CastII` to be narrowed. This could cause a crash. > So when the `CastII` is removed, precedence edges are added to the `Div` node. If the `Div` is then replaced by a `DivMod`, we need to transfer the precedence edges to the `DivMod` node. Why not add such a comment to the code then?
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22568#discussion_r1881504693 From epeter at openjdk.org Thu Dec 12 07:25:40 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 12 Dec 2024 07:25:40 GMT Subject: RFR: 8342662: C2: Add new phase for backend-specific lowering [v3] In-Reply-To: References: <3z-tAI6f1fUGXLgIwNylfaUB1DdNaQ6ZM-n2WZ_V9Ak=.61a402ae-7234-4a0d-ab0e-e29866522988@github.com> Message-ID: <3LO6j8U3KDCPdaG5HsGVz0qy4n344h0jGweglQU2VeE=.cda7dd55-6607-448c-96d1-8985c731ce00@github.com> On Wed, 11 Dec 2024 16:07:04 GMT, Quan Anh Mai wrote: >> @jatin-bhateja Thanks for the bug report! I was able to reproduce it on my system as well. I apologize for the late response, I had been busy with my studies and only had a chance to look recently. It looks like this bug is caused because we run `Identity()` on all of the nodes, which finds new optimizations by replacing `LoadNode`s with a non-load identity. This causes an uncommon trap branch to be optimized out and the `HaltNode` replaced with a TOP `ConNode`. Usually these are removed by `RootNode::Ideal`, but since we don't do regular `Ideal` in lowering it ends up running into an assert later. >> >> This is interesting because it suggests that there are potentially calls to `Identity()` that aren't run in earlier IGVN rounds. I didn't take a very in-depth look but it might be good to investigate separately. Because of that I'm not sure if we can rely on `Identity()` here, especially if there are identity transforms that rely on other `Ideal` calls. @merykitty do you have any thoughts on the best way to fix this? >> >> @eme64 Ah, do you mean the recursive reduction op generation based on vector length? I think that could be handled here as well by splitting the reduction operation into multiple backend-specific nodes. It would still need to be implemented per-backend, but there could be benefits with optimizing more complex reductions, as you mention. 
> > @jaskarth That's really interesting, it indeed seems like a missed transformation to me. I think adding an IGVN verification where we verify that no other transformation can take place after an IGVN transformation would catch this, what do you think @eme64 ? > > For this PR, my idealist side tells me that you can try repeatedly applying IGVN until there is no further progress. It may be cheap since there should be few to no transformations left. If that sounds inelegant, a possible approach is to refactor the `Identity` call into an `apply_identity` similar to `apply_ideal` where you can override it with a method that always returns the current node until the missed transformation is fixed. IMO in principle, there should not be cases where we legitimately depend on `Ideal` to be invoked after `Identity` like this case. @merykitty not sure if you are talking about `VerifyIterativeGVN`? So far we only verify `Value` optimizations, and even there we have to make some exceptions: the problem is that some optimization at node n can in some cases enable an optimization at some node n2 far away. This happens if n2's optimization does some graph walk that looks at more than just its direct neighbours. We have an RFE to extend this to `Ideal` and `Identity`, and maybe more: [JDK-8298951](https://bugs.openjdk.org/browse/JDK-8298951) Umbrella: improve CCP and IGVN verification (feel free to contact me if you want to work on one of these) But the issue is that some optimizations just walk too far, so that we cannot reasonably expect all those nodes to notify us back if they have a mutation. We cannot really afford to notify everybody, or repeat IGVN on the whole graph. The goal of IGVN was that it is basically linear. If you make some local change, this should only trigger IGVN on the local graph, and not necessarily on the whole graph. That would be a huge performance impact. Still: for many optimizations we could probably verify that they are always done after IGVN.
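The locality argument above can be illustrated with a toy worklist loop. This is only a sketch under strong simplifying assumptions and not C2's actual IGVN: `WorklistSketch`, `Node`, and `run` are hypothetical names, the only "optimization" is constant folding of an add, and a changed node notifies just its direct users. The point is that a local change causes a number of visits proportional to the affected subgraph, while unrelated nodes are never touched. The same structure also shows the limitation discussed above: a rule that inspects nodes beyond its direct inputs would not be re-triggered by this notification scheme.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class WorklistSketch {
    // Toy IR node: a constant ("con"), an unknown parameter ("param"),
    // or an addition of two inputs ("add").
    static final class Node {
        final String op;
        Integer value;                       // known constant value, or null
        final List<Node> inputs = new ArrayList<>();
        final List<Node> users = new ArrayList<>();

        Node(String op, Integer value, Node... ins) {
            this.op = op;
            this.value = value;
            for (Node in : ins) {
                inputs.add(in);
                in.users.add(this);          // maintain def-use edges
            }
        }
    }

    // Processes the worklist until it is empty and returns how many node
    // visits were needed. Only direct users of a changed node are re-queued.
    static int run(Deque<Node> worklist) {
        int visits = 0;
        while (!worklist.isEmpty()) {
            Node n = worklist.pop();
            visits++;
            if (n.op.equals("add") && n.value == null) {
                Integer a = n.inputs.get(0).value;
                Integer b = n.inputs.get(1).value;
                if (a != null && b != null) {
                    n.value = a + b;          // constant-fold the add
                    worklist.addAll(n.users); // notify direct users only
                }
            }
        }
        return visits;
    }

    public static void main(String[] args) {
        Node c1 = new Node("con", 1);
        Node c2 = new Node("con", 2);
        Node p = new Node("param", null);
        Node add1 = new Node("add", null, c1, c2);
        Node add2 = new Node("add", null, add1, c1);
        Node unrelated = new Node("add", null, p, p);

        Deque<Node> worklist = new ArrayDeque<>();
        worklist.add(add1);                   // a single local change
        int visits = run(worklist);
        System.out.println("visits = " + visits);
        System.out.println("add2 = " + add2.value);
        System.out.println("unrelated = " + unrelated.value);
    }
}
```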
------------- PR Comment: https://git.openjdk.org/jdk/pull/21599#issuecomment-2538005603 From epeter at openjdk.org Thu Dec 12 07:27:43 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 12 Dec 2024 07:27:43 GMT Subject: RFR: 8332827: [REDO] C2: crash in compiled code because of dependency on removed range check CastIIs [v2] In-Reply-To: References: Message-ID: <_mrmBtTPF1ZViTaKy9d4L8wzIiCD7DqfWKAl1ErYjuo=.69dd3404-12a9-46d3-8a0f-8e2ed24e8951@github.com> On Wed, 11 Dec 2024 15:36:51 GMT, Roland Westrelin wrote: >> The failures that caused the backout were due to a bug in: >> >> `find_or_make_integer_cast()` >> >> which caused the `_range_check_dependency` field's value of the >> existing cast node to not be set in the new cast node. I re-ran some >> testing with this fixed and current jdk repo and found that a few >> vectorization tests fail now because the patch pushes range check >> `CastII` nodes through `AddI`/`SubI`. To fix this, I delayed that >> transformation to after loop opts. > > Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: > > - Merge branch 'master' into JDK-8332827 > - review > - white spaces > - test & fix > - Revert "8332829: [BACKOUT] C2: crash in compiled code because of dependency on removed range check CastIIs" > > This reverts commit c9a7b9772d96d9a4825d9da2aacc277534282860. Thanks for the updates, @rwestrel ! The fix looks reasonable. You can add an extra comment for the precedence edges for Div. I launched some testing now, feel free to ping me later. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/22568#issuecomment-2538009216 From amitkumar at openjdk.org Thu Dec 12 07:56:41 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Thu, 12 Dec 2024 07:56:41 GMT Subject: RFR: 8344026: Ubsan: prevent potential integer overflow in c1_LIRGenerator_.cpp file [v8] In-Reply-To: References: <2en5GIxXeljH8KabsgIjJ0-m2OYUCj_bXaFpOSfRZiM=.0fa0c343-a924-4051-8a67-58cf20733ff5@github.com> <0Fc1FNC7m8UIskRUEZVyGDMkvPP-UAjxGLPw3JNNLBk=.1ad396f5-fed3-42bf-95a7-47f1846fa1f9@github.com> <5e_MHZarll6kDLApsRIwapr5YSqW7VlRNiMC93VH0Uw=.3ff988f1-726f-4c91-ac6e-fcc5538bb974@github.com> <6RikOqqZivUpV30HKp4GhaZzzx7RgT2tde_8tLy1TCk=.d3457927-512f-4742-a8ad-3e116619c606@github.com> <22-J3Eelze79XROngKRHXcnkMgzwjdqC3zd7bry48qo=.5c3fe69c-bbd6-4a7b-9429-1e5582324860@github.com> Message-ID: On Wed, 11 Dec 2024 12:05:07 GMT, Tobias Hartmann wrote: >>> Is this just undefined behaviour, but no compiler so far actually does something unexpected? >> >> Exactly. The compilers already seem to generate code which matches the unsigned behavior. Only UBSan checks found the issue. They run into the issue with existing tests which are passing when UBSan is disabled. Overflow doesn't cause errors when using "wrap around" behavior which matches the Java integer arithmetic semantics. > > Okay, thanks for the clarification! > Tests would really be great. Is that possible? @eme64 I have added one testcase. Can you have a look. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22144#discussion_r1881543473 From duke at openjdk.org Thu Dec 12 08:48:14 2024 From: duke at openjdk.org (theoweidmannoracle) Date: Thu, 12 Dec 2024 08:48:14 GMT Subject: RFR: 8331717: C2: Crash with SIGFPE Because Loop Predication Wrongly Hoists Division Requiring Zero Check [v2] In-Reply-To: References: Message-ID: > Fixes a bug in loop predication where not strictly invariant tests involving divisions or modulo are pulled out of the loop. 
> > The bug can be seen in this code: > > > public class Reduced { > static int iArr[] = new int[100]; > > public static void main(String[] strArr) { > for (int i = 0; i < 10000; i++) { > test(); > } > } > > static void test() { > int i1 = 0; > > for (int i4 : iArr) { > i4 = i1; > try { > iArr[0] = 1 / i4; > i4 = iArr[2 / i4]; // Source of the crash > } catch (ArithmeticException a_e) { > } > } > } > } > > > The crucial element is the division `2 / i4`. Since it is used to access an array, it is the input to a range check. See node 230: > Screenshot 2024-12-11 at 15 14 47 > > Loop predication will try to pull this range check together with its input, the division, before the `for` loop. Due to a bug in Invariance::compute_invariance loop predication is allowed to do so, which results in the division being pulled out without its non-zero check. 322 is a clone of 230 placed before the loop head without any zero check for the divisor: > > Screenshot 2024-12-11 at 15 11 48 > > > More specifically, this bug occurs because 230's zero check (174 If) is not its direct control. Between the zero check and the division is another unrelated check (189 If), which can be hoisted: > > Screenshot 2024-12-12 at 09 14 37 > > Due to the way the Invariance class works, a check that can be hoisted will be marked as invariant. Then, to determine if any given node is invariant, Invariance::compute_invariance checks if all its inputs are invariant: > > https://github.com/openjdk/jdk/blob/ceb4366ebf02f64165acc4a23195e9e3a7398a5c/src/hotspot/share/opto/loopPredicate.cpp#L456-L475 > > Therefore, when recursively traversing the inputs for 230 Div, the hoisted, unrelated check 174 If is hit before the zero check. As that check has been hoisted before already, it is marked invariant and `all_inputs_invariant` will be set to true. (All other inputs are also trivially invariant as they are constant.) > > To fix this, Invariance::compute_invariance must check tha... 
theoweidmannoracle has updated the pull request incrementally with one additional commit since the last revision: Combine test files ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22666/files - new: https://git.openjdk.org/jdk/pull/22666/files/50146ee8..8da7cb58 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22666&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22666&range=00-01 Stats: 106 lines in 2 files changed: 36 ins; 70 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/22666.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22666/head:pull/22666 PR: https://git.openjdk.org/jdk/pull/22666 From roland at openjdk.org Thu Dec 12 08:57:07 2024 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 12 Dec 2024 08:57:07 GMT Subject: RFR: 8332827: [REDO] C2: crash in compiled code because of dependency on removed range check CastIIs [v3] In-Reply-To: References: Message-ID: <8xV78YdPMP_gGozlp_8CIFB0JRjYU7RuqtsJOE72L8c=.f0816dd7-13f1-4a33-96a9-5c6d8d27ccfa@github.com> > The failures that caused the backout were due to a bug in: > > `find_or_make_integer_cast()` > > which caused the `_range_check_dependency` field's value of the > existing cast node to not be set in the new cast node. I re-ran some > testing with this fixed and current jdk repo and found that a few > vectorization tests fail now because the patch pushes range check > `CastII` nodes through `AddI`/`SubI`. To fix this, I delayed that > transformation to after loop opts. 
Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: review ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22568/files - new: https://git.openjdk.org/jdk/pull/22568/files/85a9155a..2aa63d52 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22568&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22568&range=01-02 Stats: 5 lines in 1 file changed: 5 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/22568.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22568/head:pull/22568 PR: https://git.openjdk.org/jdk/pull/22568 From roland at openjdk.org Thu Dec 12 09:01:37 2024 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 12 Dec 2024 09:01:37 GMT Subject: RFR: 8332827: [REDO] C2: crash in compiled code because of dependency on removed range check CastIIs [v2] In-Reply-To: <_mrmBtTPF1ZViTaKy9d4L8wzIiCD7DqfWKAl1ErYjuo=.69dd3404-12a9-46d3-8a0f-8e2ed24e8951@github.com> References: <_mrmBtTPF1ZViTaKy9d4L8wzIiCD7DqfWKAl1ErYjuo=.69dd3404-12a9-46d3-8a0f-8e2ed24e8951@github.com> Message-ID: On Thu, 12 Dec 2024 07:25:33 GMT, Emanuel Peter wrote: > Thanks for the updates, @rwestrel ! The fix looks reasonable. You can add an extra comment for the precedence edges for Div. I launched some testing now, feel free to ping me later. Thanks for reviewing this. Would it be possible to run performance testing again? Performance testing was run for the initial fix (that was backed out) and this one is slightly different. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/22568#issuecomment-2538257343 From roland at openjdk.org Thu Dec 12 09:01:38 2024 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 12 Dec 2024 09:01:38 GMT Subject: RFR: 8332827: [REDO] C2: crash in compiled code because of dependency on removed range check CastIIs [v3] In-Reply-To: <0Cgmu8NKsfGbHYwNo7In32WmPwbOEk4h1SlpIVB9-a0=.8d376993-dfc4-40d9-b9cd-67d75d3eabd1@github.com> References: <0Cgmu8NKsfGbHYwNo7In32WmPwbOEk4h1SlpIVB9-a0=.8d376993-dfc4-40d9-b9cd-67d75d3eabd1@github.com> Message-ID: <2PtIuDBDnFpruKgigfIzfyX3jP8EDF4Z35ruAJFauaQ=.df5de959-ed53-48da-841b-0388d7c401d9@github.com> On Thu, 12 Dec 2024 07:17:06 GMT, Emanuel Peter wrote: >> If the divisor input for a `Div` (or `Mod` etc.) is not null, then the control input of the `Div` is set to null. It could be that the divisor input is found not null because the subgraph for that input contains a `CastII`. If that happens, removing the `CastII` during final graph reshape could cause the `Div` to float above the `CastII` and above the condition that allowed the type of the `CastII` to be narrowed. This could cause a crash. >> So when the `CastII` is removed, precedence edges are added to the `Div` node. If the `Div` is then replaced by a `DivMod`, we need to transfer the precedence edges to the `DivMod` node. > > Why not add such a comment to the code then? Added ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22568#discussion_r1881645924 From roland at openjdk.org Thu Dec 12 09:22:40 2024 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 12 Dec 2024 09:22:40 GMT Subject: RFR: 8342692: C2: long counted loop/long range checks: don't create loop-nest for short running loops [v6] In-Reply-To: References: Message-ID: On Wed, 11 Dec 2024 18:33:03 GMT, Quan Anh Mai wrote: > I'm worried this solution will be susceptible to profile pollution, i.e.
if a loop is called from 2 places, one with a large trip count and one with a small trip count, then the caller with a small trip count will suffer. As a result, in addition to this, making loop nests with 1 iteration of the outer loop cheaper will also be necessary. Profile pollution seems to be an issue that leads to [JDK-8307084](https://bugs.openjdk.org/browse/JDK-8307084), too. With the current patch, a loop is short running if it has fewer than `ShortLoopIter` iterations because restricting the number of iterations that much makes the outer strip mined loop go away as well. We could give up on that and consider that all loops that run for no more than roughly `max_jint/max RC scale` are short running. That would realistically cover a lot of loops (loops with millions of iterations that is most of them?) and would help mitigate that problem. I'm also looking at ways to make range checks (once hoisted out of loop by predication) cheaper to compute as the current RC expressions end up being quite convoluted. Do you have anything in mind? ------------- PR Comment: https://git.openjdk.org/jdk/pull/21630#issuecomment-2538327199 From epeter at openjdk.org Thu Dec 12 09:41:40 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 12 Dec 2024 09:41:40 GMT Subject: RFR: 8344026: Ubsan: prevent potential integer overflow in c1_LIRGenerator_.cpp file [v9] In-Reply-To: References: Message-ID: <25bsUwnk1kUiZTtUG2JAuDNs9z3HrjgI2GB_RTh8TcE=.a077c9f2-204e-47f5-a0a3-26c35bcaa31f@github.com> On Wed, 11 Dec 2024 11:24:13 GMT, Amit Kumar wrote: >> This PR converts datatype from `jint` to `juint` for constant `c` check in c1_LIRGenerator_.cpp. Please see JBS for more info. > > Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: > > adds testcase Ok, looks reasonable. Thanks for adding the tests! ------------- Marked as reviewed by epeter (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/22144#pullrequestreview-2498626481 From amitkumar at openjdk.org Thu Dec 12 09:54:47 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Thu, 12 Dec 2024 09:54:47 GMT Subject: RFR: 8344026: Ubsan: prevent potential integer overflow in c1_LIRGenerator_.cpp file [v9] In-Reply-To: References: Message-ID: On Wed, 11 Dec 2024 11:24:13 GMT, Amit Kumar wrote: >> This PR converts datatype from `jint` to `juint` for constant `c` check in c1_LIRGenerator_.cpp. Please see JBS for more info. > > Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: > > adds testcase Thanks for the reviews :-) ------------- PR Comment: https://git.openjdk.org/jdk/pull/22144#issuecomment-2538399255 From amitkumar at openjdk.org Thu Dec 12 09:54:48 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Thu, 12 Dec 2024 09:54:48 GMT Subject: Integrated: 8344026: Ubsan: prevent potential integer overflow in c1_LIRGenerator_.cpp file In-Reply-To: References: Message-ID: On Fri, 15 Nov 2024 10:04:51 GMT, Amit Kumar wrote: > This PR converts datatype from `jint` to `juint` for constant `c` check in c1_LIRGenerator_.cpp. Please see JBS for more info. This pull request has now been integrated. Changeset: 77e49322 Author: Amit Kumar URL: https://git.openjdk.org/jdk/commit/77e493226d6875bb73faaadedc4170dbb5d4fdc5 Stats: 109 lines in 5 files changed: 88 ins; 0 del; 21 mod 8344026: Ubsan: prevent potential integer overflow in c1_LIRGenerator_.cpp file Reviewed-by: aph, epeter, mdoerr ------------- PR: https://git.openjdk.org/jdk/pull/22144 From duke at openjdk.org Thu Dec 12 09:55:24 2024 From: duke at openjdk.org (theoweidmannoracle) Date: Thu, 12 Dec 2024 09:55:24 GMT Subject: RFR: 8341696: C2: Non-fluid StringBuilder pattern bails out in OptoStringConcat [v4] In-Reply-To: References: Message-ID: > Extends stringopts to also recognize non-fluid uses of StringBuilder and optimize them the same way.
> > For example, this basic case was not optimized before and is optimized with this PR: > > > StringBuilder sb = new StringBuilder(); > sb.append("a"); > sb.append(a); > return sb.toString(); theoweidmannoracle has updated the pull request incrementally with one additional commit since the last revision: Fix test name ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22537/files - new: https://git.openjdk.org/jdk/pull/22537/files/69127e10..b0a1b226 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22537&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22537&range=02-03 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/22537.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22537/head:pull/22537 PR: https://git.openjdk.org/jdk/pull/22537 From mdoerr at openjdk.org Thu Dec 12 10:40:40 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Thu, 12 Dec 2024 10:40:40 GMT Subject: RFR: 8346039: [BACKOUT] - [C1] LIR Operations with one input should be implemented as LIR_Op1 In-Reply-To: References: Message-ID: On Wed, 11 Dec 2024 20:41:29 GMT, David Holmes wrote: > Revert "8345609: [C1] LIR Operations with one input should be implemented as LIR_Op1" > > This reverts commit a21d21f4d7b74e21f68b6bf9c5dc9ba7d3f9963c. > > Testing tiers 1-3 in progress > > Thanks Thanks everyone for looking into the issue and the assistance. I have understood the problem. C1 allows using the same register for a temp and an input operand. Another problem is that the `UseKNLSetting` code needs the `floatConst(-0.0)` for `lir_abs` and `lir_neg`. Only the code for `UseAVX > 2 && !VM_Version::supports_avx512vl()` is affected. The rest looks correct and all tests had passed without this mode. I'll check with @sviswa7 if we can remove this extra code from C1. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/22690#issuecomment-2538508923 From mdoerr at openjdk.org Thu Dec 12 13:24:41 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Thu, 12 Dec 2024 13:24:41 GMT Subject: RFR: 8346039: [BACKOUT] - [C1] LIR Operations with one input should be implemented as LIR_Op1 In-Reply-To: References: Message-ID: <46sWQrU0OJt8huH0o_GZGyfZPMXVGDdoD1_mhdA6WzA=.c96bd956-f436-41c1-a6f3-cbc6e95dbd1f@github.com> On Wed, 11 Dec 2024 20:41:29 GMT, David Holmes wrote: > Revert "8345609: [C1] LIR Operations with one input should be implemented as LIR_Op1" > > This reverts commit a21d21f4d7b74e21f68b6bf9c5dc9ba7d3f9963c. > > Testing tiers 1-3 in progress > > Thanks I've created a draft PR with the removal: https://github.com/openjdk/jdk/pull/22709 The new code actually looks better to me, but I'll wait for the feedback from Intel. ------------- PR Comment: https://git.openjdk.org/jdk/pull/22690#issuecomment-2538905574 From roland at openjdk.org Thu Dec 12 13:43:56 2024 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 12 Dec 2024 13:43:56 GMT Subject: RFR: 8342692: C2: long counted loop/long range checks: don't create loop-nest for short running loops [v7] In-Reply-To: References: Message-ID: > To optimize a long counted loop and long range checks in a long or int > counted loop, the loop is turned into a loop nest. When the loop has > few iterations, the overhead of having an outer loop whose backedge is > never taken, has a measurable cost. Furthermore, creating the loop > nest usually causes one iteration of the loop to be peeled so > predicates can be set up. If the loop is short running, then it's an > extra iteration that's run with range checks (compared to an int > counted loop with int range checks). 
> > This change doesn't create a loop nest when: > > 1- it can be determined statically at loop nest creation time that the > loop runs for a short enough number of iterations > > 2- profiling reports that the loop runs for no more than ShortLoopIter > iterations (1000 by default). > > For 2-, a guard is added which is implemented as yet another predicate. > > While this change is in principle simple, I ran into a few > implementation issues: > > - while c2 has a way to compute the number of iterations of an int > counted loop, it doesn't have that for long counted loops. The > existing logic for int counted loops promotes values to long to > avoid overflows. I reworked it so it now works for both long and int > counted loops. > > - I added a new deoptimization reason (Reason_short_running_loop) for > the new predicate. Given the number of iterations is narrowed down > by the predicate, the limit of the loop after transformation is a > cast node that's control dependent on the short running loop > predicate. Because once the counted loop is transformed, it is > likely that range check predicates will be inserted and they will > depend on the limit, the short running loop predicate has to be the > one that's further away from the loop entry. Now it is also possible > that the limit before transformation depends on a predicate > (TestShortRunningLongCountedLoopPredicatesClone is an example), we > can have: new predicates inserted after the transformation that > depend on the casted limit that itself depends on old predicates > added before the transformation. To solve this circular dependency, > parse and assert predicates are cloned between the old predicates > and the loop head. The cloned short running loop parse predicate is > the one that's used to insert the short running loop predicate.
> > - In the case of a long counted loop, the loop is transformed into a > regular loop with a new limit and transformed range checks that's > later turned into an int counted loop. The int ... Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 25 commits: - Merge branch 'master' into JDK-8342692 - review - reviews - Update src/hotspot/share/opto/loopTransform.cpp Co-authored-by: Emanuel Peter - Merge branch 'master' into JDK-8342692 - whitespaces - more - merge - more - one more test - ... and 15 more: https://git.openjdk.org/jdk/compare/0ad64234...5e29f364 ------------- Changes: https://git.openjdk.org/jdk/pull/21630/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21630&range=06 Stats: 1311 lines in 23 files changed: 1247 ins; 15 del; 49 mod Patch: https://git.openjdk.org/jdk/pull/21630.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21630/head:pull/21630 PR: https://git.openjdk.org/jdk/pull/21630 From qamai at openjdk.org Thu Dec 12 13:55:43 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Thu, 12 Dec 2024 13:55:43 GMT Subject: RFR: 8342662: C2: Add new phase for backend-specific lowering [v3] In-Reply-To: <3z-tAI6f1fUGXLgIwNylfaUB1DdNaQ6ZM-n2WZ_V9Ak=.61a402ae-7234-4a0d-ab0e-e29866522988@github.com> References: <3z-tAI6f1fUGXLgIwNylfaUB1DdNaQ6ZM-n2WZ_V9Ak=.61a402ae-7234-4a0d-ab0e-e29866522988@github.com> Message-ID: <77jVfeJ5kShS42DFpUMuIKfy-RpKd-9HPoOK1Yxng4o=.150b4ef8-f912-4795-9d91-09302f32d679@github.com> On Wed, 11 Dec 2024 04:51:48 GMT, Jasmine Karthikeyan wrote: >> Hi @jaskarth , >> >> I was trying to lower LShiftVB/URShiftVB IR to LShiftVS/URShiftVS for the x86 backend. I intend to factor out through GVN upfront byte vector to short vector conversion for both input and shift vectors if these are shared across two operations since the x86 ISA does not support direct byte vector shifts.
>> >> To begin with, I made the following diff expecting status quo, but getting the following Fatal error at build time, can you kindly check? >> >> >> diff --git a/src/hotspot/cpu/x86/c2_lowering_x86.cpp b/src/hotspot/cpu/x86/c2_lowering_x86.cpp >> index cf4c014ffda..bc8df186396 100644 >> --- a/src/hotspot/cpu/x86/c2_lowering_x86.cpp >> +++ b/src/hotspot/cpu/x86/c2_lowering_x86.cpp >> @@ -32,6 +32,6 @@ Node* PhaseLowering::lower_node_platform(Node* n) { >> } >> >> bool PhaseLowering::should_lower() { >> - return false; >> + return true; >> } >> #endif // COMPILER2 >> ``` >> >> >> ERROR: Build failed for target 'images' in configuration 'linux-x86_64-server-fastdebug' (exit code 2) >> >> === Output from failing command(s) repeated here === >> * For target support_interim-image-jlink__jlink_interim_image_exec: >> # >> # A fatal error has been detected by the Java Runtime Environment: >> # >> # Internal Error (/home/jatinbha/sandboxes/jdk-trunk/jdk/src/hotspot/share/opto/node.hpp:960), pid=1961256, tid=1961293 >> # assert(is_MachReturn()) failed: invalid node class: Con >> # >> # JRE version: OpenJDK Runtime Environment (24.0) (fastdebug build 24-internal-adhoc.root.jdk) >> # Java VM: OpenJDK 64-Bit Server VM (fastdebug 24-internal-adhoc.root.jdk, mixed mode, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64) >> # Problematic frame: >> # V [libjvm.so+0x140f939] Matcher::Fixup_Save_On_Entry()+0x279 >> # >> # Core dump will be written. Default location: Core dumps may be processed with "/usr/share/apport/apport %p %s %c %d %P %E" (or dumping to /home/jatinbha/sandboxes/jdk-trunk/jdk/make/core.1961256) >> # >> # An error report file with more information is saved as: >> # /home/jatinbha/sandboxes/jdk-trunk/jdk/make/hs_err_pid1961256.log >> ... (rest of output omitted) >> >> * All command lines available in /home/jatinbha/sandboxes/jdk-trunk/jdk/build/linux-x86_64-server-fastdebug/make-support/failure-logs. > > @jatin-bhateja Thanks for the bug report! 
I was able to reproduce it on my system as well. I apologize for the late response, I had been busy with my studies and only had a chance to look recently. It looks like this bug is caused because we run `Identity()` on all of the nodes, which finds new optimizations by replacing `LoadNode`s with a non-load identity. This causes an uncommon trap branch to be optimized out and the `HaltNode` replaced with a TOP `ConNode`. Usually these are removed by `RootNode::Ideal`, but since we don't do regular `Ideal` in lowering it ends up running into an assert later. > > This is interesting because it suggests that there are potentially calls to `Identity()` that aren't run in earlier IGVN rounds. I didn't take a very in-depth look but it might be good to investigate separately. Because of that I'm not sure if we can rely on `Identity()` here, especially if there are identity transforms that rely on other `Ideal` calls. @merykitty do you have any thoughts on the best way to fix this? > > @eme64 Ah, do you mean the recursive reduction op generation based on vector length? I think that could be handled here as well by splitting the reduction operation into multiple backend-specific nodes. It would still need to be implemented per-backend, but there could be benefits with optimizing more complex reductions, as you mention. I agree, repeating IGVN on the whole graph seems too expensive. For now, I think it would be suitable to disable `Identity` during lowering with a `PhaseGVN::apply_identity` then try to fix the missed optimization on a case-by-case basis when we want to use `Identity` during lowering. What do you think @jaskarth ? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/21599#issuecomment-2539020544 From qamai at openjdk.org Thu Dec 12 14:05:42 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Thu, 12 Dec 2024 14:05:42 GMT Subject: RFR: 8342692: C2: long counted loop/long range checks: don't create loop-nest for short running loops [v7] In-Reply-To: References: Message-ID: On Thu, 12 Dec 2024 13:43:56 GMT, Roland Westrelin wrote: >> To optimize a long counted loop and long range checks in a long or int >> counted loop, the loop is turned into a loop nest. When the loop has >> few iterations, the overhead of having an outer loop whose backedge is >> never taken, has a measurable cost. Furthermore, creating the loop >> nest usually causes one iteration of the loop to be peeled so >> predicates can be set up. If the loop is short running, then it's an >> extra iteration that's run with range checks (compared to an int >> counted loop with int range checks). >> >> This change doesn't create a loop nest when: >> >> 1- it can be determined statically at loop nest creation time that the >> loop runs for a short enough number of iterations >> >> 2- profiling reports that the loop runs for no more than ShortLoopIter >> iterations (1000 by default). >> >> For 2-, a guard is added which is implemented as yet another predicate. >> >> While this change is in principle simple, I ran into a few >> implementation issues: >> >> - while c2 has a way to compute the number of iterations of an int >> counted loop, it doesn't have that for long counted loop. The >> existing logic for int counted loops promotes values to long to >> avoid overflows. I reworked it so it now works for both long and int >> counted loops. >> >> - I added a new deoptimization reason (Reason_short_running_loop) for >> the new predicate. 
Given the number of iterations is narrowed down >> by the predicate, the limit of the loop after transformation is a >> cast node that's control dependent on the short running loop >> predicate. Because once the counted loop is transformed, it is >> likely that range check predicates will be inserted and they will >> depend on the limit, the short running loop predicate has to be the >> one that's further away from the loop entry. Now it is also possible >> that the limit before transformation depends on a predicate >> (TestShortRunningLongCountedLoopPredicatesClone is an example), we >> can have: new predicates inserted after the transformation that >> depend on the casted limit that itself depend on old predicates >> added before the transformation. To solve this cicular dependency, >> parse and assert predicates are cloned between the old predicates >> and the loop head. The cloned short running loop parse predicate is >> the one that's used to insert the short running loop predicate. >> >> - In the case of a long counted loop, the loop is transformed into a >> regular loop with a ... > > Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 25 commits: > > - Merge branch 'master' into JDK-8342692 > - review > - reviews > - Update src/hotspot/share/opto/loopTransform.cpp > > Co-authored-by: Emanuel Peter > - Merge branch 'master' into JDK-8342692 > - whitespaces > - more > - merge > - more > - one more test > - ... and 15 more: https://git.openjdk.org/jdk/compare/0ad64234...5e29f364 I didn't conduct extensive testing as you did so I don't know the threshold below which the overhead from the loop nest is significant. I think when it is significant then the trip count should be very low and the loop body be very small. As a result, when the loop body is small enough, you can clone the loop and create the loop nest for one of them. 
As the other should have a very low trip count it should stay low in size. Am I correct? ------------- PR Comment: https://git.openjdk.org/jdk/pull/21630#issuecomment-2539042777 From epeter at openjdk.org Thu Dec 12 15:14:38 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 12 Dec 2024 15:14:38 GMT Subject: RFR: 8342692: C2: long counted loop/long range checks: don't create loop-nest for short running loops [v7] In-Reply-To: References: Message-ID: On Thu, 12 Dec 2024 14:02:42 GMT, Quan Anh Mai wrote: > As a result, when the loop body is small enough, you can clone the loop and create the loop nest for one of them. So we could do multiversioning. I'm currently experimenting with runtime-checks for Aliasing Analysis in SuperWord, so I'll use both a trap and a multiversioning approach. Just FYI, in case we end up both doing this we should come up with a common way to do multiversioning. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21630#issuecomment-2539236831 From roland at openjdk.org Thu Dec 12 15:29:42 2024 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 12 Dec 2024 15:29:42 GMT Subject: RFR: 8342692: C2: long counted loop/long range checks: don't create loop-nest for short running loops [v7] In-Reply-To: References: Message-ID: On Thu, 12 Dec 2024 13:43:56 GMT, Roland Westrelin wrote: >> To optimize a long counted loop and long range checks in a long or int >> counted loop, the loop is turned into a loop nest. When the loop has >> few iterations, the overhead of having an outer loop whose backedge is >> never taken, has a measurable cost. Furthermore, creating the loop >> nest usually causes one iteration of the loop to be peeled so >> predicates can be set up. If the loop is short running, then it's an >> extra iteration that's run with range checks (compared to an int >> counted loop with int range checks). 
>> >> This change doesn't create a loop nest when: >> >> 1- it can be determined statically at loop nest creation time that the >> loop runs for a short enough number of iterations >> >> 2- profiling reports that the loop runs for no more than ShortLoopIter >> iterations (1000 by default). >> >> For 2-, a guard is added which is implemented as yet another predicate. >> >> While this change is in principle simple, I ran into a few >> implementation issues: >> >> - while c2 has a way to compute the number of iterations of an int >> counted loop, it doesn't have that for long counted loop. The >> existing logic for int counted loops promotes values to long to >> avoid overflows. I reworked it so it now works for both long and int >> counted loops. >> >> - I added a new deoptimization reason (Reason_short_running_loop) for >> the new predicate. Given the number of iterations is narrowed down >> by the predicate, the limit of the loop after transformation is a >> cast node that's control dependent on the short running loop >> predicate. Because once the counted loop is transformed, it is >> likely that range check predicates will be inserted and they will >> depend on the limit, the short running loop predicate has to be the >> one that's further away from the loop entry. Now it is also possible >> that the limit before transformation depends on a predicate >> (TestShortRunningLongCountedLoopPredicatesClone is an example), we >> can have: new predicates inserted after the transformation that >> depend on the casted limit that itself depend on old predicates >> added before the transformation. To solve this cicular dependency, >> parse and assert predicates are cloned between the old predicates >> and the loop head. The cloned short running loop parse predicate is >> the one that's used to insert the short running loop predicate. >> >> - In the case of a long counted loop, the loop is transformed into a >> regular loop with a ... 
> > Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 25 commits: > > - Merge branch 'master' into JDK-8342692 > - review > - reviews > - Update src/hotspot/share/opto/loopTransform.cpp > > Co-authored-by: Emanuel Peter > - Merge branch 'master' into JDK-8342692 > - whitespaces > - more > - merge > - more > - one more test > - ... and 15 more: https://git.openjdk.org/jdk/compare/0ad64234...5e29f364 what about this part of my comment: > consider that all loops that run for no more than roughly max_jint/max RC scale is short running. That would realistically cover a lot of loops (loops with millions of iterations that is most of them?) Do we expect loops that run for millions of iterations (or use unusual scale factors in RC) to be that common? That would be much simpler than multiversioning. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21630#issuecomment-2539284224 From qamai at openjdk.org Thu Dec 12 15:51:37 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Thu, 12 Dec 2024 15:51:37 GMT Subject: RFR: 8342692: C2: long counted loop/long range checks: don't create loop-nest for short running loops [v7] In-Reply-To: References: Message-ID: <1BMthfIAKkiqOPFPkVejF8yWKuLdLU5IE1Kwgf9VBTU=.009abbcf-c08c-4d26-b404-7d4e13bfc36b@github.com> On Thu, 12 Dec 2024 15:27:02 GMT, Roland Westrelin wrote: > Do we expect loops that run for millions of iterations (or use unusual scale factors in RC) to be that common? That would be much simpler than multiversioning. I agree, that should be adequate and simpler. 
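The simpler criterion agreed on above (treat a loop as short running when its trip count is at most roughly `max_jint` divided by the largest range check scale) can be sketched as plain arithmetic. This is a hedged illustration only: `ShortRunningLoopSketch`, `isShortRunning` and its parameters are hypothetical names, and the check is an assumption about what such a threshold might look like, not the C2 implementation in the PR.

```java
final class ShortRunningLoopSketch {

    // Hypothetical helper: true when tripCount * maxRangeCheckScale cannot
    // exceed Integer.MAX_VALUE (max_jint), so int arithmetic would suffice.
    static boolean isShortRunning(long tripCount, long maxRangeCheckScale) {
        if (tripCount < 0 || maxRangeCheckScale < 1) {
            throw new IllegalArgumentException("expected tripCount >= 0 and scale >= 1");
        }
        // Divide instead of multiplying so the comparison itself cannot overflow.
        return tripCount <= Integer.MAX_VALUE / maxRangeCheckScale;
    }

    public static void main(String[] args) {
        System.out.println(isShortRunning(1_000_000L, 4L));
        System.out.println(isShortRunning(1L << 32, 1L));
    }
}
```

With a scale of 4 the bound is still above 500 million iterations, which matches the remark that such a threshold would cover loops with millions of iterations.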
------------- PR Comment: https://git.openjdk.org/jdk/pull/21630#issuecomment-2539336537 From epeter at openjdk.org Thu Dec 12 15:57:04 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 12 Dec 2024 15:57:04 GMT Subject: RFR: 8346106: Verify.checkEQ: testing utility for recursive value verification Message-ID: In testing, we often generate "golden" values, and then compare the results with it. This requires comparison loops etc in every test. I would like to create a dedicated facility for this, to simplify testing in the future. This is also preparation for [JDK-8344942](https://bugs.openjdk.org/browse/JDK-8344942). I have written code like this in various tests before, see: `test/hotspot/jtreg/compiler/loopopts/superword/TestAlignVectorFuzzer.java` `./test/hotspot/jtreg/compiler/loopopts/superword/TestMemorySegment.java` `test/hotspot/jtreg/compiler/loopopts/superword/TestIndependentPacksWithCyclicDependency.java`. It is now time to make a proper facility, so I can save time when writing tests in the future. ------------- Commit messages: - fix whitespaces - JDK-8344942 Changes: https://git.openjdk.org/jdk/pull/22715/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=22715&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8346106 Stats: 834 lines in 4 files changed: 834 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/22715.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22715/head:pull/22715 PR: https://git.openjdk.org/jdk/pull/22715 From epeter at openjdk.org Thu Dec 12 15:57:05 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 12 Dec 2024 15:57:05 GMT Subject: RFR: 8346106: Verify.checkEQ: testing utility for recursive value verification In-Reply-To: References: Message-ID: On Thu, 12 Dec 2024 15:42:29 GMT, Emanuel Peter wrote: > In testing, we often generate "golden" values, and then compare the results with it. This requires comparison loops etc in every test. 
I would like to create a dedicated facility for this, to simplify testing in the future. This is also preparation for [JDK-8344942](https://bugs.openjdk.org/browse/JDK-8344942). > > I have written code like this in various tests before, see: > `test/hotspot/jtreg/compiler/loopopts/superword/TestAlignVectorFuzzer.java` > `./test/hotspot/jtreg/compiler/loopopts/superword/TestMemorySegment.java` > `test/hotspot/jtreg/compiler/loopopts/superword/TestIndependentPacksWithCyclicDependency.java`. > > It is now time to make a proper facility, so I can save time when writing tests in the future. test/hotspot/jtreg/compiler/lib/verify/Verify.java line 42: > 40: public static void checkEQ(Object a, Object b) { > 41: checkEQ(a, b, ""); > 42: } Note: this is the only entry point to the Utility. test/hotspot/jtreg/testlibrary_tests/verify/examples/TestVerifyInCheckMethod.java line 75: > 73: public static void check(Object result) { > 74: Verify.checkEQ(result, GOLD); > 75: } Note: this is how we might generate Templates in the future: Using the `@Check` method with `Verify.checkEQ`. It allows the template to basically return whatever it wants, and it will be verified with the interpreter run. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22715#discussion_r1882404152 PR Review Comment: https://git.openjdk.org/jdk/pull/22715#discussion_r1882406233 From lucy at openjdk.org Thu Dec 12 16:07:37 2024 From: lucy at openjdk.org (Lutz Schmidt) Date: Thu, 12 Dec 2024 16:07:37 GMT Subject: RFR: 8336356: [s390x] preserve Vector Register before using for string compress / expand In-Reply-To: References: Message-ID: On Mon, 25 Nov 2024 04:58:11 GMT, Amit Kumar wrote: > This PR adds `TEMP` effect for the vector register, allotted by register allocator, used in the string compress/expand intrinsic. 
Also it enabled the Vector computation part of those intrinsics which was disabled by https://github.com/openjdk/jdk/pull/18162 > > tier1 test, which also includes string intrinsic tests `compiler/intrinsics/string/` are clean. No regression seen. Looks good. Thank you for reactivating the vector code. ------------- Marked as reviewed by lucy (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/22354#pullrequestreview-2500114893 From kvn at openjdk.org Thu Dec 12 18:01:37 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 12 Dec 2024 18:01:37 GMT Subject: RFR: 8346106: Verify.checkEQ: testing utility for recursive value verification In-Reply-To: References: Message-ID: <3WI-O9hS4jrGBOjI_N16sTlmD-YPnvoc8JJ804rP1dY=.25efdb24-d48e-4903-9409-6af007cb2da3@github.com> On Thu, 12 Dec 2024 15:42:29 GMT, Emanuel Peter wrote: > In testing, we often generate "golden" values, and then compare the results with it. This requires comparison loops etc in every test. I would like to create a dedicated facility for this, to simplify testing in the future. This is also preparation for [JDK-8344942](https://bugs.openjdk.org/browse/JDK-8344942), the Template framework. > > I have written code like this in various tests before, see: > `test/hotspot/jtreg/compiler/loopopts/superword/TestAlignVectorFuzzer.java` > `./test/hotspot/jtreg/compiler/loopopts/superword/TestMemorySegment.java` > `test/hotspot/jtreg/compiler/loopopts/superword/TestIndependentPacksWithCyclicDependency.java`. > > It is now time to make a proper facility, so I can save time when writing tests in the future. > > A related PR, for value generation: https://github.com/openjdk/jdk/pull/22716 Okay. ------------- Marked as reviewed by kvn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/22715#pullrequestreview-2500425730 From jkarthikeyan at openjdk.org Thu Dec 12 19:24:52 2024 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Thu, 12 Dec 2024 19:24:52 GMT Subject: RFR: 8342662: C2: Add new phase for backend-specific lowering [v5] In-Reply-To: <7hsST_7-e0j6kT4WQUsTP0kIP7d7i98XBluSd8tG5VY=.6a0bbad0-46f6-423d-a8f1-7ad0b5c10ff6@github.com> References: <7hsST_7-e0j6kT4WQUsTP0kIP7d7i98XBluSd8tG5VY=.6a0bbad0-46f6-423d-a8f1-7ad0b5c10ff6@github.com> Message-ID: On Wed, 11 Dec 2024 04:16:36 GMT, Jasmine Karthikeyan wrote: >> Hi all, >> This patch adds a new pass to consolidate lowering of complex backend-specific code patterns, such as `MacroLogicV` and the optimization proposed by #21244. Moving these optimizations to backend code can simplify shared code, while also making it easier to develop more in-depth optimizations. The linked bug has an example of a new optimization this could enable. The new phase does GVN to de-duplicate nodes and calls nodes' `Value()` method, but it does not call `Identity()` or `Ideal()` to avoid undoing any changes done during lowering. It also reuses the IGVN worklist to avoid needing to re-create the notification mechanism. >> >> In this PR only the skeleton code for the pass is added, moving `MacroLogicV` to this system will be done separately in a future patch. Tier 1 tests pass on my linux x64 machine. Feedback on this patch would be greatly appreciated! > > Jasmine Karthikeyan has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains six commits: > > - Fix merge conflict > - Re-use optimize() and add backend-specific should_lower() > - Merge branch 'master' into phase-lowering > - Remove platform-dependent node definitions, rework PhaseLowering implementation > - Address some changes from code review > - Implement PhaseLowering Thanks a lot for the ideas! 
I agree that running IGVN for the whole graph could become too expensive. I think that the suggestion with `PhaseGVN::apply_identity` is good, I'll make sure to implement it in the next commit. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21599#issuecomment-2539831610 From tholenstein at openjdk.org Thu Dec 12 21:01:24 2024 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Thu, 12 Dec 2024 21:01:24 GMT Subject: RFR: 8345041: IGV: Free Placement Mode in IGV Layout [v9] In-Reply-To: References: Message-ID: > This PR depends on https://github.com/openjdk/jdk/pull/22430. To check out this PR locally: > > git fetch https://git.openjdk.org/jdk.git pull/22438/head:pull/22438 > git checkout pull/22438 > > > Introduce a Free Placement Mode to IGV, allowing users to position nodes freely without being limited to the hierarchical layout constraints. > > In this mode, users can manually drag and place nodes anywhere within the space, giving them complete control over the visual arrangement of the graph. Connections between nodes will be rendered as straight (or S curved) lines, without recalculating or enforcing hierarchical constraints. > > This feature is ideal for users who need a flexible, non-restrictive way to organize and visualize complex graph structures in a customized and intuitive manner. The free placement of nodes will remain persistent until the layout is reset or another layout mode is selected. 
> > ![free](https://github.com/user-attachments/assets/c150334e-4d9f-4abf-97ea-3cb42bd1c602) Tobias Holenstein has updated the pull request incrementally with one additional commit since the last revision: Update LayoutGraph.java ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22438/files - new: https://git.openjdk.org/jdk/pull/22438/files/14d20181..f8d699af Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22438&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22438&range=07-08 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/22438.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22438/head:pull/22438 PR: https://git.openjdk.org/jdk/pull/22438 From epeter at openjdk.org Fri Dec 13 06:25:55 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 13 Dec 2024 06:25:55 GMT Subject: RFR: 8343685: C2 SuperWord: refactor VPointer with MemPointer [v2] In-Reply-To: References: Message-ID: > **This is a required step towards adding runtime-checks for Aliasing Analysis, especially Important for FFM / MemorySegments.** > > I know this one is large, but it consists of a lot of renamings, and new tests. On the whole, the new `VPointer` code is less than the old! > > **Goal** > > Replace old `VPointer` with a new version that relies on `MemPointer` - which then is a shared utility for both `MergeStores` and `SuperWord / AutoVectorization`. `MemPointer` generally parses pointers, and `VPointer` specializes this facility for the use in loops (`VLoop`). > > The old `VPointer` implementation with its recursive pattern matching was quite complicated and difficult to reason about for correctness. The approach in `MemPointer` is much simpler: iteratively decomposing sub-expressions. Further: the new implementation is more powerful at detecting equivalent invariants. 
> > **Future**: with the `MemPointer` implementation of `VPointer`, it should be easier to implement speculative runtime-checks for Aliasing-Analysis [JDK-8324751](https://bugs.openjdk.org/browse/JDK-8324751). The pressing need for this has come from the FFM / MemorySegment folks, like @mcimadamore and @minborg . > > **Details** > > This looks like a rather big patch, so let me explain the parts. > - Refactor of `MemPointer` in `mempointer.hpp/cpp`: > - Added concept of `Base` to `MemPointer`. This is required for the aliasing computation in `VPointer`. > - `sub_expression_has_native_base_candidate`: add special case to parse through `CastX2P` if we find a native memory base `MemorySegment.address()`, i.e. `jdk.internal.foreign.NativeMemorySegmentImpl.min`. This helps some native memory segment cases to vectorize that did not before. > - So far `MemPointer` could only answer adjacency queries. But VPointer also needs overlap queries, see the old `VPointer::not_equal` (i.e. can we prove that the two `VPointer` never overlap?). So I had to add a new case to aliasing computation: `NotOrAtDistance`. It is useful to answer the new and better named `MemPointer::never_overlaps_with`. > - Collapsed together `MemPointerDecomposedForm` and `MemPointer`. It was an unnecessary and unhelpful split. > - Re-write of `VPointer` based on `MemPointer`: > - Old pattern: > - `VPointer[mem: 847 StoreI, base: 37, adr: 37, base[ 37] + offset( 16) + invar( 0) + scale( 4) * iv]` > - `VPointer[mem: 3189 LoadB, base: 1, adr: 2273, base[ 1] + offset( 0) + invar( 0) + scale( 1) * iv]` -> `adr = CastX2P`, the a...
Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: fix printing ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21926/files - new: https://git.openjdk.org/jdk/pull/21926/files/4b3c7d29..4ef7cee9 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21926&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21926&range=00-01 Stats: 5 lines in 1 file changed: 0 ins; 0 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/21926.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21926/head:pull/21926 PR: https://git.openjdk.org/jdk/pull/21926 From epeter at openjdk.org Fri Dec 13 06:30:15 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 13 Dec 2024 06:30:15 GMT Subject: RFR: 8343685: C2 SuperWord: refactor VPointer with MemPointer [v3] In-Reply-To: References: Message-ID: > **This is a required step towards adding runtime-checks for Aliasing Analysis, especially Important for FFM / MemorySegments.** > > I know this one is large, but it consists of a lot of renamings, and new tests. On the whole, the new `VPointer` code is less than the old! > > **Goal** > > Replace old `VPointer` with a new version that relies on `MemPointer` - which then is a shared utility for both `MergeStores` and `SuperWord / AutoVectorization`. `MemPointer` generally parses pointers, and `VPointer` specializes this facility for the use in loops (`VLoop`). > > The old `VPointer` implementation with its recursive pattern matching was quite complicated and difficult to reason about for correctness. The approach in `MemPointer` is much simpler: iteratively decomposing sub-expressions. Further: the new implementation is more powerful at detecting equivalent invariants. > > **Future**: with the `MemPointer` implementation of `VPointer`, it should be easier to implement speculative runtime-checks for Aliasing-Analysis [JDK-8324751](https://bugs.openjdk.org/browse/JDK-8324751). 
The pressing need for this has come from the FFM / MemorySegment folks, like @mcimadamore and @minborg . > > **Details** > > This looks like a rather big patch, so let me explain the parts. > - Refactor of `MemPointer` in `mempointer.hpp/cpp`: > - Added concept of `Base` to `MemPointer`. This is required for the aliasing computation in `VPointer`. > - `sub_expression_has_native_base_candidate`: add special case to parse through `CastX2P` if we find a native memory base `MemorySegment.address()`, i.e. `jdk.internal.foreign.NativeMemorySegmentImpl.min`. This helps some native memory segment cases to vectorize that did not before. > - So far `MemPointer` could only answer adjacency queries. But VPointer also needs overlap queries, see the old `VPointer::not_equal` (i.e. can we prove that the two `VPointer` never overlap?). So I had to add a new case to aliasing computation: `NotOrAtDistance`. It is useful to answer the new and better named `MemPointer::never_overlaps_with`. > - Collapsed together `MemPointerDecomposedForm` and `MemPointer`. It was an unnecessary and unhelpful split. > - Re-write of `VPointer` based on `MemPointer`: > - Old pattern: > - `VPointer[mem: 847 StoreI, base: 37, adr: 37, base[ 37] + offset( 16) + invar( 0) + scale( 4) * iv]` > - `VPointer[mem: 3189 LoadB, base: 1, adr: 2273, base[ 1] + offset( 0) + invar( 0) + scale( 1) * iv]` -> `adr = CastX2P`, the a... Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 114 commits: - manual merge - fix printing - rename - fix up print - add TestEquivalentInvariants.java - improve documentation - hide parser via delegation - Merge branch 'master' into JDK-8343685-VPointer-MemPointer - make sort stable - some comment and naming improvements - ...
and 104 more: https://git.openjdk.org/jdk/compare/31ceec7c...4b0504d0 ------------- Changes: https://git.openjdk.org/jdk/pull/21926/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21926&range=02 Stats: 4030 lines in 18 files changed: 1849 ins; 1536 del; 645 mod Patch: https://git.openjdk.org/jdk/pull/21926.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21926/head:pull/21926 PR: https://git.openjdk.org/jdk/pull/21926 From epeter at openjdk.org Fri Dec 13 06:52:43 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 13 Dec 2024 06:52:43 GMT Subject: RFR: 8332827: [REDO] C2: crash in compiled code because of dependency on removed range check CastIIs [v2] In-Reply-To: References: <_mrmBtTPF1ZViTaKy9d4L8wzIiCD7DqfWKAl1ErYjuo=.69dd3404-12a9-46d3-8a0f-8e2ed24e8951@github.com> Message-ID: On Thu, 12 Dec 2024 08:58:59 GMT, Roland Westrelin wrote: >> Thanks for the updates, @rwestrel ! >> The fix looks reasonable. You can add an extra comment for the precedence edges for Div. >> I launched some testing now, feel free to ping me later. > >> Thanks for the updates, @rwestrel ! The fix looks reasonable. You can add an extra comment for the precedence edges for Div. I launched some testing now, feel free to ping me later. > > Thanks for reviewing this. Would it be possible to run performance testing again? Performance testing was run for the initial fix (that was backed out) and this one is slightly different. @rwestrel the tests I launched yesterday look clean. I can run some performance testing. It will take a while to build, launch and run. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/22568#issuecomment-2540671420 From epeter at openjdk.org Fri Dec 13 09:01:56 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 13 Dec 2024 09:01:56 GMT Subject: RFR: 8346107: Generators: testing utility for random value generation Message-ID: For verification testing, it is often critical to generate "interesting" values, to provoke overflows, NaN, etc. And to generate these values in the correct distribution to trigger certain optimizations. I would like to start a collection of such generators, that can then be used in testing. The goal is to grow this collection in the future, and add new types. For example `byte`, `char`, `short`, or even `Float16`. This will be helpful for the Template framework [JDK-8344942](https://bugs.openjdk.org/browse/JDK-8344942), but also other tests. Related PR, for value verification: https://github.com/openjdk/jdk/pull/22715 ------------- Commit messages: - double generators - more float generators - wip float generators - JDK-8346107 Changes: https://git.openjdk.org/jdk/pull/22716/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=22716&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8346107 Stats: 1612 lines in 21 files changed: 1612 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/22716.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22716/head:pull/22716 PR: https://git.openjdk.org/jdk/pull/22716 From epeter at openjdk.org Fri Dec 13 09:02:40 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 13 Dec 2024 09:02:40 GMT Subject: RFR: 8332827: [REDO] C2: crash in compiled code because of dependency on removed range check CastIIs [v2] In-Reply-To: References: <_mrmBtTPF1ZViTaKy9d4L8wzIiCD7DqfWKAl1ErYjuo=.69dd3404-12a9-46d3-8a0f-8e2ed24e8951@github.com> Message-ID: On Thu, 12 Dec 2024 08:58:59 GMT, Roland Westrelin wrote: >> Thanks for the updates, @rwestrel ! >> The fix looks reasonable. 
You can add an extra comment for the precedence edges for Div. >> I launched some testing now, feel free to ping me later. > >> Thanks for the updates, @rwestrel ! The fix looks reasonable. You can add an extra comment for the precedence edges for Div. I launched some testing now, feel free to ping me later. > > Thanks for reviewing this. Would it be possible to run performance testing again? Performance testing was run for the initial fix (that was backed out) and this one is slightly different. @rwestrel Ok, performance testing is launched. Please ping me again next week! ------------- PR Comment: https://git.openjdk.org/jdk/pull/22568#issuecomment-2540917991 From roland at openjdk.org Fri Dec 13 13:10:45 2024 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 13 Dec 2024 13:10:45 GMT Subject: RFR: 8343685: C2 SuperWord: refactor VPointer with MemPointer [v3] In-Reply-To: References: Message-ID: On Fri, 13 Dec 2024 06:30:15 GMT, Emanuel Peter wrote: >> **This is a required step towards adding runtime-checks for Aliasing Analysis, especially Important for FFM / MemorySegments.** >> >> I know this one is large, but it consists of a lot of renamings, and new tests. On the whole, the new `VPointer` code is less than the old! >> >> **Goal** >> >> Replace old `VPointer` with a new version that relies on `MemPointer` - which then is a shared utility for both `MergeStores` and `SuperWord / AutoVectorization`. `MemPointer` generally parses pointers, and `VPointer` specializes this facility for the use in loops (`VLoop`). >> >> The old `VPointer` implementation with its recursive pattern matching was quite complicated and difficult to reason about for correctness. The approach in `MemPointer` is much simpler: iteratively decomposing sub-expressions. Further: the new implementation is more powerful at detecting equivalent invariants. 
>> >> **Future**: with the `MemPointer` implementation of `VPointer`, it should be easier to implement speculative runtime-checks for Aliasing-Analysis [JDK-8324751](https://bugs.openjdk.org/browse/JDK-8324751). The pressing need for this has come from the FFM / MemorySegment folks, like @mcimadamore and @minborg . >> >> **Details** >> >> This looks like a rather big patch, so let me explain the parts. >> - Refactor of `MemPointer` in `mempointer.hpp/cpp`: >> - Added concept of `Base` to `MemPointer`. This is required for the aliasing computation in `VPointer`. >> - `sub_expression_has_native_base_candidate`: add special case to parse through `CastX2P` if we find a native memory base `MemorySegment.address()`, i.e. `jdk.internal.foreign.NativeMemorySegmentImpl.min`. This helps some native memory segment cases to vectorize that did not before. >> - So far `MemPointer` could only answer adjacency queries. But VPointer also needs overlap queries, see the old `VPointer::not_equal` (i.e. can we prove that the two `VPointer` never overlap?). So I had to add a new case to aliasing computation: `NotOrAtDistance`. It is useful to answer the new and better named `MemPointer::never_overlaps_with`. >> - Collapsed together `MemPointerDecomposedForm` and `MemPointer`. It was an unnecessary and unhelpful split. >> - Re-write of `VPointer` based on `MemPointer`: >> - Old pattern: >> - `VPointer[mem: 847 StoreI, base: 37, adr: 37, base[ 37] + offset( 16) + invar( 0) + scale( 4) * iv]` >> - `VPointer[mem: 3189 LoadB, base: 1, adr: 2273, base[ 1] + offset( 0) + invar( 0... > > Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains 114 commits: > > - manual merge > - fix printing > - rename > - fix up print > - add TestEquivalentInvariants.java > - improve documentation > - hide parser via delegation > - Merge branch 'master' into JDK-8343685-VPointer-MemPointer > - make sort stable > - some comment and naming improvements > - ... and 104 more: https://git.openjdk.org/jdk/compare/31ceec7c...4b0504d0 This is tricky to review but looks reasonable to me. src/hotspot/share/opto/mempointer.cpp line 253: > 251: // (2) LoadL from field jdk.internal.foreign.NativeMemorySegmentImpl.min > 252: // Holds the address() of a native memory segment. > 253: bool MemPointerParser::is_native_memory_base_candidate(Node* n) { Does this really belong to the move from VPointer to MemPointer? AFAIU, it's an extra optimization on top of the move. Should be done as a separate PR? ------------- PR Review: https://git.openjdk.org/jdk/pull/21926#pullrequestreview-2502259410 PR Review Comment: https://git.openjdk.org/jdk/pull/21926#discussion_r1883911898 From roland at openjdk.org Fri Dec 13 13:11:38 2024 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 13 Dec 2024 13:11:38 GMT Subject: RFR: 8332827: [REDO] C2: crash in compiled code because of dependency on removed range check CastIIs [v2] In-Reply-To: References: <_mrmBtTPF1ZViTaKy9d4L8wzIiCD7DqfWKAl1ErYjuo=.69dd3404-12a9-46d3-8a0f-8e2ed24e8951@github.com> Message-ID: On Thu, 12 Dec 2024 08:58:59 GMT, Roland Westrelin wrote: >> Thanks for the updates, @rwestrel ! >> The fix looks reasonable. You can add an extra comment for the precedence edges for Div. >> I launched some testing now, feel free to ping me later. > >> Thanks for the updates, @rwestrel ! The fix looks reasonable. You can add an extra comment for the precedence edges for Div. I launched some testing now, feel free to ping me later. > > Thanks for reviewing this. Would it be possible to run performance testing again? 
Performance testing was run for the initial fix (that was backed out) and this one is slightly different. > @rwestrel Ok, performance testing is launched. Please ping me again next week! Thanks! ------------- PR Comment: https://git.openjdk.org/jdk/pull/22568#issuecomment-2541430207 From lucy at openjdk.org Fri Dec 13 14:52:48 2024 From: lucy at openjdk.org (Lutz Schmidt) Date: Fri, 13 Dec 2024 14:52:48 GMT Subject: RFR: 8341908: CodeHeapAnalytics: Output Imperfections and unwanted vm termination In-Reply-To: References: Message-ID: On Thu, 10 Oct 2024 14:45:55 GMT, Lutz Schmidt wrote: > Output is properly aligned again now. Was messed up when method hotness was removed (part of method sweeper). > Assertions have been replaced by printing an error message and gracefully returning. Avoids vm crashes caused by diagnostic actions. > Some code restructuring, removal of redundancies. > > Reviews are highly welcomed. Waiting for final tweaks. Then I'll start the RFR process. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21452#issuecomment-2523092366 From lucy at openjdk.org Fri Dec 13 14:52:48 2024 From: lucy at openjdk.org (Lutz Schmidt) Date: Fri, 13 Dec 2024 14:52:48 GMT Subject: RFR: 8341908: CodeHeapAnalytics: Output Imperfections and unwanted vm termination Message-ID: Output is properly aligned again now. Was messed up when method hotness was removed (part of method sweeper). Assertions have been replaced by printing an error message and gracefully returning. Avoids vm crashes caused by diagnostic actions. Some code restructuring, removal of redundancies. Reviews are highly welcomed. 
------------- Commit messages: - 8341908: CodeHeapAnalytics: Output Imperfections and unwanted vm termination Changes: https://git.openjdk.org/jdk/pull/21452/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21452&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8341908 Stats: 147 lines in 2 files changed: 72 ins; 29 del; 46 mod Patch: https://git.openjdk.org/jdk/pull/21452.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21452/head:pull/21452 PR: https://git.openjdk.org/jdk/pull/21452 From psandoz at openjdk.org Fri Dec 13 18:46:44 2024 From: psandoz at openjdk.org (Paul Sandoz) Date: Fri, 13 Dec 2024 18:46:44 GMT Subject: RFR: 8346174: UMAX/UMIN are missing from XXXVector::reductionOperations Message-ID: Add functional support for unsigned min/max reductions on vectors. We also need to ensure that the `reductionCoerced` intrinsic bails out when there is no reduction operation for the lanewise operation. When intrinsic support is added for integral vectors this will still be the case for floating point vectors. ------------- Commit messages: - Functional support for unsigned min/max reductions. Changes: https://git.openjdk.org/jdk/pull/22741/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=22741&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8346174 Stats: 3590 lines in 27 files changed: 3590 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/22741.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22741/head:pull/22741 PR: https://git.openjdk.org/jdk/pull/22741 From qamai at openjdk.org Fri Dec 13 18:53:03 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Fri, 13 Dec 2024 18:53:03 GMT Subject: RFR: 8346174: UMAX/UMIN are missing from XXXVector::reductionOperations In-Reply-To: References: Message-ID: On Fri, 13 Dec 2024 18:42:27 GMT, Paul Sandoz wrote: > Add functional support for unsigned min/max reductions on vectors. 
> > We also need to ensure that the `reductionCoerced` intrinsic bails out when there is no reduction operation for the lanewise operation. When intrinsic support is added for integral vectors this will still be the case for floating point vectors. LGTM otherwise. @jatin-bhateja You may want to implement reduction intrinsics for these operations. src/hotspot/share/opto/vectorIntrinsics.cpp line 1388: > 1386: int opc = VectorSupport::vop2ideal(opr->get_con(), elem_bt); > 1387: int sopc = ReductionNode::opcode(opc, elem_bt); > 1388: if (sopc == opc) { Can you merge this with the following `if`? ------------- Marked as reviewed by qamai (Committer). PR Review: https://git.openjdk.org/jdk/pull/22741#pullrequestreview-2502924424 PR Comment: https://git.openjdk.org/jdk/pull/22741#issuecomment-2542054187 PR Review Comment: https://git.openjdk.org/jdk/pull/22741#discussion_r1884349873 From psandoz at openjdk.org Fri Dec 13 21:25:48 2024 From: psandoz at openjdk.org (Paul Sandoz) Date: Fri, 13 Dec 2024 21:25:48 GMT Subject: RFR: 8346174: UMAX/UMIN are missing from XXXVector::reductionOperations [v2] In-Reply-To: References: Message-ID: > Add functional support for unsigned min/max reductions on vectors. > > We also need to ensure that the `reductionCoerced` intrinsic bails out when there is no reduction operation for the lanewise operation. When intrinsic support is added for integral vectors this will still be the case for floating point vectors. 
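Editorial illustration (not part of the thread, and not the code from the PR): a minimal scalar sketch of what an unsigned min/max reduction over `int` lanes computes, assuming the usual definition via unsigned comparison. The class and method names (`UnsignedReduceSketch`, `umaxLanes`, `uminLanes`) are hypothetical and chosen only for this example.

```java
public final class UnsignedReduceSketch {
    // Unsigned max over all lanes. The starting value 0 is the smallest
    // value under unsigned ordering, so it acts as the identity.
    public static int umaxLanes(int[] lanes) {
        int acc = 0;
        for (int v : lanes) {
            acc = Integer.compareUnsigned(v, acc) > 0 ? v : acc;
        }
        return acc;
    }

    // Unsigned min over all lanes. The starting value -1 (0xFFFFFFFF) is the
    // largest value under unsigned ordering, so it acts as the identity.
    public static int uminLanes(int[] lanes) {
        int acc = -1;
        for (int v : lanes) {
            acc = Integer.compareUnsigned(v, acc) < 0 ? v : acc;
        }
        return acc;
    }

    public static void main(String[] args) {
        // Read as unsigned, these lanes are 1, 4294967295, 5, 2147483648.
        int[] lanes = {1, -1, 5, Integer.MIN_VALUE};
        System.out.println(Integer.toUnsignedString(umaxLanes(lanes))); // prints 4294967295
        System.out.println(Integer.toUnsignedString(uminLanes(lanes))); // prints 1
    }
}
```

A signed max over the same lanes would return 5, which is why a dedicated unsigned reduction is needed instead of reusing the signed one.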
Paul Sandoz has updated the pull request incrementally with one additional commit since the last revision: Merge checks ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22741/files - new: https://git.openjdk.org/jdk/pull/22741/files/b21e79d7..7b06675f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22741&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22741&range=00-01 Stats: 8 lines in 1 file changed: 1 ins; 6 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/22741.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22741/head:pull/22741 PR: https://git.openjdk.org/jdk/pull/22741 From psandoz at openjdk.org Fri Dec 13 21:25:49 2024 From: psandoz at openjdk.org (Paul Sandoz) Date: Fri, 13 Dec 2024 21:25:49 GMT Subject: RFR: 8346174: UMAX/UMIN are missing from XXXVector::reductionOperations [v2] In-Reply-To: References: Message-ID: On Fri, 13 Dec 2024 18:48:31 GMT, Quan Anh Mai wrote: >> Paul Sandoz has updated the pull request incrementally with one additional commit since the last revision: >> >> Merge checks > > src/hotspot/share/opto/vectorIntrinsics.cpp line 1388: > >> 1386: int opc = VectorSupport::vop2ideal(opr->get_con(), elem_bt); >> 1387: int sopc = ReductionNode::opcode(opc, elem_bt); >> 1388: if (sopc == opc) { > > Can you merge this with the following `if`? Yes, I merged them.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22741#discussion_r1884556079 From dlong at openjdk.org Fri Dec 13 22:09:39 2024 From: dlong at openjdk.org (Dean Long) Date: Fri, 13 Dec 2024 22:09:39 GMT Subject: RFR: 8343789: Move mutable nmethod data out of CodeCache [v2] In-Reply-To: <0-RuI0yrrHrzDhR9Od-KDkfhDBQERUlR8mtaQrzWFD0=.d84d10a0-c5b8-4138-a164-935947aa080d@github.com> References: <9mDuowjpORyWudVnSB1FWCW_o1pBgMnAvJus6YGkXLs=.67ba4652-2470-448d-baa2-464e824b2fcb@github.com> <0-RuI0yrrHrzDhR9Od-KDkfhDBQERUlR8mtaQrzWFD0=.d84d10a0-c5b8-4138-a164-935947aa080d@github.com> Message-ID: On Tue, 10 Dec 2024 22:31:05 GMT, Boris Ulasevich wrote: >> src/hotspot/cpu/aarch64/macroAssembler_aarch64.hpp line 1423: >> >>> 1421: } else { >>> 1422: uint64_t offset; >>> 1423: adrp(dest, const_addr, offset); >> >> I don't see how this ADRP path ever gets called now. The only caller is in MacroAssembler::movoop(), which uses a dummy Address in the CodeCache. I think we need to force near/far with an extra bool parameter. The way this function is currently used, a better name might be ldr_patchable(). > > Thanks for pointing that out. I reworked the method a little. > > ldr_patchable() is only called by movoop(). If oops is moved to a separate location, the distance is certainly > 1MB and adrp+ldr is the only way to access oops. The single LDR path is not used now, but I leave it for future use. > > > void ldr_patchable(Register dest, const Address &const_addr, bool fits_in_ldr_range = false) { > if (fits_in_ldr_range) { > intptr_t offset = pc() - const_addr.target(); > assert(offset >= -1024 * 1024 && offset < 1024 * 1024, "pointer does not fit into pc-relative ldr range"); > ldr(dest, const_addr); > } else { > uint64_t offset; > adrp(dest, const_addr, offset); > ldr(dest, Address(dest, offset)); > } > } The problem with leaving "fits_in_ldr_range" in for future use is that it hasn't been tested. I suggest just removing it.
It uses pc() instead of the codecache boundaries to decide the range, so it only works for code that isn't relocated. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21276#discussion_r1884593551 From aph at openjdk.org Sat Dec 14 10:29:36 2024 From: aph at openjdk.org (Andrew Haley) Date: Sat, 14 Dec 2024 10:29:36 GMT Subject: RFR: 8336356: [s390x] preserve Vector Register before using for string compress / expand In-Reply-To: References: Message-ID: On Mon, 25 Nov 2024 04:58:11 GMT, Amit Kumar wrote: > This PR adds `TEMP` effect for the vector register, allotted by register allocator, used in the string compress/expand intrinsic. Also it enabled the Vector computation part of those intrinsics which was disabled by https://github.com/openjdk/jdk/pull/18162 > > tier1 test, which also includes string intrinsic tests `compiler/intrinsics/string/` are clean. No regression seen. src/hotspot/cpu/s390/c2_MacroAssembler_s390.cpp line 187: > 185: VectorRegister Vzero = v19; > 186: VectorRegister Vsrc_first = v20; > 187: VectorRegister Vsrc_last = v23; It would make more sense for the arguments to this function have names `Vtmp1`, `Vtmp2`, and so on. Rather than these aliases. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22354#discussion_r1885016702 From amitkumar at openjdk.org Sat Dec 14 11:09:32 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Sat, 14 Dec 2024 11:09:32 GMT Subject: RFR: 8336356: [s390x] preserve Vector Register before using for string compress / expand [v2] In-Reply-To: References: Message-ID: <74bL2UiQS8CryPFKQobm4e26StJcPtx2E8uzKC9xJyU=.fe406703-3a6b-4a30-b159-b3c39af3de94@github.com> > This PR adds `TEMP` effect for the vector register, allotted by register allocator, used in the string compress/expand intrinsic. 
Also it enabled the Vector computation part of those intrinsics which was disabled by https://github.com/openjdk/jdk/pull/18162 > > tier1 test, which also includes string intrinsic tests `compiler/intrinsics/string/` are clean. No regression seen. Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: suggestion from Andrew ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22354/files - new: https://git.openjdk.org/jdk/pull/22354/files/a01706dd..3543e1c8 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22354&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22354&range=00-01 Stats: 23 lines in 2 files changed: 0 ins; 10 del; 13 mod Patch: https://git.openjdk.org/jdk/pull/22354.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22354/head:pull/22354 PR: https://git.openjdk.org/jdk/pull/22354 From amitkumar at openjdk.org Sat Dec 14 11:24:39 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Sat, 14 Dec 2024 11:24:39 GMT Subject: RFR: 8336356: [s390x] preserve Vector Register before using for string compress / expand [v2] In-Reply-To: References: Message-ID: On Sat, 14 Dec 2024 10:26:48 GMT, Andrew Haley wrote: >> Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: >> >> suggestion from Andrew > > src/hotspot/cpu/s390/c2_MacroAssembler_s390.cpp line 187: > >> 185: VectorRegister Vzero = v19; >> 186: VectorRegister Vsrc_first = v20; >> 187: VectorRegister Vsrc_last = v23; > > It would make more sense for the arguments to this function have names `Vtmp1`, `Vtmp2`, and so on. Rather than these aliases. Done. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22354#discussion_r1885035403 From qamai at openjdk.org Sat Dec 14 13:06:35 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Sat, 14 Dec 2024 13:06:35 GMT Subject: RFR: 8346174: UMAX/UMIN are missing from XXXVector::reductionOperations [v2] In-Reply-To: References: Message-ID: On Fri, 13 Dec 2024 21:25:48 GMT, Paul Sandoz wrote: >> Add functional support for unsigned min/max reductions on vectors. >> >> We also need to ensure that the `reductionCoerced` intrinsic bails out when there is no reduction operation for the lanewise operation. When intrinsic support is added for integral vectors this will still be the case for floating point vectors. > > Paul Sandoz has updated the pull request incrementally with one additional commit since the last revision: > > Merge checks Marked as reviewed by qamai (Committer). ------------- PR Review: https://git.openjdk.org/jdk/pull/22741#pullrequestreview-2503863317 From aph at openjdk.org Sat Dec 14 18:35:41 2024 From: aph at openjdk.org (Andrew Haley) Date: Sat, 14 Dec 2024 18:35:41 GMT Subject: RFR: 8336356: [s390x] preserve Vector Register before using for string compress / expand [v2] In-Reply-To: <74bL2UiQS8CryPFKQobm4e26StJcPtx2E8uzKC9xJyU=.fe406703-3a6b-4a30-b159-b3c39af3de94@github.com> References: <74bL2UiQS8CryPFKQobm4e26StJcPtx2E8uzKC9xJyU=.fe406703-3a6b-4a30-b159-b3c39af3de94@github.com> Message-ID: <8mGTEam-npGv5VpNOjrRQyRtTl7lapNpaoCVBfVIE8Q=.85184882-4e59-4567-a6b7-d4a28d5d28ca@github.com> On Sat, 14 Dec 2024 11:09:32 GMT, Amit Kumar wrote: >> This PR adds `TEMP` effect for the vector register, allotted by register allocator, used in the string compress/expand intrinsic. Also it enabled the Vector computation part of those intrinsics which was disabled by https://github.com/openjdk/jdk/pull/18162 >> >> tier1 test, which also includes string intrinsic tests `compiler/intrinsics/string/` are clean. No regression seen. 
> > Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: > > suggestion from Andrew src/hotspot/cpu/s390/s390.ad line 10675: > 10673: ins_encode %{ > 10674: __ string_compress($result$$Register, $src$$Register, $dst$$Register, $len$$Register, > 10675: $tmp$$Register, true, false, $v16$$VectorRegister, $v17$$VectorRegister, $v18$$VectorRegister, Fix indentation. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22354#discussion_r1885350415 From amitkumar at openjdk.org Sun Dec 15 02:51:21 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Sun, 15 Dec 2024 02:51:21 GMT Subject: RFR: 8336356: [s390x] preserve Vector Register before using for string compress / expand [v3] In-Reply-To: References: Message-ID: <2IChmmSYUxMnxDEr_GyA4IK0k4KT4nZ0p6K5mFRi1ZU=.4900a17e-30dd-4bc3-9c47-18fbcc510e94@github.com> > This PR adds `TEMP` effect for the vector register, allotted by register allocator, used in the string compress/expand intrinsic. Also it enabled the Vector computation part of those intrinsics which was disabled by https://github.com/openjdk/jdk/pull/18162 > > tier1 test, which also includes string intrinsic tests `compiler/intrinsics/string/` are clean. No regression seen. 
Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: indentation fix in s390.ad file ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22354/files - new: https://git.openjdk.org/jdk/pull/22354/files/3543e1c8..5c5079f6 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22354&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22354&range=01-02 Stats: 6 lines in 1 file changed: 0 ins; 0 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/22354.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22354/head:pull/22354 PR: https://git.openjdk.org/jdk/pull/22354 From amitkumar at openjdk.org Sun Dec 15 02:54:41 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Sun, 15 Dec 2024 02:54:41 GMT Subject: RFR: 8336356: [s390x] preserve Vector Register before using for string compress / expand [v2] In-Reply-To: <8mGTEam-npGv5VpNOjrRQyRtTl7lapNpaoCVBfVIE8Q=.85184882-4e59-4567-a6b7-d4a28d5d28ca@github.com> References: <74bL2UiQS8CryPFKQobm4e26StJcPtx2E8uzKC9xJyU=.fe406703-3a6b-4a30-b159-b3c39af3de94@github.com> <8mGTEam-npGv5VpNOjrRQyRtTl7lapNpaoCVBfVIE8Q=.85184882-4e59-4567-a6b7-d4a28d5d28ca@github.com> Message-ID: On Sat, 14 Dec 2024 18:33:16 GMT, Andrew Haley wrote: >> Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: >> >> suggestion from Andrew > > src/hotspot/cpu/s390/s390.ad line 10675: > >> 10673: ins_encode %{ >> 10674: __ string_compress($result$$Register, $src$$Register, $dst$$Register, $len$$Register, >> 10675: $tmp$$Register, true, false, $v16$$VectorRegister, $v17$$VectorRegister, $v18$$VectorRegister, > > Fix indentation. done. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22354#discussion_r1885466910 From qamai at openjdk.org Sun Dec 15 05:29:26 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Sun, 15 Dec 2024 05:29:26 GMT Subject: RFR: 8315066: Add unsigned bounds and known bits to TypeInt/Long [v30] In-Reply-To: References: Message-ID: > Hi, > > This patch adds unsigned bounds and known bits constraints to TypeInt and TypeLong. This opens more transformation opportunities in an elegant manner as well as helps avoid some ad-hoc rules in Hotspot. > > In general, a `TypeInt/Long` represents a set of values `x` that satisfies: `x s>= lo && x s<= hi && x u>= ulo && x u<= uhi && (x & zeros) == 0 && (x & ones) == ones`. These constraints are not independent, e.g. an int that lies in [0, 3] in signed domain must also lie in [0, 3] in unsigned domain and have all bits but the last 2 being unset. As a result, we must canonicalize the constraints (tighten the constraints so that they are optimal) before constructing a `TypeInt/Long` instance. > > This is extracted from #15440 , node value transformations are left for later PRs. I have also added unit tests to verify the soundness of constraint normalization. > > Please kindly review, thanks a lot. > > Testing > > - [x] GHA > - [x] Linux x64, tier 1-4 Quan Anh Mai has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 40 commits: - Merge branch 'master' into unsignedbounds - move try_cast to Type - Merge branch 'master' into unsignedbounds - build failure - build failures - whitespace - further reviews - Merge branch 'master' into unsignedbounds - Merge branch 'master' into unsignedbounds - address reviews - ... 
and 30 more: https://git.openjdk.org/jdk/compare/6b022bb6...85acf6ef ------------- Changes: https://git.openjdk.org/jdk/pull/17508/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=17508&range=29 Stats: 2002 lines in 10 files changed: 1446 ins; 325 del; 231 mod Patch: https://git.openjdk.org/jdk/pull/17508.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/17508/head:pull/17508 PR: https://git.openjdk.org/jdk/pull/17508 From aph at openjdk.org Sun Dec 15 13:11:37 2024 From: aph at openjdk.org (Andrew Haley) Date: Sun, 15 Dec 2024 13:11:37 GMT Subject: RFR: 8336356: [s390x] preserve Vector Register before using for string compress / expand [v3] In-Reply-To: <2IChmmSYUxMnxDEr_GyA4IK0k4KT4nZ0p6K5mFRi1ZU=.4900a17e-30dd-4bc3-9c47-18fbcc510e94@github.com> References: <2IChmmSYUxMnxDEr_GyA4IK0k4KT4nZ0p6K5mFRi1ZU=.4900a17e-30dd-4bc3-9c47-18fbcc510e94@github.com> Message-ID: <0PxMtC9386ql7tJJSn4ASrOUiDgze53x3jc-R-ddw_c=.e36ebb4f-f205-451c-8876-a47dd467fa0e@github.com> On Sun, 15 Dec 2024 02:51:21 GMT, Amit Kumar wrote: >> This PR adds `TEMP` effect for the vector register, allotted by register allocator, used in the string compress/expand intrinsic. Also it enabled the Vector computation part of those intrinsics which was disabled by https://github.com/openjdk/jdk/pull/18162 >> >> tier1 test, which also includes string intrinsic tests `compiler/intrinsics/string/` are clean. No regression seen. > > Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: > > indentation fix in s390.ad file That looks right. It's extremely difficult to be really sure that no mistakes have been made in the translation, but I think it's better than it was. ------------- Marked as reviewed by aph (Reviewer). 
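As a rough illustration of the `TypeInt/Long` constraint set quoted in the 8315066 description above, the following Java sketch checks whether an `int` satisfies signed bounds, unsigned bounds, and known bits at the same time. The class `IntConstraintsDemo` and the method `contains` are hypothetical names made up for this example; they do not correspond to the HotSpot C++ classes, and the sketch only mirrors the membership predicate, not the canonicalization step the patch performs.

```java
// Hypothetical illustration only; this is not HotSpot code.
final class IntConstraintsDemo {
    // lo/hi: signed bounds; ulo/uhi: unsigned bounds (as int bit patterns);
    // zeros: bits known to be 0; ones: bits known to be 1.
    static boolean contains(int x, int lo, int hi, int ulo, int uhi, int zeros, int ones) {
        return x >= lo && x <= hi
            && Integer.compareUnsigned(x, ulo) >= 0
            && Integer.compareUnsigned(x, uhi) <= 0
            && (x & zeros) == 0
            && (x & ones) == ones;
    }

    public static void main(String[] args) {
        // The example from the description: signed [0, 3] goes together with
        // unsigned [0, 3] and all bits except the low two being unset.
        System.out.println(contains(2, 0, 3, 0, 3, ~3, 0)); // true
        System.out.println(contains(4, 0, 3, 0, 3, ~3, 0)); // false

        // An unsigned range that straddles the signed MAX/MIN boundary cannot
        // be expressed with signed bounds alone.
        int ulo = 0x7FFFFFF0;
        int uhi = 0x80000010;
        System.out.println(contains(Integer.MIN_VALUE,
            Integer.MIN_VALUE, Integer.MAX_VALUE, ulo, uhi, 0, 0)); // true
        System.out.println(contains(0,
            Integer.MIN_VALUE, Integer.MAX_VALUE, ulo, uhi, 0, 0)); // false
    }
}
```

The second pair of calls hints at why the constraints are kept side by side: each one can exclude values that the others cannot.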
PR Review: https://git.openjdk.org/jdk/pull/22354#pullrequestreview-2504553592 From amitkumar at openjdk.org Sun Dec 15 16:02:40 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Sun, 15 Dec 2024 16:02:40 GMT Subject: RFR: 8336356: [s390x] preserve Vector Register before using for string compress / expand [v3] In-Reply-To: <0PxMtC9386ql7tJJSn4ASrOUiDgze53x3jc-R-ddw_c=.e36ebb4f-f205-451c-8876-a47dd467fa0e@github.com> References: <2IChmmSYUxMnxDEr_GyA4IK0k4KT4nZ0p6K5mFRi1ZU=.4900a17e-30dd-4bc3-9c47-18fbcc510e94@github.com> <0PxMtC9386ql7tJJSn4ASrOUiDgze53x3jc-R-ddw_c=.e36ebb4f-f205-451c-8876-a47dd467fa0e@github.com> Message-ID: On Sun, 15 Dec 2024 13:09:09 GMT, Andrew Haley wrote: > That looks right. It's extremely difficult to be really sure that no mistakes have been made in the translation, but I think it's better than it was. this statement is true for half of my ports ;-) Thanks for the reviews!! ------------- PR Comment: https://git.openjdk.org/jdk/pull/22354#issuecomment-2543928291 From jbhateja at openjdk.org Sun Dec 15 17:59:51 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Sun, 15 Dec 2024 17:59:51 GMT Subject: Withdrawn: 8342103: C2 compiler support for Float16 type and associated operations In-Reply-To: References: Message-ID: On Mon, 14 Oct 2024 11:40:01 GMT, Jatin Bhateja wrote: > Hi All, > > This patch adds C2 compiler support for various Float16 operations added by [PR#22128](https://github.com/openjdk/jdk/pull/22128) > > Following is the summary of changes included with this patch:- > > 1. Detection of various Float16 operations through inline expansion or pattern folding idealizations. > 2. Float16 operations like add, sub, mul, div, max, and min are inferred through pattern folding idealization. > 3. Float16 SQRT and FMA operation are inferred through inline expansion and their corresponding entry points are defined in the newly added Float16Math class. 
> - These intrinsics receive unwrapped short arguments encoding IEEE 754 binary16 values. > 5. New specialized IR nodes for Float16 operations, associated idealizations, and constant folding routines. > 6. New Ideal type for constant and non-constant Float16 IR nodes. Please refer to [FAQs](https://github.com/openjdk/jdk/pull/21490#issuecomment-2482867818) for more details. > 7. Since Float16 uses short as its storage type, raw FP16 values are always loaded into general-purpose registers, but FP16 ISA instructions generally operate on floating-point registers; the compiler therefore injects reinterpretation IR before and after Float16 operation nodes to move the short value to a floating-point register and vice versa. > 8. New idealization routines to optimize redundant reinterpretation chains. HF2S + S2HF = HF > 9. Auto-vectorization of newly supported scalar operations. > 10. X86 and AARCH64 backend implementation for all supported intrinsics. > 11. Functional and Performance validation tests. > > Kindly review and share your feedback. > > Best Regards, > Jatin This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.org/jdk/pull/21490 From jbhateja at openjdk.org Sun Dec 15 18:19:35 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Sun, 15 Dec 2024 18:19:35 GMT Subject: RFR: 8342103: C2 compiler support for Float16 type and associated scalar operations Message-ID: Hi All, This patch adds C2 compiler support for various Float16 operations added by [PR#22128](https://github.com/openjdk/jdk/pull/22128) Following is the summary of changes included with this patch:- 1. Detection of various Float16 operations through inline expansion or pattern folding idealizations. 2. Float16 operations like add, sub, mul, div, max, and min are inferred through pattern folding idealization. 3.
Float16 SQRT and FMA operations are inferred through inline expansion and their corresponding entry points are defined in the newly added Float16Math class. - These intrinsics receive unwrapped short arguments encoding IEEE 754 binary16 values. 5. New specialized IR nodes for Float16 operations, associated idealizations, and constant folding routines. 6. New Ideal type for constant and non-constant Float16 IR nodes. Please refer to [FAQs](https://github.com/openjdk/jdk/pull/22754#issuecomment-2543982577) for more details. 7. Since Float16 uses short as its storage type, raw FP16 values are always loaded into general-purpose registers, but FP16 ISA instructions generally operate on floating-point registers; the compiler therefore injects reinterpretation IR before and after Float16 operation nodes to move the short value to a floating-point register and vice versa. 8. New idealization routines to optimize redundant reinterpretation chains. HF2S + S2HF = HF 9. Auto-vectorization of newly supported scalar operations. 10. X86 and AARCH64 backend implementation for all supported intrinsics. 11. Functional and Performance validation tests. Kindly review and share your feedback. Best Regards, Jatin ------------- Commit messages: - C2 compiler support for float16 scalar operations.
Changes: https://git.openjdk.org/jdk/pull/22754/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=22754&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8342103 Stats: 2633 lines in 54 files changed: 2589 ins; 0 del; 44 mod Patch: https://git.openjdk.org/jdk/pull/22754.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22754/head:pull/22754 PR: https://git.openjdk.org/jdk/pull/22754 From jbhateja at openjdk.org Sun Dec 15 18:19:35 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Sun, 15 Dec 2024 18:19:35 GMT Subject: RFR: 8342103: C2 compiler support for Float16 type and associated scalar operations In-Reply-To: References: Message-ID: <5s62x13e3X2XmGxSwxY6zrtlSLVS7Y_uDdTHJpxNz1U=.2d5cf677-7af2-4483-8ff1-1f91fb26a5da@github.com> On Sun, 15 Dec 2024 18:05:02 GMT, Jatin Bhateja wrote: > Hi All, > > This patch adds C2 compiler support for various Float16 operations added by [PR#22128](https://github.com/openjdk/jdk/pull/22128) > > Following is the summary of changes included with this patch:- > > 1. Detection of various Float16 operations through inline expansion or pattern folding idealizations. > 2. Float16 operations like add, sub, mul, div, max, and min are inferred through pattern folding idealization. > 3. Float16 SQRT and FMA operation are inferred through inline expansion and their corresponding entry points are defined in the newly added Float16Math class. > - These intrinsics receive unwrapped short arguments encoding IEEE 754 binary16 values. > 5. New specialized IR nodes for Float16 operations, associated idealizations, and constant folding routines. > 6. New Ideal type for constant and non-constant Float16 IR nodes. Please refer to [FAQs ](https://github.com/openjdk/jdk/pull/22754#issuecomment-2543982577)for more details. > 7. 
Since Float16 uses short as its storage type, raw FP16 values are always loaded into general-purpose registers, but FP16 ISA instructions generally operate on floating-point registers; the compiler therefore injects reinterpretation IR before and after Float16 operation nodes to move the short value to a floating-point register and vice versa. > 8. New idealization routines to optimize redundant reinterpretation chains. HF2S + S2HF = HF > 9. Auto-vectorization of newly supported scalar operations. > 10. X86 and AARCH64 backend implementation for all supported intrinsics. > 11. Functional and Performance validation tests. > > Kindly review and share your feedback. > > Best Regards, > Jatin Some FAQs on the newly added ideal type for half-float IR nodes:- Q. Why do we not use the existing TypeInt::SHORT instead of creating a new TypeH type? A. The newly defined half-float type, TypeH, is special in that its basic type is T_SHORT while its ideal type is RegF. Thus, the C2 type system views its associated IR node as a 16-bit short value while the register allocator assigns it a floating-point register. Q. Problem with ConF? A. During auto-vectorization, ConF replication constrains the operational vector lane count to half of what can otherwise be used for regular Float16 operations, i.e. only 16 floats can be accommodated in a 512-bit vector, thereby limiting the lane count of vectors in its use-def chain. One possible way to address this is a kludge in the auto-vectorizer that casts such constants to 16-bit constants by analyzing their context. The newly defined Float16 constant nodes 'ConH' are inherently 16-bit encoded IEEE 754 FP16 values and can be efficiently packed to leverage the full target vector width. All Float16 IR nodes now carry the newly defined Type::HALF_FLOAT type instead of Type::FLOAT, so we no longer need special handling in the auto-vectorizer to prune their container type to short.
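To make the storage argument in the FAQ above concrete, here is a minimal Java sketch, assuming nothing beyond the IEEE 754 binary16 layout (1 sign bit, 5 exponent bits, 10 fraction bits). It decodes a half-float held in a `short` and computes how many lanes of a given element width fit into a 512-bit vector. The class `HalfFloatBitsDemo` and its methods are hypothetical and are not part of the patch or of the `Float16` API.

```java
// Hypothetical illustration only; names are not taken from the patch.
final class HalfFloatBitsDemo {
    // Decode an IEEE 754 binary16 bit pattern stored in a short.
    static float halfToFloat(short h) {
        int bits = h & 0xFFFF;
        int sign = bits >>> 15;
        int exp = (bits >>> 10) & 0x1F;
        int frac = bits & 0x3FF;
        float magnitude;
        if (exp == 0) {
            // Zero or subnormal: frac * 2^-24
            magnitude = frac * 0x1p-24f;
        } else if (exp == 0x1F) {
            // All-ones exponent: infinity or NaN
            magnitude = (frac == 0) ? Float.POSITIVE_INFINITY : Float.NaN;
        } else {
            // Normal: (1 + frac / 2^10) * 2^(exp - 15)
            magnitude = Math.scalb(1.0f + frac / 1024.0f, exp - 15);
        }
        return sign == 0 ? magnitude : -magnitude;
    }

    // Number of elements of the given width that fit into one vector.
    static int lanes(int vectorBits, int elementBits) {
        return vectorBits / elementBits;
    }

    public static void main(String[] args) {
        System.out.println(halfToFloat((short) 0x3C00)); // 1.0
        System.out.println(halfToFloat((short) 0xC000)); // -2.0
        System.out.println(lanes(512, Float.SIZE));      // 16
        System.out.println(lanes(512, Short.SIZE));      // 32
    }
}
```

The last two lines correspond to the lane-count point in the FAQ: a constant kept as a 32-bit float limits a 512-bit vector to 16 lanes, whereas a 16-bit encoded constant allows 32.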
------------- PR Comment: https://git.openjdk.org/jdk/pull/22754#issuecomment-2543982577 From dholmes at openjdk.org Mon Dec 16 00:50:41 2024 From: dholmes at openjdk.org (David Holmes) Date: Mon, 16 Dec 2024 00:50:41 GMT Subject: RFR: 8344068: Windows x86-64: Out of CodeBuffer space when generating final stubs In-Reply-To: References: Message-ID: On Tue, 3 Dec 2024 14:38:39 GMT, Andrew Haley wrote: > I've had a look at the difference between an Intel AVX-512 machine (which does run out of memory) and an AMD machine, and it seems to be that the AVX-512 stubs required by Windows really do take up a lot of space. This should be sufficient. @theRealAph this needs backporting to JDK 24. ------------- PR Comment: https://git.openjdk.org/jdk/pull/22516#issuecomment-2544252538 From jkarthikeyan at openjdk.org Mon Dec 16 02:23:30 2024 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Mon, 16 Dec 2024 02:23:30 GMT Subject: RFR: 8342662: C2: Add new phase for backend-specific lowering [v6] In-Reply-To: References: Message-ID: > Hi all, > This patch adds a new pass to consolidate lowering of complex backend-specific code patterns, such as `MacroLogicV` and the optimization proposed by #21244. Moving these optimizations to backend code can simplify shared code, while also making it easier to develop more in-depth optimizations. The linked bug has an example of a new optimization this could enable. The new phase does GVN to de-duplicate nodes and calls nodes' `Value()` method, but it does not call `Identity()` or `Ideal()` to avoid undoing any changes done during lowering. It also reuses the IGVN worklist to avoid needing to re-create the notification mechanism. > > In this PR only the skeleton code for the pass is added, moving `MacroLogicV` to this system will be done separately in a future patch. Tier 1 tests pass on my linux x64 machine. Feedback on this patch would be greatly appreciated! 
Jasmine Karthikeyan has updated the pull request incrementally with one additional commit since the last revision: Implement apply_identity ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21599/files - new: https://git.openjdk.org/jdk/pull/21599/files/2ee3fcfd..13acc8fc Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21599&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21599&range=04-05 Stats: 14 lines in 2 files changed: 12 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/21599.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21599/head:pull/21599 PR: https://git.openjdk.org/jdk/pull/21599 From chagedorn at openjdk.org Mon Dec 16 06:24:50 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 16 Dec 2024 06:24:50 GMT Subject: RFR: 8344171: Clone and initialize Assertion Predicates in order instead of in reverse-order [v5] In-Reply-To: References: <7mVxXGT2OSC-v_LHrVEQmlvs0NrJxRBf3iRrMyXCAnQ=.8fc644ed-bee0-4c98-872b-3c7dc763a1ec@github.com> Message-ID: On Thu, 28 Nov 2024 09:00:25 GMT, Christian Hagedorn wrote: >> This patch changes the order in which we clone and initialize Assertion Predicates from "reverse-order" to "in-order". >> >> #### Current State: Mostly "reverse-order" for Assertion Predicates >> We are currently cloning and initializing Assertion Predicates in reverse-order out of convenience and simplicity for most of the loop splitting optimizations - except for Loop Unswitching (see next section). This means that we do the following:
>>
>>                                 old target loop entry
>>                                           |
>>          x                    Cloned Template Assertion
>>          |                           Predicate 2
>> Template Assertion                        |
>>     Predicate 1                 Initialized Assertion
>>          |            ==>            Predicate 2
>> Template Assertion                        |
>>     Predicate 2               Cloned Template Assertion
>>          |                           Predicate 1
>>     source loop                           |
>>                                 Initialized Assertion
>>                                      Predicate 1
>>                                           |
>>                                      target loop
>>
>> I don't think this is wrong but still kinda unexpected when trying to reason about a graph.
But now with the recent refactorings, I think it's easy to change this to an in-order processing:
>>
>>                                 old target loop entry
>>                                           |
>>          x                    Cloned Template Assertion
>>          |                           Predicate 1
>> Template Assertion                        |
>>     Predicate 1                 Initialized Assertion
>>          |            ==>            Predicate 1
>> Template Assertion                        |
>>     Predicate 2               Cloned Template Assertion
>>          |                           Predicate 2
>>     source loop                           |
>>                                 Initialized Assertion
>>                                      Predicate 2
>>                                           |
>>                                      target loop
>>
>> This will also align all cloning/initializing of Assertion Predicates to the same order which was not the case before: Loop Unswitching already had an in... > > Christian Hagedorn has updated the pull request incrementally with two additional commits since the last revision: > > - Revert "8344035: Replace predicate walking code in Loop Unswitching with a predicate visitor" > > This reverts commit 550933659a8021131d9d1424fc6ff77b51745cbe. > - 8344035: Replace predicate walking code in Loop Unswitching with a predicate visitor Ran some testing again with latest master which looked good. ------------- PR Comment: https://git.openjdk.org/jdk/pull/22275#issuecomment-2544701529 From chagedorn at openjdk.org Mon Dec 16 06:24:50 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 16 Dec 2024 06:24:50 GMT Subject: Integrated: 8344171: Clone and initialize Assertion Predicates in order instead of in reverse-order In-Reply-To: <7mVxXGT2OSC-v_LHrVEQmlvs0NrJxRBf3iRrMyXCAnQ=.8fc644ed-bee0-4c98-872b-3c7dc763a1ec@github.com> References: <7mVxXGT2OSC-v_LHrVEQmlvs0NrJxRBf3iRrMyXCAnQ=.8fc644ed-bee0-4c98-872b-3c7dc763a1ec@github.com> Message-ID: <2qUTMXVwBHkg79ud1wfrfQ-gmrahdBUzwmawvwLFbdM=.a5404162-34d0-413a-8138-d198dd1638c8@github.com> On Wed, 20 Nov 2024 12:40:58 GMT, Christian Hagedorn wrote: > This patch changes the order in which we clone and initialize Assertion Predicates from "reverse-order" to "in-order".
> > #### Current State: Mostly "reverse-order" for Assertion Predicates > We are currently cloning and initializing Assertion Predicates in reverse-order out of convenience and simplicity for most of the loop splitting optimizations - except for Loop Unswitching (see next section). This means that we do the following:
>
>                                 old target loop entry
>                                           |
>          x                    Cloned Template Assertion
>          |                           Predicate 2
> Template Assertion                        |
>     Predicate 1                 Initialized Assertion
>          |            ==>            Predicate 2
> Template Assertion                        |
>     Predicate 2               Cloned Template Assertion
>          |                           Predicate 1
>     source loop                           |
>                                 Initialized Assertion
>                                      Predicate 1
>                                           |
>                                      target loop
>
> I don't think this is wrong but still kinda unexpected when trying to reason about a graph. But now with the recent refactorings, I think it's easy to change this to an in-order processing:
>
>                                 old target loop entry
>                                           |
>          x                    Cloned Template Assertion
>          |                           Predicate 1
> Template Assertion                        |
>     Predicate 1                 Initialized Assertion
>          |            ==>            Predicate 1
> Template Assertion                        |
>     Predicate 2               Cloned Template Assertion
>          |                           Predicate 2
>     source loop                           |
>                                 Initialized Assertion
>                                      Predicate 2
>                                           |
>                                      target loop
>
> This will also align all cloning/initializing of Assertion Predicates to the same order which was not the case before: Loop Unswitching already had an in-order cloning. > > #### Why Does Loop Unswitching Use In-Order? > The main reason wa... This pull request has now been integrated.
Changeset: 3518b4bd Author: Christian Hagedorn URL: https://git.openjdk.org/jdk/commit/3518b4bd205f67a356bc6b531c0622ac1d97a962 Stats: 111 lines in 4 files changed: 65 ins; 33 del; 13 mod 8344171: Clone and initialize Assertion Predicates in order instead of in reverse-order Reviewed-by: epeter, kvn ------------- PR: https://git.openjdk.org/jdk/pull/22275 From epeter at openjdk.org Mon Dec 16 06:54:51 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 16 Dec 2024 06:54:51 GMT Subject: RFR: 8343685: C2 SuperWord: refactor VPointer with MemPointer [v3] In-Reply-To: References: Message-ID: On Fri, 13 Dec 2024 13:07:09 GMT, Roland Westrelin wrote: >> Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 114 commits: >> >> - manual merge >> - fix printing >> - rename >> - fix up print >> - add TestEquivalentInvariants.java >> - improve documentation >> - hide parser via delegation >> - Merge branch 'master' into JDK-8343685-VPointer-MemPointer >> - make sort stable >> - some comment and naming improvements >> - ... and 104 more: https://git.openjdk.org/jdk/compare/31ceec7c...4b0504d0 > > src/hotspot/share/opto/mempointer.cpp line 253: > >> 251: // (2) LoadL from field jdk.internal.foreign.NativeMemorySegmentImpl.min >> 252: // Holds the address() of a native memory segment. >> 253: bool MemPointerParser::is_native_memory_base_candidate(Node* n) { > > Does this really belong to the move from VPointer to MemPointer? AFAIU, it's an extra optimization on top of the move. Should be done as a separate PR? I had thought about splitting this out, but I don't see a great way of doing that. Let me explain. Before this patch, `MemPointer` parses through `CastX2P` in all cases, but `VPointer` never parses through `CastX2P`. The current patch strikes a compromise: Only parse through `CastX2P` if we have a `MemorySegment` access.
Let me explore some alternatives here:
- Do a separate patch, and make the change in `MemPointer` first, restricting parsing through `CastX2P` to MemorySegment cases. This could be done, but would not make sense on its own yet, as `MemPointer` is perfectly happy with parsing through `CastX2P` in all cases.
- First replace `VPointer` parsing with `MemPointer` logic. But then the question is what I do with the parsing through `CastX2P`.
  - Never parse through CastX2P: I would have to temporarily introduce some flag that is enabled for MergeStores, but disabled for VPointer. That is extra code you would have to review as well.
  - Always parse through CastX2P: that would make it impossible to find a base in non-MemorySegment cases of native memory access. There are some examples in `test/hotspot/jtreg/compiler/c2/irTests/TestVectorizationMismatchedAccess.java`. I would parse through `CastX2P`, and then it is not clear what would be a suitable "base", which I now need for `VPointer` (MergeStores did not need to know about bases).

TLDR: I'm not sure how to split this out without creating regressions, or creating unmotivated/untested code in the meantime. I also think that this code is relatively contained: it's all in `sub_expression_has_native_base_candidate` and `is_native_memory_base_candidate`. I was also thinking of doing some refactoring of `mempointer.cpp/hpp` separately first - but again that would not be very well motivated: for example if I add the "Base" logic first separately, then it has no use. It will only be used by `VPointer`. And these changes are very well contained to those files, so if you just look at these files first, and only then at the other files, it should be relatively straightforward. Let me know if you have a good idea though.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21926#discussion_r1886266372 From epeter at openjdk.org Mon Dec 16 06:58:43 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 16 Dec 2024 06:58:43 GMT Subject: RFR: 8343685: C2 SuperWord: refactor VPointer with MemPointer [v3] In-Reply-To: References: Message-ID: On Fri, 13 Dec 2024 13:07:32 GMT, Roland Westrelin wrote: >> Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 114 commits: >> >> - manual merge >> - fix printing >> - rename >> - fix up print >> - add TestEquivalentInvariants.java >> - improve documentation >> - hide parser via delegation >> - Merge branch 'master' into JDK-8343685-VPointer-MemPointer >> - make sort stable >> - some comment and naming improvements >> - ... and 104 more: https://git.openjdk.org/jdk/compare/31ceec7c...4b0504d0 > > This is tricky to review but looks reasonable to me. @rwestrel thanks for having a first look. I responded to your question about splitting in a comment. Please let me know what you think :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/21926#issuecomment-2544748974 From epeter at openjdk.org Mon Dec 16 07:24:36 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 16 Dec 2024 07:24:36 GMT Subject: RFR: 8342103: C2 compiler support for Float16 type and associated scalar operations In-Reply-To: References: Message-ID: On Sun, 15 Dec 2024 18:05:02 GMT, Jatin Bhateja wrote: > Hi All, > > This patch adds C2 compiler support for various Float16 operations added by [PR#22128](https://github.com/openjdk/jdk/pull/22128) > > Following is the summary of changes included with this patch:- > > 1. Detection of various Float16 operations through inline expansion or pattern folding idealizations. > 2. Float16 operations like add, sub, mul, div, max, and min are inferred through pattern folding idealization. > 3. 
Float16 SQRT and FMA operation are inferred through inline expansion and their corresponding entry points are defined in the newly added Float16Math class. > - These intrinsics receive unwrapped short arguments encoding IEEE 754 binary16 values. > 5. New specialized IR nodes for Float16 operations, associated idealizations, and constant folding routines. > 6. New Ideal type for constant and non-constant Float16 IR nodes. Please refer to [FAQs ](https://github.com/openjdk/jdk/pull/22754#issuecomment-2543982577)for more details. > 7. Since Float16 uses short as its storage type, hence raw FP16 values are always loaded into general purpose register, but FP16 ISA generally operates over floating point registers, thus the compiler injects reinterpretation IR before and after Float16 operation nodes to move short value to floating point register and vice versa. > 8. New idealization routines to optimize redundant reinterpretation chains. HF2S + S2HF = HF > 9. X86 backend implementation for all supported intrinsics. > 10. Functional and Performance validation tests. > > Kindly review the patch and share your feedback. > > Best Regards, > Jatin Can you quickly summarize what tests you have, and what they test? test/hotspot/jtreg/compiler/vectorization/TestFloat16VectorConvChain.java line 49: > 47: counts = {IRNode.VECTOR_CAST_HF2F, IRNode.VECTOR_SIZE_ANY, ">= 1", IRNode.VECTOR_CAST_F2HF, IRNode.VECTOR_SIZE_ANY, " >= 1"}) > 48: @IR(applyIfCPUFeatureAnd = {"avx512_fp16", "false", "zvfh", "true"}, > 49: counts = {IRNode.VECTOR_CAST_HF2F, IRNode.VECTOR_SIZE_ANY, ">= 1", IRNode.VECTOR_CAST_F2HF, IRNode.VECTOR_SIZE_ANY, " >= 1"}) Looks like this is having vector changes? And this is pre-existing: but why are we using `VECTOR_SIZE_ANY` here? Can we not know the vector size? Maybe we can introduce a new tag `max_float16` or `max_hf`. 
And do something like this: `IRNode.VECTOR_SIZE + "min(max_float, max_hf)", "> 0"` The downside with using `ANY` is that the exact size is not tested, and that might mean that the size is much smaller than ideal. ------------- PR Review: https://git.openjdk.org/jdk/pull/22754#pullrequestreview-2505332519 PR Review Comment: https://git.openjdk.org/jdk/pull/22754#discussion_r1886290546 From jbhateja at openjdk.org Mon Dec 16 08:16:39 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Mon, 16 Dec 2024 08:16:39 GMT Subject: RFR: 8346174: UMAX/UMIN are missing from XXXVector::reductionOperations [v2] In-Reply-To: References: Message-ID: On Fri, 13 Dec 2024 21:25:48 GMT, Paul Sandoz wrote: >> Add functional support for unsigned min/max reductions on vectors. >> >> We also need to ensure that the `reductionCoerced` intrinsic bails out when there is no reduction operation for the lanewise operation. When intrinsic support is added for integral vectors this will still be the case for floating point vectors. > > Paul Sandoz has updated the pull request incrementally with one additional commit since the last revision: > > Merge checks LGTM. ------------- Marked as reviewed by jbhateja (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/22741#pullrequestreview-2505433143 From jbhateja at openjdk.org Mon Dec 16 08:20:35 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Mon, 16 Dec 2024 08:20:35 GMT Subject: RFR: 8346174: UMAX/UMIN are missing from XXXVector::reductionOperations [v2] In-Reply-To: References: Message-ID: <95RUi4ghB4_18yEyjmoYqa_L7RvzoJiViabNggatO_Y=.b824b8bf-f32b-454a-bd15-552d110093f5@github.com> On Mon, 16 Dec 2024 08:13:48 GMT, Jatin Bhateja wrote: >> Paul Sandoz has updated the pull request incrementally with one additional commit since the last revision: >> >> Merge checks > > LGTM. > @jatin-bhateja You may want to implement reduction intrinsics for these operations. 
Thanks @merykitty assigned this to myself https://bugs.openjdk.org/browse/JDK-8346256 ------------- PR Comment: https://git.openjdk.org/jdk/pull/22741#issuecomment-2544885973 From jbhateja at openjdk.org Mon Dec 16 08:35:31 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Mon, 16 Dec 2024 08:35:31 GMT Subject: RFR: 8342103: C2 compiler support for Float16 type and associated scalar operations [v2] In-Reply-To: References: Message-ID: > Hi All, > > This patch adds C2 compiler support for various Float16 operations added by [PR#22128](https://github.com/openjdk/jdk/pull/22128) > > Following is the summary of changes included with this patch:- > > 1. Detection of various Float16 operations through inline expansion or pattern folding idealizations. > 2. Float16 operations like add, sub, mul, div, max, and min are inferred through pattern folding idealization. > 3. Float16 SQRT and FMA operation are inferred through inline expansion and their corresponding entry points are defined in the newly added Float16Math class. > - These intrinsics receive unwrapped short arguments encoding IEEE 754 binary16 values. > 5. New specialized IR nodes for Float16 operations, associated idealizations, and constant folding routines. > 6. New Ideal type for constant and non-constant Float16 IR nodes. Please refer to [FAQs ](https://github.com/openjdk/jdk/pull/22754#issuecomment-2543982577)for more details. > 7. Since Float16 uses short as its storage type, hence raw FP16 values are always loaded into general purpose register, but FP16 ISA generally operates over floating point registers, thus the compiler injects reinterpretation IR before and after Float16 operation nodes to move short value to floating point register and vice versa. > 8. New idealization routines to optimize redundant reinterpretation chains. HF2S + S2HF = HF > 9. X86 backend implementation for all supported intrinsics. > 10. Functional and Performance validation tests. 
> > Kindly review the patch and share your feedback. > > Best Regards, > Jatin Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Adding missed check in container type detection. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22754/files - new: https://git.openjdk.org/jdk/pull/22754/files/c215eac7..7cb694fa Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22754&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22754&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/22754.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22754/head:pull/22754 PR: https://git.openjdk.org/jdk/pull/22754 From jbhateja at openjdk.org Mon Dec 16 08:35:33 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Mon, 16 Dec 2024 08:35:33 GMT Subject: RFR: 8342103: C2 compiler support for Float16 type and associated scalar operations [v2] In-Reply-To: References: Message-ID: On Mon, 16 Dec 2024 07:22:04 GMT, Emanuel Peter wrote: > Can you quickly summarize what tests you have, and what they test? Patch includes functional and performance tests, as per your suggestions IR framework-based tests now cover various special cases for constant folding transformation. Let me know if you see any gaps. > test/hotspot/jtreg/compiler/vectorization/TestFloat16VectorConvChain.java line 49: > >> 47: counts = {IRNode.VECTOR_CAST_HF2F, IRNode.VECTOR_SIZE_ANY, ">= 1", IRNode.VECTOR_CAST_F2HF, IRNode.VECTOR_SIZE_ANY, " >= 1"}) >> 48: @IR(applyIfCPUFeatureAnd = {"avx512_fp16", "false", "zvfh", "true"}, >> 49: counts = {IRNode.VECTOR_CAST_HF2F, IRNode.VECTOR_SIZE_ANY, ">= 1", IRNode.VECTOR_CAST_F2HF, IRNode.VECTOR_SIZE_ANY, " >= 1"}) > > Looks like this is having vector changes? > And this is pre-existing: but why are we using `VECTOR_SIZE_ANY` here? Can we not know the vector size? Maybe we can introduce a new tag `max_float16` or `max_hf`. 
And do something like this: > `IRNode.VECTOR_SIZE + "min(max_float, max_hf)", "> 0"` > > The downside with using `ANY` is that the exact size is not tested, and that might mean that the size is much smaller than ideal. Hi @eme64 , Test modification looks ok to me, we intend to trigger these IR rules on non AVX512-FP16 targets. On AVX512-FP16 target compiler will infer scalar float16 add operation which will not get auto-vectorized. ------------- PR Comment: https://git.openjdk.org/jdk/pull/22754#issuecomment-2544914959 PR Review Comment: https://git.openjdk.org/jdk/pull/22754#discussion_r1886373922 From epeter at openjdk.org Mon Dec 16 09:06:36 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 16 Dec 2024 09:06:36 GMT Subject: RFR: 8342103: C2 compiler support for Float16 type and associated scalar operations [v2] In-Reply-To: References: Message-ID: On Mon, 16 Dec 2024 08:32:32 GMT, Jatin Bhateja wrote: > > Can you quickly summarize what tests you have, and what they test? > > Patch includes functional and performance tests, as per your suggestions IR framework-based tests now cover various special cases for constant folding transformation. Let me know if you see any gaps. I was hoping that you could make a list of all optimizations that are included here, and tell me where the tests are for it. That would significantly reduce the review time on my end. Otherwise I have to correlate everything myself, and that will take me hours. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/22754#issuecomment-2544992852 From chagedorn at openjdk.org Mon Dec 16 09:24:47 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Mon, 16 Dec 2024 09:24:47 GMT Subject: RFR: 8332827: [REDO] C2: crash in compiled code because of dependency on removed range check CastIIs [v3] In-Reply-To: <8xV78YdPMP_gGozlp_8CIFB0JRjYU7RuqtsJOE72L8c=.f0816dd7-13f1-4a33-96a9-5c6d8d27ccfa@github.com> References: <8xV78YdPMP_gGozlp_8CIFB0JRjYU7RuqtsJOE72L8c=.f0816dd7-13f1-4a33-96a9-5c6d8d27ccfa@github.com> Message-ID: On Thu, 12 Dec 2024 08:57:07 GMT, Roland Westrelin wrote: >> The failures that caused the backout were due to a bug in: >> >> `find_or_make_integer_cast()` >> >> which caused the `_range_check_dependency` field's value of the >> existing cast node to not be set in the new cast node. I re-ran some >> testing with this fixed and current jdk repo and found that a few >> vectorization tests fail now because the patch pushes range check >> `CastII` nodes through `AddI`/`SubI`. To fix this, I delayed that >> transformation to after loop opts. > > Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: > > review Looks good to me, too. Friendly ping to @eme64 about the performance testing results. src/hotspot/share/opto/compile.cpp line 3147: > 3145: DivModNode* divmod = DivModNode::make(n, bt, is_unsigned); > 3146: // If the divisor input for a Div (or Mod etc.) is not null, then the control input of the Div is set to null. > 3147: // It could be that the divisor input is found not null because its type is narrowed down by a CastII in the You mean not zero? Suggestion: // If the divisor input for a Div (or Mod etc.) is not zero, then the control input of the Div is set to null. 
// It could be that the divisor input is found not zero because its type is narrowed down by a CastII in the src/hotspot/share/opto/compile.cpp line 3441: > 3439: n->as_CastII()->remove_range_check_cast(this); > 3440: } > 3441: break; I guess it still works but you should move the `break` inside the braces: Suggestion: n->as_CastII()->remove_range_check_cast(this); break; } src/hotspot/share/opto/compile.hpp line 56: > 54: class CallGenerator; > 55: class CallStaticJavaNode; > 56: class CastIINode; Is this required? There is no additional change/use of `CastIINode` in this file. test/hotspot/jtreg/compiler/rangechecks/TestArrayAccessAboveRCAfterRCCastIIEliminated.java line 35: > 33: * -XX:CompileCommand=dontinline,TestArrayAccessAboveRCAfterRCCastIIEliminated::notInlined > 34: * -XX:+UnlockDiagnosticVMOptions -XX:+StressGCM TestArrayAccessAboveRCAfterRCCastIIEliminated > 35: * @run main/othervm TestArrayAccessAboveRCAfterRCCastIIEliminated Would `main` be sufficient if you run without flags? Suggestion: * @run main TestArrayAccessAboveRCAfterRCCastIIEliminated test/hotspot/jtreg/compiler/rangechecks/TestRangeCheckCastIISplitThruPhi.java line 30: > 28: * > 29: * @run main/othervm -XX:-TieredCompilation -XX:-UseOnStackReplacement -XX:-BackgroundCompilation TestRangeCheckCastIISplitThruPhi > 30: * @run main/othervm TestRangeCheckCastIISplitThruPhi Same here: Suggestion: * @run main TestRangeCheckCastIISplitThruPhi ------------- Marked as reviewed by chagedorn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/22568#pullrequestreview-2505483146 PR Review Comment: https://git.openjdk.org/jdk/pull/22568#discussion_r1886383838 PR Review Comment: https://git.openjdk.org/jdk/pull/22568#discussion_r1886397864 PR Review Comment: https://git.openjdk.org/jdk/pull/22568#discussion_r1886446579 PR Review Comment: https://git.openjdk.org/jdk/pull/22568#discussion_r1886456422 PR Review Comment: https://git.openjdk.org/jdk/pull/22568#discussion_r1886456893 From epeter at openjdk.org Mon Dec 16 09:47:43 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 16 Dec 2024 09:47:43 GMT Subject: RFR: 8332827: [REDO] C2: crash in compiled code because of dependency on removed range check CastIIs [v2] In-Reply-To: References: <_mrmBtTPF1ZViTaKy9d4L8wzIiCD7DqfWKAl1ErYjuo=.69dd3404-12a9-46d3-8a0f-8e2ed24e8951@github.com> Message-ID: On Fri, 13 Dec 2024 13:09:17 GMT, Roland Westrelin wrote: >>> Thanks for the updates, @rwestrel ! The fix looks reasonable. You can add an extra comment for the precedence edges for Div. I launched some testing now, feel free to ping me later. >> >> Thanks for reviewing this. Would it be possible to run performance testing again? Performance testing was run for the initial fix (that was backed out) and this one is slightly different. > >> @rwestrel Ok, performance testing is launched. Please ping me again next week! > > Thanks! @rwestrel the performance testing looks good - though it is always hard to be 100% sure. I see that this is now targeted for JDK24. Is that intentional or can we move it to JDK25? Of course we want to backport it eventually, but maybe we can give it a little more time to see if performance drops due to this over JDK25, and do the backport a little later. What do you think?
------------- PR Comment: https://git.openjdk.org/jdk/pull/22568#issuecomment-2545085912 From shade at openjdk.org Mon Dec 16 11:05:05 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 16 Dec 2024 11:05:05 GMT Subject: RFR: 8346264: "Total compile time" counter should include time spent in failing/bailout compiles Message-ID: Noticed this when looking through JMH compiler profiler results. Current `CompileBroker` counters that are fed into `CompilationMXBean.getTotalCompilationTime()` and JFR `CompilerStatistics` only record the time for successful compilations. If we take a while in compilation and then fail/bail, that time would not be accounted for. While this seems to be a long-standing behavior, there are problems with this: 1. This is not what "total" means. 2. This gives us a blind spot in measuring time taken in failing/bailing compilations. 3. It does not match well the Javadoc for `CompilationMXBean.getTotalCompilationTime()`: "Returns the approximate accumulated elapsed time (in milliseconds) spent in compilation." -- since the time spent in failing/bailing compilation is still time spent in compilation.
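For readers who want to look at the counter discussed above from the Java side, here is a minimal sketch of reading it through the standard `java.lang.management` API. It only shows how the value is observed; it does not show (and cannot distinguish) whether failed or bailed-out compilations are included, which depends on the JVM build. The class name `TotalCompileTime` and the helper `totalCompilationMillis` are made up for this illustration.

```java
import java.lang.management.CompilationMXBean;
import java.lang.management.ManagementFactory;

public class TotalCompileTime {
    // Hypothetical helper: returns the accumulated JIT compilation time in
    // milliseconds, or -1 if the JVM has no compiler or does not monitor it.
    static long totalCompilationMillis() {
        CompilationMXBean bean = ManagementFactory.getCompilationMXBean();
        if (bean == null || !bean.isCompilationTimeMonitoringSupported()) {
            return -1L;
        }
        return bean.getTotalCompilationTime();
    }

    public static void main(String[] args) {
        long before = totalCompilationMillis();

        // Some busy work so that the JIT has a reason to compile something.
        long sink = 0;
        for (int i = 0; i < 2_000_000; i++) {
            sink += Integer.toString(i).length();
        }

        long after = totalCompilationMillis();
        System.out.println("sink=" + sink);
        System.out.println("compile time before=" + before + " ms, after=" + after + " ms");
    }
}
```

The difference between the two readings is the compilation time accumulated while the loop ran, as reported by whatever accounting the running JVM implements.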
Additional testing: - [x] Linux x86_64 server release, `jdk/jfr java/lang/management` ------------- Commit messages: - Fix Changes: https://git.openjdk.org/jdk/pull/22760/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=22760&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8346264 Stats: 11 lines in 1 file changed: 5 ins; 6 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/22760.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22760/head:pull/22760 PR: https://git.openjdk.org/jdk/pull/22760 From shade at openjdk.org Mon Dec 16 11:25:50 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 16 Dec 2024 11:25:50 GMT Subject: RFR: 8346264: "Total compile time" counter should include time spent in failing/bailout compiles [v2] In-Reply-To: References: Message-ID: > Noticed this when looking through JMH compiler profiler results. > > Current `CompileBroker` counters that are fed into `CompilationMXBean.getTotalCompilationTime()` and JFR `CompilerStatistics` only record the time for successful compilations. If we take a while in compilation and then fail/bail, that time would not be accounted for. > > While this seems to be a long-standing behavior, there are problems with this: > 1. This is not what "total" means. > 2. This gives us a blind spot in measuring time taken in failing/bailing compilations. > 3. It does not match well the Javadoc for `CompilationMXBean.getTotalCompilationTime()`: "Returns the approximate accumulated elapsed time (in milliseconds) spent in compilation." -- since the time spent in failing/bailing compilation is still time spent in compilation.
> > Additional testing: > - [x] Linux x86_64 server release, `jdk/jfr java/lang/management` Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: Update comment ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22760/files - new: https://git.openjdk.org/jdk/pull/22760/files/0873477b..67a11988 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22760&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22760&range=00-01 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/22760.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22760/head:pull/22760 PR: https://git.openjdk.org/jdk/pull/22760 From rcastanedalo at openjdk.org Mon Dec 16 12:52:07 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Mon, 16 Dec 2024 12:52:07 GMT Subject: RFR: 8344951: Stabilize write barrier micro-benchmarks Message-ID: This changeset makes the `testArrayWriteBarrierFastPath*` micro-benchmarks in `WriteBarrier.java` more robust w.r.t. a few external factors, so that different GC barrier models can be compared more reliably. More specifically, it ensures that: - the main loop is never unrolled regardless of the selected GC algorithm, - no spilling occurs within the main loop for the final C2 compilation, and - the majority of the execution time is spent in the write operation and its associated barrier. The changes preserve the original G1 barrier test profile, i.e. practically no write crosses heap regions under the default G1 configuration. More sophisticated benchmarks may be added in the future that exercise different G1 barrier levels. Thanks to Thomas Schatzl for reporting and discussing issues in the micro-benchmarks. **Testing:** build and run the micro-benchmarks (linux-x64, linux-aarch64, windows-x64, macosx-x64, macosx-aarch64).
------------- Commit messages: - Allow inlining, get rid of reads - Add tentative testArrayWriteBarrierFastPathRealLarge version with a single, fixed new value - Avoid loads and range checks in null-writing micro-benchmarks - Do not inline array micro-benchmarks to avoid spilling in the innermost loop - Disable loop unrolling - Update copyright Changes: https://git.openjdk.org/jdk/pull/22763/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=22763&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8344951 Stats: 21 lines in 1 file changed: 5 ins; 10 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/22763.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22763/head:pull/22763 PR: https://git.openjdk.org/jdk/pull/22763 From jbhateja at openjdk.org Mon Dec 16 14:23:16 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Mon, 16 Dec 2024 14:23:16 GMT Subject: RFR: 8342103: C2 compiler support for Float16 type and associated scalar operations [v3] In-Reply-To: References: Message-ID: > Hi All, > > This patch adds C2 compiler support for various Float16 operations added by [PR#22128](https://github.com/openjdk/jdk/pull/22128) > > Following is the summary of changes included with this patch:- > > 1. Detection of various Float16 operations through inline expansion or pattern folding idealizations. > 2. Float16 operations like add, sub, mul, div, max, and min are inferred through pattern folding idealization. > 3. Float16 SQRT and FMA operation are inferred through inline expansion and their corresponding entry points are defined in the newly added Float16Math class. > - These intrinsics receive unwrapped short arguments encoding IEEE 754 binary16 values. > 5. New specialized IR nodes for Float16 operations, associated idealizations, and constant folding routines. > 6. New Ideal type for constant and non-constant Float16 IR nodes. Please refer to [FAQs ](https://github.com/openjdk/jdk/pull/22754#issuecomment-2543982577)for more details. > 7. 
Since Float16 uses short as its storage type, hence raw FP16 values are always loaded into general purpose register, but FP16 ISA generally operates over floating point registers, thus the compiler injects reinterpretation IR before and after Float16 operation nodes to move short value to floating point register and vice versa. > 8. New idealization routines to optimize redundant reinterpretation chains. HF2S + S2HF = HF > 9. X86 backend implementation for all supported intrinsics. > 10. Functional and Performance validation tests. > > Kindly review the patch and share your feedback. > > Best Regards, > Jatin Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Adding more test points ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22754/files - new: https://git.openjdk.org/jdk/pull/22754/files/7cb694fa..3a6697e3 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22754&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22754&range=01-02 Stats: 56 lines in 3 files changed: 54 ins; 2 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/22754.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22754/head:pull/22754 PR: https://git.openjdk.org/jdk/pull/22754 From jbhateja at openjdk.org Mon Dec 16 14:23:16 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Mon, 16 Dec 2024 14:23:16 GMT Subject: RFR: 8342103: C2 compiler support for Float16 type and associated scalar operations [v3] In-Reply-To: References: Message-ID: <03ozC1NfpoBMN8fyLJY6gt2_7GZQpDtTHEj8cgxD_dU=.dd851537-820d-4b72-acf9-b170aa756e4b@github.com> On Mon, 16 Dec 2024 09:03:38 GMT, Emanuel Peter wrote: > > > Can you quickly summarize what tests you have, and what they test? > > > > > > Patch includes functional and performance tests, as per your suggestions IR framework-based tests now cover various special cases for constant folding transformation. Let me know if you see any gaps. 
> > I was hoping that you could make a list of all optimizations that are included here, and tell me where the tests are for it. That would significantly reduce the review time on my end. Otherwise I have to correlate everything myself, and that will take me hours. Validation details:- A) x86 backend changes - new assembler instructions - macro assembly routines. Test point:- test/jdk/jdk/incubator/vector/ScalarFloat16OperationsTest.java - This test is based on the TestNG framework and includes new DataProviders to generate test vectors. - Test vectors cover the entire float16 value range and also special floating point values (NaN, +Inf, -Inf, 0.0 and -0.0) B) GVN transformations:- - Value Transforms Test point:- test/hotspot/jtreg/compiler/c2/irTests/TestFloat16ScalarOperations.java - Covers all the constant folding scenarios for add, sub, mul, div, sqrt, fma, min, and max operations addressed by this patch. - It also tests special case scenarios for each operation as specified by the Java Language Specification. - identity Transforms Test point:- test/hotspot/jtreg/compiler/c2/irTests/TestFloat16ScalarOperations.java - Covers identity transformation for ReinterpretS2HFNode, DivHFNode - idealization Transforms Test points:- test/hotspot/jtreg/compiler/c2/irTests/MulHFNodeIdealizationTests.java :- test/hotspot/jtreg/compiler/c2/irTests/TestFloat16ScalarOperations.java - Contains test point for the following transform MulHF idealization i.e.
MulHF * 2 => AddHF - Contains test point for the following transform DivHF SRC , PoT(constant) => MulHF SRC * reciprocal (constant) - Contains idealization test points for the following transform ConvF2HF(FP32BinOp(ConvHF2F(x), ConvHF2F(y))) => ReinterpretHF2S(FP16BinOp(ReinterpretS2HF(x), ReinterpretS2HF(y))) ------------- PR Comment: https://git.openjdk.org/jdk/pull/22754#issuecomment-2545754021 From qamai at openjdk.org Mon Dec 16 15:14:44 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Mon, 16 Dec 2024 15:14:44 GMT Subject: RFR: 8342662: C2: Add new phase for backend-specific lowering [v6] In-Reply-To: References: Message-ID: On Mon, 16 Dec 2024 02:23:30 GMT, Jasmine Karthikeyan wrote: >> Hi all, >> This patch adds a new pass to consolidate lowering of complex backend-specific code patterns, such as `MacroLogicV` and the optimization proposed by #21244. Moving these optimizations to backend code can simplify shared code, while also making it easier to develop more in-depth optimizations. The linked bug has an example of a new optimization this could enable. The new phase does GVN to de-duplicate nodes and calls nodes' `Value()` method, but it does not call `Identity()` or `Ideal()` to avoid undoing any changes done during lowering. It also reuses the IGVN worklist to avoid needing to re-create the notification mechanism. >> >> In this PR only the skeleton code for the pass is added, moving `MacroLogicV` to this system will be done separately in a future patch. Tier 1 tests pass on my linux x64 machine. Feedback on this patch would be greatly appreciated! > > Jasmine Karthikeyan has updated the pull request incrementally with one additional commit since the last revision: > > Implement apply_identity Marked as reviewed by qamai (Committer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/21599#pullrequestreview-2506470461 From duke at openjdk.org Mon Dec 16 15:46:45 2024 From: duke at openjdk.org (theoweidmannoracle) Date: Mon, 16 Dec 2024 15:46:45 GMT Subject: RFR: 8346289: Confusing phrasing in IR Framework README / User-defined Regexes Message-ID: > If such a user-defined regex represents a not yet supported C2 IR node, it is highly encouraged to directly add a new IR node placeholder string definition to IRNode for it instead together with a static regex mapping block. The combination of "instead together" makes this sentence hard to read. I had to re-read it several times to grasp it. ------------- Commit messages: - 8346289: Confusing phrasing in IR Framework README / User-defined Regexes Changes: https://git.openjdk.org/jdk/pull/22766/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=22766&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8346289 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/22766.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22766/head:pull/22766 PR: https://git.openjdk.org/jdk/pull/22766 From amitkumar at openjdk.org Mon Dec 16 16:12:43 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Mon, 16 Dec 2024 16:12:43 GMT Subject: Integrated: 8336356: [s390x] preserve Vector Register before using for string compress / expand In-Reply-To: References: Message-ID: <_faCant8sktztcvYymNTcnZh8eCnLDHJTWTH0gnRuGw=.70633e72-8634-4702-88e9-c8237643d06e@github.com> On Mon, 25 Nov 2024 04:58:11 GMT, Amit Kumar wrote: > This PR adds `TEMP` effect for the vector register, allotted by register allocator, used in the string compress/expand intrinsic. Also it enabled the Vector computation part of those intrinsics which was disabled by https://github.com/openjdk/jdk/pull/18162 > > tier1 test, which also includes string intrinsic tests `compiler/intrinsics/string/` are clean. No regression seen. 
This pull request has now been integrated. Changeset: cb925955 Author: Amit Kumar URL: https://git.openjdk.org/jdk/commit/cb92595599a8a22a807a29bf56f1e02e792386a9 Stats: 221 lines in 3 files changed: 155 ins; 16 del; 50 mod 8336356: [s390x] preserve Vector Register before using for string compress / expand Reviewed-by: aph, lucy ------------- PR: https://git.openjdk.org/jdk/pull/22354 From duke at openjdk.org Mon Dec 16 16:13:47 2024 From: duke at openjdk.org (duke) Date: Mon, 16 Dec 2024 16:13:47 GMT Subject: Withdrawn: 8340602: C2: LoadNode::split_through_phi might exhaust nodes in case of base_is_phi In-Reply-To: References: Message-ID: On Mon, 23 Sep 2024 11:58:12 GMT, Daohan Qu wrote: > # Description > > [JDK-6934604](https://github.com/openjdk/jdk/commit/b4977e887a53c898b96a7d37a3bf94742c7cc194) introduces the flag `AggressiveUnboxing` in jdk8u, and [JDK-8217919](https://github.com/openjdk/jdk/commit/71759e3177fcd6926bb36a30a26f9f3976f2aae8) enables it by default in jdk13u. > > But it seems that JDK-6934604 forgets to check duplicate `PhiNode` generated in the function `LoadNode::split_through_phi(PhaseGVN *phase)` (in `memnode.cpp`) in the case that `base` is phi but `mem` is not phi. More exactly, `LoadNode::Identity(PhaseTransform *phase)` doesn't search for `PhiNode` in the correct region in that case. > > This might cause infinite split in the case of a loop, which is similar to the bugs fixed in [JDK-6673473](https://github.com/openjdk/jdk/commit/30dc0edfc877000c0ae20384f228b45ba82807b7). The infinite split results in "Out of nodes" and make the method "not compilable". > > Since JDK-8217919 (in jdk13u), all the later versions of jdks are affected by this bug when the expected optimization pattern appears in the code. 
For example, the following three micro-benchmarks running with > > > make test \ > TEST="micro:org.openjdk.bench.java.util.stream.tasks.IntegerMax.Bulk.bulk_seq_inner micro:org.openjdk.bench.java.util.stream.tasks.IntegerMax.Lambda.bulk_seq_lambda micro:org.openjdk.bench.java.util.stream.tasks.IntegerMax.Lambda.bulk_seq_methodRef" \ > TEST_OPTS="VM_OPTIONS=-XX:+UseParallelGC" > > > show a performance improvement after this PR is applied. (`-XX:+UseParallelGC` is only needed to reproduce this bug; all the benchmarks in the following table are run with this option.) > > |benchmark (throughput, unit: ops/s)| jdk-before-this-patch | jdk-after-this-patch | > |---|---|---| > |org.openjdk.bench.java.util.stream.tasks.IntegerMax.Bulk.bulk_seq_inner | 26.678 ±(99.9%) 0.574 ops/s | 55.692 ±(99.9%) 4.419 ops/s | > |org.openjdk.bench.java.util.stream.tasks.IntegerMax.Lambda.bulk_seq_lambda | 26.792 ±(99.9%) 1.924 ops/s | 64.882 ±(99.9%) 4.175 ops/s | > |org.openjdk.bench.java.util.stream.tasks.IntegerMax.Lambda.bulk_seq_methodRef | 27.023 ±(99.9%) 1.116 ops/s | 66.313 ±(99.9%) 0.802 ops/s | > > # Reproduction > > Compile and run the reduced test case `Test.java` in the appendix below using > > > java -Xbatch -XX:+UnlockDiagnosticVMOptions -XX:+LogCompilation -XX:LogFile=comp.log -XX:+UseParallelGC Test > > > and you can see that `Test$Obj.calc` is tagged with `make_not_compilable` and see some output like > > > " > > > And when `-XX:+AbortVMOn... This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.org/jdk/pull/21134 From dnsimon at openjdk.org Mon Dec 16 16:51:12 2024 From: dnsimon at openjdk.org (Doug Simon) Date: Mon, 16 Dec 2024 16:51:12 GMT Subject: RFR: 8346282: [JVMCI] Add failure reason support to UnresolvedJava/Type/Method/Field Message-ID: The JVMCI UnresolvedJava/Type/Method/Field types can be used to represent resolution failures. It would be useful if an exception describing the resolution failure could be attached to these objects.
------------- Commit messages: - allow exception to be attached to UnresolvedJava/Type/Method/Field Changes: https://git.openjdk.org/jdk/pull/22767/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=22767&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8346282 Stats: 66 lines in 4 files changed: 54 ins; 1 del; 11 mod Patch: https://git.openjdk.org/jdk/pull/22767.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22767/head:pull/22767 PR: https://git.openjdk.org/jdk/pull/22767 From bulasevich at openjdk.org Mon Dec 16 16:59:57 2024 From: bulasevich at openjdk.org (Boris Ulasevich) Date: Mon, 16 Dec 2024 16:59:57 GMT Subject: RFR: 8343789: Move mutable nmethod data out of CodeCache [v6] In-Reply-To: <9mDuowjpORyWudVnSB1FWCW_o1pBgMnAvJus6YGkXLs=.67ba4652-2470-448d-baa2-464e824b2fcb@github.com> References: <9mDuowjpORyWudVnSB1FWCW_o1pBgMnAvJus6YGkXLs=.67ba4652-2470-448d-baa2-464e824b2fcb@github.com> Message-ID: > This change relocates mutable data (such as relocations, oops, and metadata) from the nmethod. The change follows the recent PR #18984, which relocated immutable nmethod data from the CodeCache. > > The core idea remains the same: use the CodeCache for executable code while moving additional data to the C heap. The primary motivations are improving security and enhancing code density. > > Although performance is not the main focus, testing on AArch64 CPUs, where code density plays a significant role, has shown a 1-2% performance improvement in specific scenarios, such as the CodeCacheStress test and the Renaissance Dotty benchmark. > > The numbers. Immutable data constitutes **~30%** of the nmethod. Mutable data constitutes **~8%** of the nmethod.
Example (statistics collected on the CodeCacheStress benchmark): > - nmethod_count:134000, total_compilation_time: 510460ms > - total allocation time malloc_mutable/malloc_immutable/CodeCache_alloc: 62ms/114ms/6333ms, > - total allocation size (mutable/immutable/nmethod): 64MB/192MB/488MB > > Functional testing: jtreg on arm/aarch/x86. > Performance testing: renaissance/dacapo/SPECjvm2008 benchmarks. > > Alternative solution (see comments): In the future, relocations can be moved to _immutable_data. Boris Ulasevich has updated the pull request incrementally with one additional commit since the last revision: removing dead code ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21276/files - new: https://git.openjdk.org/jdk/pull/21276/files/b4c7c24b..6b97993a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21276&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21276&range=04-05 Stats: 10 lines in 1 file changed: 0 ins; 5 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/21276.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21276/head:pull/21276 PR: https://git.openjdk.org/jdk/pull/21276 From bulasevich at openjdk.org Mon Dec 16 16:59:57 2024 From: bulasevich at openjdk.org (Boris Ulasevich) Date: Mon, 16 Dec 2024 16:59:57 GMT Subject: RFR: 8343789: Move mutable nmethod data out of CodeCache [v2] In-Reply-To: References: <9mDuowjpORyWudVnSB1FWCW_o1pBgMnAvJus6YGkXLs=.67ba4652-2470-448d-baa2-464e824b2fcb@github.com> <0-RuI0yrrHrzDhR9Od-KDkfhDBQERUlR8mtaQrzWFD0=.d84d10a0-c5b8-4138-a164-935947aa080d@github.com> Message-ID: On Fri, 13 Dec 2024 22:06:42 GMT, Dean Long wrote: >> Thanks for pointing that out. I reworked the method a little. >> >> ldr_patchable() is only called by movoop(). If oops is moved to a separate location, the distance is certainly > 1MB and adrp+ldr is the only way to access oops. The single LDR path is not used now, but I leave it for future use.
>> >> >> void ldr_patchable(Register dest, const Address &const_addr, bool fits_in_ldr_range = false) { >> if (fits_in_ldr_range) { >> intptr_t offset = pc() - const_addr.target(); >> assert(offset >= -1024 * 1024 && offset < 1024 * 1024, "pointer does not fit into pc-relative ldr range"); >> ldr(dest, const_addr); >> } else { >> uint64_t offset; >> adrp(dest, const_addr, offset); >> ldr(dest, Address(dest, offset)); >> } >> } > > The problem with leaving "fits_in_ldr_range" in for future use is that it hasn't been tested. I suggest just removing it. It uses pc() instead of the codecache boundaries to decide the range, so it only works for code that isn't relocated. Agree. Thank you! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21276#discussion_r1887179694 From darcy at openjdk.org Mon Dec 16 18:45:39 2024 From: darcy at openjdk.org (Joe Darcy) Date: Mon, 16 Dec 2024 18:45:39 GMT Subject: RFR: 8342103: C2 compiler support for Float16 type and associated scalar operations [v3] In-Reply-To: References: Message-ID: On Mon, 16 Dec 2024 14:23:16 GMT, Jatin Bhateja wrote: >> Hi All, >> >> This patch adds C2 compiler support for various Float16 operations added by [PR#22128](https://github.com/openjdk/jdk/pull/22128) >> >> Following is the summary of changes included with this patch:- >> >> 1. Detection of various Float16 operations through inline expansion or pattern folding idealizations. >> 2. Float16 operations like add, sub, mul, div, max, and min are inferred through pattern folding idealization. >> 3. Float16 SQRT and FMA operation are inferred through inline expansion and their corresponding entry points are defined in the newly added Float16Math class. >> - These intrinsics receive unwrapped short arguments encoding IEEE 754 binary16 values. >> 5. New specialized IR nodes for Float16 operations, associated idealizations, and constant folding routines. >> 6. New Ideal type for constant and non-constant Float16 IR nodes.
Please refer to [FAQs ](https://github.com/openjdk/jdk/pull/22754#issuecomment-2543982577)for more details. >> 7. Since Float16 uses short as its storage type, hence raw FP16 values are always loaded into general purpose register, but FP16 ISA generally operates over floating point registers, thus the compiler injects reinterpretation IR before and after Float16 operation nodes to move short value to floating point register and vice versa. >> 8. New idealization routines to optimize redundant reinterpretation chains. HF2S + S2HF = HF >> 9. X86 backend implementation for all supported intrinsics. >> 10. Functional and Performance validation tests. >> >> Kindly review the patch and share your feedback. >> >> Best Regards, >> Jatin > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Adding more test points src/java.base/share/classes/jdk/internal/vm/vector/Float16Math.java line 35: > 33: * The class {@code Float16Math} constains intrinsic entry points corresponding > 34: * to scalar numeric operations defined in Float16 class. > 35: * @author Please remove all author tags. We haven't used them in new code in the JDK for some time. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22754#discussion_r1887325969 From darcy at openjdk.org Mon Dec 16 18:50:42 2024 From: darcy at openjdk.org (Joe Darcy) Date: Mon, 16 Dec 2024 18:50:42 GMT Subject: RFR: 8342103: C2 compiler support for Float16 type and associated scalar operations [v3] In-Reply-To: References: Message-ID: On Mon, 16 Dec 2024 14:23:16 GMT, Jatin Bhateja wrote: >> Hi All, >> >> This patch adds C2 compiler support for various Float16 operations added by [PR#22128](https://github.com/openjdk/jdk/pull/22128) >> >> Following is the summary of changes included with this patch:- >> >> 1. Detection of various Float16 operations through inline expansion or pattern folding idealizations. >> 2. 
Float16 operations like add, sub, mul, div, max, and min are inferred through pattern folding idealization. >> 3. Float16 SQRT and FMA operation are inferred through inline expansion and their corresponding entry points are defined in the newly added Float16Math class. >> - These intrinsics receive unwrapped short arguments encoding IEEE 754 binary16 values. >> 5. New specialized IR nodes for Float16 operations, associated idealizations, and constant folding routines. >> 6. New Ideal type for constant and non-constant Float16 IR nodes. Please refer to [FAQs ](https://github.com/openjdk/jdk/pull/22754#issuecomment-2543982577)for more details. >> 7. Since Float16 uses short as its storage type, hence raw FP16 values are always loaded into general purpose register, but FP16 ISA generally operates over floating point registers, thus the compiler injects reinterpretation IR before and after Float16 operation nodes to move short value to floating point register and vice versa. >> 8. New idealization routines to optimize redundant reinterpretation chains. HF2S + S2HF = HF >> 9. X86 backend implementation for all supported intrinsics. >> 10. Functional and Performance validation tests. >> >> Kindly review the patch and share your feedback. >> >> Best Regards, >> Jatin > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Adding more test points src/jdk.incubator.vector/share/classes/jdk/incubator/vector/Float16.java line 1415: > 1413: // double; not necessary to widen to double before the > 1414: // multiply. > 1415: short fa = float16ToRawShortBits(a); The new implementations in fma and sqrt are comparatively long and obscure compared to the current versions. That might be the price of intrinsification, but it would be helpful to at least have a comment to the reader explaining why the more obvious code was not being used. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22754#discussion_r1887333029 From psandoz at openjdk.org Mon Dec 16 18:55:47 2024 From: psandoz at openjdk.org (Paul Sandoz) Date: Mon, 16 Dec 2024 18:55:47 GMT Subject: Integrated: 8346174: UMAX/UMIN are missing from XXXVector::reductionOperations In-Reply-To: References: Message-ID: On Fri, 13 Dec 2024 18:42:27 GMT, Paul Sandoz wrote: > Add functional support for unsigned min/max reductions on vectors. > > We also need to ensure that the `reductionCoerced` intrinsic bails out when there is no reduction operation for the lanewise operation. When intrinsic support is added for integral vectors this will still be the case for floating point vectors. This pull request has now been integrated. Changeset: 31c3b191 Author: Paul Sandoz URL: https://git.openjdk.org/jdk/commit/31c3b191745b5c97ae4e757323355fb9831da9fe Stats: 3586 lines in 27 files changed: 3585 ins; 0 del; 1 mod 8346174: UMAX/UMIN are missing from XXXVector::reductionOperations Reviewed-by: qamai, jbhateja ------------- PR: https://git.openjdk.org/jdk/pull/22741 From kvn at openjdk.org Mon Dec 16 18:56:44 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 16 Dec 2024 18:56:44 GMT Subject: RFR: 8346264: "Total compile time" counter should include time spent in failing/bailout compiles [v2] In-Reply-To: References: Message-ID: <61eIR3NK1Ci6sbUfoBxEXvKgSy3n8_lzUp0Xp2U3qkY=.c642ddf9-b14d-41e9-835e-86e28619c1e8@github.com> On Mon, 16 Dec 2024 11:25:50 GMT, Aleksey Shipilev wrote: >> Noticed this when looking through JMH compiler profiler results. >> >> Current `CompilerBroker` counters that are fed into `CompilationMXBean.getTotalCompilationTime()` and JFR `CompilerStatistics` only records the time for successful compilations. If we take a while in compilation and then fail/bail, that time would not be accounted for. >> >> While this seems to be a long-standing behavior, there are problems with this: >> 1. This is not what "total" means. 
>> 2. This gives us a blind spot in measuring time taken in failing/bailing compilations. >> 3. It does not match well the Javadoc for `CompilationMXBean.getTotalCompilationTime()`: "Returns the approximate accumulated elapsed time (in milliseconds) spent in compilation." -- since the time spent in failing/bailing compilation is still time spent in compilation. >> >> Additional testing: >> - [x] Linux x86_64 server release, `jdk/jfr java/lang/management` > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Update comment Would be nice to add separate time for failed/bailout compilations to get sense how much time we spend in them. ------------- PR Review: https://git.openjdk.org/jdk/pull/22760#pullrequestreview-2507017038 From kvn at openjdk.org Mon Dec 16 18:58:43 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 16 Dec 2024 18:58:43 GMT Subject: RFR: 8344951: Stabilize write barrier micro-benchmarks In-Reply-To: References: Message-ID: On Mon, 16 Dec 2024 12:35:33 GMT, Roberto Castañeda Lozano wrote: > This changeset makes the `testArrayWriteBarrierFastPath*` micro-benchmarks in `WriteBarrier.java` more robust w.r.t. a few external factors, so that different GC barrier models can be compared more reliably. More specifically, it ensures that: > > - the main loop is never unrolled regardless of the selected GC algorithm, > - no spilling occurs within the main loop for the final C2 compilation, and > - the majority of the execution time is spent in the write operation and its associated barrier. > > The changes preserve the original G1 barrier test profile, i.e. practically no write crosses heap regions under the default G1 configuration. More sophisticated benchmarks may be added in the future that exercise different G1 barrier levels. > > Thanks to Thomas Schatzl for reporting and discussing issues in the micro-benchmarks.
> > **Testing:** build and run the micro-benchmarks (linux-x64, linux-aarch64, windows-x64, macosx-x64, macosx-aarch64). Is it possible to have 2 runs: one with default `LoopUnrollLimit` and another as you set? ------------- PR Review: https://git.openjdk.org/jdk/pull/22763#pullrequestreview-2507019577 From shade at openjdk.org Mon Dec 16 19:58:37 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 16 Dec 2024 19:58:37 GMT Subject: RFR: 8346264: "Total compile time" counter should include time spent in failing/bailout compiles [v2] In-Reply-To: <61eIR3NK1Ci6sbUfoBxEXvKgSy3n8_lzUp0Xp2U3qkY=.c642ddf9-b14d-41e9-835e-86e28619c1e8@github.com> References: <61eIR3NK1Ci6sbUfoBxEXvKgSy3n8_lzUp0Xp2U3qkY=.c642ddf9-b14d-41e9-835e-86e28619c1e8@github.com> Message-ID: On Mon, 16 Dec 2024 18:54:30 GMT, Vladimir Kozlov wrote: > Would be nice to add separate time for failed/bailout compilations to get sense how much time we spend in them. `CompileBroker` already gathers it, and it is printed in `-XX:+CITime`: $ java -XX:+CITime -Xcomp ... Accumulated compiler times ---------------------------------------------------------- Total compilation time : 1.443 s Standard compilation : 1.443 s, Average : 0.000 s Bailed out compilation : 0.000 s, Average : 0.000 s On stack replacement : 0.000 s, Average : 0.000 s Invalidated : 0.000 s, Average : 0.000 s I don't think exposing bailed/failed time in `CompilationMXBean` would work well, as it is a public API. We can extend the JFR `CompilerStatistics` event, perhaps separately?
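
For readers less familiar with the public API under discussion, the following is a minimal sketch of how a client reads the counter this change affects. It only uses the standard `java.lang.management` calls; the class name `TotalCompileTime` and the helper `totalCompilationMillis` are hypothetical names chosen for illustration, and nothing here is taken from the patch itself.

```java
import java.lang.management.CompilationMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical illustration class; not part of the JDK or of PR 22760.
class TotalCompileTime {

    /**
     * Returns the accumulated JIT compilation time in milliseconds,
     * or -1 if the VM has no compiler or does not monitor compile time.
     */
    static long totalCompilationMillis() {
        // getCompilationMXBean() returns null when the VM has no compilation system.
        CompilationMXBean bean = ManagementFactory.getCompilationMXBean();
        if (bean == null || !bean.isCompilationTimeMonitoringSupported()) {
            return -1L;
        }
        return bean.getTotalCompilationTime();
    }

    public static void main(String[] args) {
        long millis = totalCompilationMillis();
        if (millis < 0) {
            System.out.println("Compilation time monitoring is not available");
        } else {
            System.out.println("Total compilation time: " + millis + " ms");
        }
    }
}
```

The bean exposes a single accumulated value, which is presumably why a separate bailed-out figure is a better fit for `-XX:+CITime` output or a JFR event than for this interface.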
------------- PR Comment: https://git.openjdk.org/jdk/pull/22760#issuecomment-2546603208 From never at openjdk.org Mon Dec 16 20:00:36 2024 From: never at openjdk.org (Tom Rodriguez) Date: Mon, 16 Dec 2024 20:00:36 GMT Subject: RFR: 8346282: [JVMCI] Add failure reason support to UnresolvedJava/Type/Method/Field In-Reply-To: References: Message-ID: On Mon, 16 Dec 2024 15:54:53 GMT, Doug Simon wrote: > The JVMCI UnresolvedJava/Type/Method/Field types can be used to represent resolution failures. It would be useful if an exception describing the resolution failure could be attached to these objects. src/jdk.internal.vm.ci/share/classes/jdk/vm/ci/meta/UnresolvedJavaField.java line 35: > 33: > 34: /** > 35: * The reason method resolution failed. Can be null. field resolution ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22767#discussion_r1887513115 From dnsimon at openjdk.org Mon Dec 16 20:22:12 2024 From: dnsimon at openjdk.org (Doug Simon) Date: Mon, 16 Dec 2024 20:22:12 GMT Subject: RFR: 8346282: [JVMCI] Add failure reason support to UnresolvedJava/Type/Method/Field [v2] In-Reply-To: References: Message-ID: > The JVMCI UnresolvedJava/Type/Method/Field types can be used to represent resolution failures. It would be useful if an exception describing the resolution failure could be attached to these objects. 
Doug Simon has updated the pull request incrementally with one additional commit since the last revision: fixed comment ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22767/files - new: https://git.openjdk.org/jdk/pull/22767/files/0127c1cf..b0b197aa Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22767&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22767&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/22767.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22767/head:pull/22767 PR: https://git.openjdk.org/jdk/pull/22767 From never at openjdk.org Mon Dec 16 20:22:13 2024 From: never at openjdk.org (Tom Rodriguez) Date: Mon, 16 Dec 2024 20:22:13 GMT Subject: RFR: 8346282: [JVMCI] Add failure reason support to UnresolvedJava/Type/Method/Field [v2] In-Reply-To: References: Message-ID: <62v0TMtl7q-EzfKx_fuIa9zTkv4uG_wzOKWr5hYGzy8=.4615f84f-db63-4a3e-b65f-3ff520e2e86d@github.com> On Mon, 16 Dec 2024 20:19:02 GMT, Doug Simon wrote: >> The JVMCI UnresolvedJava/Type/Method/Field types can be used to represent resolution failures. It would be useful if an exception describing the resolution failure could be attached to these objects. > > Doug Simon has updated the pull request incrementally with one additional commit since the last revision: > > fixed comment Marked as reviewed by never (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/22767#pullrequestreview-2507276043 From yzheng at openjdk.org Mon Dec 16 21:01:41 2024 From: yzheng at openjdk.org (Yudi Zheng) Date: Mon, 16 Dec 2024 21:01:41 GMT Subject: RFR: 8346282: [JVMCI] Add failure reason support to UnresolvedJava/Type/Method/Field [v2] In-Reply-To: References: Message-ID: On Mon, 16 Dec 2024 20:22:12 GMT, Doug Simon wrote: >> The JVMCI UnresolvedJava/Type/Method/Field types can be used to represent resolution failures. 
It would be useful if an exception describing the resolution failure could be attached to these objects. > > Doug Simon has updated the pull request incrementally with one additional commit since the last revision: > > fixed comment LGTM ------------- Marked as reviewed by yzheng (Committer). PR Review: https://git.openjdk.org/jdk/pull/22767#pullrequestreview-2507348278 From psandoz at openjdk.org Tue Dec 17 00:15:51 2024 From: psandoz at openjdk.org (Paul Sandoz) Date: Tue, 17 Dec 2024 00:15:51 GMT Subject: [jdk24] RFR: 8346174: UMAX/UMIN are missing from XXXVector::reductionOperations Message-ID: This pull request contains a backport of commit [31c3b191](https://github.com/openjdk/jdk/commit/31c3b191745b5c97ae4e757323355fb9831da9fe) from the [openjdk/jdk](https://git.openjdk.org/jdk) repository. The commit being backported was authored by Paul Sandoz on 16 Dec 2024 and was reviewed by Quan Anh Mai and Jatin Bhateja. ------------- Commit messages: - Backport 31c3b191745b5c97ae4e757323355fb9831da9fe Changes: https://git.openjdk.org/jdk/pull/22777/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=22777&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8346174 Stats: 3586 lines in 27 files changed: 3585 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/22777.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22777/head:pull/22777 PR: https://git.openjdk.org/jdk/pull/22777 From psandoz at openjdk.org Tue Dec 17 00:17:41 2024 From: psandoz at openjdk.org (Paul Sandoz) Date: Tue, 17 Dec 2024 00:17:41 GMT Subject: RFR: 8342103: C2 compiler support for Float16 type and associated scalar operations [v3] In-Reply-To: References: Message-ID: On Mon, 16 Dec 2024 18:47:50 GMT, Joe Darcy wrote: >> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: >> >> Adding more test points > > src/jdk.incubator.vector/share/classes/jdk/incubator/vector/Float16.java line 1415: > >> 1413: // double; not necessary 
to widen to double before the >> 1414: // multiply. >> 1415: short fa = float16ToRawShortBits(a); > > The new implementations in fma and sqrt are comparatively long and obscure compared to the current versions. That might be the price of intrinsification, but it would be helpful to at least have a comment to the reader explaining why the more obvious code was not being used. @jatin-bhateja could we change the intrinsic to declare the three Float16 values as additional parameters which are only ever passed to the lambda? I believe when intrinsic we will just drop those extra parameters. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22754#discussion_r1887733028 From darcy at openjdk.org Tue Dec 17 00:26:41 2024 From: darcy at openjdk.org (Joe Darcy) Date: Tue, 17 Dec 2024 00:26:41 GMT Subject: RFR: 8342103: C2 compiler support for Float16 type and associated scalar operations [v3] In-Reply-To: References: Message-ID: <22UQZNt9TGIWmQ4rS7CAMSZg5zmxBeV71UiBIRd0t5E=.4db6389a-48aa-4f0c-b4fd-dd4e9a5238bd@github.com> On Mon, 16 Dec 2024 14:23:16 GMT, Jatin Bhateja wrote: >> Hi All, >> >> This patch adds C2 compiler support for various Float16 operations added by [PR#22128](https://github.com/openjdk/jdk/pull/22128) >> >> Following is the summary of changes included with this patch:- >> >> 1. Detection of various Float16 operations through inline expansion or pattern folding idealizations. >> 2. Float16 operations like add, sub, mul, div, max, and min are inferred through pattern folding idealization. >> 3. Float16 SQRT and FMA operation are inferred through inline expansion and their corresponding entry points are defined in the newly added Float16Math class. >> - These intrinsics receive unwrapped short arguments encoding IEEE 754 binary16 values. >> 5. New specialized IR nodes for Float16 operations, associated idealizations, and constant folding routines. >> 6. New Ideal type for constant and non-constant Float16 IR nodes. 
Please refer to [FAQs ](https://github.com/openjdk/jdk/pull/22754#issuecomment-2543982577)for more details. >> 7. Since Float16 uses short as its storage type, hence raw FP16 values are always loaded into general purpose register, but FP16 ISA generally operates over floating point registers, thus the compiler injects reinterpretation IR before and after Float16 operation nodes to move short value to floating point register and vice versa. >> 8. New idealization routines to optimize redundant reinterpretation chains. HF2S + S2HF = HF >> 9. X86 backend implementation for all supported intrinsics. >> 10. Functional and Performance validation tests. >> >> Kindly review the patch and share your feedback. >> >> Best Regards, >> Jatin > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Adding more test points src/jdk.incubator.vector/share/classes/jdk/incubator/vector/Float16.java line 328: > 326: @ForceInline > 327: public static Float16 valueOf(float f) { > 328: short hf = floatToFloat16(f); Does the VM need the explicit short variable here? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22754#discussion_r1887738297 From kvn at openjdk.org Tue Dec 17 00:33:42 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 17 Dec 2024 00:33:42 GMT Subject: RFR: 8346264: "Total compile time" counter should include time spent in failing/bailout compiles [v2] In-Reply-To: References: Message-ID: On Mon, 16 Dec 2024 11:25:50 GMT, Aleksey Shipilev wrote: >> Noticed this when looking through JMH compiler profiler results. >> >> Current `CompilerBroker` counters that are fed into `CompilationMXBean.getTotalCompilationTime()` and JFR `CompilerStatistics` only records the time for successful compilations. If we take a while in compilation and then fail/bail, that time would not be accounted for. >> >> While this seems to be a long-standing behavior, there are problems with this: >> 1. 
This is not what "total" means. >> 2. This gives us a blind spot in measuring time taken in failing/bailing compilations. >> 3. It does not match well the Javadoc for `CompilationMXBean.getTotalCompilationTime()`: "Returns the approximate accumulated elapsed time (in milliseconds) spent in compilation." -- since the time spent in failing/bailing compilation is still time spent in compilation. >> >> Additional testing: >> - [x] Linux x86_64 server release, `jdk/jfr java/lang/management` > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Update comment Good. I was mostly concerned about `CITime` output. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/22760#pullrequestreview-2507617962 PR Comment: https://git.openjdk.org/jdk/pull/22760#issuecomment-2547240145 From thartmann at openjdk.org Tue Dec 17 07:07:49 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 17 Dec 2024 07:07:49 GMT Subject: RFR: 8346289: Confusing phrasing in IR Framework README / User-defined Regexes In-Reply-To: References: Message-ID: On Mon, 16 Dec 2024 15:39:32 GMT, theoweidmannoracle wrote: >> If such a user-defined regex represents a not yet supported C2 IR node, it is highly encouraged to directly add a new IR node placeholder string definition to IRNode for it instead together with a static regex mapping block. > > The combination of "instead together" makes this sentence hard to read. I had to re-read it several times to grasp it. Looks good and trivial to me. ------------- Marked as reviewed by thartmann (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/22766#pullrequestreview-2508009811 From chagedorn at openjdk.org Tue Dec 17 07:19:37 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 17 Dec 2024 07:19:37 GMT Subject: RFR: 8346289: Confusing phrasing in IR Framework README / User-defined Regexes In-Reply-To: References: Message-ID: <1nterwCMpOuvt7H5FBiwFYL4u5XCXzb6oMaTo_ZfhqM=.1d7e0d7c-6948-4804-bdf7-bf8733d9a59f@github.com> On Mon, 16 Dec 2024 15:39:32 GMT, theoweidmannoracle wrote: >> If such a user-defined regex represents a not yet supported C2 IR node, it is highly encouraged to directly add a new IR node placeholder string definition to IRNode for it instead together with a static regex mapping block. > > The combination of "instead together" makes this sentence hard to read. I had to re-read it several times to grasp it. Agreed, looks good! ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/22766#pullrequestreview-2508027778 From chagedorn at openjdk.org Tue Dec 17 07:22:37 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 17 Dec 2024 07:22:37 GMT Subject: RFR: 8331717: C2: Crash with SIGFPE Because Loop Predication Wrongly Hoists Division Requiring Zero Check [v2] In-Reply-To: References: Message-ID: On Thu, 12 Dec 2024 08:48:14 GMT, theoweidmannoracle wrote: >> Fixes a bug in loop predication where not strictly invariant tests involving divisions or modulo are pulled out of the loop. >> >> The bug can be seen in this code: >> >> >> public class Reduced { >> static int iArr[] = new int[100]; >> >> public static void main(String[] strArr) { >> for (int i = 0; i < 10000; i++) { >> test(); >> } >> } >> >> static void test() { >> int i1 = 0; >> >> for (int i4 : iArr) { >> i4 = i1; >> try { >> iArr[0] = 1 / i4; >> i4 = iArr[2 / i4]; // Source of the crash >> } catch (ArithmeticException a_e) { >> } >> } >> } >> } >> >> >> The crucial element is the division `2 / i4`. 
Since it is used to access an array, it is the input to a range check. See node 230: >> Screenshot 2024-12-11 at 15 14 47 >> >> Loop predication will try to pull this range check together with its input, the division, before the `for` loop. Due to a bug in Invariance::compute_invariance loop predication is allowed to do so, which results in the division being pulled out without its non-zero check. 322 is a clone of 230 placed before the loop head without any zero check for the divisor: >> >> Screenshot 2024-12-11 at 15 11 48 >> >> >> More specifically, this bug occurs because 230's zero check (174 If) is not its direct control. Between the zero check and the division is another unrelated check (189 If), which can be hoisted: >> >> Screenshot 2024-12-12 at 09 14 37 >> >> Due to the way the Invariance class works, a check that can be hoisted will be marked as invariant. Then, to determine if any given node is invariant, Invariance::compute_invariance checks if all its inputs are invariant: >> >> https://github.com/openjdk/jdk/blob/ceb4366ebf02f64165acc4a23195e9e3a7398a5c/src/hotspot/share/opto/loopPredicate.cpp#L456-L475 >> >> Therefore, when recursively traversing the inputs for 230 Div, the hoisted, unrelated check 189 If is hit before the zero check. As that check has been hoisted before already, it is marked invariant and `all_inputs_invariant` will be set to true. (All other inputs are also trivially ... > > theoweidmannoracle has updated the pull request incrementally with one additional commit since the last revision: > > Combine test files Nice summary! Looks good to me, too. > Would be interesting if we can collapse such graph by propagating `I1 == 0` through `i4` into zero check. As separate RFE. I think in this case, it's not possible because we have an OSR compilation where `i1` is just a `LoadI`. ------------- Marked as reviewed by chagedorn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/22666#pullrequestreview-2508031613 From duke at openjdk.org Tue Dec 17 07:29:40 2024 From: duke at openjdk.org (duke) Date: Tue, 17 Dec 2024 07:29:40 GMT Subject: RFR: 8346289: Confusing phrasing in IR Framework README / User-defined Regexes In-Reply-To: References: Message-ID: On Mon, 16 Dec 2024 15:39:32 GMT, theoweidmannoracle wrote: >> If such a user-defined regex represents a not yet supported C2 IR node, it is highly encouraged to directly add a new IR node placeholder string definition to IRNode for it instead together with a static regex mapping block. > > The combination of "instead together" makes this sentence hard to read. I had to re-read it several times to grasp it. @theoweidmannoracle Your change (at version 84cc23263ea32228928ab606f2cc3b2c5f8ef518) is now ready to be sponsored by a Committer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/22766#issuecomment-2547686159 From epeter at openjdk.org Tue Dec 17 07:50:04 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 17 Dec 2024 07:50:04 GMT Subject: RFR: 8342103: C2 compiler support for Float16 type and associated scalar operations [v3] In-Reply-To: References: Message-ID: On Mon, 16 Dec 2024 14:23:16 GMT, Jatin Bhateja wrote: >> Hi All, >> >> This patch adds C2 compiler support for various Float16 operations added by [PR#22128](https://github.com/openjdk/jdk/pull/22128) >> >> Following is the summary of changes included with this patch:- >> >> 1. Detection of various Float16 operations through inline expansion or pattern folding idealizations. >> 2. Float16 operations like add, sub, mul, div, max, and min are inferred through pattern folding idealization. >> 3. Float16 SQRT and FMA operation are inferred through inline expansion and their corresponding entry points are defined in the newly added Float16Math class. >> - These intrinsics receive unwrapped short arguments encoding IEEE 754 binary16 values. >> 5. 
New specialized IR nodes for Float16 operations, associated idealizations, and constant folding routines. >> 6. New Ideal type for constant and non-constant Float16 IR nodes. Please refer to [FAQs ](https://github.com/openjdk/jdk/pull/22754#issuecomment-2543982577)for more details. >> 7. Since Float16 uses short as its storage type, hence raw FP16 values are always loaded into general purpose register, but FP16 ISA generally operates over floating point registers, thus the compiler injects reinterpretation IR before and after Float16 operation nodes to move short value to floating point register and vice versa. >> 8. New idealization routines to optimize redundant reinterpretation chains. HF2S + S2HF = HF >> 9. X86 backend implementation for all supported intrinsics. >> 10. Functional and Performance validation tests. >> >> Kindly review the patch and share your feedback. >> >> Best Regards, >> Jatin > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Adding more test points @jatin-bhateja I took 1h to go over this change. I left 15 comments, probably some of them you can just answer by a quick explanation / pointing to the relevant test. src/hotspot/share/opto/convertnode.cpp line 282: > 280: return new ReinterpretHF2SNode(binop); > 281: } > 282: } Where are the constant folding tests for this? src/hotspot/share/opto/convertnode.cpp line 960: > 958: } > 959: return TypeInt::SHORT; > 960: } Do we have tests for these constant folding operations? src/hotspot/share/opto/divnode.cpp line 815: > 813: !g_isnan(t1->getf()) && g_isfinite(t1->getf()) && t1->getf() != 0.0) { // could be negative ZERO or NaN > 814: return TypeH::ONE; > 815: } Do we cover all cases here? 
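
To make the case analysis behind that question concrete, here is a small Java-level sketch of which inputs allow `x / x` to be folded to `1`. It uses `float` rather than `Float16`, on the assumption that the same IEEE 754 rules apply to binary16; the class `SelfDivision` and its methods are hypothetical names for illustration and do not mirror the C2 code.

```java
// Hypothetical illustration class; uses float as a stand-in for Float16.
class SelfDivision {

    /** Plain x / x, evaluated with IEEE 754 semantics. */
    static float selfDivide(float x) {
        return x / x;
    }

    /**
     * True only when x / x is exactly 1: x must be finite, not NaN,
     * and neither +0.0 nor -0.0 (both compare equal to 0.0f).
     */
    static boolean foldsToOne(float x) {
        return !Float.isNaN(x) && !Float.isInfinite(x) && x != 0.0f;
    }

    public static void main(String[] args) {
        float[] inputs = {
            2.0f, -3.5f, Float.MIN_VALUE, Float.MAX_VALUE,
            0.0f, -0.0f, Float.POSITIVE_INFINITY, Float.NEGATIVE_INFINITY, Float.NaN
        };
        for (float x : inputs) {
            System.out.println(x + " / " + x + " = " + selfDivide(x)
                    + " (foldable: " + foldsToOne(x) + ")");
        }
    }
}
```

Zero, infinity and NaN all produce NaN when divided by themselves, so a fold to `1` is only safe for a known finite, non-zero constant and never for an arbitrary variable.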
src/hotspot/share/opto/divnode.cpp line 821: > 819: } > 820: > 821: // If divisor is a constant and not zero, divide them numbers Suggestion: // If divisor is a constant and not zero, divide the numbers src/hotspot/share/opto/divnode.cpp line 826: > 824: t2->getf() != 0.0) { > 825: // could be negative zero > 826: return TypeH::make(t1->getf()/t2->getf()); Suggestion: return TypeH::make(t1->getf() / t2->getf()); src/hotspot/share/opto/divnode.cpp line 840: > 838: if (g_isnan(t1->getf()) || g_isnan(t2->getf())) { > 839: return TypeH::make(NAN); > 840: } I'm a little confused here. We are working with nodes that have type Float16, but we are asking for Float constants here. Why is that, how does it work? src/hotspot/share/opto/subnode.cpp line 566: > 564: return t1; > 565: } > 566: else if(g_isnan(t2->getf())) { General question: why are you using `getf` and not `geth` all over the code? src/hotspot/share/opto/type.cpp line 1465: > 1463: //------------------------------meet------------------------------------------- > 1464: // Compute the MEET of two types. It returns a new Type object. > 1465: const Type *TypeH::xmeet( const Type *t ) const { Please write `TypeH*` and not `TypeH *` src/hotspot/share/opto/type.cpp line 1530: > 1528: uint TypeH::hash(void) const { > 1529: return *(uint*)(&_f); > 1530: } I just saw that `_f` is a `short`, which I think is 16 bits, right? And the cast to `uint` would mean we take 32 bits. That looks a bit off, but maybe it is not. Can you explain, and maybe also put a comment in the code for that? test/hotspot/jtreg/compiler/c2/irTests/TestFloat16ScalarOperations.java line 275: > 273: @IR(counts = {IRNode.ADD_HF, " 0 ", IRNode.REINTERPRET_S2HF, " 0 ", IRNode.REINTERPRET_HF2S, " 0 "}, > 274: applyIfCPUFeature = {"avx512_fp16", "true"}) > 275: public void testAddConstantFolding() { Ok, this is great. I'm missing some cases that check correct rounding. 
For that, it might be good to have one example with random constants, so 2 random Float16 values. You can generate them in static context, and also compute the result in static context, so it should be evaluated in the interpreter. That way, we can compare the result of interpreter to compiled code. Do that for all operations. test/hotspot/jtreg/compiler/c2/irTests/TestFloat16ScalarOperations.java line 421: > 419: > 420: assertResult(divide(valueOf(2.0f), valueOf(2.0f)).floatValue(), 1.0f, "testDivConstantFolding"); > 421: } What about cases like `x/x`, where `x` is a variable, and then feed in all sorts of values, including NaN. I think there we must ensure that it does not fold to `1`. Could be a separate IR test. But also `x/x` with all sorts of constants is relevant. It would test this section in the `Ideal` code: // x/x == 1, we ignore 0/0. // Note: if t1 and t2 are zero then result is NaN (JVMS page 213) // Does not work for variables because of NaN's if (in(1) == in(2) && t1->base() == Type::HalfFloatCon && !g_isnan(t1->getf()) && g_isfinite(t1->getf()) && t1->getf() != 0.0) { // could be negative ZERO or NaN return TypeH::ONE; } test/hotspot/jtreg/compiler/c2/irTests/TestFloat16ScalarOperations.java line 494: > 492: assertResult(fma(valueOf(1.0f), valueOf(2.0f), valueOf(3.0f)).floatValue(), 1.0f * 2.0f + 3.0f, "testFMAConstantFolding"); > 493: } > 494: } I am missing constant folding tests with `shortBitsToFloat16` etc. ------------- Changes requested by epeter (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/22754#pullrequestreview-2508020252 PR Review Comment: https://git.openjdk.org/jdk/pull/22754#discussion_r1888008209 PR Review Comment: https://git.openjdk.org/jdk/pull/22754#discussion_r1888009160 PR Review Comment: https://git.openjdk.org/jdk/pull/22754#discussion_r1888012154 PR Review Comment: https://git.openjdk.org/jdk/pull/22754#discussion_r1888027070 PR Review Comment: https://git.openjdk.org/jdk/pull/22754#discussion_r1888027339 PR Review Comment: https://git.openjdk.org/jdk/pull/22754#discussion_r1888038360 PR Review Comment: https://git.openjdk.org/jdk/pull/22754#discussion_r1888030240 PR Review Comment: https://git.openjdk.org/jdk/pull/22754#discussion_r1888013140 PR Review Comment: https://git.openjdk.org/jdk/pull/22754#discussion_r1888017396 PR Review Comment: https://git.openjdk.org/jdk/pull/22754#discussion_r1888005513 PR Review Comment: https://git.openjdk.org/jdk/pull/22754#discussion_r1888026278 PR Review Comment: https://git.openjdk.org/jdk/pull/22754#discussion_r1888021315 From epeter at openjdk.org Tue Dec 17 07:50:04 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 17 Dec 2024 07:50:04 GMT Subject: RFR: 8342103: C2 compiler support for Float16 type and associated scalar operations [v3] In-Reply-To: References: Message-ID: On Tue, 17 Dec 2024 07:16:37 GMT, Emanuel Peter wrote: >> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: >> >> Adding more test points > > src/hotspot/share/opto/convertnode.cpp line 960: > >> 958: } >> 959: return TypeInt::SHORT; >> 960: