From fyang at openjdk.org Tue Apr 1 01:42:15 2025 From: fyang at openjdk.org (Fei Yang) Date: Tue, 1 Apr 2025 01:42:15 GMT Subject: RFR: 8353219: RISC-V: Fix client builds after JDK-8345298 In-Reply-To: References: Message-ID: On Sat, 29 Mar 2025 03:16:37 GMT, Feilong Jiang wrote: >> Hi, please review this trivial change fixing a client build issue. >> The definitions of both `generate_float16ToFloat()` and `generate_floatToFloat16()` should be moved out of `COMPILER2_OR_JVMCI` macro scope. Testing: client builds fine on linux-riscv64 with this change. > > Marked as reviewed by fjiang (Committer). @feilongjiang @robehn : Thanks! ------------- PR Comment: https://git.openjdk.org/jdk/pull/24307#issuecomment-2767815574 From fyang at openjdk.org Tue Apr 1 01:42:16 2025 From: fyang at openjdk.org (Fei Yang) Date: Tue, 1 Apr 2025 01:42:16 GMT Subject: Integrated: 8353219: RISC-V: Fix client builds after JDK-8345298 In-Reply-To: References: Message-ID: On Sat, 29 Mar 2025 02:01:17 GMT, Fei Yang wrote: > Hi, please review this trivial change fixing a client build issue. > The definitions of both `generate_float16ToFloat()` and `generate_floatToFloat16()` should be moved out of `COMPILER2_OR_JVMCI` macro scope. Testing: client builds fine on linux-riscv64 with this change. This pull request has now been integrated. 
Changeset: 860a789e Author: Fei Yang URL: https://git.openjdk.org/jdk/commit/860a789e9153448345f19d70dd07e294a0b62223 Stats: 4 lines in 1 file changed: 2 ins; 2 del; 0 mod 8353219: RISC-V: Fix client builds after JDK-8345298 Reviewed-by: fjiang, rehn ------------- PR: https://git.openjdk.org/jdk/pull/24307 From qamai at openjdk.org Tue Apr 1 02:17:14 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Tue, 1 Apr 2025 02:17:14 GMT Subject: RFR: 8347645: C2: XOR bounded value handling blocks constant folding [v46] In-Reply-To: References: Message-ID: On Mon, 31 Mar 2025 22:28:49 GMT, Vladimir Ivanov wrote: >> Johannes Graham has updated the pull request incrementally with one additional commit since the last revision: >> >> add missing import > > Thanks. > >> The naming of that method evolved during the course of the review of this PR. I believe the thinking was that the check was not necessarily an overall upper bound, and a simpler name would imply it was more general. > > There are usually a lot of invariants a function assumes and it's simply impractical to encode everything in the name. Speaking of this particular case (`calc_xor_upper_bound_of_non_neg`): > * `calc_` is redundant and IMO only adds noise; > * `_non_neg` part is confusing; I'd stress instead that it works on **ranges**. > > So, `xor_upper_bound_for_ranges` then? (And, please, explain in the comment what the correspondence between the `S` and `U` template type parameters is.) > >> `addnodeXorUtil.hpp` > > I'm fine with placing it under `opto`. Please, rename the file to `src/hotspot/share/opto/utilities/xor.hpp`. @iwanowww > `_non_neg` part is confusing; I'd stress instead that it works on ranges. I find it easier to think of it as calculating the upper bound of the xor of two non-negative integers whose upper bounds are given in the parameters. 
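To make the idea in the comment above concrete, here is a minimal Java sketch of an upper bound for the xor of two non-negative integers whose own upper bounds are known. It is an illustration only, not the HotSpot code under review: the class `XorBoundSketch` and its methods are hypothetical names, and the bound used (all bits at or below the highest set bit of `hiX | hiY`) is an assumption about one sound way to compute such a bound, not necessarily the tightest one.

```java
// Hypothetical illustration; not the C2 implementation discussed in the PR.
final class XorBoundSketch {
    /**
     * Returns a value B such that (x ^ y) <= B for all 0 <= x <= hiX and 0 <= y <= hiY.
     * Neither x nor y can have a bit set above the highest set bit of (hiX | hiY),
     * so their xor cannot either.
     */
    static int xorUpperBound(int hiX, int hiY) {
        if (hiX < 0 || hiY < 0) {
            throw new IllegalArgumentException("upper bounds must be non-negative");
        }
        int combined = hiX | hiY;
        if (combined == 0) {
            return 0; // both operands can only be 0
        }
        // Mask with every bit at or below the highest set bit of 'combined'.
        return -1 >>> Integer.numberOfLeadingZeros(combined);
    }

    /** Exhaustively checks the bound for all ranges of 4-bit values. */
    static boolean holdsForAllFourBitRanges() {
        for (int hiX = 0; hiX < 16; hiX++) {
            for (int hiY = 0; hiY < 16; hiY++) {
                int bound = xorUpperBound(hiX, hiY);
                for (int x = 0; x <= hiX; x++) {
                    for (int y = 0; y <= hiY; y++) {
                        if ((x ^ y) > bound) {
                            return false;
                        }
                    }
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(xorUpperBound(5, 3));
        System.out.println(holdsForAllFourBitRanges());
    }
}
```

The exhaustive loop mirrors, on a small scale, the kind of 4-bit range check the PR description mentions for its gtest.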
------------- PR Comment: https://git.openjdk.org/jdk/pull/23089#issuecomment-2767875213 From duke at openjdk.org Tue Apr 1 02:28:15 2025 From: duke at openjdk.org (Johannes Graham) Date: Tue, 1 Apr 2025 02:28:15 GMT Subject: RFR: 8347645: C2: XOR bounded value handling blocks constant folding [v47] In-Reply-To: References: Message-ID: > An interaction between xor bounds optimization and constant folding resulted in xor over constants not being optimized. This has a noticeable effect on `Long.expand` with a constant mask, on architectures that don't have instructions equivalent to `PDEP` to be used in an intrinsic. > > This change moves logic from the `Xor(L|I)Node::Value` methods into the `add_ring` methods, and gives priority to constant-folding. A static method was separated out to facilitate direct unit-testing. It also (subjectively) simplified the calculation of the upper bound and added an explanation of the reasoning behind it. > > In addition to testing for constant folding over xor, IR tests were added to `XorINodeIdealizationTests` and `XorLNodeIdealizationTests` to cover these related items: > - Bounds optimization of xor > - A check for `x ^ x = 0` > - Explicit testing of xor over booleans. > > Also `test_xor_node.cpp` was added to more extensively test the correctness of the bounds optimization. It exhaustively tests ranges of 4-bit numbers as well as at the high and low end of the affected types. 
Johannes Graham has updated the pull request incrementally with one additional commit since the last revision: address review comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23089/files - new: https://git.openjdk.org/jdk/pull/23089/files/94a32dba..59875d54 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23089&range=46 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23089&range=45-46 Stats: 96 lines in 4 files changed: 47 ins; 41 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/23089.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23089/head:pull/23089 PR: https://git.openjdk.org/jdk/pull/23089 From duke at openjdk.org Tue Apr 1 02:44:03 2025 From: duke at openjdk.org (Johannes Graham) Date: Tue, 1 Apr 2025 02:44:03 GMT Subject: RFR: 8347645: C2: XOR bounded value handling blocks constant folding [v48] In-Reply-To: References: Message-ID: <1JYbwRdMBDikLGt3iXx87YRTWrF6NwzbFDH916UuoSA=.1fb10eab-4963-4d4c-a8ae-97ec3cecdfe2@github.com> > An interaction between xor bounds optimization and constant folding resulted in xor over constants not being optimized. This has a noticeable effect on `Long.expand` with a constant mask, on architectures that don't have instructions equivalent to `PDEP` to be used in an intrinsic. > > This change moves logic from the `Xor(L|I)Node::Value` methods into the `add_ring` methods, and gives priority to constant-folding. A static method was separated out to facilitate direct unit-testing. It also (subjectively) simplified the calculation of the upper bound and added an explanation of the reasoning behind it. > > In addition to testing for constant folding over xor, IR tests were added to `XorINodeIdealizationTests` and `XorLNodeIdealizationTests` to cover these related items: > - Bounds optimization of xor > - A check for `x ^ x = 0` > - Explicit testing of xor over booleans. > > Also `test_xor_node.cpp` was added to more extensively test the correctness of the bounds optimization. 
It exhaustively tests ranges of 4-bit numbers as well as at the high and low end of the affected types. Johannes Graham has updated the pull request incrementally with one additional commit since the last revision: remove unused methods ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23089/files - new: https://git.openjdk.org/jdk/pull/23089/files/59875d54..50d35dcd Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23089&range=47 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23089&range=46-47 Stats: 12 lines in 2 files changed: 0 ins; 12 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/23089.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23089/head:pull/23089 PR: https://git.openjdk.org/jdk/pull/23089 From duke at openjdk.org Tue Apr 1 02:52:17 2025 From: duke at openjdk.org (Johannes Graham) Date: Tue, 1 Apr 2025 02:52:17 GMT Subject: RFR: 8347645: C2: XOR bounded value handling blocks constant folding [v46] In-Reply-To: References: Message-ID: On Tue, 1 Apr 2025 02:14:35 GMT, Quan Anh Mai wrote: >> Thanks. >> >>> The naming of that method evolved during the course of the review of this PR. I believe the thinking was that the check was not necessarily an overall upper bound, and a simpler name would imply it was more general. >> >> There are usually a lot of invariants a function assumes and it's simply impractical to encode everything in the name. Speaking of this particular case (`calc_xor_upper_bound_of_non_neg`): >> * `calc_` is redundant and IMO only adds noise; >> * `_non_neg` part is confusing; I'd stress instead that it works on **ranges**. >> >> So, `xor_upper_bound_for_ranges` then? (And, please, explain in the comment what the correspondence between the `S` and `U` template type parameters is.) >> >>> `addnodeXorUtil.hpp` >> >> I'm fine with placing it under `opto`. Please, rename the file to `src/hotspot/share/opto/utilities/xor.hpp`. 
> > @iwanowww > >> `_non_neg` part is confusing; I'd stress instead that it works on ranges. > > I find it easier to think of it as calculating the upper bound of the xor of two non-negative integers whose upper bounds are given in the parameters. Renamed to `xor_upper_bound_for_ranges` before I saw your comment, @merykitty. I'd be ok with another name though. With the last changes, the method is no longer a member of the class, so it's no longer going to get as many eyes on it without context, so maybe it matters less now. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23089#issuecomment-2767917005 From duke at openjdk.org Tue Apr 1 04:33:34 2025 From: duke at openjdk.org (Anjian-Wen) Date: Tue, 1 Apr 2025 04:33:34 GMT Subject: RFR: 8329887: RISC-V: C2: Support Zvbb Vector And-Not instruction [v3] In-Reply-To: <1KHNbMIgOO7jSZ1Fm4HzxadYaNzE4Xbq4nTitlKy3Po=.17d7860b-10de-4f19-87d8-87fc17313ce2@github.com> References: <1KHNbMIgOO7jSZ1Fm4HzxadYaNzE4Xbq4nTitlKy3Po=.17d7860b-10de-4f19-87d8-87fc17313ce2@github.com> Message-ID: > support Zvbb Vector And-Not vandn.vv match rule and add test Anjian-Wen has updated the pull request incrementally with one additional commit since the last revision: RISC-V: C2: Support Zvbb Vector And-Not instruction fix match rule for format ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24129/files - new: https://git.openjdk.org/jdk/pull/24129/files/7fc67099..a15d58dc Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24129&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24129&range=01-02 Stats: 4 lines in 1 file changed: 0 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/24129.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24129/head:pull/24129 PR: https://git.openjdk.org/jdk/pull/24129 From galder at openjdk.org Tue Apr 1 04:56:44 2025 From: galder at openjdk.org (Galder Zamarreño) Date: Tue, 1 Apr 2025 04:56:44 GMT Subject: RFR: 8348887: Create IR framework 
test for JDK-8347997 In-Reply-To: References: Message-ID: On Mon, 31 Mar 2025 13:41:17 GMT, Marc Chevalier wrote: > As the ticket says: >> Create IR framework test which checks that allocations are eliminated in the regression test included in [JDK-8347997](https://bugs.openjdk.org/browse/JDK-8347997) fix. > > So here it is! We can see that in case of inlining, indeed, no allocation happens. The second part is some sanity check to emphasize the difference: of course, there is an allocation without inlining. The benefit of this second part is arguable. From my point of view, it's mostly to point out the difference to a future reader. But yes, there is nothing very surprising. > > Thanks, > Marc Changes requested by galder (Author). test/hotspot/jtreg/compiler/c2/irTests/TestContinuationPinningAndEA.java line 118: > 116: > 117: @DontInline > 118: public CrashesNoInline() throws Throwable { It's probably my own ignorance, but just in case others are in the same boat, why does this crash? Could you add a brief javadoc for future readers? Same with the other Crashes cases. ------------- PR Review: https://git.openjdk.org/jdk/pull/24328#pullrequestreview-2731106771 PR Review Comment: https://git.openjdk.org/jdk/pull/24328#discussion_r2022140499 From hgreule at openjdk.org Tue Apr 1 06:27:49 2025 From: hgreule at openjdk.org (Hannes Greule) Date: Tue, 1 Apr 2025 06:27:49 GMT Subject: RFR: 8353359: C2: Or(I|L)Node::Ideal is missing AddNode::Ideal call Message-ID: Hi, this simple change adds a missing AddNode::Ideal call to Or(I|L)Node::Ideal. See the added tests for examples of optimizations that don't apply without this change. Please let me know what you think. 
------------- Commit messages: - Call AddNode::Ideal in Or(I|L)Node::Ideal - Test AddNode::Ideal optimizations for Or(I|L) Changes: https://git.openjdk.org/jdk/pull/24348/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24348&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8353359 Stats: 37 lines in 3 files changed: 33 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/24348.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24348/head:pull/24348 PR: https://git.openjdk.org/jdk/pull/24348 From epeter at openjdk.org Tue Apr 1 07:06:32 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 1 Apr 2025 07:06:32 GMT Subject: RFR: 8352869: Verify.checkEQ: extension for NaN, VectorAPI and arbitrary Objects [v6] In-Reply-To: References: Message-ID: > We should extend the functionality of Verify.checkEQ: > - Allow different NaN encodings to be seen as equal (by default). > - Compare VectorAPI vectors. > - Compare Exceptions, and their messages. > - Compare arbitrary Objects via Reflection. > > Note: this is a prerequisite for the Template Library [JDK-8352861](https://bugs.openjdk.org/browse/JDK-8352861) / https://github.com/openjdk/jdk/pull/23418. 
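As a rough illustration of the first bullet in the description above (treating different NaN encodings as equal), the following Java sketch compares floats by raw bits but accepts any pair of NaNs. It is not the `Verify.checkEQ` implementation; the class and method names are hypothetical, and only standard `java.lang.Float` methods are used.

```java
// Hypothetical illustration; not the code in compiler.lib.verify.Verify.
final class NaNEqualitySketch {
    /** Strict comparison: every bit must match, so 0.0f and -0.0f differ. */
    static boolean bitwiseEqual(float a, float b) {
        return Float.floatToRawIntBits(a) == Float.floatToRawIntBits(b);
    }

    /** Like bitwiseEqual, except that any NaN is considered equal to any other NaN. */
    static boolean equalAllowingAnyNaN(float a, float b) {
        if (Float.isNaN(a) && Float.isNaN(b)) {
            return true; // ignore sign and payload bits of the NaN encoding
        }
        return bitwiseEqual(a, b);
    }

    public static void main(String[] args) {
        float nan1 = Float.intBitsToFloat(0x7fc00000); // canonical quiet NaN
        float nan2 = Float.intBitsToFloat(0x7fc00001); // NaN with a different payload
        System.out.println(equalAllowingAnyNaN(nan1, nan2));
        System.out.println(equalAllowingAnyNaN(0.0f, -0.0f));
    }
}
```

Comparing raw bits for non-NaN values keeps the check strict enough to tell `0.0f` from `-0.0f`, which a plain `==` would treat as equal.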
Emanuel Peter has updated the pull request incrementally with two additional commits since the last revision: - update copyright - Apply suggestions from code review Co-authored-by: Christian Hagedorn ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24224/files - new: https://git.openjdk.org/jdk/pull/24224/files/d46c45de..4ca42699 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24224&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24224&range=04-05 Stats: 7 lines in 1 file changed: 0 ins; 0 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/24224.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24224/head:pull/24224 PR: https://git.openjdk.org/jdk/pull/24224 From epeter at openjdk.org Tue Apr 1 07:06:33 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 1 Apr 2025 07:06:33 GMT Subject: RFR: 8352869: Verify.checkEQ: extension for NaN, VectorAPI and arbitrary Objects [v5] In-Reply-To: References: Message-ID: On Mon, 31 Mar 2025 14:04:29 GMT, Christian Hagedorn wrote: >> Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains six additional commits since the last revision: >> >> - Merge branch 'master' into JDK-8352869-Verify-NaN-Vector-Objects >> - Verify.Options refactor for Galder >> - Update test/hotspot/jtreg/compiler/lib/verify/Verify.java >> >> Co-authored-by: Galder Zamarreño >> - Merge branch 'master' into JDK-8352869-Verify-NaN-Vector-Objects >> - clean up test >> - JDK-8352869 > > Nice extensions! Some initial comments. @chhagedorn Thanks for the suggestions and questions! I think I addressed them all :) > test/hotspot/jtreg/compiler/lib/verify/Verify.java line 25: > >> 23: >> 24: package compiler.lib.verify; >> 25: > > You should update the copyright year. 
done :) > test/hotspot/jtreg/compiler/lib/verify/Verify.java line 209: > >> 207: print(a, b, field, aParent, bParent); >> 208: throw new VerifyException("Object type not supported: " + ca.getName() + " -- did you mean to 'enableCheckWithArbitraryClasses'?"); >> 209: } > > What's the reason behind throwing instead of just comparing two arbitrary objects by default? If a user calls `Verify.checkEQ()` and sees this exception, I would guess he then just passes the additional option and we have the same result. But maybe I'm missing something. Good question. I think my reasoning was that comparing arbitrary classes requires reflection. And that is rather slow. So by default it would be good if that feature is not enabled, so the user tries to avoid it, and is aware when they enable it explicitly. But if you think that is not useful, I can remove the feature. @chhagedorn what do you think? ------------- PR Comment: https://git.openjdk.org/jdk/pull/24224#issuecomment-2768381692 PR Review Comment: https://git.openjdk.org/jdk/pull/24224#discussion_r2022262263 PR Review Comment: https://git.openjdk.org/jdk/pull/24224#discussion_r2022265798 From epeter at openjdk.org Tue Apr 1 07:08:27 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 1 Apr 2025 07:08:27 GMT Subject: RFR: 8352893: C2: OrL/INode::add_ring optimize (x | -1) to -1 [v3] In-Reply-To: References: Message-ID: On Mon, 31 Mar 2025 14:31:43 GMT, Manuel Hässig wrote: >> # Issue Summary >> >> The `add_ring()` implementations of `OrINode` and `OrLNode` are missing the optimization that an or with a value where all bits are ones (since we have signed integers, in this case `~0 == -1`) will always yield all ones. >> >> # Changes >> >> This PR makes the following straightforward changes: >> - `Or(I|L)Node::add_ring()` returns `-1` if one of the two inputs is `-1`. >> - Add `Or(I|L)` nodes to the IR framework. >> - Add a regression IR test for the implemented optimization.
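The identity behind the change list above can be checked directly in Java: OR-ing any `int` or `long` with `-1` (all bits set) gives `-1`, which is what lets the compiler fold the expression to a constant. The sketch below is only a demonstration of that arithmetic fact; the class name and sample values are made up for illustration and have nothing to do with the IR test added in the PR.

```java
// Hypothetical illustration of the identity (x | -1) == -1.
final class OrAllOnesSketch {
    static int orWithAllOnes(int x) {
        return x | -1; // -1 is ~0: every bit set
    }

    static long orWithAllOnes(long x) {
        return x | -1L;
    }

    /** Checks the identity on a handful of boundary and arbitrary values. */
    static boolean holdsForSamples() {
        int[] ints = {0, 1, -1, 42, Integer.MIN_VALUE, Integer.MAX_VALUE};
        long[] longs = {0L, 1L, -1L, 42L, Long.MIN_VALUE, Long.MAX_VALUE};
        for (int x : ints) {
            if (orWithAllOnes(x) != -1) {
                return false;
            }
        }
        for (long x : longs) {
            if (orWithAllOnes(x) != -1L) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(holdsForSamples());
    }
}
```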
>> >> # Testing >> >> - [Github Actions](https://github.com/mhaessig/jdk/actions/runs/14110978686) >> - Ran tier1 through tier3 and Oracle internal testing > > Manuel Hässig has updated the pull request incrementally with one additional commit since the last revision: > > Remove loop in test and instead use random values Thanks for the updates, looks good to me now :) ------------- Marked as reviewed by epeter (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24289#pullrequestreview-2731657942 From epeter at openjdk.org Tue Apr 1 07:12:21 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 1 Apr 2025 07:12:21 GMT Subject: RFR: 8349479: C2: when a Type node becomes dead, make CFG path that uses it unreachable [v7] In-Reply-To: References: Message-ID: On Mon, 31 Mar 2025 10:38:21 GMT, Roland Westrelin wrote: >> This is primarily motivated by 8275202 (C2: optimize out more >> redundant conditions). In the following code snippet: >> >> >> int[] array = new int[arraySize]; >> if (j <= arraySize) { >> if (i >= 0) { >> if (i < j) { >> int v = array[i]; >> >> >> (`arraySize` is a constant) >> >> at the range check, `j` is known to be in `[min, arraySize]`; as a >> consequence, `i` is known to be in `[0, arraySize-1]`. The range check >> can be eliminated. >> >> Now, if later, `i` constant folds to some value that's positive but >> out of range for the array: >> >> - if that happens when the new pass runs, then it can prove that: >> >> if (i < j) { >> >> is never taken. >> >> - if that happens during IGVN or CCP however, that condition is not >> constant folded. And because the range check was removed, there's no >> guard protecting the range check `CastII`. It becomes `top` and, as >> a result, the graph can become broken. >> >> What I propose here is that when the `CastII` becomes dead, any CFG >> path that uses the `CastII` node is made unreachable. 
So in pseudocode: >> >> >> int[] array = new int[arraySize]; >> if (j <= arraySize) { >> if (i >= 0) { >> if (i < j) { >> halt(); >> >> >> Finding the CFG paths is implemented in the patch by following the >> uses of the node until a CFG node or a `Phi` is encountered. >> >> The patch applies this to all `Type` nodes as with 8275202, I also ran >> into some rare corner cases with other types of nodes. The exception is >> `Phi` nodes which may not be as easy to handle (and for which I had no >> issue with 8275202). >> >> Finally, the patch includes a test case that's unrelated to the >> discussion of 8275202 above. In that test case, a `CastII` becomes top >> but the test that guards it doesn't constant fold. The root cause is a >> transformation of: >> >> >> (CastII (AddI >> >> >> into >> >> >> (AddI (CastII ) (CastII) >> >> >> which causes the resulting node to have a wider type. The `CastII` >> captures a type before the transformation above happens. Once it has >> happened, the guard for the `CastII` can't be constant folded when an >> out of bound value occurs. >> >> This is likely fixable some other way (even though it doesn't seem >> straightforward). Given the long history of similar issues (and the >> test case that shows that more of them are hiding), I think it would >> make sense to try some other way of approaching them. > > Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: > > review src/hotspot/share/opto/node.cpp line 3096: > 3094: // paths. The dead paths are then replaced by a Halt node. > 3095: void TypeNode::make_paths_from_here_dead(PhaseIterGVN* igvn, PhaseIdealLoop* loop, const char* phase_str) { > 3096: Unique_Node_List wq; Should there be a `ResourceMark` here? 
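The traversal mentioned in the quoted description (following the uses of a dead node until a CFG node or a `Phi` is encountered) can be sketched outside HotSpot as a plain worklist walk. The Java below is a toy model under stated assumptions: `Node`, `Kind` and `findStoppingNodes` are hypothetical names, the graph is hand-built, and nothing here reflects the actual `TypeNode::make_paths_from_here_dead` code beyond the general shape of "visit each node once, stop at CFG nodes and Phis".

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical toy graph; not C2's Node classes.
final class DeadPathSketch {
    enum Kind { DATA, CFG, PHI }

    static final class Node {
        final String name;
        final Kind kind;
        final List<Node> uses = new ArrayList<>();

        Node(String name, Kind kind) {
            this.name = name;
            this.kind = kind;
        }

        void addUse(Node user) {
            uses.add(user);
        }
    }

    /** Follows uses from 'dead' through data nodes; returns names of the CFG/Phi nodes reached. */
    static Set<String> findStoppingNodes(Node dead) {
        Set<Node> visited = new HashSet<>();      // plays the role of a unique worklist
        Deque<Node> worklist = new ArrayDeque<>();
        Set<String> stops = new TreeSet<>();
        visited.add(dead);
        worklist.add(dead);
        while (!worklist.isEmpty()) {
            Node n = worklist.poll();
            for (Node use : n.uses) {
                if (use.kind != Kind.DATA) {
                    stops.add(use.name);          // do not look past CFG nodes or Phis
                } else if (visited.add(use)) {
                    worklist.add(use);
                }
            }
        }
        return stops;
    }

    /** Builds a small example graph and returns the stopping nodes as a string. */
    static String demo() {
        Node cast = new Node("CastII", Kind.DATA);
        Node add = new Node("AddI", Kind.DATA);
        Node cmp = new Node("CmpI", Kind.DATA);
        Node iff = new Node("If", Kind.CFG);
        Node phi = new Node("Phi", Kind.PHI);
        cast.addUse(add);
        cast.addUse(phi);
        add.addUse(cmp);
        cmp.addUse(iff);
        return findStoppingNodes(cast).toString();
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

In the real patch the nodes found this way are where the dead paths get cut; the sketch only collects their names.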
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23468#discussion_r2022275763 From epeter at openjdk.org Tue Apr 1 07:18:45 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 1 Apr 2025 07:18:45 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v7] In-Reply-To: References: Message-ID: <0b56TIXbIwSy7Zo77WAx4uweu2kM8iAmjPMomeT3sts=.06d78493-7b94-4386-a7be-4fb65837926b@github.com> > **Goal** > We want to generate Java source code: > - Make it easy to generate variants of tests. E.g. for each offset, for each operator, for each type, etc. > - Enable the generation of domain specific fuzzers (e.g. random expressions and statements). > > Note: with the Template Library draft I was already able to find a [list of bugs](https://bugs.openjdk.org/issues/?jql=labels%20%3D%20template-framework%20ORDER%20BY%20created%20DESC%2C%20summary%20DESC). > > **How to get started** > When reviewing, please start by looking at: > https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestSimple.java#L60-L76 > > We have a Template with two arguments. They are typed (Integer and String). We then apply the arguments `template.withArgs(42, "7")`, producing a `TemplateWithArgs`. This can then be `render`ed to a String. And then that can be compiled and executed with the CompileFramework. > > Second, look at this advanced test: > https://github.com/openjdk/jdk/blob/77079807042fc5a3af04e0ccccad4ecd89e21cdb/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestAdvanced.java#L102-L119 > > And then for a "tutorial", look at: > `test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java` > > It shows these features: > - The `body` of a Template is essentially a list of `Token`s that are concatenated. > - Templates can be nested: a `TemplateWithArgs` is also a `Token`. 
> - We can use `#name` replacements to directly format values into the String. If we had proper String Templates in Java, we would not need this feature. > - We can use `$var` to make variable names unique: if we applied the same template twice, we would get variable collisions. `$var` is then replaced with e.g. `var_7` in one template use and `var_42` in the other template use. > - The use of `Hook`s to insert code into outer (earlier) code locations. This is useful, for example, to insert fields on demand. > - The use of recursive templates, and `fuel` to limit the recursion. > - `Name`s: useful to register field and variable names in code scopes. > > Next, look at the documentation in. This file is the heart of the Template Framework, and describes all the important features. > https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/compiler/lib/template_framework/Template.java#L31-L76 > > For a better experience, you may want to generate the `javadocs`: > `javadoc -sourcepath test/hotspot/j... Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: typo ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24217/files - new: https://git.openjdk.org/jdk/pull/24217/files/77079807..be1c0ee9 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24217&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24217&range=05-06 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/24217.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24217/head:pull/24217 PR: https://git.openjdk.org/jdk/pull/24217 From dskantz at openjdk.org Tue Apr 1 07:28:22 2025 From: dskantz at openjdk.org (Daniel Skantz) Date: Tue, 1 Apr 2025 07:28:22 GMT Subject: RFR: 8282053: IGV: refine schedule approximation Message-ID: This patch refines the schedule approximation in IGV by 1) placing parm. 
and projection nodes in the same block as their predecessors, and 2) disallows erroneously considering machine nodes such as prefetchAlloc and rep_stos as CFG nodes. The reader may refer to the corresponding JBS issue where graphs sampled before and after the change are attached. Testing: T1-T3 with no failures. Opened graphs before and after the change and saw no obvious problems. Opened a large number of graphs in CFG view and observed no unexpected IGV warnings, errors or assert failures. ------------- Commit messages: - fix Changes: https://git.openjdk.org/jdk/pull/24350/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24350&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8282053 Stats: 21 lines in 1 file changed: 20 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/24350.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24350/head:pull/24350 PR: https://git.openjdk.org/jdk/pull/24350 From roland at openjdk.org Tue Apr 1 07:31:12 2025 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 1 Apr 2025 07:31:12 GMT Subject: RFR: 8341976: C2: use_mem_state != load->find_exact_control(load->in(0)) assert failure [v7] In-Reply-To: References: Message-ID: > The `arraycopy` writes to a non escaping array so its `ArrayCopy` node > is marked as having a narrow memory effect. One of the loads from the > destination after the copy is transformed into a load from the source > array (the rationale being that if there's no load from the > destination of the copy, the `arraycopy` is not needed). The load from > the source has the input memory state of the `ArrayCopy` as memory > input. That load is then sunk out of the loop and its control is > updated to be after the `ArrayCopy`. That's legal because the > `ArrayCopy` only has a narrow memory effect and can't modify the > source. The `ArrayCopy` can't be eliminated and is expanded. In the > process, a `MemBar` that has a wide memory effect is added. 
The load > from the source has control after the membar but memory state before > and because the membar has a wide memory effect, the load is anti > dependent on the membar: the graph is broken (the load can't be pinned > after the membar and anti dependent on it). > > In short, the problem is that the graph is transformed under the > assumption that the `ArrayCopy` has a narrow effect but the > `ArrayCopy` is expanded to a subgraph that has a wide memory > effect. The fix I propose is to not insert a membar with a wide memory > effect. We still need a membar when the destination is non escaping > because the expanded `ArrayCopy`, if it writes to a tightly allocated > array, writes to raw memory and not to the destination memory slice. Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 16 additional commits since the last revision: - review - Merge branch 'master' into JDK-8341976 - review - review - Merge branch 'master' into JDK-8341976 - -XX:+TraceLoopOpts fix - review - more - Merge branch 'master' into JDK-8341976 - more - ... 
and 6 more: https://git.openjdk.org/jdk/compare/47f2dbd6...9b21648d ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23465/files - new: https://git.openjdk.org/jdk/pull/23465/files/9f79e0b0..9b21648d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23465&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23465&range=05-06 Stats: 8742 lines in 156 files changed: 4824 ins; 3469 del; 449 mod Patch: https://git.openjdk.org/jdk/pull/23465.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23465/head:pull/23465 PR: https://git.openjdk.org/jdk/pull/23465 From roland at openjdk.org Tue Apr 1 07:32:47 2025 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 1 Apr 2025 07:32:47 GMT Subject: RFR: 8341976: C2: use_mem_state != load->find_exact_control(load->in(0)) assert failure [v2] In-Reply-To: <9cGlvzZnXc8B5tNxXSE2Eqi2FDJzP26U7c-yan4ZdCc=.3f6b0821-8b6c-453d-87ee-91205cc6627a@github.com> References: <9cGlvzZnXc8B5tNxXSE2Eqi2FDJzP26U7c-yan4ZdCc=.3f6b0821-8b6c-453d-87ee-91205cc6627a@github.com> Message-ID: On Mon, 31 Mar 2025 12:26:42 GMT, Christian Hagedorn wrote: >> Right. So maybe, we could treat that `Opaque` node the way we do for `OpaqueZeroTripGuard` and have it constant fold when the backedge is never taken. >> >> So I should revert the change to the `IdealLoopTree::dump_head()` and the test run with `TraceLoopOpts`? > >> So maybe, we could treat that Opaque node the way we do for OpaqueZeroTripGuard and have it constant fold when the backedge is never taken. > > Right, that sounds like a good solution. > >> So I should revert the change to the IdealLoopTree::dump_head() and the test run with TraceLoopOpts? > > Yes, that would be great. We can make a comment in [JDK-8297752](https://bugs.openjdk.org/browse/JDK-8297752) to add `-XX:+TraceLoopOpts` as additional run to this test when we fix it. Done. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23465#discussion_r2022305281 From rcastanedalo at openjdk.org Tue Apr 1 07:44:27 2025 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Tue, 1 Apr 2025 07:44:27 GMT Subject: RFR: 8282053: IGV: refine schedule approximation In-Reply-To: References: Message-ID: On Tue, 1 Apr 2025 07:23:04 GMT, Daniel Skantz wrote: > This patch refines the schedule approximation in IGV by 1) placing parm. and projection nodes in the same block as their predecessors, and 2) disallows erroneously considering machine nodes such as prefetchAlloc and rep_stos as CFG nodes. > > The reader may refer to the corresponding JBS issue where graphs sampled before and after the change are attached. > > Testing: T1-T3 with no failures. Opened graphs before and after the change and saw no obvious problems. Opened a large number of graphs in CFG view and observed no unexpected IGV warnings, errors or assert failures. Thanks for working on this, Daniel. Looks good! ------------- Marked as reviewed by rcastanedalo (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24350#pullrequestreview-2731778708 From shade at openjdk.org Tue Apr 1 07:58:27 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 1 Apr 2025 07:58:27 GMT Subject: RFR: 8353188: C1: Clean up x86 backend after 32-bit x86 removal [v2] In-Reply-To: <-iwh_5JGpt-TAVpfZQjwbnIG_c8hvirNKCcmiZoLNls=.3b34bf15-51fc-42bf-a294-1c23ca99754c@github.com> References: <-iwh_5JGpt-TAVpfZQjwbnIG_c8hvirNKCcmiZoLNls=.3b34bf15-51fc-42bf-a294-1c23ca99754c@github.com> Message-ID: > Piece-wise cleanup of C1_LIRAssembler_x86, C1_MacroAssembler and related classes. C1 implements the bulk of arch-specific backend there. Major parts of this backend are already removed by #24274, this cleans up another large bulk, and hopefully most of it. 
> > Additional testing: > - [x] Linux x86_64 server fastdebug, `all` > - [x] Linux x86_64 server fastdebug, `all` + `-XX:TieredStopAtLevel=1` Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: Review comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24301/files - new: https://git.openjdk.org/jdk/pull/24301/files/47f239c2..527854ec Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24301&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24301&range=00-01 Stats: 12 lines in 2 files changed: 0 ins; 11 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/24301.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24301/head:pull/24301 PR: https://git.openjdk.org/jdk/pull/24301 From shade at openjdk.org Tue Apr 1 07:58:27 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 1 Apr 2025 07:58:27 GMT Subject: RFR: 8353188: C1: Clean up x86 backend after 32-bit x86 removal [v2] In-Reply-To: References: <-iwh_5JGpt-TAVpfZQjwbnIG_c8hvirNKCcmiZoLNls=.3b34bf15-51fc-42bf-a294-1c23ca99754c@github.com> Message-ID: <5WuuW8GQhWOxXqYgEsVG0DZAjsu8DTjOdJZKWaae7vU=.be96f09d-9e3e-4472-94d1-3d92b487eb33@github.com> On Mon, 31 Mar 2025 21:14:45 GMT, Vladimir Ivanov wrote: >> Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: >> >> Review comments > > src/hotspot/cpu/x86/c1_FrameMap_x86.cpp line 45: > >> 43: Register reg = r_1->as_Register(); >> 44: if (r_2->is_Register() && (type == T_LONG || type == T_DOUBLE)) { >> 45: Register reg2 = r_2->as_Register(); > > FTR `reg2` is unused. (Moreover, `r_2` and `r_2->is_Register()` are redundant on x64.) Right. Cleaned those up too. > src/hotspot/cpu/x86/c1_LIRAssembler_x86.cpp line 827: > >> 825: // compressed klass ptrs: T_METADATA can be a compressed klass >> 826: // ptr or a 64 bit method pointer. 
>> 827: ShouldNotReachHere(); > > Alternatively, you could drop the whole `T_METADATA` case and defer the handling to default case. I initially thought leaving the comment there as meaningful, but now I think that comment only relates to 32-bit x86, so now is redundant. So I dropped the `T_METADATA` case completely. > src/hotspot/cpu/x86/c1_LIRAssembler_x86.cpp line 3063: > >> 3061: ExternalAddress((address)double_signflip_pool), >> 3062: rscratch1); >> 3063: > > Is it intentional or just a leftover? Merge leftover, removed. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24301#discussion_r2022343408 PR Review Comment: https://git.openjdk.org/jdk/pull/24301#discussion_r2022344553 PR Review Comment: https://git.openjdk.org/jdk/pull/24301#discussion_r2022344878 From roland at openjdk.org Tue Apr 1 08:06:09 2025 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 1 Apr 2025 08:06:09 GMT Subject: RFR: 8349479: C2: when a Type node becomes dead, make CFG path that uses it unreachable [v7] In-Reply-To: References: Message-ID: <09Q1vDaTXq3VlLU4xxQl_E7wDM2FT7tqR_Bc8ky8RNc=.4e11f2f8-75c3-49a1-b0b3-20eac17c4b39@github.com> On Tue, 1 Apr 2025 07:09:46 GMT, Emanuel Peter wrote: >> Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: >> >> review > > src/hotspot/share/opto/node.cpp line 3096: > >> 3094: // paths. The dead paths are then replaced by a Halt node. >> 3095: void TypeNode::make_paths_from_here_dead(PhaseIterGVN* igvn, PhaseIdealLoop* loop, const char* phase_str) { >> 3096: Unique_Node_List wq; > > Should there be a `ResourceMark` here? The callers have the `ResourceMark`. This is because it's code I extracted from 8275202: I think it used to not be safe to call `PhaseIdealLoop::register_new_node` from within the `ResourceMark` but I see there were changes in that area (data structures used by `PhaseIdealLoop` no longer allocated in the resource area). 
So it looks like it could be changed now. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23468#discussion_r2022357322 From epeter at openjdk.org Tue Apr 1 08:25:21 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 1 Apr 2025 08:25:21 GMT Subject: RFR: 8347449: C2: UseLoopPredicate off should also turn UseProfiledLoopPredicate off [v6] In-Reply-To: References: Message-ID: On Fri, 28 Mar 2025 09:09:59 GMT, Manuel Hässig wrote: >> # Issue Summary >> >> When running with `-XX:-UseLoopPredicate` C2 still inserts profiled loop parse predicates, despite those being a form of loop parse predicate. Further, the loop predicate code is not always consistent when to insert/expect profiled parse predicates. >> >> # Change Summary >> >> Following the rationale, that profiled predicates are a subset of loop predicates, this PR disables profiled predicates whenever loop predicates are disabled. They are disabled on the level of arguments. Further, before any checks for whether profiled predicates are enabled, this PR inserts a check that loop predicates are enabled such that the code is consistent in its intention. >> >> Concretely, this PR >> - adds parse predicate nodes to the IR testing framework, >> - turns off `UseProfiledLoopPredicate` if `UseLoopPredicate` is turned off, >> - predicates all checks for `UseProfiledLoopPredicate` on `UseLoopPredicate` first for consistency, >> - adds a regression test. >> >> >> # Testing >> >> The changes passed the following testing: >> - [Github Actions](https://github.com/mhaessig/jdk/actions/runs/14078750038) >> - tier1 through tier3 and Oracle internal testing > > Manuel Hässig has updated the pull request incrementally with two additional commits since the last revision: > > - idealKit::loop: always call add_parse_predicates > > It was constrained on UseParsePredicate, but this is incorrect, since > all parse predicates are added in that function.
> - Improve description of UseLoopPredicate argument Looks good to me now, thanks for the updates! ------------- Marked as reviewed by epeter (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24248#pullrequestreview-2731884935 From chagedorn at openjdk.org Tue Apr 1 08:30:34 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 1 Apr 2025 08:30:34 GMT Subject: RFR: 8347449: C2: UseLoopPredicate off should also turn UseProfiledLoopPredicate off [v6] In-Reply-To: References: Message-ID: On Fri, 28 Mar 2025 09:09:59 GMT, Manuel Hässig wrote: >> # Issue Summary >> >> When running with `-XX:-UseLoopPredicate` C2 still inserts profiled loop parse predicates, despite those being a form of loop parse predicate. Further, the loop predicate code is not always consistent when to insert/expect profiled parse predicates. >> >> # Change Summary >> >> Following the rationale, that profiled predicates are a subset of loop predicates, this PR disables profiled predicates whenever loop predicates are disabled. They are disabled on the level of arguments. Further, before any checks for whether profiled predicates are enabled, this PR inserts a check that loop predicates are enabled such that the code is consistent in its intention. >> >> Concretely, this PR >> - adds parse predicate nodes to the IR testing framework, >> - turns off `UseProfiledLoopPredicate` if `UseLoopPredicate` is turned off, >> - predicates all checks for `UseProfiledLoopPredicate` on `UseLoopPredicate` first for consistency, >> - adds a regression test.
>> >> >> # Testing >> >> The changes passed the following testing: >> - [Github Actions](https://github.com/mhaessig/jdk/actions/runs/14078750038) >> - tier1 through tier3 and Oracle internal testing > > Manuel Hässig has updated the pull request incrementally with two additional commits since the last revision: > > - idealKit::loop: always call add_parse_predicates > > It was constrained on UseParsePredicate, but this is incorrect, since > all parse predicates are added in that function. > - Improve description of UseLoopPredicate argument Marked as reviewed by chagedorn (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/24248#pullrequestreview-2731900566 From chagedorn at openjdk.org Tue Apr 1 08:35:27 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 1 Apr 2025 08:35:27 GMT Subject: RFR: 8341976: C2: use_mem_state != load->find_exact_control(load->in(0)) assert failure [v7] In-Reply-To: References: Message-ID: On Tue, 1 Apr 2025 07:31:12 GMT, Roland Westrelin wrote: >> The `arraycopy` writes to a non escaping array so its `ArrayCopy` node >> is marked as having a narrow memory effect. One of the loads from the >> destination after the copy is transformed into a load from the source >> array (the rationale being that if there's no load from the >> destination of the copy, the `arraycopy` is not needed). The load from >> the source has the input memory state of the `ArrayCopy` as memory >> input. That load is then sunk out of the loop and its control is >> updated to be after the `ArrayCopy`. That's legal because the >> `ArrayCopy` only has a narrow memory effect and can't modify the >> source. The `ArrayCopy` can't be eliminated and is expanded. In the >> process, a `MemBar` that has a wide memory effect is added.
The load >> from the source has control after the membar but memory state before >> and because the membar has a wide memory effect, the load is anti >> dependent on the membar: the graph is broken (the load can't be pinned >> after the membar and anti dependent on it). >> >> In short, the problem is that the graph is transformed under the >> assumption that the `ArrayCopy` has a narrow effect but the >> `ArrayCopy` is expanded to a subgraph that has a wide memory >> effect. The fix I propose is to not insert a membar with a wide memory >> effect. We still need a membar when the destination is non escaping >> because the expanded `ArrayCopy`, if it writes to a tightly allocated >> array, writes to raw memory and not to the destination memory slice. > > Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 16 additional commits since the last revision: > > - review > - Merge branch 'master' into JDK-8341976 > - review > - review > - Merge branch 'master' into JDK-8341976 > - -XX:+TraceLoopOpts fix > - review > - more > - Merge branch 'master' into JDK-8341976 > - more > - ... and 6 more: https://git.openjdk.org/jdk/compare/c777fe68...9b21648d Update looks good, thanks! ------------- Marked as reviewed by chagedorn (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/23465#pullrequestreview-2731911587 From chagedorn at openjdk.org Tue Apr 1 08:35:27 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 1 Apr 2025 08:35:27 GMT Subject: RFR: 8341976: C2: use_mem_state != load->find_exact_control(load->in(0)) assert failure [v2] In-Reply-To: References: <9cGlvzZnXc8B5tNxXSE2Eqi2FDJzP26U7c-yan4ZdCc=.3f6b0821-8b6c-453d-87ee-91205cc6627a@github.com> Message-ID: On Tue, 1 Apr 2025 07:30:47 GMT, Roland Westrelin wrote: >>> So maybe, we could treat that Opaque node the way we do for OpaqueZeroTripGuard and have it constant fold when the backedge is never taken. >> >> Right, that sounds like a good solution. >> >>> So I should revert the change to the IdealLoopTree::dump_head() and the test run with TraceLoopOpts? >> >> Yes, that would be great. We can make a comment in [JDK-8297752](https://bugs.openjdk.org/browse/JDK-8297752) to add `-XX:+TraceLoopOpts` as additional run to this test when we fix it. > > Done. Thanks! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23465#discussion_r2022403189 From chagedorn at openjdk.org Tue Apr 1 08:53:32 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 1 Apr 2025 08:53:32 GMT Subject: RFR: 8349479: C2: when a Type node becomes dead, make CFG path that uses it unreachable [v7] In-Reply-To: <09Q1vDaTXq3VlLU4xxQl_E7wDM2FT7tqR_Bc8ky8RNc=.4e11f2f8-75c3-49a1-b0b3-20eac17c4b39@github.com> References: <09Q1vDaTXq3VlLU4xxQl_E7wDM2FT7tqR_Bc8ky8RNc=.4e11f2f8-75c3-49a1-b0b3-20eac17c4b39@github.com> Message-ID: On Tue, 1 Apr 2025 08:03:51 GMT, Roland Westrelin wrote: >> src/hotspot/share/opto/node.cpp line 3096: >> >>> 3094: // paths. The dead paths are then replaced by a Halt node. >>> 3095: void TypeNode::make_paths_from_here_dead(PhaseIterGVN* igvn, PhaseIdealLoop* loop, const char* phase_str) { >>> 3096: Unique_Node_List wq; >> >> Should there be a `ResourceMark` here? > > The callers have the `ResourceMark`. 
This is because it's code I extracted from 8275202: I think it used to not be safe to call `PhaseIdealLoop::register_new_node` from within the `ResourceMark` but I see there were changes in that area (data structures used by `PhaseIdealLoop` no longer allocated in the resource area). So it looks like it could be changed now. I assume that JDK-8275202 also calls this method with a non-null `PhaseIdealLoop` pointer? Now we only pass in null, so the `loop` parameter could be removed. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23468#discussion_r2022418989 From chagedorn at openjdk.org Tue Apr 1 08:53:33 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 1 Apr 2025 08:53:33 GMT Subject: RFR: 8349479: C2: when a Type node becomes dead, make CFG path that uses it unreachable [v7] In-Reply-To: References: Message-ID: On Mon, 31 Mar 2025 10:38:21 GMT, Roland Westrelin wrote: >> This is primarily motivated by 8275202 (C2: optimize out more >> redundant conditions). In the following code snippet: >> >> >> int[] array = new int[arraySize]; >> if (j <= arraySize) { >> if (i >= 0) { >> if (i < j) { >> int v = array[i]; >> >> >> (`arraySize` is a constant) >> >> at the range check, `j` is known to be in `[min, arraySize]` as a >> consequence, `i` is known to be `[0, arraySize-1]`. The range check >> can be eliminated. >> >> Now, if later, `i` constant folds to some value that's positive but >> out of range for the array: >> >> - if that happens when the new pass runs, then it can prove that: >> >> if (i < j) { >> >> is never taken. >> >> - if that happens during IGVN or CCP however, that condition is not >> constant folded. And because the range check was removed, there's no >> guard protecting the range check `CastII`. It becomes `top` and, as >> a result, the graph can become broken. >> >> What I propose here is that when the `CastII` becomes dead, any CFG >> paths that use the `CastII` node is made unreachable. 
So in pseudo code: >> >> >> int[] array = new int[arraySize]; >> if (j <= arraySize) { >> if (i >= 0) { >> if (i < j) { >> halt(); >> >> >> Finding the CFG paths is implemented in the patch by following the >> uses of the node until a CFG node or a `Phi` is encountered. >> >> The patch applies this to all `Type` nodes as with 8275202, I also ran >> into some rare corner cases with other types of nodes. The exception is >> `Phi` nodes which may not be as easy to handle (and for which I had no >> issue with 8275202). >> >> Finally, the patch includes a test case that's unrelated to the >> discussion of 8275202 above. In that test case, a `CastII` becomes top >> but the test that guards it doesn't constant fold. The root cause is a >> transformation of: >> >> >> (CastII (AddI >> >> >> into >> >> >> (AddI (CastII ) (CastII)` >> >> >> which causes the resulting node to have a wider type. The `CastII` >> captures a type before the transformation above happens. Once it has >> happened, the guard for the `CastII` can't be constant folded when an >> out of bound value occurs. >> >> This is likely fixable some other way (even though it doesn't seem >> straightforward). Given the long history of similar issues (and the >> test case that shows that more of them are hiding), I think it would >> make sense to try some other way of approaching them. > > Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: > > review src/hotspot/share/opto/node.cpp line 3155: > 3153: > 3154: > 3155: Suggestion: src/hotspot/share/opto/phaseX.cpp line 1836: > 1834: _type_nodes.push(n); > 1835: } > 1836: const Type* new_type = n->Value(this); Could we also only add `n` to `_type_nodes` if `new_type` is not top? Then we could also rename `_type_nodes` to `_maybe_top_type_nodes` or something like that.
test/hotspot/jtreg/compiler/c2/TestGuardOfCastIIDoesntFold.java line 31: > 29: * -XX:CompileCommand=dontinline,TestGuardOfCastIIDoesntFold::notInlined > 30: * TestGuardOfCastIIDoesntFold > 31: * @run main/othervm TestGuardOfCastIIDoesntFold You can use `main` since you don't pass any flags: Suggestion: * @run main TestGuardOfCastIIDoesntFold ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23468#discussion_r2022428891 PR Review Comment: https://git.openjdk.org/jdk/pull/23468#discussion_r2022428263 PR Review Comment: https://git.openjdk.org/jdk/pull/23468#discussion_r2022422961 From chagedorn at openjdk.org Tue Apr 1 09:14:43 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 1 Apr 2025 09:14:43 GMT Subject: RFR: 8352869: Verify.checkEQ: extension for NaN, VectorAPI and arbitrary Objects [v6] In-Reply-To: References: Message-ID: On Tue, 1 Apr 2025 07:06:32 GMT, Emanuel Peter wrote: >> We should extend the functionality of Verify.checkEQ: >> - Allow different NaN encodings to be seen as equal (by default). >> - Compare VectorAPI vectors. >> - Compare Exceptions, and their messages. >> - Compare arbitrary Objects via Reflection. >> >> Note: this is a prerequisite for the Template Library [JDK-8352861](https://bugs.openjdk.org/browse/JDK-8352861) / https://github.com/openjdk/jdk/pull/23418. 
> > Emanuel Peter has updated the pull request incrementally with two additional commits since the last revision: > > - update copyright > - Apply suggestions from code review > > Co-authored-by: Christian Hagedorn I'll have a closer look at the code later again :-) ------------- PR Review: https://git.openjdk.org/jdk/pull/24224#pullrequestreview-2732020875 From chagedorn at openjdk.org Tue Apr 1 09:14:43 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 1 Apr 2025 09:14:43 GMT Subject: RFR: 8352869: Verify.checkEQ: extension for NaN, VectorAPI and arbitrary Objects [v5] In-Reply-To: References: Message-ID: On Tue, 1 Apr 2025 07:02:11 GMT, Emanuel Peter wrote: >> test/hotspot/jtreg/compiler/lib/verify/Verify.java line 209: >> >>> 207: print(a, b, field, aParent, bParent); >>> 208: throw new VerifyException("Object type not supported: " + ca.getName() + " -- did you mean to 'enableCheckWithArbitraryClasses'?"); >>> 209: } >> >> What's the reason behind throwing instead of just comparing two arbitrary objects by default? If a user calls `Verify.checkEQ()` and sees this exception, I would guess he then just passes the additional option and we have the same result. But maybe I'm missing something. > > Good question. I think my reasoning was that comparing arbitrary classes requires reflection. And that is rather slow. So by default it would be good if that feature is not enabled, so the user tries to avoid it, and is aware when they enable it explicitly. > > But if you think that is not useful, I can remove the feature. > > @chhagedorn what do you think? I think the intention to let the user double check is good. I'm not sure though if the user is really aware of the potential slow down without diving deeper into the implementation. All they know is that `checkEQ` somehow does not support some of their objects but there is a simple workaround to still use it.
So, the real question is: How many users will then consider doing something different when facing this exception and not just enable it anyway? I guess enabling is probably the most natural thing to do. Given that, I would probably just drop this. It would also simplify the API usage in the following way: We would only have checks with NaNs being all equals and comparing raw bits (i.e. NaNs not equal). Then you could offer `checkEQ()` (default) and `checkRawBitsEQ()` or something like that. Then users do not need to worry about creating and passing in an `Options`. What do you think about these suggestions? What we could do either way at the `checkEQ()` API method: Describe the potential slow down with reflection when not using certain classes. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24224#discussion_r2022461294 From chagedorn at openjdk.org Tue Apr 1 10:09:15 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 1 Apr 2025 10:09:15 GMT Subject: RFR: 8350563: C2 compilation fails because PhaseCCP does not reach a fixpoint [v7] In-Reply-To: References: Message-ID: On Mon, 31 Mar 2025 16:08:57 GMT, Liam Miller-Cushon wrote: >> Hello, please consider this fix for [JDK-8350563](https://bugs.openjdk.org/browse/JDK-8350563) contributed by my colleague Matthias Ernst. >> >> https://github.com/openjdk/jdk/pull/22856 introduced a new `Value()` optimization for the pattern `AndIL(Con, Mask)`. >> This optimization can look through CastNodes, and therefore requires additional logic in CCP to push these >> transitive uses to the worklist. >> >> The optimization is closely related to analogous optimizations for SHIFT nodes, and we also extend the existing logic for >> CCP worklist handling: the current logic is "if the shift input to a SHIFT node changes, push indirect AND node uses to the CCP worklist". >> We extend it by adding "if the (new) type of a node is an IntegerType that `is_con, ...` to the predicate. 
> > Liam Miller-Cushon has updated the pull request incrementally with one additional commit since the last revision: > > Explicitly check for OP_Con instead of TypeInteger::is_con. > > 322 Phi === 303 119 255 [[ 399 388 351 751 366 377 ]] #int:-256..127 !jvms: Integer::parseInt @ bci:151 (line 625) > > While this Phi dumps as "#int:-256..127", `phase->type(expr)` returns a type that is_con -256. Thanks Matthias for having a look at the issue and proposing a fix! While this fix seems to work, I think we should address it slightly differently with an explicit bailout, though. Let's step back a bit: CCP first sets all types to top and then tries to widen them (i.e. an optimistic approach) while IGVN does the opposite: We start by setting all types to bottom and then try to narrow them (i.e. a pessimistic approach). The assert we've faced in CCP complains that we tried to narrow some type again which is against the rules of CCP - we can only widen types. Now when CCP runs, we start with every type of every node at top. When visiting `AndI` at some point, we see what you reported above: > What I observe for the Integer.parseInt reproducer is that expr dumps as a phi node with type #int:-256...127, but phase->type(expr) returns a type that is_con() with value -256. That is perfectly fine. What happened here is that only one input of the phi with type `#int:-256` is non-top. The other inputs are still top (i.e. not processed in CCP, yet). Therefore, the phi's type is set to `#int:-256`. Note that the `TypeNode::_type` field of the phi is still set to the type we had before CCP, i.e. ` #int:-256...127` . In CCP, we use `PhaseValues::_types` which are set to top in the beginning and we leave `TypeNode::_type` unchanged during the analysis. As a consequence this can happen when having a phi and only looking at the currently tracked CCP types: > In consequence, the AND(phi-node, mask) gets optimized to zero. 
Let's look at the output of the failure: 304 ConI === 0 [[ 506 ]] #int:255 996 CastII === 461 453 [[ 557 546 535 524 1034 506 ]] #int:-256..127 extra types: {0:int:-256} strong dependency !orig=[478] !jvms: Integer::parseInt @ bci:144 (line 550) 506 AndI === _ 996 304 [[ 507 ]] !jvms: Integer::parseInt @ bci:170 (line 552) told = int:0 tnew = top it looks like we first optimized `AndI` to zero (i.e. `told`) and then set it to top again in a later `Value()` call in CCP (i.e. `tnew`). This is a violation of the rules for CCP. When we suddenly see top again, it suggests that we prematurely applied an optimization while one of the involved inputs was actually still top. This looks wrong and we should have waited until all the involved inputs are non-top. When looking at the code, we check that `mask` is an integer type and thus non-top here: https://github.com/openjdk/jdk/blob/f25f701652900d02858c905f4cd0bb43208c13d5/src/hotspot/share/opto/mulnode.cpp#L2255-L2260 But it looks like we miss that for `expr` when it is a cast node (which is `996 CastII` in the failing test). We pass `expr` to `AndIL_min_trailing_zeros()` and then uncast it and only then check if it is a proper integer type: https://github.com/openjdk/jdk/blob/f25f701652900d02858c905f4cd0bb43208c13d5/src/hotspot/share/opto/mulnode.cpp#L2180-L2185 So, if the type of `996 CastII` in CCP is still top, we skip it with `uncast()` and then check the phi above which has first the constant type `#int:-256`. We can apply the optimization to return type zero. When later updating the type of the phi to `#int:-256...127`, we can no longer apply the optimization and fall back to `MulNode::Value()` where we return top because the input `996 CastII` is still top: https://github.com/openjdk/jdk/blob/f25f701652900d02858c905f4cd0bb43208c13d5/src/hotspot/share/opto/mulnode.cpp#L185-L187 We find top which is narrower than type zero and we fail with the assert. 
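The sequence described above can be replayed on a toy lattice. The following is a minimal Java sketch, not HotSpot code: the class `CcpMonotonicitySketch`, the three-level lattice (`TOP`, `CON`, `RANGE`) and both transfer functions are hypothetical stand-ins for `AndINode::Value()` and the CCP type table, and the monotonicity check only compares lattice levels. It assumes a non-negative mask. It contrasts a transfer function that looks through the cast with one that first checks whether the cast is still top.

```java
public class CcpMonotonicitySketch {
    // Toy lattice, ordered the way CCP is allowed to move: TOP -> CON -> RANGE.
    enum Kind { TOP, CON, RANGE }

    static final class Ty {
        final Kind kind;
        final int lo;
        final int hi;

        private Ty(Kind kind, int lo, int hi) {
            this.kind = kind;
            this.lo = lo;
            this.hi = hi;
        }

        static Ty top() { return new Ty(Kind.TOP, 0, 0); }
        static Ty con(int v) { return new Ty(Kind.CON, v, v); }
        static Ty range(int lo, int hi) { return new Ty(Kind.RANGE, lo, hi); }
    }

    // CCP may keep a type or move it further down the lattice, never back up.
    // (Simplification: only the lattice level is compared, not the bounds.)
    static boolean isMonotone(Ty told, Ty tnew) {
        return tnew.kind.ordinal() >= told.kind.ordinal();
    }

    // Hypothetical transfer function for AndI(Cast(phi), mask) that looks
    // through the cast before checking whether the cast itself is still top.
    static Ty andLookingThroughCast(Ty castType, Ty phiType, int mask) {
        if (phiType.kind == Kind.CON && (phiType.lo & mask) == 0) {
            return Ty.con(0);            // premature: castType may still be TOP
        }
        if (castType.kind == Kind.TOP) {
            return Ty.top();             // generic rule: a top input gives top
        }
        return Ty.range(0, mask);
    }

    // Same function with the suggested bailout: check the cast for top first.
    static Ty andWithTopCheck(Ty castType, Ty phiType, int mask) {
        if (castType.kind == Kind.TOP) {
            return Ty.top();
        }
        if (phiType.kind == Kind.CON && (phiType.lo & mask) == 0) {
            return Ty.con(0);
        }
        return Ty.range(0, mask);
    }

    public static void main(String[] args) {
        int mask = 255;
        Ty cast = Ty.top();                  // the cast has not been processed yet
        Ty phiEarly = Ty.con(-256);          // only one phi input seen so far
        Ty phiLater = Ty.range(-256, 127);   // phi after its other inputs arrive

        Ty told = andLookingThroughCast(cast, phiEarly, mask);
        Ty tnew = andLookingThroughCast(cast, phiLater, mask);
        System.out.println("looking through cast: " + told.kind + " -> " + tnew.kind
                + ", monotone = " + isMonotone(told, tnew));

        told = andWithTopCheck(cast, phiEarly, mask);
        tnew = andWithTopCheck(cast, phiLater, mask);
        System.out.println("with top check: " + told.kind + " -> " + tnew.kind
                + ", monotone = " + isMonotone(told, tnew));
    }
}
```

In this sketch the first variant moves from a constant back to top, which is the kind of step the CCP assert rejects, while the variant with the early top check stays at top until the cast has a real type.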
Long story short, you should check for `expr` being top before uncasting it. This was hard to see and is only a problem in CCP. I suggest adding the small reproducer as an additional test case. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23871#issuecomment-2768852982 From thartmann at openjdk.org Tue Apr 1 11:42:17 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 1 Apr 2025 11:42:17 GMT Subject: RFR: 8352893: C2: OrL/INode::add_ring optimize (x | -1) to -1 [v3] In-Reply-To: References: Message-ID: On Mon, 31 Mar 2025 14:31:43 GMT, Manuel Hässig wrote: >> # Issue Summary >> >> The `add_ring()` implementations of `OrINode` and `OrLNode` are missing the optimization that an or with a value where all bits are ones (since we have signed integers in this case `~0 == -1`) will always yield all ones. >> >> # Changes >> >> This PR makes the following straightforward changes: >> - `Or(I|L)Node::add_ring()` returns `-1` if one of the two inputs is `-1`. >> - Add `Or(I|L)` nodes to the IR framework. >> - Add a regression IR test for the implemented optimization. >> >> # Testing >> >> - [Github Actions](https://github.com/mhaessig/jdk/actions/runs/14110978686) >> - Ran tier1 through tier3 and Oracle internal testing > > Manuel Hässig has updated the pull request incrementally with one additional commit since the last revision: > > Remove loop in test and instead use random values Looks good to me too. ------------- Marked as reviewed by thartmann (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/24289#pullrequestreview-2732401043 From roland at openjdk.org Tue Apr 1 12:50:14 2025 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 1 Apr 2025 12:50:14 GMT Subject: RFR: 8352418: Add verification code to check that the associated loop nodes of useless Template Assertion Predicates are dead In-Reply-To: References: Message-ID: On Mon, 31 Mar 2025 12:28:59 GMT, Christian Hagedorn wrote: > As already suggested in https://github.com/openjdk/jdk/pull/23823, I want to do the following additional verification: > > After `eliminate_useless_predicates()` all now useless `OpaqueTemplateAssertionPredicate` nodes should not have any references to `CountedLoop` nodes that are still in the graph (otherwise, they would have been marked useful). This verification did not work reliably without the full Assertion Predicates fix [JDK-8350577](https://bugs.openjdk.org/browse/JDK-8350577). Since JDK-8350577 is now integrated, I propose to add this additional verification code. > > Thanks, > Christian Looks good to me. ------------- Marked as reviewed by roland (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/24326#pullrequestreview-2732570055 From chagedorn at openjdk.org Tue Apr 1 12:50:14 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 1 Apr 2025 12:50:14 GMT Subject: RFR: 8352418: Add verification code to check that the associated loop nodes of useless Template Assertion Predicates are dead In-Reply-To: References: Message-ID: <7r2XMglIgMjvCYaPfESV79PvYsGTo8vojzPadFN-Hu4=.4d2e576e-fb9b-4dd0-add4-a60248fa03f5@github.com> On Mon, 31 Mar 2025 12:28:59 GMT, Christian Hagedorn wrote: > As already suggested in https://github.com/openjdk/jdk/pull/23823, I want to do the following additional verification: > > After `eliminate_useless_predicates()` all now useless `OpaqueTemplateAssertionPredicate` nodes should not have any references to `CountedLoop` nodes that are still in the graph (otherwise, they would have been marked useful). This verification did not work reliably without the full Assertion Predicates fix [JDK-8350577](https://bugs.openjdk.org/browse/JDK-8350577). Since JDK-8350577 is now integrated, I propose to add this additional verification code. > > Thanks, > Christian Thanks Roland for your review! ------------- PR Comment: https://git.openjdk.org/jdk/pull/24326#issuecomment-2769245625 From mchevalier at openjdk.org Tue Apr 1 13:04:23 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Tue, 1 Apr 2025 13:04:23 GMT Subject: RFR: 8348887: Create IR framework test for JDK-8347997 In-Reply-To: References: Message-ID: On Tue, 1 Apr 2025 04:52:53 GMT, Galder Zamarreño wrote: >> As the ticket says: >>> Create IR framework test which checks that allocations are eliminated in the regression test included in [JDK-8347997](https://bugs.openjdk.org/browse/JDK-8347997) fix. >> >> So here it is! We can see that in case of inlining, indeed, no allocation happens. The second part is some sanity check to emphasize the difference: of course, there is an allocation without inlining.
The benefit of this second part is arguable. From my point of view, it's mostly to point out the difference to a future reader. But yes, there is nothing very surprising. >> >> Thanks, >> Marc > > test/hotspot/jtreg/compiler/c2/irTests/TestContinuationPinningAndEA.java line 118: > >> 116: >> 117: @DontInline >> 118: public CrashesNoInline() throws Throwable { > > It's probably my own ignorance, but just in case are others are in the same boat, why does this crash? Could you add a brief javadoc for future readers? Same with other Crashes cases. It's rather bad (uninspired) naming. I based this test on the test introduced by [JDK-8347997](https://bugs.openjdk.org/browse/JDK-8347997), which (I suspect) is based on the reproducer mentioned in JBS. There are 2 cases: one made EA crash, the other make it fail (not detect the non escaping, as far as I understand). From Vladimir's comment on PR 23284, it used to crash because of a corrupted memory graph. Honestly, I'm not quite clear on that. There is already a test (from said ticket and PR) making sure it doesn't crash. The point of the test I'm adding is to check that the allocation is gone (thanks to EA). Maybe the best is rather to rename the cases "Crashes" and "FailEA": it made sense in the context of the original bug, but it's not very useful names for the future. But I'm not sure what would be fitting. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24328#discussion_r2022810119 From dfenacci at openjdk.org Tue Apr 1 13:19:26 2025 From: dfenacci at openjdk.org (Damon Fenacci) Date: Tue, 1 Apr 2025 13:19:26 GMT Subject: RFR: 8341976: C2: use_mem_state != load->find_exact_control(load->in(0)) assert failure [v7] In-Reply-To: References: Message-ID: <-2HR8vsW5xGAbW5EviewkowFNsq-HH51yjwWA9uLC5g=.6c02442c-2e34-41e8-a808-10ab3c52eefc@github.com> On Tue, 1 Apr 2025 07:31:12 GMT, Roland Westrelin wrote: >> The `arraycopy` writes to a non escaping array so its `ArrayCopy` node >> is marked as having a narrow memory effect. One of the loads from the >> destination after the copy is transformed into a load from the source >> array (the rationale being that if there's no load from the >> destination of the copy, the `arraycopy` is not needed). The load from >> the source has the input memory state of the `ArrayCopy` as memory >> input. That load is then sunk out of the loop and its control is >> updated to be after the `ArrayCopy`. That's legal because the >> `ArrayCopy` only has a narrow memory effect and can't modify the >> source. The `ArrayCopy` can't be eliminated and is expanded. In the >> process, a `MemBar` that has a wide memory effect is added. The load >> from the source has control after the membar but memory state before >> and because the membar has a wide memory effect, the load is anti >> dependent on the membar: the graph is broken (the load can't be pinned >> after the membar and anti dependent on it). >> >> In short, the problem is that the graph is transformed under the >> assumption that the `ArrayCopy` has a narrow effect but the >> `ArrayCopy` is expanded to a subgraph that has a wide memory >> effect. The fix I propose is to not insert a membar with a wide memory >> effect. 
We still need a membar when the destination is non escaping >> because the expanded `ArrayCopy`, if it writes to a tightly allocated >> array, writes to raw memory and not to the destination memory slice. > > Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 16 additional commits since the last revision: > > - review > - Merge branch 'master' into JDK-8341976 > - review > - review > - Merge branch 'master' into JDK-8341976 > - -XX:+TraceLoopOpts fix > - review > - more > - Merge branch 'master' into JDK-8341976 > - more > - ... and 6 more: https://git.openjdk.org/jdk/compare/28e6ceb4...9b21648d Looks good to me. Thanks @rwestrel. ------------- Marked as reviewed by dfenacci (Committer). PR Review: https://git.openjdk.org/jdk/pull/23465#pullrequestreview-2732658616 From dlunden at openjdk.org Tue Apr 1 14:19:10 2025 From: dlunden at openjdk.org (Daniel Lundén) Date: Tue, 1 Apr 2025 14:19:10 GMT Subject: RFR: 8325467: Support methods with many arguments in C2 [v12] In-Reply-To: References: Message-ID: > If a method has a large number of parameters, we currently bail out from C2 compilation. > > ### Changeset > > Allowing C2 compilation of methods with a large number of parameters requires fundamental changes to the register mask data structure, used in many places in C2. In particular, register masks currently have a statically determined size and cannot represent arbitrary numbers of stack slots. This is needed if we want to compile methods with arbitrary numbers of parameters. Register mask operations are present in performance-sensitive parts of C2, which further complicates changes. > > Changes: > - Add functionality to dynamically grow/extend register masks. I experimented with a number of design choices to achieve this.
To keep the common case (normal number of method parameters) quick and also to avoid more intrusive changes to the current `RegMask` interface, I decided to leave the "base" statically allocated memory for masks unchanged and only use dynamically allocated memory in the rare cases where it is needed. > - Generalize the "chunk"-logic from `PhaseChaitin::Select()` to allow arbitrary-sized chunks, and also move most of the logic into register mask methods to separate concerns and to make the `PhaseChaitin::Select()` code more readable. > - Remove all `can_represent` checks and bailouts. > - Performance tuning. A particularly important change is the early-exit optimization in `RegMask::overlap`, used in the performance-sensitive method `PhaseChaitin::interfere_with_live`. > - Add a new test case `TestManyMethodArguments.java` and extend an old test `TestNestedSynchronize.java`. > > ### Testing > > - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/10178060450) > - `tier1` to `tier4` (and additional Oracle-internal testing) on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64. > - Standard performance benchmarking. No observed conclusive overall performance degradation/improvement. > - Specific benchmarking of C2 compilation time. The changes increase C2 compilation time by, approximately and on average, 1% for methods that could also be compiled before this changeset (see the figure below). The reason for the degradation is further checks required in performance-sensitive code (in particular `PhaseChaitin::remove_bound_register_from_interfering_live_ranges`). I have tried optimizing in various ways, but changes I found that lead to improvement also lead to less readable code (and are, in my opinion, not worth it). > > ![c2-regression](https:/... 
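The layout described in the changeset summary above (a statically sized base part that stays on the fast path, plus a dynamically allocated extension used only when needed, and an early exit in the overlap check) can be sketched roughly as follows. This is only an illustration in Java, not the HotSpot `RegMask` code (which is C++); the class name `GrowableMask`, the constant `BASE_WORDS`, and the method names are hypothetical and chosen for this sketch.

```java
import java.util.Arrays;

/** Hypothetical illustration only; this is not HotSpot's RegMask. */
public class GrowableMask {
    // Assumed size of the always-present base part, in 64-bit words.
    private static final int BASE_WORDS = 4;

    private final long[] base = new long[BASE_WORDS];
    // Extension part, allocated lazily and only for bits beyond the base.
    private long[] ext = null;

    /** Sets the given bit, growing the extension part if necessary. */
    public void insert(int bit) {
        int word = bit >>> 6;
        if (word < BASE_WORDS) {
            base[word] |= 1L << (bit & 63);   // common case: no allocation
            return;
        }
        int idx = word - BASE_WORDS;
        if (ext == null) {
            ext = new long[idx + 1];
        } else if (idx >= ext.length) {
            ext = Arrays.copyOf(ext, idx + 1);
        }
        ext[idx] |= 1L << (bit & 63);
    }

    /** Returns true if the given bit is set. */
    public boolean member(int bit) {
        int word = bit >>> 6;
        if (word < BASE_WORDS) {
            return (base[word] & (1L << (bit & 63))) != 0;
        }
        int idx = word - BASE_WORDS;
        return ext != null && idx < ext.length
                && (ext[idx] & (1L << (bit & 63))) != 0;
    }

    /** Returns true as soon as one common bit is found (early exit). */
    public boolean overlap(GrowableMask other) {
        for (int i = 0; i < BASE_WORDS; i++) {
            if ((base[i] & other.base[i]) != 0) {
                return true;
            }
        }
        if (ext == null || other.ext == null) {
            return false;
        }
        int n = Math.min(ext.length, other.ext.length);
        for (int i = 0; i < n; i++) {
            if ((ext[i] & other.ext[i]) != 0) {
                return true;
            }
        }
        return false;
    }
}
```

In this sketch a mask that only ever holds low-numbered bits never allocates the extension array, which mirrors the stated goal of keeping the common case quick.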
Daniel Lundén has updated the pull request incrementally with two additional commits since the last revision: - Formatting updates - Add register mask fuzzer test ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20404/files - new: https://git.openjdk.org/jdk/pull/20404/files/fbfddb29..5be718e8 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20404&range=11 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20404&range=10-11 Stats: 324 lines in 2 files changed: 324 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/20404.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20404/head:pull/20404 PR: https://git.openjdk.org/jdk/pull/20404 From dlunden at openjdk.org Tue Apr 1 14:19:11 2025 From: dlunden at openjdk.org (Daniel Lundén) Date: Tue, 1 Apr 2025 14:19:11 GMT Subject: RFR: 8325467: Support methods with many arguments in C2 [v11] In-Reply-To: <0Yf6qZwnLz7oAtSFscDwHifQAmaPuHzeSrpkqMVchDU=.c7a5e8af-9390-414b-850c-609110668eac@github.com> References: <0Yf6qZwnLz7oAtSFscDwHifQAmaPuHzeSrpkqMVchDU=.c7a5e8af-9390-414b-850c-609110668eac@github.com> Message-ID: On Mon, 24 Mar 2025 15:33:34 GMT, Daniel Lundén wrote: >> If a method has a large number of parameters, we currently bail out from C2 compilation. >> >> ### Changeset >> >> Allowing C2 compilation of methods with a large number of parameters requires fundamental changes to the register mask data structure, used in many places in C2. In particular, register masks currently have a statically determined size and cannot represent arbitrary numbers of stack slots. This is needed if we want to compile methods with arbitrary numbers of parameters. Register mask operations are present in performance-sensitive parts of C2, which further complicates changes. >> >> Changes: >> - Add functionality to dynamically grow/extend register masks. I experimented with a number of design choices to achieve this.
To keep the common case (normal number of method parameters) quick and also to avoid more intrusive changes to the current `RegMask` interface, I decided to leave the "base" statically allocated memory for masks unchanged and only use dynamically allocated memory in the rare cases where it is needed. >> - Generalize the "chunk"-logic from `PhaseChaitin::Select()` to allow arbitrary-sized chunks, and also move most of the logic into register mask methods to separate concerns and to make the `PhaseChaitin::Select()` code more readable. >> - Remove all `can_represent` checks and bailouts. >> - Performance tuning. A particularly important change is the early-exit optimization in `RegMask::overlap`, used in the performance-sensitive method `PhaseChaitin::interfere_with_live`. >> - Add a new test case `TestManyMethodArguments.java` and extend an old test `TestNestedSynchronize.java`. >> >> ### Testing >> >> - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/10178060450) >> - `tier1` to `tier4` (and additional Oracle-internal testing) on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64. >> - Standard performance benchmarking. No observed conclusive overall performance degradation/improvement. >> - Specific benchmarking of C2 compilation time. The changes increase C2 compilation time by, approximately and on average, 1% for methods that could also be compiled before this changeset (see the figure below). The reason for the degradation is further checks required in performance-sensitive code (in particular `PhaseChaitin::remove_bound_register_from_interfering_live_ranges`). I have tried optimizing in various ways, but changes I found that lead to improvement also lead to less readable code (and are, in my opinion, no... 
> > Daniel Lundén has updated the pull request incrementally with one additional commit since the last revision: > > Extend example with offset register mask > As we discussed offline, the test coverage of register mask operations with extended dynamic parts, non-zero offsets, etc. is fairly low (basically limited to the new JTReg tests included in this changeset). To increase coverage, I have extended `test_regmask.cpp` with tests that perform random operations on a register mask and on a reference bit set and check that the result is equivalent on both data structures. Here is the extension: [4ee703f](https://github.com/openjdk/jdk/commit/4ee703f1ab73f8f43d4603d7fa88dcc8f4950ec0). I ran the random tests a few times on different platforms and could not find any failure, which gives good confidence in the correctness of the register mask operation changes. I also tested the effectiveness of the tests themselves by injecting a few failures in the register mask implementation and confirming their detection. Feel free to include the test extensions in this changeset (you might want to go through the code and clean it up a bit before, though, things like e.g. naming consistency). I've now reviewed the register mask fuzzer tests and found no errors. Looks good! I applied some code formatting, though. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20404#issuecomment-2769520020 From dlunden at openjdk.org Tue Apr 1 14:36:38 2025 From: dlunden at openjdk.org (Daniel Lundén) Date: Tue, 1 Apr 2025 14:36:38 GMT Subject: RFR: 8282053: IGV: refine schedule approximation In-Reply-To: References: Message-ID: <0dg9XeqluKkZEUgPNJEzwuCUHiG36RaZvr9GggckWQ4=.1efe129b-865f-41c0-92ac-27b91f055f5a@github.com> On Tue, 1 Apr 2025 07:23:04 GMT, Daniel Skantz wrote: > This patch refines the schedule approximation in IGV by 1) placing parm.
and projection nodes in the same block as their predecessors, and 2) disallows erroneously considering machine nodes such as prefetchAlloc and rep_stos as CFG nodes. > > The reader may refer to the corresponding JBS issue where graphs sampled before and after the change are attached. > > Testing: T1-T3 with no failures. Opened graphs before and after the change and saw no obvious problems. Opened a large number of graphs in CFG view and observed no unexpected IGV warnings, errors or assert failures. Good CFG scheduling approximation improvement! Just one style suggestion. src/utils/IdealGraphVisualizer/ServerCompiler/src/main/java/com/sun/hotspot/igv/servercompiler/ServerCompilerScheduler.java line 800: > 798: n.isCFG = true; > 799: } else if (n.inputNode.getProperties().get("type").equals("bottom") > 800: && n.preds.size() > 0 && Suggestion: } else if (n.inputNode.getProperties().get("type").equals("bottom") && n.preds.size() > 0 && For consistent placement of `&&` (already a problem before this changeset, but might as well fix now) ------------- Marked as reviewed by dlunden (Committer). PR Review: https://git.openjdk.org/jdk/pull/24350#pullrequestreview-2732918158 PR Review Comment: https://git.openjdk.org/jdk/pull/24350#discussion_r2022983828 From dskantz at openjdk.org Tue Apr 1 14:42:47 2025 From: dskantz at openjdk.org (Daniel Skantz) Date: Tue, 1 Apr 2025 14:42:47 GMT Subject: RFR: 8282053: IGV: refine schedule approximation [v2] In-Reply-To: References: Message-ID: <0SXZ0k-28IdpWjuhtK4dSJ9ybHE58Oq56zT_sdqeQpc=.0cf9319f-f64d-4e8f-82e9-f654464bc775@github.com> > This patch refines the schedule approximation in IGV by 1) placing parm. and projection nodes in the same block as their predecessors, and 2) disallows erroneously considering machine nodes such as prefetchAlloc and rep_stos as CFG nodes. > > The reader may refer to the corresponding JBS issue where graphs sampled before and after the change are attached. > > Testing: T1-T3 with no failures. 
Opened graphs before and after the change and saw no obvious problems. Opened a large number of graphs in CFG view and observed no unexpected IGV warnings, errors or assert failures. Daniel Skantz has updated the pull request incrementally with one additional commit since the last revision: Update src/utils/IdealGraphVisualizer/ServerCompiler/src/main/java/com/sun/hotspot/igv/servercompiler/ServerCompilerScheduler.java Co-authored-by: Daniel Lundén ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24350/files - new: https://git.openjdk.org/jdk/pull/24350/files/52667ad5..57ad6dc8 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24350&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24350&range=00-01 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/24350.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24350/head:pull/24350 PR: https://git.openjdk.org/jdk/pull/24350 From dlunden at openjdk.org Tue Apr 1 14:46:25 2025 From: dlunden at openjdk.org (Daniel Lundén) Date: Tue, 1 Apr 2025 14:46:25 GMT Subject: RFR: 8282053: IGV: refine schedule approximation [v2] In-Reply-To: <0SXZ0k-28IdpWjuhtK4dSJ9ybHE58Oq56zT_sdqeQpc=.0cf9319f-f64d-4e8f-82e9-f654464bc775@github.com> References: <0SXZ0k-28IdpWjuhtK4dSJ9ybHE58Oq56zT_sdqeQpc=.0cf9319f-f64d-4e8f-82e9-f654464bc775@github.com> Message-ID: On Tue, 1 Apr 2025 14:42:47 GMT, Daniel Skantz wrote: >> This patch refines the schedule approximation in IGV by 1) placing parm. and projection nodes in the same block as their predecessors, and 2) disallows erroneously considering machine nodes such as prefetchAlloc and rep_stos as CFG nodes. >> >> The reader may refer to the corresponding JBS issue where graphs sampled before and after the change are attached. >> >> Testing: T1-T3 with no failures. Opened graphs before and after the change and saw no obvious problems.
Opened a large number of graphs in CFG view and observed no unexpected IGV warnings, errors or assert failures. > > Daniel Skantz has updated the pull request incrementally with one additional commit since the last revision: > > Update src/utils/IdealGraphVisualizer/ServerCompiler/src/main/java/com/sun/hotspot/igv/servercompiler/ServerCompilerScheduler.java > > Co-authored-by: Daniel Lundén Marked as reviewed by dlunden (Committer). ------------- PR Review: https://git.openjdk.org/jdk/pull/24350#pullrequestreview-2732961006 From chagedorn at openjdk.org Tue Apr 1 15:41:48 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 1 Apr 2025 15:41:48 GMT Subject: RFR: 8334046: Set different values for CompLevel_any and CompLevel_all [v2] In-Reply-To: References: Message-ID: <8atBFgfznyYBW1gmJE9Brk9yoiWYXL1ts6Wr5t_KqZA=.d25be79a-3730-449c-9552-7d42ffb68d50@github.com> On Mon, 31 Mar 2025 03:43:03 GMT, Cesar Soares Lucas wrote: >> Please review this trivial patch to set different values for CompLevel_any and CompLevel_all. >> Setting different values for these fields makes the implementation of [this other issue](https://bugs.openjdk.org/browse/JDK-8313713) much cleaner/easier. >> Tested on OSX/Linux Aarch64/x86_64 with JTREG. > > Cesar Soares Lucas has updated the pull request incrementally with one additional commit since the last revision: > > Fix WhiteBox constants. Just to let you know, Vladimir is out this week. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24298#issuecomment-2769791125 From chagedorn at openjdk.org Tue Apr 1 15:44:36 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 1 Apr 2025 15:44:36 GMT Subject: RFR: 8350852: Implement JMH benchmark for sparse CodeCache [v3] In-Reply-To: References: Message-ID: On Fri, 28 Mar 2025 20:20:41 GMT, Evgeny Astigeevich wrote: >> This benchmark is used to check performance impact of the code cache being sparse.
>> >> We use C2 compiler to compile the same Java method multiple times to produce as much code as needed. The Java method is not trivial. It adds two 40 digit positive integers. These compiled methods represent the active methods in the code cache. We split active methods into groups. We put a group into a fixed size code region. We make a code region aligned by its size. CodeCache becomes sparse when code regions are not fully filled. We measure the time taken to call all active methods. >> >> Results: code region size 2M (2097152) bytes >> - Intel Xeon Platinum 8259CL >>
>> |activeMethodCount |groupCount |Methods/Group |Score |Error |Units |Diff |
>> |--- |--- |--- |--- |--- |--- |--- |
>> |128 |1 |128 |19.577 |0.619 |us/op | |
>> |128 |32 |4 |22.968 |0.314 |us/op |17.30% |
>> |128 |48 |3 |22.245 |0.388 |us/op |13.60% |
>> |128 |64 |2 |23.874 |0.84 |us/op |21.90% |
>> |128 |80 |2 |23.786 |0.231 |us/op |21.50% |
>> |128 |96 |1 |26.224 |1.16 |us/op |34% |
>> |128 |112 |1 |27.028 |0.461 |us/op |38.10% |
>> |256 |1 |256 |47.43 |1.146 |us/op | |
>> |256 |32 |8 |63.962 |1.671 |us/op |34.90% |
>> |256 |48 |5 |63.396 |0.247 |us/op |33.70% |
>> |256 |64 |4 |66.604 |2.286 |us/op |40.40% |
>> |256 |80 |3 |59.746 |1.273 |us/op |26% |
>> |256 |96 |3 |63.836 |1.034 |us/op |34.60% |
>> |256 |112 |2 |63.538 |1.814 |us/op |34% |
>> |512 |1 |512 |172.731 |4.409 |us/op | |
>> |512 |32 |16 |206.772 |6.229 |us/op |19.70% |
>> |512 |48 |11 |215.275 |2.228 |us/op |24.60% |
>> |512 |64 |8 |212.962 |2.028 |us/op |23.30% |
>> |512 |80 |6 |201.335 |12.519 |us/op |16.60% |
>> |512 |96 |5 |198.133 |6.502 |us/op |14.70% |
>> |512 |112 |5 |193.739 |3.812 |us/op |12.20% |
>> |768 |1 |768 |325.154 |5.048 |us/op | |
>> |768 |32 |24 |346.298 |20.196 |us/op |6.50% |
>> |768 |48 |16 |350.746 |2.931 |us/op |7.90% |
>> |768 |64 |12 |339.445 |7.927 |us/op |4.40% |
>> |768 |80 |10 |347.408 |7.355 |us/op |6.80% |
>> |768 |96 |8 |340.983 |3.578 |us/op |4.90% |
>> |768 |112 |7 |353.949 |2.98 |us/op |8.90% |
>> |1024 |1 |1024 |368.352 |5.961 |us/op | |
>> |1024 |32 |32 |463.822 |6.274 |us/op |25.90% |
>> |1024 |48 |21 |457.674 |15.144 |us/op |24.20% |
>> |1024 |64 |16 |477.694 |0.986 |us/op |29.70% |
>> |1024 |80 |13 |484.901 |32.601 |us/op |31.60% |
>> |1024 |96 |11 |480.8 |27.088 |us/op |30.50% |
>> |1024 |112 |9 |474.416 |10.053 |us/op |28.80% |
>> >> - AArch64 Neoverse N1 >>
>> |activeMethodCount |groupCount |Methods/Group |Score |Error |Units |Diff |...
> > Evgeny Astigeevich has updated the pull request incrementally with two additional commits since the last revision: > > - Document assumptions about code placement in CodeCache > - Address bulasevich comment: too many parameters values Just to let you know, Vladimir is out this week. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23831#issuecomment-2769803651 From rcastanedalo at openjdk.org Tue Apr 1 15:59:20 2025 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Tue, 1 Apr 2025 15:59:20 GMT Subject: RFR: 8282053: IGV: refine schedule approximation [v2] In-Reply-To: <0SXZ0k-28IdpWjuhtK4dSJ9ybHE58Oq56zT_sdqeQpc=.0cf9319f-f64d-4e8f-82e9-f654464bc775@github.com> References: <0SXZ0k-28IdpWjuhtK4dSJ9ybHE58Oq56zT_sdqeQpc=.0cf9319f-f64d-4e8f-82e9-f654464bc775@github.com> Message-ID: On Tue, 1 Apr 2025 14:42:47 GMT, Daniel Skantz wrote: >> This patch refines the schedule approximation in IGV by 1) placing parm. and projection nodes in the same block as their predecessors, and 2) disallows erroneously considering machine nodes such as prefetchAlloc and rep_stos as CFG nodes. >> >> The reader may refer to the corresponding JBS issue where graphs sampled before and after the change are attached. >> >> Testing: T1-T3 with no failures. Opened graphs before and after the change and saw no obvious problems. Opened a large number of graphs in CFG view and observed no unexpected IGV warnings, errors or assert failures.
> > Daniel Skantz has updated the pull request incrementally with one additional commit since the last revision: > > Update src/utils/IdealGraphVisualizer/ServerCompiler/src/main/java/com/sun/hotspot/igv/servercompiler/ServerCompilerScheduler.java > > Co-authored-by: Daniel Lundén Marked as reviewed by rcastanedalo (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/24350#pullrequestreview-2733217668 From epeter at openjdk.org Tue Apr 1 16:06:22 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 1 Apr 2025 16:06:22 GMT Subject: RFR: 8252473: [TESTBUG] compiler tests fail with minimal VM: Unrecognized VM option In-Reply-To: References: Message-ID: On Wed, 26 Mar 2025 17:40:35 GMT, Zdenek Zambersky wrote: >> This adds `@requires vm.compiler2.enabled` to tests, which fail with `Unrecognized VM option` on client VM. > > Attached file which shows unrecognized VM options for individual tests. > [unrecognized-options.txt](https://github.com/user-attachments/files/19472912/unrecognized-options.txt) @zzambers Generally we want to get away from `@requires vm.compiler2.enabled`, because it means tests are only run on C2 and not other compilers. For example if C2 is disabled and we only have C1. Or only interpreter. Or Graal ... Why not just add the compile flag `-XX:+IgnoreUnrecognizedVMOptions`? That could be a good alternative for most cases, I think. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24262#issuecomment-2769867784 From zzambers at openjdk.org Tue Apr 1 16:16:26 2025 From: zzambers at openjdk.org (Zdenek Zambersky) Date: Tue, 1 Apr 2025 16:16:26 GMT Subject: RFR: 8252473: [TESTBUG] compiler tests fail with minimal VM: Unrecognized VM option In-Reply-To: References: Message-ID: On Wed, 26 Mar 2025 17:49:33 GMT, Aleksey Shipilev wrote: >> This adds `@requires vm.compiler2.enabled` to tests, which fail with `Unrecognized VM option` on client VM.
> > test/hotspot/jtreg/compiler/arraycopy/TestCloneWithStressReflectiveCode.java line 28: > >> 26: * @bug 8284951 >> 27: * @summary Test clone intrinsic with StressReflectiveCode. >> 28: * @requires vm.compiler2.enabled & vm.debug > > Drive-by comment: multiple `@requires` get AND-ed automatically, so you can just drop a new line with `@requires vm.compiler2.enabled`, and it will still work. I used `@requires` on separate line in cases, where resulting line would be too long (or too messy), but I can use separate line everywhere. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24262#discussion_r2023170812 From jbhateja at openjdk.org Tue Apr 1 16:17:22 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 1 Apr 2025 16:17:22 GMT Subject: RFR: 8352585: Add special case handling for Float16.max/min x86 backend [v7] In-Reply-To: References: Message-ID: > This bugfix patch adds the special handling as per x86 AVX512-FP16 ISA specification[1][2] to compute max/min operations with +/-0.0 or NaN operands. > > Special handling leverage the instruction semantic, central idea is to shuffle the operands such that smaller input gets assigned to second operand for min operation or a larger input gets assigned to second operand for max operation, in addition result equals NaN if an unordered comparison detects first input as a NaN value else we return the result of min/max operation. > > Kindly review and share your feedback. 
> > Best Regards, > Jatin > > [1] https://www.felixcloutier.com/x86/vminsh > [2] https://www.felixcloutier.com/x86/vmaxsh Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Review comments resolution ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24169/files - new: https://git.openjdk.org/jdk/pull/24169/files/e2faec77..1713057d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24169&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24169&range=05-06 Stats: 3 lines in 1 file changed: 0 ins; 3 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/24169.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24169/head:pull/24169 PR: https://git.openjdk.org/jdk/pull/24169 From jbhateja at openjdk.org Tue Apr 1 16:17:22 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 1 Apr 2025 16:17:22 GMT Subject: RFR: 8352585: Add special case handling for Float16.max/min x86 backend [v6] In-Reply-To: References: <4pVsbXILQQgsiSnldLRVf1fziUMF6PrqkEnr81RoFMg=.a79353fd-5dc2-4c64-8958-01cbc0557618@github.com> Message-ID: On Fri, 28 Mar 2025 22:14:23 GMT, Sandhya Viswanathan wrote: >> Basically assert if one is NaN and other is not. > > On further thought what you have also works. Though we could simplify the assertionCheck method to just one statement: > public static boolean assertionCheck(Float16 actual, Float16 expected) { > return !actual.equals(expected); > } > This is because, the equals method takes care of NaNs. The [equals](https://docs.oracle.com/en/java/javase/24/docs/api/java.base/java/lang/Double.html#equals(java.lang.Object)) uses [representation equivalence](https://docs.oracle.com/en/java/javase/24/docs/api/java.base/java/lang/Double.html#repEquivalence), defining NaN arguments to be equal to each other. 
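The representation-equivalence point quoted above can be illustrated with `Double`, whose documentation is the one linked in the comment. The sketch below is not taken from the PR or its test; the class name `NaNEquality` and its helper methods are hypothetical, and it uses `Double` rather than `Float16` so that it runs without incubator modules.

```java
public class NaNEquality {
    // Numerical comparison with ==: NaN is never equal to anything.
    static boolean primitiveEquals(double a, double b) {
        return a == b;
    }

    // Double.equals compares bit representations (representation equivalence).
    static boolean boxedEquals(double a, double b) {
        return Double.valueOf(a).equals(Double.valueOf(b));
    }

    public static void main(String[] args) {
        System.out.println(primitiveEquals(Double.NaN, Double.NaN)); // false
        System.out.println(boxedEquals(Double.NaN, Double.NaN));     // true
        System.out.println(primitiveEquals(0.0, -0.0));              // true
        System.out.println(boxedEquals(0.0, -0.0));                  // false
    }
}
```

Under `equals`, two NaN values compare equal while `0.0` and `-0.0` compare unequal, which is why a single `equals` call can cover the NaN and signed-zero cases that a min/max test needs to distinguish.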
DONE ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24169#discussion_r2023172099 From zzambers at openjdk.org Tue Apr 1 16:20:53 2025 From: zzambers at openjdk.org (Zdenek Zambersky) Date: Tue, 1 Apr 2025 16:20:53 GMT Subject: RFR: 8252473: [TESTBUG] compiler tests fail with minimal VM: Unrecognized VM option In-Reply-To: References: Message-ID: On Wed, 26 Mar 2025 18:49:51 GMT, Vladimir Kozlov wrote: > Can we run some of them with Graal? When no C2 specific flags are used. Unfortunately I don't have experience with Graal. So I don't know how that would work. Does Graal implement some C2-only flags? ------------- PR Comment: https://git.openjdk.org/jdk/pull/24262#issuecomment-2769902335 From zzambers at openjdk.org Tue Apr 1 16:28:23 2025 From: zzambers at openjdk.org (Zdenek Zambersky) Date: Tue, 1 Apr 2025 16:28:23 GMT Subject: RFR: 8252473: [TESTBUG] compiler tests fail with minimal VM: Unrecognized VM option In-Reply-To: References: Message-ID: On Tue, 1 Apr 2025 16:03:40 GMT, Emanuel Peter wrote: > Why not just add the compile flag `-XX:+IgnoreUnrecognizedVMOptions`? That could be a good alternative for most cases, I think. I saw that approach sometimes used as well. (My little probably unfounded concern would be that typos in args could then be silently ignored.) I can change my PR to use `-XX:+IgnoreUnrecognizedVMOptions` instead, if that approach is preferable. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24262#issuecomment-2769917073 From rcastanedalo at openjdk.org Tue Apr 1 16:34:24 2025 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Tue, 1 Apr 2025 16:34:24 GMT Subject: RFR: 8325467: Support methods with many arguments in C2 [v12] In-Reply-To: References: Message-ID: On Tue, 1 Apr 2025 14:19:10 GMT, Daniel Lundén wrote: >> If a method has a large number of parameters, we currently bail out from C2 compilation.
>> >> ### Changeset >> >> Allowing C2 compilation of methods with a large number of parameters requires fundamental changes to the register mask data structure, used in many places in C2. In particular, register masks currently have a statically determined size and cannot represent arbitrary numbers of stack slots. This is needed if we want to compile methods with arbitrary numbers of parameters. Register mask operations are present in performance-sensitive parts of C2, which further complicates changes. >> >> Changes: >> - Add functionality to dynamically grow/extend register masks. I experimented with a number of design choices to achieve this. To keep the common case (normal number of method parameters) quick and also to avoid more intrusive changes to the current `RegMask` interface, I decided to leave the "base" statically allocated memory for masks unchanged and only use dynamically allocated memory in the rare cases where it is needed. >> - Generalize the "chunk"-logic from `PhaseChaitin::Select()` to allow arbitrary-sized chunks, and also move most of the logic into register mask methods to separate concerns and to make the `PhaseChaitin::Select()` code more readable. >> - Remove all `can_represent` checks and bailouts. >> - Performance tuning. A particularly important change is the early-exit optimization in `RegMask::overlap`, used in the performance-sensitive method `PhaseChaitin::interfere_with_live`. >> - Add a new test case `TestManyMethodArguments.java` and extend an old test `TestNestedSynchronize.java`. >> >> ### Testing >> >> - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/10178060450) >> - `tier1` to `tier4` (and additional Oracle-internal testing) on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64. >> - Standard performance benchmarking. No observed conclusive overall performance degradation/improvement. >> - Specific benchmarking of C2 compilation time. 
The changes increase C2 compilation time by, approximately and on average, 1% for methods that could also be compiled before this changeset (see the figure below). The reason for the degradation is further checks required in performance-sensitive code (in particular `PhaseChaitin::remove_bound_register_from_interfering_live_ranges`). I have tried optimizing in various ways, but changes I found that lead to improvement also lead to less readable code (and are, in my opinion, no... > > Daniel Lundén has updated the pull request incrementally with two additional commits since the last revision: > > - Formatting updates > - Add register mask fuzzer test I have gone through the entire changeset now and could not find any obvious functional issue, good job Daniel! src/hotspot/share/opto/chaitin.cpp line 1425: > 1423: // a physical register is found > 1424: if (OptoReg::is_reg(assigned)) { > 1425: assert(!lrg.mask().is_offset(), "sanity"); Suggestion: assert(!lrg.mask().is_offset(), "offset register masks can only contain stack slots"); src/hotspot/share/opto/chaitin.cpp line 1533: > 1531: // hesitation). > 1532: if (OptoReg::is_valid(reg2) && > 1533: OptoReg::is_reg(reg2 - lrg.mask().offset_bits())) { I agree that this was probably an oversight in the original code. For simplicity I suggest replacing the check with just `OptoReg::is_reg(reg2)` as you suggest, explicitly limiting the scope of the alternation heuristic to physical registers. I compared the overall effectiveness of post-allocation copy removal (as summarized by `-XX:+PrintOptoStatistics`) between this changeset and your proposed simplification and I cannot see any significant difference. I really wonder if the entire alternation heuristic really has any positive measurable effect, but that investigation belongs to another RFE. src/hotspot/share/opto/chaitin.cpp line 1591: > 1589: // will be a no-op.
(Later on, if lrg runs out of possible colors in > 1590: // its chunk, a new chunk of color may be tried, in which case > 1591: // examination of neighbors is started again, at retry_next_chunk.) Doesn't the second part of the comment (`(Later on...)`) still apply after the changes? src/hotspot/share/opto/chaitin.cpp line 1655: > 1653: // Bump register mask up to next stack chunk > 1654: bool success = lrg->rollover(); > 1655: if (!success) { Was this scenario (running out of stack slots representable in `OptoRegPairs`) possible before, or was it prevented by some check removed in the changeset? Did you come across it in some compilation or is it more of a "theoretical" guard? src/hotspot/share/opto/chaitin.cpp line 1658: > 1656: // We should never get here in practice. Bail out in product, > 1657: // assert in debug. > 1658: assert(false, "should not happen"); Suggestion: assert(false, "the next available stack slots should be within the OptoRegPair range"); src/hotspot/share/opto/chaitin.cpp line 1660: > 1658: assert(false, "should not happen"); > 1659: C->record_method_not_compilable( > 1660: "chunk-rollover outside of OptoReg range"); Suggestion: "chunk-rollover outside of OptoRegPair range"); src/hotspot/share/opto/regmask.hpp line 282: > 280: _grow(src._rm_size, false); > 281: memcpy(_RM_UP_EXT, src._RM_UP_EXT, > 282: sizeof(uintptr_t) * (src._rm_size - _RM_SIZE)); This code is not very well covered by current tests, please consider adding some tests to `test_regmask.cpp` to exercise it. src/hotspot/share/opto/regmask.hpp line 293: > 291: _hwm = _rm_max(); > 292: } > 293: _set_range(src._rm_size, value, _rm_size - src._rm_size); This code is not very well covered by current tests, please consider adding some tests to `test_regmask.cpp` to exercise it. 
test/jdk/java/lang/invoke/BigArityTest.java line 32: > 30: * (1) have a large number of parameters, and > 31: * (2) use JSR292 methods internally (which increases the > 32: * MaxNodeLimit with a factor of 3) Just checking: these methods that cause C2 to consume an excessive amount of memory were not C2-compilable before the changeset, right? ------------- Changes requested by rcastanedalo (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20404#pullrequestreview-2733231312 PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2023172642 PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2023154419 PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2023156355 PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2023177582 PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2023175078 PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2023174027 PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2023183358 PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2023184495 PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2023195229 From rcastanedalo at openjdk.org Tue Apr 1 16:38:27 2025 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Tue, 1 Apr 2025 16:38:27 GMT Subject: RFR: 8325467: Support methods with many arguments in C2 [v12] In-Reply-To: References: Message-ID: On Tue, 1 Apr 2025 16:28:35 GMT, Roberto Castañeda Lozano wrote: >> Daniel Lundén has updated the pull request incrementally with two additional commits since the last revision: >> >> - Formatting updates >> - Add register mask fuzzer test > > test/jdk/java/lang/invoke/BigArityTest.java line 32: > >> 30: * (1) have a large number of parameters, and >> 31: * (2) use JSR292 methods internally (which increases the >> 32: * MaxNodeLimit with a factor of 3) > > Just checking: these methods that cause C2 to
consume an excessive amount of memory were not C2-compilable before the changeset, right? Same question for the other `java/lang/invoke` test changes. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2023204041 From epeter at openjdk.org Tue Apr 1 16:39:30 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 1 Apr 2025 16:39:30 GMT Subject: RFR: 8252473: [TESTBUG] compiler tests fail with minimal VM: Unrecognized VM option In-Reply-To: References: Message-ID: On Wed, 26 Mar 2025 18:49:51 GMT, Vladimir Kozlov wrote: >> This adds `@requires vm.compiler2.enabled` to tests, which fail with `Unrecognized VM option` on client VM. > > Can we run some of them with Graal? When no C2 specific flags are used. @vnkozlov do you agree that we should use `-XX:+IgnoreUnrecognizedVMOptions`? @zzambers Graal does not implement all flags, and so you would get the same issue with `Unrecognized VM option`. But it could still be valuable to run the tests with Graal, even if the flags are not doing anything. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24262#issuecomment-2769944061 From epeter at openjdk.org Tue Apr 1 16:39:30 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 1 Apr 2025 16:39:30 GMT Subject: RFR: 8252473: [TESTBUG] compiler tests fail with minimal VM: Unrecognized VM option In-Reply-To: References: Message-ID: <8AtiGaQ_cEwB_7Vi4fDwYUEvLMnjVy6BGwz-4vaqGq4=.096cd043-8296-40b5-bbb1-14ae9b51b12c@github.com> On Tue, 1 Apr 2025 16:23:49 GMT, Zdenek Zambersky wrote: > My little probably unfounded concern would be that typos in args could then be silently ignored. That's not completely unfounded, but I think using `-XX:+IgnoreUnrecognizedVMOptions` is still preferable to `@requires vm.compiler2.enabled`.
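As a rough illustration of the two approaches discussed in this thread, a jtreg test header using the ignore-unrecognized-options approach might look like the sketch below. The test class `ExampleCloneTest` and its contents are hypothetical, and the use of `-XX:+StressReflectiveCode` is an assumption based on the test name mentioned earlier in the thread; the alternative would be to drop the ignore flag and add `@requires vm.compiler2.enabled` to the header instead.

```java
/*
 * @test
 * @summary Hypothetical example: run with a compiler-specific flag on any VM.
 * @run main/othervm -XX:+IgnoreUnrecognizedVMOptions -XX:+StressReflectiveCode
 *                   ExampleCloneTest
 */
public class ExampleCloneTest {
    // Clones the array so that the clone path is exercised by the test.
    static int[] cloneArray(int[] src) {
        return src.clone();
    }

    public static void main(String[] args) {
        int[] src = {1, 2, 3};
        int[] copy = cloneArray(src);
        if (copy == src || !java.util.Arrays.equals(copy, src)) {
            throw new RuntimeException("clone produced an unexpected result");
        }
    }
}
```

With the ignore flag, a VM that does not know the second option should still start and run the test body, at the cost that a misspelled option would also be accepted silently, which is the concern raised above.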
------------- PR Comment: https://git.openjdk.org/jdk/pull/24262#issuecomment-2769946172 From sviswanathan at openjdk.org Tue Apr 1 17:38:30 2025 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Tue, 1 Apr 2025 17:38:30 GMT Subject: RFR: 8352585: Add special case handling for Float16.max/min x86 backend [v7] In-Reply-To: References: Message-ID: On Tue, 1 Apr 2025 16:17:22 GMT, Jatin Bhateja wrote: >> This bugfix patch adds the special handling as per x86 AVX512-FP16 ISA specification[1][2] to compute max/min operations with +/-0.0 or NaN operands. >> >> Special handling leverage the instruction semantic, central idea is to shuffle the operands such that smaller input gets assigned to second operand for min operation or a larger input gets assigned to second operand for max operation, in addition result equals NaN if an unordered comparison detects first input as a NaN value else we return the result of min/max operation. >> >> Kindly review and share your feedback. >> >> Best Regards, >> Jatin >> >> [1] https://www.felixcloutier.com/x86/vminsh >> [2] https://www.felixcloutier.com/x86/vmaxsh > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Review comments resolution Thanks for making this change. PR looks good to me. ------------- Marked as reviewed by sviswanathan (Reviewer). 
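Editor's illustration of the operand shuffle described in the quoted summary above. This is a minimal sketch in plain Java and not the code from the PR: it uses `float` as a stand-in for Float16, and the class and method names (`MinFixupSketch`, `x86Min`, `javaMin`) are hypothetical. It assumes the x86 scalar min returns its second operand when either input is NaN or when both inputs are zeros.

```java
// Sketch only: float stands in for Float16, and x86Min emulates the
// hardware behaviour in Java. All names here are hypothetical.
final class MinFixupSketch {
    // Emulates the x86 scalar min: if either input is NaN, or both inputs
    // are zeros of any sign, the SECOND operand is returned.
    static float x86Min(float a, float b) {
        return a < b ? a : b;
    }

    // Java-style min (NaN wins, -0.0 is smaller than +0.0) built on x86Min.
    static float javaMin(float a, float b) {
        boolean aNegative = Float.floatToRawIntBits(a) < 0; // sign bit set
        // Put the input with the sign bit set into the second slot, so that
        // a tie between +0.0 and -0.0 resolves to -0.0.
        float first = aNegative ? b : a;
        float second = aNegative ? a : b;
        float raw = x86Min(first, second);
        // A NaN in the second slot already propagates through x86Min;
        // a NaN in the first slot has to be restored explicitly.
        return Float.isNaN(first) ? first : raw;
    }

    public static void main(String[] args) {
        float[] values = {-0.0f, 0.0f, -1.5f, 2.0f, Float.NaN,
                          Float.NEGATIVE_INFINITY, Float.POSITIVE_INFINITY};
        for (float a : values) {
            for (float b : values) {
                int expected = Float.floatToIntBits(Math.min(a, b));
                int actual = Float.floatToIntBits(javaMin(a, b));
                if (expected != actual) {
                    throw new AssertionError(a + ", " + b);
                }
            }
        }
        System.out.println("all pairs agree with Math.min");
    }
}
```

The max case would mirror this with the comparison reversed and the non-negative input moved to the second slot.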
PR Review: https://git.openjdk.org/jdk/pull/24169#pullrequestreview-2733543988

From sviswanathan at openjdk.org Tue Apr 1 17:38:30 2025
From: sviswanathan at openjdk.org (Sandhya Viswanathan)
Date: Tue, 1 Apr 2025 17:38:30 GMT
Subject: RFR: 8352585: Add special case handling for Float16.max/min x86 backend [v5]
In-Reply-To:
References:
Message-ID:

On Thu, 27 Mar 2025 13:14:39 GMT, Emanuel Peter wrote:

>> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:
>>
>> Review comments resolution
>
> I have not looked at the x64 instructions, but only the tests again.
>
> I have noticed that you only cover specific values. You could improve tests with this:
> - Add non-canonical NaN values.
> - Just iterate over all possible Float16 input pairs. It's only `2^32`, that should be feasible! Then compare compiled vs interpreted results.
>
> It seems that bugs like these happen because somehow we do not systematically cover all inputs. Maybe we should do the same for all Float16 operations?

@eme64 We are looking forward to your approval for this PR.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/24169#issuecomment-2770211207

From vlivanov at openjdk.org Tue Apr 1 18:53:13 2025
From: vlivanov at openjdk.org (Vladimir Ivanov)
Date: Tue, 1 Apr 2025 18:53:13 GMT
Subject: RFR: 8353188: C1: Clean up x86 backend after 32-bit x86 removal [v2]
In-Reply-To:
References: <-iwh_5JGpt-TAVpfZQjwbnIG_c8hvirNKCcmiZoLNls=.3b34bf15-51fc-42bf-a294-1c23ca99754c@github.com>
Message-ID: <7cW4vEajJs-DiP7wkmG1j9zmOdw5fHR5FVq6W17lJas=.6c7cfbac-c478-4c18-9b87-5a0a50658363@github.com>

On Tue, 1 Apr 2025 07:58:27 GMT, Aleksey Shipilev wrote:

>> Piece-wise cleanup of C1_LIRAssembler_x86, C1_MacroAssembler and related classes. C1 implements the bulk of arch-specific backend there. Major parts of this backend are already removed by #24274, this cleans up another large bulk, and hopefully most of it.
>> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, `all` >> - [x] Linux x86_64 server fastdebug, `all` + `-XX:TieredStopAtLevel=1` > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Review comments Looks good. ------------- Marked as reviewed by vlivanov (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24301#pullrequestreview-2733756865 From vlivanov at openjdk.org Tue Apr 1 19:06:34 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Tue, 1 Apr 2025 19:06:34 GMT Subject: RFR: 8347645: C2: XOR bounded value handling blocks constant folding [v48] In-Reply-To: <1JYbwRdMBDikLGt3iXx87YRTWrF6NwzbFDH916UuoSA=.1fb10eab-4963-4d4c-a8ae-97ec3cecdfe2@github.com> References: <1JYbwRdMBDikLGt3iXx87YRTWrF6NwzbFDH916UuoSA=.1fb10eab-4963-4d4c-a8ae-97ec3cecdfe2@github.com> Message-ID: On Tue, 1 Apr 2025 02:44:03 GMT, Johannes Graham wrote: >> An interaction between xor bounds optimization and constant folding resulted in xor over constants not being optimized. This has a noticeable effect on `Long.expand` with a constant mask, on architectures that don't have instructions equivalent to `PDEP` to be used in an intrinsic. >> >> This change moves logic from the `Xor(L|I)Node::Value` methods into the `add_ring` methods, and gives priority to constant-folding. A static method was separated out to facilitate direct unit-testing. It also (subjectively) simplified the calculation of the upper bound and added an explanation of the reasoning behind it. >> >> In addition to testing for constant folding over xor, IR tests were added to `XorINodeIdealizationTests` and `XorLNodeIdealizationTests` to cover these related items: >> - Bounds optimization of xor >> - A check for `x ^ x = 0` >> - Explicit testing of xor over booleans. >> >> Also `test_xor_node.cpp` was added to more extensively test the correctness of the bounds optimization. 
It exhaustively tests ranges of 4-bit numbers as well as at the high and low end of the affected types. > > Johannes Graham has updated the pull request incrementally with one additional commit since the last revision: > > remove unused methods Overall, looks good. Some minor comments follow. src/hotspot/share/opto/addnode.cpp line 1012: > 1010: > 1011: if (r0->is_con() && r1->is_con()) { > 1012: // Constant fold: (c1 ^ c2) -> c3 A bit confusing. The comment mentions `c1` and `c2` while the code operate on `t0`/`r0` and `t1`/`r1`. src/hotspot/share/opto/addnode.cpp line 1019: > 1017: > 1018: if (r0->_lo >= 0 && r1->_lo >= 0) { > 1019: // Combine [0, lo_1] ^ [0, hi_1] -> [0, max] What does this comment refer to? It mentions `lo_1` and `hi_1` while `r0->_hi` and `r1->_hi` are passed into `xor_upper_bound_for_ranges`. Also, I'd avoid naming it`max`: it sort of hints to `max_jint`, but in reality it represents the upper bound of the operation. Why not `upper`/`upper_bound` instead? ------------- PR Review: https://git.openjdk.org/jdk/pull/23089#pullrequestreview-2733792136 PR Review Comment: https://git.openjdk.org/jdk/pull/23089#discussion_r2023525192 PR Review Comment: https://git.openjdk.org/jdk/pull/23089#discussion_r2023523965 From duke at openjdk.org Tue Apr 1 23:22:09 2025 From: duke at openjdk.org (Johannes Graham) Date: Tue, 1 Apr 2025 23:22:09 GMT Subject: RFR: 8347645: C2: XOR bounded value handling blocks constant folding [v49] In-Reply-To: References: Message-ID: > An interaction between xor bounds optimization and constant folding resulted in xor over constants not being optimized. This has a noticeable effect on `Long.expand` with a constant mask, on architectures that don't have instructions equivalent to `PDEP` to be used in an intrinsic. > > This change moves logic from the `Xor(L|I)Node::Value` methods into the `add_ring` methods, and gives priority to constant-folding. A static method was separated out to facilitate direct unit-testing. 
It also (subjectively) simplified the calculation of the upper bound and added an explanation of the reasoning behind it. > > In addition to testing for constant folding over xor, IR tests were added to `XorINodeIdealizationTests` and `XorLNodeIdealizationTests` to cover these related items: > - Bounds optimization of xor > - A check for `x ^ x = 0` > - Explicit testing of xor over booleans. > > Also `test_xor_node.cpp` was added to more extensively test the correctness of the bounds optimization. It exhaustively tests ranges of 4-bit numbers as well as at the high and low end of the affected types. Johannes Graham has updated the pull request incrementally with one additional commit since the last revision: update comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23089/files - new: https://git.openjdk.org/jdk/pull/23089/files/50d35dcd..dda134fb Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23089&range=48 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23089&range=47-48 Stats: 8 lines in 1 file changed: 0 ins; 0 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/23089.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23089/head:pull/23089 PR: https://git.openjdk.org/jdk/pull/23089 From vlivanov at openjdk.org Wed Apr 2 03:01:46 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Wed, 2 Apr 2025 03:01:46 GMT Subject: RFR: 8347645: C2: XOR bounded value handling blocks constant folding [v49] In-Reply-To: References: Message-ID: On Tue, 1 Apr 2025 23:22:09 GMT, Johannes Graham wrote: >> An interaction between xor bounds optimization and constant folding resulted in xor over constants not being optimized. This has a noticeable effect on `Long.expand` with a constant mask, on architectures that don't have instructions equivalent to `PDEP` to be used in an intrinsic. >> >> This change moves logic from the `Xor(L|I)Node::Value` methods into the `add_ring` methods, and gives priority to constant-folding. 
A static method was separated out to facilitate direct unit-testing. It also (subjectively) simplified the calculation of the upper bound and added an explanation of the reasoning behind it. >> >> In addition to testing for constant folding over xor, IR tests were added to `XorINodeIdealizationTests` and `XorLNodeIdealizationTests` to cover these related items: >> - Bounds optimization of xor >> - A check for `x ^ x = 0` >> - Explicit testing of xor over booleans. >> >> Also `test_xor_node.cpp` was added to more extensively test the correctness of the bounds optimization. It exhaustively tests ranges of 4-bit numbers as well as at the high and low end of the affected types. > > Johannes Graham has updated the pull request incrementally with one additional commit since the last revision: > > update comments Looks good! ------------- Marked as reviewed by vlivanov (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/23089#pullrequestreview-2734546645 From epeter at openjdk.org Wed Apr 2 06:19:39 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 2 Apr 2025 06:19:39 GMT Subject: RFR: 8347645: C2: XOR bounded value handling blocks constant folding [v46] In-Reply-To: References: Message-ID: On Tue, 1 Apr 2025 02:49:45 GMT, Johannes Graham wrote: >> @iwanowww >> >>> `_non_neg` part is confusing; I'd stress instead that it works on ranges. >> >> I find it easier to think of it as calculating the upperbound of the xor of 2 non-negative integers whose upperbounds are given in the parameters. > > Renamed to `xor_upper_bound_for_ranges` before I saw your comment, @merykitty. I'd be ok with another name though. With the last changes, the method is no longer a member of the class, so it's no longer going to get as many eyes on it without context, so maybe it matters less now. @j3graham I gave it a quick look, and it looks even better now. Let me run testing again before you integrate! Please ping me in 24h for the results! 
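Editor's illustration of the bound discussed in this thread, for two non-negative ranges `[0, hi0]` and `[0, hi1]`. This is a minimal sketch and not the code from the PR: the class and method names are hypothetical, and the formula is simply one safe bound (all bits up to the highest bit set in either upper bound).

```java
// Sketch only: a safe upper bound for x ^ y when 0 <= x <= hi0 and
// 0 <= y <= hi1. Names are hypothetical, not taken from HotSpot.
final class XorBoundSketch {
    static int xorUpperBound(int hi0, int hi1) {
        if (hi0 < 0 || hi1 < 0) {
            throw new IllegalArgumentException("bounds must be non-negative");
        }
        // x ^ y cannot set any bit above the highest bit set in hi0 or hi1.
        int highest = Integer.highestOneBit(hi0 | hi1); // 0 if both are 0
        // Set that bit and every bit below it.
        return highest == 0 ? 0 : (highest | (highest - 1));
    }

    public static void main(String[] args) {
        // Exhaustive check over all 4-bit ranges, in the spirit of the
        // unit test described in the PR summary.
        for (int hi0 = 0; hi0 < 16; hi0++) {
            for (int hi1 = 0; hi1 < 16; hi1++) {
                int bound = xorUpperBound(hi0, hi1);
                for (int x = 0; x <= hi0; x++) {
                    for (int y = 0; y <= hi1; y++) {
                        if ((x ^ y) > bound) {
                            throw new AssertionError(x + " ^ " + y + " > " + bound);
                        }
                    }
                }
            }
        }
        System.out.println("bound holds for all 4-bit ranges");
    }
}
```

For example, with `hi0 = 5` and `hi1 = 3` the highest set bit is bit 2, so the sketch returns 7, and `4 ^ 3 == 7` shows that the bound can actually be reached.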
-------------

PR Comment: https://git.openjdk.org/jdk/pull/23089#issuecomment-2771441200

From duke at openjdk.org Wed Apr 2 06:32:10 2025
From: duke at openjdk.org (Manuel Hässig)
Date: Wed, 2 Apr 2025 06:32:10 GMT
Subject: RFR: 8347449: C2: UseLoopPredicate off should also turn UseProfiledLoopPredicate off [v6]
In-Reply-To:
References:
Message-ID: <57-zPqw_-3qY6G5TZUYXG4MFzx_jmhHRDN78DR-dy0o=.c105c4e4-9ffa-4dd4-9390-70f27e48f217@github.com>

On Fri, 28 Mar 2025 09:09:59 GMT, Manuel Hässig wrote:

>> # Issue Summary
>>
>> When running with `-XX:-UseLoopPredicate` C2 still inserts profiled loop parse predicates, despite those being a form of loop parse predicate. Further, the loop predicate code is not always consistent about when to insert/expect profiled parse predicates.
>>
>> # Change Summary
>>
>> Following the rationale that profiled predicates are a subset of loop predicates, this PR disables profiled predicates whenever loop predicates are disabled. They are disabled on the level of arguments. Further, before any checks for whether profiled predicates are enabled, this PR inserts a check that loop predicates are enabled such that the code is consistent in its intention.
>>
>> Concretely, this PR
>> - adds parse predicate nodes to the IR testing framework,
>> - turns off `UseProfiledLoopPredicate` if `UseLoopPredicate` is turned off,
>> - predicates all checks for `UseProfiledLoopPredicate` on `UseLoopPredicate` first for consistency,
>> - adds a regression test.
>>
>>
>> # Testing
>>
>> The changes passed the following testing:
>> - [Github Actions](https://github.com/mhaessig/jdk/actions/runs/14078750038)
>> - tier1 through tier3 and Oracle internal testing
>
> Manuel Hässig has updated the pull request incrementally with two additional commits since the last revision:
>
> - idealKit::loop: always call add_parse_predicates
>
> It was constrained on UseParsePredicate, but this is incorrect, since
> all parse predicates are added in that function.
> - Improve description of UseLoopPredicate argument

Thank y'all for the thorough review!

-------------

PR Comment: https://git.openjdk.org/jdk/pull/24248#issuecomment-2771461584

From duke at openjdk.org Wed Apr 2 06:32:11 2025
From: duke at openjdk.org (duke)
Date: Wed, 2 Apr 2025 06:32:11 GMT
Subject: RFR: 8347449: C2: UseLoopPredicate off should also turn UseProfiledLoopPredicate off [v6]
In-Reply-To:
References:
Message-ID:

On Fri, 28 Mar 2025 09:09:59 GMT, Manuel Hässig wrote:

>> # Issue Summary
>>
>> When running with `-XX:-UseLoopPredicate` C2 still inserts profiled loop parse predicates, despite those being a form of loop parse predicate. Further, the loop predicate code is not always consistent about when to insert/expect profiled parse predicates.
>>
>> # Change Summary
>>
>> Following the rationale that profiled predicates are a subset of loop predicates, this PR disables profiled predicates whenever loop predicates are disabled. They are disabled on the level of arguments. Further, before any checks for whether profiled predicates are enabled, this PR inserts a check that loop predicates are enabled such that the code is consistent in its intention.
>>
>> Concretely, this PR
>> - adds parse predicate nodes to the IR testing framework,
>> - turns off `UseProfiledLoopPredicate` if `UseLoopPredicate` is turned off,
>> - predicates all checks for `UseProfiledLoopPredicate` on `UseLoopPredicate` first for consistency,
>> - adds a regression test.
>>
>>
>> # Testing
>>
>> The changes passed the following testing:
>> - [Github Actions](https://github.com/mhaessig/jdk/actions/runs/14078750038)
>> - tier1 through tier3 and Oracle internal testing
>
> Manuel Hässig has updated the pull request incrementally with two additional commits since the last revision:
>
> - idealKit::loop: always call add_parse_predicates
>
> It was constrained on UseParsePredicate, but this is incorrect, since
> all parse predicates are added in that function.
> - Improve description of UseLoopPredicate argument

@mhaessig
Your change (at version 1561a0eea3b2049e4e9e6468d0237f60e97cd2e8) is now ready to be sponsored by a Committer.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/24248#issuecomment-2771462472

From duke at openjdk.org Wed Apr 2 06:33:14 2025
From: duke at openjdk.org (Manuel Hässig)
Date: Wed, 2 Apr 2025 06:33:14 GMT
Subject: RFR: 8352893: C2: OrL/INode::add_ring optimize (x | -1) to -1 [v3]
In-Reply-To:
References:
Message-ID: <2rYcxIlI5lZujCDgdo1RStzxjeJGym2ftPpb2eoxW38=.1006c857-1293-4e15-8fca-2d7ce163f420@github.com>

On Mon, 31 Mar 2025 14:31:43 GMT, Manuel Hässig wrote:

>> # Issue Summary
>>
>> The `add_ring()` implementations of `OrINode` and `OrLNode` are missing the optimization that an OR with a value whose bits are all ones (`~0 == -1`, since we have signed integers in this case) will always yield all ones, i.e. `-1`.
>>
>> # Changes
>>
>> This PR makes the following straightforward changes:
>> - `Or(I|L)Node::add_ring()` returns `-1` if one of the two inputs is `-1`.
>> - Add `Or(I|L)` nodes to the IR framework.
>> - Add a regression IR test for the implemented optimization.
>>
>> # Testing
>>
>> - [Github Actions](https://github.com/mhaessig/jdk/actions/runs/14110978686)
>> - Ran tier1 through tier3 and Oracle internal testing
>
> Manuel Hässig has updated the pull request incrementally with one additional commit since the last revision:
>
> Remove loop in test and instead use random values

Thank you for reviewing!
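Editor's illustration of the identity behind this change: every bit of `-1` is set, so OR-ing any value with `-1` gives `-1` again. This is a minimal sketch in plain Java, not the IR test from the PR; the class and method names are hypothetical, and the fixed seed is only there to keep the run reproducible.

```java
import java.util.Random;

// Sketch only: checks that x | -1 == -1 for int and long, which is the
// identity the optimization relies on. Names are hypothetical.
final class OrAllOnesSketch {
    static int orWithAllOnes(int x) {
        return x | -1;
    }

    static long orWithAllOnes(long x) {
        return x | -1L;
    }

    public static void main(String[] args) {
        Random random = new Random(42); // fixed seed, arbitrary choice
        for (int i = 0; i < 1_000; i++) {
            if (orWithAllOnes(random.nextInt()) != -1) {
                throw new AssertionError("int case");
            }
            if (orWithAllOnes(random.nextLong()) != -1L) {
                throw new AssertionError("long case");
            }
        }
        System.out.println("x | -1 == -1 for all sampled values");
    }
}
```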
-------------

PR Comment: https://git.openjdk.org/jdk/pull/24289#issuecomment-2771459481

From duke at openjdk.org Wed Apr 2 06:33:15 2025
From: duke at openjdk.org (Manuel Hässig)
Date: Wed, 2 Apr 2025 06:33:15 GMT
Subject: Integrated: 8352893: C2: OrL/INode::add_ring optimize (x | -1) to -1
In-Reply-To:
References:
Message-ID:

On Fri, 28 Mar 2025 10:21:57 GMT, Manuel Hässig wrote:

> # Issue Summary
>
> The `add_ring()` implementations of `OrINode` and `OrLNode` are missing the optimization that an OR with a value whose bits are all ones (`~0 == -1`, since we have signed integers in this case) will always yield all ones, i.e. `-1`.
>
> # Changes
>
> This PR makes the following straightforward changes:
> - `Or(I|L)Node::add_ring()` returns `-1` if one of the two inputs is `-1`.
> - Add `Or(I|L)` nodes to the IR framework.
> - Add a regression IR test for the implemented optimization.
>
> # Testing
>
> - [Github Actions](https://github.com/mhaessig/jdk/actions/runs/14110978686)
> - Ran tier1 through tier3 and Oracle internal testing

This pull request has now been integrated.

Changeset: f301663b
Author: Manuel Hässig
URL: https://git.openjdk.org/jdk/commit/f301663b346bf2388ecfa429be1cf64c6e93ee8e
Stats: 109 lines in 3 files changed: 109 ins; 0 del; 0 mod

8352893: C2: OrL/INode::add_ring optimize (x | -1) to -1

Reviewed-by: epeter, thartmann, kvn

-------------

PR: https://git.openjdk.org/jdk/pull/24289

From epeter at openjdk.org Wed Apr 2 06:34:32 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Wed, 2 Apr 2025 06:34:32 GMT
Subject: RFR: 8352585: Add special case handling for Float16.max/min x86 backend [v5]
In-Reply-To:
References:
Message-ID:

On Fri, 28 Mar 2025 04:50:17 GMT, Jatin Bhateja wrote:

>> I have not looked at the x64 instructions, but only the tests again.
>>
>> I have noticed that you only cover specific values. You could improve tests with this:
>> - Add non-canonical NaN values.
>> - Just iterate over all possible Float16 input pairs.
It's only `2^32`, that should be feasible! Then compare compiled vs interpreted results.
>>
>> It seems that bugs like these happen because somehow we do not systematically cover all inputs. Maybe we should do the same for all Float16 operations?
>
> Hi @eme64 ,
> This specific issue is around special Float16 values, i.e. +/- 0.0 and NaN.
> I have added a Generator for Float16 as part of https://github.com/openjdk/jdk/pull/22755
>
> Best Regards,
> Jatin

@jatin-bhateja It looks reasonable to me now. Let me run some testing, ping me in 24h for the results!

-------------

PR Comment: https://git.openjdk.org/jdk/pull/24169#issuecomment-2771465382

From chagedorn at openjdk.org Wed Apr 2 06:50:59 2025
From: chagedorn at openjdk.org (Christian Hagedorn)
Date: Wed, 2 Apr 2025 06:50:59 GMT
Subject: RFR: 8353058: [PPC64] Some IR framework tests are failing after JDK-8352595
Message-ID:

`TestPhaseIRMatching` was recently updated with [JDK-8352595](https://bugs.openjdk.org/browse/JDK-8352595) which changed some matching on opto assembly from `IRNode.ALLOC` (now matching on ideal phases) to `IRNode.FIELD_ACCESS` (still matching on opto assembly). However, the updated code matches differently on PPC for some method invocation on a parameter, which made the test fail on PPC:

    public Object defaultOnOptoAssembly(Helper h) {
        return h.getString(); // emits one "Field: " string on most platforms but none on PPC
    }

When I revisited the test to analyze the failure, it was not clear what I had in mind back then with `defaultOnX()`. My guess is that I tried to have one method failing on ideal phases, one on mach phases and one on both, while all the methods use `IRNode` entries that have default compile phases on ideal and mach phases. But that is not the case today. I've therefore rewritten the tests to adhere to my guess. I also removed the ambiguity among platforms to have the same number of field accesses on them.
How to read the `@ExpectedFailure` annotation: @IR(failOn = {IRNode.STORE, IRNode.FIELD_ACCESS, IRNode.COUNTED_LOOP, IRNode.STORE_I}, counts = {IRNode.STORE, "20", IRNode.FIELD_ACCESS, "1", IRNode.COUNTED_LOOP, "2", IRNode.OOPMAP_WITH, "asdf", "2"}) // Expect rule with id 5 (the one directly above) to fail: // - We fail when matching PRINT_IDEAL with the: // - failOn attribute: The failing constraints are constraint 1 and 4 (while 2 and 3 pass) // - counts attribute: The failing constraints are constraint 2 and 4 (while 1 and 3 pass). @ExpectedFailure(ruleId = 5, phase = CompilePhase.PRINT_IDEAL, failOn = {1, 4}, counts = {1, 3}) Thanks to @TheRealMDoerr for testing the patch on PPC! Thanks, Christian ------------- Commit messages: - 8353058: [PPC64] Some IR framework tests are failing after JDK-8314999 Changes: https://git.openjdk.org/jdk/pull/24373/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24373&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8353058 Stats: 54 lines in 1 file changed: 17 ins; 8 del; 29 mod Patch: https://git.openjdk.org/jdk/pull/24373.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24373/head:pull/24373 PR: https://git.openjdk.org/jdk/pull/24373 From chagedorn at openjdk.org Wed Apr 2 06:50:59 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 2 Apr 2025 06:50:59 GMT Subject: RFR: 8353058: [PPC64] Some IR framework tests are failing after JDK-8352595 In-Reply-To: References: Message-ID: On Wed, 2 Apr 2025 06:45:58 GMT, Christian Hagedorn wrote: > `TestPhaseIRMatching` was recently updated with [JDK-8352595](https://bugs.openjdk.org/browse/JDK-8352595) which changed some matching on opto assembly from `IRNode.ALLOC` (now matching on ideal phases) to `IRNode.FIELD_ACCESS` (still matching on opto assembly). 
However, the updated code matches differently on PPC for some method invocation on a parameter which let the test fail on PPC: > > public Object defaultOnOptoAssembly(Helper h) { > return h.getString(); // emits one "Field: " string on most platforms but none on PPC > } > > > When I've revisited the test to analyze the failure, it was not evidently clear what I had in mind back there with `defaultOnX()`. My guess is that I've tried to have one method failing on ideal phases, one on mach phases and one on both while all the methods use `IRNode` entries that have default compile phases on ideal and mach phases. But that is not the case today. I've therefore rewritten the tests to adhere to my guess. I also removed the ambiguity among platforms to have the same number of field accesses on them. > > How to read the `@ExpectedFailure` annotation: > > @IR(failOn = {IRNode.STORE, IRNode.FIELD_ACCESS, IRNode.COUNTED_LOOP, IRNode.STORE_I}, > counts = {IRNode.STORE, "20", IRNode.FIELD_ACCESS, "1", IRNode.COUNTED_LOOP, "2", IRNode.OOPMAP_WITH, "asdf", "2"}) > // Expect rule with id 5 (the one directly above) to fail: > // - We fail when matching PRINT_IDEAL with the: > // - failOn attribute: The failing constraints are constraint 1 and 4 (while 2 and 3 pass) > // - counts attribute: The failing constraints are constraint 2 and 4 (while 1 and 3 pass). > @ExpectedFailure(ruleId = 5, phase = CompilePhase.PRINT_IDEAL, failOn = {1, 4}, counts = {1, 3}) > > > Thanks to @TheRealMDoerr for testing the patch on PPC! > > Thanks, > Christian test/hotspot/jtreg/testlibrary_tests/ir_framework/tests/TestPhaseIRMatching.java line 178: > 176: public void defaultOnOptoAssembly() { > 177: i = 34; > 178: l = 34; Always using this body which reliably emits two "Field: " strings in the opto assembly on all platforms. Thus removed the `Helper` class again. 
test/hotspot/jtreg/testlibrary_tests/ir_framework/tests/TestPhaseIRMatching.java line 228:

> 226: }
> 227: defaultOnOptoAssembly(new Helper("a", 1));
> 228: defaultOnBoth(new Helper("a", 1));

No longer needed because we do not need to pass anything into the methods.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/24373#discussion_r2024171272
PR Review Comment: https://git.openjdk.org/jdk/pull/24373#discussion_r2024171836

From duke at openjdk.org Wed Apr 2 06:51:28 2025
From: duke at openjdk.org (Manuel Hässig)
Date: Wed, 2 Apr 2025 06:51:28 GMT
Subject: Integrated: 8347449: C2: UseLoopPredicate off should also turn UseProfiledLoopPredicate off
In-Reply-To:
References:
Message-ID:

On Wed, 26 Mar 2025 09:27:59 GMT, Manuel Hässig wrote:

> # Issue Summary
>
> When running with `-XX:-UseLoopPredicate` C2 still inserts profiled loop parse predicates, despite those being a form of loop parse predicate. Further, the loop predicate code is not always consistent about when to insert/expect profiled parse predicates.
>
> # Change Summary
>
> Following the rationale that profiled predicates are a subset of loop predicates, this PR disables profiled predicates whenever loop predicates are disabled. They are disabled on the level of arguments. Further, before any checks for whether profiled predicates are enabled, this PR inserts a check that loop predicates are enabled such that the code is consistent in its intention.
>
> Concretely, this PR
> - adds parse predicate nodes to the IR testing framework,
> - turns off `UseProfiledLoopPredicate` if `UseLoopPredicate` is turned off,
> - predicates all checks for `UseProfiledLoopPredicate` on `UseLoopPredicate` first for consistency,
> - adds a regression test.
>
>
> # Testing
>
> The changes passed the following testing:
> - [Github Actions](https://github.com/mhaessig/jdk/actions/runs/14078750038)
> - tier1 through tier3 and Oracle internal testing

This pull request has now been integrated.
Changeset: d358f5f4 Author: Manuel H?ssig Committer: Christian Hagedorn URL: https://git.openjdk.org/jdk/commit/d358f5f4a44aacf2d79ccdb3e362ce8ed571f6da Stats: 150 lines in 7 files changed: 128 ins; 2 del; 20 mod 8347449: C2: UseLoopPredicate off should also turn UseProfiledLoopPredicate off Reviewed-by: chagedorn, epeter ------------- PR: https://git.openjdk.org/jdk/pull/24248 From epeter at openjdk.org Wed Apr 2 07:10:23 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 2 Apr 2025 07:10:23 GMT Subject: RFR: 8352316: More MergeStoreBench [v7] In-Reply-To: References: <5fLeODHTQw8vbuvTl6G0YPNszI5_tH1b3L_tWJtCTh8=.ca1b21f2-2890-4daa-8ce2-8112a3f7146b@github.com> Message-ID: On Sat, 29 Mar 2025 07:27:24 GMT, Shaojin Wen wrote: >> Added performance tests related to String.getBytes/String.getChars/StringBuilder.append/System.arraycopy in constant scenarios to verify whether MergeStore works > > Shaojin Wen has updated the pull request incrementally with one additional commit since the last revision: > > add StringBuilderUnsafePut @wenshao @iwanowww I have a few concerns about this PR. Your current PR description says this: > Added performance tests related to String.getBytes/String.getChars/StringBuilder.append/System.arraycopy in constant scenarios to verify whether MergeStore works First: a benchmark is not the best way `to verify whether MergeStore works`. An IR test would be more helpful, as it could check reliably what IR is generated, and hence if MergeStores actually optimized anything. Second: A JMH benchmark could also be helpful, but only if you run it with and without MergeStores enabled. Otherwise how would you know if it was MergeStores or another optimization that is relevant here? Third: `getBytes` / `arraycopy` is **NOT** a MergeStores pattern. These are **COPY** patterns. So they probably should go to a separate benchmark file. I don't want the MergeStores benchmark polluted with unrelated cases. 
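Editor's illustration of the distinction drawn above between a store pattern and a copy pattern. This is a minimal sketch with hypothetical names, and it rests on an assumption taken from this thread rather than from the MergeStores sources: that the optimization targets adjacent stores of constants or of pieces of one wider value, whereas a copy reads every stored byte from another array. The sketch only shows the two code shapes; it does not demonstrate what C2 does with them.

```java
// Sketch only: two code shapes, not a statement about what C2 emits.
// Names are hypothetical.
final class StorePatternsSketch {
    // Store pattern: four adjacent byte stores that together write the
    // pieces of one int (little-endian order).
    static void storeIntLE(byte[] dst, int off, int v) {
        dst[off]     = (byte) v;
        dst[off + 1] = (byte) (v >> 8);
        dst[off + 2] = (byte) (v >> 16);
        dst[off + 3] = (byte) (v >> 24);
    }

    // Copy pattern: every stored byte is first loaded from another array.
    static void copy4(byte[] src, int srcOff, byte[] dst, int dstOff) {
        System.arraycopy(src, srcOff, dst, dstOff, 4);
    }

    public static void main(String[] args) {
        byte[] a = new byte[8];
        storeIntLE(a, 2, 0x11223344);
        byte[] b = new byte[8];
        copy4(a, 2, b, 0);
        System.out.printf("%02x %02x %02x %02x%n", b[0], b[1], b[2], b[3]);
    }
}
```

Running a benchmark of each shape with the optimization enabled and disabled, as suggested in the message, would be the way to see which of them it actually affects.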
I could be wrong here, and just not see how these cases are MergeStore cases, but you need to show the details here.

I put some time into understanding your PR and asking you a list of questions. You did not really respond to them, and that is frustrating to me and makes me feel like my time is not valued:
https://github.com/openjdk/jdk/pull/24108#issuecomment-2762946069

You say this:

> By default, in OpenJDK, COMPACT_STRINGS = true, and the String coder without UTF16 characters is LATIN1, which is implemented using System.arraycopy. However, since String is immutable and System.arraycopy is directly performed on byte[], C2 should have more opportunities for optimization.

Maybe the `System.arraycopy` can be optimized. But I don't think it is the MergeStores optimization that would do that. This is really a **Copy** pattern and not a `MergeStores` pattern. Please read the PRs on MergeStores to see what patterns are covered.

And like I asked previously:

> Can you investigate what code it generates, and what kinds of optimizations are missing to make it close in performance to the Unsafe benchmark?
> I don't have time to do all the deep investigations myself. But feel free to ask me if you have more questions.

To me, benchmarks are only helpful and worth integrating if there is some clear and documented purpose. It would be really nice if you could invest some time into that :)

test/micro/org/openjdk/bench/vm/compiler/MergeStoreBench.java line 693:

> 691: }
> 692: BH.consume(off);
> 693: }

This is a copy pattern, not MergeStores.

test/micro/org/openjdk/bench/vm/compiler/MergeStoreBench.java line 735:

> 733: }
> 734: BH.consume(off);
> 735: }

@wenshao This is a copy pattern. Not a MergeStore pattern. So I can tell you already now that it will not be optimized by MergeStores ;)

test/micro/org/openjdk/bench/vm/compiler/MergeStoreBench.java line 799:

> 797: }
> 798: BH.consume(off);
> 799: }

@wenshao Why would MergeStores work here? This is a copy pattern.
That is not at all covered by MergeStores. test/micro/org/openjdk/bench/vm/compiler/MergeStoreBench.java line 856: > 854: } > 855: BH.consume(sb.length()); > 856: } Why would you expect MergeStores to work here? ------------- Changes requested by epeter (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24108#pullrequestreview-2734816014 PR Review Comment: https://git.openjdk.org/jdk/pull/24108#discussion_r2024171061 PR Review Comment: https://git.openjdk.org/jdk/pull/24108#discussion_r2024170015 PR Review Comment: https://git.openjdk.org/jdk/pull/24108#discussion_r2024169285 PR Review Comment: https://git.openjdk.org/jdk/pull/24108#discussion_r2024172517 From mchevalier at openjdk.org Wed Apr 2 07:13:19 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Wed, 2 Apr 2025 07:13:19 GMT Subject: RFR: 8353058: [PPC64] Some IR framework tests are failing after JDK-8352595 In-Reply-To: References: Message-ID: <-RonuxVG3qrg8pJV2J6lrXnAlV4oBHJC5wzdEFCKhzc=.753fea93-d133-4135-827a-bcd6ae4e32d0@github.com> On Wed, 2 Apr 2025 06:45:58 GMT, Christian Hagedorn wrote: > `TestPhaseIRMatching` was recently updated with [JDK-8352595](https://bugs.openjdk.org/browse/JDK-8352595) which changed some matching on opto assembly from `IRNode.ALLOC` (now matching on ideal phases) to `IRNode.FIELD_ACCESS` (still matching on opto assembly). However, the updated code matches differently on PPC for some method invocation on a parameter which let the test fail on PPC: > > public Object defaultOnOptoAssembly(Helper h) { > return h.getString(); // emits one "Field: " string on most platforms but none on PPC > } > > > When I've revisited the test to analyze the failure, it was not evidently clear what I had in mind back there with `defaultOnX()`. My guess is that I've tried to have one method failing on ideal phases, one on mach phases and one on both while all the methods use `IRNode` entries that have default compile phases on ideal and mach phases. But that is not the case today. 
I've therefore rewritten the tests to adhere to my guess. I also removed the ambiguity among platforms to have the same number of field accesses on them. > > How to read the `@ExpectedFailure` annotation: > > @IR(failOn = {IRNode.STORE, IRNode.FIELD_ACCESS, IRNode.COUNTED_LOOP, IRNode.STORE_I}, > counts = {IRNode.STORE, "20", IRNode.FIELD_ACCESS, "1", IRNode.COUNTED_LOOP, "2", IRNode.OOPMAP_WITH, "asdf", "2"}) > // Expect rule with id 5 (the one directly above) to fail: > // - We fail when matching PRINT_IDEAL with the: > // - failOn attribute: The failing constraints are constraint 1 and 4 (while 2 and 3 pass) > // - counts attribute: The failing constraints are constraint 2 and 4 (while 1 and 3 pass). > @ExpectedFailure(ruleId = 5, phase = CompilePhase.PRINT_IDEAL, failOn = {1, 4}, counts = {1, 3}) > > > Thanks to @TheRealMDoerr for testing the patch on PPC! > > Thanks, > Christian Looks good to me. I've also used `FIELD_ACCESS` in TestCompilePhaseCollector.java, but I think it's harmless there since we are not matching, but just using it for its default phase. But I still mention, just in case... ------------- PR Comment: https://git.openjdk.org/jdk/pull/24373#issuecomment-2771540722 From epeter at openjdk.org Wed Apr 2 07:15:27 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 2 Apr 2025 07:15:27 GMT Subject: RFR: 8346964: C2: Improve integer multiplication with constant in MulINode::Ideal() [v3] In-Reply-To: References: <4UC1x1GPJCcIwPXKJZfiUGxQnuRaDQjOcN53wYmUzF4=.fafd71c1-2f48-4ae4-8e7e-8844c578429a@github.com> <6PtcpyIAXa2wbi0CI5-DVvI1r2RRDvKtIWko7nvBDFo=.49b4d6f7-0dda-42e7-9f51-bfa3c06ef6f5@github.com> Message-ID: <8P3c-UQwGnV7gzMapQf_YAQHQLaIKTvYGFY3O5Of2UU=.87fa4250-e2f5-4efd-b6ab-fd2298a8bea7@github.com> On Thu, 9 Jan 2025 06:21:14 GMT, erifan wrote: >> @erifan I did some more thinking when falling asleep / waking up. This is a really interesting problem here. 
>> >> For `MulINode::Ideal` with patterns `var * con`, we really have these options in assembly: >> - `mul` general case. >> - `shift` and `add` when profitable. >> - `lea` could this be an improvement over `shift` and `add`? >> >> The issue is that different platforms have different characteristics here for these instructions - we would have to see how they differ. As far as I remember `mul` is not always available on all `ALU`s, but `add` and `shift` should be available. This impacts their throughput (more ports / ALU means more throughput generally). But the instructions also have different latency. Further, I could imagine that at some point more instructions may not just affect the throughput, but also the code-size: that in turn would increase IR and may at some point affect the instruction cache. >> >> Additionally: if your workload has other `mul`, `shift` and `add` mixed in, then some ports may already be saturated, and that could tilt the balance as to which option you are supposed to take. >> >> And then the characteristics of scalar ops may not be identical to vector ops. >> >> It would be interesting to have a really solid benchmark, where you explore the impact of these different effects. >> And it would be interesting to extract a table of latency + throughput characteristics for all relevant scalar + vector ops, for a number of different CPUs. Just so we get an overview of how easy this is to tune. >> >> Maybe perfect tuning is not possible. Maybe we are willing to take a `5%` regression in some cases to boost other cases by `30%`. But that is a **big maybe**: we really do not like getting regressions in existing code, it tends to upset people more if they get regressions compared to how much they enjoy speedups - so work like this can be delicate. >> >> Anyway, I don't right now have much time to investigate and work on this myself. So you'd have to do the work, benchmark, explanation etc. 
**But I think the `30%` speedup indicates that this work could really have potential!** >> >> As to what to do in sequence, here a suggestion: >> 1. First work on Vector API cases of vector multiplication - this should have no impact on other things. >> 2. Delay the `MulINode::Ideal` optimizations until after loop-opts: scalar code would still be handled in the old way, but auto-vectorized code would then be turned into `MulV`. And then go into the mul -> sh... > > Hi @eme64 thanks for your review. > > 1. First work on Vector API cases of vector multiplication - this should have no impact on other things. > 2. Delay the MulINode::Ideal optimizations until after loop-opts: scalar code would still be handled in the old way, but auto-vectorized code would then be turned into MulV. And then go into the mul -> shift optimization for vectors under point 1. > 3. Tackle MulINode::Ideal for scalar cases after loop-opts, and see what you can do there. > > I agree with you. I am actually working on `1`. The slightly troublesome thing is that `1` and `3` are both related to the architecture, so it might take a little more time. > >> lea could this be an improvement over shift and add? > > AARCH64 doesn't actually have a `lea` instruction. On x64 there are already some rules that turn `shift add` into `lea`. > > The issue is that different platforms have different characteristics here for these instructions - we would have to see how they differ. As far as I remember mul is not always available on all ALUs, but add and shift should be available. This impacts their throughput (more ports / ALU means more throughput generally). But the instructions also have different latency. Further, I could imagine that at some point more instructions may not just affect the throughput, but also the code-size: that in turn would increase IR and may at some point affect the instruction cache. 
> > Additionally: if your workload has other mul, shift and add mixed in, then some ports may already be saturated, and that could tilt the balance as to which option you are supposed to take. > > And then the characteristics of scalar ops may not be identical to vector ops. > > > Yes, this is very tricky: the actual performance depends on many aspects, such as pipeline, latency, throughput, ROB, and even memory performance. We can only do optimization based on certain references and generalities, such as latency and throughput of different instructions. But when it comes to generalities, it is actually difficult to say which scenario is more general. > >> It would be interesting to have a really solid benchmark, where you explore the impact of these different effects. > And it would be interesting to extract a table of latency + throughput characteristics for all relevant scalar + vector ops, for a number of different CPUs. Just so we get an overview of how easy this is to tune. > > I don't know of such a benchmark suite yet. For AARCH64, I usually refer to [the Arm Optimization Guide](https:... @erifan you opened this again. Does that mean we should review again? I see that you did not make any changes since our last conversation. If it is not ready for review, could you please convert it to Draft, so it is clear that you are not asking for reviews currently?
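For readers following the `var * con` discussion quoted above, here is a minimal sketch in plain Java of the arithmetic identity behind the `shift` and `add` option. The class and method names are hypothetical and chosen only for illustration; this shows the equivalence at the source level, not what `MulINode::Ideal` actually emits, and it says nothing about which form is faster on a given CPU.

```java
// Hypothetical illustration only: shift/add forms of multiplication by a constant.
final class MulByConstSketch {
    // x * 10 == (x << 3) + (x << 1), because 10 = 8 + 2.
    static int mul10ShiftAdd(int x) {
        return (x << 3) + (x << 1);
    }

    // x * 7 == (x << 3) - x, because 7 = 8 - 1.
    static int mul7ShiftSub(int x) {
        return (x << 3) - x;
    }

    public static void main(String[] args) {
        int[] samples = {0, 1, -1, 12345, Integer.MAX_VALUE, Integer.MIN_VALUE};
        for (int x : samples) {
            // Java int arithmetic wraps modulo 2^32 for both forms,
            // so the identities also hold when the product overflows.
            if (mul10ShiftAdd(x) != x * 10 || mul7ShiftSub(x) != x * 7) {
                throw new AssertionError("mismatch for " + x);
            }
        }
        System.out.println("ok");
    }
}
```

Whether one `mul` or two shifts plus an add is preferable depends on the latency, throughput and port pressure points raised in the quoted text, so the sketch only demonstrates that the rewrite is value-preserving.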
------------- PR Comment: https://git.openjdk.org/jdk/pull/22922#issuecomment-2771545449 From mchevalier at openjdk.org Wed Apr 2 07:19:25 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Wed, 2 Apr 2025 07:19:25 GMT Subject: RFR: 8353058: [PPC64] Some IR framework tests are failing after JDK-8352595 In-Reply-To: References: Message-ID: <2P9iZTfGS3zMibNJEqMfO_yf-Pir-hYdZFjUA3C5DSg=.fc405c98-769a-47a0-89ad-5ac2cf742fdf@github.com> On Wed, 2 Apr 2025 06:45:58 GMT, Christian Hagedorn wrote: > `TestPhaseIRMatching` was recently updated with [JDK-8352595](https://bugs.openjdk.org/browse/JDK-8352595) which changed some matching on opto assembly from `IRNode.ALLOC` (now matching on ideal phases) to `IRNode.FIELD_ACCESS` (still matching on opto assembly). However, the updated code matches differently on PPC for some method invocation on a parameter which let the test fail on PPC: > > public Object defaultOnOptoAssembly(Helper h) { > return h.getString(); // emits one "Field: " string on most platforms but none on PPC > } > > > When I've revisited the test to analyze the failure, it was not evidently clear what I had in mind back there with `defaultOnX()`. My guess is that I've tried to have one method failing on ideal phases, one on mach phases and one on both while all the methods use `IRNode` entries that have default compile phases on ideal and mach phases. But that is not the case today. I've therefore rewritten the tests to adhere to my guess. I also removed the ambiguity among platforms to have the same number of field accesses on them. 
> > How to read the `@ExpectedFailure` annotation: > > @IR(failOn = {IRNode.STORE, IRNode.FIELD_ACCESS, IRNode.COUNTED_LOOP, IRNode.STORE_I}, > counts = {IRNode.STORE, "20", IRNode.FIELD_ACCESS, "1", IRNode.COUNTED_LOOP, "2", IRNode.OOPMAP_WITH, "asdf", "2"}) > // Expect rule with id 5 (the one directly above) to fail: > // - We fail when matching PRINT_IDEAL with the: > // - failOn attribute: The failing constraints are constraint 1 and 4 (while 2 and 3 pass) > // - counts attribute: The failing constraints are constraint 2 and 4 (while 1 and 3 pass). > @ExpectedFailure(ruleId = 5, phase = CompilePhase.PRINT_IDEAL, failOn = {1, 4}, counts = {1, 3}) > > > Thanks to @TheRealMDoerr for testing the patch on PPC! > > Thanks, > Christian Marked as reviewed by mchevalier (Author). ------------- PR Review: https://git.openjdk.org/jdk/pull/24373#pullrequestreview-2734883035 From chagedorn at openjdk.org Wed Apr 2 07:19:26 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 2 Apr 2025 07:19:26 GMT Subject: RFR: 8353058: [PPC64] Some IR framework tests are failing after JDK-8352595 In-Reply-To: <-RonuxVG3qrg8pJV2J6lrXnAlV4oBHJC5wzdEFCKhzc=.753fea93-d133-4135-827a-bcd6ae4e32d0@github.com> References: <-RonuxVG3qrg8pJV2J6lrXnAlV4oBHJC5wzdEFCKhzc=.753fea93-d133-4135-827a-bcd6ae4e32d0@github.com> Message-ID: On Wed, 2 Apr 2025 07:10:38 GMT, Marc Chevalier wrote: > Looks good to me. I've also used `FIELD_ACCESS` in TestCompilePhaseCollector.java, but I think it's harmless there since we are not matching, but just using it for its default phase. But I still mention, just in case... Thanks for your review Marc! Yes, there we do not perform the actual IR matching, so it's not a problem for platform specific differences. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/24373#issuecomment-2771546971 From dfenacci at openjdk.org Wed Apr 2 07:19:26 2025 From: dfenacci at openjdk.org (Damon Fenacci) Date: Wed, 2 Apr 2025 07:19:26 GMT Subject: RFR: 8282053: IGV: refine schedule approximation [v2] In-Reply-To: <0SXZ0k-28IdpWjuhtK4dSJ9ybHE58Oq56zT_sdqeQpc=.0cf9319f-f64d-4e8f-82e9-f654464bc775@github.com> References: <0SXZ0k-28IdpWjuhtK4dSJ9ybHE58Oq56zT_sdqeQpc=.0cf9319f-f64d-4e8f-82e9-f654464bc775@github.com> Message-ID: On Tue, 1 Apr 2025 14:42:47 GMT, Daniel Skantz wrote: >> This patch refines the schedule approximation in IGV by 1) placing parm. and projection nodes in the same block as their predecessors, and 2) disallows erroneously considering machine nodes such as prefetchAlloc and rep_stos as CFG nodes. >> >> The reader may refer to the corresponding JBS issue where graphs sampled before and after the change are attached. >> >> Testing: T1-T3 with no failures. Opened graphs before and after the change and saw no obvious problems. Opened a large number of graphs in CFG view and observed no unexpected IGV warnings, errors or assert failures. > > Daniel Skantz has updated the pull request incrementally with one additional commit since the last revision: > > Update src/utils/IdealGraphVisualizer/ServerCompiler/src/main/java/com/sun/hotspot/igv/servercompiler/ServerCompilerScheduler.java > > Co-authored-by: Daniel Lundén Nice! Thanks @danielogh! ------------- Marked as reviewed by dfenacci (Committer).
PR Review: https://git.openjdk.org/jdk/pull/24350#pullrequestreview-2734882114 From epeter at openjdk.org Wed Apr 2 07:25:25 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 2 Apr 2025 07:25:25 GMT Subject: RFR: 8347499: C2: Make `PhaseIdealLoop` eliminate more redundant safepoints in loops [v2] In-Reply-To: References: Message-ID: On Wed, 26 Mar 2025 08:59:07 GMT, Qizheng Xing wrote: >> Qizheng Xing has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: >> >> - Merge branch 'master' into enhance-loop-safepoint-elim >> - Add IR test and microbench. >> - Make `PhaseIdealLoop` eliminate more redundant safepoints in loops. > > The second question: > >> If we now removed safepoints in places where we would actually have needed them: how would we find out? I suppose we would get longer time to safepoint - higher latency in some cases. How would we catch this with our tests? > > I tried running tier1 tests with `JAVA_OPTIONS=-XX:+UnlockDiagnosticVMOptions -XX:+SafepointTimeout -XX:+AbortVMOnSafepointTimeout -XX:SafepointTimeoutDelay=1000`, and there were no failures. > > Running with `-XX:SafepointTimeoutDelay=500` caused 1 random JDK test case to fail. But then I tried to build a JDK without this patch, and it still had the random failure with this option. @MaxXSoft > Running with -XX:SafepointTimeoutDelay=500 caused 1 random JDK test case to fail. But then I tried to build a JDK without this patch, and it still had the random failure with this option. Wow, that sounds like we do not safepoint for half a second in that case. That could be a bug. Could you please tell me what test it is, and how you ran it? We may want to file a bug and investigate it. @MaxXSoft Would you mind improving the documentation comments, so that they are easier to understand? 
Maybe you can even add more comments around your code change, to "prove" why it is ok to do what we would do with your change? ------------- PR Comment: https://git.openjdk.org/jdk/pull/23057#issuecomment-2771559333 PR Comment: https://git.openjdk.org/jdk/pull/23057#issuecomment-2771565388 From epeter at openjdk.org Wed Apr 2 07:25:25 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 2 Apr 2025 07:25:25 GMT Subject: RFR: 8347499: C2: Make `PhaseIdealLoop` eliminate more redundant safepoints in loops [v2] In-Reply-To: References: Message-ID: On Tue, 25 Mar 2025 09:51:46 GMT, Qizheng Xing wrote: > On the one hand, this situation won't occur in the current Compile::Optimize process. The Optimize method will always complete all inlining before performing loop optimization And what about late inlining? Does that not happen after loop opts? Maybe we insert new SafePoints when inlining, I simply don't know enough about that. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23057#issuecomment-2771561565 From epeter at openjdk.org Wed Apr 2 07:26:30 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 2 Apr 2025 07:26:30 GMT Subject: RFR: 8341976: C2: use_mem_state != load->find_exact_control(load->in(0)) assert failure [v7] In-Reply-To: References: Message-ID: On Tue, 1 Apr 2025 07:31:12 GMT, Roland Westrelin wrote: >> The `arraycopy` writes to a non escaping array so its `ArrayCopy` node >> is marked as having a narrow memory effect. One of the loads from the >> destination after the copy is transformed into a load from the source >> array (the rationale being that if there's no load from the >> destination of the copy, the `arraycopy` is not needed). The load from >> the source has the input memory state of the `ArrayCopy` as memory >> input. That load is then sunk out of the loop and its control is >> updated to be after the `ArrayCopy`. That's legal because the >> `ArrayCopy` only has a narrow memory effect and can't modify the >> source. 
The `ArrayCopy` can't be eliminated and is expanded. In the >> process, a `MemBar` that has a wide memory effect is added. The load >> from the source has control after the membar but memory state before >> and because the membar has a wide memory effect, the load is anti >> dependent on the membar: the graph is broken (the load can't be pinned >> after the membar and anti dependent on it). >> >> In short, the problem is that the graph is transformed under the >> assumption that the `ArrayCopy` has a narrow effect but the >> `ArrayCopy` is expanded to a subgraph that has a wide memory >> effect. The fix I propose is to not insert a membar with a wide memory >> effect. We still need a membar when the destination is non escaping >> because the expanded `ArrayCopy`, if it writes to a tightly allocated >> array, writes to raw memory and not to the destination memory slice. > > Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 16 additional commits since the last revision: > > - review > - Merge branch 'master' into JDK-8341976 > - review > - review > - Merge branch 'master' into JDK-8341976 > - -XX:+TraceLoopOpts fix > - review > - more > - Merge branch 'master' into JDK-8341976 > - more > - ... and 6 more: https://git.openjdk.org/jdk/compare/5362121c...9b21648d test/hotspot/jtreg/compiler/arraycopy/TestSunkLoadAntiDependency.java line 28: > 26: * @bug 8341976 > 27: * @summary C2: use_mem_state != load->find_exact_control(load->in(0)) assert failure > 28: * @run main/othervm -XX:-BackgroundCompilation TestSunkLoadAntiDependency Would it make sense to have a run without any flags?
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23465#discussion_r2024220984 From mchevalier at openjdk.org Wed Apr 2 07:32:03 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Wed, 2 Apr 2025 07:32:03 GMT Subject: RFR: 8353341: Fuzzer tests crashing: assert(projs->fallthrough_proj != nullptr) failed: must be found Message-ID: If the Mod[DF]Node has no control projection when it's being removed (because its result is unused), `extract_projections` will fail an assert. So, let's skip the removal. But that should happen only when the nodes are already unreachable (control input being transitively top). At the end of the day, the node should be dropped because of that, so there is no rush: we can let dead node deletion do the job. On the reduced reproducer, the crash is not common (even with `-XX:RepeatCompilation=300`, it might need more than one run to reproduce). So I've tried my fix on several thousand repeated compilations (in batches of 300) without a crash, and without having the modulo node alive at the end. Thanks, Marc ------------- Commit messages: - Don't remove Mod[DF]Node that don't have control output Changes: https://git.openjdk.org/jdk/pull/24375/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24375&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8353341 Stats: 99 lines in 2 files changed: 97 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/24375.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24375/head:pull/24375 PR: https://git.openjdk.org/jdk/pull/24375 From epeter at openjdk.org Wed Apr 2 07:32:08 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 2 Apr 2025 07:32:08 GMT Subject: RFR: 8353359: C2: Or(I|L)Node::Ideal is missing AddNode::Ideal call In-Reply-To: References: Message-ID: On Tue, 1 Apr 2025 06:20:48 GMT, Hannes Greule wrote: > Hi, > > this simple change adds a missing AddNode::Ideal call to Or(I|L)Node::Ideal.
See the added tests for examples of optimizations that don't apply without this change. > > Please let me know what you think. @SirYwell Wow, good find! Oh dear, things like this are so easy to get wrong. Thanks for writing the IR test, that seems really to be the only way to ensure we don't get these kinds of regressions. I wonder how many more of these kinds of issues we have... Optimal would be if we had IR tests for every optimization, but that would be a lot of work! I'm running some testing, please ping me in 24h for the results! ------------- PR Comment: https://git.openjdk.org/jdk/pull/24348#issuecomment-2771580262 From thartmann at openjdk.org Wed Apr 2 07:32:06 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 2 Apr 2025 07:32:06 GMT Subject: RFR: 8353341: Fuzzer tests crashing: assert(projs->fallthrough_proj != nullptr) failed: must be found In-Reply-To: References: Message-ID: On Wed, 2 Apr 2025 07:19:35 GMT, Marc Chevalier wrote: > If the Mod[DF]Node has no control projection when it's being removed (because its result is unused), `extract_projections` will fail an assert. So, let's skip the removal. > > But that should happen only when the nodes are already unreachable (control input being transitively top). At the end of the day, the node should be dropped because of that, so there is no rush: we can let dead node deletion do the job. > > On the reduced reproducer, the crash is not common (even with `-XX:RepeatCompilation=300`, it might need more than one run to reproduce). So I've tried my fix on several thousand repeated compilations (in batches of 300) without a crash, and without having the modulo node alive at the end. > > Thanks, > Marc Looks good to me!
src/hotspot/share/opto/divnode.cpp line 1521: > 1519: > 1520: bool result_is_unused = proj_out_or_null(TypeFunc::Parms) == nullptr; > 1521: bool has_control_output = proj_out_or_null(TypeFunc::Control) != nullptr; Nit: Maybe replace this with `is_dead = proj_out_or_null(TypeFunc::Control) == nullptr;` and check for `!is_dead` below? test/hotspot/jtreg/compiler/c2/irTests/FPModWithoutControlProj.java line 70: > 68: } > 69: } > 70: Suggestion: test/hotspot/jtreg/compiler/c2/irTests/FPModWithoutControlProj.java line 93: > 91: } > 92: } > 93: Suggestion: ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24375#pullrequestreview-2734903737 PR Review Comment: https://git.openjdk.org/jdk/pull/24375#discussion_r2024225945 PR Review Comment: https://git.openjdk.org/jdk/pull/24375#discussion_r2024228770 PR Review Comment: https://git.openjdk.org/jdk/pull/24375#discussion_r2024224053 From epeter at openjdk.org Wed Apr 2 07:37:11 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 2 Apr 2025 07:37:11 GMT Subject: RFR: 8348887: Create IR framework test for JDK-8347997 In-Reply-To: References: Message-ID: On Mon, 31 Mar 2025 13:41:17 GMT, Marc Chevalier wrote: > As the ticket says: >> Create IR framework test which checks that allocations are eliminated in the regression test included in [JDK-8347997](https://bugs.openjdk.org/browse/JDK-8347997) fix. > > So here it is! We can see that in case of inlining, indeed, no allocation happens. The second part is some sanity check to emphasize the difference: of course, there is an allocation without inlining. The benefit of this second part is arguable. From my point of view, it's mostly to point out the difference to a future reader. But yes, there is nothing very surprising. 
> > Thanks, > Marc @marc-chevalier It probably makes most sense if the authors and reviewers of [JDK-8347997](https://bugs.openjdk.org/browse/JDK-8347997) review this patch (@vnkozlov @chhagedorn @TobiHartmann ). But please ping me if you don't get reviews in a week or so, then I can have a look too ;) ------------- PR Comment: https://git.openjdk.org/jdk/pull/24328#issuecomment-2771592656 From jbhateja at openjdk.org Wed Apr 2 07:39:02 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 2 Apr 2025 07:39:02 GMT Subject: RFR: 8350896: Integer/Long.compress gets wrong type from CompressBitsNode::Value [v2] In-Reply-To: References: Message-ID: > Hi All, > > This bugfix patch fixes incorrect value computation for Integer/Long. compress APIs. > > Problems occur with a constant input and variable mask where the input's value is equal to the lower bound of the mask value., In this case, an erroneous value range estimation results in a constant value. Existing value routine first attempts to constant fold the compression operation if both input and compression mask are constant values; otherwise, it attempts to constrain the value range of result based on the upper and lower bounds of mask type. > > New IR test covers the issue reported in the bug report along with a case for value range based logic pruning. > > Kindly review and share your feedback. 
> > Best Regards, > Jatin Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Review comments resolutions ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23947/files - new: https://git.openjdk.org/jdk/pull/23947/files/ee67ee22..ae48895b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23947&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23947&range=00-01 Stats: 189 lines in 2 files changed: 160 ins; 4 del; 25 mod Patch: https://git.openjdk.org/jdk/pull/23947.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23947/head:pull/23947 PR: https://git.openjdk.org/jdk/pull/23947 From jbhateja at openjdk.org Wed Apr 2 07:39:03 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 2 Apr 2025 07:39:03 GMT Subject: RFR: 8350896: Integer/Long.compress gets wrong type from CompressBitsNode::Value [v2] In-Reply-To: References: Message-ID: On Wed, 12 Mar 2025 08:43:23 GMT, Emanuel Peter wrote: >> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: >> >> Review comments resolutions > > @jatin-bhateja Thanks for looking into this! I left a first set of comments :) > > Primarily, it is about these issues: > - We need good comments, preferably even proofs. Because we got things wrong the last time, and there were no comments/proofs. It's difficult to get this sort of arithmetic transformation right, and it is hard to review. Proofs help to think through all the steps carefully. > - Test coverage: I would like to see some more randomized cases of input ranges. Hi @eme64 , I have addressed your comments, let me know if you need further clarifications. > src/hotspot/share/opto/intrinsicnode.cpp line 278: > >> 276: } else { >> 277: // Case 3) Mask value range only includes +ve values, this can again be >> 278: // used to ascertain known Zero bits of resultant value. 
> > I would put this case as the first, swapping it with Case 1). > And I would say something more explicit like this: > `Case 3) The mask value range is non-negative. Hence, the mask has at least one zero bit.` Case ordering is in accordance with the mask value range. case 1) mask value spans both -ve and +ve value ranges. case 2) mask value strictly lies within -ve value range. case 3) mask value strictly lies within +ve value range. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23947#issuecomment-2771593965 PR Review Comment: https://git.openjdk.org/jdk/pull/23947#discussion_r2024244475 From jbhateja at openjdk.org Wed Apr 2 07:39:04 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 2 Apr 2025 07:39:04 GMT Subject: RFR: 8350896: Integer/Long.compress gets wrong type from CompressBitsNode::Value [v2] In-Reply-To: <5fEVX0zAsdNd9v3Rk6Gr4lIXTc96g2LndUhX4Qb-bgc=.e4553c72-8da4-41c7-b71f-628bbeea14be@github.com> References: <5fEVX0zAsdNd9v3Rk6Gr4lIXTc96g2LndUhX4Qb-bgc=.e4553c72-8da4-41c7-b71f-628bbeea14be@github.com> Message-ID: <_vrEsrg7VNWQDlSYv5PO7CsGH2tNfrwyMShkxtpdqhQ=.434c6c72-e84f-40e3-8791-42e26652ee64@github.com> On Wed, 12 Mar 2025 08:08:19 GMT, Emanuel Peter wrote: >> src/hotspot/share/opto/intrinsicnode.cpp line 283: >> >>> 281: clz = bt == T_INT ? clz - 32 : clz; >>> 282: mask_max_bw = max_bw - clz; >>> 283: } >> >> Can you please put the comments for cases 1-3 either consistently before the condition, or after the condition with inlining? I would vote for inside each condition with indentation, so just like case 3), except 2 spaces indented ;) > > Why not start with the "nice" case 3) first, where we know that the range is positive, and so even after compression we cannot get negative values? > > What does this mean `only includes +ve values`? Case ordering is in accordance with the mask value range. case 1) mask value spans both -ve and +ve value ranges. case 2) mask value strictly lies within -ve value range.
case 3) mask value strictly lies within +ve value range. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23947#discussion_r2024244581 From dskantz at openjdk.org Wed Apr 2 07:39:29 2025 From: dskantz at openjdk.org (Daniel Skantz) Date: Wed, 2 Apr 2025 07:39:29 GMT Subject: RFR: 8282053: IGV: refine schedule approximation [v2] In-Reply-To: <0SXZ0k-28IdpWjuhtK4dSJ9ybHE58Oq56zT_sdqeQpc=.0cf9319f-f64d-4e8f-82e9-f654464bc775@github.com> References: <0SXZ0k-28IdpWjuhtK4dSJ9ybHE58Oq56zT_sdqeQpc=.0cf9319f-f64d-4e8f-82e9-f654464bc775@github.com> Message-ID: On Tue, 1 Apr 2025 14:42:47 GMT, Daniel Skantz wrote: >> This patch refines the schedule approximation in IGV by 1) placing parm. and projection nodes in the same block as their predecessors, and 2) disallows erroneously considering machine nodes such as prefetchAlloc and rep_stos as CFG nodes. >> >> The reader may refer to the corresponding JBS issue where graphs sampled before and after the change are attached. >> >> Testing: T1-T3 with no failures. Opened graphs before and after the change and saw no obvious problems. Opened a large number of graphs in CFG view and observed no unexpected IGV warnings, errors or assert failures. > > Daniel Skantz has updated the pull request incrementally with one additional commit since the last revision: > > Update src/utils/IdealGraphVisualizer/ServerCompiler/src/main/java/com/sun/hotspot/igv/servercompiler/ServerCompilerScheduler.java > > Co-authored-by: Daniel Lundén Thanks for the reviews!
------------- PR Comment: https://git.openjdk.org/jdk/pull/24350#issuecomment-2771595747 From duke at openjdk.org Wed Apr 2 07:39:30 2025 From: duke at openjdk.org (duke) Date: Wed, 2 Apr 2025 07:39:30 GMT Subject: RFR: 8282053: IGV: refine schedule approximation [v2] In-Reply-To: <0SXZ0k-28IdpWjuhtK4dSJ9ybHE58Oq56zT_sdqeQpc=.0cf9319f-f64d-4e8f-82e9-f654464bc775@github.com> References: <0SXZ0k-28IdpWjuhtK4dSJ9ybHE58Oq56zT_sdqeQpc=.0cf9319f-f64d-4e8f-82e9-f654464bc775@github.com> Message-ID: <8BJI6ui7ndUA4OTPv3xMzTpg5G2bzn2l9vhUlenT7IE=.f70ad9c1-52d7-4309-b7e8-3fd97e58cc76@github.com> On Tue, 1 Apr 2025 14:42:47 GMT, Daniel Skantz wrote: >> This patch refines the schedule approximation in IGV by 1) placing parm. and projection nodes in the same block as their predecessors, and 2) disallows erroneously considering machine nodes such as prefetchAlloc and rep_stos as CFG nodes. >> >> The reader may refer to the corresponding JBS issue where graphs sampled before and after the change are attached. >> >> Testing: T1-T3 with no failures. Opened graphs before and after the change and saw no obvious problems. Opened a large number of graphs in CFG view and observed no unexpected IGV warnings, errors or assert failures. > > Daniel Skantz has updated the pull request incrementally with one additional commit since the last revision: > > Update src/utils/IdealGraphVisualizer/ServerCompiler/src/main/java/com/sun/hotspot/igv/servercompiler/ServerCompilerScheduler.java > > Co-authored-by: Daniel Lundén @danielogh Your change (at version 57ad6dc825404d2628aa376f0fa8d78090313d33) is now ready to be sponsored by a Committer.
------------- PR Comment: https://git.openjdk.org/jdk/pull/24350#issuecomment-2771597533 From epeter at openjdk.org Wed Apr 2 07:41:09 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 2 Apr 2025 07:41:09 GMT Subject: RFR: 8352418: Add verification code to check that the associated loop nodes of useless Template Assertion Predicates are dead In-Reply-To: References: Message-ID: On Mon, 31 Mar 2025 12:28:59 GMT, Christian Hagedorn wrote: > As already suggested in https://github.com/openjdk/jdk/pull/23823, I want to do the following additional verification: > > After `eliminate_useless_predicates()` all now useless `OpaqueTemplateAssertionPredicate` nodes should not have any references to `CountedLoop` nodes that are still in the graph (otherwise, they would have been marked useful). This verification did not work reliably without the full Assertion Predicates fix [JDK-8350577](https://bugs.openjdk.org/browse/JDK-8350577). Since JDK-8350577 is now integrated, I propose to add this additional verification code. > > Thanks, > Christian Looks reasonable, nice to see more verification code :) src/hotspot/share/opto/predicates.cpp line 1250: > 1248: // graph (otherwise, they would have been marked useful instead). This is verified in this method. > 1249: void EliminateUselessPredicates::verify_loop_nodes_of_useless_templates_assertion_predicates_are_dead() const { > 1250: Unique_Node_List loop_nodes_of_useless_template_assertion_predicates = Should we add `ResourceMark` here, or is there one close by that suffices? ------------- Marked as reviewed by epeter (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/24326#pullrequestreview-2734938554 PR Review Comment: https://git.openjdk.org/jdk/pull/24326#discussion_r2024248198 From thartmann at openjdk.org Wed Apr 2 07:45:10 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 2 Apr 2025 07:45:10 GMT Subject: RFR: 8353058: [PPC64] Some IR framework tests are failing after JDK-8352595 In-Reply-To: References: Message-ID: <5EDodzal0YHCnEW3k6lszJPxcNGwHtDw4qHGHhQSk_k=.7e66e90a-dd36-4bd3-bc69-26c9e828e377@github.com> On Wed, 2 Apr 2025 06:45:58 GMT, Christian Hagedorn wrote: > `TestPhaseIRMatching` was recently updated with [JDK-8352595](https://bugs.openjdk.org/browse/JDK-8352595) which changed some matching on opto assembly from `IRNode.ALLOC` (now matching on ideal phases) to `IRNode.FIELD_ACCESS` (still matching on opto assembly). However, the updated code matches differently on PPC for some method invocation on a parameter which let the test fail on PPC: > > public Object defaultOnOptoAssembly(Helper h) { > return h.getString(); // emits one "Field: " string on most platforms but none on PPC > } > > > When I've revisited the test to analyze the failure, it was not evidently clear what I had in mind back there with `defaultOnX()`. My guess is that I've tried to have one method failing on ideal phases, one on mach phases and one on both while all the methods use `IRNode` entries that have default compile phases on ideal and mach phases. But that is not the case today. I've therefore rewritten the tests to adhere to my guess. I also removed the ambiguity among platforms to have the same number of field accesses on them. 
> > How to read the `@ExpectedFailure` annotation: > > @IR(failOn = {IRNode.STORE, IRNode.FIELD_ACCESS, IRNode.COUNTED_LOOP, IRNode.STORE_I}, > counts = {IRNode.STORE, "20", IRNode.FIELD_ACCESS, "1", IRNode.COUNTED_LOOP, "2", IRNode.OOPMAP_WITH, "asdf", "2"}) > // Expect rule with id 5 (the one directly above) to fail: > // - We fail when matching PRINT_IDEAL with the: > // - failOn attribute: The failing constraints are constraint 1 and 4 (while 2 and 3 pass) > // - counts attribute: The failing constraints are constraint 2 and 4 (while 1 and 3 pass). > @ExpectedFailure(ruleId = 5, phase = CompilePhase.PRINT_IDEAL, failOn = {1, 4}, counts = {1, 3}) > > > Thanks to @TheRealMDoerr for testing the patch on PPC! > > Thanks, > Christian Looks good to me too. ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24373#pullrequestreview-2734953812 From mchevalier at openjdk.org Wed Apr 2 07:45:41 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Wed, 2 Apr 2025 07:45:41 GMT Subject: RFR: 8353341: C2: removal of a Mod[DF]Node crashes when the node is already dead [v2] In-Reply-To: References: Message-ID: > If the Mod[DF]Node has no control projection when it's being removed (because its result is unused), `extract_projections` will fail an assert. So, let's skip the removal. > > But that should happen only when the nodes are already unreachable (control input being transitively top). At the end of the day, the node should be dropped. because of that, so there is no rush, and let dead node deletion do the job. > > On the reduced reproducer, the crash is not common (even with `-XX:RepeatCompilation=300`, it might need more than a run to reproduce). So I've tried my fix on multiple thousands repeat compilations (by 300 packs) without a crash, and without having the modulo node alive at the end. 
> > Thanks, > Marc Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: Address review comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24375/files - new: https://git.openjdk.org/jdk/pull/24375/files/2a347bc0..f1f0b93b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24375&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24375&range=00-01 Stats: 6 lines in 2 files changed: 0 ins; 2 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/24375.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24375/head:pull/24375 PR: https://git.openjdk.org/jdk/pull/24375 From thartmann at openjdk.org Wed Apr 2 07:45:41 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 2 Apr 2025 07:45:41 GMT Subject: RFR: 8353341: C2: removal of a Mod[DF]Node crashes when the node is already dead [v2] In-Reply-To: References: Message-ID: On Wed, 2 Apr 2025 07:42:37 GMT, Marc Chevalier wrote: >> If the Mod[DF]Node has no control projection when it's being removed (because its result is unused), `extract_projections` will fail an assert. So, let's skip the removal. >> >> But that should happen only when the nodes are already unreachable (control input being transitively top). At the end of the day, the node should be dropped. because of that, so there is no rush, and let dead node deletion do the job. >> >> On the reduced reproducer, the crash is not common (even with `-XX:RepeatCompilation=300`, it might need more than a run to reproduce). So I've tried my fix on multiple thousands repeat compilations (by 300 packs) without a crash, and without having the modulo node alive at the end. >> >> Thanks, >> Marc > > Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: > > Address review comments Marked as reviewed by thartmann (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/24375#pullrequestreview-2734948297 From mchevalier at openjdk.org Wed Apr 2 07:45:42 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Wed, 2 Apr 2025 07:45:42 GMT Subject: RFR: 8353341: C2: removal of a Mod[DF]Node crashes when the node is already dead [v2] In-Reply-To: References: Message-ID: On Wed, 2 Apr 2025 07:27:04 GMT, Tobias Hartmann wrote: >> Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: >> >> Address review comments > > src/hotspot/share/opto/divnode.cpp line 1521: > >> 1519: >> 1520: bool result_is_unused = proj_out_or_null(TypeFunc::Parms) == nullptr; >> 1521: bool has_control_output = proj_out_or_null(TypeFunc::Control) != nullptr; > > Nit: Maybe replace this with `is_dead = proj_out_or_null(TypeFunc::Control) == nullptr;` and check for `!is_dead` below? Fine with me! At the very least, your name is more semantic, and less "here is a name that repeats what the code says". ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24375#discussion_r2024251731 From chagedorn at openjdk.org Wed Apr 2 07:45:41 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 2 Apr 2025 07:45:41 GMT Subject: RFR: 8353341: C2: removal of a Mod[DF]Node crashes when the node is already dead [v2] In-Reply-To: References: Message-ID: On Wed, 2 Apr 2025 07:42:37 GMT, Marc Chevalier wrote: >> If the Mod[DF]Node has no control projection when it's being removed (because its result is unused), `extract_projections` will fail an assert. So, let's skip the removal. >> >> But that should happen only when the nodes are already unreachable (control input being transitively top). At the end of the day, the node should be dropped. because of that, so there is no rush, and let dead node deletion do the job. 
>> >> On the reduced reproducer, the crash is not common (even with `-XX:RepeatCompilation=300`, it might need more than a run to reproduce). So I've tried my fix on multiple thousands repeat compilations (by 300 packs) without a crash, and without having the modulo node alive at the end. >> >> Thanks, >> Marc > > Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: > > Address review comments Looks good! src/hotspot/share/opto/divnode.cpp line 1522: > 1520: bool result_is_unused = proj_out_or_null(TypeFunc::Parms) == nullptr; > 1521: bool is_dead = proj_out_or_null(TypeFunc::Control) == nullptr; > 1522: if (result_is_unused && !is_dead) { Might be easier to read when it's flipped to avoid negation with `!` but I leave it up to you to decide which one you prefer :-) bool not_dead = proj_out_or_null(TypeFunc::Control) != nullptr; if (result_is_unused && not_dead) { test/hotspot/jtreg/compiler/c2/irTests/FPModWithoutControlProj.java line 65: > 63: } > 64: iArr[1] += 5; > 65: Suggestion: test/hotspot/jtreg/compiler/c2/irTests/FPModWithoutControlProj.java line 87: > 85: } > 86: iArr[1] += 5; > 87: Suggestion: ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24375#pullrequestreview-2734945309 PR Review Comment: https://git.openjdk.org/jdk/pull/24375#discussion_r2024255180 PR Review Comment: https://git.openjdk.org/jdk/pull/24375#discussion_r2024252459 PR Review Comment: https://git.openjdk.org/jdk/pull/24375#discussion_r2024252598 From dskantz at openjdk.org Wed Apr 2 07:48:25 2025 From: dskantz at openjdk.org (Daniel Skantz) Date: Wed, 2 Apr 2025 07:48:25 GMT Subject: Integrated: 8282053: IGV: refine schedule approximation In-Reply-To: References: Message-ID: On Tue, 1 Apr 2025 07:23:04 GMT, Daniel Skantz wrote: > This patch refines the schedule approximation in IGV by 1) placing parm. 
and projection nodes in the same block as their predecessors, and 2) disallows erroneously considering machine nodes such as prefetchAlloc and rep_stos as CFG nodes. > > The reader may refer to the corresponding JBS issue where graphs sampled before and after the change are attached. > > Testing: T1-T3 with no failures. Opened graphs before and after the change and saw no obvious problems. Opened a large number of graphs in CFG view and observed no unexpected IGV warnings, errors or assert failures. This pull request has now been integrated. Changeset: 8fb67ac5 Author: Daniel Skantz Committer: Roberto Castañeda Lozano URL: https://git.openjdk.org/jdk/commit/8fb67ac55bb61c029a3ae360ee849fd1edd2ac79 Stats: 23 lines in 1 file changed: 20 ins; 0 del; 3 mod 8282053: IGV: refine schedule approximation Reviewed-by: rcastanedalo, dlunden, dfenacci ------------- PR: https://git.openjdk.org/jdk/pull/24350 From thartmann at openjdk.org Wed Apr 2 07:50:29 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 2 Apr 2025 07:50:29 GMT Subject: RFR: 8348887: Create IR framework test for JDK-8347997 In-Reply-To: References: Message-ID: <_AWQLLGypcMLFX52xPmTeow5fWrbqLyqGT4WfqFZl2w=.ed830608-8f66-4198-bc1d-aaa00a71766f@github.com> On Tue, 1 Apr 2025 13:01:05 GMT, Marc Chevalier wrote: >> test/hotspot/jtreg/compiler/c2/irTests/TestContinuationPinningAndEA.java line 118: >> >>> 116: >>> 117: @DontInline >>> 118: public CrashesNoInline() throws Throwable { >> >> It's probably my own ignorance, but just in case others are in the same boat, why does this crash? Could you add a brief javadoc for future readers? Same with other Crashes cases. > > It's rather bad (uninspired) naming. I based this test on the test introduced by [JDK-8347997](https://bugs.openjdk.org/browse/JDK-8347997), which (I suspect) is based on the reproducer mentioned in JBS. There are 2 cases: one made EA crash, the other made it fail (not detect the non escaping, as far as I understand).
From Vladimir's comment on PR 23284, it used to crash because of a corrupted memory graph. Honestly, I'm not quite clear on that. There is already a test (from said ticket and PR) making sure it doesn't crash. The point of the test I'm adding is to check that the allocation is gone (thanks to EA). Maybe it would be best to rename the cases "Crashes" and "FailEA": the names made sense in the context of the original bug, but they are not very useful for the future. But I'm not sure what would be fitting. Right, I would suggest renaming these methods. The purpose of this test is not to reproduce the crashes that happened before [JDK-8347997](https://bugs.openjdk.org/browse/JDK-8347997) (which has its own regression test), but to verify that EA is able to remove allocations around the pin/unpin intrinsic now that the crashes are fixed. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24328#discussion_r2024265119 From chagedorn at openjdk.org Wed Apr 2 07:53:53 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 2 Apr 2025 07:53:53 GMT Subject: RFR: 8352418: Add verification code to check that the associated loop nodes of useless Template Assertion Predicates are dead [v2] In-Reply-To: References: Message-ID: > As already suggested in https://github.com/openjdk/jdk/pull/23823, I want to do the following additional verification: > > After `eliminate_useless_predicates()` all now useless `OpaqueTemplateAssertionPredicate` nodes should not have any references to `CountedLoop` nodes that are still in the graph (otherwise, they would have been marked useful). This verification did not work reliably without the full Assertion Predicates fix [JDK-8350577](https://bugs.openjdk.org/browse/JDK-8350577). Since JDK-8350577 is now integrated, I propose to add this additional verification code.
> > Thanks, > Christian Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: Add ResourceMark ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24326/files - new: https://git.openjdk.org/jdk/pull/24326/files/38e8e865..14a90a8b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24326&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24326&range=00-01 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/24326.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24326/head:pull/24326 PR: https://git.openjdk.org/jdk/pull/24326 From chagedorn at openjdk.org Wed Apr 2 07:53:54 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 2 Apr 2025 07:53:54 GMT Subject: RFR: 8352418: Add verification code to check that the associated loop nodes of useless Template Assertion Predicates are dead [v2] In-Reply-To: References: Message-ID: On Wed, 2 Apr 2025 07:36:22 GMT, Emanuel Peter wrote: >> Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: >> >> Add ResourceMark > > src/hotspot/share/opto/predicates.cpp line 1250: > >> 1248: // graph (otherwise, they would have been marked useful instead). This is verified in this method. >> 1249: void EliminateUselessPredicates::verify_loop_nodes_of_useless_templates_assertion_predicates_are_dead() const { >> 1250: Unique_Node_List loop_nodes_of_useless_template_assertion_predicates = > > Should we add `ResourceMark` here, or is there one close by that suffices? Good idea! I think the closest one will only be in `PhaseIdealLoop::optimize()` once we are done with one round of loop opts. So, it would make sense to add one here. Pushed an update. Will run some more testing to check that we don't hit any surprises.
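As background for the `ResourceMark` exchange above, here is a minimal standalone sketch of the scoped mark-and-release idea. It is a toy model under stated assumptions, not HotSpot's implementation: `ToyArena`, `ToyMark` and `verify_with_mark` are hypothetical names invented for illustration, and the arena only tracks a byte count instead of handing out memory.

```cpp
#include <cstddef>

// Hypothetical bump allocator that only tracks how many bytes are in use.
class ToyArena {
public:
    std::size_t used() const { return used_; }
    void allocate(std::size_t bytes) { used_ += bytes; }
    void rollback_to(std::size_t mark) { used_ = mark; }
private:
    std::size_t used_ = 0;
};

// Scoped mark: remembers the arena's fill level on construction and
// releases everything allocated since then on destruction.
class ToyMark {
public:
    explicit ToyMark(ToyArena& arena) : arena_(arena), saved_(arena.used()) {}
    ~ToyMark() { arena_.rollback_to(saved_); }
    ToyMark(const ToyMark&) = delete;
    ToyMark& operator=(const ToyMark&) = delete;
private:
    ToyArena& arena_;
    std::size_t saved_;
};

// A verification-style helper that needs temporary storage. With the mark
// in place, its temporary allocations do not outlive the call.
std::size_t verify_with_mark(ToyArena& arena) {
    ToyMark mark(arena);
    arena.allocate(128);   // stands in for a temporary node list
    return arena.used();   // fill level while the helper is still running
}
```

In this toy model, an arena that held 16 bytes before the call is back at 16 bytes afterwards. Without the local mark, the temporary storage would stay allocated until some enclosing mark is released, which is the situation the review comment describes when the nearest mark sits further up the call chain.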
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24326#discussion_r2024263260 From epeter at openjdk.org Wed Apr 2 07:53:53 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 2 Apr 2025 07:53:53 GMT Subject: RFR: 8352418: Add verification code to check that the associated loop nodes of useless Template Assertion Predicates are dead [v2] In-Reply-To: References: Message-ID: On Wed, 2 Apr 2025 07:49:34 GMT, Christian Hagedorn wrote: >> As already suggested in https://github.com/openjdk/jdk/pull/23823, I want to do the following additional verification: >> >> After `eliminate_useless_predicates()` all now useless `OpaqueTemplateAssertionPredicate` nodes should not have any references to `CountedLoop` nodes that are still in the graph (otherwise, they would have been marked useful). This verification did not work reliably without the full Assertion Predicates fix [JDK-8350577](https://bugs.openjdk.org/browse/JDK-8350577). Since JDK-8350577 is now integrated, I propose to add this additional verification code. >> >> Thanks, >> Christian > > Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: > > Add ResourceMark Marked as reviewed by epeter (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/24326#pullrequestreview-2734966884 From mchevalier at openjdk.org Wed Apr 2 07:57:22 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Wed, 2 Apr 2025 07:57:22 GMT Subject: RFR: 8353341: C2: removal of a Mod[DF]Node crashes when the node is already dead [v2] In-Reply-To: References: Message-ID: On Wed, 2 Apr 2025 07:41:01 GMT, Christian Hagedorn wrote: >> Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: >> >> Address review comments > > src/hotspot/share/opto/divnode.cpp line 1522: > >> 1520: bool result_is_unused = proj_out_or_null(TypeFunc::Parms) == nullptr; >> 1521: bool is_dead = proj_out_or_null(TypeFunc::Control) == nullptr; >> 1522: if (result_is_unused && !is_dead) { > > Might be easier to read when it's flipped to avoid negation with `!` but I leave it up to you to decide which one you prefer :-) > > bool not_dead = proj_out_or_null(TypeFunc::Control) != nullptr; > if (result_is_unused && not_dead) { I'd agree if I wrote `!not_dead`. But between writing `! a_positive_property` and `a_negative_property`, I'm much less decided. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24375#discussion_r2024275097 From thartmann at openjdk.org Wed Apr 2 07:57:52 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 2 Apr 2025 07:57:52 GMT Subject: RFR: 8348887: Create IR framework test for JDK-8347997 In-Reply-To: <-oDjoDP_yRuceA3tSsjHt7T8NaU7yZHbDexm8UviZPg=.a2b72ab5-f851-47c5-9003-64b6bba2092e@github.com> References: <-oDjoDP_yRuceA3tSsjHt7T8NaU7yZHbDexm8UviZPg=.a2b72ab5-f851-47c5-9003-64b6bba2092e@github.com> Message-ID: On Wed, 2 Apr 2025 07:50:35 GMT, Tobias Hartmann wrote: >> As the ticket says: >>> Create IR framework test which checks that allocations are eliminated in the regression test included in [JDK-8347997](https://bugs.openjdk.org/browse/JDK-8347997) fix. >> >> So here it is! 
We can see that in case of inlining, indeed, no allocation happens. The second part is some sanity check to emphasize the difference: of course, there is an allocation without inlining. The benefit of this second part is arguable. From my point of view, it's mostly to point out the difference to a future reader. But yes, there is nothing very surprising. >> >> Thanks, >> Marc > > test/hotspot/jtreg/compiler/c2/irTests/TestContinuationPinningAndEA.java line 51: > >> 49: try { >> 50: test_FailsEA(); >> 51: } catch (Throwable _) { > > These should normally not throw, right? I would just propagate the exception upwards. Otherwise we risk hiding real issues. And isn't `Exception` or a subclass sufficient? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24328#discussion_r2024276355 From thartmann at openjdk.org Wed Apr 2 07:57:52 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 2 Apr 2025 07:57:52 GMT Subject: RFR: 8348887: Create IR framework test for JDK-8347997 In-Reply-To: References: Message-ID: <-oDjoDP_yRuceA3tSsjHt7T8NaU7yZHbDexm8UviZPg=.a2b72ab5-f851-47c5-9003-64b6bba2092e@github.com> On Mon, 31 Mar 2025 13:41:17 GMT, Marc Chevalier wrote: > As the ticket says: >> Create IR framework test which checks that allocations are eliminated in the regression test included in [JDK-8347997](https://bugs.openjdk.org/browse/JDK-8347997) fix. > > So here it is! We can see that in case of inlining, indeed, no allocation happens. The second part is some sanity check to emphasize the difference: of course, there is an allocation without inlining. The benefit of this second part is arguable. From my point of view, it's mostly to point out the difference to a future reader. But yes, there is nothing very surprising. > > Thanks, > Marc test/hotspot/jtreg/compiler/c2/irTests/TestContinuationPinningAndEA.java line 51: > 49: try { > 50: test_FailsEA(); > 51: } catch (Throwable _) { These should normally not throw, right? 
I would just propagate the exception upwards. Otherwise we risk hiding real issues. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24328#discussion_r2024270186 From duke at openjdk.org Wed Apr 2 08:03:06 2025 From: duke at openjdk.org (kuaiwei) Date: Wed, 2 Apr 2025 08:03:06 GMT Subject: RFR: 8345485: C2 MergeLoads: merge adjacent array/native memory loads into larger load [v10] In-Reply-To: References: Message-ID: > In this patch, I extend the merge stores optimization to merge adjacent loads. Tier1 tests pass on my machine. > > The benchmark result of MergeLoadBench.java > AMD EPYC 9T24 96-Core Processor: > > |name | -MergeLoads | +MergeLoads |delta| > |---|---|---|---| > |MergeLoadBench.getCharB |4352.150 |4407.435 | 55.29 | > |MergeLoadBench.getCharBU |4075.320 |4084.663 | 9.34 | > |MergeLoadBench.getCharBV |3221.302 |3221.528 | 0.23 | > |MergeLoadBench.getCharC |2235.433 |2238.796 | 3.36 | > |MergeLoadBench.getCharL |4363.244 |4372.281 | 9.04 | > |MergeLoadBench.getCharLU |4072.550 |4075.744 | 3.19 | > |MergeLoadBench.getCharLV |2227.825 |2231.612 | 3.79 | > |MergeLoadBench.getIntB |11199.935 |6869.030 | -4330.90 | > |MergeLoadBench.getIntBU |6853.862 |2763.923 | -4089.94 | > |MergeLoadBench.getIntBV |306.953 |309.911 | 2.96 | > |MergeLoadBench.getIntL |10426.843 |6523.716 | -3903.13 | > |MergeLoadBench.getIntLU |6740.847 |2602.701 | -4138.15 | > |MergeLoadBench.getIntLV |2233.151 |2231.745 | -1.41 | > |MergeLoadBench.getIntRB |11335.756 |8980.619 | -2355.14 | > |MergeLoadBench.getIntRBU |7439.873 |3190.208 | -4249.66 | > |MergeLoadBench.getIntRL |16323.040 |7786.842 | -8536.20 | > |MergeLoadBench.getIntRLU |7457.745 |3364.140 | -4093.61 | > |MergeLoadBench.getIntRU |2512.621 |2511.668 | -0.95 | > |MergeLoadBench.getIntU |2501.064 |2500.629 | -0.43 | > |MergeLoadBench.getLongB |21175.442 |21103.660 | -71.78 | > |MergeLoadBench.getLongBU |14042.046 |2512.784 | -11529.26 | > |MergeLoadBench.getLongBV |606.448 |606.171 | -0.28 | >
|MergeLoadBench.getLongL |23142.178 |23217.785 | 75.61 | > |MergeLoadBench.getLongLU |14112.972 |2237.659 | -11875.31 | > |MergeLoadBench.getLongLV |2230.416 |2231.224 | 0.81 | > |MergeLoadBench.getLongRB |21152.558 |21140.583 | -11.98 | > |MergeLoadBench.getLongRBU |14031.178 |2520.317 | -11510.86 | > |MergeLoadBench.getLongRL |23248.506 |23136.410 | -112.10 | > |MergeLoadBench.getLongRLU |14125.032 |2240.481 | -11884.55 | > |MergeLoadBench.getLongRU |3071.881 |3066.606 | -5.27 | > |Merg... kuaiwei has updated the pull request incrementally with one additional commit since the last revision: Move code to addnode.cpp and add more tests ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24023/files - new: https://git.openjdk.org/jdk/pull/24023/files/e37c4bf3..ee511bf1 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24023&range=09 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24023&range=08-09 Stats: 1350 lines in 3 files changed: 770 ins; 578 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/24023.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24023/head:pull/24023 PR: https://git.openjdk.org/jdk/pull/24023 From mchevalier at openjdk.org Wed Apr 2 08:08:07 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Wed, 2 Apr 2025 08:08:07 GMT Subject: RFR: 8353341: C2: removal of a Mod[DF]Node crashes when the node is already dead [v3] In-Reply-To: References: Message-ID: > If the Mod[DF]Node has no control projection when it's being removed (because its result is unused), `extract_projections` will fail an assert. So, let's skip the removal. > > But that should happen only when the nodes are already unreachable (control input being transitively top). At the end of the day, the node should be dropped. because of that, so there is no rush, and let dead node deletion do the job. > > On the reduced reproducer, the crash is not common (even with `-XX:RepeatCompilation=300`, it might need more than a run to reproduce). 
So I've tried my fix on several thousand repeated compilations (in batches of 300) without a crash, and without having the modulo node alive at the end. > > For instance, that's what happens on the reproducer. Quickly, some big sub-graph is dead, but nodes stay a while in the graph: > > Then: > > And eventually, everything is removed, so the control projection is removed, and `extract_projections` doesn't like it. > > > Thanks, > Marc Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: Address review comments, part 2 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24375/files - new: https://git.openjdk.org/jdk/pull/24375/files/f1f0b93b..48bd2037 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24375&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24375&range=01-02 Stats: 6 lines in 2 files changed: 0 ins; 2 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/24375.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24375/head:pull/24375 PR: https://git.openjdk.org/jdk/pull/24375 From mchevalier at openjdk.org Wed Apr 2 08:08:08 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Wed, 2 Apr 2025 08:08:08 GMT Subject: RFR: 8353341: C2: removal of a Mod[DF]Node crashes when the node is already dead [v2] In-Reply-To: References: Message-ID: On Wed, 2 Apr 2025 07:53:15 GMT, Marc Chevalier wrote: >> src/hotspot/share/opto/divnode.cpp line 1522: >> >>> 1520: bool result_is_unused = proj_out_or_null(TypeFunc::Parms) == nullptr; >>> 1521: bool is_dead = proj_out_or_null(TypeFunc::Control) == nullptr; >>> 1522: if (result_is_unused && !is_dead) { >> >> Might be easier to read when it's flipped to avoid negation with `!` but I leave it up to you to decide which one you prefer :-) >> >> bool not_dead = proj_out_or_null(TypeFunc::Control) != nullptr; >> if (result_is_unused && not_dead) { > > I'd agree if I wrote `!not_dead`. But between writing `!
a_positive_property` and `a_negative_property`, I'm much less decided. I ended up flipping as suggested because I've seen fonts/colors making the `!` not clearly a non-letter (GitHub, for instance, isn't very good at that imo), and sometimes, people might be surprised. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24375#discussion_r2024293297 From mchevalier at openjdk.org Wed Apr 2 08:09:48 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Wed, 2 Apr 2025 08:09:48 GMT Subject: RFR: 8353341: C2: removal of a Mod[DF]Node crashes when the node is already dead [v2] In-Reply-To: References: Message-ID: On Wed, 2 Apr 2025 07:45:41 GMT, Marc Chevalier wrote: >> If the Mod[DF]Node has no control projection when it's being removed (because its result is unused), `extract_projections` will fail an assert. So, let's skip the removal. >> >> But that should happen only when the nodes are already unreachable (control input being transitively top). At the end of the day, the node should be dropped because of that, so there is no rush: let dead node deletion do the job. >> >> On the reduced reproducer, the crash is not common (even with `-XX:RepeatCompilation=300`, it might need more than a run to reproduce). So I've tried my fix on several thousand repeated compilations (in batches of 300) without a crash, and without having the modulo node alive at the end. >> >> For instance, that's what happens on the reproducer. Quickly, some big sub-graph is dead, but nodes stay a while in the graph: >> >> Then: >> >> And eventually, everything is removed, so the control projection is removed, and `extract_projections` doesn't like it. >> >> >> Thanks, >> Marc > > Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: > > Address review comments I've fixed everything, ready for next round.
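The guard discussed in the review comments above can be sketched in isolation as follows. This is a toy model, not the code in `divnode.cpp`: `ToyModNode`, its `std::optional` members and `can_remove_now()` are hypothetical names used only for illustration, and an absent optional stands in for a projection lookup that finds nothing. Only the local names `result_is_unused` and `not_dead` are taken from the review.

```cpp
#include <optional>

// Toy stand-in for a call-like node with two optional projections.
// Hypothetical type: it only models "projection present or absent".
struct ToyModNode {
    std::optional<int> control_proj;  // id of the control projection, if any
    std::optional<int> result_proj;   // id of the result projection, if any

    // Removal is attempted only when the result is unused AND the node
    // still has a control projection. A node without a control projection
    // is already dead and is left to dead-node elimination.
    bool can_remove_now() const {
        bool result_is_unused = !result_proj.has_value();
        bool not_dead = control_proj.has_value();
        return result_is_unused && not_dead;
    }
};
```

In this model, a node with a control projection and no result projection is removable right away, while a node that has lost its control projection is skipped. The flipped form keeps the condition free of a leading `!`, which is the readability point raised in the thread.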
------------- PR Comment: https://git.openjdk.org/jdk/pull/24375#issuecomment-2771728974 From duke at openjdk.org Wed Apr 2 08:23:50 2025 From: duke at openjdk.org (kuaiwei) Date: Wed, 2 Apr 2025 08:23:50 GMT Subject: RFR: 8345485: C2 MergeLoads: merge adjacent array/native memory loads into larger load [v11] In-Reply-To: References: Message-ID: > In this patch, I extend the merge stores optimization to merge adjacent loads. Tier1 tests pass on my machine. > > The benchmark result of MergeLoadBench.java > AMD EPYC 9T24 96-Core Processor: > > |name | -MergeLoads | +MergeLoads |delta| > |---|---|---|---| > |MergeLoadBench.getCharB |4352.150 |4407.435 | 55.29 | > |MergeLoadBench.getCharBU |4075.320 |4084.663 | 9.34 | > |MergeLoadBench.getCharBV |3221.302 |3221.528 | 0.23 | > |MergeLoadBench.getCharC |2235.433 |2238.796 | 3.36 | > |MergeLoadBench.getCharL |4363.244 |4372.281 | 9.04 | > |MergeLoadBench.getCharLU |4072.550 |4075.744 | 3.19 | > |MergeLoadBench.getCharLV |2227.825 |2231.612 | 3.79 | > |MergeLoadBench.getIntB |11199.935 |6869.030 | -4330.90 | > |MergeLoadBench.getIntBU |6853.862 |2763.923 | -4089.94 | > |MergeLoadBench.getIntBV |306.953 |309.911 | 2.96 | > |MergeLoadBench.getIntL |10426.843 |6523.716 | -3903.13 | > |MergeLoadBench.getIntLU |6740.847 |2602.701 | -4138.15 | > |MergeLoadBench.getIntLV |2233.151 |2231.745 | -1.41 | > |MergeLoadBench.getIntRB |11335.756 |8980.619 | -2355.14 | > |MergeLoadBench.getIntRBU |7439.873 |3190.208 | -4249.66 | > |MergeLoadBench.getIntRL |16323.040 |7786.842 | -8536.20 | > |MergeLoadBench.getIntRLU |7457.745 |3364.140 | -4093.61 | > |MergeLoadBench.getIntRU |2512.621 |2511.668 | -0.95 | > |MergeLoadBench.getIntU |2501.064 |2500.629 | -0.43 | > |MergeLoadBench.getLongB |21175.442 |21103.660 | -71.78 | > |MergeLoadBench.getLongBU |14042.046 |2512.784 | -11529.26 | > |MergeLoadBench.getLongBV |606.448 |606.171 | -0.28 | > |MergeLoadBench.getLongL |23142.178 |23217.785 | 75.61 | > |MergeLoadBench.getLongLU |14112.972
|2237.659 | -11875.31 | > |MergeLoadBench.getLongLV |2230.416 |2231.224 | 0.81 | > |MergeLoadBench.getLongRB |21152.558 |21140.583 | -11.98 | > |MergeLoadBench.getLongRBU |14031.178 |2520.317 | -11510.86 | > |MergeLoadBench.getLongRL |23248.506 |23136.410 | -112.10 | > |MergeLoadBench.getLongRLU |14125.032 |2240.481 | -11884.55 | > |MergeLoadBench.getLongRU |3071.881 |3066.606 | -5.27 | > |Merg... kuaiwei has updated the pull request incrementally with one additional commit since the last revision: Remove unused code ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24023/files - new: https://git.openjdk.org/jdk/pull/24023/files/ee511bf1..279c354a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24023&range=10 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24023&range=09-10 Stats: 17 lines in 3 files changed: 0 ins; 16 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/24023.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24023/head:pull/24023 PR: https://git.openjdk.org/jdk/pull/24023 From mchevalier at openjdk.org Wed Apr 2 08:31:19 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Wed, 2 Apr 2025 08:31:19 GMT Subject: RFR: 8353345: C2 asserts because maskShiftAmount modifies node without deleting the hash Message-ID: First delete the hash, then `set_req`. This way, we avoid changing the node (a non-`this` node) without deleting the hash. This wrong ordering was not introduced by [JDK-8347459](https://bugs.openjdk.org/browse/JDK-8347459), but before that change, only `this` went through this function, so it was OK. Since then, the function has also been used with other nodes, hence the need to remove the hash. Also, do not do any of this outside IGVN; that in turn requires registering nested shifts for IGVN during parsing so that they are not missed later.
Thanks, Marc ------------- Commit messages: - No collapse double shift left in IGVN + remove from hashtable before set_req Changes: https://git.openjdk.org/jdk/pull/24355/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24355&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8353345 Stats: 62 lines in 2 files changed: 60 ins; 2 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/24355.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24355/head:pull/24355 PR: https://git.openjdk.org/jdk/pull/24355 From shade at openjdk.org Wed Apr 2 08:56:20 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 2 Apr 2025 08:56:20 GMT Subject: RFR: 8353188: C1: Clean up x86 backend after 32-bit x86 removal [v3] In-Reply-To: <-iwh_5JGpt-TAVpfZQjwbnIG_c8hvirNKCcmiZoLNls=.3b34bf15-51fc-42bf-a294-1c23ca99754c@github.com> References: <-iwh_5JGpt-TAVpfZQjwbnIG_c8hvirNKCcmiZoLNls=.3b34bf15-51fc-42bf-a294-1c23ca99754c@github.com> Message-ID: > Piece-wise cleanup of C1_LIRAssembler_x86, C1_MacroAssembler and related classes. C1 implements the bulk of arch-specific backend there. Major parts of this backend are already removed by #24274, this cleans up another large bulk, and hopefully most of it. 
> > Additional testing: > - [x] Linux x86_64 server fastdebug, `all` > - [x] Linux x86_64 server fastdebug, `all` + `-XX:TieredStopAtLevel=1` Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: Minor whitespace reverts ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24301/files - new: https://git.openjdk.org/jdk/pull/24301/files/527854ec..77262978 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24301&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24301&range=01-02 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/24301.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24301/head:pull/24301 PR: https://git.openjdk.org/jdk/pull/24301 From shade at openjdk.org Wed Apr 2 08:56:21 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 2 Apr 2025 08:56:21 GMT Subject: RFR: 8353188: C1: Clean up x86 backend after 32-bit x86 removal [v2] In-Reply-To: References: <-iwh_5JGpt-TAVpfZQjwbnIG_c8hvirNKCcmiZoLNls=.3b34bf15-51fc-42bf-a294-1c23ca99754c@github.com> Message-ID: On Tue, 1 Apr 2025 07:58:27 GMT, Aleksey Shipilev wrote: >> Piece-wise cleanup of C1_LIRAssembler_x86, C1_MacroAssembler and related classes. C1 implements the bulk of arch-specific backend there. Major parts of this backend are already removed by #24274, this cleans up another large bulk, and hopefully most of it. >> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, `all` >> - [x] Linux x86_64 server fastdebug, `all` + `-XX:TieredStopAtLevel=1` > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Review comments Thanks! Testing looks green. I need another Reviewer before I integrate. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/24301#issuecomment-2771872615 From shade at openjdk.org Wed Apr 2 08:57:59 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 2 Apr 2025 08:57:59 GMT Subject: RFR: 8334046: Set different values for CompLevel_any and CompLevel_all [v2] In-Reply-To: References: Message-ID: On Mon, 31 Mar 2025 03:43:03 GMT, Cesar Soares Lucas wrote: >> Please review this trivial patch to set different values for CompLevel_any and CompLevel_all. >> Setting different values for these fields make the implementation of [this other issue](https://bugs.openjdk.org/browse/JDK-8313713) much cleaner/easier. >> Tested on OSX/Linux Aarch64/x86_64 with JTREG. > > Cesar Soares Lucas has updated the pull request incrementally with one additional commit since the last revision: > > Fix WhiteBox constants. Looks fine to me, but again, @veresov or someone else from Compiler team needs to take a look. ------------- Marked as reviewed by shade (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24298#pullrequestreview-2735412352 From mdoerr at openjdk.org Wed Apr 2 10:01:19 2025 From: mdoerr at openjdk.org (Martin Doerr) Date: Wed, 2 Apr 2025 10:01:19 GMT Subject: RFR: 8353058: [PPC64] Some IR framework tests are failing after JDK-8352595 In-Reply-To: References: Message-ID: On Wed, 2 Apr 2025 06:45:58 GMT, Christian Hagedorn wrote: > `TestPhaseIRMatching` was recently updated with [JDK-8352595](https://bugs.openjdk.org/browse/JDK-8352595) which changed some matching on opto assembly from `IRNode.ALLOC` (now matching on ideal phases) to `IRNode.FIELD_ACCESS` (still matching on opto assembly). 
However, the updated code matches differently on PPC for some method invocation on a parameter which let the test fail on PPC: > > public Object defaultOnOptoAssembly(Helper h) { > return h.getString(); // emits one "Field: " string on most platforms but none on PPC > } > > > When I've revisited the test to analyze the failure, it was not evidently clear what I had in mind back there with `defaultOnX()`. My guess is that I've tried to have one method failing on ideal phases, one on mach phases and one on both while all the methods use `IRNode` entries that have default compile phases on ideal and mach phases. But that is not the case today. I've therefore rewritten the tests to adhere to my guess. I also removed the ambiguity among platforms to have the same number of field accesses on them. > > How to read the `@ExpectedFailure` annotation: > > @IR(failOn = {IRNode.STORE, IRNode.FIELD_ACCESS, IRNode.COUNTED_LOOP, IRNode.STORE_I}, > counts = {IRNode.STORE, "20", IRNode.FIELD_ACCESS, "1", IRNode.COUNTED_LOOP, "2", IRNode.OOPMAP_WITH, "asdf", "2"}) > // Expect rule with id 5 (the one directly above) to fail: > // - We fail when matching PRINT_IDEAL with the: > // - failOn attribute: The failing constraints are constraint 1 and 4 (while 2 and 3 pass) > // - counts attribute: The failing constraints are constraint 2 and 4 (while 1 and 3 pass). > @ExpectedFailure(ruleId = 5, phase = CompilePhase.PRINT_IDEAL, failOn = {1, 4}, counts = {1, 3}) > > > Thanks to @TheRealMDoerr for testing the patch on PPC! > > Thanks, > Christian Thanks for the fix! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/24373#issuecomment-2772021612 From epeter at openjdk.org Wed Apr 2 10:01:43 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 2 Apr 2025 10:01:43 GMT Subject: RFR: 8350896: Integer/Long.compress gets wrong type from CompressBitsNode::Value [v2] In-Reply-To: References: Message-ID: On Wed, 2 Apr 2025 07:39:02 GMT, Jatin Bhateja wrote: >> Hi All, >> >> This bugfix patch fixes incorrect value computation for Integer/Long. compress APIs. >> >> Problems occur with a constant input and variable mask where the input's value is equal to the lower bound of the mask value., In this case, an erroneous value range estimation results in a constant value. Existing value routine first attempts to constant fold the compression operation if both input and compression mask are constant values; otherwise, it attempts to constrain the value range of result based on the upper and lower bounds of mask type. >> >> New IR test covers the issue reported in the bug report along with a case for value range based logic pruning. >> >> Kindly review and share your feedback. >> >> Best Regards, >> Jatin > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Review comments resolutions @jatin-bhateja Thanks for the updates! I have a few more requests :) src/hotspot/share/opto/intrinsicnode.cpp line 266: > 264: if ( opc == Op_CompressBits) { > 265: // Pattern: Integer/Long.compress(src_type, mask_type) > 266: int max_mask_bit_width; Suggestion: int result_bit_width; Is this bit width not about the result? It is really not about the mask. Example: `mask_type->hi_as_long() < -1L` Here, the mask has the uppermost bit set, and so the bit width of it is the maximum 32 / 64 bits. But we still can deduce that the result has one leading zero bit, and so the bit width of the result is either 31 or 63. 
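The point about the result bit width can be checked from plain Java. The following is a minimal sketch, assuming JDK 19 or later (where `Integer.compress` exists); the class and method names are made up for illustration and are not part of the patch. Any mask below -1 has its top bit set but at least one zero bit, so at most 31 bits are selected and the compressed result cannot be negative.

```java
public class CompressLeadingZero {
    // Returns true if compress(src, mask) is non-negative for a handful of
    // sample inputs (hypothetical helper, for illustration only).
    static boolean resultNonNegative(int mask) {
        int[] samples = {0, 1, -1, Integer.MIN_VALUE, Integer.MAX_VALUE, 0x55555555, 0xAAAAAAAA};
        for (int src : samples) {
            if (Integer.compress(src, mask) < 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // mask = -2 (0xFFFFFFFE): 31 bits selected, the result fits in 31 bits.
        System.out.println(Integer.toHexString(Integer.compress(-1, -2)));
        // mask = -1: all 32 bits selected, the result may be negative.
        System.out.println(Integer.compress(-1, -1));
        System.out.println(resultNonNegative(-2));
    }
}
```

Only the all-ones mask (-1) can produce a result with the sign bit set, which is why the bit width in question describes the result rather than the mask.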
src/hotspot/share/opto/intrinsicnode.cpp line 274: > 272: } else if (mask_type->hi_as_long() < -1L) { > 273: // Case 2) Mask value range is less than -1, this indicates presence of at least > 274: // one zero bit in the mask value, there by constraining the result of compression Suggestion: // one zero bit in the mask value, thereby constraining the result of compression src/hotspot/share/opto/intrinsicnode.cpp line 292: > 290: // compression result will never be a -ve value and we can safely set the > 291: // lower bound of the result value range to zero. > 292: lo = max_mask_bit_width == mask_bit_width ? lo : 0L; Can you please add an assert that we are not making `lo` worse than what we already have? Someone may insert optimizations above that set `lo > 0`, and then you may lower it again here. Suggestion: assert(lo < 0, "we should not lower the value of lo"); lo = max_mask_bit_width == mask_bit_width ? lo : 0L; src/hotspot/share/opto/intrinsicnode.cpp line 298: > 296: // in case input equals above estimated lower bound. > 297: hi = src_type->hi_as_long() == lo ? hi : src_type->hi_as_long(); > 298: hi = max_mask_bit_width < mask_bit_width ? (1L << max_mask_bit_width) - 1 : hi; I still don't understand your comment here. For example, I don't see a `max_int` in the code... And I also don't see anything that deals with constants in the code explicitly. And similarly as above, how do we ensure that `hi` is not raised accidentally? src/hotspot/share/opto/intrinsicnode.cpp line 391: > 389: return TypeInteger::zero(bt); > 390: } > 391: Is this change related to the PR title? And do you have any tests for it? test/hotspot/jtreg/compiler/c2/TestBitCompressValueTransform.java line 2: > 1: /* > 2: * Copyright (c) 2025, Oracle and/or its affiliates. All rights reserved. Ah, I just noticed the test directory. I think we can put it in a more specific location. 
test/hotspot/jtreg/compiler/c2/TestBitCompressValueTransform.java line 29: > 27: * @library /test/lib / > 28: * @summary C2: wrong result: Integer/Long.compress gets wrong type from CompressBitsNode::Value. > 29: * @run main/othervm -Xbatch -XX:-TieredCompilation -XX:CompileThresholdScaling=0.3 compiler.c2.TestBitCompressValueTransform Do you really need the flags here? The IR framework already makes sure that compilation happens, and then we execute the test again. So `Xbatch` may not be necessary to reproduce the bug. And same for `TieredCompilation`. Maybe we actually don't need any flags, but please check! ------------- Changes requested by epeter (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/23947#pullrequestreview-2734951364 PR Review Comment: https://git.openjdk.org/jdk/pull/23947#discussion_r2024480144 PR Review Comment: https://git.openjdk.org/jdk/pull/23947#discussion_r2024462231 PR Review Comment: https://git.openjdk.org/jdk/pull/23947#discussion_r2024485135 PR Review Comment: https://git.openjdk.org/jdk/pull/23947#discussion_r2024491349 PR Review Comment: https://git.openjdk.org/jdk/pull/23947#discussion_r2024260171 PR Review Comment: https://git.openjdk.org/jdk/pull/23947#discussion_r2024257750 PR Review Comment: https://git.openjdk.org/jdk/pull/23947#discussion_r2024256227 From epeter at openjdk.org Wed Apr 2 10:01:48 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 2 Apr 2025 10:01:48 GMT Subject: RFR: 8350896: Integer/Long.compress gets wrong type from CompressBitsNode::Value [v2] In-Reply-To: References: Message-ID: On Wed, 12 Mar 2025 08:20:36 GMT, Emanuel Peter wrote: >> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: >> >> Review comments resolutions > > src/hotspot/share/opto/intrinsicnode.cpp line 280: > >> 278: // used to ascertain known Zero bits of resultant value. 
>> 279: assert(mask_type->lo_as_long() >= 0, ""); >> 280: jlong clz = count_leading_zeros(mask_type->hi_as_long()); > > Suggestion: > > jlong clz = count_leading_zeros(mask_type->hi_as_long()); > // The mask has at least clz leading zeros, and hence also the compression > // result must have at least clz leading zeros. I think a comment like this is still missing. You should somehow say that the leading zeros in the mask translate to leading zeros in the result. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23947#discussion_r2024472048 From mchevalier at openjdk.org Wed Apr 2 10:03:30 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Wed, 2 Apr 2025 10:03:30 GMT Subject: RFR: 8348887: Create IR framework test for JDK-8347997 [v2] In-Reply-To: References: Message-ID: > As the ticket says: >> Create IR framework test which checks that allocations are eliminated in the regression test included in [JDK-8347997](https://bugs.openjdk.org/browse/JDK-8347997) fix. > > So here it is! We can see that in case of inlining, indeed, no allocation happens. The second part is some sanity check to emphasize the difference: of course, there is an allocation without inlining. The benefit of this second part is arguable. From my point of view, it's mostly to point out the difference to a future reader. But yes, there is nothing very surprising. 
> > Thanks, > Marc Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: Remove catch, rename, remove static ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24328/files - new: https://git.openjdk.org/jdk/pull/24328/files/f53138f7..efa712be Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24328&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24328&range=00-01 Stats: 35 lines in 1 file changed: 0 ins; 12 del; 23 mod Patch: https://git.openjdk.org/jdk/pull/24328.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24328/head:pull/24328 PR: https://git.openjdk.org/jdk/pull/24328 From mchevalier at openjdk.org Wed Apr 2 10:03:39 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Wed, 2 Apr 2025 10:03:39 GMT Subject: RFR: 8348887: Create IR framework test for JDK-8347997 In-Reply-To: References: Message-ID: <-IKJ5CKTvSR2Y3YcTBHNtNXQkQcvfpeZkyPb0AhCS_g=.bbcfc206-de65-4e83-9bf4-3c11582af9dc@github.com> On Mon, 31 Mar 2025 13:41:17 GMT, Marc Chevalier wrote: > As the ticket says: >> Create IR framework test which checks that allocations are eliminated in the regression test included in [JDK-8347997](https://bugs.openjdk.org/browse/JDK-8347997) fix. > > So here it is! We can see that in case of inlining, indeed, no allocation happens. The second part is some sanity check to emphasize the difference: of course, there is an allocation without inlining. The benefit of this second part is arguable. From my point of view, it's mostly to point out the difference to a future reader. But yes, there is nothing very surprising. > > Thanks, > Marc Made quite some changes, but in particular: got rid of catches, and renamed cases. New names are not that much more inspired, but at least, not confusing. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/24328#issuecomment-2772064669 From epeter at openjdk.org Wed Apr 2 10:15:59 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 2 Apr 2025 10:15:59 GMT Subject: RFR: 8347555: [REDO] C2: implement optimization for series of Add of unique value [v9] In-Reply-To: References: Message-ID: On Mon, 17 Mar 2025 21:01:23 GMT, Kangcheng Xu wrote: >> [JDK-8347555](https://bugs.openjdk.org/browse/JDK-8347555) is a redo of [JDK-8325495](https://bugs.openjdk.org/browse/JDK-8325495) was [first merged](https://git.openjdk.org/jdk/pull/20754) then backed out due to a regression. This patch redos the feature and fixes the bit shift overflow problem. For more information please refer to the previous PR. >> >> When constanlizing multiplications (possibly in forms on `lshifts`), the multiplier is upgraded to long and then later narrowed to int if needed. However, when a `lshift` operand is exactly `32`, overflowing an int, using long has an unexpected result. (i.e., `(1 << 32) = 1` and `(int) (1L << 32) = 0`) >> >> The following was implemented to address this issue. >> >> if (UseNewCode2) { >> *multiplier = bt == T_INT >> ? (jlong) (1 << con->get_int()) // loss of precision is expected for int as it overflows >> : ((jlong) 1) << con->get_int(); >> } else { >> *multiplier = ((jlong) 1 << con->get_int()); >> } >> >> >> Two new bitshift overflow tests were added. > > Kangcheng Xu has updated the pull request incrementally with one additional commit since the last revision: > > fix ((x< `a*(b + c)` are already handled by `AddNode::IdealIL`. It would just be a shame to have all the complexity of matching specific cases, but not take the chance to make it a bit more general. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/23506#issuecomment-2772097646 From chagedorn at openjdk.org Wed Apr 2 10:24:00 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 2 Apr 2025 10:24:00 GMT Subject: RFR: 8353341: C2: removal of a Mod[DF]Node crashes when the node is already dead [v3] In-Reply-To: References: Message-ID: On Wed, 2 Apr 2025 08:08:07 GMT, Marc Chevalier wrote: >> If the Mod[DF]Node has no control projection when it's being removed (because its result is unused), `extract_projections` will fail an assert. So, let's skip the removal. >> >> But that should happen only when the nodes are already unreachable (control input being transitively top). At the end of the day, the node should be dropped. because of that, so there is no rush, and let dead node deletion do the job. >> >> On the reduced reproducer, the crash is not common (even with `-XX:RepeatCompilation=300`, it might need more than a run to reproduce). So I've tried my fix on multiple thousands repeat compilations (by 300 packs) without a crash, and without having the modulo node alive at the end. >> >> For instance, that's what happen on the reproducer. Quickly, some big sub-graph is dead, but nodes stay a while in the graph: >> >> Then: >> >> And eventually, everything is removed, so the control projection is removed, and `extract_projections` doesn't like it. >> >> >> Thanks, >> Marc > > Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: > > Address review comments, part 2 Thanks for the updates, looks good! ------------- Marked as reviewed by chagedorn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/24375#pullrequestreview-2735777321 From chagedorn at openjdk.org Wed Apr 2 10:24:58 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 2 Apr 2025 10:24:58 GMT Subject: RFR: 8353058: [PPC64] Some IR framework tests are failing after JDK-8352595 In-Reply-To: References: Message-ID: On Wed, 2 Apr 2025 06:45:58 GMT, Christian Hagedorn wrote: > `TestPhaseIRMatching` was recently updated with [JDK-8352595](https://bugs.openjdk.org/browse/JDK-8352595) which changed some matching on opto assembly from `IRNode.ALLOC` (now matching on ideal phases) to `IRNode.FIELD_ACCESS` (still matching on opto assembly). However, the updated code matches differently on PPC for some method invocation on a parameter which let the test fail on PPC: > > public Object defaultOnOptoAssembly(Helper h) { > return h.getString(); // emits one "Field: " string on most platforms but none on PPC > } > > > When I've revisited the test to analyze the failure, it was not evidently clear what I had in mind back there with `defaultOnX()`. My guess is that I've tried to have one method failing on ideal phases, one on mach phases and one on both while all the methods use `IRNode` entries that have default compile phases on ideal and mach phases. But that is not the case today. I've therefore rewritten the tests to adhere to my guess. I also removed the ambiguity among platforms to have the same number of field accesses on them. 
> > How to read the `@ExpectedFailure` annotation: > > @IR(failOn = {IRNode.STORE, IRNode.FIELD_ACCESS, IRNode.COUNTED_LOOP, IRNode.STORE_I}, > counts = {IRNode.STORE, "20", IRNode.FIELD_ACCESS, "1", IRNode.COUNTED_LOOP, "2", IRNode.OOPMAP_WITH, "asdf", "2"}) > // Expect rule with id 5 (the one directly above) to fail: > // - We fail when matching PRINT_IDEAL with the: > // - failOn attribute: The failing constraints are constraint 1 and 4 (while 2 and 3 pass) > // - counts attribute: The failing constraints are constraint 2 and 4 (while 1 and 3 pass). > @ExpectedFailure(ruleId = 5, phase = CompilePhase.PRINT_IDEAL, failOn = {1, 4}, counts = {1, 3}) > > > Thanks to @TheRealMDoerr for testing the patch on PPC! > > Thanks, > Christian Thanks Tobias and Martin for your reviews! ------------- PR Comment: https://git.openjdk.org/jdk/pull/24373#issuecomment-2772117482 From epeter at openjdk.org Wed Apr 2 10:29:06 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 2 Apr 2025 10:29:06 GMT Subject: RFR: 8347555: [REDO] C2: implement optimization for series of Add of unique value [v9] In-Reply-To: References: Message-ID: On Mon, 17 Mar 2025 21:01:23 GMT, Kangcheng Xu wrote: >> [JDK-8347555](https://bugs.openjdk.org/browse/JDK-8347555) is a redo of [JDK-8325495](https://bugs.openjdk.org/browse/JDK-8325495) was [first merged](https://git.openjdk.org/jdk/pull/20754) then backed out due to a regression. This patch redos the feature and fixes the bit shift overflow problem. For more information please refer to the previous PR. >> >> When constanlizing multiplications (possibly in forms on `lshifts`), the multiplier is upgraded to long and then later narrowed to int if needed. However, when a `lshift` operand is exactly `32`, overflowing an int, using long has an unexpected result. (i.e., `(1 << 32) = 1` and `(int) (1L << 32) = 0`) >> >> The following was implemented to address this issue. >> >> if (UseNewCode2) { >> *multiplier = bt == T_INT >> ? 
(jlong) (1 << con->get_int()) // loss of precision is expected for int as it overflows >> : ((jlong) 1) << con->get_int(); >> } else { >> *multiplier = ((jlong) 1 << con->get_int()); >> } >> >> >> Two new bitshift overflow tests were added. > > Kangcheng Xu has updated the pull request incrementally with one additional commit since the last revision: > > fix ((x< 413: if (find_power_of_two_addition_pattern(this, bt).valid) { > 414: return nullptr; > 415: } Hmm. So somewhere we would have generated that pattern, probably in MulNode. Can you add a verification there, to check that we are only generating patterns that `find_power_of_two_addition_pattern` recognizes? That would make sure that we keep the code here and there in sync. src/hotspot/share/opto/addnode.cpp line 428: > 426: ((mul = find_simple_multiplication_pattern(in1, bt)).valid && mul.variable == in2) || > 427: ((mul = find_power_of_two_addition_pattern(in1, bt)).valid && mul.variable == in2) > 428: ) { I find this quite difficult to read. And it looks repetitive too. Maybe you can refactor it? Also, it would be nice to have comments with the patterns here, to see which one covers what case, so that we have a nice overview. src/hotspot/share/opto/addnode.cpp line 431: > 429: Node* con = (bt == T_INT) > 430: ? (Node*) phase->intcon((jint) (mul.multiplier + 1)) // intentional type narrowing to allow overflow at max_jint > 431: : (Node*) phase->longcon((mul.multiplier + 1)); I think just to be safe, you should use `java_add` to have correct overflow semantics. You are using `jlong` for `multiplier`, which is a signed integer type, and in C++ overflow is undefined behavior as far as I know, let's avoid that ;) Actually, do you have a test where the multiplier overflows here? 
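For reference, the Java-level semantics that the compiled code has to preserve can be reproduced directly. This is a small sketch with hypothetical helper names, not code from the patch: an `int` shift uses only the low 5 bits of the shift count, a `long` shift uses the low 6 bits, and `int` addition wraps around on overflow.

```java
public class ShiftOverflow {
    // int shift: the count is masked with 0x1f, so a count of 32 behaves like 0.
    static int intShift(int count) {
        return 1 << count;
    }

    // long shift narrowed to int: bit 32 is set, then cut off by the cast.
    static int longShiftNarrowed(int count) {
        return (int) (1L << count);
    }

    // Java int addition is defined to wrap, unlike signed overflow in C++.
    static int wrappingIncrement(int multiplier) {
        return multiplier + 1;
    }

    public static void main(String[] args) {
        System.out.println(intShift(32));            // shift count wraps to 0
        System.out.println(longShiftNarrowed(32));   // high bit lost by narrowing
        System.out.println(wrappingIncrement(Integer.MAX_VALUE) == Integer.MIN_VALUE);
    }
}
```

The two shift helpers give different answers for a count of 32, which is the discrepancy the PR description refers to.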
src/hotspot/share/opto/addnode.cpp line 509: > 507: if (rhs.valid && rhs.variable == n->in(1)) { > 508: return Multiplication{true, rhs.variable, rhs.multiplier + 1}; > 509: } Hmm, it seems these are patterns that you did not promise you would cover in the description above. It makes it a little difficult to keep the overview... ------------- PR Review: https://git.openjdk.org/jdk/pull/23506#pullrequestreview-2735724858 PR Review Comment: https://git.openjdk.org/jdk/pull/23506#discussion_r2024508253 PR Review Comment: https://git.openjdk.org/jdk/pull/23506#discussion_r2024512437 PR Review Comment: https://git.openjdk.org/jdk/pull/23506#discussion_r2024530927 PR Review Comment: https://git.openjdk.org/jdk/pull/23506#discussion_r2024545411 From epeter at openjdk.org Wed Apr 2 10:29:08 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 2 Apr 2025 10:29:08 GMT Subject: RFR: 8347555: [REDO] C2: implement optimization for series of Add of unique value [v7] In-Reply-To: References: Message-ID: On Wed, 2 Apr 2025 10:05:45 GMT, Emanuel Peter wrote: >> `AddNode::IdealIL` handles to more general associative patterns like `(a*b) + (a*c)` into `a*(b + c)` > > Ah interesting. It could be worth adding a comment for that here then! That was in fact a large part of my initial hesitation with this PR. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23506#discussion_r2024520170 From epeter at openjdk.org Wed Apr 2 10:29:07 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 2 Apr 2025 10:29:07 GMT Subject: RFR: 8347555: [REDO] C2: implement optimization for series of Add of unique value [v7] In-Reply-To: References: Message-ID: On Thu, 13 Mar 2025 17:16:14 GMT, Kangcheng Xu wrote: >> src/hotspot/share/opto/addnode.cpp line 407: >> >>> 405: } >>> 406: >>> 407: // Try to convert a serial of additions into a single multiplication. Also convert `(a * CON) + a` to `(CON + 1) * a` as >> >> What about `(a * CON1) + (a * CON2)`? Like `11 * a + 5 * a`. 
Do we also optimize that? > > `AddNode::IdealIL` handles to more general associative patterns like `(a*b) + (a*c)` into `a*(b + c)` Ah interesting. It could be worth adding a comment for that here then! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23506#discussion_r2024516603 From chagedorn at openjdk.org Wed Apr 2 10:33:50 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 2 Apr 2025 10:33:50 GMT Subject: RFR: 8352418: Add verification code to check that the associated loop nodes of useless Template Assertion Predicates are dead [v2] In-Reply-To: References: Message-ID: <63a9PwFdCvNjpNhf32MIoGRYl96-WggnmMck-wk13vs=.bf2b75f5-926b-48f7-8107-275019e9b0e0@github.com> On Wed, 2 Apr 2025 07:53:53 GMT, Christian Hagedorn wrote: >> As already suggested in https://github.com/openjdk/jdk/pull/23823, I want to do the following additional verification: >> >> After `eliminate_useless_predicates()` all now useless `OpaqueTemplateAssertionPredicate` nodes should not have any references to `CountedLoop` nodes that are still in the graph (otherwise, they would have been marked useful). This verification did not work reliably without the full Assertion Predicates fix [JDK-8350577](https://bugs.openjdk.org/browse/JDK-8350577). Since JDK-8350577 is now integrated, I propose to add this additional verification code. >> >> Thanks, >> Christian > > Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: > > Add ResourceMark Thanks Emanuel for your review! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/24326#issuecomment-2772139205 From epeter at openjdk.org Wed Apr 2 10:34:52 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 2 Apr 2025 10:34:52 GMT Subject: RFR: 8347555: [REDO] C2: implement optimization for series of Add of unique value [v7] In-Reply-To: References: Message-ID: <28nDVS1w4dstG6N-J-GAeIOVX6imXMijoxsIvXej5gU=.9451c7ef-e718-4418-8c32-37c242e906bc@github.com> On Tue, 25 Mar 2025 16:18:20 GMT, Kangcheng Xu wrote: >> This looks really interesting! >> >> I see that you are doing some special pattern matching. I wonder if it might be worth generalizing the algorithm, to search through an arbitrary "tree" of additions, collect all "leaves" of (`variable * multiplier`), sort by `variable`, and compute new additions for each `variable`. What do you think? > > @eme64 Could you please take a look at this if you have some time? Thanks! @tabjy One more comment: I have had bad experiences before with pattern matching that only covered a part of the cases, and where methods did sometimes do more than what they promised in their name or documentation. These things tend to get extended later, and the overview gets worse and worse until nobody has the overview and bugs creep in that are hard to discover in a review. Can you do some experimenting and see if you can come up with a cleaner design? Maybe write down at the beginning in `convert_serial_additions` what is the general form of the patterns you cover? 
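The more general approach suggested earlier in this thread (walk a tree of additions, collect `variable * multiplier` leaves, and sum the multipliers per variable) could look roughly like the sketch below. It is a toy model under stated assumptions: `Expr`, `Var`, `Add`, `MulCon` and `collect` are hypothetical names, the tree is not C2's node graph, and shared sub-nodes and int versus long narrowing are ignored.

```java
import java.util.Map;
import java.util.TreeMap;

public class CollectTerms {
    // Toy expression tree (hypothetical, not C2's Node hierarchy).
    static abstract class Expr {}
    static final class Var extends Expr {
        final String name;
        Var(String name) { this.name = name; }
    }
    static final class Add extends Expr {
        final Expr left, right;
        Add(Expr left, Expr right) { this.left = left; this.right = right; }
    }
    static final class MulCon extends Expr {
        final long con;
        final Expr operand;
        MulCon(long con, Expr operand) { this.con = con; this.operand = operand; }
    }

    // Collects the total multiplier of every variable, sorted by variable name.
    static Map<String, Long> collect(Expr e) {
        Map<String, Long> terms = new TreeMap<>();
        collect(e, 1L, terms);
        return terms;
    }

    private static void collect(Expr e, long factor, Map<String, Long> terms) {
        if (e instanceof Var) {
            // Long::sum wraps on overflow, matching Java arithmetic.
            terms.merge(((Var) e).name, factor, Long::sum);
        } else if (e instanceof Add) {
            collect(((Add) e).left, factor, terms);
            collect(((Add) e).right, factor, terms);
        } else if (e instanceof MulCon) {
            collect(((MulCon) e).operand, factor * ((MulCon) e).con, terms);
        } else {
            throw new IllegalArgumentException("unknown expression kind");
        }
    }

    // Builds (11 * a + 5 * a) + (a + b).
    static Expr example() {
        Expr a = new Var("a");
        Expr b = new Var("b");
        return new Add(new Add(new MulCon(11, a), new MulCon(5, a)), new Add(a, b));
    }

    public static void main(String[] args) {
        System.out.println(collect(example()));
    }
}
```

Each map entry then corresponds to one multiplication in the rewritten expression, so `11 * a + 5 * a + a + b` becomes `17 * a + b`.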
------------- PR Comment: https://git.openjdk.org/jdk/pull/23506#issuecomment-2772140252 From thartmann at openjdk.org Wed Apr 2 10:46:05 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 2 Apr 2025 10:46:05 GMT Subject: RFR: 8348887: Create IR framework test for JDK-8347997 [v2] In-Reply-To: References: Message-ID: On Wed, 2 Apr 2025 10:03:30 GMT, Marc Chevalier wrote: >> As the ticket says: >>> Create IR framework test which checks that allocations are eliminated in the regression test included in [JDK-8347997](https://bugs.openjdk.org/browse/JDK-8347997) fix. >> >> So here it is! We can see that in case of inlining, indeed, no allocation happens. The second part is some sanity check to emphasize the difference: of course, there is an allocation without inlining. The benefit of this second part is arguable. From my point of view, it's mostly to point out the difference to a future reader. But yes, there is nothing very surprising. >> >> Thanks, >> Marc > > Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: > > Remove catch, rename, remove static Thanks for making these changes, looks good to me. test/hotspot/jtreg/compiler/c2/irTests/TestContinuationPinningAndEA.java line 41: > 39: public class TestContinuationPinningAndEA { > 40: public static void main(String[] args) { > 41: TestFramework.runWithFlags("--add-modules", "java.base", "--add-exports", "java.base/jdk.internal.vm=ALL-UNNAMED"); Why is this only needed now? ------------- Marked as reviewed by thartmann (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/24328#pullrequestreview-2735827767 PR Review Comment: https://git.openjdk.org/jdk/pull/24328#discussion_r2024569397 From thartmann at openjdk.org Wed Apr 2 10:46:49 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 2 Apr 2025 10:46:49 GMT Subject: RFR: 8353341: C2: removal of a Mod[DF]Node crashes when the node is already dead [v3] In-Reply-To: References: Message-ID: On Wed, 2 Apr 2025 08:08:07 GMT, Marc Chevalier wrote: >> If the Mod[DF]Node has no control projection when it's being removed (because its result is unused), `extract_projections` will fail an assert. So, let's skip the removal. >> >> But that should happen only when the nodes are already unreachable (control input being transitively top). At the end of the day, the node should be dropped. because of that, so there is no rush, and let dead node deletion do the job. >> >> On the reduced reproducer, the crash is not common (even with `-XX:RepeatCompilation=300`, it might need more than a run to reproduce). So I've tried my fix on multiple thousands repeat compilations (by 300 packs) without a crash, and without having the modulo node alive at the end. >> >> For instance, that's what happen on the reproducer. Quickly, some big sub-graph is dead, but nodes stay a while in the graph: >> >> Then: >> >> And eventually, everything is removed, so the control projection is removed, and `extract_projections` doesn't like it. >> >> >> Thanks, >> Marc > > Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: > > Address review comments, part 2 Looks good to me too. ------------- Marked as reviewed by thartmann (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/24375#pullrequestreview-2735832842 From mchevalier at openjdk.org Wed Apr 2 11:28:50 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Wed, 2 Apr 2025 11:28:50 GMT Subject: RFR: 8348887: Create IR framework test for JDK-8347997 [v2] In-Reply-To: References: Message-ID: On Wed, 2 Apr 2025 10:42:19 GMT, Tobias Hartmann wrote: >> Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: >> >> Remove catch, rename, remove static > > test/hotspot/jtreg/compiler/c2/irTests/TestContinuationPinningAndEA.java line 41: > >> 39: public class TestContinuationPinningAndEA { >> 40: public static void main(String[] args) { >> 41: TestFramework.runWithFlags("--add-modules", "java.base", "--add-exports", "java.base/jdk.internal.vm=ALL-UNNAMED"); > > Why is this only needed now? I think it was already not fine before. I had all the `catch` blocks because of the `throws` in the base example, added defensively by default (just assuming that these `throws Throwable` were there for a good reason). They hid the fact that loading `Continuation` failed (it's not exported). Nevertheless, I got enough for the IR check to work. I might be wrong about the reason, but my understanding is that, since it's intrinsified, compilation can still produce IR and enough printing to make the check work. And I think it was really working, not just ignored: I got a test failure on the non-inlined cases before I added the `DontInline` annotation, if I remember correctly. But at runtime, access to the `Continuation` class is checked (when loading it?), and then it throws to tell me I'm not allowed to access it.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24328#discussion_r2024629833 From epeter at openjdk.org Wed Apr 2 11:36:59 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 2 Apr 2025 11:36:59 GMT Subject: RFR: 8349138: Optimize Math.copySign API for Intel e-core targets [v3] In-Reply-To: References: Message-ID: On Thu, 6 Feb 2025 17:49:54 GMT, Jatin Bhateja wrote: >> Math.copySign is only intrinsified on x86 targets supporting the AVX512 feature. >> Intel E-core Xeons support only the AVX2 feature set and still compile Java implementation which is composed of logical operations. >> >> Since there is a 3-cycle penalty for copying incoming float/double values to GPRs before being operated upon by logical operation there is an opportunity to optimize this using an efficient instruction sequence. >> >> Patch uses ANDPS and ANDPD logical instruction to generate efficient instruction sequences to absorb domain copy over penalty. Also, performs minor tuning for existing AVX512 instruction sequence based on VPTERNLOG instruction. 
>> >> Following are the performance numbers of the following existing microbenchmark >> https://github.com/openjdk/jdk/blob/master/test/micro/org/openjdk/bench/vm/compiler/Signum.java >> >> Patch passes following validation test >> [test/jdk/java/lang/Math/IeeeRecommendedTests.java >> ](https://github.com/openjdk/jdk/blob/master/test/jdk/java/lang/Math/IeeeRecommendedTests.java) >> >> >> Granite Rapids-AP (P-core Xeon) >> Baseline AVX512: >> Benchmark Mode Cnt Score Error Units >> Signum._5_copySignFloatTest thrpt 2 1296.141 ops/ns >> Signum._7_copySignDoubleTest thrpt 2 838.954 ops/ns >> >> Withopt : >> Benchmark Mode Cnt Score Error Units >> Signum._5_copySignFloatTest thrpt 2 940.240 ops/ns >> Signum._7_copySignDoubleTest thrpt 2 967.370 ops/ns >> >> Baseline AVX2: >> Benchmark Mode Cnt Score Error Units >> Signum._5_copySignFloatTest thrpt 2 63.673 ops/ns >> Signum._7_copySignDoubleTest thrpt 2 26.898 ops/ns >> >> Withopt : >> Benchmark Mode Cnt Score Error Units >> Signum._5_copySignFloatTest thrpt 2 785.801 ops/ns >> Signum._7_copySignDoubleTest thrpt 2 558.710 ops/ns >> >> Sierra Forest (E-core Xeon) >> Baseline: >> Benchmark (seed) Mode Cnt Score Error Units >> o.o.b.vm.compiler.Signum._5_copySignFloatTest N/A thrpt 2 40.528 ops/ns >> o.o.b.vm.compiler.Signum._7_copySignDoubleTest N/A thrpt 2 25.101 ops/ns >> >> Withopt: >> Benchmark (seed) Mode Cnt Score Error Units >> o.o.b.vm.compiler.Signum._5_copySignFloatTest N/A thrpt 2 676.... > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Adding vector support along with some refactoring. 
Then non x64 specific code looks reasonable, though I have 2 comments ;) test/hotspot/jtreg/compiler/intrinsics/math/TestCopySignIntrinsic.java line 79: > 77: IntStream.range(0, SIZE - 8).forEach(i -> { dmagnitude[i] = rd.nextFloat(-Float.MAX_VALUE, Float.MAX_VALUE); }); > 78: IntStream.range(0, SIZE).forEach(i -> { fsign[i] = rd.nextFloat(-Float.MAX_VALUE, Float.MAX_VALUE); }); > 79: IntStream.range(0, SIZE).forEach(i -> { dsign[i] = rd.nextFloat(-Float.MAX_VALUE, Float.MAX_VALUE); }); Why not use Generators.java ? That would also give you NaN, infinity, etc ;) test/hotspot/jtreg/compiler/intrinsics/math/TestCopySignIntrinsic.java line 122: > 120: } > 121: } > 122: } Verify.checkEQ should do this for you.... though maybe you'd have to wait for https://github.com/openjdk/jdk/pull/24224 not to get into trouble with different NaN encodings. ------------- Changes requested by epeter (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/23386#pullrequestreview-2735939246 PR Review Comment: https://git.openjdk.org/jdk/pull/23386#discussion_r2024635995 PR Review Comment: https://git.openjdk.org/jdk/pull/23386#discussion_r2024638031 From chagedorn at openjdk.org Wed Apr 2 11:57:01 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 2 Apr 2025 11:57:01 GMT Subject: RFR: 8353345: C2 asserts because maskShiftAmount modifies node without deleting the hash In-Reply-To: References: Message-ID: On Tue, 1 Apr 2025 11:51:13 GMT, Marc Chevalier wrote: > First delete the hash, then `set_req`. This way, we avoid changing the node (a non-`this` node) without deleting the hash. This wrong ordering is not new from [JDK-8347459](https://bugs.openjdk.org/browse/JDK-8347459), but before that, only `this` was going through this function, so it was ok. But since, it is used with other nodes, hence the need to remove the hash. > > Also, not do any of that outside IGVN, but requires to register nested shifts for IGVN in parsing not to miss them later. 
>
> Thanks,
> Marc

src/hotspot/share/opto/mulnode.cpp line 981:

> 979: }
> 980: return maskedShift;
> 981: }

IIUC, we handle a shift with a too-large shift amount by masking the amount, which is normally done directly when transforming the inner shift. However, we could also have cases where we already process an outer shift where the inner shift is not yet replaced with the masked shift amount. So, we do the inner shift transformation as part of the outer shift processing.

It seems that for the double shift optimization, we only care about the actual masked value. I reckon that the inner shift is always on the worklist somewhere and will eventually be transformed later - there is no need to do it eagerly now (if it's not on the worklist, we should update the notification code for IGVN).

This suggests that we could do the following instead:
- Rename `maskShiftAmount` into `mask_and_replace_shift_amount()`.
- Introduce `mask_shift_amount()` that only calculates the masked shift amount without updating it in the graph.
- Update `mask_and_replace_shift_amount()`: First call `mask_shift_amount()` to get the masked amount. If it's different, do the graph surgery. Return the masked shift amount.
- Use `mask_and_replace_shift_amount()` everywhere where we can safely do the update, i.e. where we call the `maskShiftAmount()` with `this`.
- Use `mask_shift_amount()` where we used to call `maskShiftAmount()` with `non-this`, i.e. surgery is not implicitly safe.

Then you also do not need the `record_for_igvn()` code below.
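The split proposed above can be sketched outside HotSpot. The following is a small Java model, not the C2 code: `ShiftNode`, `maskShiftAmount` and `maskAndReplaceShiftAmount` are hypothetical names standing in for the proposed `mask_shift_amount()` and `mask_and_replace_shift_amount()`, and the "graph surgery" is modelled as a plain field write.

```java
public class ShiftMaskSketch {
    /** Hypothetical stand-in for a shift node with a constant shift amount. */
    static final class ShiftNode {
        int shiftAmount;          // constant shift amount as currently stored
        final int bitsPerValue;   // 32 for int shifts, 64 for long shifts

        ShiftNode(int shiftAmount, int bitsPerValue) {
            this.shiftAmount = shiftAmount;
            this.bitsPerValue = bitsPerValue;
        }
    }

    /** Pure query: computes the masked amount and never modifies the node. */
    static int maskShiftAmount(ShiftNode shift) {
        return shift.shiftAmount & (shift.bitsPerValue - 1);
    }

    /** Query plus update: only for nodes that are safe to modify. */
    static int maskAndReplaceShiftAmount(ShiftNode shift) {
        int masked = maskShiftAmount(shift);
        if (masked != shift.shiftAmount) {
            shift.shiftAmount = masked; // stands in for the graph update
        }
        return masked;
    }

    public static void main(String[] args) {
        ShiftNode inner = new ShiftNode(33, 32);
        System.out.println(maskShiftAmount(inner));           // 1
        System.out.println(inner.shiftAmount);                // 33, node untouched
        System.out.println(maskAndReplaceShiftAmount(inner)); // 1
        System.out.println(inner.shiftAmount);                // 1, node updated
        // Java itself masks int shift distances to the low 5 bits.
        System.out.println((5 << 33) == (5 << 1));            // true
    }
}
```

In this model the outer-shift optimization would call only the pure query on the inner node, so the inner node is never changed behind the back of whatever table it is registered in.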
test/hotspot/jtreg/compiler/c2/gvn/DoubleLShiftCrashDuringIGVN.java line 35: > 33: > 34: public class DoubleLShiftCrashDuringIGVN { > 35: public static long shift=0; Suggestion: public static long shift = 0; ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24355#discussion_r2024663634 PR Review Comment: https://git.openjdk.org/jdk/pull/24355#discussion_r2024578423 From epeter at openjdk.org Wed Apr 2 12:17:31 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 2 Apr 2025 12:17:31 GMT Subject: RFR: 8352869: Verify.checkEQ: extension for NaN, VectorAPI and arbitrary Objects [v7] In-Reply-To: References: Message-ID: > We should extend the functionality of Verify.checkEQ: > - Allow different NaN encodings to be seen as equal (by default). > - Compare VectorAPI vectors. > - Compare Exceptions, and their messages. > - Compare arbitrary Objects via Reflection. > > Note: this is a prerequisite for the Template Library [JDK-8352861](https://bugs.openjdk.org/browse/JDK-8352861) / https://github.com/openjdk/jdk/pull/23418. 
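To illustrate the first bullet above, here is a minimal sketch of comparing floats while treating all NaN encodings as equal, next to a strict raw-bits comparison. The class and method names are illustrative assumptions and not the `Verify` implementation; only the standard `Float` methods are used.

```java
public class NaNEqualitySketch {
    /** Treats every NaN encoding as equal; still distinguishes 0.0f from -0.0f. */
    static boolean equalIgnoringNaNEncoding(float a, float b) {
        // floatToIntBits collapses all NaN values to the canonical 0x7fc00000.
        return Float.floatToIntBits(a) == Float.floatToIntBits(b);
    }

    /** Strict comparison: values are equal only if their raw bit patterns match. */
    static boolean equalRawBits(float a, float b) {
        return Float.floatToRawIntBits(a) == Float.floatToRawIntBits(b);
    }

    public static void main(String[] args) {
        float canonicalNaN = Float.NaN;
        // A quiet NaN with an extra payload bit set.
        float payloadNaN = Float.intBitsToFloat(0x7fc00001);
        System.out.println(canonicalNaN == payloadNaN);                         // false, NaN never equals NaN
        System.out.println(equalIgnoringNaNEncoding(canonicalNaN, payloadNaN)); // true
        // Whether the payload survives can depend on the platform.
        System.out.println(equalRawBits(canonicalNaN, payloadNaN));
    }
}
```

The first form is the lenient default described in the list; the second corresponds to a comparison on raw bits, where two results that are both NaN can still be reported as different.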
Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: refactor with checkEQWithRawBits ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24224/files - new: https://git.openjdk.org/jdk/pull/24224/files/4ca42699..f2b3c371 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24224&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24224&range=05-06 Stats: 165 lines in 2 files changed: 10 ins; 53 del; 102 mod Patch: https://git.openjdk.org/jdk/pull/24224.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24224/head:pull/24224 PR: https://git.openjdk.org/jdk/pull/24224 From epeter at openjdk.org Wed Apr 2 12:17:31 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 2 Apr 2025 12:17:31 GMT Subject: RFR: 8352869: Verify.checkEQ: extension for NaN, VectorAPI and arbitrary Objects [v6] In-Reply-To: References: Message-ID: On Tue, 1 Apr 2025 09:11:29 GMT, Christian Hagedorn wrote: >> Emanuel Peter has updated the pull request incrementally with two additional commits since the last revision: >> >> - upate copyright >> - Apply suggestions from code review >> >> Co-authored-by: Christian Hagedorn > > I'll have a closer look at the code later again :-) @chhagedorn Ok, I refactored it. I'm now always comparing arbitrary classes. And `checkEQWithRawBits` does the comparison with raw bits, no `Options` required any more. Added a comment about reflection making things slow. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/24224#issuecomment-2772372587 From chagedorn at openjdk.org Wed Apr 2 12:18:56 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 2 Apr 2025 12:18:56 GMT Subject: RFR: 8348887: Create IR framework test for JDK-8347997 [v2] In-Reply-To: References: Message-ID: On Wed, 2 Apr 2025 10:03:30 GMT, Marc Chevalier wrote: >> As the ticket says: >>> Create IR framework test which checks that allocations are eliminated in the regression test included in [JDK-8347997](https://bugs.openjdk.org/browse/JDK-8347997) fix. >> >> So here it is! We can see that in case of inlining, indeed, no allocation happens. The second part is some sanity check to emphasize the difference: of course, there is an allocation without inlining. The benefit of this second part is arguable. From my point of view, it's mostly to point out the difference to a future reader. But yes, there is nothing very surprising. >> >> Thanks, >> Marc > > Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: > > Remove catch, rename, remove static Looks good to me, too. ------------- Marked as reviewed by chagedorn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/24328#pullrequestreview-2736051512 From chagedorn at openjdk.org Wed Apr 2 12:22:32 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 2 Apr 2025 12:22:32 GMT Subject: RFR: 8352418: Add verification code to check that the associated loop nodes of useless Template Assertion Predicates are dead [v2] In-Reply-To: References: Message-ID: On Wed, 2 Apr 2025 07:53:53 GMT, Christian Hagedorn wrote: >> As already suggested in https://github.com/openjdk/jdk/pull/23823, I want to do the following additional verification: >> >> After `eliminate_useless_predicates()` all now useless `OpaqueTemplateAssertionPredicate` nodes should not have any references to `CountedLoop` nodes that are still in the graph (otherwise, they would have been marked useful). This verification did not work reliably without the full Assertion Predicates fix [JDK-8350577](https://bugs.openjdk.org/browse/JDK-8350577). Since JDK-8350577 is now integrated, I propose to add this additional verification code. >> >> Thanks, >> Christian > > Christian Hagedorn has updated the pull request incrementally with one additional commit since the last revision: > > Add ResourceMark Testing looked good! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/24326#issuecomment-2772382684 From chagedorn at openjdk.org Wed Apr 2 12:22:33 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 2 Apr 2025 12:22:33 GMT Subject: Integrated: 8352418: Add verification code to check that the associated loop nodes of useless Template Assertion Predicates are dead In-Reply-To: References: Message-ID: On Mon, 31 Mar 2025 12:28:59 GMT, Christian Hagedorn wrote: > As already suggested in https://github.com/openjdk/jdk/pull/23823, I want to do the following additional verification: > > After `eliminate_useless_predicates()` all now useless `OpaqueTemplateAssertionPredicate` nodes should not have any references to `CountedLoop` nodes that are still in the graph (otherwise, they would have been marked useful). This verification did not work reliably without the full Assertion Predicates fix [JDK-8350577](https://bugs.openjdk.org/browse/JDK-8350577). Since JDK-8350577 is now integrated, I propose to add this additional verification code. > > Thanks, > Christian This pull request has now been integrated. 
Changeset: c9baa8a7 Author: Christian Hagedorn URL: https://git.openjdk.org/jdk/commit/c9baa8a7aea0be7221f0af834fe73f035436bd8d Stats: 43 lines in 2 files changed: 43 ins; 0 del; 0 mod 8352418: Add verification code to check that the associated loop nodes of useless Template Assertion Predicates are dead Reviewed-by: epeter, roland ------------- PR: https://git.openjdk.org/jdk/pull/24326 From mchevalier at openjdk.org Wed Apr 2 13:22:16 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Wed, 2 Apr 2025 13:22:16 GMT Subject: RFR: 8348887: Create IR framework test for JDK-8347997 [v2] In-Reply-To: References: Message-ID: <4NpAefp1YW42cMcYCYTC2tf2P729CAab54NaSTjBl3Q=.6d74392d-7637-465b-a429-c91c8d010958@github.com> On Wed, 2 Apr 2025 10:03:30 GMT, Marc Chevalier wrote: >> As the ticket says: >>> Create IR framework test which checks that allocations are eliminated in the regression test included in [JDK-8347997](https://bugs.openjdk.org/browse/JDK-8347997) fix. >> >> So here it is! We can see that in case of inlining, indeed, no allocation happens. The second part is some sanity check to emphasize the difference: of course, there is an allocation without inlining. The benefit of this second part is arguable. From my point of view, it's mostly to point out the difference to a future reader. But yes, there is nothing very surprising. >> >> Thanks, >> Marc > > Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: > > Remove catch, rename, remove static Thanks @chhagedorn, @TobiHartmann and @galderz for the reviews! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/24328#issuecomment-2772540842 From duke at openjdk.org Wed Apr 2 13:22:17 2025 From: duke at openjdk.org (duke) Date: Wed, 2 Apr 2025 13:22:17 GMT Subject: RFR: 8348887: Create IR framework test for JDK-8347997 [v2] In-Reply-To: References: Message-ID: <1z9DPapYokbDFikUWCbU6TYgL13IPHnxzUM0Qgxmz-A=.0ff1b927-a299-4dd1-96ec-64cdeeda6443@github.com> On Wed, 2 Apr 2025 10:03:30 GMT, Marc Chevalier wrote: >> As the ticket says: >>> Create IR framework test which checks that allocations are eliminated in the regression test included in [JDK-8347997](https://bugs.openjdk.org/browse/JDK-8347997) fix. >> >> So here it is! We can see that in case of inlining, indeed, no allocation happens. The second part is some sanity check to emphasize the difference: of course, there is an allocation without inlining. The benefit of this second part is arguable. From my point of view, it's mostly to point out the difference to a future reader. But yes, there is nothing very surprising. >> >> Thanks, >> Marc > > Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: > > Remove catch, rename, remove static @marc-chevalier Your change (at version efa712be6f504305ec562c83d2bf048100394fad) is now ready to be sponsored by a Committer. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/24328#issuecomment-2772543590 From thartmann at openjdk.org Wed Apr 2 13:30:59 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 2 Apr 2025 13:30:59 GMT Subject: RFR: 8348887: Create IR framework test for JDK-8347997 [v2] In-Reply-To: References: Message-ID: On Wed, 2 Apr 2025 11:25:53 GMT, Marc Chevalier wrote: >> test/hotspot/jtreg/compiler/c2/irTests/TestContinuationPinningAndEA.java line 41: >> >>> 39: public class TestContinuationPinningAndEA { >>> 40: public static void main(String[] args) { >>> 41: TestFramework.runWithFlags("--add-modules", "java.base", "--add-exports", "java.base/jdk.internal.vm=ALL-UNNAMED"); >> >> Why is this only needed now? > > I think it was already not fine. I had the all the `catch` because of the `throws` in the base example, defensively, by default (just assuming that these `throws Throwable` were there for a good reason). It hid the fact that the loading of `Continuation` failed (it's not exported). Nevertheless, I got enough for the IR check to work. I might be wrong on the reason, but my understanding, is that since it's intrinsiced, compilation can manage to produce IR and enough printing to make the check work. And I think it was really working, not just ignored: I got a test failure on the non-inlined cases before I add the `DontInline` annotation, if I remember well. But at runtime, `Continuation` class' access is checked (when loading it?), and then it throws to tell me I'm not allowed to access it. Okay, thanks for the clarification! 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24328#discussion_r2024830323 From mchevalier at openjdk.org Wed Apr 2 13:31:01 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Wed, 2 Apr 2025 13:31:01 GMT Subject: Integrated: 8348887: Create IR framework test for JDK-8347997 In-Reply-To: References: Message-ID: On Mon, 31 Mar 2025 13:41:17 GMT, Marc Chevalier wrote: > As the ticket says: >> Create IR framework test which checks that allocations are eliminated in the regression test included in [JDK-8347997](https://bugs.openjdk.org/browse/JDK-8347997) fix. > > So here it is! We can see that in case of inlining, indeed, no allocation happens. The second part is some sanity check to emphasize the difference: of course, there is an allocation without inlining. The benefit of this second part is arguable. From my point of view, it's mostly to point out the difference to a future reader. But yes, there is nothing very surprising. > > Thanks, > Marc This pull request has now been integrated. Changeset: 8608b163 Author: Marc Chevalier Committer: Tobias Hartmann URL: https://git.openjdk.org/jdk/commit/8608b16341ba2807c6a32f7539d10d7458c40b05 Stats: 124 lines in 1 file changed: 124 ins; 0 del; 0 mod 8348887: Create IR framework test for JDK-8347997 Reviewed-by: thartmann, chagedorn ------------- PR: https://git.openjdk.org/jdk/pull/24328 From jbhateja at openjdk.org Wed Apr 2 13:51:07 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 2 Apr 2025 13:51:07 GMT Subject: RFR: 8348638: Performance regression in Math.tanh [v2] In-Reply-To: References: Message-ID: On Fri, 28 Mar 2025 00:18:41 GMT, Mohamed Issa wrote: >> The changes described below are meant to resolve the performance regression introduced by the **x86_64 tanh** double precision floating point scalar intrinsic in #20657. >> >> 1. Check and handle high magnitude input values before those in other ranges. 
If found, **+/- 1** is returned almost immediately without having to go through too many computations or branches.
>> 2. Reduce the lower bound of the input range that triggers a quick **+/- 1** return from **|x| >= 32** to **|x| >= 22**. This new endpoint is the exact value required for correctness that's used by the original OpenJDK implementation.
>>
>> The results of all tests posted below were captured with an [Intel® Xeon 6761P](https://www.intel.com/content/www/us/en/products/sku/241842/intel-xeon-6761p-processor-336m-cache-2-50-ghz/specifications.html) using [OpenJDK v25-b15](https://github.com/openjdk/jdk/releases/tag/jdk-25%2B15) as the baseline version.
>>
>> For performance data collected with the regression micro-benchmark referenced in the bug report, see the table below. Each result is the mean of 3 individual runs. In the high value scenarios (100, 1000, 10000, 100000), the changes significantly improve execution times to the point where they are almost at parity with the baseline. Also, there is almost no impact to the low value (1, 2) scenarios.
>>
>> | Input range (+/-) | Baseline (ms) | No fix (ms) | With fix (ms) | No fix vs baseline (%) | Fix vs baseline (%) |
>> | :------------------: | :-------------: | :-----------: | :-------------: | :----------------------: | :-------------------: |
>> | 1 | 1846 | 1925 | 1972 | +4.28 | +6.83 |
>> | 2 | 2099 | 1991 | 2016 | -5.15 | -3.95 |
>> | 100 | 803 | 1007 | 742 | +25.40 | -7.60 |
>> | 1000 | 497 | 635 | 514 | +27.77 | +3.42 |
>> | 10000 | 474 | 572 | 477 | +20.68 | +0.63 |
>> | 100000 | 473 | 567 | 474 | +19.87 | +0.21 |
>>
>> For perfo...
>
> Mohamed Issa has updated the pull request incrementally with one additional commit since the last revision:
>
>   Change tanh intrinsic endpoint comparison value to match reference OpenJDK implementation

Please add a micro benchmark for different value ranges

-------------

PR Comment: https://git.openjdk.org/jdk/pull/23889#issuecomment-2772623273

From jkarthikeyan at openjdk.org Wed Apr 2 14:04:30 2025
From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan)
Date: Wed, 2 Apr 2025 14:04:30 GMT
Subject: RFR: 8342095: Add autovectorizer support for subword vector casts [v7]
In-Reply-To:
References:
Message-ID: <604ss1R67reWyL2d_GggUXb9m0xYiR-zefrBwan9Zjs=.46d46302-8c03-4228-b8bc-c428cb22c7e8@github.com>

> Hi all,
> This patch adds initial support for the autovectorizer to generate conversions between subword types. Currently, when superword sees two packs that have different basic types, it discards them and bails out of vectorization. This patch changes the behavior to ask the backend if a cast between the conflicting types is supported, and keeps the pack if it is. Later, when the `VTransform` graph is built, a synthetic cast is emitted when packs requiring casts are detected. Currently, only narrowing casts are supported as I wanted to re-use existing `VectorCastX2Y` logic for the initial version, but adding more conversions is simple and can be done with a subsequent RFE. I have attached a JMH benchmark and got these results on my Zen 3 machine:
>
>
>                                               Baseline                 Patch
> Benchmark                  (SIZE)  Mode  Cnt    Score    Error  Units   Score   Error  Units  Improvement
> VectorSubword.intToByte      1024  avgt   12  200.049 ± 19.787  ns/op  56.228 ± 3.535  ns/op  (3.56x)
> VectorSubword.intToShort     1024  avgt   12  179.826 ±  1.539  ns/op  43.332 ± 1.166  ns/op  (4.15x)
> VectorSubword.shortToByte    1024  avgt   12  245.580 ±  6.150  ns/op  29.757 ± 1.055  ns/op  (8.25x)
>
>
> I've also added some IR tests and they pass on my linux x64 machine. Thoughts and reviews would be appreciated!
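For readers unfamiliar with the loop shape in question, the following is a minimal sketch of an element-wise narrowing conversion of the kind the description refers to. The class and method names are assumptions chosen to echo the `VectorSubword.intToByte` benchmark name; this is not the benchmark source, and whether C2 actually vectorizes the loop depends on the build and the CPU.

```java
import java.util.Arrays;

public class NarrowingCastSketch {
    /** Element-wise int -> byte narrowing: each element keeps only its low 8 bits. */
    static void intToByte(int[] src, byte[] dst) {
        for (int i = 0; i < src.length; i++) {
            dst[i] = (byte) src[i];
        }
    }

    public static void main(String[] args) {
        int[] src = {0, 127, 128, 255, 256, -1};
        byte[] dst = new byte[src.length];
        intToByte(src, dst);
        System.out.println(Arrays.toString(dst)); // [0, 127, -128, -1, 0, -1]
    }
}
```

The load side works on `int` lanes and the store side on `byte` lanes, which is the type mismatch between packs that the description says previously caused a bailout. Any vectorized version has to produce the same truncated values as this scalar loop.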
Jasmine Karthikeyan has updated the pull request incrementally with one additional commit since the last revision:

  Implement patch with VectorCastNode::implemented

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/23413/files
  - new: https://git.openjdk.org/jdk/pull/23413/files/b02408f7..482ddbc4

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=23413&range=06
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23413&range=05-06

  Stats: 48 lines in 9 files changed: 2 ins; 41 del; 5 mod
  Patch: https://git.openjdk.org/jdk/pull/23413.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/23413/head:pull/23413

PR: https://git.openjdk.org/jdk/pull/23413

From jkarthikeyan at openjdk.org Wed Apr 2 14:07:20 2025
From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan)
Date: Wed, 2 Apr 2025 14:07:20 GMT
Subject: RFR: 8342095: Add autovectorizer support for subword vector casts [v8]
In-Reply-To:
References:
Message-ID: <_gqaAP3z7OIe4Bhfjz_UuojuAdSmst13fEEVvU9H_cg=.b6b86bf4-0124-4aa7-bf02-33ad2a98a0e1@github.com>

> Hi all,
> This patch adds initial support for the autovectorizer to generate conversions between subword types. Currently, when superword sees two packs that have different basic types, it discards them and bails out of vectorization. This patch changes the behavior to ask the backend if a cast between the conflicting types is supported, and keeps the pack if it is. Later, when the `VTransform` graph is built, a synthetic cast is emitted when packs requiring casts are detected. Currently, only narrowing casts are supported as I wanted to re-use existing `VectorCastX2Y` logic for the initial version, but adding more conversions is simple and can be done with a subsequent RFE. I have attached a JMH benchmark and got these results on my Zen 3 machine:
>
>
>                                               Baseline                 Patch
> Benchmark                  (SIZE)  Mode  Cnt    Score    Error  Units   Score   Error  Units  Improvement
> VectorSubword.intToByte      1024  avgt   12  200.049 ± 19.787  ns/op  56.228 ± 3.535  ns/op  (3.56x)
> VectorSubword.intToShort     1024  avgt   12  179.826 ±  1.539  ns/op  43.332 ± 1.166  ns/op  (4.15x)
> VectorSubword.shortToByte    1024  avgt   12  245.580 ±  6.150  ns/op  29.757 ± 1.055  ns/op  (8.25x)
>
>
> I've also added some IR tests and they pass on my linux x64 machine. Thoughts and reviews would be appreciated!

Jasmine Karthikeyan has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains eight commits:

 - Merge
 - Implement patch with VectorCastNode::implemented
 - Merge branch 'master' into vectorize-subword
 - Address comments from review, refactor test
 - Add new conversions to benchmark
 - Fix some tests that now vectorize
 - Implement widening and address comments from review
 - Subword vectorization

-------------

Changes: https://git.openjdk.org/jdk/pull/23413/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23413&range=07
  Stats: 305 lines in 14 files changed: 261 ins; 7 del; 37 mod
  Patch: https://git.openjdk.org/jdk/pull/23413.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/23413/head:pull/23413

PR: https://git.openjdk.org/jdk/pull/23413

From jkarthikeyan at openjdk.org Wed Apr 2 14:12:43 2025
From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan)
Date: Wed, 2 Apr 2025 14:12:43 GMT
Subject: RFR: 8342095: Add autovectorizer support for subword vector casts [v9]
In-Reply-To:
References:
Message-ID: <_2r1kKb42b0BDzIXjG9ZrpdK3yC7LqPq7G1K1mDsPHg=.dcdbef7e-3c6f-413d-bfcb-6949b9a45555@github.com>

> Hi all,
> This patch adds initial support for the autovectorizer to generate conversions between subword types. Currently, when superword sees two packs that have different basic types, it discards them and bails out of vectorization. This patch changes the behavior to ask the backend if a cast between the conflicting types is supported, and keeps the pack if it is. Later, when the `VTransform` graph is built, a synthetic cast is emitted when packs requiring casts are detected.
Currently, only narrowing casts are supported as I wanted to re-use existing `VectorCastX2Y` logic for the initial version, but adding more conversions is simple and can be done with a subsequent RFE. I have attached a JMH benchmark and got these results on my Zen 3 machine:
>
>
>                                               Baseline                 Patch
> Benchmark                  (SIZE)  Mode  Cnt    Score    Error  Units   Score   Error  Units  Improvement
> VectorSubword.intToByte      1024  avgt   12  200.049 ± 19.787  ns/op  56.228 ± 3.535  ns/op  (3.56x)
> VectorSubword.intToShort     1024  avgt   12  179.826 ±  1.539  ns/op  43.332 ± 1.166  ns/op  (4.15x)
> VectorSubword.shortToByte    1024  avgt   12  245.580 ±  6.150  ns/op  29.757 ± 1.055  ns/op  (8.25x)
>
>
> I've also added some IR tests and they pass on my linux x64 machine. Thoughts and reviews would be appreciated!

Jasmine Karthikeyan has updated the pull request incrementally with one additional commit since the last revision:

  Fix copyright

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/23413/files
  - new: https://git.openjdk.org/jdk/pull/23413/files/996eaed0..fc7be77c

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=23413&range=08
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23413&range=07-08

  Stats: 5 lines in 5 files changed: 0 ins; 0 del; 5 mod
  Patch: https://git.openjdk.org/jdk/pull/23413.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/23413/head:pull/23413

PR: https://git.openjdk.org/jdk/pull/23413

From mchevalier at openjdk.org Wed Apr 2 14:15:00 2025
From: mchevalier at openjdk.org (Marc Chevalier)
Date: Wed, 2 Apr 2025 14:15:00 GMT
Subject: RFR: 8348853: Fold layout helper check for objects implementing non-array interfaces [v2]
In-Reply-To:
References:
Message-ID:

On Mon, 31 Mar 2025 11:46:57 GMT, Roland Westrelin wrote:

>> Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   not reinventing the wheel
>
> src/hotspot/share/opto/memnode.cpp line 2214:
>
>> 2212: if (tkls->offset() ==
in_bytes(Klass::layout_helper_offset()) &&
>> 2213: tkls->isa_instklassptr() && // not directly typed as an array
>> 2214: !tkls->is_instklassptr()->might_be_an_array() // not the supertype of all T[] (java.lang.Object) or has an interface that is not Serializable or Cloneable
>
> Could we do the same by using `TypeKlassPtr::maybe_java_subtype_of(TypeAryKlassPtr::BOTTOM)` and define a `TypeAryKlassPtr::BOTTOM` to be a static field for the `array_interfaces`?
>
> AFAICT, `TypeKlassPtr::maybe_java_subtype_of()` already covers that case so it would avoid some logic duplication. Also in the test above, maybe you could simplify the test a little bit by removing `tkls->isa_instklassptr()`?

I think it should be

    TypeAryKlassPtr::BOTTOM->maybe_java_subtype_of(tkls)

rather than

    tkls->maybe_java_subtype_of(TypeAryKlassPtr::BOTTOM)

My reasoning: if `TypeAryKlassPtr::BOTTOM` is `java.lang.Object + Cloneable + Serializable`, any array is a subtype of that. But so is any class implementing these interfaces, as well as any `Object` implementing more interfaces. But for these last two cases, we know they cannot be arrays, which is what we want to know: are we sure it's not an array, or could it be an array?

But if we check if `tkls` is a supertype of `java.lang.Object + Cloneable + Serializable`, then it has to be an `Object` (the most general class) and it implements a subset of `Cloneable` and `Serializable`. In this case, it can be an array. If `tkls` is not a super-type of `java.lang.Object + Cloneable + Serializable`, there are 2 cases:
- either it is an array type directly (so, I think, in one way or another, we need to check for `is_instklassptr`), and so a fortiori it can be an array type.
- or it's an instance type and then cannot be an array since there is nothing between array types and `java.lang.Object + Cloneable + Serializable`. I.e. there is no type `T` that is not an array type, that is a super-type of at least one array type and that is not a super-type of `java.lang.Object + Cloneable + Serializable` (that is, a type that is not `java.lang.Object` or that implements at least one other interface).

In other words, our question is

    \exists T: T is an array type /\ T <= tkls

(where `A <= B` means `A is a subtype of B`) which is equivalent to

    tkls >= (java.lang.Object + Cloneable + Serializable) \/ (tkls <= (java.lang.Object + Cloneable + Serializable) /\ tkls is an array type)

We can spare the call to `is_instklassptr` by using a virtual method instead or probably other mechanisms, that's an implementation detail. But I think we need to distinguish cases: both `int[]` and `MyClass + Cloneable + Serializable + MyInterface` are sub-types of `java.lang.Object + Cloneable + Serializable` but for one, we can conclude it's definitely an array, and for the other, it's definitely not. Without distinguishing cases, the only sound approximation would be to say that everything can be an array (both sub and super types of `java.lang.Object + Cloneable + Serializable`).

Does that make sense? Did I get something wrong? Is the `BOTTOM` not what you had in mind?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/24245#discussion_r2024918440

From jkarthikeyan at openjdk.org Wed Apr 2 14:15:19 2025
From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan)
Date: Wed, 2 Apr 2025 14:15:19 GMT
Subject: RFR: 8342095: Add autovectorizer support for subword vector casts [v10]
In-Reply-To:
References:
Message-ID: <2T_qgLVG05hbfRLOkrEGthWnoxXpvUGf0T8haKyKiCE=.fa4c75c5-764c-4829-9fcd-bfe12fa4d994@github.com>

> Hi all,
> This patch adds initial support for the autovectorizer to generate conversions between subword types. Currently, when superword sees two packs that have different basic types, it discards them and bails out of vectorization.
This patch changes the behavior to ask the backend if a cast between the conflicting types is supported, and keeps the pack if it is. Later, when the `VTransform` graph is built, a synthetic cast is emitted when packs requiring casts are detected. Currently, only narrowing casts are supported as I wanted to re-use existing `VectorCastX2Y` logic for the initial version, but adding more conversions is simple and can be done with a subsequent RFE. I have attached a JMH benchmark and got these results on my Zen 3 machine:
>
>                                                     Baseline                     Patch
> Benchmark                  (SIZE)  Mode  Cnt    Score    Error  Units     Score   Error  Units  Improvement
> VectorSubword.intToByte      1024  avgt   12  200.049 ± 19.787  ns/op    56.228 ± 3.535  ns/op  (3.56x)
> VectorSubword.intToShort     1024  avgt   12  179.826 ±  1.539  ns/op    43.332 ± 1.166  ns/op  (4.15x)
> VectorSubword.shortToByte    1024  avgt   12  245.580 ±  6.150  ns/op    29.757 ± 1.055  ns/op  (8.25x)
>
> I've also added some IR tests and they pass on my linux x64 machine. Thoughts and reviews would be appreciated!
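For readers following the thread: the loops targeted by the PR description above are presumably simple narrowing copies between arrays of different subword types. The sketch below is an assumption based only on the benchmark names (`intToByte`, `shortToByte`); the class and method names are hypothetical and this is not the JMH benchmark attached to the PR.

```java
import java.util.Arrays;

public class SubwordCastSketch {
    // Narrowing int -> byte: each element is truncated to its low 8 bits.
    static void intToByte(int[] src, byte[] dst) {
        for (int i = 0; i < src.length; i++) {
            dst[i] = (byte) src[i];
        }
    }

    // Narrowing short -> byte: same pattern with a smaller source type.
    static void shortToByte(short[] src, byte[] dst) {
        for (int i = 0; i < src.length; i++) {
            dst[i] = (byte) src[i];
        }
    }

    public static void main(String[] args) {
        int[] ints = {1, 255, 256, -1};
        byte[] out = new byte[ints.length];
        intToByte(ints, out);
        System.out.println(Arrays.toString(out)); // [1, -1, 0, -1]
    }
}
```

Whether a given loop of this shape is actually vectorized depends on the platform and on the backend reporting the cast as supported, as the description says.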
Jasmine Karthikeyan has updated the pull request incrementally with one additional commit since the last revision: Fix copyright after merge ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23413/files - new: https://git.openjdk.org/jdk/pull/23413/files/fc7be77c..36f598a6 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23413&range=09 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23413&range=08-09 Stats: 2 lines in 2 files changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/23413.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23413/head:pull/23413 PR: https://git.openjdk.org/jdk/pull/23413 From jkarthikeyan at openjdk.org Wed Apr 2 14:23:07 2025 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Wed, 2 Apr 2025 14:23:07 GMT Subject: RFR: 8342095: Add autovectorizer support for subword vector casts [v6] In-Reply-To: References: Message-ID: On Fri, 21 Feb 2025 06:41:15 GMT, Emanuel Peter wrote: >> Jasmine Karthikeyan has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains six additional commits since the last revision: >> >> - Merge branch 'master' into vectorize-subword >> - Address comments from review, refactor test >> - Add new conversions to benchmark >> - Fix some tests that now vectorize >> - Implement widening and address comments from review >> - Subword vectorization > > src/hotspot/share/opto/superwordVTransformBuilder.cpp line 194: > >> 192: >> 193: // If the use and def types are different, emit a cast node >> 194: if (use_bt != def_bt && !p0->is_Convert() && Matcher::is_vector_cast_supported(def_bt, use_bt)) { > > The usual way we check if a vector instruction is implemented is to use `VectorNode::implemented`. Ah, actually there is a `VectorCastNode::implemented`. Why are you not using that one? This is a good point! 
I've updated the patch to use `VectorCastNode::implemented` instead. I think I didn't see that function originally. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23413#discussion_r2024935268 From jkarthikeyan at openjdk.org Wed Apr 2 14:28:14 2025 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Wed, 2 Apr 2025 14:28:14 GMT Subject: RFR: 8342095: Add autovectorizer support for subword vector casts [v3] In-Reply-To: References: Message-ID: On Fri, 21 Mar 2025 09:43:14 GMT, Emanuel Peter wrote: >> @eme64 I think it should be good for another look over! I've addressed your review comments in the last commit. >> >> About the potential for performance degradation, I think it would be unlikely since the code generated by the cast is quite small (as it only needs to truncate or sign-extend) and the patch increases the amount of possible code that can auto-vectorize. The one case that I can think of is that it might cause code that would be otherwise unprofitable to become vectorizable, but that would be because we don't have a cost model yet. > > @jaskarth Let me know if there is anything we can help you with here :) @eme64 Apologies for the delay! I've updated the patch to use `VectorCastNode::implemented` as suggested instead of manually implementing the logic, which simplifies the patch and provides implementations on other platforms, which I left out initially as I wasn't familiar with them. Let me know what you think! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/23413#issuecomment-2772738167 From jkarthikeyan at openjdk.org Wed Apr 2 14:33:00 2025 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Wed, 2 Apr 2025 14:33:00 GMT Subject: RFR: 8349563: Improve AbsNode::Value() for integer types In-Reply-To: References: Message-ID: On Thu, 27 Feb 2025 02:38:09 GMT, Dean Long wrote: >> Hi all, >> This is a small patch that improves the implementation of Value() for `AbsINode` and `AbsLNode` by returning the absolute value of the input range. Most of the logic is trivial except for the special case where `_lo == jint_min/jlong_min` which must return the entire type range when encountered, for which I've added a small proof in the comments. I've also added some unit tests and updated the file to limit IR check platforms with more granularity. >> >> Thoughts and reviews would be appreciated! > > src/hotspot/share/opto/subnode.cpp line 1938: > >> 1936: >> 1937: NativeType lo_abs = uabs(t->_lo); >> 1938: NativeType hi_abs = uabs(t->_hi); > > Converting unsigned to signed is C++ Undefined Behavior, is it not? This is a great point, I believe you're correct that it's UB. We currently do the same logic in the old code as well: https://github.com/openjdk/jdk/blob/a0677d94d8c83a75cee054700e098faa97edca3c/src/hotspot/share/opto/subnode.cpp#L1945-L1947 However, I'm unsure what the best way to solve this would be. Do you happen to have any ideas? Thanks a lot! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23685#discussion_r2024955772 From jkarthikeyan at openjdk.org Wed Apr 2 14:48:20 2025 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Wed, 2 Apr 2025 14:48:20 GMT Subject: RFR: 8349563: Improve AbsNode::Value() for integer types [v2] In-Reply-To: References: Message-ID: > Hi all, > This is a small patch that improves the implementation of Value() for `AbsINode` and `AbsLNode` by returning the absolute value of the input range. 
Most of the logic is trivial except for the special case where `_lo == jint_min/jlong_min` which must return the entire type range when encountered, for which I've added a small proof in the comments. I've also added some unit tests and updated the file to limit IR check platforms with more granularity. > > Thoughts and reviews would be appreciated! Jasmine Karthikeyan has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains two commits: - Merge - Improve AbsNode::Value ------------- Changes: https://git.openjdk.org/jdk/pull/23685/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=23685&range=01 Stats: 145 lines in 2 files changed: 136 ins; 0 del; 9 mod Patch: https://git.openjdk.org/jdk/pull/23685.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23685/head:pull/23685 PR: https://git.openjdk.org/jdk/pull/23685 From jkarthikeyan at openjdk.org Wed Apr 2 14:48:20 2025 From: jkarthikeyan at openjdk.org (Jasmine Karthikeyan) Date: Wed, 2 Apr 2025 14:48:20 GMT Subject: RFR: 8349563: Improve AbsNode::Value() for integer types [v2] In-Reply-To: References: Message-ID: On Sun, 23 Feb 2025 09:27:22 GMT, Tobias Hotz wrote: >> Jasmine Karthikeyan has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains two commits: >> >> - Merge >> - Improve AbsNode::Value > > test/hotspot/jtreg/compiler/c2/irTests/TestIRAbs.java line 295: > >> 293: public boolean testIntRange3(int in) { >> 294: // [-9, -2] => [2, 9] >> 295: return Math.abs(-((in & 7) + 2)) < 2; > > Not sure if this is in scope for this PR, but `abs(x)` should be idealized into `0 - x` if x <= 0. This seems to be missing at the moment. This is a good observation! I'll do this in a followup patch, to keep this one focused on just the Value() function. 
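As an illustration of the range reasoning described in the PR summary quoted above, here is a minimal Java sketch of computing bounds for `Math.abs` over an `int` range. It is not the HotSpot C++ code: the name `absRange` and the exact case split are assumptions; only the special handling of `jint_min` and the `[-9, -2] => [2, 9]` example come from the thread.

```java
public class AbsRangeSketch {
    /** Returns {lo, hi} bounding Math.abs(x) for every x in [lo, hi]. */
    static int[] absRange(int lo, int hi) {
        if (lo == Integer.MIN_VALUE) {
            // Math.abs(Integer.MIN_VALUE) is still Integer.MIN_VALUE, so the
            // result may be negative; fall back to the full int range.
            return new int[] {Integer.MIN_VALUE, Integer.MAX_VALUE};
        }
        if (lo >= 0) {
            return new int[] {lo, hi};       // already non-negative
        }
        if (hi <= 0) {
            return new int[] {-hi, -lo};     // entirely non-positive: mirror
        }
        // Range straddles zero: minimum is 0, maximum is the larger magnitude.
        return new int[] {0, Math.max(-lo, hi)};
    }

    public static void main(String[] args) {
        int[] r = absRange(-9, -2);
        System.out.println(r[0] + ".." + r[1]); // 2..9
    }
}
```

Negating `lo` is safe in the later branches because the `Integer.MIN_VALUE` case has already been handled.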
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23685#discussion_r2024979524 From mchevalier at openjdk.org Wed Apr 2 14:49:17 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Wed, 2 Apr 2025 14:49:17 GMT Subject: RFR: 8346989: C2: deoptimization and re-compilation cycle with Math.*Exact in case of frequent overflow [v4] In-Reply-To: References: <8ACplaVM_gN9cbIcQYGJmR4GNINm70PAJQ8uAgucK4Y=.14fdc7e2-e0af-4f0d-acb6-bcfe99ee8f36@github.com> Message-ID: <1RzVI3uVrE2YscRJPUC3KeGoF5pshACXrfZX9fooPAk=.cbcc9de2-c5d4-4f8a-82f3-444f7ee7ae0a@github.com> On Mon, 31 Mar 2025 08:33:42 GMT, Marc Chevalier wrote: >> `Math.*Exact` intrinsics can cause many deopt when used repeatedly with problematic arguments. >> This fix proposes not to rely on intrinsics after `too_many_traps()` has been reached. >> >> Benchmark show that this issue affects every Math.*Exact functions. And this fix improve them all. >> >> tl;dr: >> - C1: no problem, no change >> - C2: >> - with intrinsics: >> - with overflow: clear improvement. Was way worse than C1, now is similar (~4s => ~600ms) >> - without overflow: no problem, no change >> - without intrinsics: no problem, no change >> >> Before the fix: >> >> Benchmark (SIZE) Mode Cnt Score Error Units >> MathExact.C1_1.loopAddIInBounds 1000000 avgt 3 1.272 ? 0.048 ms/op >> MathExact.C1_1.loopAddIOverflow 1000000 avgt 3 641.917 ? 58.238 ms/op >> MathExact.C1_1.loopAddLInBounds 1000000 avgt 3 1.402 ? 0.842 ms/op >> MathExact.C1_1.loopAddLOverflow 1000000 avgt 3 671.013 ? 229.425 ms/op >> MathExact.C1_1.loopDecrementIInBounds 1000000 avgt 3 3.722 ? 22.244 ms/op >> MathExact.C1_1.loopDecrementIOverflow 1000000 avgt 3 653.341 ? 279.003 ms/op >> MathExact.C1_1.loopDecrementLInBounds 1000000 avgt 3 2.525 ? 0.810 ms/op >> MathExact.C1_1.loopDecrementLOverflow 1000000 avgt 3 656.750 ? 141.792 ms/op >> MathExact.C1_1.loopIncrementIInBounds 1000000 avgt 3 4.621 ? 
12.822 ms/op >> MathExact.C1_1.loopIncrementIOverflow 1000000 avgt 3 651.608 ? 274.396 ms/op >> MathExact.C1_1.loopIncrementLInBounds 1000000 avgt 3 2.576 ? 3.316 ms/op >> MathExact.C1_1.loopIncrementLOverflow 1000000 avgt 3 662.216 ? 71.879 ms/op >> MathExact.C1_1.loopMultiplyIInBounds 1000000 avgt 3 1.402 ? 0.587 ms/op >> MathExact.C1_1.loopMultiplyIOverflow 1000000 avgt 3 615.836 ? 252.137 ms/op >> MathExact.C1_1.loopMultiplyLInBounds 1000000 avgt 3 2.906 ? 5.718 ms/op >> MathExact.C1_1.loopMultiplyLOverflow 1000000 avgt 3 655.576 ? 147.432 ms/op >> MathExact.C1_1.loopNegateIInBounds 1000000 avgt 3 2.023 ? 0.027 ms/op >> MathExact.C1_1.loopNegateIOverflow 1000000 avgt 3 639.136 ? 30.841 ms/op >> MathExact.C1_1.loop... > > Marc Chevalier has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains six commits: > > - Merge branch 'master' into fix/Deoptimization-and-re-compilation-cycle-with-C2-compiled-code > - guess_exception_from_deopt_reason out of builtin_throw > - Use builtin_throw > - Merge branch 'master' into fix/Deoptimization-and-re-compilation-cycle-with-C2-compiled-code > - More exhaustive bench > - Limit inlining of math Exact operations in case of too many deopts I've applied the suggested refactoring. It looks fine to me, tests seems happy, microbench shows similar profile. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/23916#issuecomment-2772794493 From mchevalier at openjdk.org Wed Apr 2 14:49:15 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Wed, 2 Apr 2025 14:49:15 GMT Subject: RFR: 8346989: C2: deoptimization and re-compilation cycle with Math.*Exact in case of frequent overflow [v5] In-Reply-To: <8ACplaVM_gN9cbIcQYGJmR4GNINm70PAJQ8uAgucK4Y=.14fdc7e2-e0af-4f0d-acb6-bcfe99ee8f36@github.com> References: <8ACplaVM_gN9cbIcQYGJmR4GNINm70PAJQ8uAgucK4Y=.14fdc7e2-e0af-4f0d-acb6-bcfe99ee8f36@github.com> Message-ID: > `Math.*Exact` intrinsics can cause many deopt when used repeatedly with problematic arguments. > This fix proposes not to rely on intrinsics after `too_many_traps()` has been reached. > > Benchmark show that this issue affects every Math.*Exact functions. And this fix improve them all. > > tl;dr: > - C1: no problem, no change > - C2: > - with intrinsics: > - with overflow: clear improvement. Was way worse than C1, now is similar (~4s => ~600ms) > - without overflow: no problem, no change > - without intrinsics: no problem, no change > > Before the fix: > > Benchmark (SIZE) Mode Cnt Score Error Units > MathExact.C1_1.loopAddIInBounds 1000000 avgt 3 1.272 ? 0.048 ms/op > MathExact.C1_1.loopAddIOverflow 1000000 avgt 3 641.917 ? 58.238 ms/op > MathExact.C1_1.loopAddLInBounds 1000000 avgt 3 1.402 ? 0.842 ms/op > MathExact.C1_1.loopAddLOverflow 1000000 avgt 3 671.013 ? 229.425 ms/op > MathExact.C1_1.loopDecrementIInBounds 1000000 avgt 3 3.722 ? 22.244 ms/op > MathExact.C1_1.loopDecrementIOverflow 1000000 avgt 3 653.341 ? 279.003 ms/op > MathExact.C1_1.loopDecrementLInBounds 1000000 avgt 3 2.525 ? 0.810 ms/op > MathExact.C1_1.loopDecrementLOverflow 1000000 avgt 3 656.750 ? 141.792 ms/op > MathExact.C1_1.loopIncrementIInBounds 1000000 avgt 3 4.621 ? 12.822 ms/op > MathExact.C1_1.loopIncrementIOverflow 1000000 avgt 3 651.608 ? 274.396 ms/op > MathExact.C1_1.loopIncrementLInBounds 1000000 avgt 3 2.576 ? 
3.316 ms/op > MathExact.C1_1.loopIncrementLOverflow 1000000 avgt 3 662.216 ? 71.879 ms/op > MathExact.C1_1.loopMultiplyIInBounds 1000000 avgt 3 1.402 ? 0.587 ms/op > MathExact.C1_1.loopMultiplyIOverflow 1000000 avgt 3 615.836 ? 252.137 ms/op > MathExact.C1_1.loopMultiplyLInBounds 1000000 avgt 3 2.906 ? 5.718 ms/op > MathExact.C1_1.loopMultiplyLOverflow 1000000 avgt 3 655.576 ? 147.432 ms/op > MathExact.C1_1.loopNegateIInBounds 1000000 avgt 3 2.023 ? 0.027 ms/op > MathExact.C1_1.loopNegateIOverflow 1000000 avgt 3 639.136 ? 30.841 ms/op > MathExact.C1_1.loopNegateLInBounds 1000000 avgt 3 2.422 ? 3.59... Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: Apply @iwanowww's refactoring ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23916/files - new: https://git.openjdk.org/jdk/pull/23916/files/80a67a55..34b3b75c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23916&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23916&range=03-04 Stats: 152 lines in 4 files changed: 71 ins; 57 del; 24 mod Patch: https://git.openjdk.org/jdk/pull/23916.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23916/head:pull/23916 PR: https://git.openjdk.org/jdk/pull/23916 From chagedorn at openjdk.org Wed Apr 2 15:02:13 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 2 Apr 2025 15:02:13 GMT Subject: RFR: 8352869: Verify.checkEQ: extension for NaN, VectorAPI and arbitrary Objects [v7] In-Reply-To: References: Message-ID: On Wed, 2 Apr 2025 12:17:31 GMT, Emanuel Peter wrote: >> We should extend the functionality of Verify.checkEQ: >> - Allow different NaN encodings to be seen as equal (by default). >> - Compare VectorAPI vectors. >> - Compare Exceptions, and their messages. >> - Compare arbitrary Objects via Reflection. 
>> >> Note: this is a prerequisite for the Template Library [JDK-8352861](https://bugs.openjdk.org/browse/JDK-8352861) / https://github.com/openjdk/jdk/pull/23418. > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > refactor with checkEQWithRawBits Thanks for the update! It's much easier to use and understand now I think. I did a complete pass and left a lot of comments but mostly minor things. Overall, I think this looks great! :-) test/hotspot/jtreg/compiler/lib/verify/Verify.java line 32: > 30: import java.lang.reflect.InvocationTargetException; > 31: import java.util.HashMap; > 32: import java.util.ArrayList; Seems unused and can be removed. Suggestion: test/hotspot/jtreg/compiler/lib/verify/Verify.java line 60: > 58: private final boolean isFloatCheckWithRawBits; > 59: private final HashMap a2b = new HashMap<>(); > 60: private final HashMap b2a = new HashMap<>(); Can you add a comment here what `a2b` and `b2a` means? See also some other comment further down about `a2b/b2a`, maybe you can share some docs or cross reference. test/hotspot/jtreg/compiler/lib/verify/Verify.java line 67: > 65: > 66: /** > 67: * Verify the content of two Objects, possibly recursively. Maybe add: Suggestion: * Verify the contents of two Objects on a raw bit level, possibly recursively. test/hotspot/jtreg/compiler/lib/verify/Verify.java line 68: > 66: /** > 67: * Verify the content of two Objects, possibly recursively. > 68: * Different NaN encodins are considered non-qual, since we compare Suggestion: * Different NaN encodings are considered non-equal, since we compare test/hotspot/jtreg/compiler/lib/verify/Verify.java line 81: > 79: > 80: /** > 81: * Verify the content of two Objects, possibly recursively. Suggestion: * Verify the contents of two Objects, possibly recursively. 
test/hotspot/jtreg/compiler/lib/verify/Verify.java line 109: > 107: Class ca = a.getClass(); > 108: Class cb = b.getClass(); > 109: if (ca != cb) { Only seen this in my IDE: `ca` and `cb` should be `Class` instead of the raw `Class` since `getClass()` returns a `Class` (cannot make a suggestion since it's hidden here). test/hotspot/jtreg/compiler/lib/verify/Verify.java line 124: > 122: switch (a) { > 123: case Object[] x -> checkEQimpl(x, (Object[])b, field, aParent, bParent); > 124: case Byte x -> checkEQimpl(x, ((Byte)b).byteValue(), field, aParent, bParent); Can't you just pass `(Byte) b` to rely on auto unboxing instead? test/hotspot/jtreg/compiler/lib/verify/Verify.java line 143: > 141: case Exception x -> checkEQimpl(x, (Exception) b, field, aParent, bParent); > 142: default -> { > 143: if (ca.getName().startsWith("jdk.incubator.vector") && ca.getName().contains("Vector")) { Might be worth to extract this case to own methods and structure it like this to reduce the size of the method: if (vectorClass()) { checkEQForVectorAPIClass(); } else { checkEQdispatch(); } test/hotspot/jtreg/compiler/lib/verify/Verify.java line 160: > 158: } catch (InvocationTargetException e) { > 159: throw new RuntimeException("Could not invoke toArray on " + ca.getName(), e); > 160: } You can merge them: Suggestion: } catch (NoSuchMethodException | IllegalAccessException | InvocationTargetException e) { throw new RuntimeException("Could not invoke toArray on " + ca.getName(), e); } test/hotspot/jtreg/compiler/lib/verify/Verify.java line 187: > 185: private void checkEQimpl(char a, char b, String field, Object aParent, Object bParent) { > 186: if (a != b) { > 187: System.err.println("ERROR: Verify.checkEQ failed: value mismatch: " + (int)a + " vs " + (int)b); Why do you need an upcast here? Same for `short`. 
test/hotspot/jtreg/compiler/lib/verify/Verify.java line 233: > 231: * of an add or mul (NaN1 * NaN2 does not have same bits as NaN2 * NaN1, because the multiplication > 232: * of two NaN values should always return the first of the two). > 233: * Hence, by default, we pick the non-raw coparison: we verify that we have the same bit Suggestion: * Hence, by default, we pick the non-raw comparison: we verify that we have the same bit test/hotspot/jtreg/compiler/lib/verify/Verify.java line 236: > 234: * pattern in all cases, except for NaN we project to the canonical NaN, using Float.floatToIntBits. > 235: */ > 236: private boolean isFloatEQ(float a, float b) { Shouldn't this be named `isFloatNotEQ` since you return true when they are different? Same for `isDoubleEQ` below. Alternatively: Return true when they are equal (i.e. flip condition). test/hotspot/jtreg/compiler/lib/verify/Verify.java line 242: > 240: > 241: /** > 242: * See comments for "isFloatEQ". We don't have Javadocs for the private methods but it could still help when navigating in the IDE to directly jump the the method when clicking on it (suggested same below for other places): Suggestion: * See comments for {@link #isFloatEQ}. test/hotspot/jtreg/compiler/lib/verify/Verify.java line 250: > 248: > 249: /** > 250: * Check that two floats are equal according to "isFloatEQ". Suggestion: * Check that two floats are equal according to {@link #isFloatEQ}. test/hotspot/jtreg/compiler/lib/verify/Verify.java line 254: > 252: private void checkEQimpl(float a, float b, String field, Object aParent, Object bParent) { > 253: if (isFloatEQ(a, b)) { > 254: System.err.println("ERROR: Verify.checkEQ failed: value mismatch. check raw: " + isFloatCheckWithRawBits); Just noticed this now (there are other places as well): Since we now have `Verify.checkEQ()` and `Verify.checkEQWithRawBits()`, it would improve the readability if we reported which method was used. 
It could be done with something like that (pseudo code): System.err.println("ERROR: Verify.checkEQ" + withRawBitsString() + " failed: value mismatch. String withRawBitsString() { return isFloatCheckWithRawBits ? "WithRawBits" : ""; } test/hotspot/jtreg/compiler/lib/verify/Verify.java line 256: > 254: System.err.println("ERROR: Verify.checkEQ failed: value mismatch. check raw: " + isFloatCheckWithRawBits); > 255: System.err.println(" Values: " + a + " vs " + b); > 256: System.err.println(" Raw: " + Float.floatToRawIntBits(a) + " vs " + Float.floatToRawIntBits(b)); Do we always want to dump the raw bits even when `isFloatCheckWithRawBits` is false? I guess it does not hurt. test/hotspot/jtreg/compiler/lib/verify/Verify.java line 263: > 261: > 262: /** > 263: * Check that two doubles are equal according to "isDoubleEQ". Suggestion: * Check that two doubles are equal according to {@link #isDoubleEQ}. test/hotspot/jtreg/compiler/lib/verify/Verify.java line 287: > 285: > 286: /** > 287: * Verify that the content of two MemorySegments is identical. Note: we do not check the Suggestion: * Verify that the contents of two MemorySegments are identical. Note: we do not check the test/hotspot/jtreg/compiler/lib/verify/Verify.java line 316: > 314: * Verify that the content of two MemorySegments is identical. Note: we do not check the > 315: * backing type, only the size and content. > 316: */ Probably a copy-paste error. Should be updated for exceptions. test/hotspot/jtreg/compiler/lib/verify/Verify.java line 333: > 331: > 332: /** > 333: * Verify that the content of two byte arrays is identical. Suggestion: * Verify that the contents of two byte arrays are identical. test/hotspot/jtreg/compiler/lib/verify/Verify.java line 340: > 338: > 339: /** > 340: * Verify that the content of two char arrays is identical. Suggestion: * Verify that the contents of two char arrays are identical. 
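The difference between the raw-bit and the canonical-NaN comparison discussed in the comments on `isFloatEQ` above can be sketched with the two standard `Float` conversion methods. The helper names below are hypothetical and are not taken from `Verify.java`; note that they return `true` on equality.

```java
public class NaNCompareSketch {
    // Canonical comparison: floatToIntBits collapses every NaN to 0x7fc00000.
    static boolean equalCanonical(float a, float b) {
        return Float.floatToIntBits(a) == Float.floatToIntBits(b);
    }

    // Raw comparison: floatToRawIntBits keeps the NaN payload bits.
    static boolean equalRaw(float a, float b) {
        return Float.floatToRawIntBits(a) == Float.floatToRawIntBits(b);
    }

    public static void main(String[] args) {
        float nanA = Float.intBitsToFloat(0x7fc00000);
        float nanB = Float.intBitsToFloat(0x7fc00123); // NaN with a different payload
        System.out.println(equalCanonical(nanA, nanB)); // true
        // Usually false, but the Float.intBitsToFloat documentation allows
        // some platforms to alter NaN bit patterns, so this is not guaranteed.
        System.out.println(equalRaw(nanA, nanB));
        System.out.println(equalCanonical(0.0f, -0.0f)); // false: sign bit differs
    }
}
```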
test/hotspot/jtreg/compiler/lib/verify/Verify.java line 347: > 345: > 346: /** > 347: * Verify that the content of two short arrays is identical. Suggestion: * Verify that the contents of two short arrays are identical. test/hotspot/jtreg/compiler/lib/verify/Verify.java line 354: > 352: > 353: /** > 354: * Verify that the content of two int arrays is identical. Suggestion: * Verify that the contents of two int arrays are identical. test/hotspot/jtreg/compiler/lib/verify/Verify.java line 361: > 359: > 360: /** > 361: * Verify that the content of two long arrays is identical. Suggestion: * Verify that the contents of two long arrays are identical. test/hotspot/jtreg/compiler/lib/verify/Verify.java line 368: > 366: > 367: /** > 368: * Check that two float arrays are equal according to "isFloatEQ". Suggestion: * Check that two float arrays are equal according to {@link #isFloatEQ}. test/hotspot/jtreg/compiler/lib/verify/Verify.java line 387: > 385: > 386: /** > 387: * Check that two double arrays are equal according to "isDoubleEQ". Suggestion: * Check that two double arrays are equal according to {@link #isDoubleEQ}. test/hotspot/jtreg/compiler/lib/verify/Verify.java line 406: > 404: > 405: /** > 406: * Verify that the content of two boolean arrays is identical. Suggestion: * Verify that the contents of two boolean arrays are identical. 
test/hotspot/jtreg/compiler/lib/verify/Verify.java line 425: > 423: > 424: /** > 425: * Verify that the content of two Object arrays is identical, recursively: Suggestion: * Verify that the contents of two Object arrays are identical, recursively: test/hotspot/jtreg/compiler/lib/verify/Verify.java line 443: > 441: > 442: private void checkEQArbitraryClasses(Object a, Object b) { > 443: Class c = a.getClass(); Suggestion: Class c = a.getClass(); test/hotspot/jtreg/compiler/lib/verify/Verify.java line 447: > 445: for (Field field : c.getDeclaredFields()) { > 446: Object va = null; > 447: Object vb = null; `null` can be omitted: Suggestion: Object va; Object vb; test/hotspot/jtreg/compiler/lib/verify/Verify.java line 463: > 461: private void print(Object a, Object b, String field, Object aParent, Object bParent) { > 462: System.err.println(" aParent: " + aParent); > 463: System.err.println(" bParent: " + bParent); Should we print `null` parents or just skip them? test/hotspot/jtreg/compiler/lib/verify/Verify.java line 481: > 479: case Long x -> { return false; } > 480: case Float x -> { return false; } > 481: case Double x -> { return false; } I think the convention is to us `_` when they are ignored. You can then also merge them: Suggestion: case Boolean _, Byte _, Short _, Character _, Integer _, Long _, Float _, Double _ -> { return false; } test/hotspot/jtreg/compiler/lib/verify/Verify.java line 488: > 486: Object aPrevious = b2a.get(b); > 487: if (aPrevious == null && bPrevious == null) { > 488: // Record for next time. Can you explain, maybe as comment at `checkAlreadyVisited()`, why we want to have these caches? test/hotspot/jtreg/compiler/lib/verify/Verify.java line 520: > 518: long start = Long.max(offset - range, 0); > 519: long end = Long.min(offset + range, a.byteSize()); > 520: for (long i = start; i < end; i++) { Nit below: You can replace `System.err.println("")` with `System.err.println()`. 
test/hotspot/jtreg/testlibrary_tests/verify/examples/TestWithVectorAPI.java line 2: > 1: /* > 2: * Copyright (c) 2024, Oracle and/or its affiliates. All rights reserved. Suggestion: * Copyright (c) 2025, Oracle and/or its affiliates. All rights reserved. ------------- PR Review: https://git.openjdk.org/jdk/pull/24224#pullrequestreview-2736390646 PR Review Comment: https://git.openjdk.org/jdk/pull/24224#discussion_r2024896216 PR Review Comment: https://git.openjdk.org/jdk/pull/24224#discussion_r2024897698 PR Review Comment: https://git.openjdk.org/jdk/pull/24224#discussion_r2024901854 PR Review Comment: https://git.openjdk.org/jdk/pull/24224#discussion_r2024898499 PR Review Comment: https://git.openjdk.org/jdk/pull/24224#discussion_r2024906090 PR Review Comment: https://git.openjdk.org/jdk/pull/24224#discussion_r2024917798 PR Review Comment: https://git.openjdk.org/jdk/pull/24224#discussion_r2024911572 PR Review Comment: https://git.openjdk.org/jdk/pull/24224#discussion_r2024921119 PR Review Comment: https://git.openjdk.org/jdk/pull/24224#discussion_r2024921953 PR Review Comment: https://git.openjdk.org/jdk/pull/24224#discussion_r2024928403 PR Review Comment: https://git.openjdk.org/jdk/pull/24224#discussion_r2024933737 PR Review Comment: https://git.openjdk.org/jdk/pull/24224#discussion_r2024970368 PR Review Comment: https://git.openjdk.org/jdk/pull/24224#discussion_r2024938566 PR Review Comment: https://git.openjdk.org/jdk/pull/24224#discussion_r2024939173 PR Review Comment: https://git.openjdk.org/jdk/pull/24224#discussion_r2024948324 PR Review Comment: https://git.openjdk.org/jdk/pull/24224#discussion_r2024953601 PR Review Comment: https://git.openjdk.org/jdk/pull/24224#discussion_r2024939637 PR Review Comment: https://git.openjdk.org/jdk/pull/24224#discussion_r2024954430 PR Review Comment: https://git.openjdk.org/jdk/pull/24224#discussion_r2024960438 PR Review Comment: https://git.openjdk.org/jdk/pull/24224#discussion_r2024961195 PR Review Comment: 
https://git.openjdk.org/jdk/pull/24224#discussion_r2024961541 PR Review Comment: https://git.openjdk.org/jdk/pull/24224#discussion_r2024961800 PR Review Comment: https://git.openjdk.org/jdk/pull/24224#discussion_r2024962110 PR Review Comment: https://git.openjdk.org/jdk/pull/24224#discussion_r2024962373 PR Review Comment: https://git.openjdk.org/jdk/pull/24224#discussion_r2024964374 PR Review Comment: https://git.openjdk.org/jdk/pull/24224#discussion_r2024964654 PR Review Comment: https://git.openjdk.org/jdk/pull/24224#discussion_r2024971548 PR Review Comment: https://git.openjdk.org/jdk/pull/24224#discussion_r2024984751 PR Review Comment: https://git.openjdk.org/jdk/pull/24224#discussion_r2024985775 PR Review Comment: https://git.openjdk.org/jdk/pull/24224#discussion_r2024988001 PR Review Comment: https://git.openjdk.org/jdk/pull/24224#discussion_r2024991410 PR Review Comment: https://git.openjdk.org/jdk/pull/24224#discussion_r2024995992 PR Review Comment: https://git.openjdk.org/jdk/pull/24224#discussion_r2024999554 PR Review Comment: https://git.openjdk.org/jdk/pull/24224#discussion_r2025002053 PR Review Comment: https://git.openjdk.org/jdk/pull/24224#discussion_r2025002585 From hgreule at openjdk.org Wed Apr 2 16:24:23 2025 From: hgreule at openjdk.org (Hannes Greule) Date: Wed, 2 Apr 2025 16:24:23 GMT Subject: RFR: 8353551: C2: Constant folding for ReverseBytes nodes Message-ID: This change implements constant folding for ReverseBytes nodes. Currently, `byteswap` is included transitively by `reverse_bits.hpp`. I'm not sure if this is fine or if I need to add an explicit include there. I appreciate any reviews and comments. 
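To make the ReverseBytes change above concrete, here is a small Java sketch of calls with constant arguments, which is the situation constant folding is meant to address. That these particular calls are folded to constants by C2 with the patch is an assumption drawn from the PR title; the class name is hypothetical and the values shown are simply what the library methods return.

```java
public class ReverseBytesSketch {
    // With a constant argument, the compiler can in principle replace each
    // call by its result instead of emitting a byte-swap instruction.
    static int reversedInt() {
        return Integer.reverseBytes(0x01020304);
    }

    static long reversedLong() {
        return Long.reverseBytes(0x0102030405060708L);
    }

    static short reversedShort() {
        return Short.reverseBytes((short) 0x0102);
    }

    public static void main(String[] args) {
        System.out.println(Integer.toHexString(reversedInt()));   // 4030201
        System.out.println(Long.toHexString(reversedLong()));     // 807060504030201
        System.out.println(Integer.toHexString(reversedShort())); // 201
    }
}
```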
------------- Commit messages: - Implement constant folding for ReverseBytes*Nodes Changes: https://git.openjdk.org/jdk/pull/24382/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24382&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8353551 Stats: 204 lines in 3 files changed: 204 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/24382.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24382/head:pull/24382 PR: https://git.openjdk.org/jdk/pull/24382 From duke at openjdk.org Wed Apr 2 16:55:56 2025 From: duke at openjdk.org (Mohamed Issa) Date: Wed, 2 Apr 2025 16:55:56 GMT Subject: RFR: 8348638: Performance regression in Math.tanh [v2] In-Reply-To: References: Message-ID: <4KYVemCsJx4WaROYdA770DaFipRFOWUmlR-iGMkHkVk=.1d8ddb9b-ec6d-4fb5-828b-bc96c07ac756@github.com> On Wed, 2 Apr 2025 13:47:45 GMT, Jatin Bhateja wrote: > Please add a micro benchmark for different value ranges @jatin-bhateja Should I add different value ranges to the existing tanh micro-benchmark or create a brand new micro-benchmark? ------------- PR Comment: https://git.openjdk.org/jdk/pull/23889#issuecomment-2773179057 From vlivanov at openjdk.org Wed Apr 2 17:13:58 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Wed, 2 Apr 2025 17:13:58 GMT Subject: RFR: 8346989: C2: deoptimization and re-compilation cycle with Math.*Exact in case of frequent overflow [v5] In-Reply-To: References: <8ACplaVM_gN9cbIcQYGJmR4GNINm70PAJQ8uAgucK4Y=.14fdc7e2-e0af-4f0d-acb6-bcfe99ee8f36@github.com> Message-ID: On Wed, 2 Apr 2025 14:49:15 GMT, Marc Chevalier wrote: >> `Math.*Exact` intrinsics can cause many deopt when used repeatedly with problematic arguments. >> This fix proposes not to rely on intrinsics after `too_many_traps()` has been reached. >> >> Benchmark show that this issue affects every Math.*Exact functions. And this fix improve them all. >> >> tl;dr: >> - C1: no problem, no change >> - C2: >> - with intrinsics: >> - with overflow: clear improvement. 
Was way worse than C1, now is similar (~4s => ~600ms) >> - without overflow: no problem, no change >> - without intrinsics: no problem, no change >> >> Before the fix: >> >> Benchmark (SIZE) Mode Cnt Score Error Units >> MathExact.C1_1.loopAddIInBounds 1000000 avgt 3 1.272 ? 0.048 ms/op >> MathExact.C1_1.loopAddIOverflow 1000000 avgt 3 641.917 ? 58.238 ms/op >> MathExact.C1_1.loopAddLInBounds 1000000 avgt 3 1.402 ? 0.842 ms/op >> MathExact.C1_1.loopAddLOverflow 1000000 avgt 3 671.013 ? 229.425 ms/op >> MathExact.C1_1.loopDecrementIInBounds 1000000 avgt 3 3.722 ? 22.244 ms/op >> MathExact.C1_1.loopDecrementIOverflow 1000000 avgt 3 653.341 ? 279.003 ms/op >> MathExact.C1_1.loopDecrementLInBounds 1000000 avgt 3 2.525 ? 0.810 ms/op >> MathExact.C1_1.loopDecrementLOverflow 1000000 avgt 3 656.750 ? 141.792 ms/op >> MathExact.C1_1.loopIncrementIInBounds 1000000 avgt 3 4.621 ? 12.822 ms/op >> MathExact.C1_1.loopIncrementIOverflow 1000000 avgt 3 651.608 ? 274.396 ms/op >> MathExact.C1_1.loopIncrementLInBounds 1000000 avgt 3 2.576 ? 3.316 ms/op >> MathExact.C1_1.loopIncrementLOverflow 1000000 avgt 3 662.216 ? 71.879 ms/op >> MathExact.C1_1.loopMultiplyIInBounds 1000000 avgt 3 1.402 ? 0.587 ms/op >> MathExact.C1_1.loopMultiplyIOverflow 1000000 avgt 3 615.836 ? 252.137 ms/op >> MathExact.C1_1.loopMultiplyLInBounds 1000000 avgt 3 2.906 ? 5.718 ms/op >> MathExact.C1_1.loopMultiplyLOverflow 1000000 avgt 3 655.576 ? 147.432 ms/op >> MathExact.C1_1.loopNegateIInBounds 1000000 avgt 3 2.023 ? 0.027 ms/op >> MathExact.C1_1.loopNegateIOverflow 1000000 avgt 3 639.136 ? 30.841 ms/op >> MathExact.C1_1.loop... > > Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: > > Apply @iwanowww's refactoring Looks good. 
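For context, the kind of loop the `*Overflow` benchmarks quoted above appear to exercise can be sketched as below. This is an assumption based on the benchmark names and the PR description, not the actual `MathExact` benchmark source; the class and method names are hypothetical.

```java
public class ExactOverflowSketch {
    // Every iteration overflows, so Math.addExact throws each time. Per the
    // PR description, this is the pattern that could trigger repeated
    // deoptimization when C2 relies on the intrinsic.
    static int countOverflows(int iterations) {
        int overflows = 0;
        for (int i = 0; i < iterations; i++) {
            try {
                Math.addExact(Integer.MAX_VALUE, i + 1); // i + 1 >= 1, always overflows
            } catch (ArithmeticException e) {
                overflows++;
            }
        }
        return overflows;
    }

    public static void main(String[] args) {
        System.out.println(countOverflows(1_000_000)); // 1000000
    }
}
```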
src/hotspot/share/opto/library_call.cpp line 2009: > 2007: if (builtin_throw_too_many_traps(Deoptimization::Reason_intrinsic, > 2008: env()->ArithmeticException_instance())) { > 2009: // It has been already too many times, but we cannot use builtin_throw care (e.g. we care about backtraces), Remove "care" in "builtin_throw care"? ------------- Marked as reviewed by vlivanov (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/23916#pullrequestreview-2737016248 PR Review Comment: https://git.openjdk.org/jdk/pull/23916#discussion_r2025260344 From mchevalier at openjdk.org Wed Apr 2 17:20:35 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Wed, 2 Apr 2025 17:20:35 GMT Subject: RFR: 8353345: C2 asserts because maskShiftAmount modifies node without deleting the hash [v2] In-Reply-To: References: Message-ID: > First delete the hash, then `set_req`. This way, we avoid changing the node (a non-`this` node) without deleting the hash. This wrong ordering is not new from [JDK-8347459](https://bugs.openjdk.org/browse/JDK-8347459), but before that, only `this` was going through this function, so it was ok. But since, it is used with other nodes, hence the need to remove the hash. > > Also, not do any of that outside IGVN, but requires to register nested shifts for IGVN in parsing not to miss them later. 
> > Thanks, > Marc Marc Chevalier has updated the pull request incrementally with two additional commits since the last revision: - Fix spacing - Do not eagerly replace shift amounts in nested lshift ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24355/files - new: https://git.openjdk.org/jdk/pull/24355/files/6dcc6c15..d84b3d6d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24355&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24355&range=00-01 Stats: 50 lines in 2 files changed: 21 ins; 1 del; 28 mod Patch: https://git.openjdk.org/jdk/pull/24355.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24355/head:pull/24355 PR: https://git.openjdk.org/jdk/pull/24355 From mchevalier at openjdk.org Wed Apr 2 17:20:35 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Wed, 2 Apr 2025 17:20:35 GMT Subject: RFR: 8353345: C2 asserts because maskShiftAmount modifies node without deleting the hash In-Reply-To: References: Message-ID: <_xMbLzflQQjvjOgEmTpvMb-e3YUAVBbWoztGp802zV8=.ac4dd9a2-c333-401d-88db-db8412753325@github.com> On Tue, 1 Apr 2025 11:51:13 GMT, Marc Chevalier wrote: > First delete the hash, then `set_req`. This way, we avoid changing the node (a non-`this` node) without deleting the hash. This wrong ordering is not new from [JDK-8347459](https://bugs.openjdk.org/browse/JDK-8347459), but before that, only `this` was going through this function, so it was ok. But since, it is used with other nodes, hence the need to remove the hash. > > Also, not do any of that outside IGVN, but requires to register nested shifts for IGVN in parsing not to miss them later. > > Thanks, > Marc That makes sense. I've done as described. Tests seem happy. 
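As a loose analogy for the "first delete the hash, then `set_req`" ordering discussed above, the following sketch uses a plain Java `HashSet` as a stand-in for a value-numbering table. It is not HotSpot code: `ShiftNode`, `mutateWhileHashed` and `removeThenMutate` are hypothetical names, and the only point is that an entry mutated while it is still filed under its old hash can no longer be found.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

final class HashBeforeMutate {
    // Hypothetical stand-in for an IR node whose hash depends on its inputs.
    static final class ShiftNode {
        final String input;
        int amount;

        ShiftNode(String input, int amount) {
            this.input = input;
            this.amount = amount;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof ShiftNode)) return false;
            ShiftNode other = (ShiftNode) o;
            return input.equals(other.input) && amount == other.amount;
        }

        @Override
        public int hashCode() {
            return Objects.hash(input, amount);
        }
    }

    // Wrong order: the node changes while the table still files it under the old hash.
    static boolean mutateWhileHashed() {
        Set<ShiftNode> table = new HashSet<>();
        ShiftNode n = new ShiftNode("x", 33);
        table.add(n);
        n.amount = 33 & 31; // in-place change, table not informed
        return table.contains(n);
    }

    // Right order: take the node out of the table, mutate it, then insert it again.
    static boolean removeThenMutate() {
        Set<ShiftNode> table = new HashSet<>();
        ShiftNode n = new ShiftNode("x", 33);
        table.add(n);
        table.remove(n);
        n.amount = 33 & 31;
        table.add(n);
        return table.contains(n);
    }

    public static void main(String[] args) {
        System.out.println("mutate while hashed, still found: " + mutateWhileHashed());
        System.out.println("remove then mutate, still found:  " + removeThenMutate());
    }
}
```

In the sketch the first variant loses the node and the second keeps it, which is the same hazard, in miniature, as changing a non-`this` node without removing its hash first.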
------------- PR Comment: https://git.openjdk.org/jdk/pull/24355#issuecomment-2773229850 From mchevalier at openjdk.org Wed Apr 2 17:23:03 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Wed, 2 Apr 2025 17:23:03 GMT Subject: RFR: 8346989: C2: deoptimization and re-compilation cycle with Math.*Exact in case of frequent overflow [v6] In-Reply-To: <8ACplaVM_gN9cbIcQYGJmR4GNINm70PAJQ8uAgucK4Y=.14fdc7e2-e0af-4f0d-acb6-bcfe99ee8f36@github.com> References: <8ACplaVM_gN9cbIcQYGJmR4GNINm70PAJQ8uAgucK4Y=.14fdc7e2-e0af-4f0d-acb6-bcfe99ee8f36@github.com> Message-ID: > `Math.*Exact` intrinsics can cause many deopt when used repeatedly with problematic arguments. > This fix proposes not to rely on intrinsics after `too_many_traps()` has been reached. > > Benchmark show that this issue affects every Math.*Exact functions. And this fix improve them all. > > tl;dr: > - C1: no problem, no change > - C2: > - with intrinsics: > - with overflow: clear improvement. Was way worse than C1, now is similar (~4s => ~600ms) > - without overflow: no problem, no change > - without intrinsics: no problem, no change > > Before the fix: > > Benchmark (SIZE) Mode Cnt Score Error Units > MathExact.C1_1.loopAddIInBounds 1000000 avgt 3 1.272 ? 0.048 ms/op > MathExact.C1_1.loopAddIOverflow 1000000 avgt 3 641.917 ? 58.238 ms/op > MathExact.C1_1.loopAddLInBounds 1000000 avgt 3 1.402 ? 0.842 ms/op > MathExact.C1_1.loopAddLOverflow 1000000 avgt 3 671.013 ? 229.425 ms/op > MathExact.C1_1.loopDecrementIInBounds 1000000 avgt 3 3.722 ? 22.244 ms/op > MathExact.C1_1.loopDecrementIOverflow 1000000 avgt 3 653.341 ? 279.003 ms/op > MathExact.C1_1.loopDecrementLInBounds 1000000 avgt 3 2.525 ? 0.810 ms/op > MathExact.C1_1.loopDecrementLOverflow 1000000 avgt 3 656.750 ? 141.792 ms/op > MathExact.C1_1.loopIncrementIInBounds 1000000 avgt 3 4.621 ? 12.822 ms/op > MathExact.C1_1.loopIncrementIOverflow 1000000 avgt 3 651.608 ? 274.396 ms/op > MathExact.C1_1.loopIncrementLInBounds 1000000 avgt 3 2.576 ? 
3.316 ms/op > MathExact.C1_1.loopIncrementLOverflow 1000000 avgt 3 662.216 ? 71.879 ms/op > MathExact.C1_1.loopMultiplyIInBounds 1000000 avgt 3 1.402 ? 0.587 ms/op > MathExact.C1_1.loopMultiplyIOverflow 1000000 avgt 3 615.836 ? 252.137 ms/op > MathExact.C1_1.loopMultiplyLInBounds 1000000 avgt 3 2.906 ? 5.718 ms/op > MathExact.C1_1.loopMultiplyLOverflow 1000000 avgt 3 655.576 ? 147.432 ms/op > MathExact.C1_1.loopNegateIInBounds 1000000 avgt 3 2.023 ? 0.027 ms/op > MathExact.C1_1.loopNegateIOverflow 1000000 avgt 3 639.136 ? 30.841 ms/op > MathExact.C1_1.loopNegateLInBounds 1000000 avgt 3 2.422 ? 3.59... Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: fix typo in comment ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23916/files - new: https://git.openjdk.org/jdk/pull/23916/files/34b3b75c..238b129d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23916&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23916&range=04-05 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/23916.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23916/head:pull/23916 PR: https://git.openjdk.org/jdk/pull/23916 From mchevalier at openjdk.org Wed Apr 2 17:23:03 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Wed, 2 Apr 2025 17:23:03 GMT Subject: RFR: 8346989: C2: deoptimization and re-compilation cycle with Math.*Exact in case of frequent overflow [v5] In-Reply-To: References: <8ACplaVM_gN9cbIcQYGJmR4GNINm70PAJQ8uAgucK4Y=.14fdc7e2-e0af-4f0d-acb6-bcfe99ee8f36@github.com> Message-ID: On Wed, 2 Apr 2025 17:11:00 GMT, Vladimir Ivanov wrote: >> Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: >> >> Apply @iwanowww's refactoring > > src/hotspot/share/opto/library_call.cpp line 2009: > >> 2007: if (builtin_throw_too_many_traps(Deoptimization::Reason_intrinsic, >> 2008: 
env()->ArithmeticException_instance())) { >> 2009: // It has been already too many times, but we cannot use builtin_throw care (e.g. we care about backtraces), > > Remove "care" in "builtin_throw care"? Thanks! Done. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23916#discussion_r2025271377 From jbhateja at openjdk.org Wed Apr 2 17:35:55 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 2 Apr 2025 17:35:55 GMT Subject: RFR: 8348638: Performance regression in Math.tanh [v2] In-Reply-To: References: Message-ID: On Fri, 28 Mar 2025 00:18:41 GMT, Mohamed Issa wrote: >> The changes described below are meant to resolve the performance regression introduced by the **x86_64 tanh** double precision floating point scalar intrinsic in #20657. >> >> 1. Check and handle high magnitude input values before those in other ranges. If found, **+/- 1** is returned almost immediately without having to go through too many computations or branches. >> 2. Reduce the lower bound of the input range that triggers a quick **+/- 1** return from **|x| >= 32** to **|x| >= 22**. This new endpoint is the exact value required for correctness that's used by the original OpenJDK implementation. >> >> The results of all tests posted below were captured with an [Intel? Xeon 6761P](https://www.intel.com/content/www/us/en/products/sku/241842/intel-xeon-6761p-processor-336m-cache-2-50-ghz/specifications.html) using [OpenJDK v25-b15](https://github.com/openjdk/jdk/releases/tag/jdk-25%2B15) as the baseline version. >> >> For performance data collected with the regression micro-benchmark referenced in the bug report, see the table below. Each result is the mean of 3 individual runs. In the high value scenarios (100, 1000, 10000, 100000), the changes significantly improve execution times to the point where they are almost at parity with the baseline. Also, there is almost no impact to the low value (1, 2) scenarios. 
>>
>> | Input range (+/-) | Baseline (ms) | No fix (ms) | With fix (ms) | No fix vs baseline (%) | Fix vs baseline (%) |
>> | :------------------: | :-------------: | :-----------: | :-------------: | :----------------------: | :-------------------: |
>> | 1 | 1846 | 1925 | 1972 | +4.28 | +6.83 |
>> | 2 | 2099 | 1991 | 2016 | -5.15 | -3.95 |
>> | 100 | 803 | 1007 | 742 | +25.40 | -7.60 |
>> | 1000 | 497 | 635 | 514 | +27.77 | +3.42 |
>> | 10000 | 474 | 572 | 477 | +20.68 | +0.63 |
>> | 100000 | 473 | 567 | 474 | +19.87 | +0.21 |
>> >> For perfo... > > Mohamed Issa has updated the pull request incrementally with one additional commit since the last revision: > > Change tanh intrinsic endpoint comparison value to match reference OpenJDK implementation src/hotspot/cpu/x86/stubGenerator_x86_64_tanh.cpp line 331: > 329: __ andl(rdx, rcx); > 330: __ andl(rcx, 32767); > 331: __ cmpl(rcx, 16438); Did you try using "UCOMISD" to directly compare with constant 22.0 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23889#discussion_r2024867196 From vlivanov at openjdk.org Wed Apr 2 18:10:50 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Wed, 2 Apr 2025 18:10:50 GMT Subject: RFR: 8346989: C2: deoptimization and re-compilation cycle with Math.*Exact in case of frequent overflow [v6] In-Reply-To: References: <8ACplaVM_gN9cbIcQYGJmR4GNINm70PAJQ8uAgucK4Y=.14fdc7e2-e0af-4f0d-acb6-bcfe99ee8f36@github.com> Message-ID: On Wed, 2 Apr 2025 17:23:03 GMT, Marc Chevalier wrote: >> `Math.*Exact` intrinsics can cause many deopt when used repeatedly with problematic arguments. >> This fix proposes not to rely on intrinsics after `too_many_traps()` has been reached. >> >> Benchmark show that this issue affects every Math.*Exact functions. And this fix improve them all. >> >> tl;dr: >> - C1: no problem, no change >> - C2: >> - with intrinsics: >> - with overflow: clear improvement.
Was way worse than C1, now is similar (~4s => ~600ms) >> - without overflow: no problem, no change >> - without intrinsics: no problem, no change >> >> Before the fix: >> >> Benchmark (SIZE) Mode Cnt Score Error Units >> MathExact.C1_1.loopAddIInBounds 1000000 avgt 3 1.272 ? 0.048 ms/op >> MathExact.C1_1.loopAddIOverflow 1000000 avgt 3 641.917 ? 58.238 ms/op >> MathExact.C1_1.loopAddLInBounds 1000000 avgt 3 1.402 ? 0.842 ms/op >> MathExact.C1_1.loopAddLOverflow 1000000 avgt 3 671.013 ? 229.425 ms/op >> MathExact.C1_1.loopDecrementIInBounds 1000000 avgt 3 3.722 ? 22.244 ms/op >> MathExact.C1_1.loopDecrementIOverflow 1000000 avgt 3 653.341 ? 279.003 ms/op >> MathExact.C1_1.loopDecrementLInBounds 1000000 avgt 3 2.525 ? 0.810 ms/op >> MathExact.C1_1.loopDecrementLOverflow 1000000 avgt 3 656.750 ? 141.792 ms/op >> MathExact.C1_1.loopIncrementIInBounds 1000000 avgt 3 4.621 ? 12.822 ms/op >> MathExact.C1_1.loopIncrementIOverflow 1000000 avgt 3 651.608 ? 274.396 ms/op >> MathExact.C1_1.loopIncrementLInBounds 1000000 avgt 3 2.576 ? 3.316 ms/op >> MathExact.C1_1.loopIncrementLOverflow 1000000 avgt 3 662.216 ? 71.879 ms/op >> MathExact.C1_1.loopMultiplyIInBounds 1000000 avgt 3 1.402 ? 0.587 ms/op >> MathExact.C1_1.loopMultiplyIOverflow 1000000 avgt 3 615.836 ? 252.137 ms/op >> MathExact.C1_1.loopMultiplyLInBounds 1000000 avgt 3 2.906 ? 5.718 ms/op >> MathExact.C1_1.loopMultiplyLOverflow 1000000 avgt 3 655.576 ? 147.432 ms/op >> MathExact.C1_1.loopNegateIInBounds 1000000 avgt 3 2.023 ? 0.027 ms/op >> MathExact.C1_1.loopNegateIOverflow 1000000 avgt 3 639.136 ? 30.841 ms/op >> MathExact.C1_1.loop... > > Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: > > fix typo in comment Marked as reviewed by vlivanov (Reviewer). 
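For readers who want to see the kind of code this PR is about, here is a minimal sketch of a loop that keeps hitting the overflow path of `Math.addExact`. It is only in the spirit of the `loopAddIOverflow` benchmark named in the table above; the real JMH source is not reproduced in this thread, and `countOverflows` is a hypothetical helper.

```java
final class ExactOverflowSketch {
    // Calls Math.addExact repeatedly; half of the calls overflow and throw.
    static int countOverflows(int iterations) {
        int overflows = 0;
        for (int i = 0; i < iterations; i++) {
            try {
                // (i & 3) cycles through 0, 1, 2, 3; adding 2 or 3 overflows int.
                Math.addExact(Integer.MAX_VALUE - 1, i & 3);
            } catch (ArithmeticException e) {
                overflows++;
            }
        }
        return overflows;
    }

    public static void main(String[] args) {
        System.out.println("overflows: " + countOverflows(1_000_000));
    }
}
```

With frequent overflow like this, the exceptional path is no longer rare, which is the situation the PR description says led to repeated deoptimization and recompilation under C2 before the fix.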
------------- PR Review: https://git.openjdk.org/jdk/pull/23916#pullrequestreview-2737157109 From jbhateja at openjdk.org Wed Apr 2 18:31:50 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 2 Apr 2025 18:31:50 GMT Subject: RFR: 8348638: Performance regression in Math.tanh [v2] In-Reply-To: References: Message-ID: On Wed, 2 Apr 2025 13:46:47 GMT, Jatin Bhateja wrote: >> Mohamed Issa has updated the pull request incrementally with one additional commit since the last revision: >> >> Change tanh intrinsic endpoint comparison value to match reference OpenJDK implementation > > src/hotspot/cpu/x86/stubGenerator_x86_64_tanh.cpp line 331: > >> 329: __ andl(rdx, rcx); >> 330: __ andl(rcx, 32767); >> 331: __ cmpl(rcx, 16438); > > Did you try using "UCOMISD" to directly compare with constant 22.0 [perf_tanh_delimit.txt](https://github.com/user-attachments/files/19573617/perf_tanh_delimit.txt) Proposed sequence in micro2 shows better path length, please give this a try. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23889#discussion_r2025378258 From dlong at openjdk.org Wed Apr 2 19:57:03 2025 From: dlong at openjdk.org (Dean Long) Date: Wed, 2 Apr 2025 19:57:03 GMT Subject: RFR: 8353041: NeverBranchNode causes incorrect block frequency calculation Message-ID: This fixes a quality-of-implementation issue for infinite loops using a NeverBranch node. We need Block::succ_prob() to return 1.0f for the 100% successful back-edge so that block frequencies are computed correctly. I also fixed Block_Stack::most_frequent_successor() to choose the correct successor. I verified that this corrects the huge frequency ratio that was detected and clamped by JDK-8346888. Currently this bug is labeled noreg-hard with no new regression test, as it's not obvious how to write such a test.
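To make the role of the successor probability concrete, here is a small, generic sketch of how a block's frequency is split among its successors. It is not the HotSpot implementation: `successorFrequencies` and `mostFrequentSuccessor` are hypothetical helpers, and placing the back-edge at index 0 is an assumption made only for the example.

```java
final class SuccessorFrequencySketch {
    // Frequency flowing to each successor: blockFreq * P(successor).
    static float[] successorFrequencies(float blockFreq, float[] succProb) {
        float[] out = new float[succProb.length];
        for (int i = 0; i < succProb.length; i++) {
            out[i] = blockFreq * succProb[i];
        }
        return out;
    }

    // Index of the successor with the highest probability (first one wins ties).
    static int mostFrequentSuccessor(float[] succProb) {
        int best = 0;
        for (int i = 1; i < succProb.length; i++) {
            if (succProb[i] > succProb[best]) {
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Assumed layout: index 0 is the always-taken back-edge, index 1 the never-taken exit.
        float[] neverBranch = {1.0f, 0.0f};
        float[] freq = successorFrequencies(10.0f, neverBranch);
        System.out.println("back-edge=" + freq[0] + ", exit=" + freq[1]);
        System.out.println("most frequent successor index: " + mostFrequentSuccessor(neverBranch));
    }
}
```

With probability 1.0 on the back-edge, all of the block's frequency stays in the loop and none leaks to the exit that is never taken; any other split would hand frequency to a path that cannot execute.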
------------- Commit messages: - choose correct NeverBranch successor Changes: https://git.openjdk.org/jdk/pull/24390/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24390&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8353041 Stats: 19 lines in 2 files changed: 17 ins; 1 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/24390.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24390/head:pull/24390 PR: https://git.openjdk.org/jdk/pull/24390 From duke at openjdk.org Wed Apr 2 23:13:49 2025 From: duke at openjdk.org (Mohamed Issa) Date: Wed, 2 Apr 2025 23:13:49 GMT Subject: RFR: 8348638: Performance regression in Math.tanh [v2] In-Reply-To: References: Message-ID: <7CcUcVFlw6z6thTGD2vcAhx9n4yySRRWD_4IhCqbByg=.9f705aca-c7bb-4f00-a52d-ff897d931596@github.com> On Wed, 2 Apr 2025 18:29:34 GMT, Jatin Bhateja wrote: >> src/hotspot/cpu/x86/stubGenerator_x86_64_tanh.cpp line 331: >> >>> 329: __ andl(rdx, rcx); >>> 330: __ andl(rcx, 32767); >>> 331: __ cmpl(rcx, 16438); >> >> Did you try using "UCOMISD" to directly compare with constant 22.0 > > [perf_tanh_delimit.txt](https://github.com/user-attachments/files/19573617/perf_tanh_delimit.txt) > > Proposed sequence in micro2 shows better path length, please give this a try. So, I didn't try using "UCOMISD" to directly compare because I thought it wouldn't provide a benefit over the existing approach. Thanks for providing these micros though as I think they prove my suspicions. To explain, I'll start with results from the code you provided on the [Intel® Xeon 6761P](https://www.intel.com/content/www/us/en/products/sku/241842/intel-xeon-6761p-processor-336m-cache-2-50-ghz/specifications.html) machine.
**./perf_tanh_delimit 21.0** -> _micro1=184 ms, micro2=186 ms_
**./perf_tanh_delimit 22.0** -> _micro1=188 ms, micro2=186 ms_
**./perf_tanh_delimit 23.0** -> _micro1=187 ms, micro2=183 ms_

Please note that results with inputs strictly less than 22.0 aren't really meaningful because they would go into the heavy compute path of the actual implementations. Still, I included one for reference. Here we can see "UCOMISD" shows some improvement over the existing approach. Of course, this uplift will vary on different platforms. Unfortunately, the sequences provided only cover the positive inputs. To get a better picture, we need one that covers both positive and negative inputs. I created one with corresponding results linked below.

[perf_tanh_delimit2.txt](https://github.com/user-attachments/files/19576620/perf_tanh_delimit2.txt)

**./perf_tanh_delimit2 -23.0** -> _micro1=179 ms, micro2=184 ms_
**./perf_tanh_delimit2 -22.0** -> _micro1=176 ms, micro2=178 ms_
**./perf_tanh_delimit2 -21.0** -> _micro1=185 ms, micro2=181 ms_
**./perf_tanh_delimit2 21.0** -> _micro1=190 ms, micro2=179 ms_
**./perf_tanh_delimit2 22.0** -> _micro1=189 ms, micro2=185 ms_
**./perf_tanh_delimit2 23.0** -> _micro1=187 ms, micro2=185 ms_

Again, the _|x| < 22.0_ inputs aren't relevant because they don't trigger any significant computations. With that in mind, the positive inputs show improvements while the negative inputs don't. The situation would be reversed if I checked for negative input values first. We need two uses of "UCOMISD" to cover positive and negative inputs. Whereas "PEXTRW" is only required once to cover the sign and magnitude. Also, other parts of the intrinsic implementation rely on it, so I don't think we should make those blocks worse by using "UCOMISD" without getting a clear boost from it.
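The two strategies compared in this message can be sketched in Java. The constant in the quoted stub, 16438, is 0x4036, which is exactly the top 16 bits of the IEEE 754 encoding of 22.0, and the mask 32767 clears the sign bit. The sketch assumes that the stub extracts bits 48 to 63 of the double and that the fast path is taken when the masked value is at or above the constant; neither detail is shown in the quoted lines, and the method names are hypothetical. NaN handling is deliberately left out.

```java
final class TanhSaturationCheck {
    // 0x4036 == 16438: the top 16 bits of the bit pattern of 22.0.
    static final int HIGH16_OF_22 = 0x4036;

    // One extraction covers sign and magnitude, in the style of the PEXTRW-based check.
    static boolean saturatesByHighWord(double x) {
        int high16 = (int) (Double.doubleToRawLongBits(x) >>> 48);
        int magnitude = high16 & 0x7FFF;  // same mask as andl(rcx, 32767)
        return magnitude >= HIGH16_OF_22; // same constant as cmpl(rcx, 16438)
    }

    // Two floating-point comparisons, in the style of a UCOMISD-based check.
    static boolean saturatesByCompare(double x) {
        return x >= 22.0 || x <= -22.0;
    }

    public static void main(String[] args) {
        double[] inputs = {-23.0, -22.0, -21.0, 21.0, 22.0, 23.0};
        for (double x : inputs) {
            System.out.println(x + ": highWord=" + saturatesByHighWord(x)
                    + ", compare=" + saturatesByCompare(x));
        }
    }
}
```

For finite inputs the two predicates agree, which illustrates the point made above: the integer check needs a single extraction for both signs, while the comparison form needs one test per sign.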
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23889#discussion_r2025719541 From jbhateja at openjdk.org Thu Apr 3 01:28:57 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 3 Apr 2025 01:28:57 GMT Subject: RFR: 8348638: Performance regression in Math.tanh [v2] In-Reply-To: <7CcUcVFlw6z6thTGD2vcAhx9n4yySRRWD_4IhCqbByg=.9f705aca-c7bb-4f00-a52d-ff897d931596@github.com> References: <7CcUcVFlw6z6thTGD2vcAhx9n4yySRRWD_4IhCqbByg=.9f705aca-c7bb-4f00-a52d-ff897d931596@github.com> Message-ID: On Wed, 2 Apr 2025 23:11:18 GMT, Mohamed Issa wrote: >> [perf_tanh_delimit.txt](https://github.com/user-attachments/files/19573617/perf_tanh_delimit.txt) >> >> Proposed sequence in micro2 shows better path length, please give this a try. > > So, I didn't try using "UCOMISD" to directly compare because I thought it wouldn't provide a benefit over the existing approach. Thanks for providing these micros though as I think they prove my suspicions. To explain, I'll start with results from the code you provided on the [Intel® Xeon 6761P](https://www.intel.com/content/www/us/en/products/sku/241842/intel-xeon-6761p-processor-336m-cache-2-50-ghz/specifications.html) machine. > > **./perf_tanh_delimit 21.0** -> _micro1=184 ms, micro2=186 ms_ > **./perf_tanh_delimit 22.0** -> _micro1=188 ms, micro2=186 ms_ > **./perf_tanh_delimit 23.0** -> _micro1=187 ms, micro2=183 ms_ > > Please note that results with inputs strictly less than 22.0 aren't really meaningful because they would go into the heavy compute path of the actual implementations. Still, I included one for reference. Here we can see "UCOMISD" shows some improvement over the existing approach. Of course, this uplift will vary on different platforms. Unfortunately, the sequences provided only cover the positive inputs. To get a better picture, we need one that covers both positive and negative inputs. I created one with corresponding results linked below.
> > [perf_tanh_delimit2.txt](https://github.com/user-attachments/files/19576620/perf_tanh_delimit2.txt) > > **./perf_tanh_delimit2 -23.0** -> _micro1=179 ms, micro2=184 ms_ > **./perf_tanh_delimit2 -22.0** -> _micro1=176 ms, micro2=178 ms_ > **./perf_tanh_delimit2 -21.0** -> _micro1=185 ms, micro2=181 ms_ > **./perf_tanh_delimit2 21.0** -> _micro1=190 ms, micro2=179 ms_ > **./perf_tanh_delimit2 22.0** -> _micro1=189 ms, micro2=185 ms_ > **./perf_tanh_delimit2 23.0** -> _micro1=187 ms, micro2=185 ms_ > > Again, the _|x| < 22.0_ inputs aren't relevant because they don't trigger any significant computations. With that in mind, the positive inputs show improvements while the negative inputs don't. The situation would be reversed if I checked for negative input values first. We need two uses of "UCOMISD" to cover positive and negative inputs. Whereas "PEXTRW" is only required once to cover the sign and magnitude. Also, other parts of the intrinsic implementation rely on it, so I don't think we should make those blocks worse by using "UCOMISD" without getting a clear boost from it. Thanks for the explanation. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23889#discussion_r2025855450 From jbhateja at openjdk.org Thu Apr 3 01:41:48 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 3 Apr 2025 01:41:48 GMT Subject: RFR: 8352675: Support Intel AVX10 converged vector ISA feature detection [v2] In-Reply-To: References: Message-ID: > - Intel AVX10[1] extends and enhances the capabilities of Intel AVX-512 to benefit all Intel? products and will be the vector ISA of choice moving into the future. > - It supports a new ISA versioning scheme which simplifies the existing AVX512 feature enumeration scheme. Feature set supported by an AVX10 ISA version will be supported by all the versions above it. > - The initial, fully-featured version of Intel? AVX10 will be enumerated as Version 2 (denoted as Intel? AVX10.2). 
This will include the new ISA extension over the existing AVX512 instructions. > - An early version of Intel? AVX10 (Version 1, or Intel? AVX10.1) that only enumerates the Intel? AVX-512 instruction set at 128, 256, and 512 bits will be enabled on the Granite Rapids Server for software pre-enabling. > > This patch adds the necessary CPUID feature detection for AVX10 ISA version 1 and 2. In terms of architectural state save restoration, AVX10 is isomorphic to AVX512 support up till Granite Rapids. State components affected by AVX10 extension include SSE, AVX, Opmask, ZMM_Hi256, and Hi16_ZMM registers. > > The patch has been regressed through tier1 and jvmci tests > > Please review and share your feedback. > > Best Regards, > Jatin > > [1] https://www.intel.com/content/www/us/en/content-details/844829/intel-advanced-vector-extensions-10-2-intel-avx10-2-architecture-specification.html Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains three additional commits since the last revision: - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8352675 - Windows build fix - 8352675: Support Intel AVX10 converged vector ISA feature detection ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24329/files - new: https://git.openjdk.org/jdk/pull/24329/files/4c0123e7..ff03a06e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24329&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24329&range=00-01 Stats: 15628 lines in 499 files changed: 9285 ins; 5111 del; 1232 mod Patch: https://git.openjdk.org/jdk/pull/24329.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24329/head:pull/24329 PR: https://git.openjdk.org/jdk/pull/24329 From duke at openjdk.org Thu Apr 3 02:32:55 2025 From: duke at openjdk.org (kuaiwei) Date: Thu, 3 Apr 2025 02:32:55 GMT Subject: RFR: 8345485: C2 MergeLoads: merge adjacent array/native memory loads into larger load [v4] In-Reply-To: <_IhK2U23lIUOtBKOt-WMxQ3L7b2t26RzclJRdqbIgms=.3ef9a630-f99c-4de7-994a-bcabf912230b@github.com> References: <96Ny_BPjRCbNlD14DNDUOuQ0IX-F8hx21gxQKVfim9M=.d502019a-27ed-4a35-81ef-bc2aec5e7557@github.com> <_IhK2U23lIUOtBKOt-WMxQ3L7b2t26RzclJRdqbIgms=.3ef9a630-f99c-4de7-994a-bcabf912230b@github.com> Message-ID: On Mon, 24 Mar 2025 11:41:46 GMT, Emanuel Peter wrote: >>> @kuaiwei I have not yet had the time to read through the PR, but I would like to talk about `LoadNode::Ideal`. The idea with `Ideal` in general, is that you replace one node with another. After `Ideal` returns, all usages of the old node now take the new node instead. >>> >>> You copied the structure from my MergeStores implementation in `StoreNode::Idea`. There it made sense to replace `StoreB` nodes that have a memory output with `LoadI` nodes, which also have memory output. >>> >>> But it does not make sense to replace a `LoadB` that has a byte/int output with a `LoadL` that has a long output for example. 
>>> >>> I think your implementation should go into `OrINode`, and match the expression up from there. Because we want to replace the old `OrI` with the new `LoadL`. >>> >>> Another question: Do you have some tests where some of the nodes in the `load/shift/or` expression have other uses? Imagine this: >>> >>> ``` >>> l0 = a[0]; >>> l1 = a[1]; >>> l2 = a[2]; >>> l3 = a[3]; >>> l = ; >>> now use l1 for something else as well >>> ``` >>> >>> What happens now? Do you check that we only use the old `LoadB` in the expression we are replacing? >> >> Hi @eme64 , I understand your concern. In this patch , I check the usage of all `loadB` nodes and only allow they have only single usage into `OrNode`, I also check the `OrNode` as well. So I think it will not cause the trouble. >> >> >> l0 = a[0]; >> l1 = a[1]; >> l2 = a[2]; >> l3 = a[3]; >> l = ; >> now use l1 for something else as well >> >> For this case, because l1 has other usage, all these loads will not be merged. >> >> In my previous patch, I tried to extract value from merged `LoadNode` if origin `loadB` has other usage, such as used by uncommon trap. You can find them in https://github.com/openjdk/jdk/pull/24023/commits/b621db1cf0c17885516254a2af4b5df43e06c098 and search MergePrimitiveLoads::extract_value_for_uncommon_trap . But in my test with jtreg tier1, it never hit a case which replaced `LoadB` used by uncommon trap, I think range check smearing remove all the uncommon trap usages. So I revert it to make code simple. In my opinion, the extract_value function can be used as a general solution for other usages. But we may need a cost model to evaluate cost of new instructions which used for extracting and benefit of merged load. To simplify, I choose to check usage strictly. > > @kuaiwei Thanks for your response! > > What about these two things I brought up? > >> Do you have some tests where some of the nodes in the load/shift/or expression have other uses? 
> > It would be good to have these tests, even if we think your code is correct. It is good to verify it with tests. And someone in the future might break it. > >> I think your implementation should go into OrINode, and match the expression up from there. Because we want to replace the old OrI with the new LoadL. > > This is really the pattern we use in `Idea`. We replace the node at the bottom of an expression with a new node (or new expression). Hi @eme64 , I moved MergeLoads optimization to `addnode.cpp`. Now I use `_combine` for operators which can merge the adjacent loads. In this patch only `OrNode` is supported as `combine` operator, but I think it can be extended to other operator like `AddNode` and `XorNode`. They will be supported in subsequent patch. May I ask you to review it again? Thanks. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24023#issuecomment-2774212502 From jbhateja at openjdk.org Thu Apr 3 02:58:17 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 3 Apr 2025 02:58:17 GMT Subject: RFR: 8352675: Support Intel AVX10 converged vector ISA feature detection [v3] In-Reply-To: References: Message-ID: > - Intel AVX10[1] extends and enhances the capabilities of Intel AVX-512 to benefit all Intel? products and will be the vector ISA of choice moving into the future. > - It supports a new ISA versioning scheme which simplifies the existing AVX512 feature enumeration scheme. Feature set supported by an AVX10 ISA version will be supported by all the versions above it. > - The initial, fully-featured version of Intel? AVX10 will be enumerated as Version 2 (denoted as Intel? AVX10.2). This will include the new ISA extension over the existing AVX512 instructions. > - An early version of Intel? AVX10 (Version 1, or Intel? AVX10.1) that only enumerates the Intel? AVX-512 instruction set at 128, 256, and 512 bits will be enabled on the Granite Rapids Server for software pre-enabling. 
> > This patch adds the necessary CPUID feature detection for AVX10 ISA version 1 and 2. In terms of architectural state save restoration, AVX10 is isomorphic to AVX512 support up till Granite Rapids. State components affected by AVX10 extension include SSE, AVX, Opmask, ZMM_Hi256, and Hi16_ZMM registers. > > The patch has been regressed through tier1 and jvmci tests > > Please review and share your feedback. > > Best Regards, > Jatin > > [1] https://www.intel.com/content/www/us/en/content-details/844829/intel-advanced-vector-extensions-10-2-intel-avx10-2-architecture-specification.html Jatin Bhateja has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision: 8352675: Support Intel AVX10 converged vector ISA feature detection ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24329/files - new: https://git.openjdk.org/jdk/pull/24329/files/ff03a06e..b95ac21c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24329&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24329&range=01-02 Stats: 0 lines in 0 files changed: 0 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/24329.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24329/head:pull/24329 PR: https://git.openjdk.org/jdk/pull/24329 From chagedorn at openjdk.org Thu Apr 3 05:09:56 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 3 Apr 2025 05:09:56 GMT Subject: RFR: 8347555: [REDO] C2: implement optimization for series of Add of unique value [v9] In-Reply-To: References: Message-ID: <2YsQyyHJuiGrgnTRKYADjQhRa5qIaDvCCjLd4kjfdeI=.0134b90b-fa18-4c5b-afb3-f7f4e10d6411@github.com> On Wed, 2 Apr 2025 10:25:32 GMT, Emanuel Peter wrote: >> Kangcheng Xu has updated the pull request incrementally with one additional commit since the last revision: >> >> fix ((x< > 
src/hotspot/share/opto/addnode.cpp line 509: > >> 507: if (rhs.valid && rhs.variable == n->in(1)) { >> 508: return Multiplication{true, rhs.variable, rhs.multiplier + 1}; >> 509: } > > Hmm, it seems these are patterns that you did not promise you would cover in the description above. > It makes it a little difficult to keep the overview... Just a drive-by comment what might help: Name the cases you cover in the description with `(1)`, (2)` etc. and add the numbers as comments in the code where you cover the patterns. This would support the mapping from description to implementation. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23506#discussion_r2026196086 From chagedorn at openjdk.org Thu Apr 3 05:21:59 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 3 Apr 2025 05:21:59 GMT Subject: RFR: 8353345: C2 asserts because maskShiftAmount modifies node without deleting the hash [v2] In-Reply-To: References: Message-ID: On Wed, 2 Apr 2025 17:20:35 GMT, Marc Chevalier wrote: >> First delete the hash, then `set_req`. This way, we avoid changing the node (a non-`this` node) without deleting the hash. This wrong ordering is not new from [JDK-8347459](https://bugs.openjdk.org/browse/JDK-8347459), but before that, only `this` was going through this function, so it was ok. But since, it is used with other nodes, hence the need to remove the hash. >> >> Also, not do any of that outside IGVN, but requires to register nested shifts for IGVN in parsing not to miss them later. >> >> Thanks, >> Marc > > Marc Chevalier has updated the pull request incrementally with two additional commits since the last revision: > > - Fix spacing > - Do not eagerly replace shift amounts in nested lshift Nice, thanks for the update! Looks much better now. Some more comments. 
src/hotspot/share/opto/mulnode.cpp line 953: > 951: } > 952: > 953: //============================================================================= While at it, you can also remove this line which we no longer use today Suggestion: src/hotspot/share/opto/mulnode.cpp line 966: > 964: > 965: // Returns whether the shift amount is constant. If so, sets real_shift and masked_shift. > 966: static bool mask_shift_amount(PhaseGVN* phase, const Node* shiftNode, uint nBits, int& real_shift, int& masked_shift) { While at it, we should probably use underscores instead of camelCase for `shiftNode`. Same below. src/hotspot/share/opto/mulnode.cpp line 995: > 993: if (igvn != nullptr) { > 994: igvn->rehash_node_delayed(shiftNode); > 995: } Do we still need this now? If we always call it with `shiftNode == this` then we already get the rehashing "for free" due to modifying `this` as part of `Ideal()`. src/hotspot/share/opto/mulnode.cpp line 1007: > 1005: // outer_shift = (_ << rhs0) > 1006: // We are looking for the pattern: > 1007: // outer_shift = ((X << rhs1) << rhs0) Just an idea: To better keep track of what is the outer and inner rhs, we could use `rhs_inner` and `rhs_outer`. src/hotspot/share/opto/mulnode.cpp line 1010: > 1008: // where rhs0 and rhs1 are constant > 1009: // we denote inner_shift the nested expression (X << rhs1) > 1010: // con0 = rhs1 % nbits and con0 = rhs1 % nbits Probably copy-paste error, did you want to define `con1` here as well? 
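To make the `((X << rhs1) << rhs0)` pattern from the comments above concrete, here is a minimal Java sketch of why shift amounts have to be masked before nested shifts are combined. The class and method names are hypothetical and not taken from the patch. The sketch relies only on the Java rule that an `int` shift uses the low five bits of its shift amount, and it assumes that the fold under discussion combines the two masked constants.

```java
public class NestedShiftSketch {
    // Two nested shifts by constants, as in ((X << rhs1) << rhs0).
    static int nested(int x) {
        return (x << 20) << 20;
    }

    // A single shift by the naive sum 40. Java masks int shift amounts
    // with 31 (40 & 31 == 8), so this behaves like x << 8.
    static int naiveSum(int x) {
        return x << 40;
    }

    // When the masked amounts sum to less than 32, a single shift by the
    // sum is equivalent to the nested form.
    static int smallNested(int x) {
        return (x << 3) << 4;
    }

    static int smallFolded(int x) {
        return x << 7;
    }

    public static void main(String[] args) {
        System.out.println(nested(1));
        System.out.println(naiveSum(1));
        System.out.println(smallNested(5) == smallFolded(5));
    }
}
```

For `x = 1` the nested form yields 0 while `x << 40` yields 256, so a rewrite that simply added the unmasked constants into one shift would change the result.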
------------- PR Review: https://git.openjdk.org/jdk/pull/24355#pullrequestreview-2738524807 PR Review Comment: https://git.openjdk.org/jdk/pull/24355#discussion_r2026199599 PR Review Comment: https://git.openjdk.org/jdk/pull/24355#discussion_r2026200378 PR Review Comment: https://git.openjdk.org/jdk/pull/24355#discussion_r2026202624 PR Review Comment: https://git.openjdk.org/jdk/pull/24355#discussion_r2026204009 PR Review Comment: https://git.openjdk.org/jdk/pull/24355#discussion_r2026207250 From chagedorn at openjdk.org Thu Apr 3 05:29:12 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 3 Apr 2025 05:29:12 GMT Subject: Integrated: 8353058: [PPC64] Some IR framework tests are failing after JDK-8352595 In-Reply-To: References: Message-ID: On Wed, 2 Apr 2025 06:45:58 GMT, Christian Hagedorn wrote: > `TestPhaseIRMatching` was recently updated with [JDK-8352595](https://bugs.openjdk.org/browse/JDK-8352595) which changed some matching on opto assembly from `IRNode.ALLOC` (now matching on ideal phases) to `IRNode.FIELD_ACCESS` (still matching on opto assembly). However, the updated code matches differently on PPC for some method invocation on a parameter which let the test fail on PPC: > > public Object defaultOnOptoAssembly(Helper h) { > return h.getString(); // emits one "Field: " string on most platforms but none on PPC > } > > > When I've revisited the test to analyze the failure, it was not evidently clear what I had in mind back there with `defaultOnX()`. My guess is that I've tried to have one method failing on ideal phases, one on mach phases and one on both while all the methods use `IRNode` entries that have default compile phases on ideal and mach phases. But that is not the case today. I've therefore rewritten the tests to adhere to my guess. I also removed the ambiguity among platforms to have the same number of field accesses on them. 
> > How to read the `@ExpectedFailure` annotation: > > @IR(failOn = {IRNode.STORE, IRNode.FIELD_ACCESS, IRNode.COUNTED_LOOP, IRNode.STORE_I}, > counts = {IRNode.STORE, "20", IRNode.FIELD_ACCESS, "1", IRNode.COUNTED_LOOP, "2", IRNode.OOPMAP_WITH, "asdf", "2"}) > // Expect rule with id 5 (the one directly above) to fail: > // - We fail when matching PRINT_IDEAL with the: > // - failOn attribute: The failing constraints are constraint 1 and 4 (while 2 and 3 pass) > // - counts attribute: The failing constraints are constraint 2 and 4 (while 1 and 3 pass). > @ExpectedFailure(ruleId = 5, phase = CompilePhase.PRINT_IDEAL, failOn = {1, 4}, counts = {1, 3}) > > > Thanks to @TheRealMDoerr for testing the patch on PPC! > > Thanks, > Christian This pull request has now been integrated. Changeset: 8d3d1d41 Author: Christian Hagedorn URL: https://git.openjdk.org/jdk/commit/8d3d1d41377cf2162aad374dce4bf7e1bcb8297c Stats: 54 lines in 1 file changed: 17 ins; 8 del; 29 mod 8353058: [PPC64] Some IR framework tests are failing after JDK-8352595 Reviewed-by: mchevalier, thartmann ------------- PR: https://git.openjdk.org/jdk/pull/24373 From epeter at openjdk.org Thu Apr 3 05:46:30 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 3 Apr 2025 05:46:30 GMT Subject: RFR: 8352869: Verify.checkEQ: extension for NaN, VectorAPI and arbitrary Objects [v8] In-Reply-To: References: Message-ID: > We should extend the functionality of Verify.checkEQ: > - Allow different NaN encodings to be seen as equal (by default). > - Compare VectorAPI vectors. > - Compare Exceptions, and their messages. > - Compare arbitrary Objects via Reflection. > > Note: this is a prerequisite for the Template Library [JDK-8352861](https://bugs.openjdk.org/browse/JDK-8352861) / https://github.com/openjdk/jdk/pull/23418. 
Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: Apply suggestions from code review Co-authored-by: Christian Hagedorn ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24224/files - new: https://git.openjdk.org/jdk/pull/24224/files/f2b3c371..8c3e9b91 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24224&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24224&range=06-07 Stats: 35 lines in 2 files changed: 0 ins; 5 del; 30 mod Patch: https://git.openjdk.org/jdk/pull/24224.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24224/head:pull/24224 PR: https://git.openjdk.org/jdk/pull/24224 From epeter at openjdk.org Thu Apr 3 05:53:50 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 3 Apr 2025 05:53:50 GMT Subject: RFR: 8352869: Verify.checkEQ: extension for NaN, VectorAPI and arbitrary Objects [v7] In-Reply-To: References: Message-ID: On Wed, 2 Apr 2025 14:01:52 GMT, Christian Hagedorn wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> refactor with checkEQWithRawBits > > test/hotspot/jtreg/compiler/lib/verify/Verify.java line 60: > >> 58: private final boolean isFloatCheckWithRawBits; >> 59: private final HashMap a2b = new HashMap<>(); >> 60: private final HashMap b2a = new HashMap<>(); > > Can you add a comment here what `a2b` and `b2a` means? See also some other comment further down about `a2b/b2a`, maybe you can share some docs or cross reference. I added some documentation :) > test/hotspot/jtreg/compiler/lib/verify/Verify.java line 488: > >> 486: Object aPrevious = b2a.get(b); >> 487: if (aPrevious == null && bPrevious == null) { >> 488: // Record for next time. > > Can you explain, maybe as comment at `checkAlreadyVisited()`, why we want to have these caches? 
Added documentation :) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24224#discussion_r2026251090 PR Review Comment: https://git.openjdk.org/jdk/pull/24224#discussion_r2026252113 From epeter at openjdk.org Thu Apr 3 05:57:51 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 3 Apr 2025 05:57:51 GMT Subject: RFR: 8352869: Verify.checkEQ: extension for NaN, VectorAPI and arbitrary Objects [v7] In-Reply-To: References: Message-ID: On Wed, 2 Apr 2025 14:11:13 GMT, Christian Hagedorn wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> refactor with checkEQWithRawBits > > test/hotspot/jtreg/compiler/lib/verify/Verify.java line 109: > >> 107: Class ca = a.getClass(); >> 108: Class cb = b.getClass(); >> 109: if (ca != cb) { > > Only seen this in my IDE: `ca` and `cb` should be `Class` instead of the raw `Class` since `getClass()` returns a `Class` (cannot make a suggestion since it's hidden here). fixed > test/hotspot/jtreg/compiler/lib/verify/Verify.java line 124: > >> 122: switch (a) { >> 123: case Object[] x -> checkEQimpl(x, (Object[])b, field, aParent, bParent); >> 124: case Byte x -> checkEQimpl(x, ((Byte)b).byteValue(), field, aParent, bParent); > > Can't you just pass `(Byte) b` to rely on auto unboxing instead? 
You are right, simplified it :) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24224#discussion_r2026257889 PR Review Comment: https://git.openjdk.org/jdk/pull/24224#discussion_r2026254841 From epeter at openjdk.org Thu Apr 3 06:05:52 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 3 Apr 2025 06:05:52 GMT Subject: RFR: 8352869: Verify.checkEQ: extension for NaN, VectorAPI and arbitrary Objects [v7] In-Reply-To: References: Message-ID: On Wed, 2 Apr 2025 14:12:59 GMT, Christian Hagedorn wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> refactor with checkEQWithRawBits > > test/hotspot/jtreg/compiler/lib/verify/Verify.java line 143: > >> 141: case Exception x -> checkEQimpl(x, (Exception) b, field, aParent, bParent); >> 142: default -> { >> 143: if (ca.getName().startsWith("jdk.incubator.vector") && ca.getName().contains("Vector")) { > > Might be worth to extract this case to own methods and structure it like this to reduce the size of the method: > > if (vectorClass()) { > checkEQForVectorAPIClass(); > } else { > checkEQdispatch(); > } Refactored it :) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24224#discussion_r2026266298 From epeter at openjdk.org Thu Apr 3 06:14:01 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 3 Apr 2025 06:14:01 GMT Subject: RFR: 8352869: Verify.checkEQ: extension for NaN, VectorAPI and arbitrary Objects [v7] In-Reply-To: References: Message-ID: On Wed, 2 Apr 2025 14:16:58 GMT, Christian Hagedorn wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> refactor with checkEQWithRawBits > > test/hotspot/jtreg/compiler/lib/verify/Verify.java line 187: > >> 185: private void checkEQimpl(char a, char b, String field, Object aParent, Object bParent) { >> 186: if (a != b) { >> 187: System.err.println("ERROR: Verify.checkEQ failed: value 
mismatch: " + (int)a + " vs " + (int)b); > > Why do you need an upcast here? Same for `short`. Look at this ;) jshell> char a = 66; a ==> 'B' jshell> System.out.println("a: " + a); a: B jshell> System.out.println("a: " + (int)a); a: 66 But I can remove the casts for `short`. > test/hotspot/jtreg/compiler/lib/verify/Verify.java line 254: > >> 252: private void checkEQimpl(float a, float b, String field, Object aParent, Object bParent) { >> 253: if (isFloatEQ(a, b)) { >> 254: System.err.println("ERROR: Verify.checkEQ failed: value mismatch. check raw: " + isFloatCheckWithRawBits); > > Just noticed this now (there are other places as well): Since we now have `Verify.checkEQ()` and `Verify.checkEQWithRawBits()`, it would improve the readability if we reported which method was used. It could be done with something like that (pseudo code): > > System.err.println("ERROR: Verify.checkEQ" + withRawBitsString() + " failed: value mismatch. > > String withRawBitsString() { > return isFloatCheckWithRawBits ? "WithRawBits" : ""; > } Boah. That is really going to bloat the code, don't you think? The exception that is thrown will already give you the complete stack trace, including which methods were called. Is that not good enough? > test/hotspot/jtreg/compiler/lib/verify/Verify.java line 256: > >> 254: System.err.println("ERROR: Verify.checkEQ failed: value mismatch. check raw: " + isFloatCheckWithRawBits); >> 255: System.err.println(" Values: " + a + " vs " + b); >> 256: System.err.println(" Raw: " + Float.floatToRawIntBits(a) + " vs " + Float.floatToRawIntBits(b)); > > Do we always want to dump the raw bits even when `isFloatCheckWithRawBits` is false? I guess it does not hurt. Yes, I want that. It can help if there are different `NaN` encodings. Or if we somehow reinterpreted integer values as floats. 
It's been useful for me in the past :) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24224#discussion_r2026269456 PR Review Comment: https://git.openjdk.org/jdk/pull/24224#discussion_r2026272974 PR Review Comment: https://git.openjdk.org/jdk/pull/24224#discussion_r2026274418 From epeter at openjdk.org Thu Apr 3 06:14:02 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 3 Apr 2025 06:14:02 GMT Subject: RFR: 8352869: Verify.checkEQ: extension for NaN, VectorAPI and arbitrary Objects [v7] In-Reply-To: References: Message-ID: On Thu, 3 Apr 2025 06:06:50 GMT, Emanuel Peter wrote: >> test/hotspot/jtreg/compiler/lib/verify/Verify.java line 187: >> >>> 185: private void checkEQimpl(char a, char b, String field, Object aParent, Object bParent) { >>> 186: if (a != b) { >>> 187: System.err.println("ERROR: Verify.checkEQ failed: value mismatch: " + (int)a + " vs " + (int)b); >> >> Why do you need an upcast here? Same for `short`. > > Look at this ;) > > jshell> char a = 66; > a ==> 'B' > > jshell> System.out.println("a: " + a); > a: B > > jshell> System.out.println("a: " + (int)a); > a: 66 > > > But I can remove the casts for `short`. Added a comment as well. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24224#discussion_r2026271338 From epeter at openjdk.org Thu Apr 3 06:22:24 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 3 Apr 2025 06:22:24 GMT Subject: RFR: 8352869: Verify.checkEQ: extension for NaN, VectorAPI and arbitrary Objects [v9] In-Reply-To: References: Message-ID: <779UjTYbMPKwYJmlILeIwI7WTewAG8XRu4dwzm2UR2E=.c364b368-c22b-4414-b700-9be7c24d0e9f@github.com> > We should extend the functionality of Verify.checkEQ: > - Allow different NaN encodings to be seen as equal (by default). > - Compare VectorAPI vectors. > - Compare Exceptions, and their messages. > - Compare arbitrary Objects via Reflection. 
> > Note: this is a prerequisite for the Template Library [JDK-8352861](https://bugs.openjdk.org/browse/JDK-8352861) / https://github.com/openjdk/jdk/pull/23418. Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: Updates for Christian ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24224/files - new: https://git.openjdk.org/jdk/pull/24224/files/8c3e9b91..a07f201e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24224&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24224&range=07-08 Stats: 96 lines in 1 file changed: 51 ins; 15 del; 30 mod Patch: https://git.openjdk.org/jdk/pull/24224.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24224/head:pull/24224 PR: https://git.openjdk.org/jdk/pull/24224 From epeter at openjdk.org Thu Apr 3 06:22:25 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 3 Apr 2025 06:22:25 GMT Subject: RFR: 8352869: Verify.checkEQ: extension for NaN, VectorAPI and arbitrary Objects [v7] In-Reply-To: References: Message-ID: On Wed, 2 Apr 2025 14:57:12 GMT, Christian Hagedorn wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> refactor with checkEQWithRawBits > > Thanks for the update! It's much easier to use and understand now I think. > > I did a complete pass and left a lot of comments but mostly minor things. Overall, I think this looks great! :-) @chhagedorn Thanks for the thorough review :) I think I addressed all your comments ? > test/hotspot/jtreg/compiler/lib/verify/Verify.java line 236: > >> 234: * pattern in all cases, except for NaN we project to the canonical NaN, using Float.floatToIntBits. >> 235: */ >> 236: private boolean isFloatEQ(float a, float b) { > > Shouldn't this be named `isFloatNotEQ` since you return true when they are different? Same for `isDoubleEQ` below. Alternatively: Return true when they are equal (i.e. flip condition). Good catch! 
Flipped the condition :) > test/hotspot/jtreg/compiler/lib/verify/Verify.java line 316: > >> 314: * Verify that the content of two MemorySegments is identical. Note: we do not check the >> 315: * backing type, only the size and content. >> 316: */ > > Probably a copy-paste error. Should be updated for exceptions. Good catch, updated it! > test/hotspot/jtreg/compiler/lib/verify/Verify.java line 463: > >> 461: private void print(Object a, Object b, String field, Object aParent, Object bParent) { >> 462: System.err.println(" aParent: " + aParent); >> 463: System.err.println(" bParent: " + bParent); > > Should we print `null` parents or just skip them? I think it does not hurt to print `null` here. It makes the code a little simpler. > test/hotspot/jtreg/compiler/lib/verify/Verify.java line 520: > >> 518: long start = Long.max(offset - range, 0); >> 519: long end = Long.min(offset + range, a.byteSize()); >> 520: for (long i = start; i < end; i++) { > > Nit below: You can replace `System.err.println("")` with `System.err.println()`. done! ------------- PR Comment: https://git.openjdk.org/jdk/pull/24224#issuecomment-2774605365 PR Review Comment: https://git.openjdk.org/jdk/pull/24224#discussion_r2026279491 PR Review Comment: https://git.openjdk.org/jdk/pull/24224#discussion_r2026276939 PR Review Comment: https://git.openjdk.org/jdk/pull/24224#discussion_r2026280168 PR Review Comment: https://git.openjdk.org/jdk/pull/24224#discussion_r2026281282 From epeter at openjdk.org Thu Apr 3 06:27:33 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 3 Apr 2025 06:27:33 GMT Subject: RFR: 8352869: Verify.checkEQ: extension for NaN, VectorAPI and arbitrary Objects [v10] In-Reply-To: References: Message-ID: > We should extend the functionality of Verify.checkEQ: > - Allow different NaN encodings to be seen as equal (by default). > - Compare VectorAPI vectors. > - Compare Exceptions, and their messages. > - Compare arbitrary Objects via Reflection. 
> > Note: this is a prerequisite for the Template Library [JDK-8352861](https://bugs.openjdk.org/browse/JDK-8352861) / https://github.com/openjdk/jdk/pull/23418. Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: fix whitespace issues ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24224/files - new: https://git.openjdk.org/jdk/pull/24224/files/a07f201e..752679ae Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24224&range=09 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24224&range=08-09 Stats: 7 lines in 1 file changed: 0 ins; 0 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/24224.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24224/head:pull/24224 PR: https://git.openjdk.org/jdk/pull/24224 From hgreule at openjdk.org Thu Apr 3 07:32:50 2025 From: hgreule at openjdk.org (Hannes Greule) Date: Thu, 3 Apr 2025 07:32:50 GMT Subject: RFR: 8353359: C2: Or(I|L)Node::Ideal is missing AddNode::Ideal call In-Reply-To: References: Message-ID: On Wed, 2 Apr 2025 07:29:10 GMT, Emanuel Peter wrote: > @SirYwell Wow, good find! Oh dear, things like this are so easy to get wrong. Thanks for writing the IR test, that seems really to be the only way to ensure we don't get these kinds of regressions. I wonder how many more of these kinds of issues we have... Optimal would be if we had IR tests for every optimization, but that would be a lot of work! > > I'm running some testing, please ping me in 24h for the results! Thanks @eme64, did the test go through? I'm wondering now if there should rather be something like an `AddNodeIdealizationTests.java` that contains the optimizations of AddNode::ideal for all(*) its subtypes rather than more specific test classes testing a mix of optimizations from different Ideal methods (e.g., `AddINodeIdealizationTests.java` has a test for `(x + 1) + 2 => x + 3`). 
I'm not sure if your current work on the template library could somehow cover that (replacing operators, replacing IR check rules). (*) Not all, as there are some subtypes for which the optimizations don't apply. I also noticed that I didn't add the bug id to the test headers here, I'll add them before merging. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24348#issuecomment-2774734906 From epeter at openjdk.org Thu Apr 3 07:40:48 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 3 Apr 2025 07:40:48 GMT Subject: RFR: 8353359: C2: Or(I|L)Node::Ideal is missing AddNode::Ideal call In-Reply-To: References: Message-ID: On Thu, 3 Apr 2025 07:29:50 GMT, Hannes Greule wrote: >> @SirYwell Wow, good find! Oh dear, things like this are so easy to get wrong. Thanks for writing the IR test, that seems really to be the only way to ensure we don't get these kinds of regressions. I wonder how many more of these kinds of issues we have... Optimal would be if we had IR tests for every optimization, but that would be a lot of work! >> >> I'm running some testing, please ping me in 24h for the results! > >> @SirYwell Wow, good find! Oh dear, things like this are so easy to get wrong. Thanks for writing the IR test, that seems really to be the only way to ensure we don't get these kinds of regressions. I wonder how many more of these kinds of issues we have... Optimal would be if we had IR tests for every optimization, but that would be a lot of work! >> >> I'm running some testing, please ping me in 24h for the results! > > Thanks @eme64, did the test go through? > > I'm wondering now if there should rather be something like an `AddNodeIdealizationTests.java` that contains the optimizations of AddNode::ideal for all(*) its subtypes rather than more specific test classes testing a mix of optimizations from different Ideal methods (e.g., `AddINodeIdealizationTests.java` has a test for `(x + 1) + 2 => x + 3`). 
I'm not sure if your current work on the template library could somehow cover that (replacing operators, replacing IR check rules). > (*) Not all, as there are some subtypes for which the optimizations don't apply. > > I also noticed that I didn't add the bug id to the test headers here, I'll add them before merging. @SirYwell Something like a `AddINodeIdealizationTests.java` sounds like a good idea. We could systematically cover `Value`, `Ideal` and `Identity`, for every single node. A good structure would really help. But collecting / writing all those IR tests is a lot of work... But we could at least start setting it up, and extend it over time. But that is work for some separate RFE's, I'll discuss it with my co-workers. Not sure if Templates really help. Because the tedious work is capturing all the patterns, and writing IR rules. That's hard to automate I think. But maybe you have some good ideas here :) Well, I suppose some patterns go over multiple types, so there we could do something. And maybe we can still cut a lot of boiler-plate code with Templates, and get a better overview that way... worth thinking about a little more! Tests have passed. I'll wait with approval until you make the updates :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/24348#issuecomment-2774753603 From hgreule at openjdk.org Thu Apr 3 07:52:21 2025 From: hgreule at openjdk.org (Hannes Greule) Date: Thu, 3 Apr 2025 07:52:21 GMT Subject: RFR: 8353359: C2: Or(I|L)Node::Ideal is missing AddNode::Ideal call [v2] In-Reply-To: References: Message-ID: <4qrmlvl4NrkGCFYYwkvzbmjQwAlIZhpyFHiol3m3NpY=.5dc8822d-3185-4569-8352-965894ba0149@github.com> > Hi, > > this simple change adds a missing AddNode::Ideal call to Or(I|L)Node::Ideal. See the added tests for examples of optimizations that don't apply without this change. > > Please let me know what you think. 
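As a rough illustration of the kind of expression this change is about, here is a plain Java sketch. It assumes that constant reassociation, in the style of the `(x + 1) + 2 => x + 3` example mentioned in this thread, is among the shared `AddNode::Ideal` optimizations meant here. It is not one of the IR tests from the pull request, the names are hypothetical, and it only shows that the chained and folded shapes compute the same value.

```java
public class OrReassociationSketch {
    // Shape as written in source: two Or operations with constant inputs.
    static int chained(int x) {
        return (x | 1) | 2;
    }

    // Shape a constant-reassociating rewrite would produce: a single Or.
    static int folded(int x) {
        return x | 3;
    }

    // The same pair of shapes for long operands.
    static long chainedLong(long x) {
        return (x | 8L) | 16L;
    }

    static long foldedLong(long x) {
        return x | 24L;
    }

    public static void main(String[] args) {
        int[] samples = {0, 1, 4, -1, Integer.MIN_VALUE, 12345};
        for (int x : samples) {
            if (chained(x) != folded(x) || chainedLong(x) != foldedLong(x)) {
                throw new AssertionError("mismatch for " + x);
            }
        }
        System.out.println("chained and folded forms agree on all samples");
    }
}
```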
Hannes Greule has updated the pull request incrementally with two additional commits since the last revision: - update license year - add bug id ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24348/files - new: https://git.openjdk.org/jdk/pull/24348/files/f7fb76da..2584807a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24348&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24348&range=00-01 Stats: 4 lines in 2 files changed: 0 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/24348.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24348/head:pull/24348 PR: https://git.openjdk.org/jdk/pull/24348 From epeter at openjdk.org Thu Apr 3 07:52:21 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 3 Apr 2025 07:52:21 GMT Subject: RFR: 8353359: C2: Or(I|L)Node::Ideal is missing AddNode::Ideal call [v2] In-Reply-To: <4qrmlvl4NrkGCFYYwkvzbmjQwAlIZhpyFHiol3m3NpY=.5dc8822d-3185-4569-8352-965894ba0149@github.com> References: <4qrmlvl4NrkGCFYYwkvzbmjQwAlIZhpyFHiol3m3NpY=.5dc8822d-3185-4569-8352-965894ba0149@github.com> Message-ID: On Thu, 3 Apr 2025 07:49:46 GMT, Hannes Greule wrote: >> Hi, >> >> this simple change adds a missing AddNode::Ideal call to Or(I|L)Node::Ideal. See the added tests for examples of optimizations that don't apply without this change. >> >> Please let me know what you think. > > Hannes Greule has updated the pull request incrementally with two additional commits since the last revision: > > - update license year > - add bug id Thanks for the updates and the fix :) ------------- Marked as reviewed by epeter (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/24348#pullrequestreview-2738839095 From hgreule at openjdk.org Thu Apr 3 07:56:56 2025 From: hgreule at openjdk.org (Hannes Greule) Date: Thu, 3 Apr 2025 07:56:56 GMT Subject: RFR: 8353359: C2: Or(I|L)Node::Ideal is missing AddNode::Ideal call [v2] In-Reply-To: <4qrmlvl4NrkGCFYYwkvzbmjQwAlIZhpyFHiol3m3NpY=.5dc8822d-3185-4569-8352-965894ba0149@github.com> References: <4qrmlvl4NrkGCFYYwkvzbmjQwAlIZhpyFHiol3m3NpY=.5dc8822d-3185-4569-8352-965894ba0149@github.com> Message-ID: On Thu, 3 Apr 2025 07:52:21 GMT, Hannes Greule wrote: >> Hi, >> >> this simple change adds a missing AddNode::Ideal call to Or(I|L)Node::Ideal. See the added tests for examples of optimizations that don't apply without this change. >> >> Please let me know what you think. > > Hannes Greule has updated the pull request incrementally with two additional commits since the last revision: > > - update license year > - add bug id Thanks, I'll wait for a second review I guess? ------------- PR Comment: https://git.openjdk.org/jdk/pull/24348#issuecomment-2774786364 From epeter at openjdk.org Thu Apr 3 07:56:57 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 3 Apr 2025 07:56:57 GMT Subject: RFR: 8353359: C2: Or(I|L)Node::Ideal is missing AddNode::Ideal call [v2] In-Reply-To: References: <4qrmlvl4NrkGCFYYwkvzbmjQwAlIZhpyFHiol3m3NpY=.5dc8822d-3185-4569-8352-965894ba0149@github.com> Message-ID: On Thu, 3 Apr 2025 07:53:06 GMT, Hannes Greule wrote: >> Hannes Greule has updated the pull request incrementally with two additional commits since the last revision: >> >> - update license year >> - add bug id > > Thanks, I'll wait for a second review I guess? 
@SirYwell Ah, yes, for compiler changes we require 2 reviews :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/24348#issuecomment-2774788937 From chagedorn at openjdk.org Thu Apr 3 07:56:59 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 3 Apr 2025 07:56:59 GMT Subject: RFR: 8352869: Verify.checkEQ: extension for NaN, VectorAPI and arbitrary Objects [v7] In-Reply-To: References: Message-ID: On Thu, 3 Apr 2025 06:08:41 GMT, Emanuel Peter wrote: >> Look at this ;) >> >> jshell> char a = 66; >> a ==> 'B' >> >> jshell> System.out.println("a: " + a); >> a: B >> >> jshell> System.out.println("a: " + (int)a); >> a: 66 >> >> >> But I can remove the casts for `short`. > > Added a comment as well. Right, that makes sense for the `char` case. But good that we could remove it for the `short` case. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24224#discussion_r2026372872 From chagedorn at openjdk.org Thu Apr 3 07:56:59 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 3 Apr 2025 07:56:59 GMT Subject: RFR: 8352869: Verify.checkEQ: extension for NaN, VectorAPI and arbitrary Objects [v10] In-Reply-To: References: Message-ID: On Thu, 3 Apr 2025 06:27:33 GMT, Emanuel Peter wrote: >> We should extend the functionality of Verify.checkEQ: >> - Allow different NaN encodings to be seen as equal (by default). >> - Compare VectorAPI vectors. >> - Compare Exceptions, and their messages. >> - Compare arbitrary Objects via Reflection. >> >> Note: this is a prerequisite for the Template Library [JDK-8352861](https://bugs.openjdk.org/browse/JDK-8352861) / https://github.com/openjdk/jdk/pull/23418. > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > fix whitespace issues Thanks for addressing all my comments and doing the updates! I have some more final comments but then I think it's good to go from my side! 
test/hotspot/jtreg/compiler/lib/verify/Verify.java line 62: > 60: * When comparing arbitrary classes recursively, we need to remember which > 61: * pairs of objects {@code (a, b)} we have already visited. The maps > 62: * {@link a2b} and {@link b2a} track these edges. Caching which pairs I think it's fine to use `code` here since the Javadocs links to itself otherwise. Suggestion: * {@code a2b} and {@code b2a} track these edges. Caching which pairs test/hotspot/jtreg/compiler/lib/verify/Verify.java line 77: > 75: * Verify the contents of two Objects on a raw bit level, possibly recursively. > 76: * Different NaN encodings are considered non-equal, since we compare > 77: * floating number by their raw bits. Suggestion: * floating numbers by their raw bits. test/hotspot/jtreg/compiler/lib/verify/Verify.java line 90: > 88: /** > 89: * Verify the contents of two Objects, possibly recursively. > 90: * Different NaN encodins are considered equal. Suggestion: * Different NaN encodings are considered equal. test/hotspot/jtreg/compiler/lib/verify/Verify.java line 307: > 305: * Verify that two Exceptions have the same message. Messages are not always carried, > 306: * they are often dropped to performance, and that is ok. But if both Exceptions have > 307: * the message, we should compare them. Suggestion: * they are often dropped for performance reasons, and that is okay. But if both Exceptions * have the message, we should compare them. test/hotspot/jtreg/compiler/lib/verify/Verify.java line 438: > 436: * to add "--add-modules=jdk.incubator.vector" to the command-line of every test that uses the Verify > 437: * class. So we hack this via reflection. > 438: */ I think this background is only needed at `checkEQForVectorAPIClass()` (where you already have that comment). 
Here you can just describe what the code actually does or just drop the comment entirely since the method name is self-explanatory :-) test/hotspot/jtreg/compiler/lib/verify/Verify.java line 495: > 493: * When comparing arbitrary classes recursively, we need to remember which > 494: * pairs of objects {@code (a, b)} we have already visited. The maps > 495: * {@link a2b} and {@link b2a} track these edges. Caching which pairs Suggestion: * {@link #a2b} and {@link #b2a} track these edges. Caching which pairs ------------- PR Review: https://git.openjdk.org/jdk/pull/24224#pullrequestreview-2738785943 PR Review Comment: https://git.openjdk.org/jdk/pull/24224#discussion_r2026406773 PR Review Comment: https://git.openjdk.org/jdk/pull/24224#discussion_r2026370608 PR Review Comment: https://git.openjdk.org/jdk/pull/24224#discussion_r2026371049 PR Review Comment: https://git.openjdk.org/jdk/pull/24224#discussion_r2026396345 PR Review Comment: https://git.openjdk.org/jdk/pull/24224#discussion_r2026401231 PR Review Comment: https://git.openjdk.org/jdk/pull/24224#discussion_r2026407956 From chagedorn at openjdk.org Thu Apr 3 07:57:01 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 3 Apr 2025 07:57:01 GMT Subject: RFR: 8352869: Verify.checkEQ: extension for NaN, VectorAPI and arbitrary Objects [v7] In-Reply-To: References: Message-ID: On Thu, 3 Apr 2025 06:10:14 GMT, Emanuel Peter wrote: >> test/hotspot/jtreg/compiler/lib/verify/Verify.java line 254: >> >>> 252: private void checkEQimpl(float a, float b, String field, Object aParent, Object bParent) { >>> 253: if (isFloatEQ(a, b)) { >>> 254: System.err.println("ERROR: Verify.checkEQ failed: value mismatch. check raw: " + isFloatCheckWithRawBits); >> >> Just noticed this now (there are other places as well): Since we now have `Verify.checkEQ()` and `Verify.checkEQWithRawBits()`, it would improve the readability if we reported which method was used. 
It could be done with something like this (pseudo code): >> >> System.err.println("ERROR: Verify.checkEQ" + withRawBitsString() + " failed: value mismatch. >> >> String withRawBitsString() { >> return isFloatCheckWithRawBits ? "WithRawBits" : ""; >> } > > Boah. That is really going to bloat the code, don't you think? > The exception that is thrown will already give you the complete stack trace, including which methods were called. Is that not good enough? Hm, it could indeed be a little bit more complicated when you are deep down in a recursion. My thought was that it could be misleading when a test is using a mix of `checkEQ()` and `checkEQWithRawBits()` and you only read that `checkEQ` failed. You could start looking at the wrong check even though the stack trace would have guided you to the correct place. Maybe we can just update "Verify.checkEQ" into something more generic like "Equality matching failed" and we're good. What do you think? >> test/hotspot/jtreg/compiler/lib/verify/Verify.java line 256: >> >>> 254: System.err.println("ERROR: Verify.checkEQ failed: value mismatch. check raw: " + isFloatCheckWithRawBits); >>> 255: System.err.println(" Values: " + a + " vs " + b); >>> 256: System.err.println(" Raw: " + Float.floatToRawIntBits(a) + " vs " + Float.floatToRawIntBits(b)); >> >> Do we always want to dump the raw bits even when `isFloatCheckWithRawBits` is false? I guess it does not hurt. > > Yes, I want that. It can help if there are different `NaN` encodings. Or if we somehow reinterpreted integer values as floats. It's been useful for me in the past :) Sounds good, let's leave it in then! >> test/hotspot/jtreg/compiler/lib/verify/Verify.java line 463: >> >>> 461: private void print(Object a, Object b, String field, Object aParent, Object bParent) { >>> 462: System.err.println(" aParent: " + aParent); >>> 463: System.err.println(" bParent: " + bParent); >> >> Should we print `null` parents or just skip them?
> > I think it does not hurt to print `null` here. It makes the code a little simpler. Okay, maybe we can print `` in case of a null for more clarity? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24224#discussion_r2026382031 PR Review Comment: https://git.openjdk.org/jdk/pull/24224#discussion_r2026386633 PR Review Comment: https://git.openjdk.org/jdk/pull/24224#discussion_r2026402954 From epeter at openjdk.org Thu Apr 3 08:09:57 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 3 Apr 2025 08:09:57 GMT Subject: RFR: 8352869: Verify.checkEQ: extension for NaN, VectorAPI and arbitrary Objects [v11] In-Reply-To: References: Message-ID: > We should extend the functionality of Verify.checkEQ: > - Allow different NaN encodings to be seen as equal (by default). > - Compare VectorAPI vectors. > - Compare Exceptions, and their messages. > - Compare arbitrary Objects via Reflection. > > Note: this is a prerequisite for the Template Library [JDK-8352861](https://bugs.openjdk.org/browse/JDK-8352861) / https://github.com/openjdk/jdk/pull/23418. 
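The difference between the default check and the raw-bits check described above can be illustrated with a minimal Java sketch. It is not taken from `Verify.java`: the class `NaNCompareSketch` and the helper `sameFloat` are hypothetical names, and the sketch only assumes the documented behaviour of `Float.floatToIntBits` (collapses every NaN to one canonical encoding) and `Float.floatToRawIntBits` (keeps the encoding as-is).

```java
public class NaNCompareSketch {
    // Hypothetical helper, not the actual Verify implementation.
    // rawBits == false: all NaN encodings compare as equal.
    // rawBits == true:  only bit-identical floats compare as equal.
    static boolean sameFloat(float a, float b, boolean rawBits) {
        if (rawBits) {
            return Float.floatToRawIntBits(a) == Float.floatToRawIntBits(b);
        }
        // floatToIntBits canonicalizes every NaN to the same bit pattern.
        return Float.floatToIntBits(a) == Float.floatToIntBits(b);
    }

    public static void main(String[] args) {
        float canonicalNaN = Float.NaN;
        // A quiet NaN with a non-zero payload; platforms normally preserve it.
        float payloadNaN = Float.intBitsToFloat(0x7fc00001);

        System.out.println("default check:  " + sameFloat(canonicalNaN, payloadNaN, false));
        System.out.println("raw-bits check: " + sameFloat(canonicalNaN, payloadNaN, true));
    }
}
```

Note that both modes in this sketch still distinguish `0.0f` from `-0.0f`, since their bit patterns differ; whether `Verify` treats signed zeros the same way is not stated in the thread.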
Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: Apply suggestions from code review Co-authored-by: Christian Hagedorn ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24224/files - new: https://git.openjdk.org/jdk/pull/24224/files/752679ae..ccb8c4b7 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24224&range=10 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24224&range=09-10 Stats: 6 lines in 1 file changed: 0 ins; 0 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/24224.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24224/head:pull/24224 PR: https://git.openjdk.org/jdk/pull/24224 From mchevalier at openjdk.org Thu Apr 3 08:14:00 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Thu, 3 Apr 2025 08:14:00 GMT Subject: RFR: 8353341: C2: removal of a Mod[DF]Node crashes when the node is already dead [v3] In-Reply-To: References: Message-ID: On Wed, 2 Apr 2025 08:08:07 GMT, Marc Chevalier wrote: >> If the Mod[DF]Node has no control projection when it's being removed (because its result is unused), `extract_projections` will fail an assert. So, let's skip the removal. >> >> But that should happen only when the nodes are already unreachable (control input being transitively top). At the end of the day, the node should be dropped. because of that, so there is no rush, and let dead node deletion do the job. >> >> On the reduced reproducer, the crash is not common (even with `-XX:RepeatCompilation=300`, it might need more than a run to reproduce). So I've tried my fix on multiple thousands repeat compilations (by 300 packs) without a crash, and without having the modulo node alive at the end. >> >> For instance, that's what happen on the reproducer. Quickly, some big sub-graph is dead, but nodes stay a while in the graph: >> >> Then: >> >> And eventually, everything is removed, so the control projection is removed, and `extract_projections` doesn't like it. 
>> >> >> Thanks, >> Marc > > Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: > > Address review comments, part 2 Thanks @chhagedorn and @TobiHartmann! ------------- PR Comment: https://git.openjdk.org/jdk/pull/24375#issuecomment-2774834063 From duke at openjdk.org Thu Apr 3 08:14:01 2025 From: duke at openjdk.org (duke) Date: Thu, 3 Apr 2025 08:14:01 GMT Subject: RFR: 8353341: C2: removal of a Mod[DF]Node crashes when the node is already dead [v3] In-Reply-To: References: Message-ID: <39fyYyWXS86ospwWICUF9t7L8fQ1XI6eRbH-6bg68Es=.a5409f85-85eb-400a-8023-58195a648c0a@github.com> On Wed, 2 Apr 2025 08:08:07 GMT, Marc Chevalier wrote: >> If the Mod[DF]Node has no control projection when it's being removed (because its result is unused), `extract_projections` will fail an assert. So, let's skip the removal. >> >> But that should happen only when the nodes are already unreachable (control input being transitively top). At the end of the day, the node should be dropped. because of that, so there is no rush, and let dead node deletion do the job. >> >> On the reduced reproducer, the crash is not common (even with `-XX:RepeatCompilation=300`, it might need more than a run to reproduce). So I've tried my fix on multiple thousands repeat compilations (by 300 packs) without a crash, and without having the modulo node alive at the end. >> >> For instance, that's what happen on the reproducer. Quickly, some big sub-graph is dead, but nodes stay a while in the graph: >> >> Then: >> >> And eventually, everything is removed, so the control projection is removed, and `extract_projections` doesn't like it. >> >> >> Thanks, >> Marc > > Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: > > Address review comments, part 2 @marc-chevalier Your change (at version 48bd2037a9241f4c2956b19e91585553249e2625) is now ready to be sponsored by a Committer. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/24375#issuecomment-2774836855 From epeter at openjdk.org Thu Apr 3 08:15:36 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 3 Apr 2025 08:15:36 GMT Subject: RFR: 8352869: Verify.checkEQ: extension for NaN, VectorAPI and arbitrary Objects [v12] In-Reply-To: References: Message-ID: > We should extend the functionality of Verify.checkEQ: > - Allow different NaN encodings to be seen as equal (by default). > - Compare VectorAPI vectors. > - Compare Exceptions, and their messages. > - Compare arbitrary Objects via Reflection. > > Note: this is a prerequisite for the Template Library [JDK-8352861](https://bugs.openjdk.org/browse/JDK-8352861) / https://github.com/openjdk/jdk/pull/23418. Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: For Christian ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24224/files - new: https://git.openjdk.org/jdk/pull/24224/files/ccb8c4b7..b8fad69c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24224&range=11 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24224&range=10-11 Stats: 27 lines in 1 file changed: 0 ins; 5 del; 22 mod Patch: https://git.openjdk.org/jdk/pull/24224.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24224/head:pull/24224 PR: https://git.openjdk.org/jdk/pull/24224 From epeter at openjdk.org Thu Apr 3 08:15:36 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 3 Apr 2025 08:15:36 GMT Subject: RFR: 8352869: Verify.checkEQ: extension for NaN, VectorAPI and arbitrary Objects [v10] In-Reply-To: References: Message-ID: <_8a_-zsFB8h10ir2y4whdQHDkzHn91yflaplv2o9bZ8=.0259d2a1-2aec-49c0-9d51-6774f9ed41f5@github.com> On Thu, 3 Apr 2025 07:54:24 GMT, Christian Hagedorn wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> fix whitespace issues > > Thanks for addressing all my comments and 
doing the updates! I have some more final comments but then I think it's good to go from my side! @chhagedorn Thanks for having another look! I applied all your suggestions :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/24224#issuecomment-2774838773 From chagedorn at openjdk.org Thu Apr 3 08:25:09 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 3 Apr 2025 08:25:09 GMT Subject: RFR: 8352869: Verify.checkEQ: extension for NaN, VectorAPI and arbitrary Objects [v12] In-Reply-To: References: Message-ID: On Thu, 3 Apr 2025 08:15:36 GMT, Emanuel Peter wrote: >> We should extend the functionality of Verify.checkEQ: >> - Allow different NaN encodings to be seen as equal (by default). >> - Compare VectorAPI vectors. >> - Compare Exceptions, and their messages. >> - Compare arbitrary Objects via Reflection. >> >> Note: this is a prerequisite for the Template Library [JDK-8352861](https://bugs.openjdk.org/browse/JDK-8352861) / https://github.com/openjdk/jdk/pull/23418. > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > For Christian That looks good to me, thanks for bearing with me! ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24224#pullrequestreview-2738937793 From mchevalier at openjdk.org Thu Apr 3 08:41:12 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Thu, 3 Apr 2025 08:41:12 GMT Subject: Integrated: 8353341: C2: removal of a Mod[DF]Node crashes when the node is already dead In-Reply-To: References: Message-ID: <8GAvhDoJ3ji1WXZCij79wdxfbY3fr7qtbU8yt5swpPg=.0987d06f-b27d-4c90-996b-15f969727577@github.com> On Wed, 2 Apr 2025 07:19:35 GMT, Marc Chevalier wrote: > If the Mod[DF]Node has no control projection when it's being removed (because its result is unused), `extract_projections` will fail an assert. So, let's skip the removal. 
> > But that should happen only when the nodes are already unreachable (control input being transitively top). At the end of the day, the node should be dropped. because of that, so there is no rush, and let dead node deletion do the job. > > On the reduced reproducer, the crash is not common (even with `-XX:RepeatCompilation=300`, it might need more than a run to reproduce). So I've tried my fix on multiple thousands repeat compilations (by 300 packs) without a crash, and without having the modulo node alive at the end. > > For instance, that's what happen on the reproducer. Quickly, some big sub-graph is dead, but nodes stay a while in the graph: > > Then: > > And eventually, everything is removed, so the control projection is removed, and `extract_projections` doesn't like it. > > > Thanks, > Marc This pull request has now been integrated. Changeset: 00a038e9 Author: Marc Chevalier Committer: Christian Hagedorn URL: https://git.openjdk.org/jdk/commit/00a038e9c559401b7934f30b4719010bb1024291 Stats: 95 lines in 2 files changed: 93 ins; 0 del; 2 mod 8353341: C2: removal of a Mod[DF]Node crashes when the node is already dead Reviewed-by: thartmann, chagedorn ------------- PR: https://git.openjdk.org/jdk/pull/24375 From jbhateja at openjdk.org Thu Apr 3 08:47:01 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 3 Apr 2025 08:47:01 GMT Subject: RFR: 8352585: Add special case handling for Float16.max/min x86 backend [v5] In-Reply-To: References: Message-ID: <26OsQVzaWO4t7wkKEnbhxfXerfizKktb-EX3ncNzBKE=.81aabcbe-58ea-4d4e-9c0f-84db4759b676@github.com> On Wed, 2 Apr 2025 06:31:20 GMT, Emanuel Peter wrote: >> Hi @eme64 , >> This specific issues is around special Float16 values i.e +/- 0.0 and NaN. >> I have added a Generator for Float16 as part of https://github.com/openjdk/jdk/pull/22755 >> >> Best Regards, >> Jatin > > @jatin-bhateja It looks reasonable to me now. Let me run some testing, ping me in 24h for the results! 
ping @eme64, kindly approve if your tests are all green. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24169#issuecomment-2774916278 From chagedorn at openjdk.org Thu Apr 3 08:58:49 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 3 Apr 2025 08:58:49 GMT Subject: RFR: 8353359: C2: Or(I|L)Node::Ideal is missing AddNode::Ideal call [v2] In-Reply-To: <4qrmlvl4NrkGCFYYwkvzbmjQwAlIZhpyFHiol3m3NpY=.5dc8822d-3185-4569-8352-965894ba0149@github.com> References: <4qrmlvl4NrkGCFYYwkvzbmjQwAlIZhpyFHiol3m3NpY=.5dc8822d-3185-4569-8352-965894ba0149@github.com> Message-ID: On Thu, 3 Apr 2025 07:52:21 GMT, Hannes Greule wrote: >> Hi, >> >> this simple change adds a missing AddNode::Ideal call to Or(I|L)Node::Ideal. See the added tests for examples of optimizations that don't apply without this change. >> >> Please let me know what you think. > > Hannes Greule has updated the pull request incrementally with two additional commits since the last revision: > > - update license year > - add bug id Indeed, good catch! These bugs are very hard to find, but I'm afraid there is, unfortunately, not much we can do about it other than having more IR tests to catch these cases. > I'm wondering now if there should rather be something like an `AddNodeIdealizationTests.java` that contains the optimizations of AddNode::ideal for all(*) its subtypes rather than more specific test classes testing a mix of optimizations from different Ideal methods (e.g., `AddINodeIdealizationTests.java` has a test for `(x + 1) + 2 => x + 3`). I would recommend splitting them up further so that the tests are easier to find again. I would probably first search for an `Or*Tests.java` instead of looking into `Add*Tests.java` when checking tests for `OrINode`. We can still group together multiple nodes if they only differ in the basic types like `AddI` and `AddL`. But this can still be discussed when such tests are added, which I totally agree we should have.
Adding tests can be done incrementally and even in small or "not yet completely covering a node with many transformation" batches. > We could systematically cover Value, Ideal and Identity, for every single node That would be great! > Because the tedious work is capturing all the patterns, and writing IR rules. What might help here is when the documentation for the `Ideal()`, `Identity()`, and `Value()` methods would enumerate the different optimizations which we want to check and add the numbers to the method body where it's implemented. That would not only help to map documentation to code and spot potentially missing/wrong promises but also helps with writing clearly map-able tests: We can simply write `testAddINodeCase3b()`, for example instead of `testAddISomeHardToMapOptimizationName()`. The only downside of that: when we change the enumerations, the tests are no longer in sync. But I guess that's an okay price to pay. > But that is work for some separate RFE's Definitely! ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24348#pullrequestreview-2739029429 From epeter at openjdk.org Thu Apr 3 09:05:56 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 3 Apr 2025 09:05:56 GMT Subject: RFR: 8352585: Add special case handling for Float16.max/min x86 backend [v7] In-Reply-To: References: Message-ID: On Tue, 1 Apr 2025 16:17:22 GMT, Jatin Bhateja wrote: >> This bugfix patch adds the special handling as per x86 AVX512-FP16 ISA specification[1][2] to compute max/min operations with +/-0.0 or NaN operands. >> >> Special handling leverage the instruction semantic, central idea is to shuffle the operands such that smaller input gets assigned to second operand for min operation or a larger input gets assigned to second operand for max operation, in addition result equals NaN if an unordered comparison detects first input as a NaN value else we return the result of min/max operation. 
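The operand-shuffling idea in the paragraph above can be sketched in plain Java. This is only an illustration under stated assumptions: it uses `float` instead of Float16, `hwMin` models the documented x86 scalar-min rule (the second operand is returned when either input is NaN or both are zeros), and choosing the operand order by the sign bit of the first input is an assumption of this sketch, not necessarily what the patch emits. The names `MinSpecialCases`, `hwMin` and `javaLikeMin` are hypothetical.

```java
public class MinSpecialCases {
    // Models the x86 scalar min rule: (a < b) ? a : b.
    // The comparison is false for NaN inputs and for +0.0 vs -0.0,
    // so in those cases the second operand is returned.
    static float hwMin(float a, float b) {
        return a < b ? a : b;
    }

    // Java-style min built on top of hwMin (sketch only).
    static float javaLikeMin(float a, float b) {
        // Assumption of this sketch: if a has its sign bit set, treat it as
        // the "smaller" candidate and place it in the second operand slot.
        boolean aSignSet = Float.floatToRawIntBits(a) < 0;
        float x1 = aSignSet ? b : a;
        float x2 = aSignSet ? a : b;
        float dst = hwMin(x1, x2);
        // A NaN in the second slot is already forwarded by hwMin;
        // a NaN in the first slot needs the explicit unordered check.
        return Float.isNaN(x1) ? x1 : dst;
    }

    public static void main(String[] args) {
        float[] samples = {0.0f, -0.0f, 1.0f, -1.0f, Float.NaN};
        for (float a : samples) {
            for (float b : samples) {
                System.out.println("min(" + a + ", " + b + ") = " + javaLikeMin(a, b)
                        + " (Math.min: " + Math.min(a, b) + ")");
            }
        }
    }
}
```

The max case would mirror this with the larger input placed in the second operand, as the description says; it is left out here to keep the sketch short.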
>> >> Kindly review and share your feedback. >> >> Best Regards, >> Jatin >> >> [1] https://www.felixcloutier.com/x86/vminsh >> [2] https://www.felixcloutier.com/x86/vmaxsh > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Review comments resolution Testing is green :) src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 7108: > 7106: // dst = max(xtmp1, xtmp2) > 7107: vmaxsh(dst, xtmp1, xtmp2); > 7108: // isNaN = is_unordered_quite(xtmp1) Suggestion: // isNaN = is_unordered_quiet(xtmp1) Does the Q stand for quiet or quite? src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 7129: > 7127: // dst = min(xtmp1, xtmp2) > 7128: vminsh(dst, xtmp1, xtmp2); > 7129: // isNaN = is_unordered_quite(xtmp1) Suggestion: // isNaN = is_unordered_quiet(xtmp1) ------------- Marked as reviewed by epeter (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24169#pullrequestreview-2739047288 PR Review Comment: https://git.openjdk.org/jdk/pull/24169#discussion_r2026532906 PR Review Comment: https://git.openjdk.org/jdk/pull/24169#discussion_r2026533225 From jbhateja at openjdk.org Thu Apr 3 09:25:37 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 3 Apr 2025 09:25:37 GMT Subject: RFR: 8352585: Add special case handling for Float16.max/min x86 backend [v8] In-Reply-To: References: Message-ID: > This bugfix patch adds the special handling as per x86 AVX512-FP16 ISA specification[1][2] to compute max/min operations with +/-0.0 or NaN operands. > > Special handling leverage the instruction semantic, central idea is to shuffle the operands such that smaller input gets assigned to second operand for min operation or a larger input gets assigned to second operand for max operation, in addition result equals NaN if an unordered comparison detects first input as a NaN value else we return the result of min/max operation. > > Kindly review and share your feedback. 
> > Best Regards, > Jatin > > [1] https://www.felixcloutier.com/x86/vminsh > [2] https://www.felixcloutier.com/x86/vmaxsh Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Type fixes ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24169/files - new: https://git.openjdk.org/jdk/pull/24169/files/1713057d..0ff84455 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24169&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24169&range=06-07 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/24169.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24169/head:pull/24169 PR: https://git.openjdk.org/jdk/pull/24169 From epeter at openjdk.org Thu Apr 3 09:25:37 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 3 Apr 2025 09:25:37 GMT Subject: RFR: 8352585: Add special case handling for Float16.max/min x86 backend [v8] In-Reply-To: References: Message-ID: <789JJASgNvA9D9Fbdgp9p2DO6KOSoU0Uro08KX_QuLk=.61045de2-98f8-47cd-9421-9f161feb30bd@github.com> On Thu, 3 Apr 2025 09:22:23 GMT, Jatin Bhateja wrote: >> This bugfix patch adds the special handling as per x86 AVX512-FP16 ISA specification[1][2] to compute max/min operations with +/-0.0 or NaN operands. >> >> Special handling leverage the instruction semantic, central idea is to shuffle the operands such that smaller input gets assigned to second operand for min operation or a larger input gets assigned to second operand for max operation, in addition result equals NaN if an unordered comparison detects first input as a NaN value else we return the result of min/max operation. >> >> Kindly review and share your feedback. 
>> >> Best Regards, >> Jatin >> >> [1] https://www.felixcloutier.com/x86/vminsh >> [2] https://www.felixcloutier.com/x86/vmaxsh > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Type fixes Marked as reviewed by epeter (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/24169#pullrequestreview-2739096032 From jbhateja at openjdk.org Thu Apr 3 09:25:37 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 3 Apr 2025 09:25:37 GMT Subject: RFR: 8352585: Add special case handling for Float16.max/min x86 backend [v5] In-Reply-To: References: Message-ID: On Wed, 2 Apr 2025 06:31:20 GMT, Emanuel Peter wrote: >> Hi @eme64 , >> This specific issues is around special Float16 values i.e +/- 0.0 and NaN. >> I have added a Generator for Float16 as part of https://github.com/openjdk/jdk/pull/22755 >> >> Best Regards, >> Jatin > > @jatin-bhateja It looks reasonable to me now. Let me run some testing, ping me in 24h for the results! Thanks, @eme64 and @sviswa7, for the reviews and approvals. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24169#issuecomment-2775029236 From jbhateja at openjdk.org Thu Apr 3 09:25:38 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 3 Apr 2025 09:25:38 GMT Subject: RFR: 8352585: Add special case handling for Float16.max/min x86 backend [v7] In-Reply-To: References: Message-ID: On Thu, 3 Apr 2025 09:01:57 GMT, Emanuel Peter wrote: >> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: >> >> Review comments resolution > > src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 7129: > >> 7127: // dst = min(xtmp1, xtmp2) >> 7128: vminsh(dst, xtmp1, xtmp2); >> 7129: // isNaN = is_unordered_quite(xtmp1) > > Suggestion: > > // isNaN = is_unordered_quiet(xtmp1) Typo fixed. 
Thanks Needs re-approval :-) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24169#discussion_r2026558934 From jbhateja at openjdk.org Thu Apr 3 09:25:38 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 3 Apr 2025 09:25:38 GMT Subject: Integrated: 8352585: Add special case handling for Float16.max/min x86 backend In-Reply-To: References: Message-ID: On Fri, 21 Mar 2025 20:20:24 GMT, Jatin Bhateja wrote: > This bugfix patch adds the special handling as per x86 AVX512-FP16 ISA specification[1][2] to compute max/min operations with +/-0.0 or NaN operands. > > Special handling leverage the instruction semantic, central idea is to shuffle the operands such that smaller input gets assigned to second operand for min operation or a larger input gets assigned to second operand for max operation, in addition result equals NaN if an unordered comparison detects first input as a NaN value else we return the result of min/max operation. > > Kindly review and share your feedback. > > Best Regards, > Jatin > > [1] https://www.felixcloutier.com/x86/vminsh > [2] https://www.felixcloutier.com/x86/vmaxsh This pull request has now been integrated. Changeset: f7a94fee Author: Jatin Bhateja URL: https://git.openjdk.org/jdk/commit/f7a94feedd63775a09d0bcb9ef3313972e2a5d69 Stats: 260 lines in 6 files changed: 254 ins; 6 del; 0 mod 8352585: Add special case handling for Float16.max/min x86 backend Reviewed-by: epeter, sviswanathan ------------- PR: https://git.openjdk.org/jdk/pull/24169 From mchevalier at openjdk.org Thu Apr 3 09:34:42 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Thu, 3 Apr 2025 09:34:42 GMT Subject: RFR: 8353345: C2 asserts because maskShiftAmount modifies node without deleting the hash [v3] In-Reply-To: References: Message-ID: > First delete the hash, then `set_req`. This way, we avoid changing the node (a non-`this` node) without deleting the hash. 
This wrong ordering is not new from [JDK-8347459](https://bugs.openjdk.org/browse/JDK-8347459), but before that, only `this` was going through this function, so it was ok. But since, it is used with other nodes, hence the need to remove the hash. > > Also, not do any of that outside IGVN, but requires to register nested shifts for IGVN in parsing not to miss them later. > > Thanks, > Marc Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: address review comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24355/files - new: https://git.openjdk.org/jdk/pull/24355/files/d84b3d6d..7c9ec24a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24355&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24355&range=01-02 Stats: 25 lines in 1 file changed: 0 ins; 2 del; 23 mod Patch: https://git.openjdk.org/jdk/pull/24355.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24355/head:pull/24355 PR: https://git.openjdk.org/jdk/pull/24355 From mchevalier at openjdk.org Thu Apr 3 09:34:45 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Thu, 3 Apr 2025 09:34:45 GMT Subject: RFR: 8353345: C2 asserts because maskShiftAmount modifies node without deleting the hash [v2] In-Reply-To: References: Message-ID: On Thu, 3 Apr 2025 05:09:29 GMT, Christian Hagedorn wrote: >> Marc Chevalier has updated the pull request incrementally with two additional commits since the last revision: >> >> - Fix spacing >> - Do not eagerly replace shift amounts in nested lshift > > src/hotspot/share/opto/mulnode.cpp line 953: > >> 951: } >> 952: >> 953: //============================================================================= > > While at it, you can also remove this line which we no longer use today > Suggestion: Done. > src/hotspot/share/opto/mulnode.cpp line 995: > >> 993: if (igvn != nullptr) { >> 994: igvn->rehash_node_delayed(shiftNode); >> 995: } > > Do we still need this now? 
If we always call it with `shiftNode == this` then we already get the rehashing "for free" due to modifying `this` as part of `Ideal()`. As discussed, do not remove hash (useless now), but enqueue in worklist. > src/hotspot/share/opto/mulnode.cpp line 1007: > >> 1005: // outer_shift = (_ << rhs0) >> 1006: // We are looking for the pattern: >> 1007: // outer_shift = ((X << rhs1) << rhs0) > > Just an idea: To better keep track of what is the outer and inner rhs, we could use `rhs_inner` and `rhs_outer`. good idea! > src/hotspot/share/opto/mulnode.cpp line 1010: > >> 1008: // where rhs0 and rhs1 are constant >> 1009: // we denote inner_shift the nested expression (X << rhs1) >> 1010: // con0 = rhs1 % nbits and con0 = rhs1 % nbits > > Probably copy-paste error, did you want to define `con1` here as well? indeed. redone with new notations. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24355#discussion_r2026584728 PR Review Comment: https://git.openjdk.org/jdk/pull/24355#discussion_r2026585502 PR Review Comment: https://git.openjdk.org/jdk/pull/24355#discussion_r2026585778 PR Review Comment: https://git.openjdk.org/jdk/pull/24355#discussion_r2026586168 From epeter at openjdk.org Thu Apr 3 09:39:52 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 3 Apr 2025 09:39:52 GMT Subject: RFR: 8349563: Improve AbsNode::Value() for integer types [v2] In-Reply-To: References: Message-ID: On Thu, 27 Feb 2025 02:38:09 GMT, Dean Long wrote: >> Jasmine Karthikeyan has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains two commits: >> >> - Merge >> - Improve AbsNode::Value > > src/hotspot/share/opto/subnode.cpp line 1938: > >> 1936: >> 1937: NativeType lo_abs = uabs(t->_lo); >> 1938: NativeType hi_abs = uabs(t->_hi); > > Converting unsigned to signed is C++ Undefined Behavior, is it not? @dean-long ? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23685#discussion_r2026597497 From duke at openjdk.org Thu Apr 3 09:44:07 2025 From: duke at openjdk.org (David Linus Briemann) Date: Thu, 3 Apr 2025 09:44:07 GMT Subject: RFR: 8352972: PPC64: Intrinsify Unsafe::setMemory [v3] In-Reply-To: References: Message-ID: On Wed, 26 Mar 2025 16:36:10 GMT, Martin Doerr wrote: >> Similar to the x86 implementation. The non-product feature for counting things like `SharedRuntime::_unsafe_set_memory_ctr` is currently not supported on PPC64. I've left it commented out. >> >> Before this patch (measured on Power10):
>>
>> Benchmark                       (aligned)  (size)  Mode  Cnt   Score    Error  Units
>> MemorySegmentZeroUnsafe.panama  true            1  avgt   30  15.048 ± 0.095  ns/op
>> MemorySegmentZeroUnsafe.panama  true            2  avgt   30  15.054 ± 0.089  ns/op
>> MemorySegmentZeroUnsafe.panama  true            3  avgt   30  15.161 ± 0.089  ns/op
>> MemorySegmentZeroUnsafe.panama  true            4  avgt   30  15.147 ± 0.082  ns/op
>> MemorySegmentZeroUnsafe.panama  true            5  avgt   30  15.198 ± 0.089  ns/op
>> MemorySegmentZeroUnsafe.panama  true            6  avgt   30  15.128 ± 0.099  ns/op
>> MemorySegmentZeroUnsafe.panama  true            7  avgt   30  19.234 ± 0.148  ns/op
>> MemorySegmentZeroUnsafe.panama  true            8  avgt   30  15.060 ± 0.090  ns/op
>> MemorySegmentZeroUnsafe.panama  true           15  avgt   30  19.229 ± 0.171  ns/op
>> MemorySegmentZeroUnsafe.panama  true           16  avgt   30  15.030 ± 0.082  ns/op
>> MemorySegmentZeroUnsafe.panama  true           63  avgt   30  85.290 ± 0.431  ns/op
>> MemorySegmentZeroUnsafe.panama  true           64  avgt   30  84.273 ± 0.843  ns/op
>> MemorySegmentZeroUnsafe.panama  true          255  avgt   30  89.551 ± 0.706  ns/op
>> MemorySegmentZeroUnsafe.panama  true          256  avgt   30  87.736 ± 0.679  ns/op
>> MemorySegmentZeroUnsafe.panama  false           1  avgt   30  15.044 ± 0.073  ns/op
>> MemorySegmentZeroUnsafe.panama  false           2  avgt   30  14.980 ± 0.058  ns/op
>> MemorySegmentZeroUnsafe.panama  false           3  avgt   30  15.138 ± 0.126  ns/op
>> MemorySegmentZeroUnsafe.panama  false           4  avgt   30  15.025 ± 0.049  ns/op
>> MemorySegmentZeroUnsafe.panama  false           5  avgt   30  15.192 ± 0.118  ns/op
>> MemorySegmentZeroUnsafe.panama  false           6  avgt   30  15.464 ± 0.667  ns/op
>> MemorySegmentZeroUnsafe.panama  false           7  avgt   30  19.179 ± 0.143  ns/op
>> MemorySegmentZeroUnsafe.panama  false           8  avgt   30  15.278 ± 0.130  ns/op
>> MemorySegmentZeroUnsafe.panama  false          15  avgt   30  19.428 ± 0.146  ns/op
>> MemorySegmentZeroUnsafe.panama  false          16  avgt   30  18.011 ± 1.233  ns/op
>> MemorySegmentZeroUnsafe.panama  false          63  avgt   30  87.090 ± 0.989  ns/op
>> MemorySegmentZeroUnsaf...
> > Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: > > Simplify usage of UnsafeMemoryAccessMark. LGTM ------------- Marked as reviewed by dbriemann at github.com (no known OpenJDK username). PR Review: https://git.openjdk.org/jdk/pull/24254#pullrequestreview-2739162483 From mchevalier at openjdk.org Thu Apr 3 10:28:48 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Thu, 3 Apr 2025 10:28:48 GMT Subject: RFR: 8353345: C2 asserts because maskShiftAmount modifies node without deleting the hash [v3] In-Reply-To: References: Message-ID: On Thu, 3 Apr 2025 09:34:42 GMT, Marc Chevalier wrote: >> First delete the hash, then `set_req`. This way, we avoid changing the node (a non-`this` node) without deleting the hash. This wrong ordering is not new from [JDK-8347459](https://bugs.openjdk.org/browse/JDK-8347459), but before that, only `this` was going through this function, so it was ok. But since, it is used with other nodes, hence the need to remove the hash. >> >> Also, not do any of that outside IGVN, but requires to register nested shifts for IGVN in parsing not to miss them later. >> >> Thanks, >> Marc > > Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: > > address review comments Requested changes done! Ready for more review.
------------- PR Comment: https://git.openjdk.org/jdk/pull/24355#issuecomment-2775259754 From chagedorn at openjdk.org Thu Apr 3 10:56:05 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Thu, 3 Apr 2025 10:56:05 GMT Subject: RFR: 8353345: C2 asserts because maskShiftAmount modifies node without deleting the hash [v3] In-Reply-To: References: Message-ID: On Thu, 3 Apr 2025 09:34:42 GMT, Marc Chevalier wrote: >> First delete the hash, then `set_req`. This way, we avoid changing the node (a non-`this` node) without deleting the hash. This wrong ordering is not new from [JDK-8347459](https://bugs.openjdk.org/browse/JDK-8347459), but before that, only `this` was going through this function, so it was ok. But since, it is used with other nodes, hence the need to remove the hash. >> >> Also, not do any of that outside IGVN, but requires to register nested shifts for IGVN in parsing not to miss them later. >> >> Thanks, >> Marc > > Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: > > address review comments That looks good to me, thanks for all the updates! ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24355#pullrequestreview-2739400194 From qamai at openjdk.org Thu Apr 3 10:59:56 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Thu, 3 Apr 2025 10:59:56 GMT Subject: RFR: 8349563: Improve AbsNode::Value() for integer types [v2] In-Reply-To: References: Message-ID: On Thu, 3 Apr 2025 09:37:26 GMT, Emanuel Peter wrote: >> src/hotspot/share/opto/subnode.cpp line 1938: >> >>> 1936: >>> 1937: NativeType lo_abs = uabs(t->_lo); >>> 1938: NativeType hi_abs = uabs(t->_hi); >> >> Converting unsigned to signed is C++ Undefined Behavior, is it not? > > @dean-long ? No converting unsigned to signed is not UB, the behaviour is the same as in Java. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23685#discussion_r2026748511 From thartmann at openjdk.org Thu Apr 3 11:12:06 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 3 Apr 2025 11:12:06 GMT Subject: RFR: 8353345: C2 asserts because maskShiftAmount modifies node without deleting the hash [v3] In-Reply-To: References: Message-ID: On Thu, 3 Apr 2025 09:34:42 GMT, Marc Chevalier wrote: >> First delete the hash, then `set_req`. This way, we avoid changing the node (a non-`this` node) without deleting the hash. This wrong ordering is not new from [JDK-8347459](https://bugs.openjdk.org/browse/JDK-8347459), but before that, only `this` was going through this function, so it was ok. But since, it is used with other nodes, hence the need to remove the hash. >> >> Also, not do any of that outside IGVN, but requires to register nested shifts for IGVN in parsing not to miss them later. >> >> Thanks, >> Marc > > Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: > > address review comments Nice refactoring! Looks good to me too. ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24355#pullrequestreview-2739459799 From hgreule at openjdk.org Thu Apr 3 11:37:07 2025 From: hgreule at openjdk.org (Hannes Greule) Date: Thu, 3 Apr 2025 11:37:07 GMT Subject: RFR: 8353359: C2: Or(I|L)Node::Ideal is missing AddNode::Ideal call [v2] In-Reply-To: <4qrmlvl4NrkGCFYYwkvzbmjQwAlIZhpyFHiol3m3NpY=.5dc8822d-3185-4569-8352-965894ba0149@github.com> References: <4qrmlvl4NrkGCFYYwkvzbmjQwAlIZhpyFHiol3m3NpY=.5dc8822d-3185-4569-8352-965894ba0149@github.com> Message-ID: On Thu, 3 Apr 2025 07:52:21 GMT, Hannes Greule wrote: >> Hi, >> >> this simple change adds a missing AddNode::Ideal call to Or(I|L)Node::Ideal. See the added tests for examples of optimizations that don't apply without this change. >> >> Please let me know what you think. 
> > Hannes Greule has updated the pull request incrementally with two additional commits since the last revision: > > - update license year > - add bug id Thank you for your reviews and comments :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/24348#issuecomment-2775462863 From epeter at openjdk.org Thu Apr 3 11:37:07 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 3 Apr 2025 11:37:07 GMT Subject: RFR: 8353359: C2: Or(I|L)Node::Ideal is missing AddNode::Ideal call [v2] In-Reply-To: References: <4qrmlvl4NrkGCFYYwkvzbmjQwAlIZhpyFHiol3m3NpY=.5dc8822d-3185-4569-8352-965894ba0149@github.com> Message-ID: On Thu, 3 Apr 2025 11:31:28 GMT, Hannes Greule wrote: >> Hannes Greule has updated the pull request incrementally with two additional commits since the last revision: >> >> - update license year >> - add bug id > > Thank you for your reviews and comments :) @SirYwell Thanks again for the work and all the updates :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/24348#issuecomment-2775469156 From hgreule at openjdk.org Thu Apr 3 11:37:08 2025 From: hgreule at openjdk.org (Hannes Greule) Date: Thu, 3 Apr 2025 11:37:08 GMT Subject: Integrated: 8353359: C2: Or(I|L)Node::Ideal is missing AddNode::Ideal call In-Reply-To: References: Message-ID: On Tue, 1 Apr 2025 06:20:48 GMT, Hannes Greule wrote: > Hi, > > this simple change adds a missing AddNode::Ideal call to Or(I|L)Node::Ideal. See the added tests for examples of optimizations that don't apply without this change. > > Please let me know what you think. This pull request has now been integrated. 
Changeset: 3ceabf0f Author: Hannes Greule Committer: Emanuel Peter URL: https://git.openjdk.org/jdk/commit/3ceabf0f647beb4943c06709aa8797f7511cd48e Stats: 41 lines in 3 files changed: 33 ins; 0 del; 8 mod 8353359: C2: Or(I|L)Node::Ideal is missing AddNode::Ideal call Reviewed-by: epeter, chagedorn ------------- PR: https://git.openjdk.org/jdk/pull/24348 From dlunden at openjdk.org Thu Apr 3 11:41:55 2025 From: dlunden at openjdk.org (Daniel =?UTF-8?B?THVuZMOpbg==?=) Date: Thu, 3 Apr 2025 11:41:55 GMT Subject: RFR: 8325467: Support methods with many arguments in C2 [v13] In-Reply-To: References: Message-ID: > If a method has a large number of parameters, we currently bail out from C2 compilation. > > ### Changeset > > Allowing C2 compilation of methods with a large number of parameters requires fundamental changes to the register mask data structure, used in many places in C2. In particular, register masks currently have a statically determined size and cannot represent arbitrary numbers of stack slots. This is needed if we want to compile methods with arbitrary numbers of parameters. Register mask operations are present in performance-sensitive parts of C2, which further complicates changes. > > Changes: > - Add functionality to dynamically grow/extend register masks. I experimented with a number of design choices to achieve this. To keep the common case (normal number of method parameters) quick and also to avoid more intrusive changes to the current `RegMask` interface, I decided to leave the "base" statically allocated memory for masks unchanged and only use dynamically allocated memory in the rare cases where it is needed. > - Generalize the "chunk"-logic from `PhaseChaitin::Select()` to allow arbitrary-sized chunks, and also move most of the logic into register mask methods to separate concerns and to make the `PhaseChaitin::Select()` code more readable. > - Remove all `can_represent` checks and bailouts. > - Performance tuning. 
A particularly important change is the early-exit optimization in `RegMask::overlap`, used in the performance-sensitive method `PhaseChaitin::interfere_with_live`. > - Add a new test case `TestManyMethodArguments.java` and extend an old test `TestNestedSynchronize.java`. > > ### Testing > > - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/10178060450) > - `tier1` to `tier4` (and additional Oracle-internal testing) on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64. > - Standard performance benchmarking. No observed conclusive overall performance degradation/improvement. > - Specific benchmarking of C2 compilation time. The changes increase C2 compilation time by, approximately and on average, 1% for methods that could also be compiled before this changeset (see the figure below). The reason for the degradation is further checks required in performance-sensitive code (in particular `PhaseChaitin::remove_bound_register_from_interfering_live_ranges`). I have tried optimizing in various ways, but changes I found that lead to improvement also lead to less readable code (and are, in my opinion, not worth it). > > ![c2-regression](https:/... Daniel Lund?n has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 20 commits: - Updates after comments - Tag short-lived register mask arena - Merge remote-tracking branch 'upstream/master' into many-arguments-8325467 - Formatting updates - Add register mask fuzzer test - Extend example with offset register mask - Remove accidental leftover #endif - Update - Fix trailing whitespace - Merge remote-tracking branch 'upstream/master' into many-arguments-8325467 - ... 
and 10 more: https://git.openjdk.org/jdk/compare/a1ab1d8d...76f6b8f8 ------------- Changes: https://git.openjdk.org/jdk/pull/20404/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20404&range=12 Stats: 12649 lines in 31 files changed: 12306 ins; 90 del; 253 mod Patch: https://git.openjdk.org/jdk/pull/20404.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20404/head:pull/20404 PR: https://git.openjdk.org/jdk/pull/20404 From dlunden at openjdk.org Thu Apr 3 11:46:10 2025 From: dlunden at openjdk.org (Daniel =?UTF-8?B?THVuZMOpbg==?=) Date: Thu, 3 Apr 2025 11:46:10 GMT Subject: RFR: 8325467: Support methods with many arguments in C2 [v11] In-Reply-To: <-pdjdg9OQRB7YaXNFiVeVseLEoJDZb2XkMk0ml3pm3w=.2ecb257e-5618-4763-90e5-a2b1d0758e67@github.com> References: <0Yf6qZwnLz7oAtSFscDwHifQAmaPuHzeSrpkqMVchDU=.c7a5e8af-9390-414b-850c-609110668eac@github.com> <-pdjdg9OQRB7YaXNFiVeVseLEoJDZb2XkMk0ml3pm3w=.2ecb257e-5618-4763-90e5-a2b1d0758e67@github.com> Message-ID: On Mon, 31 Mar 2025 13:20:12 GMT, Roberto Casta?eda Lozano wrote: >> Daniel Lund?n has updated the pull request incrementally with one additional commit since the last revision: >> >> Extend example with offset register mask > > src/hotspot/share/opto/optoreg.hpp line 237: > >> 235: } >> 236: OptoRegPair(OptoReg::Name f) : OptoRegPair(OptoReg::Bad, f) {} >> 237: OptoRegPair() : OptoRegPair(OptoReg::Bad, OptoReg::Bad) {} > > This is preexisting, but since the changeset touches the code: these two "partial" constructors seem unused, please consider removing them (but double-check in that case that they are unused for all platforms). Thanks, removed (and double-checked usage) > src/hotspot/share/opto/regmask.hpp line 545: > >> 543: >> 544: // Overlap test. Non-zero if any registers in common, including all-stack. >> 545: bool overlap(const RegMask &rm) const { > > Please review the frequency of the different tests in this function. 
I ran an instrumented version and found the test in Case 4 to succeed (return true) more often that Case 2 and Case 3. Thanks, I made a note to run some benchmarks for this and gather statistics. It is critical that we run case 1 first (results in a significant performance gain), but perhaps we can gain a little by ordering the rare cases as well. > src/hotspot/share/utilities/globalDefinitions.hpp line 1363: > >> 1361: // synchronized statements in Java. >> 1362: const int BoxLockNode_slot_limit = 200; >> 1363: > > This definition seems too C2-specific to be put in this shared file, could it be moved e.g. to `optoreg.hpp`? Thanks, I was unsure where to put this definition. It doesn't really relate to `OptoReg` and is rather a limitation for `RegMask`s, so I now simply put it as a constant in `regmask.hpp`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2026826903 PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2026826440 PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2026823410 From thartmann at openjdk.org Thu Apr 3 12:01:59 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 3 Apr 2025 12:01:59 GMT Subject: RFR: 8346989: C2: deoptimization and re-compilation cycle with Math.*Exact in case of frequent overflow [v6] In-Reply-To: References: <8ACplaVM_gN9cbIcQYGJmR4GNINm70PAJQ8uAgucK4Y=.14fdc7e2-e0af-4f0d-acb6-bcfe99ee8f36@github.com> Message-ID: On Wed, 2 Apr 2025 17:23:03 GMT, Marc Chevalier wrote: >> `Math.*Exact` intrinsics can cause many deopt when used repeatedly with problematic arguments. >> This fix proposes not to rely on intrinsics after `too_many_traps()` has been reached. >> >> Benchmark show that this issue affects every Math.*Exact functions. And this fix improve them all. >> >> tl;dr: >> - C1: no problem, no change >> - C2: >> - with intrinsics: >> - with overflow: clear improvement. 
Was way worse than C1, now is similar (~4s => ~600ms) >> - without overflow: no problem, no change >> - without intrinsics: no problem, no change >> >> Before the fix: >> >> Benchmark (SIZE) Mode Cnt Score Error Units >> MathExact.C1_1.loopAddIInBounds 1000000 avgt 3 1.272 ? 0.048 ms/op >> MathExact.C1_1.loopAddIOverflow 1000000 avgt 3 641.917 ? 58.238 ms/op >> MathExact.C1_1.loopAddLInBounds 1000000 avgt 3 1.402 ? 0.842 ms/op >> MathExact.C1_1.loopAddLOverflow 1000000 avgt 3 671.013 ? 229.425 ms/op >> MathExact.C1_1.loopDecrementIInBounds 1000000 avgt 3 3.722 ? 22.244 ms/op >> MathExact.C1_1.loopDecrementIOverflow 1000000 avgt 3 653.341 ? 279.003 ms/op >> MathExact.C1_1.loopDecrementLInBounds 1000000 avgt 3 2.525 ? 0.810 ms/op >> MathExact.C1_1.loopDecrementLOverflow 1000000 avgt 3 656.750 ? 141.792 ms/op >> MathExact.C1_1.loopIncrementIInBounds 1000000 avgt 3 4.621 ? 12.822 ms/op >> MathExact.C1_1.loopIncrementIOverflow 1000000 avgt 3 651.608 ? 274.396 ms/op >> MathExact.C1_1.loopIncrementLInBounds 1000000 avgt 3 2.576 ? 3.316 ms/op >> MathExact.C1_1.loopIncrementLOverflow 1000000 avgt 3 662.216 ? 71.879 ms/op >> MathExact.C1_1.loopMultiplyIInBounds 1000000 avgt 3 1.402 ? 0.587 ms/op >> MathExact.C1_1.loopMultiplyIOverflow 1000000 avgt 3 615.836 ? 252.137 ms/op >> MathExact.C1_1.loopMultiplyLInBounds 1000000 avgt 3 2.906 ? 5.718 ms/op >> MathExact.C1_1.loopMultiplyLOverflow 1000000 avgt 3 655.576 ? 147.432 ms/op >> MathExact.C1_1.loopNegateIInBounds 1000000 avgt 3 2.023 ? 0.027 ms/op >> MathExact.C1_1.loopNegateIOverflow 1000000 avgt 3 639.136 ? 30.841 ms/op >> MathExact.C1_1.loop... > > Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: > > fix typo in comment Took me a while to parse the code but the refactoring definitely improves the situation :slightly_smiling_face: Looks good! ------------- Marked as reviewed by thartmann (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/23916#pullrequestreview-2739594615 From duke at openjdk.org Thu Apr 3 12:10:11 2025 From: duke at openjdk.org (duke) Date: Thu, 3 Apr 2025 12:10:11 GMT Subject: RFR: 8353345: C2 asserts because maskShiftAmount modifies node without deleting the hash [v3] In-Reply-To: References: Message-ID: <-sKm43ONlJn2YNNCUxXjG3p8xL10UGc7m7CF07P-uhA=.cc4d78be-01e7-42d7-b4d7-ad751312745e@github.com> On Thu, 3 Apr 2025 09:34:42 GMT, Marc Chevalier wrote: >> First delete the hash, then `set_req`. This way, we avoid changing the node (a non-`this` node) without deleting the hash. This wrong ordering is not new from [JDK-8347459](https://bugs.openjdk.org/browse/JDK-8347459), but before that, only `this` was going through this function, so it was ok. But since, it is used with other nodes, hence the need to remove the hash. >> >> Also, not do any of that outside IGVN, but requires to register nested shifts for IGVN in parsing not to miss them later. >> >> Thanks, >> Marc > > Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: > > address review comments @marc-chevalier Your change (at version 7c9ec24aa81df185e4b5b672d4a92e3a3f2b985f) is now ready to be sponsored by a Committer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24355#issuecomment-2775579848 From mchevalier at openjdk.org Thu Apr 3 12:10:09 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Thu, 3 Apr 2025 12:10:09 GMT Subject: RFR: 8353345: C2 asserts because maskShiftAmount modifies node without deleting the hash [v3] In-Reply-To: References: Message-ID: On Thu, 3 Apr 2025 09:34:42 GMT, Marc Chevalier wrote: >> First delete the hash, then `set_req`. This way, we avoid changing the node (a non-`this` node) without deleting the hash. This wrong ordering is not new from [JDK-8347459](https://bugs.openjdk.org/browse/JDK-8347459), but before that, only `this` was going through this function, so it was ok. 
But since, it is used with other nodes, hence the need to remove the hash. >> >> Also, not do any of that outside IGVN, but requires to register nested shifts for IGVN in parsing not to miss them later. >> >> Thanks, >> Marc > > Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: > > address review comments Thanks @TobiHartmann and @chhagedorn for reviews! ------------- PR Comment: https://git.openjdk.org/jdk/pull/24355#issuecomment-2775578176 From swen at openjdk.org Thu Apr 3 12:19:52 2025 From: swen at openjdk.org (Shaojin Wen) Date: Thu, 3 Apr 2025 12:19:52 GMT Subject: RFR: 8352316: More MergeStoreBench [v7] In-Reply-To: <-xsgQ8uhc8vksHhI4Elu3SwNqy8GEQdzCdB3SAsPQa0=.9ef939ee-6359-40cb-8663-dabaad6611b6@github.com> References: <5fLeODHTQw8vbuvTl6G0YPNszI5_tH1b3L_tWJtCTh8=.ca1b21f2-2890-4daa-8ce2-8112a3f7146b@github.com> <-xsgQ8uhc8vksHhI4Elu3SwNqy8GEQdzCdB3SAsPQa0=.9ef939ee-6359-40cb-8663-dabaad6611b6@github.com> Message-ID: <5Pr4WqnsBZrOfnqUWe-FSZ5UBkkGx5ghH113Jw1eO1Y=.31275627-2eb7-408f-a73f-ff974d993ea4@github.com> On Sat, 29 Mar 2025 07:43:32 GMT, Shaojin Wen wrote: >> Shaojin Wen has updated the pull request incrementally with one additional commit since the last revision: >> >> add StringBuilderUnsafePut > > I added a new scenario `StringBuilderUnsafePut`, using Unsafe to modify StringBuilder directly to implement append constants. > > The performance numbers below show that ArraySetConst/StringBuilderUnsafePut/UnsafePut have better performance. > > These numbers show that Stable Value's arraycopy has great performance optimization potential, which is worth more optimization for C2. > > # 1. Scipt > > git remote add wenshao git at github.com:wenshao/jdk.git > git fetch wenshao > git checkout cd1d8fb3b137a741446c894d1893e7180535ce8f > make test TEST="micro:vm.compiler.MergeStoreBench.str" > > > # 2. aliyun_ecs_c8a_x64 (CPU AMD EPYC? 
Genoa) > > Benchmark Mode Cnt Score Error Units > MergeStoreBench.str4ArraySetConst avgt 5 1338.414 ? 3.209 ns/op > MergeStoreBench.str4Arraycopy avgt 5 7271.203 ? 19.400 ns/op > MergeStoreBench.str4GetBytes avgt 5 6154.684 ? 9.910 ns/op > MergeStoreBench.str4GetChars avgt 5 14078.790 ? 59.175 ns/op > MergeStoreBench.str4StringBuilder avgt 5 15766.528 ? 4634.119 ns/op > MergeStoreBench.str4StringBuilderAppendChar avgt 5 41388.364 ? 9871.409 ns/op > MergeStoreBench.str4StringBuilderUnsafePut avgt 5 1575.792 ? 4.102 ns/op > MergeStoreBench.str4UnsafePut avgt 5 1326.499 ? 2.400 ns/op > MergeStoreBench.str4Utf16ArrayCopy avgt 5 13949.307 ? 1045.255 ns/op > MergeStoreBench.str4Utf16ArraySetConst avgt 5 1511.967 ? 5.250 ns/op > MergeStoreBench.str4Utf16StringBuilder avgt 5 18030.261 ? 1656.463 ns/op > MergeStoreBench.str4Utf16StringBuilderAppendChar avgt 5 35047.855 ? 16674.635 ns/op > MergeStoreBench.str4Utf16StringBuilderUnsafePut avgt 5 2785.792 ? 5.571 ns/op > MergeStoreBench.str4Utf16UnsafePut avgt 5 1613.812 ? 1.249 ns/op > MergeStoreBench.str5ArraySetConst avgt 5 2599.310 ? 8.667 ns/op > MergeStoreBench.str5Arraycopy avgt 5 9487.926 ? 29.234 ns/op > MergeStoreBench.str5GetBytes avgt 5 5972.453 ? 16.035 ns/op > MergeStoreBench.str5GetChars avgt 5 13516.943 ? 10.978 ns/op > MergeStoreBench.str5StringBuilder avgt 5 16539.070 ? 3097.339 ns/op > MergeStoreBench.str5StringBuilderAppendChar avgt 5 50506.770 ? 11536.41... > @wenshao @iwanowww I have a few concerns about this PR. > > Your current PR description says this: > > > Added performance tests related to String.getBytes/String.getChars/StringBuilder.append/System.arraycopy in constant scenarios to verify whether MergeStore works > > First: a benchmark is not the best way `to verify whether MergeStore works`. An IR test would be more helpful, as it could check reliably what IR is generated, and hence if MergeStores actually optimized anything. 
> > Second: A JMH benchmark could also be helpful, but only if you run it with and without MergeStores enabled. Otherwise how would you know if it was MergeStores or another optimization that is relevant here? > > Third: `getBytes` / `arraycopy` is **NOT** a MergeStores pattern. These are **COPY** patterns. So they probably should go to a separate benchmark file. I don't want the MergeStores benchmark polluted with unrelated cases. I could be wrong here, and just not see how these cases are MergeStore cases, but you need to show the details here. > > I put some time in understanding your PR and asking you a list of questions. You did not really respond to them, and that is frustrating to me and makes me feel like my time is not valued: [#24108 (comment)](https://github.com/openjdk/jdk/pull/24108#issuecomment-2762946069) > > You say this: > > > By default, in OpenJDK, COMPACT_STRINGS = true, and the String coder without UTF16 characters is LATIN1, which is implemented using System.arraycopy. However, since String is immutable and System.arraycopy is directly performed on byte[], C2 should have more opportunities for optimization. > > Maybe the `System.arraycopy` can be optimized. But I don't think it is the MergeStores optimization that would do that. This is really a **Copy** pattern and not a `MergeStores` pattern. Please read the PRs on MergeStores to see what patterns are covered. > > And like I asked in previously: > > > Can you investigate what code it generates, and what kinds of optimizations are missing to make it close in performance to the Unsafe benchmark? > > I don't have time to do all the deep investigations myself. But feel free to ask me if you have more questions. > > To me, benchmarks are only helpful and worth integrating if there is some clear and documented purpose. It would be really nice if you could invest some time into that :) The C2 MergeStore you made is very good. 
I think you did a great job, so I submitted this PR, hoping that C2 can do more. But I am a Java programmer, not good at C++ and assembly. I don't know how to investigate the details. Can you give me some suggestions? I don't know the details of the optimizer yet, and I can't provide IR tests. This benchmark and the performance numbers of the results prove that there is a lot of room for performance improvement in the copy of constant String and byte[]. As you said, this does not look like MergeStore, but should be a constant copy optimization. I can separate this into a separate Benchmark. Can you give me some suggestions on the name of the Benchmark? ------------- PR Comment: https://git.openjdk.org/jdk/pull/24108#issuecomment-2775585230 From swen at openjdk.org Thu Apr 3 12:19:57 2025 From: swen at openjdk.org (Shaojin Wen) Date: Thu, 3 Apr 2025 12:19:57 GMT Subject: RFR: 8352316: More MergeStoreBench [v7] In-Reply-To: References: <5fLeODHTQw8vbuvTl6G0YPNszI5_tH1b3L_tWJtCTh8=.ca1b21f2-2890-4daa-8ce2-8112a3f7146b@github.com> Message-ID: <6EU6RBqEH5NAwR5RQFE4ynVoZ4cartnNM37ZJ98fq8k=.54e1cd98-ccd1-4189-acb8-74a76c713cce@github.com> On Wed, 2 Apr 2025 06:46:41 GMT, Emanuel Peter wrote: >> Shaojin Wen has updated the pull request incrementally with one additional commit since the last revision: >> >> add StringBuilderUnsafePut > > test/micro/org/openjdk/bench/vm/compiler/MergeStoreBench.java line 693: > >> 691: } >> 692: BH.consume(off); >> 693: } > > This is a copy pattern, not MergeStores. As above, STR_4 is a string constant of length 4. Can it be optimized to write a long? > test/micro/org/openjdk/bench/vm/compiler/MergeStoreBench.java line 735: > >> 733: } >> 734: BH.consume(off); >> 735: } > > @wenshao This is a copy pattern. Not a MergeStore pattern. So I can tell you already now that it will not be optimized by MergeStores ;) If STR_4_BYTES_UTF16 is a StableValue, is it possible to optimize to writing a long?
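For readers following the "store pattern" versus "copy pattern" distinction in this exchange, here is a minimal Java sketch. The class, the constant `STR_4_BYTES`, and the methods `storeConst` and `copyConst` are hypothetical names chosen for illustration and are not taken from `MergeStoreBench.java`. Both methods write the same four bytes: the first is a run of adjacent constant stores, which, as far as I understand it, is the shape the MergeStores optimization is aimed at; the second copies from a constant array, which the reviewer classifies as a copy pattern. Whether C2 turns either one into a single wider store depends on the JVM build and flags, and the sketch makes no claim about that.

```java
import java.util.Arrays;

public class StoreVsCopyDemo {
    // Hypothetical constant standing in for a 4-character Latin-1 string.
    static final byte[] STR_4_BYTES = {'n', 'u', 'l', 'l'};

    // Adjacent constant stores into consecutive array elements.
    public static void storeConst(byte[] dst, int off) {
        dst[off]     = 'n';
        dst[off + 1] = 'u';
        dst[off + 2] = 'l';
        dst[off + 3] = 'l';
    }

    // Copy from a constant source array: same result, different code shape.
    public static void copyConst(byte[] dst, int off) {
        System.arraycopy(STR_4_BYTES, 0, dst, off, STR_4_BYTES.length);
    }

    public static void main(String[] args) {
        byte[] a = new byte[8];
        byte[] b = new byte[8];
        storeConst(a, 2);
        copyConst(b, 2);
        // Both buffers hold identical contents after the two calls.
        System.out.println(Arrays.equals(a, b));
    }
}
```

The two methods are observably equivalent, so any performance gap between them comes from how the compiler treats each shape, which is the question raised in the reply.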
> test/micro/org/openjdk/bench/vm/compiler/MergeStoreBench.java line 799: > >> 797: } >> 798: BH.consume(off); >> 799: } > > @wenshao Why would MergeStores work here? This is is a copy pattern. That is not at all covered by MergeStores. This is a constant of length 5. Can it be optimized to write a combination of int + byte? > test/micro/org/openjdk/bench/vm/compiler/MergeStoreBench.java line 856: > >> 854: } >> 855: BH.consume(sb.length()); >> 856: } > > Why would you expect MergeStores to work here? STR_5 is a string constant with a length of 5. Is it possible to optimize it into an implementation similar to str5StringBuilderUnsafePut? The performance can be greatly improved. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24108#discussion_r2026870739 PR Review Comment: https://git.openjdk.org/jdk/pull/24108#discussion_r2026869185 PR Review Comment: https://git.openjdk.org/jdk/pull/24108#discussion_r2026866952 PR Review Comment: https://git.openjdk.org/jdk/pull/24108#discussion_r2026875472 From mchevalier at openjdk.org Thu Apr 3 12:26:01 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Thu, 3 Apr 2025 12:26:01 GMT Subject: Integrated: 8353345: C2 asserts because maskShiftAmount modifies node without deleting the hash In-Reply-To: References: Message-ID: On Tue, 1 Apr 2025 11:51:13 GMT, Marc Chevalier wrote: > First delete the hash, then `set_req`. This way, we avoid changing the node (a non-`this` node) without deleting the hash. This wrong ordering is not new from [JDK-8347459](https://bugs.openjdk.org/browse/JDK-8347459), but before that, only `this` was going through this function, so it was ok. But since, it is used with other nodes, hence the need to remove the hash. > > Also, not do any of that outside IGVN, but requires to register nested shifts for IGVN in parsing not to miss them later. > > Thanks, > Marc This pull request has now been integrated. 
Changeset: 296d9d6f Author: Marc Chevalier Committer: Christian Hagedorn URL: https://git.openjdk.org/jdk/commit/296d9d6f7a734cc2bab21c58f21a941150b4cf2a Stats: 113 lines in 2 files changed: 79 ins; 3 del; 31 mod 8353345: C2 asserts because maskShiftAmount modifies node without deleting the hash Reviewed-by: chagedorn, thartmann ------------- PR: https://git.openjdk.org/jdk/pull/24355 From dlunden at openjdk.org Thu Apr 3 12:45:00 2025 From: dlunden at openjdk.org (Daniel =?UTF-8?B?THVuZMOpbg==?=) Date: Thu, 3 Apr 2025 12:45:00 GMT Subject: RFR: 8325467: Support methods with many arguments in C2 [v11] In-Reply-To: <-pdjdg9OQRB7YaXNFiVeVseLEoJDZb2XkMk0ml3pm3w=.2ecb257e-5618-4763-90e5-a2b1d0758e67@github.com> References: <0Yf6qZwnLz7oAtSFscDwHifQAmaPuHzeSrpkqMVchDU=.c7a5e8af-9390-414b-850c-609110668eac@github.com> <-pdjdg9OQRB7YaXNFiVeVseLEoJDZb2XkMk0ml3pm3w=.2ecb257e-5618-4763-90e5-a2b1d0758e67@github.com> Message-ID: On Mon, 31 Mar 2025 13:24:09 GMT, Roberto Casta?eda Lozano wrote: >> Daniel Lund?n has updated the pull request incrementally with one additional commit since the last revision: >> >> Extend example with offset register mask > > src/hotspot/share/opto/postaloc.cpp line 686: > >> 684: assert(!(!value[ureg_lo] && lrgs(useidx).mask().is_offset() && >> 685: !lrgs(useidx).mask().Member(ureg_lo)), >> 686: "invalid assumption"); > > Could you use more descriptive names and assertion messages in this new assertion and the one below? Ideally, without having to refer to old versions. What is the invariant that we want to check? How does it relate to the surrounding code? As we've previously discussed offline, I also had my doubts when introducing these asserts. I've now had a second look (with reasonably fresh eyes), and believe I now better understand the underlying assumptions. 
The two problematic pieces of code in `postaloc.cpp` from before this changeset that we need to translate as part of the changeset are `if (!value[ureg_lo] && (!RegMask::can_represent(ureg_lo) || lrgs(useidx).mask().Member(ureg_lo))) { // Nearly always adjacent` and `if( RegMask::can_represent(nreg_lo) && // Either a spill slot, or !lrgs(lidx).mask().Member(nreg_lo) ) { // Nearly always adjacent`. Specifically, the `RegMask::can_represent` calls check if their argument registers can fit in the statically determined size of register masks (which no longer makes sense in this changeset). The reason for the `can_represent` calls is that the subsequent `Member` calls assert internally that their arguments can fit within the static size of register masks. That is, `can_represent` worked as a guard to ensure the precondition for the call to `Member` holds. In this changeset, the `Member` function is generalized to allow arbitrary arguments (and the internal assert is removed). Therefore, we can remove the `can_represent` guards. Now to the assertions that I added (which I've now improved). From the if conditions, we can infer there is an implicit invariant that a register for which `can_represent` returns false is necessarily "adjacent". Specifically, `can_represent` returning false implies that the register is a spill slot (implied by a comment in the source code). However, registers for which `can_represent` returns true may **also** be spill slots, so using `can_represent` as a proxy check for spill slots feels clumsy. I believe that the real invariant here is that only actual registers (and not stack locations, including spill slots) can be non-adjacent. This is what I now verify with my updated asserts. For the record, I have not been able to find any cases with non-adjacency in any tests on current Oracle-supported platforms. From another comment in the source code, it looks like non-adjacent pairs are quite specific to SPARC.
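The guard-removal argument above can be illustrated with a small, self-contained Java sketch. This is not HotSpot's `RegMask`: the class `GrowableMaskDemo` and its methods are hypothetical, and the real C++ data structure differs in many ways. The sketch only shows the general idea that once a membership test accepts any non-negative index (treating indices beyond the current capacity as "not a member"), callers no longer need a separate range check in front of it.

```java
import java.util.Arrays;

public class GrowableMaskDemo {
    // One 64-bit word of base storage; grows only when a larger index is inserted.
    private long[] words = new long[1];

    // Sets the bit for a non-negative index, growing the backing array if needed.
    public void insert(int index) {
        if (index < 0) {
            throw new IllegalArgumentException("index must be non-negative");
        }
        int word = index >>> 6;
        if (word >= words.length) {
            words = Arrays.copyOf(words, word + 1);
        }
        words[word] |= 1L << (index & 63);
    }

    // Membership test without a range precondition: an index beyond the
    // current capacity is reported as "not a member" instead of failing.
    public boolean member(int index) {
        int word = index >>> 6;
        return word < words.length && (words[word] & (1L << (index & 63))) != 0;
    }

    public static void main(String[] args) {
        GrowableMaskDemo mask = new GrowableMaskDemo();
        mask.insert(3);
        mask.insert(200);
        System.out.println(mask.member(3));
        System.out.println(mask.member(200));
        System.out.println(mask.member(100000));
    }
}
```

With a fixed-size mask, a caller would have to check that the index fits before calling `member`; here that check lives inside `member`, so the call site needs no guard.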
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2026915863

From roland at openjdk.org  Thu Apr  3 12:45:19 2025
From: roland at openjdk.org (Roland Westrelin)
Date: Thu, 3 Apr 2025 12:45:19 GMT
Subject: RFR: 8349479: C2: when a Type node becomes dead, make CFG path that uses it unreachable [v8]
In-Reply-To:
References:
Message-ID: <76hE1JDzr58tKpurf2reT_tDoLdXHiTujP-SeD9HjrA=.550f3465-c929-4aba-bbff-9d74b6793f0e@github.com>

> This is primarily motivated by 8275202 (C2: optimize out more
> redundant conditions). In the following code snippet:
>
>
> int[] array = new int[arraySize];
> if (j <= arraySize) {
>   if (i >= 0) {
>     if (i < j) {
>       int v = array[i];
>
>
> (`arraySize` is a constant)
>
> At the range check, `j` is known to be in `[min, arraySize]`; as a
> consequence, `i` is known to be in `[0, arraySize-1]`. The range check
> can be eliminated.
>
> Now, if later, `i` constant folds to some value that's positive but
> out of range for the array:
>
> - if that happens when the new pass runs, then it can prove that:
>
> if (i < j) {
>
> is never taken.
>
> - if that happens during IGVN or CCP however, that condition is not
> constant folded. And because the range check was removed, there's no
> guard protecting the range check `CastII`. It becomes `top` and, as
> a result, the graph can become broken.
>
> What I propose here is that when the `CastII` becomes dead, any CFG
> path that uses the `CastII` node is made unreachable. So in pseudo code:
>
>
> int[] array = new int[arraySize];
> if (j <= arraySize) {
>   if (i >= 0) {
>     if (i < j) {
>       halt();
>
>
> Finding the CFG paths is implemented in the patch by following the
> uses of the node until a CFG node or a `Phi` is encountered.
>
> The patch applies this to all `Type` nodes as with 8275202, I also ran
> into some rare corner cases with other types of nodes. The exception is
> `Phi` nodes which may not be as easy to handle (and for which I had no
> issue with 8275202).
>
> Finally, the patch includes a test case that's unrelated to the
> discussion of 8275202 above. In that test case, a `CastII` becomes top
> but the test that guards it doesn't constant fold. The root cause is a
> transformation of:
>
>
> (CastII (AddI
>
>
> into
>
>
> (AddI (CastII ) (CastII)
>
>
> which causes the resulting node to have a wider type. The `CastII`
> captures a type before the transformation above happens. Once it has
> happened, the guard for the `CastII` can't be constant folded when an
> out of bound value occurs.
>
> This is likely fixable some other way (even though it doesn't seem
> straightforward). Given the long history of similar issues (and the
> test case that shows that more of them are hiding), I think it would
> make sense to try some other way of approaching them.

Roland Westrelin has updated the pull request incrementally with two additional commits since the last revision:

 - Update src/hotspot/share/opto/node.cpp

   Co-authored-by: Christian Hagedorn
 - Update test/hotspot/jtreg/compiler/c2/TestGuardOfCastIIDoesntFold.java

   Co-authored-by: Christian Hagedorn

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/23468/files
  - new: https://git.openjdk.org/jdk/pull/23468/files/1ec2177a..2abd3054

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=23468&range=07
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23468&range=06-07

  Stats: 3 lines in 2 files changed: 0 ins; 2 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/23468.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/23468/head:pull/23468

PR: https://git.openjdk.org/jdk/pull/23468

From roland at openjdk.org  Thu Apr  3 12:45:19 2025
From: roland at openjdk.org (Roland Westrelin)
Date: Thu, 3 Apr 2025 12:45:19 GMT
Subject: RFR: 8349479: C2: when a Type node becomes dead, make CFG path that uses it unreachable [v7]
In-Reply-To:
References: <09Q1vDaTXq3VlLU4xxQl_E7wDM2FT7tqR_Bc8ky8RNc=.4e11f2f8-75c3-49a1-b0b3-20eac17c4b39@github.com>
Message-ID:

On Tue, 1 Apr 2025 08:42:12 GMT, Christian Hagedorn wrote:

>> The callers have the `ResourceMark`. This is because it's code I extracted from 8275202: I think it used to not be safe to call `PhaseIdealLoop::register_new_node` from within the `ResourceMark` but I see there were changes in that area (data structures used by `PhaseIdealLoop` no longer allocated in the resource area). So it looks like it could be changed now.
>
> I assume that JDK-8275202 also calls this method with a non-null `PhaseIdealLoop` pointer? Now we only pass in null, so the `loop` parameter could be removed.

Right. Do you think it's better to remove the parameter that's used (for now)?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/23468#discussion_r2026910749

From mli at openjdk.org  Thu Apr  3 12:50:13 2025
From: mli at openjdk.org (Hamlin Li)
Date: Thu, 3 Apr 2025 12:50:13 GMT
Subject: RFR: 8353600: RISC-V: compiler/vectorization/TestRotateByteAndShortVector.java is failing with Zvbb
Message-ID: <3xU-sLLf0E_4n9BsUXL4COF7mxBjDd8YzgyIvissvQQ=.472cf772-a97a-48c8-b4e6-907fcfdd1ebb@github.com>

Hi,
Can you help to review this patch?

Currently, the following code is treated as a RotateLeftV of byte by HotSpot, but it is not a real rotate: because `b` is promoted to int, the shift count `-shift` is effectively 30, which makes `b >> -shift` zero rather than the value we expected.

    int shift = 2;
    byte b = 83;
    byte res = (byte) (b << shift | b >> -shift); // res = 76
    // but a real left rotate of 83 should be 77

So, the simple fix is to enable RotateLeftV only for int and long, and disable it for other types.

A more principled fix would be to change C2 so that it does not convert code like `(byte) (b << shift | b >> -shift)` to a RotateLeftV node, but that needs more investigation, and I'm not sure if it's feasible, as currently no platform supports RotateLeftV for non-int/long types.

Thanks!
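The arithmetic in the message above can be checked with a small standalone Java program. This is an illustrative sketch only: `ByteRotateDemo`, `rotl8` and `shiftOrPattern` are hypothetical names that do not come from the patch, and `rotl8` is just one hand-written way to express a true 8-bit rotate.

```java
class ByteRotateDemo {
    // Hypothetical helper: a true 8-bit left rotate, written out by hand.
    static byte rotl8(byte value, int shift) {
        int s = shift & 7;     // rotate distance modulo the 8-bit width
        int v = value & 0xFF;  // zero-extend so no sign bits leak in
        return (byte) ((v << s) | (v >>> ((8 - s) & 7)));
    }

    // The pattern from the message. After promotion to int, the shift
    // count -shift is masked to its low 5 bits (-2 & 31 == 30).
    static byte shiftOrPattern(byte b, int shift) {
        return (byte) (b << shift | b >> -shift);
    }

    public static void main(String[] args) {
        byte b = 83;
        int shift = 2;
        System.out.println(shiftOrPattern(b, shift)); // prints 76
        System.out.println(rotl8(b, shift));          // prints 77
    }
}
```

For 83 (binary 01010011) the int expression yields 332, which truncates to 76, while rotating the eight bits by two positions gives 01001101, that is 77.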
-------------

Commit messages:
 - merge master
 - initial commit

Changes: https://git.openjdk.org/jdk/pull/24414/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24414&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8353600
  Stats: 25 lines in 1 file changed: 4 ins; 21 del; 0 mod
  Patch: https://git.openjdk.org/jdk/pull/24414.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/24414/head:pull/24414

PR: https://git.openjdk.org/jdk/pull/24414

From dlunden at openjdk.org  Thu Apr  3 12:54:08 2025
From: dlunden at openjdk.org (Daniel Lundén)
Date: Thu, 3 Apr 2025 12:54:08 GMT
Subject: RFR: 8325467: Support methods with many arguments in C2 [v12]
In-Reply-To:
References:
Message-ID: <02SMTBF5t1QrQ7zGvm6zSyN4JUDIPDYwztAIzdynOqg=.ad1cacd5-451f-43d9-992c-831172214d6a@github.com>

On Tue, 1 Apr 2025 16:00:46 GMT, Roberto Castañeda Lozano wrote:

>> Daniel Lundén has updated the pull request incrementally with two additional commits since the last revision:
>>
>>  - Formatting updates
>>  - Add register mask fuzzer test
>
> src/hotspot/share/opto/chaitin.cpp line 1533:
>
>> 1531:       // hesitation).
>> 1532:       if (OptoReg::is_valid(reg2) &&
>> 1533:           OptoReg::is_reg(reg2 - lrg.mask().offset_bits())) {
>
> I agree that this was probably an oversight in the original code. For simplicity I suggest to replace the check with just `OptoReg::is_reg(reg2)` as you suggest, explicitly limiting the scope of the alternation heuristic to physical registers. I compared the overall effectiveness of post-allocation copy removal (as summarized by `-XX:+PrintOptoStatistics`) between this changeset and your proposed simplification and I cannot see any significant difference. I really wonder if the entire alternation heuristic really has any positive measurable effect, but that investigation belongs to another RFE.

Thanks for comparing! Now changed.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2026931949

From dlunden at openjdk.org  Thu Apr  3 12:54:15 2025
From: dlunden at openjdk.org (Daniel Lundén)
Date: Thu, 3 Apr 2025 12:54:15 GMT
Subject: RFR: 8325467: Support methods with many arguments in C2 [v13]
In-Reply-To:
References:
Message-ID:

On Tue, 1 Apr 2025 16:02:07 GMT, Roberto Castañeda Lozano wrote:

>> Daniel Lundén has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 20 commits:
>>
>>  - Updates after comments
>>  - Tag short-lived register mask arena
>>  - Merge remote-tracking branch 'upstream/master' into many-arguments-8325467
>>  - Formatting updates
>>  - Add register mask fuzzer test
>>  - Extend example with offset register mask
>>  - Remove accidental leftover #endif
>>  - Update
>>  - Fix trailing whitespace
>>  - Merge remote-tracking branch 'upstream/master' into many-arguments-8325467
>>  - ... and 10 more: https://git.openjdk.org/jdk/compare/a1ab1d8d...76f6b8f8
>
> src/hotspot/share/opto/chaitin.cpp line 1591:
>
>> 1589:     // will be a no-op.  (Later on, if lrg runs out of possible colors in
>> 1590:     // its chunk, a new chunk of color may be tried, in which case
>> 1591:     // examination of neighbors is started again, at retry_next_chunk.)
>
> Doesn't the second part of the comment (`(Later on...)`) still apply after the changes?

Thanks, good catch. Now restored.

> src/hotspot/share/opto/matcher.cpp line 148:
>
>> 146:     C->record_method_not_compilable("unsupported incoming calling sequence");
>> 147:     return OptoReg::Bad;
>> 148:   }
>
> Please consider removing the failure polls after calling `warp_incoming_stk_arg`, I believe the removal of this bailout makes them unnecessary.

Thanks, I've removed the polls after `warp_incoming_stk_arg` and also after `warp_outgoing_stk_arg`.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2026932391
PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2026930405

From dlunden at openjdk.org  Thu Apr  3 12:54:17 2025
From: dlunden at openjdk.org (Daniel Lundén)
Date: Thu, 3 Apr 2025 12:54:17 GMT
Subject: RFR: 8325467: Support methods with many arguments in C2 [v11]
In-Reply-To: <-pdjdg9OQRB7YaXNFiVeVseLEoJDZb2XkMk0ml3pm3w=.2ecb257e-5618-4763-90e5-a2b1d0758e67@github.com>
References: <0Yf6qZwnLz7oAtSFscDwHifQAmaPuHzeSrpkqMVchDU=.c7a5e8af-9390-414b-850c-609110668eac@github.com>
 <-pdjdg9OQRB7YaXNFiVeVseLEoJDZb2XkMk0ml3pm3w=.2ecb257e-5618-4763-90e5-a2b1d0758e67@github.com>
Message-ID:

On Mon, 31 Mar 2025 13:33:52 GMT, Roberto Castañeda Lozano wrote:

>> Daniel Lundén has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   Extend example with offset register mask
>
> src/hotspot/share/opto/matcher.cpp line 195:
>
>> 193:   if (C->failing()) {
>> 194:     return;
>> 195:   }
>
> Is this failure poll required after your changes?

Yes, this poll is still required. We may fail in `init_spill_mask -> regmask_for_ideal_register`.
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2026927964

From epeter at openjdk.org  Thu Apr  3 12:58:58 2025
From: epeter at openjdk.org (Emanuel Peter)
Date: Thu, 3 Apr 2025 12:58:58 GMT
Subject: RFR: 8352316: More MergeStoreBench [v7]
In-Reply-To: <5Pr4WqnsBZrOfnqUWe-FSZ5UBkkGx5ghH113Jw1eO1Y=.31275627-2eb7-408f-a73f-ff974d993ea4@github.com>
References: <5fLeODHTQw8vbuvTl6G0YPNszI5_tH1b3L_tWJtCTh8=.ca1b21f2-2890-4daa-8ce2-8112a3f7146b@github.com>
 <-xsgQ8uhc8vksHhI4Elu3SwNqy8GEQdzCdB3SAsPQa0=.9ef939ee-6359-40cb-8663-dabaad6611b6@github.com>
 <5Pr4WqnsBZrOfnqUWe-FSZ5UBkkGx5ghH113Jw1eO1Y=.31275627-2eb7-408f-a73f-ff974d993ea4@github.com>
Message-ID: <9pEJ26ThuVTSmGKiviqYDfJZeMazGf7x_4m6CaUCeQY=.79a0c1a0-beec-4965-94e0-2c60e892fd15@github.com>

On Thu, 3 Apr 2025 12:09:27 GMT, Shaojin Wen wrote:

>> I added a new scenario `StringBuilderUnsafePut`, using Unsafe to modify StringBuilder directly to implement append constants.
>>
>> The performance numbers below show that ArraySetConst/StringBuilderUnsafePut/UnsafePut have better performance.
>>
>> These numbers show that Stable Value's arraycopy has great performance optimization potential, which is worth more optimization for C2.
>>
>> # 1. Script
>>
>> git remote add wenshao git@github.com:wenshao/jdk.git
>> git fetch wenshao
>> git checkout cd1d8fb3b137a741446c894d1893e7180535ce8f
>> make test TEST="micro:vm.compiler.MergeStoreBench.str"
>>
>>
>> # 2. aliyun_ecs_c8a_x64 (CPU AMD EPYC™ Genoa)
>>
>> Benchmark  Mode  Cnt  Score  Error  Units
>> MergeStoreBench.str4ArraySetConst  avgt  5  1338.414 ± 3.209  ns/op
>> MergeStoreBench.str4Arraycopy  avgt  5  7271.203 ± 19.400  ns/op
>> MergeStoreBench.str4GetBytes  avgt  5  6154.684 ± 9.910  ns/op
>> MergeStoreBench.str4GetChars  avgt  5  14078.790 ± 59.175  ns/op
>> MergeStoreBench.str4StringBuilder  avgt  5  15766.528 ± 4634.119  ns/op
>> MergeStoreBench.str4StringBuilderAppendChar  avgt  5  41388.364 ± 9871.409  ns/op
>> MergeStoreBench.str4StringBuilderUnsafePut  avgt  5  1575.792 ± 4.102  ns/op
>> MergeStoreBench.str4UnsafePut  avgt  5  1326.499 ± 2.400  ns/op
>> MergeStoreBench.str4Utf16ArrayCopy  avgt  5  13949.307 ± 1045.255  ns/op
>> MergeStoreBench.str4Utf16ArraySetConst  avgt  5  1511.967 ± 5.250  ns/op
>> MergeStoreBench.str4Utf16StringBuilder  avgt  5  18030.261 ± 1656.463  ns/op
>> MergeStoreBench.str4Utf16StringBuilderAppendChar  avgt  5  35047.855 ± 16674.635  ns/op
>> MergeStoreBench.str4Utf16StringBuilderUnsafePut  avgt  5  2785.792 ± 5.571  ns/op
>> MergeStoreBench.str4Utf16UnsafePut  avgt  5  1613.812 ± 1.249  ns/op
>> MergeStoreBench.str5ArraySetConst  avgt  5  2599.310 ± 8.667  ns/op
>> MergeStoreBench.str5Arraycopy  avgt  5  9487.926 ± 29.234  ns/op
>> MergeStoreBench.str5GetBytes  avgt  5  5972.453 ± 16.035  ns/op
>> MergeStoreBench.str5GetChars  avgt  5  13516.943 ± 10.978  ns/op
>> MergeStoreBench.str5StringBuilder  avgt  5  16539.070 ± 3097.339  ns/op
>> MergeSt...
>
>> @wenshao @iwanowww I have a few concerns about this PR.
>>
>> Your current PR description says this:
>>
>> > Added performance tests related to String.getBytes/String.getChars/StringBuilder.append/System.arraycopy in constant scenarios to verify whether MergeStore works
>>
>> First: a benchmark is not the best way `to verify whether MergeStore works`. An IR test would be more helpful, as it could check reliably what IR is generated, and hence if MergeStores actually optimized anything.
>>
>> Second: A JMH benchmark could also be helpful, but only if you run it with and without MergeStores enabled. Otherwise how would you know if it was MergeStores or another optimization that is relevant here?
>>
>> Third: `getBytes` / `arraycopy` is **NOT** a MergeStores pattern. These are **COPY** patterns. So they probably should go to a separate benchmark file. I don't want the MergeStores benchmark polluted with unrelated cases. I could be wrong here, and just not see how these cases are MergeStore cases, but you need to show the details here.
>>
>> I put some time in understanding your PR and asking you a list of questions. You did not really respond to them, and that is frustrating to me and makes me feel like my time is not valued: [#24108 (comment)](https://github.com/openjdk/jdk/pull/24108#issuecomment-2762946069)
>>
>> You say this:
>>
>> > By default, in OpenJDK, COMPACT_STRINGS = true, and the String coder without UTF16 characters is LATIN1, which is implemented using System.arraycopy. However, since String is immutable and System.arraycopy is directly performed on byte[], C2 should have more opportunities for optimization.
>>
>> Maybe the `System.arraycopy` can be optimized. But I don't think it is the MergeStores optimization that would do that. This is really a **Copy** pattern and not a `MergeStores` pattern. Please read the PRs on MergeStores to see what patterns are covered.
>>
>> And like I asked previously:
>>
>> > Can you investigate what code it generates, and what kinds of optimizations are missing to make it close in performance to the Unsafe benchmark?
>> > I don't have time to do all the deep investigations myself. But feel free to ask me if you have more questions.
>>
>> To me, benchmarks are only helpful and worth integrating if there is some clear and documented purpose. It would be really nice if you could invest some time into that :)
>
> The C2 MergeStore you made is very good. I think you did a great job, so I submitted this PR, hoping that C2 can do...

@wenshao

> The C2 MergeStore you made is very good. I think you did a great job, so I submitted this PR, hoping that C2 can do more.

Thanks for the compliment :)
What I am saying is that this is most likely not the same optimization, and you would have to investigate what other optimizations are relevant here.

> But I am a Java programmer, not good at C++ and assembly. I don't know how to investigate the details. Can you give me some suggestions?

Fair enough :)
Maybe it's time to learn more about C++ and assembly then :)
If you are interested in learning more about the C2 internals, I recommend you read my blog series:
https://eme64.github.io/blog/2024/12/24/Intro-to-C2-Part00.html

> I don't know the details of the optimizer yet, and I can't provide IR tests.

There are lots of `@IR` tests in the repository, so you can just do what they did ;)

> I don't know the details of the optimizer yet, and I can't provide IR tests. This benchmark and the performance numbers of the results prove that there is a lot of room for performance improvement in the copy of constant String and byte[].
>
> As you said, this does not look like MergeStore, but should be a constant copy optimization. I can separate this into a separate Benchmark. Can you give me some suggestions on the name of the Benchmark?

Hmm. Well before we know a good name, we must know what is the relevant optimization for the patterns. I recommend that you find out what the general form of these patterns are, and what optimization steps would have to be taken. Then we can continue. My blog posts will help you get started, so that you can look at the IR and the generated assembly. Feel free to post your findings here, and then maybe I can help you a little on the way. I'm sorry, I am really very busy working on other projects and cannot do all that work for you ;)

-------------

PR Comment: https://git.openjdk.org/jdk/pull/24108#issuecomment-2775704308

From dlunden at openjdk.org  Thu Apr  3 12:59:14 2025
From: dlunden at openjdk.org (Daniel Lundén)
Date: Thu, 3 Apr 2025 12:59:14 GMT
Subject: RFR: 8325467: Support methods with many arguments in C2 [v12]
In-Reply-To:
References:
Message-ID:

On Tue, 1 Apr 2025 16:16:36 GMT, Roberto Castañeda Lozano wrote:

>> Daniel Lundén has updated the pull request incrementally with two additional commits since the last revision:
>>
>>  - Formatting updates
>>  - Add register mask fuzzer test
>
> src/hotspot/share/opto/chaitin.cpp line 1655:
>
>> 1653:       // Bump register mask up to next stack chunk
>> 1654:       bool success = lrg->rollover();
>> 1655:       if (!success) {
>
> Was this scenario (running out of stack slots representable in `OptoRegPairs`) possible before, or was it prevented by some check removed in the changeset? Did you come across it in some compilation or is it more of a "theoretical" guard?

Yes, it is a theoretical guard (also see the discussions earlier in this PR) and could also happen before this changeset if we roll over too much in `Select`. I experimented a bit with this earlier on and was not able to construct an example where we end up in this situation.

> src/hotspot/share/opto/regmask.hpp line 282:
>
>> 280:       _grow(src._rm_size, false);
>> 281:       memcpy(_RM_UP_EXT, src._RM_UP_EXT,
>> 282:              sizeof(uintptr_t) * (src._rm_size - _RM_SIZE));
>
> This code is not very well covered by current tests, please consider adding some tests to `test_regmask.cpp` to exercise it.

Now added!

> src/hotspot/share/opto/regmask.hpp line 293:
>
>> 291:         _hwm = _rm_max();
>> 292:       }
>> 293:       _set_range(src._rm_size, value, _rm_size - src._rm_size);
>
> This code is not very well covered by current tests, please consider adding some tests to `test_regmask.cpp` to exercise it.

Now added!
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2026941355
PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2026941803
PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2026941926

From mchevalier at openjdk.org  Thu Apr  3 13:01:15 2025
From: mchevalier at openjdk.org (Marc Chevalier)
Date: Thu, 3 Apr 2025 13:01:15 GMT
Subject: RFR: 8346989: C2: deoptimization and re-compilation cycle with Math.*Exact in case of frequent overflow [v7]
In-Reply-To: <8ACplaVM_gN9cbIcQYGJmR4GNINm70PAJQ8uAgucK4Y=.14fdc7e2-e0af-4f0d-acb6-bcfe99ee8f36@github.com>
References: <8ACplaVM_gN9cbIcQYGJmR4GNINm70PAJQ8uAgucK4Y=.14fdc7e2-e0af-4f0d-acb6-bcfe99ee8f36@github.com>
Message-ID:

> `Math.*Exact` intrinsics can cause many deopts when used repeatedly with problematic arguments.
> This fix proposes not to rely on intrinsics after `too_many_traps()` has been reached.
>
> Benchmarks show that this issue affects every `Math.*Exact` function, and this fix improves them all.
>
> tl;dr:
> - C1: no problem, no change
> - C2:
>   - with intrinsics:
>     - with overflow: clear improvement. Was way worse than C1, now is similar (~4s => ~600ms)
>     - without overflow: no problem, no change
>   - without intrinsics: no problem, no change
>
> Before the fix:
>
> Benchmark  (SIZE)  Mode  Cnt  Score  Error  Units
> MathExact.C1_1.loopAddIInBounds  1000000  avgt  3  1.272 ± 0.048  ms/op
> MathExact.C1_1.loopAddIOverflow  1000000  avgt  3  641.917 ± 58.238  ms/op
> MathExact.C1_1.loopAddLInBounds  1000000  avgt  3  1.402 ± 0.842  ms/op
> MathExact.C1_1.loopAddLOverflow  1000000  avgt  3  671.013 ± 229.425  ms/op
> MathExact.C1_1.loopDecrementIInBounds  1000000  avgt  3  3.722 ± 22.244  ms/op
> MathExact.C1_1.loopDecrementIOverflow  1000000  avgt  3  653.341 ± 279.003  ms/op
> MathExact.C1_1.loopDecrementLInBounds  1000000  avgt  3  2.525 ± 0.810  ms/op
> MathExact.C1_1.loopDecrementLOverflow  1000000  avgt  3  656.750 ± 141.792  ms/op
> MathExact.C1_1.loopIncrementIInBounds  1000000  avgt  3  4.621 ± 12.822  ms/op
> MathExact.C1_1.loopIncrementIOverflow  1000000  avgt  3  651.608 ± 274.396  ms/op
> MathExact.C1_1.loopIncrementLInBounds  1000000  avgt  3  2.576 ± 3.316  ms/op
> MathExact.C1_1.loopIncrementLOverflow  1000000  avgt  3  662.216 ± 71.879  ms/op
> MathExact.C1_1.loopMultiplyIInBounds  1000000  avgt  3  1.402 ± 0.587  ms/op
> MathExact.C1_1.loopMultiplyIOverflow  1000000  avgt  3  615.836 ± 252.137  ms/op
> MathExact.C1_1.loopMultiplyLInBounds  1000000  avgt  3  2.906 ± 5.718  ms/op
> MathExact.C1_1.loopMultiplyLOverflow  1000000  avgt  3  655.576 ± 147.432  ms/op
> MathExact.C1_1.loopNegateIInBounds  1000000  avgt  3  2.023 ± 0.027  ms/op
> MathExact.C1_1.loopNegateIOverflow  1000000  avgt  3  639.136 ± 30.841  ms/op
> MathExact.C1_1.loopNegateLInBounds  1000000  avgt  3  2.422 ± 3.59...

Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision:

  Remove useless flags in tests

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/23916/files
  - new: https://git.openjdk.org/jdk/pull/23916/files/238b129d..e7c8f3e0

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=23916&range=06
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23916&range=05-06

  Stats: 9 lines in 1 file changed: 0 ins; 0 del; 9 mod
  Patch: https://git.openjdk.org/jdk/pull/23916.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/23916/head:pull/23916

PR: https://git.openjdk.org/jdk/pull/23916

From mchevalier at openjdk.org  Thu Apr  3 13:01:16 2025
From: mchevalier at openjdk.org (Marc Chevalier)
Date: Thu, 3 Apr 2025 13:01:16 GMT
Subject: RFR: 8346989: C2: deoptimization and re-compilation cycle with Math.*Exact in case of frequent overflow [v6]
In-Reply-To:
References: <8ACplaVM_gN9cbIcQYGJmR4GNINm70PAJQ8uAgucK4Y=.14fdc7e2-e0af-4f0d-acb6-bcfe99ee8f36@github.com>
Message-ID:

On Wed, 2 Apr 2025 17:23:03 GMT, Marc Chevalier wrote:

>> `Math.*Exact` intrinsics can cause many deopts when used repeatedly with problematic arguments.
>> This fix proposes not to rely on intrinsics after `too_many_traps()` has been reached.
>>
>> Benchmarks show that this issue affects every `Math.*Exact` function, and this fix improves them all.
>>
>> tl;dr:
>> - C1: no problem, no change
>> - C2:
>>   - with intrinsics:
>>     - with overflow: clear improvement. Was way worse than C1, now is similar (~4s => ~600ms)
>>     - without overflow: no problem, no change
>>   - without intrinsics: no problem, no change
>>
>> Before the fix:
>>
>> Benchmark  (SIZE)  Mode  Cnt  Score  Error  Units
>> MathExact.C1_1.loopAddIInBounds  1000000  avgt  3  1.272 ± 0.048  ms/op
>> MathExact.C1_1.loopAddIOverflow  1000000  avgt  3  641.917 ± 58.238  ms/op
>> MathExact.C1_1.loopAddLInBounds  1000000  avgt  3  1.402 ± 0.842  ms/op
>> MathExact.C1_1.loopAddLOverflow  1000000  avgt  3  671.013 ± 229.425  ms/op
>> MathExact.C1_1.loopDecrementIInBounds  1000000  avgt  3  3.722 ± 22.244  ms/op
>> MathExact.C1_1.loopDecrementIOverflow  1000000  avgt  3  653.341 ± 279.003  ms/op
>> MathExact.C1_1.loopDecrementLInBounds  1000000  avgt  3  2.525 ± 0.810  ms/op
>> MathExact.C1_1.loopDecrementLOverflow  1000000  avgt  3  656.750 ± 141.792  ms/op
>> MathExact.C1_1.loopIncrementIInBounds  1000000  avgt  3  4.621 ± 12.822  ms/op
>> MathExact.C1_1.loopIncrementIOverflow  1000000  avgt  3  651.608 ± 274.396  ms/op
>> MathExact.C1_1.loopIncrementLInBounds  1000000  avgt  3  2.576 ± 3.316  ms/op
>> MathExact.C1_1.loopIncrementLOverflow  1000000  avgt  3  662.216 ± 71.879  ms/op
>> MathExact.C1_1.loopMultiplyIInBounds  1000000  avgt  3  1.402 ± 0.587  ms/op
>> MathExact.C1_1.loopMultiplyIOverflow  1000000  avgt  3  615.836 ± 252.137  ms/op
>> MathExact.C1_1.loopMultiplyLInBounds  1000000  avgt  3  2.906 ± 5.718  ms/op
>> MathExact.C1_1.loopMultiplyLOverflow  1000000  avgt  3  655.576 ± 147.432  ms/op
>> MathExact.C1_1.loopNegateIInBounds  1000000  avgt  3  2.023 ± 0.027  ms/op
>> MathExact.C1_1.loopNegateIOverflow  1000000  avgt  3  639.136 ± 30.841  ms/op
>> MathExact.C1_1.loop...
>
> Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision:
>
>   fix typo in comment

I've made the test flags tighter as discussed offline. I'll need a fresh approval.

And for completeness, here are the bench results on this last state. We can see that things behave as we expect: builtin_throw is taken and makes the situation a lot better. When intrinsics or builtin_throw are disabled, we see C1-like perfs.

Benchmark  (SIZE)  Mode  Cnt  Score  Error  Units
MathExact.C1_1.loopAddIInBounds  1000000  avgt  3  1.616 ± 7.813  ms/op
MathExact.C1_1.loopAddIOverflow  1000000  avgt  3  654.971 ± 573.250  ms/op
MathExact.C1_1.loopAddLInBounds  1000000  avgt  3  1.398 ± 0.274  ms/op
MathExact.C1_1.loopAddLOverflow  1000000  avgt  3  629.620 ± 41.181  ms/op
MathExact.C1_1.loopDecrementIInBounds  1000000  avgt  3  2.048 ± 0.340  ms/op
MathExact.C1_1.loopDecrementIOverflow  1000000  avgt  3  681.702 ± 63.721  ms/op
MathExact.C1_1.loopDecrementLInBounds  1000000  avgt  3  3.057 ± 13.688  ms/op
MathExact.C1_1.loopDecrementLOverflow  1000000  avgt  3  660.457 ± 295.393  ms/op
MathExact.C1_1.loopIncrementIInBounds  1000000  avgt  3  2.531 ± 13.692  ms/op
MathExact.C1_1.loopIncrementIOverflow  1000000  avgt  3  647.970 ± 65.451  ms/op
MathExact.C1_1.loopIncrementLInBounds  1000000  avgt  3  5.350 ± 25.080  ms/op
MathExact.C1_1.loopIncrementLOverflow  1000000  avgt  3  681.097 ± 72.604  ms/op
MathExact.C1_1.loopMultiplyIInBounds  1000000  avgt  3  1.552 ± 3.145  ms/op
MathExact.C1_1.loopMultiplyIOverflow  1000000  avgt  3  648.402 ± 62.995  ms/op
MathExact.C1_1.loopMultiplyLInBounds  1000000  avgt  3  2.501 ± 0.720  ms/op
MathExact.C1_1.loopMultiplyLOverflow  1000000  avgt  3  701.498 ± 47.948  ms/op
MathExact.C1_1.loopNegateIInBounds  1000000  avgt  3  2.074 ± 0.949  ms/op
MathExact.C1_1.loopNegateIOverflow  1000000  avgt  3  665.143 ± 537.941  ms/op
MathExact.C1_1.loopNegateLInBounds  1000000  avgt  3  5.487 ± 7.165  ms/op
MathExact.C1_1.loopNegateLOverflow  1000000  avgt  3  687.085 ± 20.738  ms/op
MathExact.C1_1.loopSubtractIInBounds  1000000  avgt  3  1.329 ± 0.769  ms/op
MathExact.C1_1.loopSubtractIOverflow  1000000  avgt  3  683.922 ± 70.434  ms/op
MathExact.C1_1.loopSubtractLInBounds  1000000  avgt  3  1.384 ± 0.386  ms/op
MathExact.C1_1.loopSubtractLOverflow  1000000  avgt  3  664.380 ± 480.847  ms/op
MathExact.C1_2.loopAddIInBounds  1000000  avgt  3  1.862 ± 0.815  ms/op
MathExact.C1_2.loopAddIOverflow  1000000  avgt  3  660.421 ± 506.723  ms/op
MathExact.C1_2.loopAddLInBounds  1000000  avgt  3  1.829 ± 0.221  ms/op
MathExact.C1_2.loopAddLOverflow  1000000  avgt  3  681.209 ± 78.976  ms/op
MathExact.C1_2.loopDecrementIInBounds  1000000  avgt  3  3.533 ± 11.302  ms/op
MathExact.C1_2.loopDecrementIOverflow  1000000  avgt  3  682.639 ± 225.392  ms/op
MathExact.C1_2.loopDecrementLInBounds  1000000  avgt  3  3.402 ± 1.031  ms/op
MathExact.C1_2.loopDecrementLOverflow  1000000  avgt  3  697.283 ± 306.867  ms/op
MathExact.C1_2.loopIncrementIInBounds  1000000  avgt  3  3.326 ± 5.072  ms/op
MathExact.C1_2.loopIncrementIOverflow  1000000  avgt  3  658.514 ± 636.731  ms/op
MathExact.C1_2.loopIncrementLInBounds  1000000  avgt  3  3.718 ± 0.422  ms/op
MathExact.C1_2.loopIncrementLOverflow  1000000  avgt  3  693.863 ± 49.201  ms/op
MathExact.C1_2.loopMultiplyIInBounds  1000000  avgt  3  1.924 ± 2.800  ms/op
MathExact.C1_2.loopMultiplyIOverflow  1000000  avgt  3  609.308 ± 94.814  ms/op
MathExact.C1_2.loopMultiplyLInBounds  1000000  avgt  3  3.459 ± 0.625  ms/op
MathExact.C1_2.loopMultiplyLOverflow  1000000  avgt  3  713.503 ± 556.995  ms/op
MathExact.C1_2.loopNegateIInBounds  1000000  avgt  3  3.195 ± 0.726  ms/op
MathExact.C1_2.loopNegateIOverflow  1000000  avgt  3  684.176 ± 27.164  ms/op
MathExact.C1_2.loopNegateLInBounds  1000000  avgt  3  3.483 ± 0.947  ms/op
MathExact.C1_2.loopNegateLOverflow  1000000  avgt  3  656.284 ± 582.286  ms/op
MathExact.C1_2.loopSubtractIInBounds  1000000  avgt  3  1.728 ± 0.315  ms/op
MathExact.C1_2.loopSubtractIOverflow  1000000  avgt  3  688.029 ± 25.201  ms/op
MathExact.C1_2.loopSubtractLInBounds  1000000  avgt  3  1.941 ± 0.169  ms/op
MathExact.C1_2.loopSubtractLOverflow  1000000  avgt  3  694.341 ± 339.431  ms/op
MathExact.C1_3.loopAddIInBounds  1000000  avgt  3  3.122 ± 0.910  ms/op
MathExact.C1_3.loopAddIOverflow  1000000  avgt  3  688.731 ± 308.210  ms/op
MathExact.C1_3.loopAddLInBounds  1000000  avgt  3  5.492 ± 36.236  ms/op
MathExact.C1_3.loopAddLOverflow  1000000  avgt  3  697.053 ± 229.958  ms/op
MathExact.C1_3.loopDecrementIInBounds  1000000  avgt  3  9.155 ± 72.182  ms/op
MathExact.C1_3.loopDecrementIOverflow  1000000  avgt  3  708.458 ± 788.701  ms/op
MathExact.C1_3.loopDecrementLInBounds  1000000  avgt  3  6.402 ± 3.658  ms/op
MathExact.C1_3.loopDecrementLOverflow  1000000  avgt  3  705.992 ± 213.542  ms/op
MathExact.C1_3.loopIncrementIInBounds  1000000  avgt  3  7.699 ± 61.434  ms/op
MathExact.C1_3.loopIncrementIOverflow  1000000  avgt  3  697.353 ± 105.457  ms/op
MathExact.C1_3.loopIncrementLInBounds  1000000  avgt  3  6.380 ± 0.839  ms/op
MathExact.C1_3.loopIncrementLOverflow  1000000  avgt  3  669.240 ± 522.870  ms/op
MathExact.C1_3.loopMultiplyIInBounds  1000000  avgt  3  3.225 ± 0.140  ms/op
MathExact.C1_3.loopMultiplyIOverflow  1000000  avgt  3  624.811 ± 457.059  ms/op
MathExact.C1_3.loopMultiplyLInBounds  1000000  avgt  3  6.110 ± 1.265  ms/op
MathExact.C1_3.loopMultiplyLOverflow  1000000  avgt  3  718.460 ± 68.166  ms/op
MathExact.C1_3.loopNegateIInBounds  1000000  avgt  3  6.085 ± 1.430  ms/op
MathExact.C1_3.loopNegateIOverflow  1000000  avgt  3  675.036 ± 341.177  ms/op
MathExact.C1_3.loopNegateLInBounds  1000000  avgt  3  9.410 ± 93.522  ms/op
MathExact.C1_3.loopNegateLOverflow  1000000  avgt  3  652.042 ± 166.119  ms/op
MathExact.C1_3.loopSubtractIInBounds  1000000  avgt  3  3.432 ± 11.899  ms/op
MathExact.C1_3.loopSubtractIOverflow  1000000  avgt  3  654.208 ± 120.258  ms/op
MathExact.C1_3.loopSubtractLInBounds  1000000  avgt  3  5.166 ± 38.529  ms/op
MathExact.C1_3.loopSubtractLOverflow  1000000  avgt  3  691.094 ± 80.676  ms/op
MathExact.C2.loopAddIInBounds  1000000  avgt  3  2.276 ± 1.750  ms/op
MathExact.C2.loopAddIOverflow  1000000  avgt  3  1.173 ± 1.392  ms/op
MathExact.C2.loopAddLInBounds  1000000  avgt  3  0.985 ± 0.167  ms/op
MathExact.C2.loopAddLOverflow  1000000  avgt  3  1.990 ± 5.310  ms/op
MathExact.C2.loopDecrementIInBounds  1000000  avgt  3  2.072 ± 0.173  ms/op
MathExact.C2.loopDecrementIOverflow  1000000  avgt  3  1.911 ± 0.288  ms/op
MathExact.C2.loopDecrementLInBounds  1000000  avgt  3  1.845 ± 0.424  ms/op
MathExact.C2.loopDecrementLOverflow  1000000  avgt  3  2.757 ± 27.268  ms/op
MathExact.C2.loopIncrementIInBounds  1000000  avgt  3  2.136 ± 0.517  ms/op
MathExact.C2.loopIncrementIOverflow  1000000  avgt  3  2.199 ± 4.024  ms/op
MathExact.C2.loopIncrementLInBounds  1000000  avgt  3  1.957 ± 0.365  ms/op
MathExact.C2.loopIncrementLOverflow  1000000  avgt  3  2.053 ± 0.779  ms/op
MathExact.C2.loopMultiplyIInBounds  1000000  avgt  3  1.174 ± 0.941  ms/op
MathExact.C2.loopMultiplyIOverflow  1000000  avgt  3  1.971 ± 10.040  ms/op
MathExact.C2.loopMultiplyLInBounds  1000000  avgt  3  0.997 ± 0.318  ms/op
MathExact.C2.loopMultiplyLOverflow  1000000  avgt  3  2.847 ± 4.548  ms/op
MathExact.C2.loopNegateIInBounds  1000000  avgt  3  4.783 ± 2.454  ms/op
MathExact.C2.loopNegateIOverflow  1000000  avgt  3  1.915 ± 0.009  ms/op
MathExact.C2.loopNegateLInBounds  1000000  avgt  3  2.824 ± 28.297  ms/op
MathExact.C2.loopNegateLOverflow  1000000  avgt  3  4.766 ± 32.627  ms/op
MathExact.C2.loopSubtractIInBounds  1000000  avgt  3  0.990 ± 0.264  ms/op
MathExact.C2.loopSubtractIOverflow  1000000  avgt  3  1.181 ± 2.120  ms/op
MathExact.C2.loopSubtractLInBounds  1000000  avgt  3  2.363 ± 1.575  ms/op
MathExact.C2.loopSubtractLOverflow  1000000  avgt  3  2.429 ± 7.120  ms/op
MathExact.C2_no_builtin_throw.loopAddIInBounds  1000000  avgt  3  1.040 ± 0.181  ms/op
MathExact.C2_no_builtin_throw.loopAddIOverflow  1000000  avgt  3  580.950 ± 112.050  ms/op
MathExact.C2_no_builtin_throw.loopAddLInBounds  1000000  avgt  3  1.223 ± 5.700  ms/op
MathExact.C2_no_builtin_throw.loopAddLOverflow  1000000  avgt  3  585.712 ± 61.699  ms/op
MathExact.C2_no_builtin_throw.loopDecrementIInBounds  1000000  avgt  3  2.114 ± 0.663  ms/op
MathExact.C2_no_builtin_throw.loopDecrementIOverflow  1000000  avgt  3  604.866 ± 578.502  ms/op
MathExact.C2_no_builtin_throw.loopDecrementLInBounds  1000000  avgt  3  2.167 ± 9.268  ms/op
MathExact.C2_no_builtin_throw.loopDecrementLOverflow  1000000  avgt  3  621.175 ± 225.858  ms/op
MathExact.C2_no_builtin_throw.loopIncrementIInBounds  1000000  avgt  3  1.950 ± 0.326  ms/op
MathExact.C2_no_builtin_throw.loopIncrementIOverflow  1000000  avgt  3  633.735 ± 830.255  ms/op
MathExact.C2_no_builtin_throw.loopIncrementLInBounds  1000000  avgt  3  2.397 ± 11.911  ms/op
MathExact.C2_no_builtin_throw.loopIncrementLOverflow  1000000  avgt  3  627.599 ± 141.709  ms/op
MathExact.C2_no_builtin_throw.loopMultiplyIInBounds  1000000  avgt  3  1.167 ± 1.187  ms/op
MathExact.C2_no_builtin_throw.loopMultiplyIOverflow  1000000  avgt  3  623.224 ± 298.374  ms/op
MathExact.C2_no_builtin_throw.loopMultiplyLInBounds  1000000  avgt  3  0.944 ± 0.743  ms/op
MathExact.C2_no_builtin_throw.loopMultiplyLOverflow  1000000  avgt  3  658.380 ± 137.021  ms/op
MathExact.C2_no_builtin_throw.loopNegateIInBounds  1000000  avgt  3  2.119 ± 0.642  ms/op
MathExact.C2_no_builtin_throw.loopNegateIOverflow  1000000  avgt  3  643.102 ± 452.213  ms/op
MathExact.C2_no_builtin_throw.loopNegateLInBounds  1000000  avgt  3  2.036 ± 0.862  ms/op
MathExact.C2_no_builtin_throw.loopNegateLOverflow  1000000  avgt  3  586.103 ± 26.173  ms/op
MathExact.C2_no_builtin_throw.loopSubtractIInBounds  1000000  avgt  3  2.552 ± 3.677  ms/op
MathExact.C2_no_builtin_throw.loopSubtractIOverflow  1000000  avgt  3  635.294 ± 217.034  ms/op
MathExact.C2_no_builtin_throw.loopSubtractLInBounds  1000000  avgt  3  1.093 ± 1.685  ms/op
MathExact.C2_no_builtin_throw.loopSubtractLOverflow  1000000  avgt  3  661.541 ± 1358.199  ms/op
MathExact.C2_no_intrinsics.loopAddIInBounds  1000000  avgt  3  2.185 ± 15.103  ms/op
MathExact.C2_no_intrinsics.loopAddIOverflow  1000000  avgt  3  831.812 ± 1260.546  ms/op
MathExact.C2_no_intrinsics.loopAddLInBounds  1000000  avgt  3  2.145 ± 0.088  ms/op
MathExact.C2_no_intrinsics.loopAddLOverflow  1000000  avgt  3  709.930 ± 658.722  ms/op
MathExact.C2_no_intrinsics.loopDecrementIInBounds  1000000  avgt  3  2.288 ± 0.950  ms/op
MathExact.C2_no_intrinsics.loopDecrementIOverflow  1000000  avgt  3  646.879 ± 186.231  ms/op
MathExact.C2_no_intrinsics.loopDecrementLInBounds  1000000  avgt  3  1.894 ± 0.421  ms/op
MathExact.C2_no_intrinsics.loopDecrementLOverflow  1000000  avgt  3  641.577 ± 323.040  ms/op
MathExact.C2_no_intrinsics.loopIncrementIInBounds  1000000  avgt  3  2.027 ± 0.249  ms/op
MathExact.C2_no_intrinsics.loopIncrementIOverflow  1000000  avgt  3  657.092 ± 229.818  ms/op
MathExact.C2_no_intrinsics.loopIncrementLInBounds  1000000  avgt  3  3.220 ± 16.992  ms/op
MathExact.C2_no_intrinsics.loopIncrementLOverflow  1000000  avgt  3  603.468 ± 73.240  ms/op
MathExact.C2_no_intrinsics.loopMultiplyIInBounds  1000000  avgt  3  1.295 ± 0.413  ms/op
MathExact.C2_no_intrinsics.loopMultiplyIOverflow  1000000  avgt  3  593.005 ± 576.291  ms/op
MathExact.C2_no_intrinsics.loopMultiplyLInBounds  1000000  avgt  3  1.093 ± 0.916  ms/op
MathExact.C2_no_intrinsics.loopMultiplyLOverflow  1000000  avgt  3  618.956 ± 554.204  ms/op
MathExact.C2_no_intrinsics.loopNegateIInBounds  1000000  avgt  3  2.035 ± 0.047  ms/op
MathExact.C2_no_intrinsics.loopNegateIOverflow  1000000  avgt  3  650.591 ± 1248.923  ms/op
MathExact.C2_no_intrinsics.loopNegateLInBounds  1000000  avgt  3  3.505 ± 20.475  ms/op
MathExact.C2_no_intrinsics.loopNegateLOverflow  1000000  avgt  3  660.686 ± 201.612  ms/op
MathExact.C2_no_intrinsics.loopSubtractIInBounds  1000000  avgt  3  1.109 ± 0.726  ms/op
MathExact.C2_no_intrinsics.loopSubtractIOverflow  1000000  avgt  3  670.468 ± 475.269  ms/op
MathExact.C2_no_intrinsics.loopSubtractLInBounds  1000000  avgt  3  1.208 ± 0.806  ms/op
MathExact.C2_no_intrinsics.loopSubtractLOverflow  1000000  avgt  3  597.522 ± 32.465  ms/op

-------------

PR Comment: https://git.openjdk.org/jdk/pull/23916#issuecomment-2775707480

From roland at openjdk.org  Thu Apr  3 13:06:49 2025
From: roland at openjdk.org (Roland Westrelin)
Date: Thu, 3 Apr 2025 13:06:49 GMT
Subject: RFR: 8348853: Fold layout helper check for objects implementing non-array interfaces [v2]
In-Reply-To:
References:
Message-ID: <5c7yEX837btOgbGnTKNn8a7hlPljZRwh0TpgZI6Ogb0=.1c7f3aed-8e8c-4efe-beed-68ea192bcb99@github.com>

On Wed, 2 Apr 2025 14:11:34 GMT, Marc Chevalier wrote:

>> src/hotspot/share/opto/memnode.cpp line 2214:
>>
>>> 2212:     if (tkls->offset() == in_bytes(Klass::layout_helper_offset()) &&
>>> 2213:         tkls->isa_instklassptr() && // not directly typed as an array
>>> 2214:         !tkls->is_instklassptr()->might_be_an_array() // not the supertype of all T[] (java.lang.Object) or has an interface that is not Serializable or Cloneable
>>
>> Could we do the same by using `TypeKlassPtr::maybe_java_subtype_of(TypeAryKlassPtr::BOTTOM)` and define a `TypeAryKlassPtr::BOTTOM` to be a static field for the `array_interfaces`?
>>
>> AFAICT, `TypeKlassPtr::maybe_java_subtype_of()` already covers that case so it would avoid some logic duplication. Also in the test above, maybe you could simplify the test a little bit by removing `tkls->isa_instklassptr()`?
>
> I think it should be
>
> TypeAryKlassPtr::BOTTOM->maybe_java_subtype_of(tkls)
>
> rather than
>
> tkls->maybe_java_subtype_of(TypeAryKlassPtr::BOTTOM)
>
>
> My reasoning: if `TypeAryKlassPtr::BOTTOM` is `java.lang.Object + Cloneable + Serializable` any array is a subtype of that. But so is any class implementing these interfaces. As well as any `Object` implementing more interfaces. But for these two last cases, we know they cannot be arrays, which is what we want to know: are we sure it's not an array, or could it be an array?
> > But if we check if `tkls` is a supertype of `java.lang.Object + Cloneable + Serializable`, then it has to be an `Object` (the most general class) and it implements a subset of `Cloneable` and `Serializable`. In this case, it can be an array. If `tkls` is not a super-type of `java.lang.Object + Cloneable + Serializable`, there are 2 cases: > - either it is an array type directly (so, I think, one way or another, we need to check for `is_instklassptr`), and so a fortiori it can be an array type. > - it's an instance type and then cannot be an array since there is nothing between array types and `java.lang.Object + Cloneable + Serializable`. I.e. there is no type `T` that is not an array type, that is a super-type of at least one array type and that is not a super-type of `java.lang.Object + Cloneable + Serializable` (that is, a type that is not `java.lang.Object` or that implements at least one other interface). > > In other words, our question is > > \exists T: T is an array type /\ T <= tkls > > (where `A <= B` means `A is a subtype of B`) which is equivalent to > > tkls >= (java.lang.Object + Cloneable + Serializable) > \/ (tkls <= (java.lang.Object + Cloneable + Serializable) /\ tkls is an array type) > > > We can spare the call to `is_instklassptr` by using a virtual method instead or probably other mechanisms, that's an implementation detail. But I think we need to distinguish cases: both `int[]` and `MyClass + Cloneable + Serializable + MyInterface` are sub-types of `java.lang.Object + Cloneable + Serializable` but for one, we can conclude it's definitely an array, and the other, it's definitely not. Without distinguishing cases, the only sound approximation would be to say that everything can be an array (both sub and super types of `java.lang.Object + Cloneable + Serializable`). > > Does that make sense? Did I get something wrong? Is `BOTTOM` not what you had in mind? Yes, what I suggested doesn't work indeed. 
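To illustrate the subtyping facts this exchange relies on, here is a minimal Java sketch. It is not HotSpot code: it works on plain `Class` objects rather than C2's `TypeKlassPtr` with its interface set, and the names `ArraySupertypes`, `mightBeAnArray`, `MyClass` and `MyInterface` are made up for the example. It assumes that, at the Java level, the only non-array supertypes of an array type are `Object`, `Cloneable` and `Serializable`.

```java
import java.io.Serializable;

class ArraySupertypes {
    interface MyInterface {}

    // Instance class implementing both array interfaces plus an extra one.
    static class MyClass implements Cloneable, Serializable, MyInterface {}

    // Hypothetical helper: could a reference of static type `type` point to an array?
    // Either `type` is an array type itself, or it is a supertype of an array type.
    static boolean mightBeAnArray(Class<?> type) {
        return type.isArray() || type.isAssignableFrom(int[].class);
    }

    public static void main(String[] args) {
        Class<?>[] types = {
            Object.class, Cloneable.class, Serializable.class,
            int[].class, MyClass.class, MyInterface.class
        };
        for (Class<?> t : types) {
            System.out.println(t.getName() + " -> " + mightBeAnArray(t));
        }
    }
}
```

Under these assumptions, `MyClass` is a subtype of `Cloneable` and `Serializable` yet can never be an array, which is the case distinction argued for above.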
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24245#discussion_r2026954565 From thartmann at openjdk.org Thu Apr 3 13:13:51 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 3 Apr 2025 13:13:51 GMT Subject: RFR: 8346989: C2: deoptimization and re-compilation cycle with Math.*Exact in case of frequent overflow [v7] In-Reply-To: References: <8ACplaVM_gN9cbIcQYGJmR4GNINm70PAJQ8uAgucK4Y=.14fdc7e2-e0af-4f0d-acb6-bcfe99ee8f36@github.com> Message-ID: <5fCOI-cNWoRD89POiHnHraJaiy_73Hlt1xZCNGLcHrY=.aebffb74-c8e5-475e-a853-3576673d6161@github.com> On Thu, 3 Apr 2025 13:01:15 GMT, Marc Chevalier wrote: >> `Math.*Exact` intrinsics can cause many deopt when used repeatedly with problematic arguments. >> This fix proposes not to rely on intrinsics after `too_many_traps()` has been reached. >> >> Benchmark show that this issue affects every Math.*Exact functions. And this fix improve them all. >> >> tl;dr: >> - C1: no problem, no change >> - C2: >> - with intrinsics: >> - with overflow: clear improvement. Was way worse than C1, now is similar (~4s => ~600ms) >> - without overflow: no problem, no change >> - without intrinsics: no problem, no change >> >> Before the fix: >> >> Benchmark (SIZE) Mode Cnt Score Error Units >> MathExact.C1_1.loopAddIInBounds 1000000 avgt 3 1.272 ? 0.048 ms/op >> MathExact.C1_1.loopAddIOverflow 1000000 avgt 3 641.917 ? 58.238 ms/op >> MathExact.C1_1.loopAddLInBounds 1000000 avgt 3 1.402 ? 0.842 ms/op >> MathExact.C1_1.loopAddLOverflow 1000000 avgt 3 671.013 ? 229.425 ms/op >> MathExact.C1_1.loopDecrementIInBounds 1000000 avgt 3 3.722 ? 22.244 ms/op >> MathExact.C1_1.loopDecrementIOverflow 1000000 avgt 3 653.341 ? 279.003 ms/op >> MathExact.C1_1.loopDecrementLInBounds 1000000 avgt 3 2.525 ? 0.810 ms/op >> MathExact.C1_1.loopDecrementLOverflow 1000000 avgt 3 656.750 ? 141.792 ms/op >> MathExact.C1_1.loopIncrementIInBounds 1000000 avgt 3 4.621 ? 
12.822 ms/op >> MathExact.C1_1.loopIncrementIOverflow 1000000 avgt 3 651.608 ? 274.396 ms/op >> MathExact.C1_1.loopIncrementLInBounds 1000000 avgt 3 2.576 ? 3.316 ms/op >> MathExact.C1_1.loopIncrementLOverflow 1000000 avgt 3 662.216 ? 71.879 ms/op >> MathExact.C1_1.loopMultiplyIInBounds 1000000 avgt 3 1.402 ? 0.587 ms/op >> MathExact.C1_1.loopMultiplyIOverflow 1000000 avgt 3 615.836 ? 252.137 ms/op >> MathExact.C1_1.loopMultiplyLInBounds 1000000 avgt 3 2.906 ? 5.718 ms/op >> MathExact.C1_1.loopMultiplyLOverflow 1000000 avgt 3 655.576 ? 147.432 ms/op >> MathExact.C1_1.loopNegateIInBounds 1000000 avgt 3 2.023 ? 0.027 ms/op >> MathExact.C1_1.loopNegateIOverflow 1000000 avgt 3 639.136 ? 30.841 ms/op >> MathExact.C1_1.loop... > > Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: > > Remove useless flags in tests Marked as reviewed by thartmann (Reviewer). Great, thank you! ------------- PR Review: https://git.openjdk.org/jdk/pull/23916#pullrequestreview-2739795916 PR Comment: https://git.openjdk.org/jdk/pull/23916#issuecomment-2775743849 From mli at openjdk.org Thu Apr 3 13:41:20 2025 From: mli at openjdk.org (Hamlin Li) Date: Thu, 3 Apr 2025 13:41:20 GMT Subject: RFR: 8353600: RISC-V: compiler/vectorization/TestRotateByteAndShortVector.java is failing with Zvbb [v2] In-Reply-To: <3xU-sLLf0E_4n9BsUXL4COF7mxBjDd8YzgyIvissvQQ=.472cf772-a97a-48c8-b4e6-907fcfdd1ebb@github.com> References: <3xU-sLLf0E_4n9BsUXL4COF7mxBjDd8YzgyIvissvQQ=.472cf772-a97a-48c8-b4e6-907fcfdd1ebb@github.com> Message-ID: > Hi, > Can you help to review this patch? > > Currently, the followign code is considered an RotateLeftV of byte by hotspot, but it's not a real rotate, as the `-shift` will 30, which makes `b >> -shift` zero, rather the value we expected. > > int shift = 2; > byte b = 83; > byte res = (byte) (b << shift | b >> -shift); // res = 76 > // but a real left rotate of 83 should be 77 ?? 
> ``` > > So, the simple fix is to enable RotateLeftV only for int and long, disable it for other types. > > A more rational fix should be to change C2 to not convert code like ` (byte) (b << shift | b >> -shift)` to a RotateLeftV node, but it needs more investigation, and I'm not sure if it's feasible to do so, as currently no platform supports RotateLeftV for non-int/long types. > > Thanks! Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: fix tests ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24414/files - new: https://git.openjdk.org/jdk/pull/24414/files/ca44d6b8..fa5e7375 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24414&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24414&range=00-01 Stats: 41 lines in 1 file changed: 17 ins; 20 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/24414.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24414/head:pull/24414 PR: https://git.openjdk.org/jdk/pull/24414 From mli at openjdk.org Thu Apr 3 13:49:05 2025 From: mli at openjdk.org (Hamlin Li) Date: Thu, 3 Apr 2025 13:49:05 GMT Subject: RFR: 8353600: RISC-V: compiler/vectorization/TestRotateByteAndShortVector.java is failing with Zvbb [v3] In-Reply-To: <3xU-sLLf0E_4n9BsUXL4COF7mxBjDd8YzgyIvissvQQ=.472cf772-a97a-48c8-b4e6-907fcfdd1ebb@github.com> References: <3xU-sLLf0E_4n9BsUXL4COF7mxBjDd8YzgyIvissvQQ=.472cf772-a97a-48c8-b4e6-907fcfdd1ebb@github.com> Message-ID: > Hi, > Can you help to review this patch? > > Currently, the following code is considered a RotateLeftV of byte by hotspot, but it's not a real rotate, as the `-shift` will be 30, which makes `b >> -shift` zero, rather than the value we expected. > > int shift = 2; > byte b = 83; > byte res = (byte) (b << shift | b >> -shift); // res = 76 > // but a real left rotate of 83 should be 77 ?? > ``` > > So, the simple fix is to enable RotateLeftV only for int and long, disable it for other types. 
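The arithmetic in the description can be checked with a small standalone Java sketch. The class and method names (`ByteRotateDemo`, `shiftOrPattern`, `rotateLeft8`) are made up for this illustration and do not come from the patch or the test; `rotateLeft8` is just one straightforward way to write a true 8-bit rotate for comparison.

```java
class ByteRotateDemo {
    // The pattern from the report. Both operands are promoted to int, and the
    // int shift distance -shift is masked to its low 5 bits (-2 & 31 == 30).
    static byte shiftOrPattern(byte b, int shift) {
        return (byte) (b << shift | b >> -shift);
    }

    // A true 8-bit rotate left, written out explicitly for comparison.
    static byte rotateLeft8(byte b, int shift) {
        int v = b & 0xFF;   // zero-extend the byte to its 8-bit pattern
        int s = shift & 7;  // rotate distance modulo 8
        return (byte) ((v << s) | (v >>> (8 - s)));
    }

    public static void main(String[] args) {
        byte b = 83;
        int shift = 2;
        System.out.println(shiftOrPattern(b, shift)); // prints 76
        System.out.println(rotateLeft8(b, shift));    // prints 77
    }
}
```

The two results differ because the shift-or pattern is only a rotate at the promoted width (32 bits for int), not at the width of `byte`.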
> > A more rational fix should be change C2 to not convert code like ` (byte) (b << shift | b >> -shift)` to a RotateLeftV node, but it needs more investigation, and I'm not sure if it's feasible to do so, as currently no platform support RotateLeftV for non-int/long types. > > The vector instruction behaviour is different from java language spec, so seems there is no way to do it for now. > > Thanks! Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: comment ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24414/files - new: https://git.openjdk.org/jdk/pull/24414/files/fa5e7375..592d4270 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24414&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24414&range=01-02 Stats: 5 lines in 1 file changed: 4 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/24414.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24414/head:pull/24414 PR: https://git.openjdk.org/jdk/pull/24414 From dfenacci at openjdk.org Thu Apr 3 14:10:34 2025 From: dfenacci at openjdk.org (Damon Fenacci) Date: Thu, 3 Apr 2025 14:10:34 GMT Subject: RFR: 8352963: [REDO] Missing late inline cleanup causes compiler/vectorapi/VectorLogicalOpIdentityTest.java IR failure Message-ID: This PR is a REDO of [JDK-8302459](https://bugs.openjdk.org/browse/JDK-8302459) ([PR](https://github.com/openjdk/jdk/pull/21682), [backout](https://bugs.openjdk.org/browse/JDK-8352965) triggered by a failing internal test). 
There was an issue with `CallGenerator::for_method_handle_call` that could delay late inlining by creating a "generic" `LateInlineCallGenerator` instead of a more specific `LateInlineMHCallGenerator`: https://github.com/openjdk/jdk/blob/74df384a9870431efb184158bba032c79c35356e/src/hotspot/share/opto/callGenerator.cpp#L991 While running IGVN this could be misinterpreted as non-MH late-inline https://github.com/openjdk/jdk/blob/c282fb9add32f1fac8174ca84b1b68a869d2578d/src/hotspot/share/opto/callnode.cpp#L1088-L1091 eventually triggering `assert(!cg->method()->is_method_handle_intrinsic(), "required");` The fix involves creating a `LateInlineMHCallGenerator` instead. Here is what changed from the backed out PR: https://github.com/openjdk/jdk/blob/c282fb9add32f1fac8174ca84b1b68a869d2578d/src/hotspot/share/opto/callGenerator.cpp#L991-L995 ### Testing Tier 1-4 (windows-x64, linux-x64/aarch64, and macosx-x64/aarch64; release and debug mode) ------------- Commit messages: - JDK-8352963: generate specific MH late if needed when delaying inlining - 8302459: Missing late inline cleanup causes compiler/vectorapi/VectorLogicalOpIdentityTest.java IR failure Changes: https://git.openjdk.org/jdk/pull/24402/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24402&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8352963 Stats: 104 lines in 7 files changed: 49 ins; 3 del; 52 mod Patch: https://git.openjdk.org/jdk/pull/24402.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24402/head:pull/24402 PR: https://git.openjdk.org/jdk/pull/24402 From roland at openjdk.org Thu Apr 3 14:12:52 2025 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 3 Apr 2025 14:12:52 GMT Subject: RFR: 8349479: C2: when a Type node becomes dead, make CFG path that uses it unreachable [v7] In-Reply-To: References: Message-ID: On Tue, 1 Apr 2025 08:47:37 GMT, Christian Hagedorn wrote: >> Roland Westrelin has updated the pull request incrementally with one additional commit since the last 
revision: >> >> review > > src/hotspot/share/opto/phaseX.cpp line 1836: > >> 1834: _type_nodes.push(n); >> 1835: } >> 1836: const Type* new_type = n->Value(this); > > Could we also only add `n` to `_type_nodes` if `new_type` is top? Then we could also rename `_type_nodes` to `_maybe_top_type_nodes` or something like that. if `new_type` is top? As node's types are widen by CCP, a node `n` will initially be `top`, then one input changes and becomes not `top` but if the node has another input (say control), that other input will still be `top` so the type will be `top` again. Only once both inputs are not `top` is the type not `top`. So isn't there a good chance that most type nodes will initially be `top` and be enqueued anyway so filtering nodes when they are popped is still required and we don't gain much by doing what you suggest? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23468#discussion_r2027086883 From roland at openjdk.org Thu Apr 3 14:30:21 2025 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 3 Apr 2025 14:30:21 GMT Subject: RFR: 8341976: C2: use_mem_state != load->find_exact_control(load->in(0)) assert failure [v8] In-Reply-To: References: Message-ID: > The `arraycopy` writes to a non escaping array so its `ArrayCopy` node > is marked as having a narrow memory effect. One of the loads from the > destination after the copy is transformed into a load from the source > array (the rationale being that if there's no load from the > destination of the copy, the `arraycopy` is not needed). The load from > the source has the input memory state of the `ArrayCopy` as memory > input. That load is then sunk out of the loop and its control is > updated to be after the `ArrayCopy`. That's legal because the > `ArrayCopy` only has a narrow memory effect and can't modify the > source. The `ArrayCopy` can't be eliminated and is expanded. In the > process, a `MemBar` that has a wide memory effect is added. 
The load > from the source has control after the membar but memory state before > and because the membar has a wide memory effect, the load is anti > dependent on the membar: the graph is broken (the load can't be pinned > after the membar and anti dependent on it). > > In short, the problem is that the graph is transformed under the > assumption that the `ArrayCopy` has a narrow effect but the > `ArrayCopy` is expanded to a subgraph that has a wide memory > effect. The fix I propose is to not insert a membar with a wide memory > effect. We still need a membar when the destination is non escaping > because the expanded `ArrayCopy`, if it writes to a tightly allocated > array, writes to raw memory and not to the destination memory slice. Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: review ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23465/files - new: https://git.openjdk.org/jdk/pull/23465/files/9b21648d..a76839de Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23465&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23465&range=06-07 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/23465.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23465/head:pull/23465 PR: https://git.openjdk.org/jdk/pull/23465 From roland at openjdk.org Thu Apr 3 14:30:23 2025 From: roland at openjdk.org (Roland Westrelin) Date: Thu, 3 Apr 2025 14:30:23 GMT Subject: RFR: 8341976: C2: use_mem_state != load->find_exact_control(load->in(0)) assert failure [v7] In-Reply-To: References: Message-ID: On Wed, 2 Apr 2025 07:23:49 GMT, Emanuel Peter wrote: >> Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains 16 additional commits since the last revision: >> >> - review >> - Merge branch 'master' into JDK-8341976 >> - review >> - review >> - Merge branch 'master' into JDK-8341976 >> - -XX:+TraceLoopOpts fix >> - review >> - more >> - Merge branch 'master' into JDK-8341976 >> - more >> - ... and 6 more: https://git.openjdk.org/jdk/compare/90c6006f...9b21648d > > test/hotspot/jtreg/compiler/arraycopy/TestSunkLoadAntiDependency.java line 28: > >> 26: * @bug 8341976 >> 27: * @summary C2: use_mem_state != load->find_exact_control(load->in(0)) assert failure >> 28: * @run main/othervm -XX:-BackgroundCompilation TestSunkLoadAntiDependency > > Would it make sense to have a run without any flags? @eme64 I made that change in new commit ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23465#discussion_r2027128101 From mli at openjdk.org Thu Apr 3 17:02:22 2025 From: mli at openjdk.org (Hamlin Li) Date: Thu, 3 Apr 2025 17:02:22 GMT Subject: RFR: 8353665: RISC-V: IR verification fails in TestSubNodeFloatDoubleNegation.java Message-ID: Hi, Can you help to review this patch? The newly added TestSubNodeFloatDoubleNegation.java (in https://github.com/openjdk/jdk/pull/24150) is to check `0 - (0 - x)` is not folded to `x` for float and double. I have manually checked the IR and generated assembly code, it's not folded on riscv either, just there is an extra SubF in some code path. So, the fix for this test on riscv should be simply make the check as `>= 2` rather than `2`. Tested on both x86 and riscv64. 
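For readers wondering why `0 - (0 - x)` must not be folded to `x` for floating-point types, one case where the two differ under IEEE 754 arithmetic is `x = -0.0`. The following is a minimal sketch of that case only; the class and method names are invented here and are not taken from TestSubNodeFloatDoubleNegation.java, and the test itself may cover more than this.

```java
class FloatNegationDemo {
    // The expression under discussion, evaluated with float arithmetic.
    static float zeroMinusZeroMinus(float x) {
        return 0f - (0f - x);
    }

    public static void main(String[] args) {
        float x = -0.0f;
        float y = zeroMinusZeroMinus(x);
        // 0f - (-0.0f) is +0.0f, and 0f - (+0.0f) is again +0.0f,
        // so the sign bit of the input is lost.
        System.out.println(Integer.toHexString(Float.floatToRawIntBits(x))); // prints 80000000
        System.out.println(Integer.toHexString(Float.floatToRawIntBits(y))); // prints 0
    }
}
```

For integer types the same expression does equal `x`, which is why the folding is only a concern for float and double.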
Thanks ------------- Commit messages: - initial commit Changes: https://git.openjdk.org/jdk/pull/24421/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24421&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8353665 Stats: 2 lines in 1 file changed: 1 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/24421.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24421/head:pull/24421 PR: https://git.openjdk.org/jdk/pull/24421 From duke at openjdk.org Thu Apr 3 17:04:02 2025 From: duke at openjdk.org (Johannes Graham) Date: Thu, 3 Apr 2025 17:04:02 GMT Subject: RFR: 8347645: C2: XOR bounded value handling blocks constant folding [v46] In-Reply-To: References: Message-ID: On Wed, 2 Apr 2025 06:16:26 GMT, Emanuel Peter wrote: >> Renamed to `xor_upper_bound_for_ranges` before I saw your comment, @merykitty. I'd be ok with another name though. With the last changes, the method is no longer a member of the class, so it's no longer going to get as many eyes on it without context, so maybe it matters less now. > > @j3graham I gave it a quick look, and it looks even better now. Let me run testing again before you integrate! > > Please ping me in 24h for the results! Hi @eme64, any news on test results? ------------- PR Comment: https://git.openjdk.org/jdk/pull/23089#issuecomment-2776423931 From kvn at openjdk.org Thu Apr 3 17:09:01 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 3 Apr 2025 17:09:01 GMT Subject: RFR: 8353192: C2: Clean up x86 backend after 32-bit x86 removal In-Reply-To: References: Message-ID: <8UOZXyMUssuNga9jUwBf6F1Nmhi6a3ZIJGpXzS3KL3U=.50774170-672e-49c9-8527-077914838e94@github.com> On Fri, 28 Mar 2025 17:09:19 GMT, Aleksey Shipilev wrote: > Piece-wise cleanup of C2 x86 backend. C2_MacroAssembler, x86.ad and related files are the target for this cleanup. 
> > Additional testing: > - [x] Linux x86_64 server fastdebug, `all` > - [x] Linux x86_64 server fastdebug, `all` + `-XX:-TieredCompilation` src/hotspot/cpu/x86/x86.ad line 2680: > 2678: break; > 2679: case Op_VecX: > 2680: #ifndef _LP64 Here and in following code you left code for 32-bit instead of 64-bits. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24300#discussion_r2027426985 From kvn at openjdk.org Thu Apr 3 17:18:59 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 3 Apr 2025 17:18:59 GMT Subject: RFR: 8353188: C1: Clean up x86 backend after 32-bit x86 removal [v3] In-Reply-To: References: <-iwh_5JGpt-TAVpfZQjwbnIG_c8hvirNKCcmiZoLNls=.3b34bf15-51fc-42bf-a294-1c23ca99754c@github.com> Message-ID: On Wed, 2 Apr 2025 08:56:20 GMT, Aleksey Shipilev wrote: >> Piece-wise cleanup of C1_LIRAssembler_x86, C1_MacroAssembler and related classes. C1 implements the bulk of arch-specific backend there. Major parts of this backend are already removed by #24274, this cleans up another large bulk, and hopefully most of it. >> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, `all` >> - [x] Linux x86_64 server fastdebug, `all` + `-XX:TieredStopAtLevel=1` > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Minor whitespace reverts Good. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24301#pullrequestreview-2740633488 From kvn at openjdk.org Thu Apr 3 17:27:58 2025 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 3 Apr 2025 17:27:58 GMT Subject: RFR: 8353192: C2: Clean up x86 backend after 32-bit x86 removal In-Reply-To: References: Message-ID: On Fri, 28 Mar 2025 17:09:19 GMT, Aleksey Shipilev wrote: > Piece-wise cleanup of C2 x86 backend. C2_MacroAssembler, x86.ad and related files are the target for this cleanup. 
> > Additional testing: > - [x] Linux x86_64 server fastdebug, `all` > - [x] Linux x86_64 server fastdebug, `all` + `-XX:-TieredCompilation` I looked at the `adlc` code to make sure nothing is left there and found a check for `IA32`. This looks like another variable we set for 32-bit x86: [platform.m4#L556](https://github.com/openjdk/jdk/blob/master/make/autoconf/platform.m4#L556) I'm surprised to see `X32` too, which we check in `os_linux.cpp`. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24300#issuecomment-2776475220 From jbhateja at openjdk.org Thu Apr 3 18:33:36 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 3 Apr 2025 18:33:36 GMT Subject: RFR: 8346236: Auto vectorization support for various Float16 operations [v10] In-Reply-To: References: Message-ID: <_ZuUmN2CJEVZwNDql7bfQJ8gsXRsIsgOOg6AWNdWzVE=.c267d502-c53b-4ea6-afb4-0415d67ef5ac@github.com> > This is a follow-up PR for https://github.com/openjdk/jdk/pull/22754 > > The patch adds support to vectorize various float16 scalar operations (add/subtract/divide/multiply/sqrt/fma). > > Summary of changes included with the patch: > 1. C2 compiler New Vector IR creation. > 2. Auto-vectorization support. > 3. x86 backend implementation. > 4. New IR verification test for each newly supported vector operation. 
> > Following are the performance numbers of Float16OperationsBenchmark > > System : Intel(R) Xeon(R) Processor code-named Granite rapids > Frequency fixed at 2.5 GHz > > > Baseline > Benchmark (vectorDim) Mode Cnt Score Error Units > Float16OperationsBenchmark.absBenchmark 1024 thrpt 2 4191.787 ops/ms > Float16OperationsBenchmark.addBenchmark 1024 thrpt 2 1211.978 ops/ms > Float16OperationsBenchmark.cosineSimilarityDequantizedFP16 1024 thrpt 2 493.026 ops/ms > Float16OperationsBenchmark.cosineSimilarityDoubleRoundingFP16 1024 thrpt 2 612.430 ops/ms > Float16OperationsBenchmark.cosineSimilaritySingleRoundingFP16 1024 thrpt 2 616.012 ops/ms > Float16OperationsBenchmark.divBenchmark 1024 thrpt 2 604.882 ops/ms > Float16OperationsBenchmark.dotProductFP16 1024 thrpt 2 410.798 ops/ms > Float16OperationsBenchmark.euclideanDistanceDequantizedFP16 1024 thrpt 2 602.863 ops/ms > Float16OperationsBenchmark.euclideanDistanceFP16 1024 thrpt 2 640.348 ops/ms > Float16OperationsBenchmark.fmaBenchmark 1024 thrpt 2 809.175 ops/ms > Float16OperationsBenchmark.getExponentBenchmark 1024 thrpt 2 2682.764 ops/ms > Float16OperationsBenchmark.isFiniteBenchmark 1024 thrpt 2 3373.901 ops/ms > Float16OperationsBenchmark.isFiniteCMovBenchmark 1024 thrpt 2 1881.652 ops/ms > Float16OperationsBenchmark.isFiniteStoreBenchmark 1024 thrpt 2 2273.745 ops/ms > Float16OperationsBenchmark.isInfiniteBenchmark 1024 thrpt 2 2147.913 ops/ms > Float16OperationsBenchmark.isInfiniteCMovBenchmark 1024 thrpt 2 1962.579 ops/ms... Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 13 commits: - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8346236 - Review comment resolutions - Some re-factoring - Adding tests for new float16 Generator - Removing Generator dependency on incubation module - Review comments resolution. 
- Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8346236 - Updating benchmark - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8346236 - Updating copyright - ... and 3 more: https://git.openjdk.org/jdk/compare/d894b781...6d05863d ------------- Changes: https://git.openjdk.org/jdk/pull/22755/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=22755&range=09 Stats: 1165 lines in 23 files changed: 1077 ins; 12 del; 76 mod Patch: https://git.openjdk.org/jdk/pull/22755.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22755/head:pull/22755 PR: https://git.openjdk.org/jdk/pull/22755 From dlong at openjdk.org Thu Apr 3 19:15:06 2025 From: dlong at openjdk.org (Dean Long) Date: Thu, 3 Apr 2025 19:15:06 GMT Subject: RFR: 8349563: Improve AbsNode::Value() for integer types [v2] In-Reply-To: References: Message-ID: On Thu, 3 Apr 2025 10:57:06 GMT, Quan Anh Mai wrote: >> @dean-long ? > > No, converting unsigned to signed is not UB, the behaviour is the same as in Java. I believe it's actually implementation-defined, not UB, until C++20, according to discussion in this other PR: https://github.com/openjdk/jdk/pull/24184#discussion_r2011464234 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23685#discussion_r2027594672 From duke at openjdk.org Thu Apr 3 20:11:08 2025 From: duke at openjdk.org (Saranya Natarajan) Date: Thu, 3 Apr 2025 20:11:08 GMT Subject: RFR: 8351660: C2: SIGFPE in unsigned_mod_value Message-ID: Description :: The test program performs a `Long.remainderUnsigned`, which triggers the call to the function `unsigned_mod_value`. At the end of `unsigned_mod_value`, `return TypeClass::make(static_cast(dividend % divisor))` is computed, which leads to a SIGFPE as the divisor in the test program is zero. The same behaviour was observed when the `Long.remainderUnsigned` was replaced with `Integer.remainderUnsigned` in the test program. 
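For reference, the Java-level behaviour that the compiled code has to preserve can be sketched as follows. This is not the reproducer from the bug report; the class and method names are invented for illustration. It only shows that an unsigned remainder with a zero divisor ends in an `ArithmeticException` at run time rather than producing a value.

```java
class UnsignedRemByZeroDemo {
    // Returns true if Long.remainderUnsigned with a zero divisor throws.
    static boolean longThrowsOnZeroDivisor(long dividend) {
        try {
            Long.remainderUnsigned(dividend, 0L);
            return false;
        } catch (ArithmeticException e) {
            return true;
        }
    }

    // Same check for the int variant mentioned in the description.
    static boolean intThrowsOnZeroDivisor(int dividend) {
        try {
            Integer.remainderUnsigned(dividend, 0);
            return false;
        } catch (ArithmeticException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(longThrowsOnZeroDivisor(42L)); // prints true
        System.out.println(intThrowsOnZeroDivisor(42));   // prints true
    }
}
```

A compile-time evaluation of `dividend % divisor` therefore has to rule out a zero divisor first, since there is no constant result to fold to in that case.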
Solution :: The fix for [JDK-8345766](https://bugs.openjdk.org/browse/JDK-8345766) emitted specific ModF/ModD nodes, which is optimized and converted to runtime calls after optimizations. This was done during parsing prior to [JDK-8345766](https://bugs.openjdk.org/browse/JDK-8345766). In the scenario where there was unsigned modulo operation, there was no check for modulo by zero. The fix proposed checks if there is modulo by zero and throws exception at runtime. ------------- Commit messages: - JDK-8351660: C2: SIGFPE in unsigned_mod_value Changes: https://git.openjdk.org/jdk/pull/24410/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24410&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8351660 Stats: 5 lines in 1 file changed: 5 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/24410.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24410/head:pull/24410 PR: https://git.openjdk.org/jdk/pull/24410 From dlong at openjdk.org Thu Apr 3 21:05:52 2025 From: dlong at openjdk.org (Dean Long) Date: Thu, 3 Apr 2025 21:05:52 GMT Subject: RFR: 8349563: Improve AbsNode::Value() for integer types [v2] In-Reply-To: <4BGh_S5KBKdofXCOmj6e7HYCR4GUSi9-ShxqW-h4oNQ=.2cb7d760-4f42-4e95-b993-39f99931c1d9@github.com> References: <4BGh_S5KBKdofXCOmj6e7HYCR4GUSi9-ShxqW-h4oNQ=.2cb7d760-4f42-4e95-b993-39f99931c1d9@github.com> Message-ID: On Wed, 19 Feb 2025 16:13:06 GMT, Chen Liang wrote: >> Jasmine Karthikeyan has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains two commits: >> >> - Merge >> - Improve AbsNode::Value > > src/hotspot/share/opto/subnode.cpp line 1941: > >> 1939: >> 1940: if (lo_abs < 0) { >> 1941: assert(lo_abs == std::numeric_limits::min(), "uabs(t->_lo) must be min value if negative!"); > > I think asserting `t->_lo` to be min is more straightforward, and also indicates `(t->_lo) + 1`, which yields max, is in the type. We can simplify the comment below too. 
If we check for the problematic t->_lo == min first, then we no longer need to use uabs(), right? Also, could we use IntegerType::MIN for the check, rather than std::numeric_limits? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23685#discussion_r2027739138 From sparasa at openjdk.org Fri Apr 4 01:21:34 2025 From: sparasa at openjdk.org (Srinivas Vamsi Parasa) Date: Fri, 4 Apr 2025 01:21:34 GMT Subject: RFR: 8351994: Enable Extended EVEX to REX2/REX demotion when src and dst are the same Message-ID: The current scheme for Intel APX NDD code generation favors the emission of NDD instruction on APX-enabled targets, even if destination and source registers are the same. To prevent this, this PR extends the assembler layer to demote EEVEX to REX encoding if dst matches with source operands. ------------- Commit messages: - 8351994: Enable Extended EVEX to REX2/REX demotion when src and dst are the same Changes: https://git.openjdk.org/jdk/pull/24431/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24431&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8351994 Stats: 3561 lines in 4 files changed: 1376 ins; 298 del; 1887 mod Patch: https://git.openjdk.org/jdk/pull/24431.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24431/head:pull/24431 PR: https://git.openjdk.org/jdk/pull/24431 From jbhateja at openjdk.org Fri Apr 4 02:10:35 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 4 Apr 2025 02:10:35 GMT Subject: RFR: 8346236: Auto vectorization support for various Float16 operations [v11] In-Reply-To: References: Message-ID: <0oYqgnHHKaYHu_AH2bVR2ZbC45JgK-evjGeFwuN0MSg=.94374b61-e094-499f-95af-a1bfbb70db4d@github.com> > This is a follow-up PR for https://github.com/openjdk/jdk/pull/22754 > > The patch adds support to vectorize various float16 scalar operations (add/subtract/divide/multiply/sqrt/fma). > > Summary of changes included with the patch: > 1. C2 compiler New Vector IR creation. > 2. Auto-vectorization support. 
> 3. x86 backend implementation. > 4. New IR verification test for each newly supported vector operation. > > Following are the performance numbers of Float16OperationsBenchmark > > System : Intel(R) Xeon(R) Processor code-named Granite rapids > Frequency fixed at 2.5 GHz > > > Baseline > Benchmark (vectorDim) Mode Cnt Score Error Units > Float16OperationsBenchmark.absBenchmark 1024 thrpt 2 4191.787 ops/ms > Float16OperationsBenchmark.addBenchmark 1024 thrpt 2 1211.978 ops/ms > Float16OperationsBenchmark.cosineSimilarityDequantizedFP16 1024 thrpt 2 493.026 ops/ms > Float16OperationsBenchmark.cosineSimilarityDoubleRoundingFP16 1024 thrpt 2 612.430 ops/ms > Float16OperationsBenchmark.cosineSimilaritySingleRoundingFP16 1024 thrpt 2 616.012 ops/ms > Float16OperationsBenchmark.divBenchmark 1024 thrpt 2 604.882 ops/ms > Float16OperationsBenchmark.dotProductFP16 1024 thrpt 2 410.798 ops/ms > Float16OperationsBenchmark.euclideanDistanceDequantizedFP16 1024 thrpt 2 602.863 ops/ms > Float16OperationsBenchmark.euclideanDistanceFP16 1024 thrpt 2 640.348 ops/ms > Float16OperationsBenchmark.fmaBenchmark 1024 thrpt 2 809.175 ops/ms > Float16OperationsBenchmark.getExponentBenchmark 1024 thrpt 2 2682.764 ops/ms > Float16OperationsBenchmark.isFiniteBenchmark 1024 thrpt 2 3373.901 ops/ms > Float16OperationsBenchmark.isFiniteCMovBenchmark 1024 thrpt 2 1881.652 ops/ms > Float16OperationsBenchmark.isFiniteStoreBenchmark 1024 thrpt 2 2273.745 ops/ms > Float16OperationsBenchmark.isInfiniteBenchmark 1024 thrpt 2 2147.913 ops/ms > Float16OperationsBenchmark.isInfiniteCMovBenchmark 1024 thrpt 2 1962.579 ops/ms... 
Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Adding missing feature check ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22755/files - new: https://git.openjdk.org/jdk/pull/22755/files/6d05863d..2c09e816 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22755&range=10 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22755&range=09-10 Stats: 3 lines in 1 file changed: 3 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/22755.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22755/head:pull/22755 PR: https://git.openjdk.org/jdk/pull/22755 From fyang at openjdk.org Fri Apr 4 02:46:59 2025 From: fyang at openjdk.org (Fei Yang) Date: Fri, 4 Apr 2025 02:46:59 GMT Subject: RFR: 8353695: RISC-V: compiler/cpuflags/TestAESIntrinsicsOnUnsupportedConfig.java is failing with Zvkn Message-ID: Hi, please review this small change fixing two jtreg tests. This issue manifests after https://github.com/openjdk/jdk/pull/24344, which auto-detects and enables the Zvkn extension. The two tests only require the "aes" feature string (vm.cpu.features ~= ".*aes.*"), but the feature string is "zvkn" on the linux-riscv64 platform. This change adapts the "@requires" of both tests to account for the Zvkn feature of this platform. Both tests work as expected with qemu-system equipped with the Zvkn extension. 
------------- Commit messages: - 8353695: RISC-V: compiler/cpuflags/TestAESIntrinsicsOnUnsupportedConfig.java is failing with Zvkn Changes: https://git.openjdk.org/jdk/pull/24433/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24433&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8353695 Stats: 5 lines in 3 files changed: 1 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/24433.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24433/head:pull/24433 PR: https://git.openjdk.org/jdk/pull/24433 From jbhateja at openjdk.org Fri Apr 4 03:07:52 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 4 Apr 2025 03:07:52 GMT Subject: RFR: 8351994: Enable Extended EVEX to REX2/REX demotion when src and dst are the same In-Reply-To: References: Message-ID: On Fri, 4 Apr 2025 01:15:36 GMT, Srinivas Vamsi Parasa wrote: > The current scheme for Intel APX NDD code generation favors the emission of NDD instruction on APX-enabled targets, even if destination and source registers are the same. To prevent this, this PR extends the assembler layer to demote EEVEX to REX encoding if dst matches with source operands. src/hotspot/cpu/x86/assembler_x86.cpp line 13825: > 13823: return (!no_flags && dst_enc == nds_enc); > 13824: } > 13825: @vamsi-parasa , We are missing a case where dst_enc can be equal to src_enc; in that case, we can still demote EVEX to REX/REX2 encoding, along with a change in primary opcode if needed. 
This will apply to all the commutative operations (ADD/ AND / OR / XOR) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24431#discussion_r2028015376 From epeter at openjdk.org Fri Apr 4 05:35:54 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 4 Apr 2025 05:35:54 GMT Subject: RFR: 8347645: C2: XOR bounded value handling blocks constant folding [v49] In-Reply-To: References: Message-ID: On Tue, 1 Apr 2025 23:22:09 GMT, Johannes Graham wrote: >> An interaction between xor bounds optimization and constant folding resulted in xor over constants not being optimized. This has a noticeable effect on `Long.expand` with a constant mask, on architectures that don't have instructions equivalent to `PDEP` to be used in an intrinsic. >> >> This change moves logic from the `Xor(L|I)Node::Value` methods into the `add_ring` methods, and gives priority to constant-folding. A static method was separated out to facilitate direct unit-testing. It also (subjectively) simplified the calculation of the upper bound and added an explanation of the reasoning behind it. >> >> In addition to testing for constant folding over xor, IR tests were added to `XorINodeIdealizationTests` and `XorLNodeIdealizationTests` to cover these related items: >> - Bounds optimization of xor >> - A check for `x ^ x = 0` >> - Explicit testing of xor over booleans. >> >> Also `test_xor_node.cpp` was added to more extensively test the correctness of the bounds optimization. It exhaustively tests ranges of 4-bit numbers as well as at the high and low end of the affected types. > > Johannes Graham has updated the pull request incrementally with one additional commit since the last revision: > > update comments Testing looks good :) ------------- Marked as reviewed by epeter (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/23089#pullrequestreview-2741819043 From epeter at openjdk.org Fri Apr 4 06:06:56 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 4 Apr 2025 06:06:56 GMT Subject: RFR: 8342095: Add autovectorizer support for subword vector casts [v10] In-Reply-To: <2T_qgLVG05hbfRLOkrEGthWnoxXpvUGf0T8haKyKiCE=.fa4c75c5-764c-4829-9fcd-bfe12fa4d994@github.com> References: <2T_qgLVG05hbfRLOkrEGthWnoxXpvUGf0T8haKyKiCE=.fa4c75c5-764c-4829-9fcd-bfe12fa4d994@github.com> Message-ID: On Wed, 2 Apr 2025 14:15:19 GMT, Jasmine Karthikeyan wrote: >> Hi all, >> This patch adds initial support for the autovectorizer to generate conversions between subword types. Currently, when superword sees two packs that have different basic types, it discards them and bails out of vectorization. This patch changes the behavior to ask the backend if a cast between the conflicting types is supported, and keeps the pack if it is. Later, when the `VTransform` graph is built, a synthetic cast is emitted when packs requiring casts are detected. Currently, only narrowing casts are supported as I wanted to re-use existing `VectorCastX2Y` logic for the initial version, but adding more conversions is simple and can be done with a subsequent RFE. I have attached a JMH benchmark and got these results on my Zen 3 machine: >> >> >> Baseline Patch >> Benchmark (SIZE) Mode Cnt Score Error Units Score Error Units Improvement >> VectorSubword.intToByte 1024 avgt 12 200.049 ? 19.787 ns/op 56.228 ? 3.535 ns/op (3.56x) >> VectorSubword.intToShort 1024 avgt 12 179.826 ? 1.539 ns/op 43.332 ? 1.166 ns/op (4.15x) >> VectorSubword.shortToByte 1024 avgt 12 245.580 ? 6.150 ns/op 29.757 ? 1.055 ns/op (8.25x) >> >> >> I've also added some IR tests and they pass on my linux x64 machine. Thoughts and reviews would be appreciated! 
> > Jasmine Karthikeyan has updated the pull request incrementally with one additional commit since the last revision: > > Fix copyright after merge This looks really good, thanks @jaskarth for the work you put in! I have a few more comments below. src/hotspot/share/opto/superword.cpp line 2329: > 2327: // Check if the output type of def is compatible with the input type of use, i.e. if the > 2328: // types have the same size. > 2329: bool SuperWord::is_velt_basic_type_compatible_use_def(Node* use, Node* def, const uint def_size) const { Suggestion: bool SuperWord::is_velt_basic_type_compatible_use_def(Node* use, Node* def, const uint pack_size) const { I think that would be more descriptive here. It would indicate we are not interested in the size of an element (i.e. bytes per element), but the size of the pack. src/hotspot/share/opto/superword.cpp line 2361: > 2359: > 2360: // Input sizes differ, but platform supports a cast to change the def shape to the use shape > 2361: Suggestion: // Subword cast: Element sizes differ, but the platform supports a cast to change the def shape to the use shape. 
src/hotspot/share/opto/superwordVTransformBuilder.cpp line 195: > 193: // If the use and def types are different, emit a cast node > 194: if (use_bt != def_bt && !p0->is_Convert() > 195: && (is_subword_type(def_bt) || is_subword_type(use_bt)) && VectorCastNode::implemented(-1, pack->size(), def_bt, use_bt)) { Suggestion: if (use_bt != def_bt && !p0->is_Convert() && (is_subword_type(def_bt) || is_subword_type(use_bt)) && VectorCastNode::implemented(-1, pack->size(), def_bt, use_bt)) { Optional nit :) test/hotspot/jtreg/compiler/loopopts/superword/TestCompatibleUseDefTypeSize.java line 113: > 111: tests.put("testShortToInt", () -> { return testShortToInt(aS.clone(), bI.clone()); }); > 112: tests.put("testByteToInt", () -> { return testByteToInt(aB.clone(), bI.clone()); }); > 113: tests.put("testByteToShort", () -> { return testByteToShort(aB.clone(), bS.clone()); }); What about a `testLongToShort` etc? It could be good to just have casts from/to all types, just to be sure ;) test/micro/org/openjdk/bench/vm/compiler/VectorSubword.java line 44: > 42: private byte[] bytes; > 43: private short[] shorts; > 44: private int[] ints; It would be nice if you covered also `char` and `long`, for completeness :) ------------- Changes requested by epeter (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/23413#pullrequestreview-2739175207 PR Review Comment: https://git.openjdk.org/jdk/pull/23413#discussion_r2026615573 PR Review Comment: https://git.openjdk.org/jdk/pull/23413#discussion_r2028138291 PR Review Comment: https://git.openjdk.org/jdk/pull/23413#discussion_r2028139632 PR Review Comment: https://git.openjdk.org/jdk/pull/23413#discussion_r2028142164 PR Review Comment: https://git.openjdk.org/jdk/pull/23413#discussion_r2028145292 From epeter at openjdk.org Fri Apr 4 06:18:51 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 4 Apr 2025 06:18:51 GMT Subject: RFR: 8352869: Verify.checkEQ: extension for NaN, VectorAPI and arbitrary Objects [v5] In-Reply-To: References: Message-ID: On Mon, 31 Mar 2025 09:29:47 GMT, Galder Zamarreño wrote: >> Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains six additional commits since the last revision: >> >> - Merge branch 'master' into JDK-8352869-Verify-NaN-Vector-Objects >> - Verify.Options refactor for Galder >> - Update test/hotspot/jtreg/compiler/lib/verify/Verify.java >> >> Co-authored-by: Galder Zamarreño >> - Merge branch 'master' into JDK-8352869-Verify-NaN-Vector-Objects >> - clean up test >> - JDK-8352869 > > Changes requested by galder (Author). @galderz do you intend to review / approve this, or should I ask someone else? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/24224#issuecomment-2777652144 From mchevalier at openjdk.org Fri Apr 4 06:54:53 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Fri, 4 Apr 2025 06:54:53 GMT Subject: RFR: 8346989: C2: deoptimization and re-execution cycle with Math.*Exact in case of frequent overflow [v7] In-Reply-To: References: <8ACplaVM_gN9cbIcQYGJmR4GNINm70PAJQ8uAgucK4Y=.14fdc7e2-e0af-4f0d-acb6-bcfe99ee8f36@github.com> Message-ID: On Thu, 3 Apr 2025 13:01:15 GMT, Marc Chevalier wrote: >> `Math.*Exact` intrinsics can cause many deopt when used repeatedly with problematic arguments. >> This fix proposes not to rely on intrinsics after `too_many_traps()` has been reached. >> >> Benchmark show that this issue affects every Math.*Exact functions. And this fix improve them all. >> >> tl;dr: >> - C1: no problem, no change >> - C2: >> - with intrinsics: >> - with overflow: clear improvement. Was way worse than C1, now is similar (~4s => ~600ms) >> - without overflow: no problem, no change >> - without intrinsics: no problem, no change >> >> Before the fix: >> >> Benchmark (SIZE) Mode Cnt Score Error Units >> MathExact.C1_1.loopAddIInBounds 1000000 avgt 3 1.272 ? 0.048 ms/op >> MathExact.C1_1.loopAddIOverflow 1000000 avgt 3 641.917 ? 58.238 ms/op >> MathExact.C1_1.loopAddLInBounds 1000000 avgt 3 1.402 ? 0.842 ms/op >> MathExact.C1_1.loopAddLOverflow 1000000 avgt 3 671.013 ? 229.425 ms/op >> MathExact.C1_1.loopDecrementIInBounds 1000000 avgt 3 3.722 ? 22.244 ms/op >> MathExact.C1_1.loopDecrementIOverflow 1000000 avgt 3 653.341 ? 279.003 ms/op >> MathExact.C1_1.loopDecrementLInBounds 1000000 avgt 3 2.525 ? 0.810 ms/op >> MathExact.C1_1.loopDecrementLOverflow 1000000 avgt 3 656.750 ? 141.792 ms/op >> MathExact.C1_1.loopIncrementIInBounds 1000000 avgt 3 4.621 ? 12.822 ms/op >> MathExact.C1_1.loopIncrementIOverflow 1000000 avgt 3 651.608 ? 274.396 ms/op >> MathExact.C1_1.loopIncrementLInBounds 1000000 avgt 3 2.576 ? 
3.316 ms/op >> MathExact.C1_1.loopIncrementLOverflow 1000000 avgt 3 662.216 ? 71.879 ms/op >> MathExact.C1_1.loopMultiplyIInBounds 1000000 avgt 3 1.402 ? 0.587 ms/op >> MathExact.C1_1.loopMultiplyIOverflow 1000000 avgt 3 615.836 ? 252.137 ms/op >> MathExact.C1_1.loopMultiplyLInBounds 1000000 avgt 3 2.906 ? 5.718 ms/op >> MathExact.C1_1.loopMultiplyLOverflow 1000000 avgt 3 655.576 ? 147.432 ms/op >> MathExact.C1_1.loopNegateIInBounds 1000000 avgt 3 2.023 ? 0.027 ms/op >> MathExact.C1_1.loopNegateIOverflow 1000000 avgt 3 639.136 ? 30.841 ms/op >> MathExact.C1_1.loop... > > Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: > > Remove useless flags in tests Thanks @iwanowww and @TobiHartmann! ------------- PR Comment: https://git.openjdk.org/jdk/pull/23916#issuecomment-2777705340 From duke at openjdk.org Fri Apr 4 06:54:53 2025 From: duke at openjdk.org (duke) Date: Fri, 4 Apr 2025 06:54:53 GMT Subject: RFR: 8346989: C2: deoptimization and re-execution cycle with Math.*Exact in case of frequent overflow [v7] In-Reply-To: References: <8ACplaVM_gN9cbIcQYGJmR4GNINm70PAJQ8uAgucK4Y=.14fdc7e2-e0af-4f0d-acb6-bcfe99ee8f36@github.com> Message-ID: On Thu, 3 Apr 2025 13:01:15 GMT, Marc Chevalier wrote: >> `Math.*Exact` intrinsics can cause many deopt when used repeatedly with problematic arguments. >> This fix proposes not to rely on intrinsics after `too_many_traps()` has been reached. >> >> Benchmark show that this issue affects every Math.*Exact functions. And this fix improve them all. >> >> tl;dr: >> - C1: no problem, no change >> - C2: >> - with intrinsics: >> - with overflow: clear improvement. Was way worse than C1, now is similar (~4s => ~600ms) >> - without overflow: no problem, no change >> - without intrinsics: no problem, no change >> >> Before the fix: >> >> Benchmark (SIZE) Mode Cnt Score Error Units >> MathExact.C1_1.loopAddIInBounds 1000000 avgt 3 1.272 ? 
0.048 ms/op >> MathExact.C1_1.loopAddIOverflow 1000000 avgt 3 641.917 ? 58.238 ms/op >> MathExact.C1_1.loopAddLInBounds 1000000 avgt 3 1.402 ? 0.842 ms/op >> MathExact.C1_1.loopAddLOverflow 1000000 avgt 3 671.013 ? 229.425 ms/op >> MathExact.C1_1.loopDecrementIInBounds 1000000 avgt 3 3.722 ? 22.244 ms/op >> MathExact.C1_1.loopDecrementIOverflow 1000000 avgt 3 653.341 ? 279.003 ms/op >> MathExact.C1_1.loopDecrementLInBounds 1000000 avgt 3 2.525 ? 0.810 ms/op >> MathExact.C1_1.loopDecrementLOverflow 1000000 avgt 3 656.750 ? 141.792 ms/op >> MathExact.C1_1.loopIncrementIInBounds 1000000 avgt 3 4.621 ? 12.822 ms/op >> MathExact.C1_1.loopIncrementIOverflow 1000000 avgt 3 651.608 ? 274.396 ms/op >> MathExact.C1_1.loopIncrementLInBounds 1000000 avgt 3 2.576 ? 3.316 ms/op >> MathExact.C1_1.loopIncrementLOverflow 1000000 avgt 3 662.216 ? 71.879 ms/op >> MathExact.C1_1.loopMultiplyIInBounds 1000000 avgt 3 1.402 ? 0.587 ms/op >> MathExact.C1_1.loopMultiplyIOverflow 1000000 avgt 3 615.836 ? 252.137 ms/op >> MathExact.C1_1.loopMultiplyLInBounds 1000000 avgt 3 2.906 ? 5.718 ms/op >> MathExact.C1_1.loopMultiplyLOverflow 1000000 avgt 3 655.576 ? 147.432 ms/op >> MathExact.C1_1.loopNegateIInBounds 1000000 avgt 3 2.023 ? 0.027 ms/op >> MathExact.C1_1.loopNegateIOverflow 1000000 avgt 3 639.136 ? 30.841 ms/op >> MathExact.C1_1.loop... > > Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: > > Remove useless flags in tests @marc-chevalier Your change (at version e7c8f3e06f46e85cb3c2dc974db84b10a57bd086) is now ready to be sponsored by a Committer. 
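For readers following the `Math.*Exact` thread above, here is a minimal Java sketch of the kind of loop the "Overflow" benchmarks appear to exercise. It is not code from the PR or from the `MathExact` benchmark: the class name `ExactOverflowSketch` and the method `countOverflows` are hypothetical, chosen for illustration only. It shows only the Java-level behavior (`Math.addExact` throws `ArithmeticException` on `int` overflow); the deoptimization cycle described in the PR happens inside C2 and is not observable from this code.

```java
// Hypothetical illustration, not taken from the PR or the MathExact benchmark.
public class ExactOverflowSketch {

    // Adds `addend` to every element with Math.addExact and counts how many
    // additions overflowed. Math.addExact(int, int) throws ArithmeticException
    // when the mathematical result does not fit in an int.
    static int countOverflows(int[] values, int addend) {
        int overflows = 0;
        for (int value : values) {
            try {
                Math.addExact(value, addend);
            } catch (ArithmeticException e) {
                // Frequent overflow: the situation the PR description says
                // made the intrinsic path much slower than C1.
                overflows++;
            }
        }
        return overflows;
    }

    public static void main(String[] args) {
        int[] values = {Integer.MAX_VALUE, 0, Integer.MAX_VALUE - 1};
        System.out.println(countOverflows(values, 1));
    }
}
```

With the sample input, only `Integer.MAX_VALUE + 1` overflows; the other two additions stay in range.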
------------- PR Comment: https://git.openjdk.org/jdk/pull/23916#issuecomment-2777706873 From duke at openjdk.org Fri Apr 4 06:55:00 2025 From: duke at openjdk.org (duke) Date: Fri, 4 Apr 2025 06:55:00 GMT Subject: Withdrawn: 8348556: Inlining fails earlier for MemorySegment::reinterpret In-Reply-To: <3LuKBc-mbghi2A2-OnXrJD5zwvOm8URerns7Ud0Zz4c=.583514d1-d0fc-4005-b810-f4db92fcb60c@github.com> References: <3LuKBc-mbghi2A2-OnXrJD5zwvOm8URerns7Ud0Zz4c=.583514d1-d0fc-4005-b810-f4db92fcb60c@github.com> Message-ID: On Wed, 5 Feb 2025 10:17:09 GMT, Per Minborg wrote: > This PR proposes to add some `@ForceInline` annotations in the `Module` class in order to assist inlining of FFM var/method handles. > > There are also some changes in other classes (notably `j.l.Object`) which, if implemented, can take us four additional levels of inlining. However, there is a tradeoff with adding `@ForceInline` and just trying to get as deep as possible for a specific use case is probably not the best idea. > > So, we should discuss which of the proposed changes (if any), we'd like to integrate. > > Tested and passed tier1-3 This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.org/jdk/pull/23460 From thartmann at openjdk.org Fri Apr 4 07:20:48 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Fri, 4 Apr 2025 07:20:48 GMT Subject: RFR: 8353665: RISC-V: IR verification fails in TestSubNodeFloatDoubleNegation.java In-Reply-To: References: Message-ID: <8rHea6mVCPjHzORjnOx2pJG7l8TjH6yHDyF9eXMM0t0=.324f1eec-21f5-4adb-8dc0-ed1ac1348117@github.com> On Thu, 3 Apr 2025 16:57:19 GMT, Hamlin Li wrote: > Hi, > Can you help to review this patch? > The newly added TestSubNodeFloatDoubleNegation.java (in https://github.com/openjdk/jdk/pull/24150) is to check `0 - (0 - x)` is not folded to `x` for float and double. 
> I have manually checked the IR and generated assembly code, it's not folded on riscv either, just there is an extra SubF in some code path. > So, the fix for this test on riscv should be simply make the check as `>= 2` rather than `2`. > > Tested on both x86 and riscv64. > > Thanks Where does the extra SubF come from? ------------- PR Comment: https://git.openjdk.org/jdk/pull/24421#issuecomment-2777755042 From roland at openjdk.org Fri Apr 4 07:28:52 2025 From: roland at openjdk.org (Roland Westrelin) Date: Fri, 4 Apr 2025 07:28:52 GMT Subject: RFR: 8349479: C2: when a Type node becomes dead, make CFG path that uses it unreachable [v3] In-Reply-To: References: Message-ID: On Mon, 31 Mar 2025 12:00:02 GMT, Christian Hagedorn wrote: >> There's a good chance that it can never be null. I think it's been considered good practice over the year to be particularly defensive about this (there must be other Ideal transformations where inputs can be cleared as the graph is transformed) and I tend to add checks for null inputs systematically. > >> I think it's been considered good practice over the year to be particularly defensive about this > > Makes sense from a stability point of view. I'm wondering though if it's not a bug when the cast input is null at this point. Aren't there only few CFG nodes, like regions, where we set some inputs to null already? There is other code, for example in `ConvI2L::Ideal()`, that later accesses `in(1)` without null check: > > https://github.com/openjdk/jdk/blob/1ec2177a6b25573732b902f76bb81dd1cdaf7edf/src/hotspot/share/opto/convertnode.cpp#L728 > > To be consistent, we would also need to add a check for the other accesses in the method or turn the null check into a bailout for the entire `Ideal()` method. If we agree that null is unexpected (or assume it should be), we might also want to add asserts accordingly. > > My concern is that most IGVN methods assume non-control inputs cannot be null where we normally expect a sane input. 
This is probably true but hard to prove. To be overally consistent, we should also consider adding bailout and assertion code there. While it's the safest solution, this could introduce a lot of new code, especially for multi input nodes, which also makes it harder to read. What are your thought about that? > > Anyway, we don't need to make a decision as part of this PR on how we should generally handle inputs in IGVN method. It's fine if we only concentrate on the touched/new code here. I agree that we would need to be consistent and that it makes little sense to add null checks in code that have been around forever and has never caused issues. Maybe we can somehow have igvn itself assert that every node it processes has a set of expected inputs non null? I suppose, every node type would need to define which of its inputs can be null, then. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23468#discussion_r2028262990 From duke at openjdk.org Fri Apr 4 08:25:56 2025 From: duke at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Fri, 4 Apr 2025 08:25:56 GMT Subject: RFR: 8353665: RISC-V: IR verification fails in TestSubNodeFloatDoubleNegation.java In-Reply-To: References: Message-ID: On Thu, 3 Apr 2025 16:57:19 GMT, Hamlin Li wrote: > Hi, > Can you help to review this patch? > The newly added TestSubNodeFloatDoubleNegation.java (in https://github.com/openjdk/jdk/pull/24150) is to check `0 - (0 - x)` is not folded to `x` for float and double. > I have manually checked the IR and generated assembly code, it's not folded on riscv either, just there is an extra SubF in some code path. > So, the fix for this test on riscv should be simply make the check as `>= 2` rather than `2`. > > Tested on both x86 and riscv64. > > Thanks test/hotspot/jtreg/compiler/floatingpoint/TestSubNodeFloatDoubleNegation.java line 59: > 57: // performed as float operations. 
> 58: @IR(counts = { IRNode.SUB, "2" }, applyIfPlatform = {"riscv64", "false"}) > 59: @IR(counts = { IRNode.SUB, ">= 2" }, applyIfPlatform = {"riscv64", "true"}) Would it perhaps make sense to fix the number of `SubNode`s or does the `Float16` code add a bunch of them on RISC-V? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24421#discussion_r2028346592 From rcastanedalo at openjdk.org Fri Apr 4 08:43:57 2025 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Fri, 4 Apr 2025 08:43:57 GMT Subject: RFR: 8353669: IGV: dump OOP maps for MachSafePoint nodes Message-ID: This changeset dumps the OOP map of each MachSafePoint when available, i.e. at the `Final Code` phase. This should make it easier to learn about and diagnose OOP map building issues: ![final-code](https://github.com/user-attachments/assets/a477c9d1-0fe4-42ef-a367-336c54bec6a5) #### Testing - tier1 (windows-x64, linux-x64, linux-aarch64, macosx-x64, and macosx-aarch64; release and debug mode). - Tested IGV manually on a few selected graphs. Tested automatically that dumping thousands of graphs does not trigger any assertion failure (by running `java -Xcomp -XX:PrintIdealGraphLevel=1`). 
------------- Commit messages: - Dump oopmaps for MachSafePoint nodes when available Changes: https://git.openjdk.org/jdk/pull/24422/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24422&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8353669 Stats: 9 lines in 1 file changed: 9 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/24422.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24422/head:pull/24422 PR: https://git.openjdk.org/jdk/pull/24422 From mli at openjdk.org Fri Apr 4 09:27:53 2025 From: mli at openjdk.org (Hamlin Li) Date: Fri, 4 Apr 2025 09:27:53 GMT Subject: RFR: 8353665: RISC-V: IR verification fails in TestSubNodeFloatDoubleNegation.java In-Reply-To: References: Message-ID: On Thu, 3 Apr 2025 16:57:19 GMT, Hamlin Li wrote: > Hi, > Can you help to review this patch? > The newly added TestSubNodeFloatDoubleNegation.java (in https://github.com/openjdk/jdk/pull/24150) is to check `0 - (0 - x)` is not folded to `x` for float and double. > I have manually checked the IR and generated assembly code, it's not folded on riscv either, just there is an extra SubF in some code path. > So, the fix for this test on riscv should be simply make the check as `>= 2` rather than `2`. > > Tested on both x86 and riscv64. > > Thanks 026 lh R28, [R11, #12] # short, #@loadS ! 
Field: jdk/incubator/vector/Float16.value (constant) 02a NullCheck R11 02a B2: # out( B5 B3 ) <- in( B1 ) Freq: 0.999999 02a + -- // R23=Thread::current(), empty, #@tlsLoadP 02a ld R10, [R23, #464] # ptr, #@loadP 02e + fmv.h.x F2, zr # float, #@loadConH0 032 + fmv.h.x F0, R28 036 + ld R7, [R23, #480] # ptr, #@loadP 03a + addi R28, R10, #16 # ptr, #@addP_reg_imm 03e + binop_hf F3, F2, F0 042 + bgeu R28, R7, B5 #@cmpP_branch P=0.000100 C=-1.000000 046 B3: # out( B4 ) <- in( B2 ) Freq: 0.999899 046 + mv R7, #1 # long, #@loadConL 048 + sd R28, [R23, #464] # ptr, #@storeP 04c + mv R29, narrowklass: precise jdk/incubator/vector/Float16: 0x00007fa4dc396ba8 (java/io/Serializable,java/lang/Comparable):Constant:exact * # compressed klass ptr, #@loadConNKlass 058 + sd R7, [R10] # long, #@storeL 05c + sw R29, [R10, #8] # compressed klass ptr, #@storeNKlass 060 + prefetch_w [R28, #192] # Prefetch for write 064 + sw zr, [R10, #12] # int, #@storeimmI0 068 B4: # out( N1 ) <- in( B6 B3 ) Freq: 0.999999 068 + fmv.h.x F0, zr # float, #@loadConH0 06c + binop_hf F0, F0, F3 070 + fmv.x.h R28, F0 074 + sh R28, [R10, #12] # short, #@storeC 078 078 + MEMBAR-store-store #@membar_storestore 078 + # checkcastPP of R10, #@checkCastPP 078 # pop frame 48 add sp, sp, #48 ld ra, [sp,#-16] ld fp, [sp,#-8] # test polling word ld t0, [xthread,#40] bgtu sp, t0, #slow_path 08a + ret // return register, #@Ret 08c B5: # out( B8 B6 ) <- in( B2 ) Freq: 0.000100016 08c + fmv.w.x F0, zr # float, #@loadConF0 090 + convHF2SAndHF2F F2, F3 094 + fsub.s F0, F0, F2 #@subF_reg_reg 098 spill F3 -> [sp, #0] # spill size = 32 09c + spill F0 -> [sp, #4] # spill size = 32 0a0 + mv R11, precise jdk/incubator/vector/Float16: 0x00007fa4dc396ba8 (java/io/Serializable,java/lang/Comparable):Constant:exact * # ptr, #@loadConP 0b8 CALL,static 0x00007fa4cb85ba80 #@CallStaticJavaDirect wrapper for: C2 Runtime new_instance # jdk.incubator.vector.Float16::valueOf @ bci:0 (line 329) L[0]=sp + #4 # 
jdk.incubator.vector.Float16::subtract @ bci:9 (line 1137) L[0]=_ L[1]=_ # compiler.floatingpoint.TestSubNodeFloatDoubleNegation::testHalfFloat @ bci:12 (line 62) L[0]=_ # OopMap {off=196/0xc4} 0d0 B6: # out( B4 ) <- in( B5 ) Freq: 0.000100014 # Block is sole successor of call 0d0 + spill [sp, #0] -> F3 # spill size = 32 0d4 + j B4 #@branch Thanks for having a look, interesting! Please check the B5 block (which is when TLAB run out I think), there is an *extra* `fsub.s` putting value into F0, but F0 value is not really used, as in B4 block it just loads zero into F0. This fsub.s should be useless, although it should do no harness in the sense of correctness. I checked the x86, find out I don't have the CPU feature to generate real float16 instructions, so it only has 2 SubF rather than SubHF which I think is expected for Float16. Not sure if this useless `fsub.s` is only an issue on riscv or maybe also on x86 if `supports_avx512_fp16` return true. It will be great if someone can help to verify it. Also not sure if this useless `extra` instruction (here it's `fsub.s`) could be generated in other situations. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24421#issuecomment-2778059148 From rcastanedalo at openjdk.org Fri Apr 4 09:48:08 2025 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Fri, 4 Apr 2025 09:48:08 GMT Subject: RFR: 8325467: Support methods with many arguments in C2 [v11] In-Reply-To: References: <0Yf6qZwnLz7oAtSFscDwHifQAmaPuHzeSrpkqMVchDU=.c7a5e8af-9390-414b-850c-609110668eac@github.com> <-pdjdg9OQRB7YaXNFiVeVseLEoJDZb2XkMk0ml3pm3w=.2ecb257e-5618-4763-90e5-a2b1d0758e67@github.com> Message-ID: On Thu, 3 Apr 2025 12:48:31 GMT, Daniel Lund?n wrote: >> src/hotspot/share/opto/matcher.cpp line 195: >> >>> 193: if (C->failing()) { >>> 194: return; >>> 195: } >> >> Is this failure poll required after your changes? > > Yes, this poll is still required. 
We may fail in `init_spill_mask -> regmask_for_ideal_register`. Good catch, thanks for checking. >> src/hotspot/share/opto/postaloc.cpp line 686: >> >>> 684: assert(!(!value[ureg_lo] && lrgs(useidx).mask().is_offset() && >>> 685: !lrgs(useidx).mask().Member(ureg_lo)), >>> 686: "invalid assumption"); >> >> Could you use more descriptive names and assertion messages in this new assertion and the one below? Ideally, without having to refer to old versions. What is the invariant that we want to check? How does it relate to the surrounding code? > > As we've previously discussed offline, I also had my doubts when introducing these asserts. I've now had a second look (with reasonably fresh eyes), and believe I now better understand the underlying assumptions. > > The two problematic pieces of code in `postaloc.cpp` from before this changeset that we need to translate as part of the changeset are > > if (!value[ureg_lo] && > (!RegMask::can_represent(ureg_lo) || > lrgs(useidx).mask().Member(ureg_lo))) { // Nearly always adjacent > > and > > if( RegMask::can_represent(nreg_lo) && // Either a spill slot, or > !lrgs(lidx).mask().Member(nreg_lo) ) { // Nearly always adjacent > > Specifically, the `RegMask::can_represent` calls check if their argument registers can fit in the statically determined size of register masks (which no longer makes sense in this changeset). > > The reason for the `can_represent` calls is that the subsequent `Member` calls assert internally that their arguments can fit within the static size of register masks. That is, `can_represent` worked as a guard to ensure the precondition for the call to `Member` holds. In this changeset, the `Member` function is generalized to allow arbitrary arguments (and the interal assert is removed). Therefore, we can remove the `can_represent` guards. > > Now to the assertions that I added (which I've now improved). 
From the if conditions, we can infer there is an implicit invariant that a register for which `can_represent` returns false is necessarily "adjacent". Specifically, `can_represent` returning false implies that the register is a spill slot (implied by a comment in the source code). However, registers for which `can_represent` returns true may **also** be spill splots, so using `can_represent` as a proxy check for spill slots feels clumsy. I believe that the real invariant here is that only actual registers (and not stack locations, including spill slots) can be non-adjacent. This is what I now verify with my updated asserts. > > For the record, I have not been able to find any cases with non-adjacency in any tests on current Oracle-supported platforms. From another comment in the source code, it looks like non-adjacent pairs are quite specific to SPARC. Good analysis, thanks for investigating Daniel! Maybe worth creating an RFE to investigate whether we can assume (and statically verify) non-adjacent register pairs moving forward, and cleanup this and possibly other C2 back-end code accordingly. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2028478202 PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2028473435 From duke at openjdk.org Fri Apr 4 09:54:03 2025 From: duke at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Fri, 4 Apr 2025 09:54:03 GMT Subject: RFR: 8353665: RISC-V: IR verification fails in TestSubNodeFloatDoubleNegation.java In-Reply-To: References: Message-ID: On Thu, 3 Apr 2025 16:57:19 GMT, Hamlin Li wrote: > Hi, > Can you help to review this patch? > The newly added TestSubNodeFloatDoubleNegation.java (in https://github.com/openjdk/jdk/pull/24150) is to check `0 - (0 - x)` is not folded to `x` for float and double. > I have manually checked the IR and generated assembly code, it's not folded on riscv either, just there is an extra SubF in some code path. 
> So, the fix for this test on riscv should be simply make the check as `>= 2` rather than `2`. > > Tested on both x86 and riscv64. > > Thanks To me `B5` kind of looks like a backup codepath (see branch at 042). But I cannot see what `R7` is there. One option I see would be to match two `SUB_HF` nodes for RISC-V, since it seems to always generate two of those. The only reason I match on `SUB` in the half float case is that I also do not have `supports_avx512_fp16`. I think I will file an RFE for that separately. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24421#issuecomment-2778113274 From mli at openjdk.org Fri Apr 4 09:54:03 2025 From: mli at openjdk.org (Hamlin Li) Date: Fri, 4 Apr 2025 09:54:03 GMT Subject: RFR: 8353665: RISC-V: IR verification fails in TestSubNodeFloatDoubleNegation.java In-Reply-To: References: Message-ID: On Thu, 3 Apr 2025 16:57:19 GMT, Hamlin Li wrote: > Hi, > Can you help to review this patch? > The newly added TestSubNodeFloatDoubleNegation.java (in https://github.com/openjdk/jdk/pull/24150) is to check `0 - (0 - x)` is not folded to `x` for float and double. > I have manually checked the IR and generated assembly code, it's not folded on riscv either, just there is an extra SubF in some code path. > So, the fix for this test on riscv should be simply make the check as `>= 2` rather than `2`. > > Tested on both x86 and riscv64. > > Thanks Ah, I just checked on riscv: if I disable float16 (zfh), it will not generate the extra `SubF` in the slow path, i.e. just 2 `SubF`. So, I guess x86 could have the same issue, and the test could fail too if `supports_avx512_fp16` returns true.
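For background on why the test in this thread insists that `0 - (0 - x)` is not folded to `x`: under IEEE 754 arithmetic the two expressions differ for negative zero. The following is a minimal C++ sketch, not code from the PR or the test; the helper names `double_negate` and `same_value_and_sign` are hypothetical, and it assumes the default round-to-nearest mode and a compiler that is not run with fast-math style flags.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: the expression shape the test guards against folding.
inline float double_negate(float x) {
  return 0.0f - (0.0f - x);
}

// Hypothetical helper: compares value and sign bit, so that -0.0f and +0.0f
// are told apart (operator== alone treats them as equal).
inline bool same_value_and_sign(float a, float b) {
  return a == b && std::signbit(a) == std::signbit(b);
}
```

With `x = -0.0f`, the inner subtraction yields `+0.0f` and so does the outer one, so the result has a cleared sign bit while `x` has it set. For ordinary nonzero inputs such as `1.5f` the two expressions agree.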
------------- PR Comment: https://git.openjdk.org/jdk/pull/24421#issuecomment-2778117432 From mli at openjdk.org Fri Apr 4 09:54:03 2025 From: mli at openjdk.org (Hamlin Li) Date: Fri, 4 Apr 2025 09:54:03 GMT Subject: RFR: 8353665: RISC-V: IR verification fails in TestSubNodeFloatDoubleNegation.java In-Reply-To: References: Message-ID: On Fri, 4 Apr 2025 09:46:02 GMT, Manuel Hässig wrote: > To me `B5` kind of looks like a backup codepath (see branch at 042). But I cannot see what `R7` is there. R7 should be tlab_end of the thread. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24421#issuecomment-2778139803 From mli at openjdk.org Fri Apr 4 09:57:08 2025 From: mli at openjdk.org (Hamlin Li) Date: Fri, 4 Apr 2025 09:57:08 GMT Subject: RFR: 8353665: RISC-V: IR verification fails in TestSubNodeFloatDoubleNegation.java In-Reply-To: References: Message-ID: On Thu, 3 Apr 2025 16:57:19 GMT, Hamlin Li wrote: > Hi, > Can you help to review this patch? > The newly added TestSubNodeFloatDoubleNegation.java (in https://github.com/openjdk/jdk/pull/24150) is to check `0 - (0 - x)` is not folded to `x` for float and double. > I have manually checked the IR and generated assembly code, it's not folded on riscv either, just there is an extra SubF in some code path. > So, the fix for this test on riscv should be simply make the check as `>= 2` rather than `2`. > > Tested on both x86 and riscv64. > > Thanks @jatin-bhateja Could you please help to run the test with `supports_avx512_fp16` if you're available? Thanks!
------------- PR Comment: https://git.openjdk.org/jdk/pull/24421#issuecomment-2778148709 From dlunden at openjdk.org Fri Apr 4 11:50:03 2025 From: dlunden at openjdk.org (Daniel Lundén) Date: Fri, 4 Apr 2025 11:50:03 GMT Subject: RFR: 8325467: Support methods with many arguments in C2 [v12] In-Reply-To: References: Message-ID: <5XrG6zHnhCQoPomYLQgAAQEoSWNbS0n9dihXE0i_x7g=.cd48e569-704c-4398-9c62-e6cd33c5e417@github.com> On Tue, 1 Apr 2025 16:35:08 GMT, Roberto Castañeda Lozano wrote: >> test/jdk/java/lang/invoke/BigArityTest.java line 32: >> >>> 30: * (1) have a large number of parameters, and >>> 31: * (2) use JSR292 methods internally (which increases the >>> 32: * MaxNodeLimit with a factor of 3) >> >> Just checking: these methods that cause C2 to consume an excessive amount of memory were not C2-compilable before the changeset, right? > > Same question for the other `java/lang/invoke` test changes. Yes, correct. No longer bailing out on too many arguments results in a lot more compilations (with `-Xcomp`) compared to before in these specific tests, which is why I've had to limit the tests with `MaxNodeLimit`s. That said, I did look into these tests a bit more now after your comment, and there are some peculiar (but artificial) compilations that we no longer bail out on and that we may want to investigate in a future RFE. These compilations each take around 40 seconds (in a release build), are very close to the `MaxNodeLimit` (80 000 nodes), and spend 99% of the time in the register allocator (in the first round of conservative coalescing, specifically). I analyzed these register allocator runs and it looks like we run into the quadratic time complexity of graph-coloring register allocation, because we have a very large number of nodes to begin with and then the interference graph is additionally very dense (contains a very large number of interferences/edges).
We already have bailouts related to node count in the register allocator, but no bailouts for the interference graph size. Perhaps we should consider adding this as part of a separate RFE. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2028661113 From dlunden at openjdk.org Fri Apr 4 11:53:13 2025 From: dlunden at openjdk.org (Daniel =?UTF-8?B?THVuZMOpbg==?=) Date: Fri, 4 Apr 2025 11:53:13 GMT Subject: RFR: 8325467: Support methods with many arguments in C2 [v11] In-Reply-To: References: <0Yf6qZwnLz7oAtSFscDwHifQAmaPuHzeSrpkqMVchDU=.c7a5e8af-9390-414b-850c-609110668eac@github.com> <-pdjdg9OQRB7YaXNFiVeVseLEoJDZb2XkMk0ml3pm3w=.2ecb257e-5618-4763-90e5-a2b1d0758e67@github.com> Message-ID: On Fri, 4 Apr 2025 09:42:43 GMT, Roberto Casta?eda Lozano wrote: >> As we've previously discussed offline, I also had my doubts when introducing these asserts. I've now had a second look (with reasonably fresh eyes), and believe I now better understand the underlying assumptions. >> >> The two problematic pieces of code in `postaloc.cpp` from before this changeset that we need to translate as part of the changeset are >> >> if (!value[ureg_lo] && >> (!RegMask::can_represent(ureg_lo) || >> lrgs(useidx).mask().Member(ureg_lo))) { // Nearly always adjacent >> >> and >> >> if( RegMask::can_represent(nreg_lo) && // Either a spill slot, or >> !lrgs(lidx).mask().Member(nreg_lo) ) { // Nearly always adjacent >> >> Specifically, the `RegMask::can_represent` calls check if their argument registers can fit in the statically determined size of register masks (which no longer makes sense in this changeset). >> >> The reason for the `can_represent` calls is that the subsequent `Member` calls assert internally that their arguments can fit within the static size of register masks. That is, `can_represent` worked as a guard to ensure the precondition for the call to `Member` holds. 
In this changeset, the `Member` function is generalized to allow arbitrary arguments (and the internal assert is removed). Therefore, we can remove the `can_represent` guards. >> >> Now to the assertions that I added (which I've now improved). From the if conditions, we can infer there is an implicit invariant that a register for which `can_represent` returns false is necessarily "adjacent". Specifically, `can_represent` returning false implies that the register is a spill slot (implied by a comment in the source code). However, registers for which `can_represent` returns true may **also** be spill slots, so using `can_represent` as a proxy check for spill slots feels clumsy. I believe that the real invariant here is that only actual registers (and not stack locations, including spill slots) can be non-adjacent. This is what I now verify with my updated asserts. >> >> For the record, I have not been able to find any cases with non-adjacency in any tests on current Oracle-supported platforms. From another comment in the source code, it looks like non-adjacent pairs are quite specific to SPARC. > > Good analysis, thanks for investigating Daniel! Maybe worth creating an RFE to investigate whether we can assume (and statically verify) non-adjacent register pairs moving forward, and cleanup this and possibly other C2 back-end code accordingly. I guess you mean assume adjacent register pairs? Sounds good, I'll make a note to create an RFE.
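The guard pattern described in the quoted analysis above (where `can_represent` protects the precondition of `Member`) can be sketched as follows. This is an illustration under stated assumptions, not HotSpot code: `FixedMask`, its 64-bit size, and the two helper functions are hypothetical stand-ins that only mirror the shape of the two pre-changeset conditions. The sketch models the old, fixed-size behavior only; it does not attempt to model how the generalized `Member` in the changeset treats out-of-range arguments.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>

// Assumed static mask size, chosen for illustration only.
constexpr std::size_t kStaticBits = 64;

// Hypothetical stand-in for a fixed-size register mask.
struct FixedMask {
  std::bitset<kStaticBits> bits;

  static bool can_represent(std::size_t reg) { return reg < kStaticBits; }

  // Precondition: reg fits in the static size (checked by an assert).
  bool member(std::size_t reg) const {
    assert(can_represent(reg) && "register index outside the static mask size");
    return bits.test(reg);
  }
};

// Shape of the first condition: short-circuit `||` means member() is only
// reached when can_represent(reg) is true.
inline bool unrepresentable_or_member(const FixedMask& m, std::size_t reg) {
  return !FixedMask::can_represent(reg) || m.member(reg);
}

// Shape of the second condition: short-circuit `&&` gives the same protection.
inline bool representable_non_member(const FixedMask& m, std::size_t reg) {
  return FixedMask::can_represent(reg) && !m.member(reg);
}
```

In both helpers, an index such as 1000 never reaches `member`, so the assert cannot fire; the guard exists purely to keep the call within its precondition.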
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2028665257 From rcastanedalo at openjdk.org Fri Apr 4 11:59:56 2025 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Fri, 4 Apr 2025 11:59:56 GMT Subject: RFR: 8325467: Support methods with many arguments in C2 [v12] In-Reply-To: <5XrG6zHnhCQoPomYLQgAAQEoSWNbS0n9dihXE0i_x7g=.cd48e569-704c-4398-9c62-e6cd33c5e417@github.com> References: <5XrG6zHnhCQoPomYLQgAAQEoSWNbS0n9dihXE0i_x7g=.cd48e569-704c-4398-9c62-e6cd33c5e417@github.com> Message-ID: On Fri, 4 Apr 2025 11:46:56 GMT, Daniel Lundén wrote: > Perhaps we should consider adding this as part of a separate RFE. This sounds like a good idea, I agree to postpone it to a separate RFE. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2028671476 From rcastanedalo at openjdk.org Fri Apr 4 11:59:55 2025 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Fri, 4 Apr 2025 11:59:55 GMT Subject: RFR: 8325467: Support methods with many arguments in C2 [v11] In-Reply-To: References: <0Yf6qZwnLz7oAtSFscDwHifQAmaPuHzeSrpkqMVchDU=.c7a5e8af-9390-414b-850c-609110668eac@github.com> <-pdjdg9OQRB7YaXNFiVeVseLEoJDZb2XkMk0ml3pm3w=.2ecb257e-5618-4763-90e5-a2b1d0758e67@github.com> Message-ID: On Fri, 4 Apr 2025 11:50:20 GMT, Daniel Lundén wrote: >> Good analysis, thanks for investigating Daniel! Maybe worth creating an RFE to investigate whether we can assume (and statically verify) non-adjacent register pairs moving forward, and cleanup this and possibly other C2 back-end code accordingly. > > I guess you mean assume adjacent register pairs? Sounds good, I'll make a note to create an RFE. Right, I meant adjacent pairs, thanks for the clarification.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2028673416 From dlunden at openjdk.org Fri Apr 4 12:03:28 2025 From: dlunden at openjdk.org (Daniel =?UTF-8?B?THVuZMOpbg==?=) Date: Fri, 4 Apr 2025 12:03:28 GMT Subject: RFR: 8325467: Support methods with many arguments in C2 [v14] In-Reply-To: References: Message-ID: > If a method has a large number of parameters, we currently bail out from C2 compilation. > > ### Changeset > > Allowing C2 compilation of methods with a large number of parameters requires fundamental changes to the register mask data structure, used in many places in C2. In particular, register masks currently have a statically determined size and cannot represent arbitrary numbers of stack slots. This is needed if we want to compile methods with arbitrary numbers of parameters. Register mask operations are present in performance-sensitive parts of C2, which further complicates changes. > > Changes: > - Add functionality to dynamically grow/extend register masks. I experimented with a number of design choices to achieve this. To keep the common case (normal number of method parameters) quick and also to avoid more intrusive changes to the current `RegMask` interface, I decided to leave the "base" statically allocated memory for masks unchanged and only use dynamically allocated memory in the rare cases where it is needed. > - Generalize the "chunk"-logic from `PhaseChaitin::Select()` to allow arbitrary-sized chunks, and also move most of the logic into register mask methods to separate concerns and to make the `PhaseChaitin::Select()` code more readable. > - Remove all `can_represent` checks and bailouts. > - Performance tuning. A particularly important change is the early-exit optimization in `RegMask::overlap`, used in the performance-sensitive method `PhaseChaitin::interfere_with_live`. > - Add a new test case `TestManyMethodArguments.java` and extend an old test `TestNestedSynchronize.java`. 
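As a rough illustration of the "grow dynamically, keep the common case static" and "early exit in overlap" items in the list above, here is a minimal C++ sketch of a bit mask with a fixed inline base and an on-demand heap extension. This is not HotSpot's `RegMask`: the class name `GrowableMask`, the base size of two words, and all method names are hypothetical, and whether the early exit shown here resembles the actual optimization in `RegMask::overlap` is an assumption.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical illustration only; names and sizes are assumptions.
class GrowableMask {
  static constexpr std::size_t kWordBits = 64;
  static constexpr std::size_t kBaseWords = 2;    // assumed inline capacity

  std::array<std::uint64_t, kBaseWords> base_{};  // common case: no allocation
  std::vector<std::uint64_t> ext_;                // rare case: grown on demand

  // Word i of the mask; words beyond the current size read as zero.
  std::uint64_t word(std::size_t i) const {
    if (i < kBaseWords) return base_[i];
    std::size_t j = i - kBaseWords;
    return j < ext_.size() ? ext_[j] : 0;
  }

 public:
  // Number of words currently backed by storage.
  std::size_t words() const { return kBaseWords + ext_.size(); }

  // Sets a bit, growing the extension only when the bit lies past the base.
  void insert(std::size_t bit) {
    std::size_t i = bit / kWordBits;
    std::uint64_t m = std::uint64_t{1} << (bit % kWordBits);
    if (i < kBaseWords) {
      base_[i] |= m;
      return;
    }
    std::size_t j = i - kBaseWords;
    if (j >= ext_.size()) ext_.resize(j + 1, 0);
    ext_[j] |= m;
  }

  // Accepts any index; bits that were never stored are reported as absent.
  bool member(std::size_t bit) const {
    return ((word(bit / kWordBits) >> (bit % kWordBits)) & 1u) != 0;
  }

  // Returns at the first word where both masks have a bit in common.
  bool overlaps(const GrowableMask& other) const {
    std::size_t n = words() < other.words() ? words() : other.words();
    for (std::size_t i = 0; i < n; ++i) {
      if ((word(i) & other.word(i)) != 0) return true;  // early exit
    }
    return false;
  }
};
```

A mask that only ever holds low-numbered bits stays entirely in `base_` and never allocates, which is the property the list item about the common case is after; inserting, say, bit 500 is what triggers the extension.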
> > ### Testing > > - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/10178060450) > - `tier1` to `tier4` (and additional Oracle-internal testing) on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64. > - Standard performance benchmarking. No observed conclusive overall performance degradation/improvement. > - Specific benchmarking of C2 compilation time. The changes increase C2 compilation time by, approximately and on average, 1% for methods that could also be compiled before this changeset (see the figure below). The reason for the degradation is further checks required in performance-sensitive code (in particular `PhaseChaitin::remove_bound_register_from_interfering_live_ranges`). I have tried optimizing in various ways, but changes I found that lead to improvement also lead to less readable code (and are, in my opinion, not worth it). > > ![c2-regression](https:/... Daniel Lund?n has updated the pull request incrementally with two additional commits since the last revision: - Update test comment to also mention timeouts - Fix suboptimal max limit in _grow ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20404/files - new: https://git.openjdk.org/jdk/pull/20404/files/76f6b8f8..c41a76b9 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20404&range=13 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20404&range=12-13 Stats: 5 lines in 2 files changed: 0 ins; 1 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/20404.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20404/head:pull/20404 PR: https://git.openjdk.org/jdk/pull/20404 From duke at openjdk.org Fri Apr 4 12:16:06 2025 From: duke at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Fri, 4 Apr 2025 12:16:06 GMT Subject: RFR: 8353665: RISC-V: IR verification fails in TestSubNodeFloatDoubleNegation.java In-Reply-To: References: Message-ID: On Thu, 3 Apr 2025 16:57:19 GMT, Hamlin Li wrote: > Hi, > Can you help to review this patch? 
> The newly added TestSubNodeFloatDoubleNegation.java (in https://github.com/openjdk/jdk/pull/24150) is to check `0 - (0 - x)` is not folded to `x` for float and double. > I have manually checked the IR and generated assembly code, it's not folded on riscv either, just there is an extra SubF in some code path. > So, the fix for this test on riscv should be simply make the check as `>= 2` rather than `2`. > > Tested on both x86 and riscv64. > > Thanks I ran the test with software emulation of `avx512-fp16` and the test failed the same way as on RISC-V: $ sde64 -gnr -- jtreg [...] test/hotspot/jtreg/compiler/floatingpoint/TestSubNodeFloatDoubleNegation.java [...] Failed IR Rules (1) of Methods (1) ---------------------------------- 1) Method "public static jdk.incubator.vector.Float16 compiler.floatingpoint.TestSubNodeFloatDoubleNegation.testHalfFloat(jdk.incubator.vector.Float16)" - [Failed IR rules: 1]: * @IR rule 1: "@compiler.lib.ir_framework.IR(phase={DEFAULT}, applyIfPlatformAnd={}, applyIfCPUFeatureOr={}, counts={"_#SUB#_", "2"}, failOn={}, applyIfPlatform={}, applyIfPlatformOr={}, applyIfOr={}, applyIfCPUFeatureAnd={}, applyIf={}, applyIfCPUFeature={}, applyIfAnd={}, applyIfNot={})" > Phase "PrintIdeal": - counts: Graph contains wrong number of nodes: * Constraint 1: "(\d+(\s){2}(Sub(I|L|F|D|HF).*)+(\s){2}===.*)" - Failed comparison: [found] 3 = 2 [given] - Matched nodes (3): * 326 SubHF === _ 560 325 [[ 327 479 ]] !orig=[478] !jvms: Float16::valueOf @ bci:5 (line 329) Float16::subtract @ bci:9 (line 1137) TestSubNodeFloatDoubleNegation::testHalfFloat @ bci:9 (line 63) * 443 SubF === _ 561 442 [[ 607 ]] !jvms: Float16::subtract @ bci:8 (line 1137) TestSubNodeFloatDoubleNegation::testHalfFloat @ bci:12 (line 61) * 479 SubHF === _ 560 326 [[ 480 ]] !jvms: Float16::valueOf @ bci:5 (line 329) Float16::subtract @ bci:9 (line 1137) TestSubNodeFloatDoubleNegation::testHalfFloat @ bci:12 (line 61) The ideal graph shows the same "alternative codepath" that your opto 
assembly shows. I guess we need to generally predicate the test on native float16 support. But I can do that in a separate issue, where I also investigate ARM. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24421#issuecomment-2778516111 From rcastanedalo at openjdk.org Fri Apr 4 12:24:01 2025 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Fri, 4 Apr 2025 12:24:01 GMT Subject: RFR: 8325467: Support methods with many arguments in C2 [v14] In-Reply-To: References: Message-ID: On Fri, 4 Apr 2025 12:03:28 GMT, Daniel Lund?n wrote: >> If a method has a large number of parameters, we currently bail out from C2 compilation. >> >> ### Changeset >> >> Allowing C2 compilation of methods with a large number of parameters requires fundamental changes to the register mask data structure, used in many places in C2. In particular, register masks currently have a statically determined size and cannot represent arbitrary numbers of stack slots. This is needed if we want to compile methods with arbitrary numbers of parameters. Register mask operations are present in performance-sensitive parts of C2, which further complicates changes. >> >> Changes: >> - Add functionality to dynamically grow/extend register masks. I experimented with a number of design choices to achieve this. To keep the common case (normal number of method parameters) quick and also to avoid more intrusive changes to the current `RegMask` interface, I decided to leave the "base" statically allocated memory for masks unchanged and only use dynamically allocated memory in the rare cases where it is needed. >> - Generalize the "chunk"-logic from `PhaseChaitin::Select()` to allow arbitrary-sized chunks, and also move most of the logic into register mask methods to separate concerns and to make the `PhaseChaitin::Select()` code more readable. >> - Remove all `can_represent` checks and bailouts. >> - Performance tuning. 
A particularly important change is the early-exit optimization in `RegMask::overlap`, used in the performance-sensitive method `PhaseChaitin::interfere_with_live`. >> - Add a new test case `TestManyMethodArguments.java` and extend an old test `TestNestedSynchronize.java`. >> >> ### Testing >> >> - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/10178060450) >> - `tier1` to `tier4` (and additional Oracle-internal testing) on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64. >> - Standard performance benchmarking. No observed conclusive overall performance degradation/improvement. >> - Specific benchmarking of C2 compilation time. The changes increase C2 compilation time by, approximately and on average, 1% for methods that could also be compiled before this changeset (see the figure below). The reason for the degradation is further checks required in performance-sensitive code (in particular `PhaseChaitin::remove_bound_register_from_interfering_live_ranges`). I have tried optimizing in various ways, but changes I found that lead to improvement also lead to less readable code (and are, in my opinion, no... > > Daniel Lund?n has updated the pull request incrementally with two additional commits since the last revision: > > - Update test comment to also mention timeouts > - Fix suboptimal max limit in _grow Looks good, thanks for addressing my comments Daniel! ------------- Marked as reviewed by rcastanedalo (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20404#pullrequestreview-2742799411 From qamai at openjdk.org Fri Apr 4 12:47:06 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Fri, 4 Apr 2025 12:47:06 GMT Subject: RFR: 8325467: Support methods with many arguments in C2 [v14] In-Reply-To: References: Message-ID: On Fri, 4 Apr 2025 12:03:28 GMT, Daniel Lund?n wrote: >> If a method has a large number of parameters, we currently bail out from C2 compilation. 
>> >> ### Changeset >> >> Allowing C2 compilation of methods with a large number of parameters requires fundamental changes to the register mask data structure, used in many places in C2. In particular, register masks currently have a statically determined size and cannot represent arbitrary numbers of stack slots. This is needed if we want to compile methods with arbitrary numbers of parameters. Register mask operations are present in performance-sensitive parts of C2, which further complicates changes. >> >> Changes: >> - Add functionality to dynamically grow/extend register masks. I experimented with a number of design choices to achieve this. To keep the common case (normal number of method parameters) quick and also to avoid more intrusive changes to the current `RegMask` interface, I decided to leave the "base" statically allocated memory for masks unchanged and only use dynamically allocated memory in the rare cases where it is needed. >> - Generalize the "chunk"-logic from `PhaseChaitin::Select()` to allow arbitrary-sized chunks, and also move most of the logic into register mask methods to separate concerns and to make the `PhaseChaitin::Select()` code more readable. >> - Remove all `can_represent` checks and bailouts. >> - Performance tuning. A particularly important change is the early-exit optimization in `RegMask::overlap`, used in the performance-sensitive method `PhaseChaitin::interfere_with_live`. >> - Add a new test case `TestManyMethodArguments.java` and extend an old test `TestNestedSynchronize.java`. >> >> ### Testing >> >> - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/10178060450) >> - `tier1` to `tier4` (and additional Oracle-internal testing) on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64. >> - Standard performance benchmarking. No observed conclusive overall performance degradation/improvement. >> - Specific benchmarking of C2 compilation time. 
The changes increase C2 compilation time by, approximately and on average, 1% for methods that could also be compiled before this changeset (see the figure below). The reason for the degradation is further checks required in performance-sensitive code (in particular `PhaseChaitin::remove_bound_register_from_interfering_live_ranges`). I have tried optimizing in various ways, but changes I found that lead to improvement also lead to less readable code (and are, in my opinion, no... > > Daniel Lund?n has updated the pull request incrementally with two additional commits since the last revision: > > - Update test comment to also mention timeouts > - Fix suboptimal max limit in _grow `TestNestedSynchronize.java` is a massive file. I wonder if you can try to generate it using `MethodHandle` or classfile API instead? ------------- PR Comment: https://git.openjdk.org/jdk/pull/20404#issuecomment-2778636310 From duke at openjdk.org Fri Apr 4 12:52:14 2025 From: duke at openjdk.org (duke) Date: Fri, 4 Apr 2025 12:52:14 GMT Subject: RFR: 8347645: C2: XOR bounded value handling blocks constant folding [v49] In-Reply-To: References: Message-ID: On Tue, 1 Apr 2025 23:22:09 GMT, Johannes Graham wrote: >> An interaction between xor bounds optimization and constant folding resulted in xor over constants not being optimized. This has a noticeable effect on `Long.expand` with a constant mask, on architectures that don't have instructions equivalent to `PDEP` to be used in an intrinsic. >> >> This change moves logic from the `Xor(L|I)Node::Value` methods into the `add_ring` methods, and gives priority to constant-folding. A static method was separated out to facilitate direct unit-testing. It also (subjectively) simplified the calculation of the upper bound and added an explanation of the reasoning behind it. 
>> >> In addition to testing for constant folding over xor, IR tests were added to `XorINodeIdealizationTests` and `XorLNodeIdealizationTests` to cover these related items: >> - Bounds optimization of xor >> - A check for `x ^ x = 0` >> - Explicit testing of xor over booleans. >> >> Also `test_xor_node.cpp` was added to more extensively test the correctness of the bounds optimization. It exhaustively tests ranges of 4-bit numbers as well as at the high and low end of the affected types. > > Johannes Graham has updated the pull request incrementally with one additional commit since the last revision: > > update comments @j3graham Your change (at version dda134fbdb1c3b9647c53ef36e5b4a952f9b9576) is now ready to be sponsored by a Committer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23089#issuecomment-2778646344 From dlunden at openjdk.org Fri Apr 4 12:53:11 2025 From: dlunden at openjdk.org (Daniel =?UTF-8?B?THVuZMOpbg==?=) Date: Fri, 4 Apr 2025 12:53:11 GMT Subject: RFR: 8325467: Support methods with many arguments in C2 [v14] In-Reply-To: References: Message-ID: On Fri, 4 Apr 2025 12:44:39 GMT, Quan Anh Mai wrote: > `TestNestedSynchronize.java` is a massive file. I wonder if you can try to generate it using `MethodHandle` or classfile API instead? Indeed, the plan is to migrate it to the [template-based testing framework](https://github.com/openjdk/jdk/pull/24217) when that is ready. I'll have a look at using `MethodHandle`s, it would be nice to not even pollute the git history with the current version. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20404#issuecomment-2778647499 From mli at openjdk.org Fri Apr 4 13:07:57 2025 From: mli at openjdk.org (Hamlin Li) Date: Fri, 4 Apr 2025 13:07:57 GMT Subject: RFR: 8353665: RISC-V: IR verification fails in TestSubNodeFloatDoubleNegation.java In-Reply-To: References: Message-ID: On Thu, 3 Apr 2025 16:57:19 GMT, Hamlin Li wrote: > Hi, > Can you help to review this patch? 
> The newly added TestSubNodeFloatDoubleNegation.java (in https://github.com/openjdk/jdk/pull/24150) is to check `0 - (0 - x)` is not folded to `x` for float and double. > I have manually checked the IR and generated assembly code, it's not folded on riscv either, just there is an extra SubF in some code path. > So, the fix for this test on riscv should be simply make the check as `>= 2` rather than `2`. > > Tested on both x86 and riscv64. > > Thanks Seems to me the `SubF` is not necessary to be generated (maybe should and could be removed? ), are you going to investigate that or just modify the test? ------------- PR Comment: https://git.openjdk.org/jdk/pull/24421#issuecomment-2778682730 From duke at openjdk.org Fri Apr 4 13:21:57 2025 From: duke at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Fri, 4 Apr 2025 13:21:57 GMT Subject: RFR: 8353665: RISC-V: IR verification fails in TestSubNodeFloatDoubleNegation.java In-Reply-To: References: Message-ID: On Thu, 3 Apr 2025 16:57:19 GMT, Hamlin Li wrote: > Hi, > Can you help to review this patch? > The newly added TestSubNodeFloatDoubleNegation.java (in https://github.com/openjdk/jdk/pull/24150) is to check `0 - (0 - x)` is not folded to `x` for float and double. > I have manually checked the IR and generated assembly code, it's not folded on riscv either, just there is an extra SubF in some code path. > So, the fix for this test on riscv should be simply make the check as `>= 2` rather than `2`. > > Tested on both x86 and riscv64. > > Thanks Filed [JDK-8353730](https://bugs.openjdk.org/browse/JDK-8353730). I will first fix the test and then investigate the additional `SubF`. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/24421#issuecomment-2778715245 From mli at openjdk.org Fri Apr 4 13:25:13 2025 From: mli at openjdk.org (Hamlin Li) Date: Fri, 4 Apr 2025 13:25:13 GMT Subject: RFR: 8353665: RISC-V: IR verification fails in TestSubNodeFloatDoubleNegation.java [v2] In-Reply-To: References: Message-ID: > Hi, > Can you help to review this patch? > The newly added TestSubNodeFloatDoubleNegation.java (in https://github.com/openjdk/jdk/pull/24150) is to check `0 - (0 - x)` is not folded to `x` for float and double. > I have manually checked the IR and generated assembly code, it's not folded on riscv either, just there is an extra SubF in some code path. > So, the fix for this test on riscv should be simply make the check as `>= 2` rather than `2`. > > Tested on both x86 and riscv64. > > Thanks Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: refine ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24421/files - new: https://git.openjdk.org/jdk/pull/24421/files/cd9df312..abb3d548 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24421&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24421&range=00-01 Stats: 10 lines in 2 files changed: 6 ins; 2 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/24421.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24421/head:pull/24421 PR: https://git.openjdk.org/jdk/pull/24421 From duke at openjdk.org Fri Apr 4 13:27:09 2025 From: duke at openjdk.org (Johannes Graham) Date: Fri, 4 Apr 2025 13:27:09 GMT Subject: Integrated: 8347645: C2: XOR bounded value handling blocks constant folding In-Reply-To: References: Message-ID: On Mon, 13 Jan 2025 22:16:20 GMT, Johannes Graham wrote: > An interaction between xor bounds optimization and constant folding resulted in xor over constants not being optimized. 
This has a noticeable effect on `Long.expand` with a constant mask, on architectures that don't have instructions equivalent to `PDEP` to be used in an intrinsic. > > This change moves logic from the `Xor(L|I)Node::Value` methods into the `add_ring` methods, and gives priority to constant-folding. A static method was separated out to facilitate direct unit-testing. It also (subjectively) simplified the calculation of the upper bound and added an explanation of the reasoning behind it. > > In addition to testing for constant folding over xor, IR tests were added to `XorINodeIdealizationTests` and `XorLNodeIdealizationTests` to cover these related items: > - Bounds optimization of xor > - A check for `x ^ x = 0` > - Explicit testing of xor over booleans. > > Also `test_xor_node.cpp` was added to more extensively test the correctness of the bounds optimization. It exhaustively tests ranges of 4-bit numbers as well as at the high and low end of the affected types. This pull request has now been integrated. Changeset: 37f8e419 Author: Johannes Graham URL: https://git.openjdk.org/jdk/commit/37f8e419f9661ba30b3c34bd9fecef71ab1eddb1 Stats: 620 lines in 5 files changed: 568 ins; 27 del; 25 mod 8347645: C2: XOR bounded value handling blocks constant folding Reviewed-by: epeter, vlivanov, qamai, jkarthikeyan ------------- PR: https://git.openjdk.org/jdk/pull/23089 From mli at openjdk.org Fri Apr 4 13:31:59 2025 From: mli at openjdk.org (Hamlin Li) Date: Fri, 4 Apr 2025 13:31:59 GMT Subject: RFR: 8353665: RISC-V: IR verification fails in TestSubNodeFloatDoubleNegation.java [v2] In-Reply-To: References: Message-ID: <4yKbB3oktdryefE4g492Sg3RaqfONagQlPTiNZsVsPc=.a6013f02-5e69-4f4e-8b3d-9f45fe421168@github.com> On Fri, 4 Apr 2025 13:25:13 GMT, Hamlin Li wrote: >> Hi, >> Can you help to review this patch? >> The newly added TestSubNodeFloatDoubleNegation.java (in https://github.com/openjdk/jdk/pull/24150) is to check `0 - (0 - x)` is not folded to `x` for float and double. 
>> I have manually checked the IR and generated assembly code, it's not folded on riscv either, just there is an extra SubF in some code path. >> So, the fix for this test on riscv should be simply make the check as `>= 2` rather than `2`. >> >> Tested on both x86 and riscv64. >> >> Thanks > > Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: > > refine Great! Let's fix this test first, I'll just fix riscv part, as I don't have the env to verify other platforms. Also file a bug to track this `SubF` (potential) "issue", https://bugs.openjdk.org/browse/JDK-8353732, feel free to take it when you want to start the work. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24421#issuecomment-2778738383 From duke at openjdk.org Fri Apr 4 13:47:55 2025 From: duke at openjdk.org (Manuel =?UTF-8?B?SMOkc3NpZw==?=) Date: Fri, 4 Apr 2025 13:47:55 GMT Subject: RFR: 8353665: RISC-V: IR verification fails in TestSubNodeFloatDoubleNegation.java [v2] In-Reply-To: References: Message-ID: <5TGT7p9dT-jE73UeK_mzm_y8ehRlqMDq8FSeYY4ULBI=.54d3b223-a871-4e55-acc0-7159902ddc4a@github.com> On Fri, 4 Apr 2025 13:25:13 GMT, Hamlin Li wrote: >> Hi, >> Can you help to review this patch? >> The newly added TestSubNodeFloatDoubleNegation.java (in https://github.com/openjdk/jdk/pull/24150) is to check `0 - (0 - x)` is not folded to `x` for float and double. >> I have manually checked the IR and generated assembly code, it's not folded on riscv either, just there is an extra SubF in some code path. >> So, the fix for this test on riscv should be simply make the check as `>= 2` rather than `2`. >> >> Tested on both x86 and riscv64. >> >> Thanks > > Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: > > refine Looks good to me, with or without my suggestion. Thank you for catching and fixing this! 
test/hotspot/jtreg/compiler/floatingpoint/TestSubNodeFloatDoubleNegation.java line 60: > 58: @IR(counts = { IRNode.SUB, "2" }, applyIfPlatform = {"riscv64", "false"}) > 59: @IR(counts = { IRNode.SUB, "2" }, applyIfCPUFeature = {"zfh", "false"}) > 60: @IR(counts = { IRNode.SUB, ">= 2" }, applyIfCPUFeature = {"zfh", "true"}) Just a small nit: I find the following expresses the intention of the test more precisely Suggestion: @IR(counts = { IRNode.SUB_HF, "2" }, applyIfCPUFeature = {"zfh", "true"}) ------------- Marked as reviewed by mhaessig at github.com (no known OpenJDK username). PR Review: https://git.openjdk.org/jdk/pull/24421#pullrequestreview-2743050562 PR Review Comment: https://git.openjdk.org/jdk/pull/24421#discussion_r2028842855 From dlunden at openjdk.org Fri Apr 4 14:11:56 2025 From: dlunden at openjdk.org (Daniel Lundén) Date: Fri, 4 Apr 2025 14:11:56 GMT Subject: RFR: 8325467: Support methods with many arguments in C2 [v15] In-Reply-To: References: Message-ID: > If a method has a large number of parameters, we currently bail out from C2 compilation. > > ### Changeset > > Allowing C2 compilation of methods with a large number of parameters requires fundamental changes to the register mask data structure, used in many places in C2. In particular, register masks currently have a statically determined size and cannot represent arbitrary numbers of stack slots. This is needed if we want to compile methods with arbitrary numbers of parameters. Register mask operations are present in performance-sensitive parts of C2, which further complicates changes. > > Changes: > - Add functionality to dynamically grow/extend register masks. I experimented with a number of design choices to achieve this.
To keep the common case (normal number of method parameters) quick and also to avoid more intrusive changes to the current `RegMask` interface, I decided to leave the "base" statically allocated memory for masks unchanged and only use dynamically allocated memory in the rare cases where it is needed. > - Generalize the "chunk"-logic from `PhaseChaitin::Select()` to allow arbitrary-sized chunks, and also move most of the logic into register mask methods to separate concerns and to make the `PhaseChaitin::Select()` code more readable. > - Remove all `can_represent` checks and bailouts. > - Performance tuning. A particularly important change is the early-exit optimization in `RegMask::overlap`, used in the performance-sensitive method `PhaseChaitin::interfere_with_live`. > - Add a new test case `TestManyMethodArguments.java` and extend an old test `TestNestedSynchronize.java`. > > ### Testing > > - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/10178060450) > - `tier1` to `tier4` (and additional Oracle-internal testing) on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64. > - Standard performance benchmarking. No observed conclusive overall performance degradation/improvement. > - Specific benchmarking of C2 compilation time. The changes increase C2 compilation time by, approximately and on average, 1% for methods that could also be compiled before this changeset (see the figure below). The reason for the degradation is further checks required in performance-sensitive code (in particular `PhaseChaitin::remove_bound_register_from_interfering_live_ranges`). I have tried optimizing in various ways, but changes I found that lead to improvement also lead to less readable code (and are, in my opinion, not worth it). > > ![c2-regression](https:/... 
Daniel Lundén has updated the pull request incrementally with one additional commit since the last revision: Revise overlap comments for frequency of cases ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20404/files - new: https://git.openjdk.org/jdk/pull/20404/files/c41a76b9..74357621 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20404&range=14 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20404&range=13-14 Stats: 8 lines in 1 file changed: 4 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/20404.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20404/head:pull/20404 PR: https://git.openjdk.org/jdk/pull/20404
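As a reading aid for the design described in the changeset above (a statically sized base that stays on the fast path, plus dynamically allocated storage only when it is needed), here is a rough sketch in Java. It is not HotSpot's `RegMask`, which is C++ and considerably more involved; the class `GrowableMask`, its methods, and the base size of two 64-bit words are all assumptions chosen for illustration.

```java
import java.util.Arrays;

// Hypothetical illustration only; not the HotSpot RegMask implementation.
final class GrowableMask {
    private static final int BASE_WORDS = 2;           // assumed fixed part: 128 bits
    private final long[] base = new long[BASE_WORDS];  // always present, common case
    private long[] ext = null;                         // allocated only for high bits

    void set(int bit) {
        if (bit < 0) throw new IllegalArgumentException("negative bit index");
        int word = bit >>> 6;
        long flag = 1L << (bit & 63);
        if (word < BASE_WORDS) {        // common case: no allocation at all
            base[word] |= flag;
            return;
        }
        int extWord = word - BASE_WORDS;
        if (ext == null) {
            ext = new long[extWord + 1];
        } else if (extWord >= ext.length) {
            ext = Arrays.copyOf(ext, extWord + 1);  // grow, keeping existing bits
        }
        ext[extWord] |= flag;
    }

    boolean get(int bit) {
        if (bit < 0) return false;
        int word = bit >>> 6;
        long flag = 1L << (bit & 63);
        if (word < BASE_WORDS) return (base[word] & flag) != 0;
        int extWord = word - BASE_WORDS;
        return ext != null && extWord < ext.length && (ext[extWord] & flag) != 0;
    }

    // Returns as soon as a shared bit is found, and skips the extension
    // entirely when either mask has none.
    boolean overlaps(GrowableMask other) {
        for (int i = 0; i < BASE_WORDS; i++) {
            if ((base[i] & other.base[i]) != 0) return true;
        }
        if (ext == null || other.ext == null) return false;
        int n = Math.min(ext.length, other.ext.length);
        for (int i = 0; i < n; i++) {
            if ((ext[i] & other.ext[i]) != 0) return true;
        }
        return false;
    }

    boolean usesExtension() {
        return ext != null;
    }

    public static void main(String[] args) {
        GrowableMask a = new GrowableMask();
        GrowableMask b = new GrowableMask();
        a.set(3);
        b.set(4);
        System.out.println(a.overlaps(b));      // false
        a.set(500);
        b.set(500);
        System.out.println(a.overlaps(b));      // true
        System.out.println(a.usesExtension());  // true
    }
}
```

The point of the split is that masks touching only low-numbered bits never allocate and never look past the fixed words, which matches the stated goal of keeping the common case quick. How the real implementation handles chunks, stack-slot numbering, and the early exit in `RegMask::overlap` is not shown here.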