From fyang at openjdk.org Tue Apr 1 01:42:15 2025 From: fyang at openjdk.org (Fei Yang) Date: Tue, 1 Apr 2025 01:42:15 GMT Subject: RFR: 8353219: RISC-V: Fix client builds after JDK-8345298 In-Reply-To: References:

Message-ID: On Sat, 29 Mar 2025 03:16:37 GMT, Feilong Jiang wrote: >> Hi, please review this trivial change fixing a client build issue. >> The definitions of both `generate_float16ToFloat()` and `generate_floatToFloat16()` should be moved out of `COMPILER2_OR_JVMCI` macro scope. Testing: client builds fine on linux-riscv64 with this change. > > Marked as reviewed by fjiang (Committer). @feilongjiang @robehn : Thanks! ------------- PR Comment: https://git.openjdk.org/jdk/pull/24307#issuecomment-2767815574 From fyang at openjdk.org Tue Apr 1 01:42:16 2025 From: fyang at openjdk.org (Fei Yang) Date: Tue, 1 Apr 2025 01:42:16 GMT Subject: Integrated: 8353219: RISC-V: Fix client builds after JDK-8345298 In-Reply-To: References: Message-ID: On Sat, 29 Mar 2025 02:01:17 GMT, Fei Yang wrote: > Hi, please review this trivial change fixing a client build issue. > The definitions of both `generate_float16ToFloat()` and `generate_floatToFloat16()` should be moved out of `COMPILER2_OR_JVMCI` macro scope. Testing: client builds fine on linux-riscv64 with this change. This pull request has now been integrated. Changeset: 860a789e Author: Fei Yang URL: https://git.openjdk.org/jdk/commit/860a789e9153448345f19d70dd07e294a0b62223 Stats: 4 lines in 1 file changed: 2 ins; 2 del; 0 mod 8353219: RISC-V: Fix client builds after JDK-8345298 Reviewed-by: fjiang, rehn ------------- PR: https://git.openjdk.org/jdk/pull/24307 From qamai at openjdk.org Tue Apr 1 02:17:14 2025 From: qamai at openjdk.org (Quan Anh Mai) Date: Tue, 1 Apr 2025 02:17:14 GMT Subject: RFR: 8347645: C2: XOR bounded value handling blocks constant folding [v46] In-Reply-To: References:

Message-ID: On Mon, 31 Mar 2025 22:28:49 GMT, Vladimir Ivanov wrote: >> Johannes Graham has updated the pull request incrementally with one additional commit since the last revision: >> >> add missing import > > Thanks. > >> The naming of that method evolved during the course of the review of this PR. I believe the thinking was that the check was not necessarily an overall upper bound, and a simpler name would imply it was more general. > > There are usually a lot of invariants a function assumes and it's simply impractical to encode everything in the name. Speaking of this particular case (`calc_xor_upper_bound_of_non_neg`): > * `calc_` is redundant and IMO only adds noise; > * `_non_neg` part is confusing; I'd stress instead that it works on **ranges**. > > So, `xor_upper_bound_for_ranges` then? (And, please, explain in the comment what's the correspondence between `S` and `U` template type parameters.) > >> `addnodeXorUtil.hpp` > > I'm fine with placing it under `opto`. Please, rename the file into `src/hotspot/share/opto/utilities/xor.hpp`. @iwanowww > `_non_neg` part is confusing; I'd stress instead that it works on ranges. I find it easier to think of it as calculating the upper bound of the xor of 2 non-negative integers whose upper bounds are given in the parameters. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23089#issuecomment-2767875213 From duke at openjdk.org Tue Apr 1 02:28:15 2025 From: duke at openjdk.org (Johannes Graham) Date: Tue, 1 Apr 2025 02:28:15 GMT Subject: RFR: 8347645: C2: XOR bounded value handling blocks constant folding [v47] In-Reply-To: References: Message-ID: > An interaction between xor bounds optimization and constant folding resulted in xor over constants not being optimized. This has a noticeable effect on `Long.expand` with a constant mask, on architectures that don't have instructions equivalent to `PDEP` to be used in an intrinsic. 
> > This change moves logic from the `Xor(L|I)Node::Value` methods into the `add_ring` methods, and gives priority to constant-folding. A static method was separated out to facilitate direct unit-testing. It also (subjectively) simplified the calculation of the upper bound and added an explanation of the reasoning behind it. > > In addition to testing for constant folding over xor, IR tests were added to `XorINodeIdealizationTests` and `XorLNodeIdealizationTests` to cover these related items: > - Bounds optimization of xor > - A check for `x ^ x = 0` > - Explicit testing of xor over booleans. > > Also `test_xor_node.cpp` was added to more extensively test the correctness of the bounds optimization. It exhaustively tests ranges of 4-bit numbers as well as at the high and low end of the affected types. Johannes Graham has updated the pull request incrementally with one additional commit since the last revision: address review comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23089/files - new: https://git.openjdk.org/jdk/pull/23089/files/94a32dba..59875d54 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23089&range=46 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23089&range=45-46 Stats: 96 lines in 4 files changed: 47 ins; 41 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/23089.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23089/head:pull/23089 PR: https://git.openjdk.org/jdk/pull/23089 From duke at openjdk.org Tue Apr 1 02:44:03 2025 From: duke at openjdk.org (Johannes Graham) Date: Tue, 1 Apr 2025 02:44:03 GMT Subject: RFR: 8347645: C2: XOR bounded value handling blocks constant folding [v48] In-Reply-To: References: Message-ID: <1JYbwRdMBDikLGt3iXx87YRTWrF6NwzbFDH916UuoSA=.1fb10eab-4963-4d4c-a8ae-97ec3cecdfe2@github.com> > An interaction between xor bounds optimization and constant folding resulted in xor over constants not being optimized. 
This has a noticeable effect on `Long.expand` with a constant mask, on architectures that don't have instructions equivalent to `PDEP` to be used in an intrinsic. > > This change moves logic from the `Xor(L|I)Node::Value` methods into the `add_ring` methods, and gives priority to constant-folding. A static method was separated out to facilitate direct unit-testing. It also (subjectively) simplified the calculation of the upper bound and added an explanation of the reasoning behind it. > > In addition to testing for constant folding over xor, IR tests were added to `XorINodeIdealizationTests` and `XorLNodeIdealizationTests` to cover these related items: > - Bounds optimization of xor > - A check for `x ^ x = 0` > - Explicit testing of xor over booleans. > > Also `test_xor_node.cpp` was added to more extensively test the correctness of the bounds optimization. It exhaustively tests ranges of 4-bit numbers as well as at the high and low end of the affected types. Johannes Graham has updated the pull request incrementally with one additional commit since the last revision: remove unused methods ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23089/files - new: https://git.openjdk.org/jdk/pull/23089/files/59875d54..50d35dcd Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23089&range=47 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23089&range=46-47 Stats: 12 lines in 2 files changed: 0 ins; 12 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/23089.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23089/head:pull/23089 PR: https://git.openjdk.org/jdk/pull/23089 From duke at openjdk.org Tue Apr 1 02:52:17 2025 From: duke at openjdk.org (Johannes Graham) Date: Tue, 1 Apr 2025 02:52:17 GMT Subject: RFR: 8347645: C2: XOR bounded value handling blocks constant folding [v46] In-Reply-To: References:

Message-ID: On Tue, 1 Apr 2025 02:14:35 GMT, Quan Anh Mai wrote: >> Thanks. >> >>> The naming of that method evolved during the course of the review of this PR. I believe the thinking was that the check was not necessarily an overall upper bound, and a simpler name would imply it was more general. >> >> There are usually a lot of invariants a function assumes and it's simply impractical to encode everything in the name. Speaking of this particular case (`calc_xor_upper_bound_of_non_neg`): >> * `calc_` is redundant and IMO only adds noise; >> * `_non_neg` part is confusing; I'd stress instead that it works on **ranges**. >> >> So, `xor_upper_bound_for_ranges` then? (And, please, explain in the comment what's the correspondence between `S` and `U` template type parameters.) >> >>> `addnodeXorUtil.hpp` >> >> I'm fine with placing it under `opto`. Please, rename the file into `src/hotspot/share/opto/utilities/xor.hpp`. > > @iwanowww > >> `_non_neg` part is confusing; I'd stress instead that it works on ranges. > > I find it easier to think of it as calculating the upper bound of the xor of 2 non-negative integers whose upper bounds are given in the parameters. Renamed to `xor_upper_bound_for_ranges` before I saw your comment, @merykitty. I'd be ok with another name though. With the last changes, the method is no longer a member of the class, so it's no longer going to get as many eyes on it without context, so maybe it matters less now. 
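To make the "upper bound of the xor of two non-negative integers whose upper bounds are given" reading concrete, here is a minimal Java sketch. It is not the HotSpot code from the PR: the class and method names are hypothetical, and the bound used (every bit at or below the highest set bit of either upper bound may be set) is a simple, safe one that may be looser than what the PR computes. The `main` method mirrors the idea of exhaustively checking all 4-bit ranges.

```java
public class XorUpperBoundSketch {
    // Hypothetical helper (not the PR's implementation): given a in [0, hi0]
    // and b in [0, hi1], return a value that a ^ b can never exceed.
    static int xorUpperBound(int hi0, int hi1) {
        if (hi0 < 0 || hi1 < 0) {
            throw new IllegalArgumentException("bounds must be non-negative");
        }
        int combined = hi0 | hi1;
        if (combined == 0) {
            return 0; // both inputs are exactly 0, so the xor is 0
        }
        // a ^ b <= a | b, and a | b has no bit above the highest set bit of
        // (hi0 | hi1). Build a mask with all bits up to that position set.
        return -1 >>> Integer.numberOfLeadingZeros(combined);
    }

    public static void main(String[] args) {
        // Exhaustive check over all ranges of 4-bit values.
        for (int hi0 = 0; hi0 < 16; hi0++) {
            for (int hi1 = 0; hi1 < 16; hi1++) {
                int bound = xorUpperBound(hi0, hi1);
                for (int a = 0; a <= hi0; a++) {
                    for (int b = 0; b <= hi1; b++) {
                        if ((a ^ b) > bound) {
                            throw new AssertionError(a + " ^ " + b + " exceeds " + bound);
                        }
                    }
                }
            }
        }
        System.out.println("all 4-bit ranges respected the bound");
    }
}
```

Note that the sketch takes only the upper bounds as parameters, which matches the "non-negative" reading; a "ranges" reading would also pass the lower bounds, which this simple bound would ignore anyway.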
------------- PR Comment: https://git.openjdk.org/jdk/pull/23089#issuecomment-2767917005 From duke at openjdk.org Tue Apr 1 04:33:34 2025 From: duke at openjdk.org (Anjian-Wen) Date: Tue, 1 Apr 2025 04:33:34 GMT Subject: RFR: 8329887: RISC-V: C2: Support Zvbb Vector And-Not instruction [v3] In-Reply-To: <1KHNbMIgOO7jSZ1Fm4HzxadYaNzE4Xbq4nTitlKy3Po=.17d7860b-10de-4f19-87d8-87fc17313ce2@github.com> References: <1KHNbMIgOO7jSZ1Fm4HzxadYaNzE4Xbq4nTitlKy3Po=.17d7860b-10de-4f19-87d8-87fc17313ce2@github.com> Message-ID: > support Zvbb Vector And-Not vandn.vv match rule and add test Anjian-Wen has updated the pull request incrementally with one additional commit since the last revision: RISC-V: C2: Support Zvbb Vector And-Not instruction fix match rule for format ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24129/files - new: https://git.openjdk.org/jdk/pull/24129/files/7fc67099..a15d58dc Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24129&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24129&range=01-02 Stats: 4 lines in 1 file changed: 0 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/24129.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24129/head:pull/24129 PR: https://git.openjdk.org/jdk/pull/24129 From galder at openjdk.org Tue Apr 1 04:56:44 2025 From: galder at openjdk.org (Galder Zamarreño) Date: Tue, 1 Apr 2025 04:56:44 GMT Subject: RFR: 8348887: Create IR framework test for JDK-8347997 In-Reply-To: References: Message-ID: On Mon, 31 Mar 2025 13:41:17 GMT, Marc Chevalier wrote: > As the ticket says: >> Create IR framework test which checks that allocations are eliminated in the regression test included in [JDK-8347997](https://bugs.openjdk.org/browse/JDK-8347997) fix. > > So here it is! We can see that in case of inlining, indeed, no allocation happens. The second part is some sanity check to emphasize the difference: of course, there is an allocation without inlining. 
The benefit of this second part is arguable. From my point of view, it's mostly to point out the difference to a future reader. But yes, there is nothing very surprising. > > Thanks, > Marc Changes requested by galder (Author). test/hotspot/jtreg/compiler/c2/irTests/TestContinuationPinningAndEA.java line 118: > 116: > 117: @DontInline > 118: public CrashesNoInline() throws Throwable { It's probably my own ignorance, but just in case others are in the same boat, why does this crash? Could you add a brief javadoc for future readers? Same with the other Crashes cases. ------------- PR Review: https://git.openjdk.org/jdk/pull/24328#pullrequestreview-2731106771 PR Review Comment: https://git.openjdk.org/jdk/pull/24328#discussion_r2022140499 From hgreule at openjdk.org Tue Apr 1 06:27:49 2025 From: hgreule at openjdk.org (Hannes Greule) Date: Tue, 1 Apr 2025 06:27:49 GMT Subject: RFR: 8353359: C2: Or(I|L)Node::Ideal is missing AddNode::Ideal call Message-ID: Hi, this simple change adds a missing AddNode::Ideal call to Or(I|L)Node::Ideal. See the added tests for examples of optimizations that don't apply without this change. Please let me know what you think. 
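As a rough illustration of the kind of rewrite at stake, the Java sketch below shows one plausible shape, under the assumption that constant reassociation (turning `(x | con1) | con2` into `x | (con1 | con2)`) is among the `AddNode::Ideal` transformations the added tests exercise; the message above does not list them. The class and method names are hypothetical, and the code only demonstrates that the two shapes are equivalent at the Java level, not what C2 actually emits.

```java
public class OrReassociationSketch {
    // Shape before the rewrite: two OR operations, each with its own constant.
    static int before(int x) {
        return (x | 0x0F) | 0xF0;
    }

    // Shape after the assumed rewrite: the constants are folded into one.
    static int after(int x) {
        return x | (0x0F | 0xF0);
    }

    public static void main(String[] args) {
        int[] samples = {0, 1, -1, 0x1234, Integer.MIN_VALUE, Integer.MAX_VALUE};
        for (int x : samples) {
            if (before(x) != after(x)) {
                throw new AssertionError("shapes disagree for x = " + x);
            }
        }
        System.out.println("both shapes agree on all samples");
    }
}
```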
------------- Commit messages: - Call AddNode::Ideal in Or(I|L)Node::Ideal - Test AddNode::Ideal optimizations for Or(I|L) Changes: https://git.openjdk.org/jdk/pull/24348/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24348&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8353359 Stats: 37 lines in 3 files changed: 33 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/24348.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24348/head:pull/24348 PR: https://git.openjdk.org/jdk/pull/24348 From epeter at openjdk.org Tue Apr 1 07:06:32 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 1 Apr 2025 07:06:32 GMT Subject: RFR: 8352869: Verify.checkEQ: extension for NaN, VectorAPI and arbitrary Objects [v6] In-Reply-To: References: Message-ID: > We should extend the functionality of Verify.checkEQ: > - Allow different NaN encodings to be seen as equal (by default). > - Compare VectorAPI vectors. > - Compare Exceptions, and their messages. > - Compare arbitrary Objects via Reflection. > > Note: this is a prerequisite for the Template Library [JDK-8352861](https://bugs.openjdk.org/browse/JDK-8352861) / https://github.com/openjdk/jdk/pull/23418. 
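The first item in the description above (different NaN encodings seen as equal by default) can be sketched in a few lines of Java. This is not the code in `Verify.java`: the class name, method name, and boolean parameter are hypothetical, and the fallback to a raw-bits comparison is an assumption made for illustration.

```java
public class NaNEqualitySketch {
    // Hypothetical comparison: optionally treat every NaN encoding as equal,
    // otherwise compare the raw bit patterns of the two floats.
    static boolean sameFloat(float a, float b, boolean treatAllNaNsAsEqual) {
        if (treatAllNaNsAsEqual && Float.isNaN(a) && Float.isNaN(b)) {
            return true;
        }
        return Float.floatToRawIntBits(a) == Float.floatToRawIntBits(b);
    }

    public static void main(String[] args) {
        // Two quiet NaNs that differ only in their payload bits.
        float nan1 = Float.intBitsToFloat(0x7fc00000);
        float nan2 = Float.intBitsToFloat(0x7fc00001);
        System.out.println(sameFloat(nan1, nan2, true));   // prints "true"
        // A raw-bits comparison still distinguishes 0.0f from -0.0f.
        System.out.println(sameFloat(0.0f, -0.0f, true));  // prints "false"
    }
}
```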
Emanuel Peter has updated the pull request incrementally with two additional commits since the last revision: - upate copyright - Apply suggestions from code review Co-authored-by: Christian Hagedorn ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24224/files - new: https://git.openjdk.org/jdk/pull/24224/files/d46c45de..4ca42699 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24224&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24224&range=04-05 Stats: 7 lines in 1 file changed: 0 ins; 0 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/24224.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24224/head:pull/24224 PR: https://git.openjdk.org/jdk/pull/24224 From epeter at openjdk.org Tue Apr 1 07:06:33 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 1 Apr 2025 07:06:33 GMT Subject: RFR: 8352869: Verify.checkEQ: extension for NaN, VectorAPI and arbitrary Objects [v5] In-Reply-To: References:

Message-ID: On Mon, 31 Mar 2025 14:04:29 GMT, Christian Hagedorn wrote: >> Emanuel Peter has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains six additional commits since the last revision: >> >> - Merge branch 'master' into JDK-8352869-Verify-NaN-Vector-Objects >> - Verify.Options refactor for Galder >> - Update test/hotspot/jtreg/compiler/lib/verify/Verify.java >> >> Co-authored-by: Galder Zamarreño >> - Merge branch 'master' into JDK-8352869-Verify-NaN-Vector-Objects >> - clean up test >> - JDK-8352869 > > Nice extensions! Some initial comments. @chhagedorn Thanks for the suggestions and questions! I think I addressed them all :) > test/hotspot/jtreg/compiler/lib/verify/Verify.java line 25: > >> 23: >> 24: package compiler.lib.verify; >> 25: > > You should update the copyright year. done :) > test/hotspot/jtreg/compiler/lib/verify/Verify.java line 209: > >> 207: print(a, b, field, aParent, bParent); >> 208: throw new VerifyException("Object type not supported: " + ca.getName() + " -- did you mean to 'enableCheckWithArbitraryClasses'?"); >> 209: } > > What's the reason behind throwing instead of just comparing two arbitrary objects by default? If a user calls `Verify.checkEQ()` and sees this exception, I would guess he then just passes the additional option and we have the same result. But maybe I'm missing something. Good question. I think my reasoning was that comparing arbitrary classes requires reflection. And that is rather slow. So by default it would be good if that feature is not enabled, so the user tries to avoid it, and is aware when they enable it explicitly. But if you think that is not useful, I can remove the feature. @chhagedorn what do you think? 
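The opt-in design discussed here can be sketched as follows. This is a simplified stand-in, not the actual `Verify` implementation: the class, the `Pair` type, the `allowArbitraryClasses` parameter, and the use of `IllegalStateException` are all hypothetical. It only walks the fields declared directly on the class, does not handle cyclic object graphs, and fields of JDK-internal types may be inaccessible under the module system.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;

public class ReflectiveCheckSketch {
    // Hypothetical user-defined class to compare.
    static final class Pair {
        final int left;
        final String right;

        Pair(int left, String right) {
            this.left = left;
            this.right = right;
        }
    }

    // Throws if a and b differ, or if reflection would be needed but is not enabled.
    static void checkEQ(Object a, Object b, boolean allowArbitraryClasses) {
        if (a == b) {
            return;
        }
        if (a == null || b == null || a.getClass() != b.getClass()) {
            throw new IllegalStateException("Values differ: " + a + " vs " + b);
        }
        // Cheap path: well-known value types are compared with equals().
        if (a instanceof Number || a instanceof String
                || a instanceof Boolean || a instanceof Character) {
            if (!a.equals(b)) {
                throw new IllegalStateException("Values differ: " + a + " vs " + b);
            }
            return;
        }
        // Slow path: reflection, only when the caller explicitly asked for it.
        if (!allowArbitraryClasses) {
            throw new IllegalStateException(
                    "Object type not supported: " + a.getClass().getName());
        }
        for (Field field : a.getClass().getDeclaredFields()) {
            if (Modifier.isStatic(field.getModifiers())) {
                continue; // static state is not part of the instance
            }
            field.setAccessible(true);
            try {
                checkEQ(field.get(a), field.get(b), true);
            } catch (IllegalAccessException e) {
                throw new IllegalStateException(e);
            }
        }
    }

    public static void main(String[] args) {
        checkEQ(new Pair(1, "a"), new Pair(1, "a"), true); // passes silently
        try {
            checkEQ(new Pair(1, "a"), new Pair(1, "a"), false);
        } catch (IllegalStateException e) {
            System.out.println("rejected without opt-in: " + e.getMessage());
        }
    }
}
```

With the flag off, the caller pays nothing for reflection and gets an explicit error naming the unsupported type; the trade-off raised in the review is that most users would then simply turn the flag on.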
------------- PR Comment: https://git.openjdk.org/jdk/pull/24224#issuecomment-2768381692 PR Review Comment: https://git.openjdk.org/jdk/pull/24224#discussion_r2022262263 PR Review Comment: https://git.openjdk.org/jdk/pull/24224#discussion_r2022265798 From epeter at openjdk.org Tue Apr 1 07:08:27 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 1 Apr 2025 07:08:27 GMT Subject: RFR: 8352893: C2: OrL/INode::add_ring optimize (x | -1) to -1 [v3] In-Reply-To: References:

Message-ID: On Mon, 31 Mar 2025 10:38:21 GMT, Roland Westrelin wrote: >> This is primarily motivated by 8275202 (C2: optimize out more >> redundant conditions). In the following code snippet: >> >> >> int[] array = new int[arraySize]; >> if (j <= arraySize) { >> if (i >= 0) { >> if (i < j) { >> int v = array[i]; >> >> >> (`arraySize` is a constant) >> >> at the range check, `j` is known to be in `[min, arraySize]` as a >> consequence, `i` is known to be `[0, arraySize-1]`. The range check >> can be eliminated. >> >> Now, if later, `i` constant folds to some value that's positive but >> out of range for the array: >> >> - if that happens when the new pass runs, then it can prove that: >> >> if (i < j) { >> >> is never taken. >> >> - if that happens during IGVN or CCP however, that condition is not >> constant folded. And because the range check was removed, there's no >> guard protecting the range check `CastII`. It becomes `top` and, as >> a result, the graph can become broken. >> >> What I propose here is that when the `CastII` becomes dead, any CFG >> paths that use the `CastII` node is made unreachable. So in pseudo code: >> >> >> int[] array = new int[arraySize]; >> if (j <= arraySize) { >> if (i >= 0) { >> if (i < j) { >> halt(); >> >> >> Finding the CFG paths is implemented in the patch by following the >> uses of the node until a CFG node or a `Phi` is encountered. >> >> The patch applies this to all `Type` nodes as with 8275202, I also ran >> in some rare corner cases with other types of nodes. The exception is >> `Phi` nodes which may not be as easy to handle (and for which I had no >> issue with 8275202). >> >> Finally, the patch includes a test case that's unrelated to the >> discussion of 8275202 above. In that test case, a `CastII` becomes top >> but the test that guards it doesn't constant fold. 
The root cause is a >> transformation of: >> >> >> (CastII (AddI >> >> >> into >> >> >> (AddI (CastII ) (CastII) >> >> >> which causes the resulting node to have a wider type. The `CastII` >> captures a type before the transformation above happens. Once it has >> happened, the guard for the `CastII` can't be constant folded when an >> out-of-bounds value occurs. >> >> This is likely fixable some other way (even though it doesn't seem >> straightforward). Given the long history of similar issues (and the >> test case that shows that more of them are hiding), I think it would >> make sense to try some other way of approaching them. > > Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: > > review src/hotspot/share/opto/node.cpp line 3096: > 3094: // paths. The dead paths are then replaced by a Halt node. > 3095: void TypeNode::make_paths_from_here_dead(PhaseIterGVN* igvn, PhaseIdealLoop* loop, const char* phase_str) { > 3096: Unique_Node_List wq; Should there be a `ResourceMark` here? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23468#discussion_r2022275763 From epeter at openjdk.org Tue Apr 1 07:18:45 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 1 Apr 2025 07:18:45 GMT Subject: RFR: 8344942: Template-Based Testing Framework [v7] In-Reply-To: References: Message-ID: <0b56TIXbIwSy7Zo77WAx4uweu2kM8iAmjPMomeT3sts=.06d78493-7b94-4386-a7be-4fb65837926b@github.com> > **Goal** > We want to generate Java source code: > - Make it easy to generate variants of tests. E.g. for each offset, for each operator, for each type, etc. > - Enable the generation of domain specific fuzzers (e.g. random expressions and statements). > > Note: with the Template Library draft I was already able to find a [list of bugs](https://bugs.openjdk.org/issues/?jql=labels%20%3D%20template-framework%20ORDER%20BY%20created%20DESC%2C%20summary%20DESC). 
> > **How to get started** > When reviewing, please start by looking at: > https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestSimple.java#L60-L76 > > We have a Template with two arguments. They are typed (Integer and String). We then apply the arguments `template.withArgs(42, "7")`, producing a `TemplateWithArgs`. This can then be `render`ed to a String. And then that can be compiled and executed with the CompileFramework. > > Second, look at this advanced test: > https://github.com/openjdk/jdk/blob/77079807042fc5a3af04e0ccccad4ecd89e21cdb/test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestAdvanced.java#L102-L119 > > And then for a "tutorial", look at: > `test/hotspot/jtreg/testlibrary_tests/template_framework/examples/TestTutorial.java` > > It shows these features: > - The `body` of a Template is essentially a list of `Token`s that are concatenated. > - Templates can be nested: a `TemplateWithArgs` is also a `Token`. > - We can use `#name` replacements to directly format values into the String. If we had proper String Templates in Java, we would not need this feature. > - We can use `$var` to make variable names unique: if we applied the same template twice, we would get variable collisions. `$var` is then replaced with e.g. `var_7` in one template use and `var_42` in the other template use. > - The use of `Hook`s to insert code into outer (earlier) code locations. This is useful, for example, to insert fields on demand. > - The use of recursive templates, and `fuel` to limit the recursion. > - `Name`s: useful to register field and variable names in code scopes. > > Next, look at the documentation in. This file is the heart of the Template Framework, and describes all the important features. 
> https://github.com/openjdk/jdk/blob/d21a8aabaf3b191e851b6997c11bb30fcd0f942f/test/hotspot/jtreg/compiler/lib/template_framework/Template.java#L31-L76 > > For a better experience, you may want to generate the `javadocs`: > `javadoc -sourcepath test/hotspot/j... Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: typo ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24217/files - new: https://git.openjdk.org/jdk/pull/24217/files/77079807..be1c0ee9 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24217&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24217&range=05-06 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/24217.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24217/head:pull/24217 PR: https://git.openjdk.org/jdk/pull/24217 From dskantz at openjdk.org Tue Apr 1 07:28:22 2025 From: dskantz at openjdk.org (Daniel Skantz) Date: Tue, 1 Apr 2025 07:28:22 GMT Subject: RFR: 8282053: IGV: refine schedule approximation Message-ID: This patch refines the schedule approximation in IGV by 1) placing parm. and projection nodes in the same block as their predecessors, and 2) disallows erroneously considering machine nodes such as prefetchAlloc and rep_stos as CFG nodes. The reader may refer to the corresponding JBS issue where graphs sampled before and after the change are attached. Testing: T1-T3 with no failures. Opened graphs before and after the change and saw no obvious problems. Opened a large number of graphs in CFG view and observed no unexpected IGV warnings, errors or assert failures. 
------------- Commit messages: - fix Changes: https://git.openjdk.org/jdk/pull/24350/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24350&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8282053 Stats: 21 lines in 1 file changed: 20 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/24350.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24350/head:pull/24350 PR: https://git.openjdk.org/jdk/pull/24350 From roland at openjdk.org Tue Apr 1 07:31:12 2025 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 1 Apr 2025 07:31:12 GMT Subject: RFR: 8341976: C2: use_mem_state != load->find_exact_control(load->in(0)) assert failure [v7] In-Reply-To: References: Message-ID: > The `arraycopy` writes to a non escaping array so its `ArrayCopy` node > is marked as having a narrow memory effect. One of the loads from the > destination after the copy is transformed into a load from the source > array (the rationale being that if there's no load from the > destination of the copy, the `arraycopy` is not needed). The load from > the source has the input memory state of the `ArrayCopy` as memory > input. That load is then sunk out of the loop and its control is > updated to be after the `ArrayCopy`. That's legal because the > `ArrayCopy` only has a narrow memory effect and can't modify the > source. The `ArrayCopy` can't be eliminated and is expanded. In the > process, a `MemBar` that has a wide memory effect is added. The load > from the source has control after the membar but memory state before > and because the membar has a wide memory effect, the load is anti > dependent on the membar: the graph is broken (the load can't be pinned > after the membar and anti dependent on it). > > In short, the problem is that the graph is transformed under the > assumption that the `ArrayCopy` has a narrow effect but the > `ArrayCopy` is expanded to a subgraph that has a wide memory > effect. The fix I propose is to not insert a membar with a wide memory > effect. 
We still need a membar when the destination is non-escaping > because the expanded `ArrayCopy`, if it writes to a tightly allocated > array, writes to raw memory and not to the destination memory slice. Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 16 additional commits since the last revision: - review - Merge branch 'master' into JDK-8341976 - review - review - Merge branch 'master' into JDK-8341976 - -XX:+TraceLoopOpts fix - review - more - Merge branch 'master' into JDK-8341976 - more - ... and 6 more: https://git.openjdk.org/jdk/compare/47f2dbd6...9b21648d ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23465/files - new: https://git.openjdk.org/jdk/pull/23465/files/9f79e0b0..9b21648d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23465&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23465&range=05-06 Stats: 8742 lines in 156 files changed: 4824 ins; 3469 del; 449 mod Patch: https://git.openjdk.org/jdk/pull/23465.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23465/head:pull/23465 PR: https://git.openjdk.org/jdk/pull/23465 From roland at openjdk.org Tue Apr 1 07:32:47 2025 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 1 Apr 2025 07:32:47 GMT Subject: RFR: 8341976: C2: use_mem_state != load->find_exact_control(load->in(0)) assert failure [v2] In-Reply-To: <9cGlvzZnXc8B5tNxXSE2Eqi2FDJzP26U7c-yan4ZdCc=.3f6b0821-8b6c-453d-87ee-91205cc6627a@github.com> References:

<9cGlvzZnXc8B5tNxXSE2Eqi2FDJzP26U7c-yan4ZdCc=.3f6b0821-8b6c-453d-87ee-91205cc6627a@github.com> Message-ID: On Mon, 31 Mar 2025 12:26:42 GMT, Christian Hagedorn wrote: >> Right. So maybe, we could treat that `Opaque` node the way we do for `OpaqueZeroTripGuard` and have it constant fold when the backedge is never taken. >> >> So I should revert the change to the `IdealLoopTree::dump_head()` and the test run with `TraceLoopOpts`? > >> So maybe, we could treat that Opaque node the way we do for OpaqueZeroTripGuard and have it constant fold when the backedge is never taken. > > Right, that sounds like a good solution. > >> So I should revert the change to the IdealLoopTree::dump_head() and the test run with TraceLoopOpts? > > Yes, that would be great. We can make a comment in [JDK-8297752](https://bugs.openjdk.org/browse/JDK-8297752) to add `-XX:+TraceLoopOpts` as additional run to this test when we fix it. Done. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23465#discussion_r2022305281 From rcastanedalo at openjdk.org Tue Apr 1 07:44:27 2025 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Tue, 1 Apr 2025 07:44:27 GMT Subject: RFR: 8282053: IGV: refine schedule approximation In-Reply-To: References: Message-ID: On Tue, 1 Apr 2025 07:23:04 GMT, Daniel Skantz wrote: > This patch refines the schedule approximation in IGV by 1) placing parm. and projection nodes in the same block as their predecessors, and 2) disallows erroneously considering machine nodes such as prefetchAlloc and rep_stos as CFG nodes. > > The reader may refer to the corresponding JBS issue where graphs sampled before and after the change are attached. > > Testing: T1-T3 with no failures. Opened graphs before and after the change and saw no obvious problems. Opened a large number of graphs in CFG view and observed no unexpected IGV warnings, errors or assert failures. Thanks for working on this, Daniel. Looks good! 
------------- Marked as reviewed by rcastanedalo (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24350#pullrequestreview-2731778708 From shade at openjdk.org Tue Apr 1 07:58:27 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 1 Apr 2025 07:58:27 GMT Subject: RFR: 8353188: C1: Clean up x86 backend after 32-bit x86 removal [v2] In-Reply-To: <-iwh_5JGpt-TAVpfZQjwbnIG_c8hvirNKCcmiZoLNls=.3b34bf15-51fc-42bf-a294-1c23ca99754c@github.com> References: <-iwh_5JGpt-TAVpfZQjwbnIG_c8hvirNKCcmiZoLNls=.3b34bf15-51fc-42bf-a294-1c23ca99754c@github.com> Message-ID: > Piece-wise cleanup of C1_LIRAssembler_x86, C1_MacroAssembler and related classes. C1 implements the bulk of arch-specific backend there. Major parts of this backend are already removed by #24274, this cleans up another large bulk, and hopefully most of it. > > Additional testing: > - [x] Linux x86_64 server fastdebug, `all` > - [x] Linux x86_64 server fastdebug, `all` + `-XX:TieredStopAtLevel=1` Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: Review comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24301/files - new: https://git.openjdk.org/jdk/pull/24301/files/47f239c2..527854ec Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24301&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24301&range=00-01 Stats: 12 lines in 2 files changed: 0 ins; 11 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/24301.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24301/head:pull/24301 PR: https://git.openjdk.org/jdk/pull/24301 From shade at openjdk.org Tue Apr 1 07:58:27 2025 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 1 Apr 2025 07:58:27 GMT Subject: RFR: 8353188: C1: Clean up x86 backend after 32-bit x86 removal [v2] In-Reply-To: References: <-iwh_5JGpt-TAVpfZQjwbnIG_c8hvirNKCcmiZoLNls=.3b34bf15-51fc-42bf-a294-1c23ca99754c@github.com> Message-ID: 
<5WuuW8GQhWOxXqYgEsVG0DZAjsu8DTjOdJZKWaae7vU=.be96f09d-9e3e-4472-94d1-3d92b487eb33@github.com> On Mon, 31 Mar 2025 21:14:45 GMT, Vladimir Ivanov wrote: >> Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: >> >> Review comments > > src/hotspot/cpu/x86/c1_FrameMap_x86.cpp line 45: > >> 43: Register reg = r_1->as_Register(); >> 44: if (r_2->is_Register() && (type == T_LONG || type == T_DOUBLE)) { >> 45: Register reg2 = r_2->as_Register(); > > FTR `reg2` is unused. (Moreover, `r_2` and `r_2->is_Register()` are redundant on x64.) Right. Cleaned those up too. > src/hotspot/cpu/x86/c1_LIRAssembler_x86.cpp line 827: > >> 825: // compressed klass ptrs: T_METADATA can be a compressed klass >> 826: // ptr or a 64 bit method pointer. >> 827: ShouldNotReachHere(); > > Alternatively, you could drop the whole `T_METADATA` case and defer the handling to default case. I initially thought leaving the comment there as meaningful, but now I think that comment only relates to 32-bit x86, so now is redundant. So I dropped the `T_METADATA` case completely. > src/hotspot/cpu/x86/c1_LIRAssembler_x86.cpp line 3063: > >> 3061: ExternalAddress((address)double_signflip_pool), >> 3062: rscratch1); >> 3063: > > Is it intentional or just a leftover? Merge leftover, removed. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24301#discussion_r2022343408 PR Review Comment: https://git.openjdk.org/jdk/pull/24301#discussion_r2022344553 PR Review Comment: https://git.openjdk.org/jdk/pull/24301#discussion_r2022344878 From roland at openjdk.org Tue Apr 1 08:06:09 2025 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 1 Apr 2025 08:06:09 GMT Subject: RFR: 8349479: C2: when a Type node becomes dead, make CFG path that uses it unreachable [v7] In-Reply-To: References:

Message-ID: <09Q1vDaTXq3VlLU4xxQl_E7wDM2FT7tqR_Bc8ky8RNc=.4e11f2f8-75c3-49a1-b0b3-20eac17c4b39@github.com> On Tue, 1 Apr 2025 07:09:46 GMT, Emanuel Peter wrote: >> Roland Westrelin has updated the pull request incrementally with one additional commit since the last revision: >> >> review > > src/hotspot/share/opto/node.cpp line 3096: > >> 3094: // paths. The dead paths are then replaced by a Halt node. >> 3095: void TypeNode::make_paths_from_here_dead(PhaseIterGVN* igvn, PhaseIdealLoop* loop, const char* phase_str) { >> 3096: Unique_Node_List wq; > > Should there be a `ResourceMark` here? The callers have the `ResourceMark`. This is because it's code I extracted from 8275202: I think it used to not be safe to call `PhaseIdealLoop::register_new_node` from within the `ResourceMark` but I see there were changes in that area (data structures used by `PhaseIdealLoop` no longer allocated in the resource area). So it looks like it could be changed now. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23468#discussion_r2022357322 From epeter at openjdk.org Tue Apr 1 08:25:21 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 1 Apr 2025 08:25:21 GMT Subject: RFR: 8347449: C2: UseLoopPredicate off should also turn UseProfiledLoopPredicate off [v6] In-Reply-To: References:

Message-ID: On Fri, 28 Mar 2025 09:09:59 GMT, Manuel Hässig wrote: >> # Issue Summary >> >> When running with `-XX:-UseLoopPredicate` C2 still inserts profiled loop parse predicates, despite those being a form of loop parse predicate. Further, the loop predicate code is not always consistent about when to insert/expect profiled parse predicates. >> >> # Change Summary >> >> Following the rationale that profiled predicates are a subset of loop predicates, this PR disables profiled predicates whenever loop predicates are disabled. They are disabled on the level of arguments. Further, before any checks for whether profiled predicates are enabled, this PR inserts a check that loop predicates are enabled such that the code is consistent in its intention. >> >> Concretely, this PR >> - adds parse predicate nodes to the IR testing framework, >> - turns off `UseProfiledLoopPredicate` if `UseLoopPredicate` is turned off, >> - predicates all checks for `UseProfiledLoopPredicate` on `UseLoopPredicate` first for consistency, >> - adds a regression test. >> >> >> # Testing >> >> The changes passed the following testing: >> - [Github Actions](https://github.com/mhaessig/jdk/actions/runs/14078750038) >> - tier1 through tier3 and Oracle internal testing > > Manuel Hässig has updated the pull request incrementally with two additional commits since the last revision: > > - idealKit::loop: always call add_parse_predicates > > It was constrained on UseParsePredicate, but this is incorrect, since > all parse predicates are added in that function. > - Improve description of UseLoopPredicate argument Looks good to me now, thanks for the updates! ------------- Marked as reviewed by epeter (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/24248#pullrequestreview-2731884935 From chagedorn at openjdk.org Tue Apr 1 08:30:34 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 1 Apr 2025 08:30:34 GMT Subject: RFR: 8347449: C2: UseLoopPredicate off should also turn UseProfiledLoopPredicate off [v6] In-Reply-To: References:

Message-ID: On Fri, 28 Mar 2025 09:09:59 GMT, Manuel Hässig wrote: >> # Issue Summary >> >> When running with `-XX:-UseLoopPredicate` C2 still inserts profiled loop parse predicates, despite those being a form of loop parse predicate. Further, the loop predicate code is not always consistent about when to insert/expect profiled parse predicates. >> >> # Change Summary >> >> Following the rationale that profiled predicates are a subset of loop predicates, this PR disables profiled predicates whenever loop predicates are disabled. They are disabled on the level of arguments. Further, before any checks for whether profiled predicates are enabled, this PR inserts a check that loop predicates are enabled such that the code is consistent in its intention. >> >> Concretely, this PR >> - adds parse predicate nodes to the IR testing framework, >> - turns off `UseProfiledLoopPredicate` if `UseLoopPredicate` is turned off, >> - predicates all checks for `UseProfiledLoopPredicate` on `UseLoopPredicate` first for consistency, >> - adds a regression test. >> >> >> # Testing >> >> The changes passed the following testing: >> - [Github Actions](https://github.com/mhaessig/jdk/actions/runs/14078750038) >> - tier1 through tier3 and Oracle internal testing > > Manuel Hässig has updated the pull request incrementally with two additional commits since the last revision: > > - idealKit::loop: always call add_parse_predicates > > It was constrained on UseParsePredicate, but this is incorrect, since > all parse predicates are added in that function. > - Improve description of UseLoopPredicate argument Marked as reviewed by chagedorn (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/24248#pullrequestreview-2731900566 From chagedorn at openjdk.org Tue Apr 1 08:35:27 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 1 Apr 2025 08:35:27 GMT Subject: RFR: 8341976: C2: use_mem_state != load->find_exact_control(load->in(0)) assert failure [v7] In-Reply-To: References:

<9cGlvzZnXc8B5tNxXSE2Eqi2FDJzP26U7c-yan4ZdCc=.3f6b0821-8b6c-453d-87ee-91205cc6627a@github.com> Message-ID: On Tue, 1 Apr 2025 07:30:47 GMT, Roland Westrelin wrote: >>> So maybe, we could treat that Opaque node the way we do for OpaqueZeroTripGuard and have it constant fold when the backedge is never taken. >> >> Right, that sounds like a good solution. >> >>> So I should revert the change to the IdealLoopTree::dump_head() and the test run with TraceLoopOpts? >> >> Yes, that would be great. We can make a comment in [JDK-8297752](https://bugs.openjdk.org/browse/JDK-8297752) to add `-XX:+TraceLoopOpts` as additional run to this test when we fix it. > > Done. Thanks! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23465#discussion_r2022403189 From chagedorn at openjdk.org Tue Apr 1 08:53:32 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 1 Apr 2025 08:53:32 GMT Subject: RFR: 8349479: C2: when a Type node becomes dead, make CFG path that uses it unreachable [v7] In-Reply-To: <09Q1vDaTXq3VlLU4xxQl_E7wDM2FT7tqR_Bc8ky8RNc=.4e11f2f8-75c3-49a1-b0b3-20eac17c4b39@github.com> References:

<09Q1vDaTXq3VlLU4xxQl_E7wDM2FT7tqR_Bc8ky8RNc=.4e11f2f8-75c3-49a1-b0b3-20eac17c4b39@github.com> Message-ID: On Tue, 1 Apr 2025 08:03:51 GMT, Roland Westrelin wrote: >> src/hotspot/share/opto/node.cpp line 3096: >> >>> 3094: // paths. The dead paths are then replaced by a Halt node. >>> 3095: void TypeNode::make_paths_from_here_dead(PhaseIterGVN* igvn, PhaseIdealLoop* loop, const char* phase_str) { >>> 3096: Unique_Node_List wq; >> >> Should there be a `ResourceMark` here? > > The callers have the `ResourceMark`. This is because it's code I extracted from 8275202: I think it used to not be safe to call `PhaseIdealLoop::register_new_node` from within the `ResourceMark` but I see there were changes in that area (data structures used by `PhaseIdealLoop` no longer allocated in the resource area). So it looks like it could be changed now. I assume that JDK-8275202 also calls this method with a non-null `PhaseIdealLoop` pointer? Now we only pass in null, so the `loop` parameter could be removed. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23468#discussion_r2022418989 From chagedorn at openjdk.org Tue Apr 1 08:53:33 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 1 Apr 2025 08:53:33 GMT Subject: RFR: 8349479: C2: when a Type node becomes dead, make CFG path that uses it unreachable [v7] In-Reply-To: References:

Message-ID: On Tue, 1 Apr 2025 07:06:32 GMT, Emanuel Peter wrote: >> We should extend the functionality of Verify.checkEQ: >> - Allow different NaN encodings to be seen as equal (by default). >> - Compare VectorAPI vectors. >> - Compare Exceptions, and their messages. >> - Compare arbitrary Objects via Reflection. >> >> Note: this is a prerequisite for the Template Library [JDK-8352861](https://bugs.openjdk.org/browse/JDK-8352861) / https://github.com/openjdk/jdk/pull/23418. > > Emanuel Peter has updated the pull request incrementally with two additional commits since the last revision: > > - upate copyright > - Apply suggestions from code review > > Co-authored-by: Christian Hagedorn I'll have a closer look at the code later again :-) ------------- PR Review: https://git.openjdk.org/jdk/pull/24224#pullrequestreview-2732020875 From chagedorn at openjdk.org Tue Apr 1 09:14:43 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 1 Apr 2025 09:14:43 GMT Subject: RFR: 8352869: Verify.checkEQ: extension for NaN, VectorAPI and arbitrary Objects [v5] In-Reply-To: References:

Message-ID: On Tue, 1 Apr 2025 07:02:11 GMT, Emanuel Peter wrote: >> test/hotspot/jtreg/compiler/lib/verify/Verify.java line 209: >> >>> 207: print(a, b, field, aParent, bParent); >>> 208: throw new VerifyException("Object type not supported: " + ca.getName() + " -- did you mean to 'enableCheckWithArbitraryClasses'?"); >>> 209: } >> >> What's the reason behind throwing instead of just comparing two arbitrary objects by default? If a user calls `Verify.checkEQ()` and sees this exception, I would guess they then just pass the additional option and we have the same result. But maybe I'm missing something. > > Good question. I think my reasoning was that comparing arbitrary classes requires reflection. And that is rather slow. So by default it would be good if that feature is not enabled, so the user tries to avoid it, and is aware when they enable it explicitly. > > But if you think that is not useful, I can remove the feature. > > @chhagedorn what do you think? I think the intention to let the user double check is good. I'm not sure though if the user is really aware of the potential slowdown without diving deeper into the implementation. All they know is that `checkEQ` somehow does not support some of their objects but there is a simple workaround to still use it. So, the real question is: How many users will then consider doing something different when facing this exception and not just enable it anyway? I guess enabling is probably the most natural thing to do. Given that, I would probably just drop this. It would also simplify the API usage in the following way: We would only have checks with all NaNs being equal and checks comparing raw bits (i.e. NaNs not equal). Then you could offer `checkEQ()` (default) and `checkRawBitsEQ()` or something like that. Then users do not need to worry about creating and passing in an `Options`. What do you think about these suggestions? 
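To make the two comparison modes a bit more concrete, here is a minimal Java sketch. It assumes the "all NaNs equal" mode behaves like a comparison of `Float.floatToIntBits` results and the raw mode like a comparison of `Float.floatToRawIntBits` results. The class and helper names are hypothetical and are not part of the `Verify` API.

```java
public class NanCompareSketch {
    // Hypothetical helper: floatToIntBits collapses every NaN encoding to the
    // canonical NaN, so any two NaNs compare equal here.
    static boolean equalTreatingNaNsAsEqual(float a, float b) {
        return Float.floatToIntBits(a) == Float.floatToIntBits(b);
    }

    // Hypothetical helper: compares the exact bit patterns, so NaNs that carry
    // different payloads are reported as different.
    static boolean equalRawBits(float a, float b) {
        return Float.floatToRawIntBits(a) == Float.floatToRawIntBits(b);
    }

    public static void main(String[] args) {
        float nanA = Float.intBitsToFloat(0x7fc00000); // canonical quiet NaN
        float nanB = Float.intBitsToFloat(0x7fc00001); // quiet NaN, different payload
        System.out.println(equalTreatingNaNsAsEqual(nanA, nanB));
        System.out.println(equalRawBits(nanA, nanB));
    }
}
```

Whether a NaN payload survives a round trip through a `float` can depend on the platform, so the second printed value should be read as illustrative only.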
What we could do either way at the `checkEQ()` API method: Describe the potential slow down with reflection when not using certain classes. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24224#discussion_r2022461294 From chagedorn at openjdk.org Tue Apr 1 10:09:15 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 1 Apr 2025 10:09:15 GMT Subject: RFR: 8350563: C2 compilation fails because PhaseCCP does not reach a fixpoint [v7] In-Reply-To: References:

Message-ID: On Mon, 31 Mar 2025 16:08:57 GMT, Liam Miller-Cushon wrote: >> Hello, please consider this fix for [JDK-8350563](https://bugs.openjdk.org/browse/JDK-8350563) contributed by my colleague Matthias Ernst. >> >> https://github.com/openjdk/jdk/pull/22856 introduced a new `Value()` optimization for the pattern `AndIL(Con, Mask)`. >> This optimization can look through CastNodes, and therefore requires additional logic in CCP to push these >> transitive uses to the worklist. >> >> The optimization is closely related to analogous optimizations for SHIFT nodes, and we also extend the existing logic for >> CCP worklist handling: the current logic is "if the shift input to a SHIFT node changes, push indirect AND node uses to the CCP worklist". >> We extend it by adding "if the (new) type of a node is an IntegerType that `is_con, ...` to the predicate. > > Liam Miller-Cushon has updated the pull request incrementally with one additional commit since the last revision: > > Explicitly check for OP_Con instead of TypeInteger::is_con. > > 322 Phi === 303 119 255 [[ 399 388 351 751 366 377 ]] #int:-256..127 !jvms: Integer::parseInt @ bci:151 (line 625) > > While this Phi dumps as "#int:-256..127", `phase->type(expr)` returns a type that is_con -256. Thanks Matthias for having a look at the issue and proposing a fix! While this fix seems to work, I think we should address it slightly differently with an explicit bailout, though. Let's step back a bit: CCP first sets all types to top and then tries to widen them (i.e. an optimistic approach) while IGVN does the opposite: We start by setting all types to bottom and then try to narrow them (i.e. a pessimistic approach). The assert we've faced in CCP complains that we tried to narrow some type again which is against the rules of CCP - we can only widen types. Now when CCP runs, we start with every type of every node at top. 
When visiting `AndI` at some point, we see what you reported above: > What I observe for the Integer.parseInt reproducer is that expr dumps as a phi node with type #int:-256...127, but phase->type(expr) returns a type that is_con() with value -256. That is perfectly fine. What happened here is that only one input of the phi with type `#int:-256` is non-top. The other inputs are still top (i.e. not processed in CCP, yet). Therefore, the phi's type is set to `#int:-256`. Note that the `TypeNode::_type` field of the phi is still set to the type we had before CCP, i.e. ` #int:-256...127` . In CCP, we use `PhaseValues::_types` which are set to top in the beginning and we leave `TypeNode::_type` unchanged during the analysis. As a consequence this can happen when having a phi and only looking at the currently tracked CCP types: > In consequence, the AND(phi-node, mask) gets optimized to zero. Let's look at the output of the failure: 304 ConI === 0 [[ 506 ]] #int:255 996 CastII === 461 453 [[ 557 546 535 524 1034 506 ]] #int:-256..127 extra types: {0:int:-256} strong dependency !orig=[478] !jvms: Integer::parseInt @ bci:144 (line 550) 506 AndI === _ 996 304 [[ 507 ]] !jvms: Integer::parseInt @ bci:170 (line 552) told = int:0 tnew = top it looks like we first optimized `AndI` to zero (i.e. `told`) and then set it to top again in a later `Value()` call in CCP (i.e. `tnew`). This is a violation of the rules for CCP. When we suddenly see top again, it suggests that we prematurely applied an optimization while one of the involved inputs was actually still top. This looks wrong and we should have waited until all the involved inputs are non-top. When looking at the code, we check that `mask` is an integer type and thus non-top here: https://github.com/openjdk/jdk/blob/f25f701652900d02858c905f4cd0bb43208c13d5/src/hotspot/share/opto/mulnode.cpp#L2255-L2260 But it looks like we miss that for `expr` when it is a cast node (which is `996 CastII` in the failing test). 
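As a rough illustration of the phi behaviour described above, here is a minimal Java sketch of an optimistic meet over a toy lattice. This is not HotSpot code: `CcpPhiSketch`, `IntType` and `phi` are hypothetical names, and the second input range `-128..127` is an assumed value picked so that the final union matches the dumped `-256..127`.

```java
import java.util.List;

public class CcpPhiSketch {
    // Toy lattice element: TOP (nothing known yet), a constant, or a range.
    record IntType(boolean top, int lo, int hi) {
        static final IntType TOP = new IntType(true, 0, 0);

        static IntType con(int c) { return new IntType(false, c, c); }

        static IntType range(int lo, int hi) { return new IntType(false, lo, hi); }

        boolean isCon() { return !top && lo == hi; }

        // Meet: TOP is the identity element, otherwise take the union of both ranges.
        IntType meet(IntType other) {
            if (top) return other;
            if (other.top) return this;
            return range(Math.min(lo, other.lo), Math.max(hi, other.hi));
        }
    }

    // A phi's tracked type is the meet of the tracked types of its inputs.
    static IntType phi(List<IntType> inputs) {
        IntType result = IntType.TOP;
        for (IntType in : inputs) {
            result = result.meet(in);
        }
        return result;
    }

    public static void main(String[] args) {
        // Early in the analysis only one input has been processed: the phi looks constant.
        IntType early = phi(List.of(IntType.con(-256), IntType.TOP));
        // Once the other input is processed, the type widens to a range.
        IntType later = phi(List.of(IntType.con(-256), IntType.range(-128, 127)));
        System.out.println(early.isCon());
        System.out.println(later.isCon());
    }
}
```

The point of the sketch is only that a type which is constant at one moment can legitimately widen later, so anything derived from it has to be recomputed and may only move in the same direction.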
We pass `expr` to `AndIL_min_trailing_zeros()` and then uncast it and only then check if it is a proper integer type: https://github.com/openjdk/jdk/blob/f25f701652900d02858c905f4cd0bb43208c13d5/src/hotspot/share/opto/mulnode.cpp#L2180-L2185 So, if the type of `996 CastII` in CCP is still top, we skip it with `uncast()` and then check the phi above which has first the constant type `#int:-256`. We can apply the optimization to return type zero. When later updating the type of the phi to `#int:-256...127`, we can no longer apply the optimization and fall back to `MulNode::Value()` where we return top because the input `996 CastII` is still top: https://github.com/openjdk/jdk/blob/f25f701652900d02858c905f4cd0bb43208c13d5/src/hotspot/share/opto/mulnode.cpp#L185-L187 We find top which is narrower than type zero and we fail with the assert. Long story short, you should check for `expr` being top before uncasting it. This was hard to see and is only a problem in CCP. I suggest to add the small reproducer as additional test case. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23871#issuecomment-2768852982 From thartmann at openjdk.org Tue Apr 1 11:42:17 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 1 Apr 2025 11:42:17 GMT Subject: RFR: 8352893: C2: OrL/INode::add_ring optimize (x | -1) to -1 [v3] In-Reply-To: References:

Message-ID: On Mon, 31 Mar 2025 14:31:43 GMT, Manuel Hässig wrote: >> # Issue Summary >> >> The `add_ring()` implementations of `OrINode` and `OrLNode` are missing the optimization that an OR with a value where all bits are ones (since we have signed integers in this case `~0 == -1`) will always yield all ones. >> >> # Changes >> >> This PR makes the following straightforward changes: >> - `Or(I|L)Node::add_ring()` returns `-1` if one of the two inputs is `-1`. >> - Add `Or(I|L)` nodes to the IR framework. >> - Add a regression IR test for the implemented optimization. >> >> # Testing >> >> - [Github Actions](https://github.com/mhaessig/jdk/actions/runs/14110978686) >> - Ran tier1 through tier3 and Oracle internal testing > > Manuel Hässig has updated the pull request incrementally with one additional commit since the last revision: > > Remove loop in test and instead use random values Looks good to me too. ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24289#pullrequestreview-2732401043 From roland at openjdk.org Tue Apr 1 12:50:14 2025 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 1 Apr 2025 12:50:14 GMT Subject: RFR: 8352418: Add verification code to check that the associated loop nodes of useless Template Assertion Predicates are dead In-Reply-To: References: Message-ID: On Mon, 31 Mar 2025 12:28:59 GMT, Christian Hagedorn wrote: > As already suggested in https://github.com/openjdk/jdk/pull/23823, I want to do the following additional verification: > > After `eliminate_useless_predicates()` all now useless `OpaqueTemplateAssertionPredicate` nodes should not have any references to `CountedLoop` nodes that are still in the graph (otherwise, they would have been marked useful). This verification did not work reliably without the full Assertion Predicates fix [JDK-8350577](https://bugs.openjdk.org/browse/JDK-8350577). 
Since JDK-8350577 is now integrated, I propose to add this additional verification code. > > Thanks, > Christian Looks good to me. ------------- Marked as reviewed by roland (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24326#pullrequestreview-2732570055 From chagedorn at openjdk.org Tue Apr 1 12:50:14 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 1 Apr 2025 12:50:14 GMT Subject: RFR: 8352418: Add verification code to check that the associated loop nodes of useless Template Assertion Predicates are dead In-Reply-To: References: Message-ID: <7r2XMglIgMjvCYaPfESV79PvYsGTo8vojzPadFN-Hu4=.4d2e576e-fb9b-4dd0-add4-a60248fa03f5@github.com> On Mon, 31 Mar 2025 12:28:59 GMT, Christian Hagedorn wrote: > As already suggested in https://github.com/openjdk/jdk/pull/23823, I want to do the following additional verification: > > After `eliminate_useless_predicates()` all now useless `OpaqueTemplateAssertionPredicate` nodes should not have any references to `CountedLoop` nodes that are still in the graph (otherwise, they would have been marked useful). This verification did not work reliably without the full Assertion Predicates fix [JDK-8350577](https://bugs.openjdk.org/browse/JDK-8350577). Since JDK-8350577 is now integrated, I propose to add this additional verification code. > > Thanks, > Christian Thanks Roland for your review! ------------- PR Comment: https://git.openjdk.org/jdk/pull/24326#issuecomment-2769245625 From mchevalier at openjdk.org Tue Apr 1 13:04:23 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Tue, 1 Apr 2025 13:04:23 GMT Subject: RFR: 8348887: Create IR framework test for JDK-8347997 In-Reply-To: References:

Message-ID: On Tue, 1 Apr 2025 04:52:53 GMT, Galder Zamarreño wrote: >> As the ticket says: >>> Create IR framework test which checks that allocations are eliminated in the regression test included in [JDK-8347997](https://bugs.openjdk.org/browse/JDK-8347997) fix. >> >> So here it is! We can see that in case of inlining, indeed, no allocation happens. The second part is some sanity check to emphasize the difference: of course, there is an allocation without inlining. The benefit of this second part is arguable. From my point of view, it's mostly to point out the difference to a future reader. But yes, there is nothing very surprising. >> >> Thanks, >> Marc > > test/hotspot/jtreg/compiler/c2/irTests/TestContinuationPinningAndEA.java line 118: > >> 116: >> 117: @DontInline >> 118: public CrashesNoInline() throws Throwable { > > It's probably my own ignorance, but just in case others are in the same boat, why does this crash? Could you add a brief javadoc for future readers? Same with other Crashes cases. It's rather bad (uninspired) naming. I based this test on the test introduced by [JDK-8347997](https://bugs.openjdk.org/browse/JDK-8347997), which (I suspect) is based on the reproducer mentioned in JBS. There are 2 cases: one made EA crash, the other made it fail (not detecting the non-escaping case, as far as I understand). From Vladimir's comment on PR 23284, it used to crash because of a corrupted memory graph. Honestly, I'm not quite clear on that. There is already a test (from said ticket and PR) making sure it doesn't crash. The point of the test I'm adding is to check that the allocation is gone (thanks to EA). Maybe the best option is to rename the cases "Crashes" and "FailEA": those names made sense in the context of the original bug, but they are not very useful names for the future. But I'm not sure what would be fitting. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24328#discussion_r2022810119 From dfenacci at openjdk.org Tue Apr 1 13:19:26 2025 From: dfenacci at openjdk.org (Damon Fenacci) Date: Tue, 1 Apr 2025 13:19:26 GMT Subject: RFR: 8341976: C2: use_mem_state != load->find_exact_control(load->in(0)) assert failure [v7] In-Reply-To: References:

Message-ID: <-2HR8vsW5xGAbW5EviewkowFNsq-HH51yjwWA9uLC5g=.6c02442c-2e34-41e8-a808-10ab3c52eefc@github.com> On Tue, 1 Apr 2025 07:31:12 GMT, Roland Westrelin wrote: >> The `arraycopy` writes to a non escaping array so its `ArrayCopy` node >> is marked as having a narrow memory effect. One of the loads from the >> destination after the copy is transformed into a load from the source >> array (the rationale being that if there's no load from the >> destination of the copy, the `arraycopy` is not needed). The load from >> the source has the input memory state of the `ArrayCopy` as memory >> input. That load is then sunk out of the loop and its control is >> updated to be after the `ArrayCopy`. That's legal because the >> `ArrayCopy` only has a narrow memory effect and can't modify the >> source. The `ArrayCopy` can't be eliminated and is expanded. In the >> process, a `MemBar` that has a wide memory effect is added. The load >> from the source has control after the membar but memory state before >> and because the membar has a wide memory effect, the load is anti >> dependent on the membar: the graph is broken (the load can't be pinned >> after the membar and anti dependent on it). >> >> In short, the problem is that the graph is transformed under the >> assumption that the `ArrayCopy` has a narrow effect but the >> `ArrayCopy` is expanded to a subgraph that has a wide memory >> effect. The fix I propose is to not insert a membar with a wide memory >> effect. We still need a membar when the destination is non escaping >> because the expanded `ArrayCopy`, if it writes to a tightly allocated >> array, writes to raw memory and not to the destination memory slice. > > Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains 16 additional commits since the last revision: > > - review > - Merge branch 'master' into JDK-8341976 > - review > - review > - Merge branch 'master' into JDK-8341976 > - -XX:+TraceLoopOpts fix > - review > - more > - Merge branch 'master' into JDK-8341976 > - more > - ... and 6 more: https://git.openjdk.org/jdk/compare/28e6ceb4...9b21648d Looks good to me. Thanks @rwestrel. ------------- Marked as reviewed by dfenacci (Committer). PR Review: https://git.openjdk.org/jdk/pull/23465#pullrequestreview-2732658616 From dlunden at openjdk.org Tue Apr 1 14:19:10 2025 From: dlunden at openjdk.org (Daniel Lundén) Date: Tue, 1 Apr 2025 14:19:10 GMT Subject: RFR: 8325467: Support methods with many arguments in C2 [v12] In-Reply-To: References: Message-ID: > If a method has a large number of parameters, we currently bail out from C2 compilation. > > ### Changeset > > Allowing C2 compilation of methods with a large number of parameters requires fundamental changes to the register mask data structure, used in many places in C2. In particular, register masks currently have a statically determined size and cannot represent arbitrary numbers of stack slots. This is needed if we want to compile methods with arbitrary numbers of parameters. Register mask operations are present in performance-sensitive parts of C2, which further complicates changes. > > Changes: > - Add functionality to dynamically grow/extend register masks. I experimented with a number of design choices to achieve this. To keep the common case (normal number of method parameters) quick and also to avoid more intrusive changes to the current `RegMask` interface, I decided to leave the "base" statically allocated memory for masks unchanged and only use dynamically allocated memory in the rare cases where it is needed. 
> - Generalize the "chunk"-logic from `PhaseChaitin::Select()` to allow arbitrary-sized chunks, and also move most of the logic into register mask methods to separate concerns and to make the `PhaseChaitin::Select()` code more readable. > - Remove all `can_represent` checks and bailouts. > - Performance tuning. A particularly important change is the early-exit optimization in `RegMask::overlap`, used in the performance-sensitive method `PhaseChaitin::interfere_with_live`. > - Add a new test case `TestManyMethodArguments.java` and extend an old test `TestNestedSynchronize.java`. > > ### Testing > > - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/10178060450) > - `tier1` to `tier4` (and additional Oracle-internal testing) on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64. > - Standard performance benchmarking. No observed conclusive overall performance degradation/improvement. > - Specific benchmarking of C2 compilation time. The changes increase C2 compilation time by, approximately and on average, 1% for methods that could also be compiled before this changeset (see the figure below). The reason for the degradation is further checks required in performance-sensitive code (in particular `PhaseChaitin::remove_bound_register_from_interfering_live_ranges`). I have tried optimizing in various ways, but changes I found that lead to improvement also lead to less readable code (and are, in my opinion, not worth it). > > ![c2-regression](https:/... 
Daniel Lundén has updated the pull request incrementally with two additional commits since the last revision: - Formatting updates - Add register mask fuzzer test ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20404/files - new: https://git.openjdk.org/jdk/pull/20404/files/fbfddb29..5be718e8 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20404&range=11 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20404&range=10-11 Stats: 324 lines in 2 files changed: 324 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/20404.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20404/head:pull/20404 PR: https://git.openjdk.org/jdk/pull/20404 From dlunden at openjdk.org Tue Apr 1 14:19:11 2025 From: dlunden at openjdk.org (Daniel Lundén) Date: Tue, 1 Apr 2025 14:19:11 GMT Subject: RFR: 8325467: Support methods with many arguments in C2 [v11] In-Reply-To: <0Yf6qZwnLz7oAtSFscDwHifQAmaPuHzeSrpkqMVchDU=.c7a5e8af-9390-414b-850c-609110668eac@github.com> References: <0Yf6qZwnLz7oAtSFscDwHifQAmaPuHzeSrpkqMVchDU=.c7a5e8af-9390-414b-850c-609110668eac@github.com> Message-ID: On Mon, 24 Mar 2025 15:33:34 GMT, Daniel Lundén wrote: >> If a method has a large number of parameters, we currently bail out from C2 compilation. >> >> ### Changeset >> >> Allowing C2 compilation of methods with a large number of parameters requires fundamental changes to the register mask data structure, used in many places in C2. In particular, register masks currently have a statically determined size and cannot represent arbitrary numbers of stack slots. This is needed if we want to compile methods with arbitrary numbers of parameters. Register mask operations are present in performance-sensitive parts of C2, which further complicates changes. >> >> Changes: >> - Add functionality to dynamically grow/extend register masks. I experimented with a number of design choices to achieve this. 
To keep the common case (normal number of method parameters) quick and also to avoid more intrusive changes to the current `RegMask` interface, I decided to leave the "base" statically allocated memory for masks unchanged and only use dynamically allocated memory in the rare cases where it is needed. >> - Generalize the "chunk"-logic from `PhaseChaitin::Select()` to allow arbitrary-sized chunks, and also move most of the logic into register mask methods to separate concerns and to make the `PhaseChaitin::Select()` code more readable. >> - Remove all `can_represent` checks and bailouts. >> - Performance tuning. A particularly important change is the early-exit optimization in `RegMask::overlap`, used in the performance-sensitive method `PhaseChaitin::interfere_with_live`. >> - Add a new test case `TestManyMethodArguments.java` and extend an old test `TestNestedSynchronize.java`. >> >> ### Testing >> >> - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/10178060450) >> - `tier1` to `tier4` (and additional Oracle-internal testing) on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64. >> - Standard performance benchmarking. No observed conclusive overall performance degradation/improvement. >> - Specific benchmarking of C2 compilation time. The changes increase C2 compilation time by, approximately and on average, 1% for methods that could also be compiled before this changeset (see the figure below). The reason for the degradation is further checks required in performance-sensitive code (in particular `PhaseChaitin::remove_bound_register_from_interfering_live_ranges`). I have tried optimizing in various ways, but changes I found that lead to improvement also lead to less readable code (and are, in my opinion, no... 
> > Daniel Lundén has updated the pull request incrementally with one additional commit since the last revision: > > Extend example with offset register mask > As we discussed offline, the test coverage of register mask operations with extended dynamic parts, non-zero offsets, etc. is fairly low (basically limited to the new JTReg tests included in this changeset). To increase coverage, I have extended `test_regmask.cpp` with tests that perform random operations on a register mask and on a reference bit set and check that the result is equivalent on both data structures. Here is the extension: [4ee703f](https://github.com/openjdk/jdk/commit/4ee703f1ab73f8f43d4603d7fa88dcc8f4950ec0). I ran the random tests a few times on different platforms and could not find any failure, which gives good confidence in the correctness of the register mask operation changes. I also tested the effectiveness of the tests themselves by injecting a few failures in the register mask implementation and confirming their detection. Feel free to include the test extensions in this changeset (you might want to go through the code and clean it up a bit first, though, e.g. for naming consistency). I've now reviewed the register mask fuzzer tests and found no errors. Looks good! I applied some code formatting, though. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20404#issuecomment-2769520020 From dlunden at openjdk.org Tue Apr 1 14:36:38 2025 From: dlunden at openjdk.org (Daniel Lundén) Date: Tue, 1 Apr 2025 14:36:38 GMT Subject: RFR: 8282053: IGV: refine schedule approximation In-Reply-To: References: Message-ID: <0dg9XeqluKkZEUgPNJEzwuCUHiG36RaZvr9GggckWQ4=.1efe129b-865f-41c0-92ac-27b91f055f5a@github.com> On Tue, 1 Apr 2025 07:23:04 GMT, Daniel Skantz wrote: > This patch refines the schedule approximation in IGV by 1) placing parm. 
and projection nodes in the same block as their predecessors, and 2) disallows erroneously considering machine nodes such as prefetchAlloc and rep_stos as CFG nodes. > > The reader may refer to the corresponding JBS issue where graphs sampled before and after the change are attached. > > Testing: T1-T3 with no failures. Opened graphs before and after the change and saw no obvious problems. Opened a large number of graphs in CFG view and observed no unexpected IGV warnings, errors or assert failures. Good CFG scheduling approximation improvement! Just one style suggestion. src/utils/IdealGraphVisualizer/ServerCompiler/src/main/java/com/sun/hotspot/igv/servercompiler/ServerCompilerScheduler.java line 800: > 798: n.isCFG = true; > 799: } else if (n.inputNode.getProperties().get("type").equals("bottom") > 800: && n.preds.size() > 0 && Suggestion: } else if (n.inputNode.getProperties().get("type").equals("bottom") && n.preds.size() > 0 && For consistent placement of `&&` (already a problem before this changeset, but might as well fix now) ------------- Marked as reviewed by dlunden (Committer). PR Review: https://git.openjdk.org/jdk/pull/24350#pullrequestreview-2732918158 PR Review Comment: https://git.openjdk.org/jdk/pull/24350#discussion_r2022983828 From dskantz at openjdk.org Tue Apr 1 14:42:47 2025 From: dskantz at openjdk.org (Daniel Skantz) Date: Tue, 1 Apr 2025 14:42:47 GMT Subject: RFR: 8282053: IGV: refine schedule approximation [v2] In-Reply-To: References: Message-ID: <0SXZ0k-28IdpWjuhtK4dSJ9ybHE58Oq56zT_sdqeQpc=.0cf9319f-f64d-4e8f-82e9-f654464bc775@github.com> > This patch refines the schedule approximation in IGV by 1) placing parm. and projection nodes in the same block as their predecessors, and 2) disallows erroneously considering machine nodes such as prefetchAlloc and rep_stos as CFG nodes. > > The reader may refer to the corresponding JBS issue where graphs sampled before and after the change are attached. > > Testing: T1-T3 with no failures. 
Opened graphs before and after the change and saw no obvious problems. Opened a large number of graphs in CFG view and observed no unexpected IGV warnings, errors or assert failures. Daniel Skantz has updated the pull request incrementally with one additional commit since the last revision: Update src/utils/IdealGraphVisualizer/ServerCompiler/src/main/java/com/sun/hotspot/igv/servercompiler/ServerCompilerScheduler.java Co-authored-by: Daniel Lundén ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24350/files - new: https://git.openjdk.org/jdk/pull/24350/files/52667ad5..57ad6dc8 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24350&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24350&range=00-01 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/24350.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24350/head:pull/24350 PR: https://git.openjdk.org/jdk/pull/24350 From dlunden at openjdk.org Tue Apr 1 14:46:25 2025 From: dlunden at openjdk.org (Daniel Lundén) Date: Tue, 1 Apr 2025 14:46:25 GMT Subject: RFR: 8282053: IGV: refine schedule approximation [v2] In-Reply-To: <0SXZ0k-28IdpWjuhtK4dSJ9ybHE58Oq56zT_sdqeQpc=.0cf9319f-f64d-4e8f-82e9-f654464bc775@github.com> References: <0SXZ0k-28IdpWjuhtK4dSJ9ybHE58Oq56zT_sdqeQpc=.0cf9319f-f64d-4e8f-82e9-f654464bc775@github.com> Message-ID: On Tue, 1 Apr 2025 14:42:47 GMT, Daniel Skantz wrote: >> This patch refines the schedule approximation in IGV by 1) placing parm. and projection nodes in the same block as their predecessors, and 2) disallows erroneously considering machine nodes such as prefetchAlloc and rep_stos as CFG nodes. >> >> The reader may refer to the corresponding JBS issue where graphs sampled before and after the change are attached. >> >> Testing: T1-T3 with no failures. Opened graphs before and after the change and saw no obvious problems. 
Opened a large number of graphs in CFG view and observed no unexpected IGV warnings, errors or assert failures. > > Daniel Skantz has updated the pull request incrementally with one additional commit since the last revision: > > Update src/utils/IdealGraphVisualizer/ServerCompiler/src/main/java/com/sun/hotspot/igv/servercompiler/ServerCompilerScheduler.java > > Co-authored-by: Daniel Lundén Marked as reviewed by dlunden (Committer). ------------- PR Review: https://git.openjdk.org/jdk/pull/24350#pullrequestreview-2732961006 From chagedorn at openjdk.org Tue Apr 1 15:41:48 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 1 Apr 2025 15:41:48 GMT Subject: RFR: 8334046: Set different values for CompLevel_any and CompLevel_all [v2] In-Reply-To: References:

Message-ID: <8atBFgfznyYBW1gmJE9Brk9yoiWYXL1ts6Wr5t_KqZA=.d25be79a-3730-449c-9552-7d42ffb68d50@github.com> On Mon, 31 Mar 2025 03:43:03 GMT, Cesar Soares Lucas wrote: >> Please review this trivial patch to set different values for CompLevel_any and CompLevel_all. >> Setting different values for these fields makes the implementation of [this other issue](https://bugs.openjdk.org/browse/JDK-8313713) much cleaner/easier. >> Tested on OSX/Linux Aarch64/x86_64 with JTREG. > > Cesar Soares Lucas has updated the pull request incrementally with one additional commit since the last revision: > > Fix WhiteBox constants. Just to let you know, Vladimir is out this week. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24298#issuecomment-2769791125 From chagedorn at openjdk.org Tue Apr 1 15:44:36 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 1 Apr 2025 15:44:36 GMT Subject: RFR: 8350852: Implement JMH benchmark for sparse CodeCache [v3] In-Reply-To: References:

Message-ID: On Fri, 28 Mar 2025 20:20:41 GMT, Evgeny Astigeevich wrote: >> This benchmark is used to check performance impact of the code cache being sparse. >> >> We use C2 compiler to compile the same Java method multiple times to produce as much code as needed. The Java method is not trivial. It adds two 40 digit positive integers. These compiled methods represent the active methods in the code cache. We split active methods into groups. We put a group into a fixed size code region. We make a code region aligned by its size. CodeCache becomes sparse when code regions are not fully filled. We measure the time taken to call all active methods. >> >> Results: code region size 2M (2097152) bytes >> - Intel Xeon Platinum 8259CL
>>
>> |activeMethodCount |groupCount |Methods/Group |Score |Error |Units |Diff |
>> |--- |--- |--- |--- |--- |--- |--- |
>> |128 |1 |128 |19.577 |0.619 |us/op | |
>> |128 |32 |4 |22.968 |0.314 |us/op |17.30% |
>> |128 |48 |3 |22.245 |0.388 |us/op |13.60% |
>> |128 |64 |2 |23.874 |0.84 |us/op |21.90% |
>> |128 |80 |2 |23.786 |0.231 |us/op |21.50% |
>> |128 |96 |1 |26.224 |1.16 |us/op |34% |
>> |128 |112 |1 |27.028 |0.461 |us/op |38.10% |
>> |256 |1 |256 |47.43 |1.146 |us/op | |
>> |256 |32 |8 |63.962 |1.671 |us/op |34.90% |
>> |256 |48 |5 |63.396 |0.247 |us/op |33.70% |
>> |256 |64 |4 |66.604 |2.286 |us/op |40.40% |
>> |256 |80 |3 |59.746 |1.273 |us/op |26% |
>> |256 |96 |3 |63.836 |1.034 |us/op |34.60% |
>> |256 |112 |2 |63.538 |1.814 |us/op |34% |
>> |512 |1 |512 |172.731 |4.409 |us/op | |
>> |512 |32 |16 |206.772 |6.229 |us/op |19.70% |
>> |512 |48 |11 |215.275 |2.228 |us/op |24.60% |
>> |512 |64 |8 |212.962 |2.028 |us/op |23.30% |
>> |512 |80 |6 |201.335 |12.519 |us/op |16.60% |
>> |512 |96 |5 |198.133 |6.502 |us/op |14.70% |
>> |512 |112 |5 |193.739 |3.812 |us/op |12.20% |
>> |768 |1 |768 |325.154 |5.048 |us/op | |
>> |768 |32 |24 |346.298 |20.196 |us/op |6.50% |
>> |768 |48 |16 |350.746 |2.931 |us/op |7.90% |
>> |768 |64 |12 |339.445 |7.927 |us/op |4.40% |
>> |768 |80 |10 |347.408 |7.355 |us/op |6.80% |
>> |768 |96 |8 |340.983 |3.578 |us/op |4.90% |
>> |768 |112 |7 |353.949 |2.98 |us/op |8.90% |
>> |1024 |1 |1024 |368.352 |5.961 |us/op | |
>> |1024 |32 |32 |463.822 |6.274 |us/op |25.90% |
>> |1024 |48 |21 |457.674 |15.144 |us/op |24.20% |
>> |1024 |64 |16 |477.694 |0.986 |us/op |29.70% |
>> |1024 |80 |13 |484.901 |32.601 |us/op |31.60% |
>> |1024 |96 |11 |480.8 |27.088 |us/op |30.50% |
>> |1024 |112 |9 |474.416 |10.053 |us/op |28.80% |
>>
>> - AArch64 Neoverse N1 >> >> |activeMethodCount |groupCount |Methods/Group |Score |Error |Units |Diff |... > > Evgeny Astigeevich has updated the pull request incrementally with two additional commits since the last revision: > > - Document assumptions about code placement in CodeCache > - Address bulasevich comment: too many parameters values Just to let you know, Vladimir is out this week. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23831#issuecomment-2769803651 From rcastanedalo at openjdk.org Tue Apr 1 15:59:20 2025 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Tue, 1 Apr 2025 15:59:20 GMT Subject: RFR: 8282053: IGV: refine schedule approximation [v2] In-Reply-To: <0SXZ0k-28IdpWjuhtK4dSJ9ybHE58Oq56zT_sdqeQpc=.0cf9319f-f64d-4e8f-82e9-f654464bc775@github.com> References: <0SXZ0k-28IdpWjuhtK4dSJ9ybHE58Oq56zT_sdqeQpc=.0cf9319f-f64d-4e8f-82e9-f654464bc775@github.com> Message-ID: On Tue, 1 Apr 2025 14:42:47 GMT, Daniel Skantz wrote: >> This patch refines the schedule approximation in IGV by 1) placing parm. and projection nodes in the same block as their predecessors, and 2) disallows erroneously considering machine nodes such as prefetchAlloc and rep_stos as CFG nodes. >> >> The reader may refer to the corresponding JBS issue where graphs sampled before and after the change are attached. >> >> Testing: T1-T3 with no failures. Opened graphs before and after the change and saw no obvious problems. 
Opened a large number of graphs in CFG view and observed no unexpected IGV warnings, errors or assert failures. > > Daniel Skantz has updated the pull request incrementally with one additional commit since the last revision: > > Update src/utils/IdealGraphVisualizer/ServerCompiler/src/main/java/com/sun/hotspot/igv/servercompiler/ServerCompilerScheduler.java > > Co-authored-by: Daniel Lundén Marked as reviewed by rcastanedalo (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/24350#pullrequestreview-2733217668 From epeter at openjdk.org Tue Apr 1 16:06:22 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 1 Apr 2025 16:06:22 GMT Subject: RFR: 8252473: [TESTBUG] compiler tests fail with minimal VM: Unrecognized VM option In-Reply-To: References:

Message-ID: On Wed, 26 Mar 2025 17:40:35 GMT, Zdenek Zambersky wrote: >> This adds `@requires vm.compiler2.enabled` to tests, which fail with `Unrecognized VM option` on client VM. > > Attached file which shows unrecognized VM options for individual tests. > [unrecognized-options.txt](https://github.com/user-attachments/files/19472912/unrecognized-options.txt) @zzambers Generally we want to get away from `@requires vm.compiler2.enabled`, because it means tests are only run on C2 and not other compilers. For example if C2 is disabled and we only have C1. Or only interpreter. Or Graal ... Why not just add the flag `-XX:+IgnoreUnrecognizedVMOptions`? That could be a good alternative for most cases, I think. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24262#issuecomment-2769867784 From zzambers at openjdk.org Tue Apr 1 16:16:26 2025 From: zzambers at openjdk.org (Zdenek Zambersky) Date: Tue, 1 Apr 2025 16:16:26 GMT Subject: RFR: 8252473: [TESTBUG] compiler tests fail with minimal VM: Unrecognized VM option In-Reply-To: References:

Message-ID: On Wed, 26 Mar 2025 17:49:33 GMT, Aleksey Shipilev wrote: >> This adds `@requires vm.compiler2.enabled` to tests, which fail with `Unrecognized VM option` on client VM. > > test/hotspot/jtreg/compiler/arraycopy/TestCloneWithStressReflectiveCode.java line 28: > >> 26: * @bug 8284951 >> 27: * @summary Test clone intrinsic with StressReflectiveCode. >> 28: * @requires vm.compiler2.enabled & vm.debug > > Drive-by comment: multiple `@requires` get AND-ed automatically, so you can just drop a new line with `@requires vm.compiler2.enabled`, and it will still work. I used `@requires` on separate line in cases, where resulting line would be too long (or too messy), but I can use separate line everywhere. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24262#discussion_r2023170812 From jbhateja at openjdk.org Tue Apr 1 16:17:22 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 1 Apr 2025 16:17:22 GMT Subject: RFR: 8352585: Add special case handling for Float16.max/min x86 backend [v7] In-Reply-To: References: Message-ID: > This bugfix patch adds the special handling as per x86 AVX512-FP16 ISA specification[1][2] to compute max/min operations with +/-0.0 or NaN operands. > > Special handling leverage the instruction semantic, central idea is to shuffle the operands such that smaller input gets assigned to second operand for min operation or a larger input gets assigned to second operand for max operation, in addition result equals NaN if an unordered comparison detects first input as a NaN value else we return the result of min/max operation. > > Kindly review and share your feedback. 
> > Best Regards, > Jatin > > [1] https://www.felixcloutier.com/x86/vminsh > [2] https://www.felixcloutier.com/x86/vmaxsh Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Review comments resolution ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24169/files - new: https://git.openjdk.org/jdk/pull/24169/files/e2faec77..1713057d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24169&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24169&range=05-06 Stats: 3 lines in 1 file changed: 0 ins; 3 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/24169.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24169/head:pull/24169 PR: https://git.openjdk.org/jdk/pull/24169 From jbhateja at openjdk.org Tue Apr 1 16:17:22 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 1 Apr 2025 16:17:22 GMT Subject: RFR: 8352585: Add special case handling for Float16.max/min x86 backend [v6] In-Reply-To: References:

<4pVsbXILQQgsiSnldLRVf1fziUMF6PrqkEnr81RoFMg=.a79353fd-5dc2-4c64-8958-01cbc0557618@github.com>

Message-ID: On Fri, 28 Mar 2025 22:14:23 GMT, Sandhya Viswanathan wrote: >> Basically assert if one is NaN and other is not. > > On further thought what you have also works. Though we could simplify the assertionCheck method to just one statement: > public static boolean assertionCheck(Float16 actual, Float16 expected) { > return !actual.equals(expected); > } > This is because, the equals method takes care of NaNs. The [equals](https://docs.oracle.com/en/java/javase/24/docs/api/java.base/java/lang/Double.html#equals(java.lang.Object)) uses [representation equivalence](https://docs.oracle.com/en/java/javase/24/docs/api/java.base/java/lang/Double.html#repEquivalence), defining NaN arguments to be equal to each other. DONE ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24169#discussion_r2023172099 From zzambers at openjdk.org Tue Apr 1 16:20:53 2025 From: zzambers at openjdk.org (Zdenek Zambersky) Date: Tue, 1 Apr 2025 16:20:53 GMT Subject: RFR: 8252473: [TESTBUG] compiler tests fail with minimal VM: Unrecognized VM option In-Reply-To: References:

Message-ID: On Wed, 26 Mar 2025 18:49:51 GMT, Vladimir Kozlov wrote: > Can we run some of them with Graal? When no C2 specific flags are used. Unfortunately I don't have experience with Graal. So I don't know how that would work. Does graal implement some C2-only flags? ------------- PR Comment: https://git.openjdk.org/jdk/pull/24262#issuecomment-2769902335 From zzambers at openjdk.org Tue Apr 1 16:28:23 2025 From: zzambers at openjdk.org (Zdenek Zambersky) Date: Tue, 1 Apr 2025 16:28:23 GMT Subject: RFR: 8252473: [TESTBUG] compiler tests fail with minimal VM: Unrecognized VM option In-Reply-To: References:

Message-ID: On Tue, 1 Apr 2025 16:03:40 GMT, Emanuel Peter wrote: > Why not just add the flag `-XX:+IgnoreUnrecognizedVMOptions`? That could be a good alternative for most cases, I think. I saw that approach sometimes used as well. (My little, probably unfounded, concern would be that typos in args could then be silently ignored.) I can change my PR to use `-XX:+IgnoreUnrecognizedVMOptions` instead, if that approach is preferable. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24262#issuecomment-2769917073 From rcastanedalo at openjdk.org Tue Apr 1 16:34:24 2025 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Tue, 1 Apr 2025 16:34:24 GMT Subject: RFR: 8325467: Support methods with many arguments in C2 [v12] In-Reply-To: References:

Message-ID: On Tue, 1 Apr 2025 14:19:10 GMT, Daniel Lundén wrote: >> If a method has a large number of parameters, we currently bail out from C2 compilation. >> >> ### Changeset >> >> Allowing C2 compilation of methods with a large number of parameters requires fundamental changes to the register mask data structure, used in many places in C2. In particular, register masks currently have a statically determined size and cannot represent arbitrary numbers of stack slots. This is needed if we want to compile methods with arbitrary numbers of parameters. Register mask operations are present in performance-sensitive parts of C2, which further complicates changes. >> >> Changes: >> - Add functionality to dynamically grow/extend register masks. I experimented with a number of design choices to achieve this. To keep the common case (normal number of method parameters) quick and also to avoid more intrusive changes to the current `RegMask` interface, I decided to leave the "base" statically allocated memory for masks unchanged and only use dynamically allocated memory in the rare cases where it is needed. >> - Generalize the "chunk"-logic from `PhaseChaitin::Select()` to allow arbitrary-sized chunks, and also move most of the logic into register mask methods to separate concerns and to make the `PhaseChaitin::Select()` code more readable. >> - Remove all `can_represent` checks and bailouts. >> - Performance tuning. A particularly important change is the early-exit optimization in `RegMask::overlap`, used in the performance-sensitive method `PhaseChaitin::interfere_with_live`. >> - Add a new test case `TestManyMethodArguments.java` and extend an old test `TestNestedSynchronize.java`. >> >> ### Testing >> >> - [GitHub Actions](https://github.com/dlunde/jdk/actions/runs/10178060450) >> - `tier1` to `tier4` (and additional Oracle-internal testing) on Windows x64, Linux x64, Linux aarch64, macOS x64, and macOS aarch64. >> - Standard performance benchmarking. 
No observed conclusive overall performance degradation/improvement. >> - Specific benchmarking of C2 compilation time. The changes increase C2 compilation time by, approximately and on average, 1% for methods that could also be compiled before this changeset (see the figure below). The reason for the degradation is further checks required in performance-sensitive code (in particular `PhaseChaitin::remove_bound_register_from_interfering_live_ranges`). I have tried optimizing in various ways, but changes I found that lead to improvement also lead to less readable code (and are, in my opinion, no... > > Daniel Lundén has updated the pull request incrementally with two additional commits since the last revision: > > - Formatting updates > - Add register mask fuzzer test I have gone through the entire changeset now and could not find any obvious functional issue, good job Daniel! src/hotspot/share/opto/chaitin.cpp line 1425: > 1423: // a physical register is found > 1424: if (OptoReg::is_reg(assigned)) { > 1425: assert(!lrg.mask().is_offset(), "sanity"); Suggestion: assert(!lrg.mask().is_offset(), "offset register masks can only contain stack slots"); src/hotspot/share/opto/chaitin.cpp line 1533: > 1531: // hesitation). > 1532: if (OptoReg::is_valid(reg2) && > 1533: OptoReg::is_reg(reg2 - lrg.mask().offset_bits())) { I agree that this was probably an oversight in the original code. For simplicity I suggest to replace the check with just `OptoReg::is_reg(reg2)` as you suggest, explicitly limiting the scope of the alternation heuristic to physical registers. I compared the overall effectiveness of post-allocation copy removal (as summarized by `-XX:+PrintOptoStatistics`) between this changeset and your proposed simplification and I cannot see any significant difference. I really wonder if the entire alternation heuristic really has any positive measurable effect, but that investigation belongs to another RFE. 
src/hotspot/share/opto/chaitin.cpp line 1591: > 1589: // will be a no-op. (Later on, if lrg runs out of possible colors in > 1590: // its chunk, a new chunk of color may be tried, in which case > 1591: // examination of neighbors is started again, at retry_next_chunk.) Doesn't the second part of the comment (`(Later on...)`) still apply after the changes? src/hotspot/share/opto/chaitin.cpp line 1655: > 1653: // Bump register mask up to next stack chunk > 1654: bool success = lrg->rollover(); > 1655: if (!success) { Was this scenario (running out of stack slots representable in `OptoRegPairs`) possible before, or was it prevented by some check removed in the changeset? Did you come across it in some compilation or is it more of a "theoretical" guard? src/hotspot/share/opto/chaitin.cpp line 1658: > 1656: // We should never get here in practice. Bail out in product, > 1657: // assert in debug. > 1658: assert(false, "should not happen"); Suggestion: assert(false, "the next available stack slots should be within the OptoRegPair range"); src/hotspot/share/opto/chaitin.cpp line 1660: > 1658: assert(false, "should not happen"); > 1659: C->record_method_not_compilable( > 1660: "chunk-rollover outside of OptoReg range"); Suggestion: "chunk-rollover outside of OptoRegPair range"); src/hotspot/share/opto/regmask.hpp line 282: > 280: _grow(src._rm_size, false); > 281: memcpy(_RM_UP_EXT, src._RM_UP_EXT, > 282: sizeof(uintptr_t) * (src._rm_size - _RM_SIZE)); This code is not very well covered by current tests, please consider adding some tests to `test_regmask.cpp` to exercise it. src/hotspot/share/opto/regmask.hpp line 293: > 291: _hwm = _rm_max(); > 292: } > 293: _set_range(src._rm_size, value, _rm_size - src._rm_size); This code is not very well covered by current tests, please consider adding some tests to `test_regmask.cpp` to exercise it. 
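Editorial sketch: the two comments above ask for more `test_regmask.cpp` coverage of the dynamically grown part of a register mask, and an earlier message in this thread describes tests that apply random operations to a register mask and to a reference bit set and compare the results. The following is a minimal C++ illustration of that differential-testing idea only. It does not use HotSpot's `RegMask`; `ToyMask`, `kBaseWords` and `run_differential_test` are hypothetical names invented here, and the fixed base array plus heap extension merely mimics the "static base, dynamic extension" design described in the PR text.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <set>
#include <vector>

// Toy growable bit mask (hypothetical, NOT HotSpot's RegMask): a fixed
// "base" array serves the common case, and a heap-allocated extension is
// used only for bit indices beyond the base.
class ToyMask {
  static constexpr std::size_t kBaseWords = 2;  // statically sized part
  static constexpr std::size_t kWordBits = 64;
  std::uint64_t base_[kBaseWords] = {0, 0};
  std::vector<std::uint64_t> ext_;              // dynamic extension

  std::size_t words() const { return kBaseWords + ext_.size(); }
  std::uint64_t* word(std::size_t w) {
    assert(w < words());
    return w < kBaseWords ? &base_[w] : &ext_[w - kBaseWords];
  }
  void grow(std::size_t needed_words) {
    if (needed_words > words()) ext_.resize(needed_words - kBaseWords, 0);
  }

 public:
  void insert(std::size_t bit) {
    std::size_t w = bit / kWordBits;
    grow(w + 1);
    *word(w) |= std::uint64_t{1} << (bit % kWordBits);
  }
  void remove(std::size_t bit) {
    std::size_t w = bit / kWordBits;
    if (w >= words()) return;  // bit was never representable, so not set
    *word(w) &= ~(std::uint64_t{1} << (bit % kWordBits));
  }
  bool member(std::size_t bit) const {
    std::size_t w = bit / kWordBits;
    if (w >= words()) return false;
    std::uint64_t v = w < kBaseWords ? base_[w] : ext_[w - kBaseWords];
    return ((v >> (bit % kWordBits)) & 1) != 0;
  }
};

// Apply the same random operations to the mask and to a reference std::set,
// checking after every step that both agree on the touched bit.
bool run_differential_test(unsigned seed, int steps) {
  std::mt19937 rng(seed);
  std::uniform_int_distribution<std::size_t> bit_dist(0, 511);
  std::uniform_int_distribution<int> op_dist(0, 1);
  ToyMask mask;
  std::set<std::size_t> reference;
  for (int i = 0; i < steps; i++) {
    std::size_t bit = bit_dist(rng);
    if (op_dist(rng) == 0) {
      mask.insert(bit);
      reference.insert(bit);
    } else {
      mask.remove(bit);
      reference.erase(bit);
    }
    if (mask.member(bit) != (reference.count(bit) != 0)) return false;
  }
  // Final sweep over the whole index range, base and extension alike.
  for (std::size_t bit = 0; bit < 512; bit++) {
    if (mask.member(bit) != (reference.count(bit) != 0)) return false;
  }
  return true;
}
```

Calling `run_differential_test(seed, steps)` with a handful of seeds exercises both the base words (bits 0 to 127) and the extension. As the thread notes for the real tests, deliberately breaking an operation (for example, skipping the `grow` call) is a reasonable way to confirm that such a test detects failures.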
test/jdk/java/lang/invoke/BigArityTest.java line 32: > 30: * (1) have a large number of parameters, and > 31: * (2) use JSR292 methods internally (which increases the > 32: * MaxNodeLimit with a factor of 3) Just checking: these methods that cause C2 to consume an excessive amount of memory were not C2-compilable before the changeset, right? ------------- Changes requested by rcastanedalo (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20404#pullrequestreview-2733231312 PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2023172642 PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2023154419 PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2023156355 PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2023177582 PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2023175078 PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2023174027 PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2023183358 PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2023184495 PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2023195229 From rcastanedalo at openjdk.org Tue Apr 1 16:38:27 2025 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Tue, 1 Apr 2025 16:38:27 GMT Subject: RFR: 8325467: Support methods with many arguments in C2 [v12] In-Reply-To: References:

Message-ID: On Tue, 1 Apr 2025 16:28:35 GMT, Roberto Castañeda Lozano wrote: >> Daniel Lundén has updated the pull request incrementally with two additional commits since the last revision: >> >> - Formatting updates >> - Add register mask fuzzer test > > test/jdk/java/lang/invoke/BigArityTest.java line 32: > >> 30: * (1) have a large number of parameters, and >> 31: * (2) use JSR292 methods internally (which increases the >> 32: * MaxNodeLimit with a factor of 3) > > Just checking: these methods that cause C2 to consume an excessive amount of memory were not C2-compilable before the changeset, right? Same question for the other `java/lang/invoke` test changes. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20404#discussion_r2023204041 From epeter at openjdk.org Tue Apr 1 16:39:30 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 1 Apr 2025 16:39:30 GMT Subject: RFR: 8252473: [TESTBUG] compiler tests fail with minimal VM: Unrecognized VM option In-Reply-To: References:

Message-ID: On Wed, 26 Mar 2025 18:49:51 GMT, Vladimir Kozlov wrote: >> This adds `@requires vm.compiler2.enabled` to tests, which fail with `Unrecognized VM option` on client VM. > > Can we run some of them with Graal? When no C2 specific flags are used. @vnkozlov do you agree that we should use `-XX:+IgnoreUnrecognizedVMOptions`? @zzambers Graal does not implement all flags, and so you would get the same issue with `Unrecognized VM option`. But it could still be valuable to run the tests with Graal, even if the flags are not doing anything. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24262#issuecomment-2769944061 From epeter at openjdk.org Tue Apr 1 16:39:30 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 1 Apr 2025 16:39:30 GMT Subject: RFR: 8252473: [TESTBUG] compiler tests fail with minimal VM: Unrecognized VM option In-Reply-To: References:

Message-ID: <8AtiGaQ_cEwB_7Vi4fDwYUEvLMnjVy6BGwz-4vaqGq4=.096cd043-8296-40b5-bbb1-14ae9b51b12c@github.com> On Tue, 1 Apr 2025 16:23:49 GMT, Zdenek Zambersky wrote: > My little probably unfounded concern would be that typos in args could then be silently ignored. That's not completely unfounded, but I think using `-XX:+IgnoreUnrecognizedVMOptions` is still preferable to `@requires vm.compiler2.enabled`. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24262#issuecomment-2769946172 From sviswanathan at openjdk.org Tue Apr 1 17:38:30 2025 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Tue, 1 Apr 2025 17:38:30 GMT Subject: RFR: 8352585: Add special case handling for Float16.max/min x86 backend [v7] In-Reply-To: References:

Message-ID: On Tue, 1 Apr 2025 16:17:22 GMT, Jatin Bhateja wrote: >> This bugfix patch adds the special handling as per x86 AVX512-FP16 ISA specification[1][2] to compute max/min operations with +/-0.0 or NaN operands. >> >> Special handling leverage the instruction semantic, central idea is to shuffle the operands such that smaller input gets assigned to second operand for min operation or a larger input gets assigned to second operand for max operation, in addition result equals NaN if an unordered comparison detects first input as a NaN value else we return the result of min/max operation. >> >> Kindly review and share your feedback. >> >> Best Regards, >> Jatin >> >> [1] https://www.felixcloutier.com/x86/vminsh >> [2] https://www.felixcloutier.com/x86/vmaxsh > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Review comments resolution Thanks for making this change. PR looks good to me. ------------- Marked as reviewed by sviswanathan (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24169#pullrequestreview-2733543988 From sviswanathan at openjdk.org Tue Apr 1 17:38:30 2025 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Tue, 1 Apr 2025 17:38:30 GMT Subject: RFR: 8352585: Add special case handling for Float16.max/min x86 backend [v5] In-Reply-To: References:

Message-ID: On Thu, 27 Mar 2025 13:14:39 GMT, Emanuel Peter wrote: >> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: >> >> Review comments resolution > > I have not looked at the x64 instructions, but only the tests again. > > I have noticed that you only cover specific values. You could improve tests with this: > - Add non-canonical NaN values. > - Just iterate over all possible Float16 input pairs. It's only `2^32`, that should be feasible! Then compare compiled vs interpreted results. > > It seems that bugs like these happen because somehow we do not systematically cover all inputs. Maybe we should do the same for all Float16 operations? @eme64 We are looking forward to your approval for this PR. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24169#issuecomment-2770211207 From vlivanov at openjdk.org Tue Apr 1 18:53:13 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Tue, 1 Apr 2025 18:53:13 GMT Subject: RFR: 8353188: C1: Clean up x86 backend after 32-bit x86 removal [v2] In-Reply-To: References: <-iwh_5JGpt-TAVpfZQjwbnIG_c8hvirNKCcmiZoLNls=.3b34bf15-51fc-42bf-a294-1c23ca99754c@github.com> Message-ID: <7cW4vEajJs-DiP7wkmG1j9zmOdw5fHR5FVq6W17lJas=.6c7cfbac-c478-4c18-9b87-5a0a50658363@github.com> On Tue, 1 Apr 2025 07:58:27 GMT, Aleksey Shipilev wrote: >> Piece-wise cleanup of C1_LIRAssembler_x86, C1_MacroAssembler and related classes. C1 implements the bulk of arch-specific backend there. Major parts of this backend are already removed by #24274, this cleans up another large bulk, and hopefully most of it. >> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, `all` >> - [x] Linux x86_64 server fastdebug, `all` + `-XX:TieredStopAtLevel=1` > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Review comments Looks good. ------------- Marked as reviewed by vlivanov (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/24301#pullrequestreview-2733756865 From vlivanov at openjdk.org Tue Apr 1 19:06:34 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Tue, 1 Apr 2025 19:06:34 GMT Subject: RFR: 8347645: C2: XOR bounded value handling blocks constant folding [v48] In-Reply-To: <1JYbwRdMBDikLGt3iXx87YRTWrF6NwzbFDH916UuoSA=.1fb10eab-4963-4d4c-a8ae-97ec3cecdfe2@github.com> References: <1JYbwRdMBDikLGt3iXx87YRTWrF6NwzbFDH916UuoSA=.1fb10eab-4963-4d4c-a8ae-97ec3cecdfe2@github.com> Message-ID: On Tue, 1 Apr 2025 02:44:03 GMT, Johannes Graham wrote: >> An interaction between xor bounds optimization and constant folding resulted in xor over constants not being optimized. This has a noticeable effect on `Long.expand` with a constant mask, on architectures that don't have instructions equivalent to `PDEP` to be used in an intrinsic. >> >> This change moves logic from the `Xor(L|I)Node::Value` methods into the `add_ring` methods, and gives priority to constant-folding. A static method was separated out to facilitate direct unit-testing. It also (subjectively) simplified the calculation of the upper bound and added an explanation of the reasoning behind it. >> >> In addition to testing for constant folding over xor, IR tests were added to `XorINodeIdealizationTests` and `XorLNodeIdealizationTests` to cover these related items: >> - Bounds optimization of xor >> - A check for `x ^ x = 0` >> - Explicit testing of xor over booleans. >> >> Also `test_xor_node.cpp` was added to more extensively test the correctness of the bounds optimization. It exhaustively tests ranges of 4-bit numbers as well as at the high and low end of the affected types. > > Johannes Graham has updated the pull request incrementally with one additional commit since the last revision: > > remove unused methods Overall, looks good. Some minor comments follow. 
src/hotspot/share/opto/addnode.cpp line 1012: > 1010: > 1011: if (r0->is_con() && r1->is_con()) { > 1012: // Constant fold: (c1 ^ c2) -> c3 A bit confusing. The comment mentions `c1` and `c2` while the code operates on `t0`/`r0` and `t1`/`r1`. src/hotspot/share/opto/addnode.cpp line 1019: > 1017: > 1018: if (r0->_lo >= 0 && r1->_lo >= 0) { > 1019: // Combine [0, lo_1] ^ [0, hi_1] -> [0, max] What does this comment refer to? It mentions `lo_1` and `hi_1` while `r0->_hi` and `r1->_hi` are passed into `xor_upper_bound_for_ranges`. Also, I'd avoid naming it `max`: it sort of hints at `max_jint`, but in reality it represents the upper bound of the operation. Why not `upper`/`upper_bound` instead? ------------- PR Review: https://git.openjdk.org/jdk/pull/23089#pullrequestreview-2733792136 PR Review Comment: https://git.openjdk.org/jdk/pull/23089#discussion_r2023525192 PR Review Comment: https://git.openjdk.org/jdk/pull/23089#discussion_r2023523965 From duke at openjdk.org Tue Apr 1 23:22:09 2025 From: duke at openjdk.org (Johannes Graham) Date: Tue, 1 Apr 2025 23:22:09 GMT Subject: RFR: 8347645: C2: XOR bounded value handling blocks constant folding [v49] In-Reply-To: References: Message-ID: > An interaction between xor bounds optimization and constant folding resulted in xor over constants not being optimized. This has a noticeable effect on `Long.expand` with a constant mask, on architectures that don't have instructions equivalent to `PDEP` to be used in an intrinsic. > > This change moves logic from the `Xor(L|I)Node::Value` methods into the `add_ring` methods, and gives priority to constant-folding. A static method was separated out to facilitate direct unit-testing. It also (subjectively) simplified the calculation of the upper bound and added an explanation of the reasoning behind it. 
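Editorial sketch: the bound discussed in the review comments above, the xor of two non-negative values whose upper bounds are known, can be illustrated as follows. This is not the HotSpot function under review (its body is not shown in this thread); `xor_upper_bound_sketch` and `check_all_4bit_ranges` are hypothetical names, and the reasoning is one standard argument, assumed here rather than quoted from the patch: `a ^ b` cannot set any bit above the highest bit set in `hi0 | hi1`, so that bit and every bit below it form a safe upper bound. The exhaustive loop over 4-bit ranges mirrors the kind of check the PR description attributes to `test_xor_node.cpp`.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: given upper bounds hi0 and hi1 of two non-negative
// 32-bit ints, return an upper bound for a ^ b over all
// 0 <= a <= hi0 and 0 <= b <= hi1.
std::int32_t xor_upper_bound_sketch(std::int32_t hi0, std::int32_t hi1) {
  assert(hi0 >= 0 && hi1 >= 0);
  std::uint32_t bound =
      static_cast<std::uint32_t>(hi0) | static_cast<std::uint32_t>(hi1);
  // Smear the highest set bit into every lower position, giving 2^(k+1) - 1
  // where k is the index of the highest set bit of (hi0 | hi1).
  bound |= bound >> 1;
  bound |= bound >> 2;
  bound |= bound >> 4;
  bound |= bound >> 8;
  bound |= bound >> 16;
  return static_cast<std::int32_t>(bound);  // sign bit is never set here
}

// Exhaustively confirm the bound for every pair of ranges [0, hi0], [0, hi1]
// with 4-bit upper limits.
bool check_all_4bit_ranges() {
  for (std::int32_t hi0 = 0; hi0 < 16; hi0++) {
    for (std::int32_t hi1 = 0; hi1 < 16; hi1++) {
      std::int32_t bound = xor_upper_bound_sketch(hi0, hi1);
      for (std::int32_t a = 0; a <= hi0; a++) {
        for (std::int32_t b = 0; b <= hi1; b++) {
          if ((a ^ b) > bound) return false;
        }
      }
    }
  }
  return true;
}
```

The bound is safe but not always tight: for `hi0 = 8` and `hi1 = 1` it yields 15, although the largest reachable value is 9.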
> > In addition to testing for constant folding over xor, IR tests were added to `XorINodeIdealizationTests` and `XorLNodeIdealizationTests` to cover these related items: > - Bounds optimization of xor > - A check for `x ^ x = 0` > - Explicit testing of xor over booleans. > > Also `test_xor_node.cpp` was added to more extensively test the correctness of the bounds optimization. It exhaustively tests ranges of 4-bit numbers as well as at the high and low end of the affected types. Johannes Graham has updated the pull request incrementally with one additional commit since the last revision: update comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23089/files - new: https://git.openjdk.org/jdk/pull/23089/files/50d35dcd..dda134fb Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23089&range=48 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23089&range=47-48 Stats: 8 lines in 1 file changed: 0 ins; 0 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/23089.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23089/head:pull/23089 PR: https://git.openjdk.org/jdk/pull/23089 From vlivanov at openjdk.org Wed Apr 2 03:01:46 2025 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Wed, 2 Apr 2025 03:01:46 GMT Subject: RFR: 8347645: C2: XOR bounded value handling blocks constant folding [v49] In-Reply-To: References:

Message-ID: On Tue, 1 Apr 2025 23:22:09 GMT, Johannes Graham wrote: >> An interaction between xor bounds optimization and constant folding resulted in xor over constants not being optimized. This has a noticeable effect on `Long.expand` with a constant mask, on architectures that don't have instructions equivalent to `PDEP` to be used in an intrinsic. >> >> This change moves logic from the `Xor(L|I)Node::Value` methods into the `add_ring` methods, and gives priority to constant-folding. A static method was separated out to facilitate direct unit-testing. It also (subjectively) simplified the calculation of the upper bound and added an explanation of the reasoning behind it. >> >> In addition to testing for constant folding over xor, IR tests were added to `XorINodeIdealizationTests` and `XorLNodeIdealizationTests` to cover these related items: >> - Bounds optimization of xor >> - A check for `x ^ x = 0` >> - Explicit testing of xor over booleans. >> >> Also `test_xor_node.cpp` was added to more extensively test the correctness of the bounds optimization. It exhaustively tests ranges of 4-bit numbers as well as at the high and low end of the affected types. > > Johannes Graham has updated the pull request incrementally with one additional commit since the last revision: > > update comments Looks good! ------------- Marked as reviewed by vlivanov (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/23089#pullrequestreview-2734546645 From epeter at openjdk.org Wed Apr 2 06:19:39 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 2 Apr 2025 06:19:39 GMT Subject: RFR: 8347645: C2: XOR bounded value handling blocks constant folding [v46] In-Reply-To: References:

Message-ID: On Tue, 1 Apr 2025 02:49:45 GMT, Johannes Graham wrote: >> @iwanowww >> >>> `_non_neg` part is confusing; I'd stress instead that it works on ranges. >> >> I find it easier to think of it as calculating the upper bound of the xor of 2 non-negative integers whose upper bounds are given in the parameters. > > Renamed to `xor_upper_bound_for_ranges` before I saw your comment, @merykitty. I'd be ok with another name though. With the last changes, the method is no longer a member of the class, so it's no longer going to get as many eyes on it without context, so maybe it matters less now. @j3graham I gave it a quick look, and it looks even better now. Let me run testing again before you integrate! Please ping me in 24h for the results! ------------- PR Comment: https://git.openjdk.org/jdk/pull/23089#issuecomment-2771441200 From duke at openjdk.org Wed Apr 2 06:32:10 2025 From: duke at openjdk.org (Manuel Hässig) Date: Wed, 2 Apr 2025 06:32:10 GMT Subject: RFR: 8347449: C2: UseLoopPredicate off should also turn UseProfiledLoopPredicate off [v6] In-Reply-To: References:

Message-ID: <57-zPqw_-3qY6G5TZUYXG4MFzx_jmhHRDN78DR-dy0o=.c105c4e4-9ffa-4dd4-9390-70f27e48f217@github.com> On Fri, 28 Mar 2025 09:09:59 GMT, Manuel Hässig wrote: >> # Issue Summary >> >> When running with `-XX:-UseLoopPredicate` C2 still inserts profiled loop parse predicates, despite those being a form of loop parse predicate. Further, the loop predicate code is not always consistent about when to insert/expect profiled parse predicates. >> >> # Change Summary >> >> Following the rationale that profiled predicates are a subset of loop predicates, this PR disables profiled predicates whenever loop predicates are disabled. They are disabled on the level of arguments. Further, before any checks for whether profiled predicates are enabled, this PR inserts a check that loop predicates are enabled such that the code is consistent in its intention. >> >> Concretely, this PR >> - adds parse predicate nodes to the IR testing framework, >> - turns off `UseProfiledLoopPredicate` if `UseLoopPredicate` is turned off, >> - predicates all checks for `UseProfiledLoopPredicate` on `UseLoopPredicate` first for consistency, >> - adds a regression test. >> >> >> # Testing >> >> The changes passed the following testing: >> - [Github Actions](https://github.com/mhaessig/jdk/actions/runs/14078750038) >> - tier1 through tier3 and Oracle internal testing > > Manuel Hässig has updated the pull request incrementally with two additional commits since the last revision: > > - idealKit::loop: always call add_parse_predicates > > It was constrained on UseParsePredicate, but this is incorrect, since > all parse predicates are added in that function. > - Improve description of UseLoopPredicate argument Thank y'all for the thorough review!
------------- PR Comment: https://git.openjdk.org/jdk/pull/24248#issuecomment-2771461584 From duke at openjdk.org Wed Apr 2 06:32:11 2025 From: duke at openjdk.org (duke) Date: Wed, 2 Apr 2025 06:32:11 GMT Subject: RFR: 8347449: C2: UseLoopPredicate off should also turn UseProfiledLoopPredicate off [v6] In-Reply-To: References:

Message-ID: On Fri, 28 Mar 2025 09:09:59 GMT, Manuel Hässig wrote: >> # Issue Summary >> >> When running with `-XX:-UseLoopPredicate` C2 still inserts profiled loop parse predicates, despite those being a form of loop parse predicate. Further, the loop predicate code is not always consistent about when to insert/expect profiled parse predicates. >> >> # Change Summary >> >> Following the rationale that profiled predicates are a subset of loop predicates, this PR disables profiled predicates whenever loop predicates are disabled. They are disabled on the level of arguments. Further, before any checks for whether profiled predicates are enabled, this PR inserts a check that loop predicates are enabled such that the code is consistent in its intention. >> >> Concretely, this PR >> - adds parse predicate nodes to the IR testing framework, >> - turns off `UseProfiledLoopPredicate` if `UseLoopPredicate` is turned off, >> - predicates all checks for `UseProfiledLoopPredicate` on `UseLoopPredicate` first for consistency, >> - adds a regression test. >> >> >> # Testing >> >> The changes passed the following testing: >> - [Github Actions](https://github.com/mhaessig/jdk/actions/runs/14078750038) >> - tier1 through tier3 and Oracle internal testing > > Manuel Hässig has updated the pull request incrementally with two additional commits since the last revision: > > - idealKit::loop: always call add_parse_predicates > > It was constrained on UseParsePredicate, but this is incorrect, since > all parse predicates are added in that function. > - Improve description of UseLoopPredicate argument @mhaessig Your change (at version 1561a0eea3b2049e4e9e6468d0237f60e97cd2e8) is now ready to be sponsored by a Committer.
------------- PR Comment: https://git.openjdk.org/jdk/pull/24248#issuecomment-2771462472 From duke at openjdk.org Wed Apr 2 06:33:14 2025 From: duke at openjdk.org (Manuel Hässig) Date: Wed, 2 Apr 2025 06:33:14 GMT Subject: RFR: 8352893: C2: OrL/INode::add_ring optimize (x | -1) to -1 [v3] In-Reply-To: References:

Message-ID: <2rYcxIlI5lZujCDgdo1RStzxjeJGym2ftPpb2eoxW38=.1006c857-1293-4e15-8fca-2d7ce163f420@github.com> On Mon, 31 Mar 2025 14:31:43 GMT, Manuel Hässig wrote: >> # Issue Summary >> >> The `add_ring()` implementations of `OrINode` and `OrLNode` are missing the optimization that an or with a value where all bits are ones (since we have signed integers in this case `~0 == -1`) will always yield all ones. >> >> # Changes >> >> This PR makes the following straightforward changes: >> - `Or(I|L)Node::add_ring()` returns `-1` if one of the two inputs is `-1`. >> - Add `Or(I|L)` nodes to the IR framework. >> - Add a regression IR test for the implemented optimization. >> >> # Testing >> >> - [Github Actions](https://github.com/mhaessig/jdk/actions/runs/14110978686) >> - Ran tier1 through tier3 and Oracle internal testing > > Manuel Hässig has updated the pull request incrementally with one additional commit since the last revision: > > Remove loop in test and instead use random values Thank you for reviewing! ------------- PR Comment: https://git.openjdk.org/jdk/pull/24289#issuecomment-2771459481 From duke at openjdk.org Wed Apr 2 06:33:15 2025 From: duke at openjdk.org (Manuel Hässig) Date: Wed, 2 Apr 2025 06:33:15 GMT Subject: Integrated: 8352893: C2: OrL/INode::add_ring optimize (x | -1) to -1 In-Reply-To: References: Message-ID: On Fri, 28 Mar 2025 10:21:57 GMT, Manuel Hässig wrote: > # Issue Summary > > The `add_ring()` implementations of `OrINode` and `OrLNode` are missing the optimization that an or with a value where all bits are ones (since we have signed integers in this case `~0 == -1`) will always yield all ones. > > # Changes > > This PR makes the following straightforward changes: > - `Or(I|L)Node::add_ring()` returns `-1` if one of the two inputs is `-1`. > - Add `Or(I|L)` nodes to the IR framework. > - Add a regression IR test for the implemented optimization.
> > # Testing > > - [Github Actions](https://github.com/mhaessig/jdk/actions/runs/14110978686) > - Ran tier1 through tier3 and Oracle internal testing This pull request has now been integrated. Changeset: f301663b Author: Manuel Hässig URL: https://git.openjdk.org/jdk/commit/f301663b346bf2388ecfa429be1cf64c6e93ee8e Stats: 109 lines in 3 files changed: 109 ins; 0 del; 0 mod 8352893: C2: OrL/INode::add_ring optimize (x | -1) to -1 Reviewed-by: epeter, thartmann, kvn ------------- PR: https://git.openjdk.org/jdk/pull/24289 From epeter at openjdk.org Wed Apr 2 06:34:32 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 2 Apr 2025 06:34:32 GMT Subject: RFR: 8352585: Add special case handling for Float16.max/min x86 backend [v5] In-Reply-To: References:
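The identity behind the change can be checked with a few lines of plain Java. This is only a sketch of the arithmetic fact, with hypothetical names (`OrAllOnesSketch`, `orInt`, `orLong`); it is not the IR test added by the PR and says nothing about what C2 emits.

```java
class OrAllOnesSketch {
    // -1 has every bit set in two's complement, so OR-ing with it sets every bit.
    static int orInt(int x) {
        return x | -1;
    }

    static long orLong(long x) {
        return x | -1L;
    }

    public static void main(String[] args) {
        int[] samples = {0, 1, -1, 42, Integer.MIN_VALUE, Integer.MAX_VALUE};
        for (int x : samples) {
            // Both results are -1 regardless of x, which is why the node's
            // type can be narrowed to the constant -1.
            System.out.println(x + " | -1 = " + orInt(x) + ", as long: " + orLong(x));
        }
    }
}
```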

Message-ID: On Fri, 28 Mar 2025 04:50:17 GMT, Jatin Bhateja wrote: >> I have not looked at the x64 instructions, but only the tests again. >> >> I have noticed that you only cover specific values. You could improve tests with this: >> - Add non-canonical NaN values. >> - Just iterate over all possible Float16 input pairs. It's only `2^32`, that should be feasible! Then compare compiled vs interpreted results. >> >> It seems that bugs like these happen because somehow we do not systematically cover all inputs. Maybe we should do the same for all Float16 operations? > > Hi @eme64 , > This specific issue is around special Float16 values, i.e. +/- 0.0 and NaN. > I have added a Generator for Float16 as part of https://github.com/openjdk/jdk/pull/22755 > > Best Regards, > Jatin @jatin-bhateja It looks reasonable to me now. Let me run some testing, ping me in 24h for the results! ------------- PR Comment: https://git.openjdk.org/jdk/pull/24169#issuecomment-2771465382 From chagedorn at openjdk.org Wed Apr 2 06:50:59 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 2 Apr 2025 06:50:59 GMT Subject: RFR: 8353058: [PPC64] Some IR framework tests are failing after JDK-8352595 Message-ID: `TestPhaseIRMatching` was recently updated with [JDK-8352595](https://bugs.openjdk.org/browse/JDK-8352595) which changed some matching on opto assembly from `IRNode.ALLOC` (now matching on ideal phases) to `IRNode.FIELD_ACCESS` (still matching on opto assembly). However, the updated code matches differently on PPC for some method invocation on a parameter, which makes the test fail on PPC: public Object defaultOnOptoAssembly(Helper h) { return h.getString(); // emits one "Field: " string on most platforms but none on PPC } When I revisited the test to analyze the failure, it was not clear what I had in mind back then with `defaultOnX()`.
My guess is that I've tried to have one method failing on ideal phases, one on mach phases and one on both while all the methods use `IRNode` entries that have default compile phases on ideal and mach phases. But that is not the case today. I've therefore rewritten the tests to adhere to my guess. I also removed the ambiguity among platforms to have the same number of field accesses on them. How to read the `@ExpectedFailure` annotation: @IR(failOn = {IRNode.STORE, IRNode.FIELD_ACCESS, IRNode.COUNTED_LOOP, IRNode.STORE_I}, counts = {IRNode.STORE, "20", IRNode.FIELD_ACCESS, "1", IRNode.COUNTED_LOOP, "2", IRNode.OOPMAP_WITH, "asdf", "2"}) // Expect rule with id 5 (the one directly above) to fail: // - We fail when matching PRINT_IDEAL with the: // - failOn attribute: The failing constraints are constraint 1 and 4 (while 2 and 3 pass) // - counts attribute: The failing constraints are constraint 2 and 4 (while 1 and 3 pass). @ExpectedFailure(ruleId = 5, phase = CompilePhase.PRINT_IDEAL, failOn = {1, 4}, counts = {1, 3}) Thanks to @TheRealMDoerr for testing the patch on PPC! 
Thanks, Christian ------------- Commit messages: - 8353058: [PPC64] Some IR framework tests are failing after JDK-8314999 Changes: https://git.openjdk.org/jdk/pull/24373/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24373&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8353058 Stats: 54 lines in 1 file changed: 17 ins; 8 del; 29 mod Patch: https://git.openjdk.org/jdk/pull/24373.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24373/head:pull/24373 PR: https://git.openjdk.org/jdk/pull/24373 From chagedorn at openjdk.org Wed Apr 2 06:50:59 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 2 Apr 2025 06:50:59 GMT Subject: RFR: 8353058: [PPC64] Some IR framework tests are failing after JDK-8352595 In-Reply-To: References: Message-ID: On Wed, 2 Apr 2025 06:45:58 GMT, Christian Hagedorn wrote: > `TestPhaseIRMatching` was recently updated with [JDK-8352595](https://bugs.openjdk.org/browse/JDK-8352595) which changed some matching on opto assembly from `IRNode.ALLOC` (now matching on ideal phases) to `IRNode.FIELD_ACCESS` (still matching on opto assembly). However, the updated code matches differently on PPC for some method invocation on a parameter which let the test fail on PPC: > > public Object defaultOnOptoAssembly(Helper h) { > return h.getString(); // emits one "Field: " string on most platforms but none on PPC > } > > > When I've revisited the test to analyze the failure, it was not evidently clear what I had in mind back there with `defaultOnX()`. My guess is that I've tried to have one method failing on ideal phases, one on mach phases and one on both while all the methods use `IRNode` entries that have default compile phases on ideal and mach phases. But that is not the case today. I've therefore rewritten the tests to adhere to my guess. I also removed the ambiguity among platforms to have the same number of field accesses on them. 
> > How to read the `@ExpectedFailure` annotation: > > @IR(failOn = {IRNode.STORE, IRNode.FIELD_ACCESS, IRNode.COUNTED_LOOP, IRNode.STORE_I}, > counts = {IRNode.STORE, "20", IRNode.FIELD_ACCESS, "1", IRNode.COUNTED_LOOP, "2", IRNode.OOPMAP_WITH, "asdf", "2"}) > // Expect rule with id 5 (the one directly above) to fail: > // - We fail when matching PRINT_IDEAL with the: > // - failOn attribute: The failing constraints are constraint 1 and 4 (while 2 and 3 pass) > // - counts attribute: The failing constraints are constraint 2 and 4 (while 1 and 3 pass). > @ExpectedFailure(ruleId = 5, phase = CompilePhase.PRINT_IDEAL, failOn = {1, 4}, counts = {1, 3}) > > > Thanks to @TheRealMDoerr for testing the patch on PPC! > > Thanks, > Christian test/hotspot/jtreg/testlibrary_tests/ir_framework/tests/TestPhaseIRMatching.java line 178: > 176: public void defaultOnOptoAssembly() { > 177: i = 34; > 178: l = 34; Always using this body which reliably emits two "Field: " strings in the opto assembly on all platforms. Thus removed the `Helper` class again. test/hotspot/jtreg/testlibrary_tests/ir_framework/tests/TestPhaseIRMatching.java line 228: > 226: } > 227: defaultOnOptoAssembly(new Helper("a", 1)); > 228: defaultOnBoth(new Helper("a", 1)); No longer needed because we do not need to pass anything into the methods. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/24373#discussion_r2024171272 PR Review Comment: https://git.openjdk.org/jdk/pull/24373#discussion_r2024171836 From duke at openjdk.org Wed Apr 2 06:51:28 2025 From: duke at openjdk.org (Manuel Hässig) Date: Wed, 2 Apr 2025 06:51:28 GMT Subject: Integrated: 8347449: C2: UseLoopPredicate off should also turn UseProfiledLoopPredicate off In-Reply-To: References: Message-ID: On Wed, 26 Mar 2025 09:27:59 GMT, Manuel Hässig wrote: > # Issue Summary > > When running with `-XX:-UseLoopPredicate` C2 still inserts profiled loop parse predicates, despite those being a form of loop parse predicate.
Further, the loop predicate code is not always consistent about when to insert/expect profiled parse predicates. > > # Change Summary > > Following the rationale that profiled predicates are a subset of loop predicates, this PR disables profiled predicates whenever loop predicates are disabled. They are disabled on the level of arguments. Further, before any checks for whether profiled predicates are enabled, this PR inserts a check that loop predicates are enabled such that the code is consistent in its intention. > > Concretely, this PR > - adds parse predicate nodes to the IR testing framework, > - turns off `UseProfiledLoopPredicate` if `UseLoopPredicate` is turned off, > - predicates all checks for `UseProfiledLoopPredicate` on `UseLoopPredicate` first for consistency, > - adds a regression test. > > > # Testing > > The changes passed the following testing: > - [Github Actions](https://github.com/mhaessig/jdk/actions/runs/14078750038) > - tier1 through tier3 and Oracle internal testing This pull request has now been integrated.
Changeset: d358f5f4 Author: Manuel Hässig Committer: Christian Hagedorn URL: https://git.openjdk.org/jdk/commit/d358f5f4a44aacf2d79ccdb3e362ce8ed571f6da Stats: 150 lines in 7 files changed: 128 ins; 2 del; 20 mod 8347449: C2: UseLoopPredicate off should also turn UseProfiledLoopPredicate off Reviewed-by: chagedorn, epeter ------------- PR: https://git.openjdk.org/jdk/pull/24248 From epeter at openjdk.org Wed Apr 2 07:10:23 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 2 Apr 2025 07:10:23 GMT Subject: RFR: 8352316: More MergeStoreBench [v7] In-Reply-To: References: <5fLeODHTQw8vbuvTl6G0YPNszI5_tH1b3L_tWJtCTh8=.ca1b21f2-2890-4daa-8ce2-8112a3f7146b@github.com> Message-ID: On Sat, 29 Mar 2025 07:27:24 GMT, Shaojin Wen wrote: >> Added performance tests related to String.getBytes/String.getChars/StringBuilder.append/System.arraycopy in constant scenarios to verify whether MergeStore works > > Shaojin Wen has updated the pull request incrementally with one additional commit since the last revision: > > add StringBuilderUnsafePut @wenshao @iwanowww I have a few concerns about this PR. Your current PR description says this: > Added performance tests related to String.getBytes/String.getChars/StringBuilder.append/System.arraycopy in constant scenarios to verify whether MergeStore works First: a benchmark is not the best way `to verify whether MergeStore works`. An IR test would be more helpful, as it could check reliably what IR is generated, and hence if MergeStores actually optimized anything. Second: A JMH benchmark could also be helpful, but only if you run it with and without MergeStores enabled. Otherwise how would you know if it was MergeStores or another optimization that is relevant here? Third: `getBytes` / `arraycopy` is **NOT** a MergeStores pattern. These are **COPY** patterns. So they probably should go to a separate benchmark file. I don't want the MergeStores benchmark polluted with unrelated cases.
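To make the distinction between the two kinds of pattern concrete, here is a small Java sketch. The class and method names are hypothetical and this is not code from the benchmark under review. The first method is a sequence of adjacent byte stores derived from one value, which is the general shape of store sequence that MergeStores is described as targeting; the second moves bytes from one array to another, which is a copy (load plus store) pattern. Whether C2 actually merges the stores in the first method depends on the JDK version, platform, and flags, so treat that as an assumption to be confirmed with an IR test.

```java
class StorePatternsSketch {
    // Store pattern: four adjacent byte stores, all derived from the same int
    // (little-endian byte order). Nothing is loaded from another array.
    static void storeIntLE(byte[] a, int off, int v) {
        a[off]     = (byte) v;
        a[off + 1] = (byte) (v >> 8);
        a[off + 2] = (byte) (v >> 16);
        a[off + 3] = (byte) (v >> 24);
    }

    // Copy pattern: every stored byte is first loaded from a source array.
    static void copy4(byte[] src, int srcOff, byte[] dst, int dstOff) {
        System.arraycopy(src, srcOff, dst, dstOff, 4);
    }

    public static void main(String[] args) {
        byte[] stored = new byte[4];
        storeIntLE(stored, 0, 0x04030201);

        byte[] copied = new byte[4];
        copy4(stored, 0, copied, 0);

        System.out.println(java.util.Arrays.toString(stored));
        System.out.println(java.util.Arrays.toString(copied));
    }
}
```

Both methods leave the same bytes in the destination, which is why a timing-only benchmark cannot tell which optimization was responsible for any speedup.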
I could be wrong here, and just not see how these cases are MergeStore cases, but you need to show the details here. I put some time into understanding your PR and asking you a list of questions. You did not really respond to them, and that is frustrating to me and makes me feel like my time is not valued: https://github.com/openjdk/jdk/pull/24108#issuecomment-2762946069 You say this: > By default, in OpenJDK, COMPACT_STRINGS = true, and the String coder without UTF16 characters is LATIN1, which is implemented using System.arraycopy. However, since String is immutable and System.arraycopy is directly performed on byte[], C2 should have more opportunities for optimization. Maybe the `System.arraycopy` can be optimized. But I don't think it is the MergeStores optimization that would do that. This is really a **Copy** pattern and not a `MergeStores` pattern. Please read the PRs on MergeStores to see what patterns are covered. And like I asked previously: > Can you investigate what code it generates, and what kinds of optimizations are missing to make it close in performance to the Unsafe benchmark? > I don't have time to do all the deep investigations myself. But feel free to ask me if you have more questions. To me, benchmarks are only helpful and worth integrating if there is some clear and documented purpose. It would be really nice if you could invest some time into that :) test/micro/org/openjdk/bench/vm/compiler/MergeStoreBench.java line 693: > 691: } > 692: BH.consume(off); > 693: } This is a copy pattern, not MergeStores. test/micro/org/openjdk/bench/vm/compiler/MergeStoreBench.java line 735: > 733: } > 734: BH.consume(off); > 735: } @wenshao This is a copy pattern. Not a MergeStore pattern. So I can tell you already now that it will not be optimized by MergeStores ;) test/micro/org/openjdk/bench/vm/compiler/MergeStoreBench.java line 799: > 797: } > 798: BH.consume(off); > 799: } @wenshao Why would MergeStores work here? This is a copy pattern.
That is not at all covered by MergeStores. test/micro/org/openjdk/bench/vm/compiler/MergeStoreBench.java line 856: > 854: } > 855: BH.consume(sb.length()); > 856: } Why would you expect MergeStores to work here? ------------- Changes requested by epeter (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24108#pullrequestreview-2734816014 PR Review Comment: https://git.openjdk.org/jdk/pull/24108#discussion_r2024171061 PR Review Comment: https://git.openjdk.org/jdk/pull/24108#discussion_r2024170015 PR Review Comment: https://git.openjdk.org/jdk/pull/24108#discussion_r2024169285 PR Review Comment: https://git.openjdk.org/jdk/pull/24108#discussion_r2024172517 From mchevalier at openjdk.org Wed Apr 2 07:13:19 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Wed, 2 Apr 2025 07:13:19 GMT Subject: RFR: 8353058: [PPC64] Some IR framework tests are failing after JDK-8352595 In-Reply-To: References: Message-ID: <-RonuxVG3qrg8pJV2J6lrXnAlV4oBHJC5wzdEFCKhzc=.753fea93-d133-4135-827a-bcd6ae4e32d0@github.com> On Wed, 2 Apr 2025 06:45:58 GMT, Christian Hagedorn wrote: > `TestPhaseIRMatching` was recently updated with [JDK-8352595](https://bugs.openjdk.org/browse/JDK-8352595) which changed some matching on opto assembly from `IRNode.ALLOC` (now matching on ideal phases) to `IRNode.FIELD_ACCESS` (still matching on opto assembly). However, the updated code matches differently on PPC for some method invocation on a parameter which let the test fail on PPC: > > public Object defaultOnOptoAssembly(Helper h) { > return h.getString(); // emits one "Field: " string on most platforms but none on PPC > } > > > When I've revisited the test to analyze the failure, it was not evidently clear what I had in mind back there with `defaultOnX()`. My guess is that I've tried to have one method failing on ideal phases, one on mach phases and one on both while all the methods use `IRNode` entries that have default compile phases on ideal and mach phases. But that is not the case today. 
I've therefore rewritten the tests to adhere to my guess. I also removed the ambiguity among platforms to have the same number of field accesses on them. > > How to read the `@ExpectedFailure` annotation: > > @IR(failOn = {IRNode.STORE, IRNode.FIELD_ACCESS, IRNode.COUNTED_LOOP, IRNode.STORE_I}, > counts = {IRNode.STORE, "20", IRNode.FIELD_ACCESS, "1", IRNode.COUNTED_LOOP, "2", IRNode.OOPMAP_WITH, "asdf", "2"}) > // Expect rule with id 5 (the one directly above) to fail: > // - We fail when matching PRINT_IDEAL with the: > // - failOn attribute: The failing constraints are constraint 1 and 4 (while 2 and 3 pass) > // - counts attribute: The failing constraints are constraint 2 and 4 (while 1 and 3 pass). > @ExpectedFailure(ruleId = 5, phase = CompilePhase.PRINT_IDEAL, failOn = {1, 4}, counts = {1, 3}) > > > Thanks to @TheRealMDoerr for testing the patch on PPC! > > Thanks, > Christian Looks good to me. I've also used `FIELD_ACCESS` in TestCompilePhaseCollector.java, but I think it's harmless there since we are not matching, but just using it for its default phase. But I still mention, just in case... ------------- PR Comment: https://git.openjdk.org/jdk/pull/24373#issuecomment-2771540722 From epeter at openjdk.org Wed Apr 2 07:15:27 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 2 Apr 2025 07:15:27 GMT Subject: RFR: 8346964: C2: Improve integer multiplication with constant in MulINode::Ideal() [v3] In-Reply-To: References: <4UC1x1GPJCcIwPXKJZfiUGxQnuRaDQjOcN53wYmUzF4=.fafd71c1-2f48-4ae4-8e7e-8844c578429a@github.com>

<6PtcpyIAXa2wbi0CI5-DVvI1r2RRDvKtIWko7nvBDFo=.49b4d6f7-0dda-42e7-9f51-bfa3c06ef6f5@github.com> Message-ID: <8P3c-UQwGnV7gzMapQf_YAQHQLaIKTvYGFY3O5Of2UU=.87fa4250-e2f5-4efd-b6ab-fd2298a8bea7@github.com> On Thu, 9 Jan 2025 06:21:14 GMT, erifan wrote: >> @erifan I did some more thinking when falling asleep / waking up. This is a really interesting problem here. >> >> For `MulINode::Ideal` with patterns `var * con`, we really have these options in assembly: >> - `mul` general case. >> - `shift` and `add` when profitable. >> - `lea` could this be an improvement over `shift` and `add`? >> >> The issue is that different platforms have different characteristics here for these instructions - we would have to see how they differ. As far as I remember `mul` is not always available on all `ALU`s, but `add` and `shift` should be available. This impacts their throughput (more ports / ALU means more throughput generally). But the instructions also have different latency. Further, I could imagine that at some point more instructions may not just affect the throughput, but also the code-size: that in turn would increase IR and may at some point affect the instruction cache. >> >> Additionally: if your workload has other `mul`, `shift` and `add` mixed in, then some ports may already be saturated, and that could tilt the balance as to which option you are supposed to take. >> >> And then the characteristics of scalar ops may not be identical to vector ops. >> >> It would be interesting to have a really solid benchmark, where you explore the impact of these different effects. >> And it would be interesting to extract a table of latency + throughput characteristics for all relevant scalar + vector ops, for a number of different CPUs. Just so we get an overview of how easy this is to tune. >> >> Maybe perfect tuning is not possible. Maybe we are willing to take a `5%` regression in some cases to boost other cases by `30%`. 
But that is a **big maybe**: we really do not like getting regressions in existing code, it tends to upset people more if they get regressions compared to how much they enjoy speedups - so work like this can be delicate. >> >> Anyway, I don't right now have much time to investigate and work on this myself. So you'd have to do the work, benchmark, explanation etc. **But I think the `30%` speedup indicates that this work could really have potential!** >> >> As to what to do in sequence, here a suggestion: >> 1. First work on Vector API cases of vector multiplication - this should have no impact on other things. >> 2. Delay the `MulINode::Ideal` optimizations until after loop-opts: scalar code would still be handled in the old way, but auto-vectorized code would then be turned into `MulV`. And then go into the mul -> sh... > > Hi @eme64 thanks for your review. > > 1. First work on Vector API cases of vector multiplication - this should have no impact on other things. > 2. Delay the MulINode::Ideal optimizations until after loop-opts: scalar code would still be handled in the old way, but auto-vectorized code would then be turned into MulV. And then go into the mul -> shift optimization for vectors under point 1. > 3. Tackle MulINode::Ideal for scalar cases after loop-opts, and see what you can do there. > > I agree with you. I am actually working on `1`. The slightly troublesome thing is that `1` and `3` are both related to the architecture, so it might take a little more time. > >> lea could this be an improvement over shift and add? > > AARCH64 doesn't actually have a `lea` instruction. On x64 there are already some rules that turn `shift add` into `lea`. > > The issue is that different platforms have different characteristics here for these instructions - we would have to see how they differ. As far as I remember mul is not always available on all ALUs, but add and shift should be available. 
This impacts their throughput (more ports / ALU means more throughput generally). But the instructions also have different latency. Further, I could imagine that at some point more instructions may not just affect the throughput, but also the code-size: that in turn would increase IR and may at some point affect the instruction cache. > > Additionally: if your workload has other mul, shift and add mixed in, then some ports may already be saturated, and that could tilt the balance as to which option you are supposed to take. > > And then the characteristics of scalar ops may not be identical to vector ops. > > > Yes, this is very tricky. The actual performance is related to many aspects, such as pipeline, latency, throughput, ROB, and even memory performance. We can only do optimization based on certain references and generalities, such as latency and throughput of different instructions. But when it comes to generalities, it is actually difficult to say which scenario is more general. > >> It would be interesting to have a really solid benchmark, where you explore the impact of these different effects. > And it would be interesting to extract a table of latency + throughput characteristics for all relevant scalar + vector ops, for a number of different CPUs. Just so we get an overview of how easy this is to tune. > > I don't know such a benchmark suite yet. For AARCH64, I usually refer to [the Arm Optimization Guide](https:... @erifan you opened this again. Does that mean we should review again? I see that you did not make any changes since our last conversation. If it is not ready for review, could you please convert it to Draft, so it is clear that you are not asking for reviews currently?
------------- PR Comment: https://git.openjdk.org/jdk/pull/22922#issuecomment-2771545449 From mchevalier at openjdk.org Wed Apr 2 07:19:25 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Wed, 2 Apr 2025 07:19:25 GMT Subject: RFR: 8353058: [PPC64] Some IR framework tests are failing after JDK-8352595 In-Reply-To: References: Message-ID: <2P9iZTfGS3zMibNJEqMfO_yf-Pir-hYdZFjUA3C5DSg=.fc405c98-769a-47a0-89ad-5ac2cf742fdf@github.com> On Wed, 2 Apr 2025 06:45:58 GMT, Christian Hagedorn wrote: > `TestPhaseIRMatching` was recently updated with [JDK-8352595](https://bugs.openjdk.org/browse/JDK-8352595) which changed some matching on opto assembly from `IRNode.ALLOC` (now matching on ideal phases) to `IRNode.FIELD_ACCESS` (still matching on opto assembly). However, the updated code matches differently on PPC for some method invocation on a parameter which let the test fail on PPC: > > public Object defaultOnOptoAssembly(Helper h) { > return h.getString(); // emits one "Field: " string on most platforms but none on PPC > } > > > When I've revisited the test to analyze the failure, it was not evidently clear what I had in mind back there with `defaultOnX()`. My guess is that I've tried to have one method failing on ideal phases, one on mach phases and one on both while all the methods use `IRNode` entries that have default compile phases on ideal and mach phases. But that is not the case today. I've therefore rewritten the tests to adhere to my guess. I also removed the ambiguity among platforms to have the same number of field accesses on them. 
> > How to read the `@ExpectedFailure` annotation: > > @IR(failOn = {IRNode.STORE, IRNode.FIELD_ACCESS, IRNode.COUNTED_LOOP, IRNode.STORE_I}, > counts = {IRNode.STORE, "20", IRNode.FIELD_ACCESS, "1", IRNode.COUNTED_LOOP, "2", IRNode.OOPMAP_WITH, "asdf", "2"}) > // Expect rule with id 5 (the one directly above) to fail: > // - We fail when matching PRINT_IDEAL with the: > // - failOn attribute: The failing constraints are constraint 1 and 4 (while 2 and 3 pass) > // - counts attribute: The failing constraints are constraint 2 and 4 (while 1 and 3 pass). > @ExpectedFailure(ruleId = 5, phase = CompilePhase.PRINT_IDEAL, failOn = {1, 4}, counts = {1, 3}) > > > Thanks to @TheRealMDoerr for testing the patch on PPC! > > Thanks, > Christian Marked as reviewed by mchevalier (Author). ------------- PR Review: https://git.openjdk.org/jdk/pull/24373#pullrequestreview-2734883035 From chagedorn at openjdk.org Wed Apr 2 07:19:26 2025 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 2 Apr 2025 07:19:26 GMT Subject: RFR: 8353058: [PPC64] Some IR framework tests are failing after JDK-8352595 In-Reply-To: <-RonuxVG3qrg8pJV2J6lrXnAlV4oBHJC5wzdEFCKhzc=.753fea93-d133-4135-827a-bcd6ae4e32d0@github.com> References: <-RonuxVG3qrg8pJV2J6lrXnAlV4oBHJC5wzdEFCKhzc=.753fea93-d133-4135-827a-bcd6ae4e32d0@github.com> Message-ID: On Wed, 2 Apr 2025 07:10:38 GMT, Marc Chevalier wrote: > Looks good to me. I've also used `FIELD_ACCESS` in TestCompilePhaseCollector.java, but I think it's harmless there since we are not matching, but just using it for its default phase. But I still mention, just in case... Thanks for your review Marc! Yes, there we do not perform the actual IR matching, so it's not a problem for platform specific differences. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/24373#issuecomment-2771546971 From dfenacci at openjdk.org Wed Apr 2 07:19:26 2025 From: dfenacci at openjdk.org (Damon Fenacci) Date: Wed, 2 Apr 2025 07:19:26 GMT Subject: RFR: 8282053: IGV: refine schedule approximation [v2] In-Reply-To: <0SXZ0k-28IdpWjuhtK4dSJ9ybHE58Oq56zT_sdqeQpc=.0cf9319f-f64d-4e8f-82e9-f654464bc775@github.com> References: <0SXZ0k-28IdpWjuhtK4dSJ9ybHE58Oq56zT_sdqeQpc=.0cf9319f-f64d-4e8f-82e9-f654464bc775@github.com> Message-ID: On Tue, 1 Apr 2025 14:42:47 GMT, Daniel Skantz wrote: >> This patch refines the schedule approximation in IGV by 1) placing parm. and projection nodes in the same block as their predecessors, and 2) disallows erroneously considering machine nodes such as prefetchAlloc and rep_stos as CFG nodes. >> >> The reader may refer to the corresponding JBS issue where graphs sampled before and after the change are attached. >> >> Testing: T1-T3 with no failures. Opened graphs before and after the change and saw no obvious problems. Opened a large number of graphs in CFG view and observed no unexpected IGV warnings, errors or assert failures. > > Daniel Skantz has updated the pull request incrementally with one additional commit since the last revision: > > Update src/utils/IdealGraphVisualizer/ServerCompiler/src/main/java/com/sun/hotspot/igv/servercompiler/ServerCompilerScheduler.java > > Co-authored-by: Daniel Lundén Nice! Thanks @danielogh! ------------- Marked as reviewed by dfenacci (Committer). PR Review: https://git.openjdk.org/jdk/pull/24350#pullrequestreview-2734882114 From epeter at openjdk.org Wed Apr 2 07:25:25 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 2 Apr 2025 07:25:25 GMT Subject: RFR: 8347499: C2: Make `PhaseIdealLoop` eliminate more redundant safepoints in loops [v2] In-Reply-To: References:

Message-ID: On Wed, 26 Mar 2025 08:59:07 GMT, Qizheng Xing wrote: >> Qizheng Xing has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: >> >> - Merge branch 'master' into enhance-loop-safepoint-elim >> - Add IR test and microbench. >> - Make `PhaseIdealLoop` eliminate more redundant safepoints in loops. > > The second question: > >> If we now removed safepoints in places where we would actually have needed them: how would we find out? I suppose we would get longer time to safepoint - higher latency in some cases. How would we catch this with our tests? > > I tried running tier1 tests with `JAVA_OPTIONS=-XX:+UnlockDiagnosticVMOptions -XX:+SafepointTimeout -XX:+AbortVMOnSafepointTimeout -XX:SafepointTimeoutDelay=1000`, and there were no failures. > > Running with `-XX:SafepointTimeoutDelay=500` caused 1 random JDK test case to fail. But then I tried to build a JDK without this patch, and it still had the random failure with this option. @MaxXSoft > Running with -XX:SafepointTimeoutDelay=500 caused 1 random JDK test case to fail. But then I tried to build a JDK without this patch, and it still had the random failure with this option. Wow, that sounds like we do not safepoint for half a second in that case. That could be a bug. Could you please tell me what test it is, and how you ran it? We may want to file a bug and investigate it. @MaxXSoft Would you mind improving the documentation comments, so that they are easier to understand? Maybe you can even add more comments around your code change, to "prove" why it is ok to do what we would do with your change? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/23057#issuecomment-2771559333 PR Comment: https://git.openjdk.org/jdk/pull/23057#issuecomment-2771565388 From epeter at openjdk.org Wed Apr 2 07:25:25 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 2 Apr 2025 07:25:25 GMT Subject: RFR: 8347499: C2: Make `PhaseIdealLoop` eliminate more redundant safepoints in loops [v2] In-Reply-To: References:

Message-ID: On Tue, 25 Mar 2025 09:51:46 GMT, Qizheng Xing wrote: > On the one hand, this situation won't occur in the current Compile::Optimize process. The Optimize method will always complete all inlining before performing loop optimization And what about late inlining? Does that not happen after loop opts? Maybe we insert new SafePoints when inlining, I simply don't know enough about that. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23057#issuecomment-2771561565 From epeter at openjdk.org Wed Apr 2 07:26:30 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 2 Apr 2025 07:26:30 GMT Subject: RFR: 8341976: C2: use_mem_state != load->find_exact_control(load->in(0)) assert failure [v7] In-Reply-To: References:

Message-ID: On Tue, 1 Apr 2025 07:31:12 GMT, Roland Westrelin wrote: >> The `arraycopy` writes to a non escaping array so its `ArrayCopy` node >> is marked as having a narrow memory effect. One of the loads from the >> destination after the copy is transformed into a load from the source >> array (the rationale being that if there's no load from the >> destination of the copy, the `arraycopy` is not needed). The load from >> the source has the input memory state of the `ArrayCopy` as memory >> input. That load is then sunk out of the loop and its control is >> updated to be after the `ArrayCopy`. That's legal because the >> `ArrayCopy` only has a narrow memory effect and can't modify the >> source. The `ArrayCopy` can't be eliminated and is expanded. In the >> process, a `MemBar` that has a wide memory effect is added. The load >> from the source has control after the membar but memory state before >> and because the membar has a wide memory effect, the load is anti >> dependent on the membar: the graph is broken (the load can't be pinned >> after the membar and anti dependent on it). >> >> In short, the problem is that the graph is transformed under the >> assumption that the `ArrayCopy` has a narrow effect but the >> `ArrayCopy` is expanded to a subgraph that has a wide memory >> effect. The fix I propose is to not insert a membar with a wide memory >> effect. We still need a membar when the destination is non escaping >> because the expanded `ArrayCopy`, if it writes to a tightly allocated >> array, writes to raw memory and not to the destination memory slice. > > Roland Westrelin has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains 16 additional commits since the last revision: > > - review > - Merge branch 'master' into JDK-8341976 > - review > - review > - Merge branch 'master' into JDK-8341976 > - -XX:+TraceLoopOpts fix > - review > - more > - Merge branch 'master' into JDK-8341976 > - more > - ... and 6 more: https://git.openjdk.org/jdk/compare/5362121c...9b21648d test/hotspot/jtreg/compiler/arraycopy/TestSunkLoadAntiDependency.java line 28: > 26: * @bug 8341976 > 27: * @summary C2: use_mem_state != load->find_exact_control(load->in(0)) assert failure > 28: * @run main/othervm -XX:-BackgroundCompilation TestSunkLoadAntiDependency Would it make sense to have a run without any flags? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23465#discussion_r2024220984 From mchevalier at openjdk.org Wed Apr 2 07:32:03 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Wed, 2 Apr 2025 07:32:03 GMT Subject: RFR: 8353341: Fuzzer tests crashing: assert(projs->fallthrough_proj != nullptr) failed: must be found Message-ID: If the Mod[DF]Node has no control projection when it's being removed (because its result is unused), `extract_projections` will fail an assert. So, let's skip the removal. But that should happen only when the nodes are already unreachable (control input being transitively top). At the end of the day, the node should be dropped. because of that, so there is no rush, and let dead node deletion do the job. On the reduced reproducer, the crash is not common (even with `-XX:RepeatCompilation=300`, it might need more than a run to reproduce). So I've tried my fix on multiple thousands repeat compilations (by 300 packs) without a crash, and without having the modulo node alive at the end. 
Thanks, Marc ------------- Commit messages: - Don't remove Mod[DF]Node that don't have control output Changes: https://git.openjdk.org/jdk/pull/24375/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=24375&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8353341 Stats: 99 lines in 2 files changed: 97 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/24375.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24375/head:pull/24375 PR: https://git.openjdk.org/jdk/pull/24375 From epeter at openjdk.org Wed Apr 2 07:32:08 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 2 Apr 2025 07:32:08 GMT Subject: RFR: 8353359: C2: Or(I|L)Node::Ideal is missing AddNode::Ideal call In-Reply-To: References: Message-ID: On Tue, 1 Apr 2025 06:20:48 GMT, Hannes Greule wrote: > Hi, > > this simple change adds a missing AddNode::Ideal call to Or(I|L)Node::Ideal. See the added tests for examples of optimizations that don't apply without this change. > > Please let me know what you think. @SirYwell Wow, good find! Oh dear, things like this are so easy to get wrong. Thanks for writing the IR test, that seems really to be the only way to ensure we don't get these kinds of regressions. I wonder how many more of these kinds of issues we have... Optimal would be if we had IR tests for every optimization, but that would be a lot of work! I'm running some testing, please ping me in 24h for the results! 
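For readers unfamiliar with what the shared `AddNode::Ideal` logic usually contributes, one typical rewrite for add-like nodes is constant reassociation, turning `(x | c1) | c2` into `x | (c1 | c2)`. The following is a minimal Java sketch of that equivalence. The class and method names are hypothetical, and whether the tests added in the PR use exactly this shape is an assumption; the sketch only checks that both source forms compute the same value and says nothing about what C2 actually emits.

```java
// Hypothetical illustration: names below are not taken from the PR.
class OrReassociation {
    // Source form with two constant ORs, nested as (x | C1) | C2.
    static int nested(int x) {
        return (x | 0x0F) | 0xF0;
    }

    // Form a compiler could produce after reassociating the constants:
    // x | (0x0F | 0xF0) == x | 0xFF.
    static int folded(int x) {
        return x | 0xFF;
    }

    // Same pair for long operands (the OrL case).
    static long nestedLong(long x) {
        return (x | 0x0FL) | 0xF0L;
    }

    static long foldedLong(long x) {
        return x | 0xFFL;
    }

    public static void main(String[] args) {
        int[] samples = {0, 1, -1, 0x100, 0x12345678,
                         Integer.MIN_VALUE, Integer.MAX_VALUE};
        for (int x : samples) {
            // Both forms must agree for every input; OR is associative.
            if (nested(x) != folded(x)) {
                throw new AssertionError("int mismatch for " + x);
            }
            long lx = ((long) x << 32) | (x & 0xFFFFFFFFL);
            if (nestedLong(lx) != foldedLong(lx)) {
                throw new AssertionError("long mismatch for " + lx);
            }
        }
        System.out.println("all forms agree");
    }
}
```

In an IR framework test one would presumably assert on the number of `OrI`/`OrL` nodes remaining after compilation rather than on values; the value check here is only a sanity check that the rewrite is semantics-preserving.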
------------- PR Comment: https://git.openjdk.org/jdk/pull/24348#issuecomment-2771580262 From thartmann at openjdk.org Wed Apr 2 07:32:06 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 2 Apr 2025 07:32:06 GMT Subject: RFR: 8353341: Fuzzer tests crashing: assert(projs->fallthrough_proj != nullptr) failed: must be found In-Reply-To: References: Message-ID: On Wed, 2 Apr 2025 07:19:35 GMT, Marc Chevalier wrote: > If the Mod[DF]Node has no control projection when it's being removed (because its result is unused), `extract_projections` will fail an assert. So, let's skip the removal. > > But that should happen only when the nodes are already unreachable (control input being transitively top). At the end of the day, the node should be dropped. because of that, so there is no rush, and let dead node deletion do the job. > > On the reduced reproducer, the crash is not common (even with `-XX:RepeatCompilation=300`, it might need more than a run to reproduce). So I've tried my fix on multiple thousands repeat compilations (by 300 packs) without a crash, and without having the modulo node alive at the end. > > Thanks, > Marc Looks good to me! src/hotspot/share/opto/divnode.cpp line 1521: > 1519: > 1520: bool result_is_unused = proj_out_or_null(TypeFunc::Parms) == nullptr; > 1521: bool has_control_output = proj_out_or_null(TypeFunc::Control) != nullptr; Nit: Maybe replace this with `is_dead = proj_out_or_null(TypeFunc::Control) == nullptr;` and check for `!is_dead` below? test/hotspot/jtreg/compiler/c2/irTests/FPModWithoutControlProj.java line 70: > 68: } > 69: } > 70: Suggestion: test/hotspot/jtreg/compiler/c2/irTests/FPModWithoutControlProj.java line 93: > 91: } > 92: } > 93: Suggestion: ------------- Marked as reviewed by thartmann (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/24375#pullrequestreview-2734903737 PR Review Comment: https://git.openjdk.org/jdk/pull/24375#discussion_r2024225945 PR Review Comment: https://git.openjdk.org/jdk/pull/24375#discussion_r2024228770 PR Review Comment: https://git.openjdk.org/jdk/pull/24375#discussion_r2024224053 From epeter at openjdk.org Wed Apr 2 07:37:11 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 2 Apr 2025 07:37:11 GMT Subject: RFR: 8348887: Create IR framework test for JDK-8347997 In-Reply-To: References: Message-ID: On Mon, 31 Mar 2025 13:41:17 GMT, Marc Chevalier wrote: > As the ticket says: >> Create IR framework test which checks that allocations are eliminated in the regression test included in [JDK-8347997](https://bugs.openjdk.org/browse/JDK-8347997) fix. > > So here it is! We can see that in case of inlining, indeed, no allocation happens. The second part is some sanity check to emphasize the difference: of course, there is an allocation without inlining. The benefit of this second part is arguable. From my point of view, it's mostly to point out the difference to a future reader. But yes, there is nothing very surprising. > > Thanks, > Marc @marc-chevalier It probably makes most sense if the authors and reviewers of [JDK-8347997](https://bugs.openjdk.org/browse/JDK-8347997) review this patch (@vnkozlov @chhagedorn @TobiHartmann ). But please ping me if you don't get reviews in a week or so, then I can have a look too ;) ------------- PR Comment: https://git.openjdk.org/jdk/pull/24328#issuecomment-2771592656 From jbhateja at openjdk.org Wed Apr 2 07:39:02 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 2 Apr 2025 07:39:02 GMT Subject: RFR: 8350896: Integer/Long.compress gets wrong type from CompressBitsNode::Value [v2] In-Reply-To: References: Message-ID: > Hi All, > > This bugfix patch fixes incorrect value computation for Integer/Long. compress APIs. 
> > Problems occur with a constant input and variable mask where the input's value is equal to the lower bound of the mask value. In this case, an erroneous value range estimation results in a constant value. Existing value routine first attempts to constant fold the compression operation if both input and compression mask are constant values; otherwise, it attempts to constrain the value range of result based on the upper and lower bounds of mask type. > > New IR test covers the issue reported in the bug report along with a case for value range based logic pruning. > > Kindly review and share your feedback. > > Best Regards, > Jatin Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Review comments resolutions ------------- Changes: - all: https://git.openjdk.org/jdk/pull/23947/files - new: https://git.openjdk.org/jdk/pull/23947/files/ee67ee22..ae48895b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=23947&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=23947&range=00-01 Stats: 189 lines in 2 files changed: 160 ins; 4 del; 25 mod Patch: https://git.openjdk.org/jdk/pull/23947.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/23947/head:pull/23947 PR: https://git.openjdk.org/jdk/pull/23947 From jbhateja at openjdk.org Wed Apr 2 07:39:03 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 2 Apr 2025 07:39:03 GMT Subject: RFR: 8350896: Integer/Long.compress gets wrong type from CompressBitsNode::Value [v2] In-Reply-To: References:

Message-ID: On Wed, 12 Mar 2025 08:43:23 GMT, Emanuel Peter wrote: >> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: >> >> Review comments resolutions > > @jatin-bhateja Thanks for looking into this! I left a first set of comments :) > > Primarily, it is about these issues: > - We need good comments, preferably even proofs. Because we got things wrong the last time, and there were no comments/proofs. It's difficult to get this sort of arithmetic transformation right, and it is hard to review. Proofs help to think through all the steps carefully. > - Test coverage: I would like to see some more randomized cases of input ranges. Hi @eme64, I have addressed your comments; let me know if you need further clarification. > src/hotspot/share/opto/intrinsicnode.cpp line 278: > >> 276: } else { >> 277: // Case 3) Mask value range only includes +ve values, this can again be >> 278: // used to ascertain known Zero bits of resultant value. > > I would put this case as the first, swapping it with Case 1). > And I would say something more explicit like this: > `Case 3) The mask value range is non-negative. Hence, the mask has at least one zero bit.` The case ordering follows the mask value range: case 1) the mask value range spans both -ve and +ve values; case 2) the mask value range lies strictly within -ve values; case 3) the mask value range lies strictly within +ve values. ------------- PR Comment: https://git.openjdk.org/jdk/pull/23947#issuecomment-2771593965 PR Review Comment: https://git.openjdk.org/jdk/pull/23947#discussion_r2024244475 From jbhateja at openjdk.org Wed Apr 2 07:39:04 2025 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 2 Apr 2025 07:39:04 GMT Subject: RFR: 8350896: Integer/Long.compress gets wrong type from CompressBitsNode::Value [v2] In-Reply-To: <5fEVX0zAsdNd9v3Rk6Gr4lIXTc96g2LndUhX4Qb-bgc=.e4553c72-8da4-41c7-b71f-628bbeea14be@github.com> References:

<5fEVX0zAsdNd9v3Rk6Gr4lIXTc96g2LndUhX4Qb-bgc=.e4553c72-8da4-41c7-b71f-628bbeea14be@github.com> Message-ID: <_vrEsrg7VNWQDlSYv5PO7CsGH2tNfrwyMShkxtpdqhQ=.434c6c72-e84f-40e3-8791-42e26652ee64@github.com> On Wed, 12 Mar 2025 08:08:19 GMT, Emanuel Peter wrote: >> src/hotspot/share/opto/intrinsicnode.cpp line 283: >> >>> 281: clz = bt == T_INT ? clz - 32 : clz; >>> 282: mask_max_bw = max_bw - clz; >>> 283: } >> >> Can you please put the comments for cases 1-3 either consistently before the condition, or after the condition with inlining? I would vote for inside each condition with indentation, so just like case 3), except 2 spaces indented ;) > > Why not start with the "nice" case 3) first, where we know that the range is positive, and so even after compression we cannot get negative values? > > What does this mean `only includes +ve values`? The case ordering follows the mask value range: case 1) the mask value range spans both -ve and +ve values; case 2) the mask value range lies strictly within -ve values; case 3) the mask value range lies strictly within +ve values. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/23947#discussion_r2024244581 From dskantz at openjdk.org Wed Apr 2 07:39:29 2025 From: dskantz at openjdk.org (Daniel Skantz) Date: Wed, 2 Apr 2025 07:39:29 GMT Subject: RFR: 8282053: IGV: refine schedule approximation [v2] In-Reply-To: <0SXZ0k-28IdpWjuhtK4dSJ9ybHE58Oq56zT_sdqeQpc=.0cf9319f-f64d-4e8f-82e9-f654464bc775@github.com> References: <0SXZ0k-28IdpWjuhtK4dSJ9ybHE58Oq56zT_sdqeQpc=.0cf9319f-f64d-4e8f-82e9-f654464bc775@github.com> Message-ID: On Tue, 1 Apr 2025 14:42:47 GMT, Daniel Skantz wrote: >> This patch refines the schedule approximation in IGV by 1) placing parm. and projection nodes in the same block as their predecessors, and 2) disallows erroneously considering machine nodes such as prefetchAlloc and rep_stos as CFG nodes. 
>> >> The reader may refer to the corresponding JBS issue where graphs sampled before and after the change are attached. >> >> Testing: T1-T3 with no failures. Opened graphs before and after the change and saw no obvious problems. Opened a large number of graphs in CFG view and observed no unexpected IGV warnings, errors or assert failures. > > Daniel Skantz has updated the pull request incrementally with one additional commit since the last revision: > > Update src/utils/IdealGraphVisualizer/ServerCompiler/src/main/java/com/sun/hotspot/igv/servercompiler/ServerCompilerScheduler.java > > Co-authored-by: Daniel Lundén Thanks for the reviews! ------------- PR Comment: https://git.openjdk.org/jdk/pull/24350#issuecomment-2771595747 From duke at openjdk.org Wed Apr 2 07:39:30 2025 From: duke at openjdk.org (duke) Date: Wed, 2 Apr 2025 07:39:30 GMT Subject: RFR: 8282053: IGV: refine schedule approximation [v2] In-Reply-To: <0SXZ0k-28IdpWjuhtK4dSJ9ybHE58Oq56zT_sdqeQpc=.0cf9319f-f64d-4e8f-82e9-f654464bc775@github.com> References: <0SXZ0k-28IdpWjuhtK4dSJ9ybHE58Oq56zT_sdqeQpc=.0cf9319f-f64d-4e8f-82e9-f654464bc775@github.com> Message-ID: <8BJI6ui7ndUA4OTPv3xMzTpg5G2bzn2l9vhUlenT7IE=.f70ad9c1-52d7-4309-b7e8-3fd97e58cc76@github.com> On Tue, 1 Apr 2025 14:42:47 GMT, Daniel Skantz wrote: >> This patch refines the schedule approximation in IGV by 1) placing parm. and projection nodes in the same block as their predecessors, and 2) disallows erroneously considering machine nodes such as prefetchAlloc and rep_stos as CFG nodes. >> >> The reader may refer to the corresponding JBS issue where graphs sampled before and after the change are attached. >> >> Testing: T1-T3 with no failures. Opened graphs before and after the change and saw no obvious problems. Opened a large number of graphs in CFG view and observed no unexpected IGV warnings, errors or assert failures. 
> > Daniel Skantz has updated the pull request incrementally with one additional commit since the last revision: > > Update src/utils/IdealGraphVisualizer/ServerCompiler/src/main/java/com/sun/hotspot/igv/servercompiler/ServerCompilerScheduler.java > > Co-authored-by: Daniel Lundén @danielogh Your change (at version 57ad6dc825404d2628aa376f0fa8d78090313d33) is now ready to be sponsored by a Committer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/24350#issuecomment-2771597533 From epeter at openjdk.org Wed Apr 2 07:41:09 2025 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 2 Apr 2025 07:41:09 GMT Subject: RFR: 8352418: Add verification code to check that the associated loop nodes of useless Template Assertion Predicates are dead In-Reply-To: References: Message-ID: On Mon, 31 Mar 2025 12:28:59 GMT, Christian Hagedorn wrote: > As already suggested in https://github.com/openjdk/jdk/pull/23823, I want to do the following additional verification: > > After `eliminate_useless_predicates()` all now useless `OpaqueTemplateAssertionPredicate` nodes should not have any references to `CountedLoop` nodes that are still in the graph (otherwise, they would have been marked useful). This verification did not work reliably without the full Assertion Predicates fix [JDK-8350577](https://bugs.openjdk.org/browse/JDK-8350577). Since JDK-8350577 is now integrated, I propose to add this additional verification code. > > Thanks, > Christian Looks reasonable, nice to see more verification code :) src/hotspot/share/opto/predicates.cpp line 1250: > 1248: // graph (otherwise, they would have been marked useful instead). This is verified in this method. > 1249: void EliminateUselessPredicates::verify_loop_nodes_of_useless_templates_assertion_predicates_are_dead() const { > 1250: Unique_Node_List loop_nodes_of_useless_template_assertion_predicates = Should we add `ResourceMark` here, or is there one close by that suffices? 
------------- Marked as reviewed by epeter (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24326#pullrequestreview-2734938554 PR Review Comment: https://git.openjdk.org/jdk/pull/24326#discussion_r2024248198 From thartmann at openjdk.org Wed Apr 2 07:45:10 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 2 Apr 2025 07:45:10 GMT Subject: RFR: 8353058: [PPC64] Some IR framework tests are failing after JDK-8352595 In-Reply-To: References: Message-ID: <5EDodzal0YHCnEW3k6lszJPxcNGwHtDw4qHGHhQSk_k=.7e66e90a-dd36-4bd3-bc69-26c9e828e377@github.com> On Wed, 2 Apr 2025 06:45:58 GMT, Christian Hagedorn wrote: > `TestPhaseIRMatching` was recently updated with [JDK-8352595](https://bugs.openjdk.org/browse/JDK-8352595) which changed some matching on opto assembly from `IRNode.ALLOC` (now matching on ideal phases) to `IRNode.FIELD_ACCESS` (still matching on opto assembly). However, the updated code matches differently on PPC for some method invocation on a parameter which let the test fail on PPC: > > public Object defaultOnOptoAssembly(Helper h) { > return h.getString(); // emits one "Field: " string on most platforms but none on PPC > } > > > When I've revisited the test to analyze the failure, it was not evidently clear what I had in mind back there with `defaultOnX()`. My guess is that I've tried to have one method failing on ideal phases, one on mach phases and one on both while all the methods use `IRNode` entries that have default compile phases on ideal and mach phases. But that is not the case today. I've therefore rewritten the tests to adhere to my guess. I also removed the ambiguity among platforms to have the same number of field accesses on them. 
> > How to read the `@ExpectedFailure` annotation: > > @IR(failOn = {IRNode.STORE, IRNode.FIELD_ACCESS, IRNode.COUNTED_LOOP, IRNode.STORE_I}, > counts = {IRNode.STORE, "20", IRNode.FIELD_ACCESS, "1", IRNode.COUNTED_LOOP, "2", IRNode.OOPMAP_WITH, "asdf", "2"}) > // Expect rule with id 5 (the one directly above) to fail: > // - We fail when matching PRINT_IDEAL with the: > // - failOn attribute: The failing constraints are constraint 1 and 4 (while 2 and 3 pass) > // - counts attribute: The failing constraints are constraint 2 and 4 (while 1 and 3 pass). > @ExpectedFailure(ruleId = 5, phase = CompilePhase.PRINT_IDEAL, failOn = {1, 4}, counts = {1, 3}) > > > Thanks to @TheRealMDoerr for testing the patch on PPC! > > Thanks, > Christian Looks good to me too. ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/24373#pullrequestreview-2734953812 From mchevalier at openjdk.org Wed Apr 2 07:45:41 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Wed, 2 Apr 2025 07:45:41 GMT Subject: RFR: 8353341: C2: removal of a Mod[DF]Node crashes when the node is already dead [v2] In-Reply-To: References: Message-ID: > If the Mod[DF]Node has no control projection when it's being removed (because its result is unused), `extract_projections` will fail an assert. So, let's skip the removal. > > But that should happen only when the nodes are already unreachable (control input being transitively top). At the end of the day, the node should be dropped. because of that, so there is no rush, and let dead node deletion do the job. > > On the reduced reproducer, the crash is not common (even with `-XX:RepeatCompilation=300`, it might need more than a run to reproduce). So I've tried my fix on multiple thousands repeat compilations (by 300 packs) without a crash, and without having the modulo node alive at the end. 
> > Thanks, > Marc Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: Address review comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/24375/files - new: https://git.openjdk.org/jdk/pull/24375/files/2a347bc0..f1f0b93b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=24375&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=24375&range=00-01 Stats: 6 lines in 2 files changed: 0 ins; 2 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/24375.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/24375/head:pull/24375 PR: https://git.openjdk.org/jdk/pull/24375 From thartmann at openjdk.org Wed Apr 2 07:45:41 2025 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 2 Apr 2025 07:45:41 GMT Subject: RFR: 8353341: C2: removal of a Mod[DF]Node crashes when the node is already dead [v2] In-Reply-To: References:

Message-ID: On Wed, 2 Apr 2025 07:42:37 GMT, Marc Chevalier wrote: >> If the Mod[DF]Node has no control projection when it's being removed (because its result is unused), `extract_projections` will fail an assert. So, let's skip the removal. >> >> But that should happen only when the nodes are already unreachable (control input being transitively top). At the end of the day, the node should be dropped. because of that, so there is no rush, and let dead node deletion do the job. >> >> On the reduced reproducer, the crash is not common (even with `-XX:RepeatCompilation=300`, it might need more than a run to reproduce). So I've tried my fix on multiple thousands repeat compilations (by 300 packs) without a crash, and without having the modulo node alive at the end. >> >> Thanks, >> Marc > > Marc Chevalier has updated the pull request incrementally with one additional commit since the last revision: > > Address review comments Marked as reviewed by thartmann (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/24375#pullrequestreview-2734948297 From mchevalier at openjdk.org Wed Apr 2 07:45:42 2025 From: mchevalier at openjdk.org (Marc Chevalier) Date: Wed, 2 Apr 2025 07:45:42 GMT Subject: RFR: 8353341: C2: removal of a Mod[DF]Node crashes when the node is already dead [v2] In-Reply-To: